Received: by 10.223.176.5 with SMTP id f5csp597291wra; Tue, 6 Feb 2018 04:21:38 -0800 (PST) X-Google-Smtp-Source: AH8x2262R1YwNjz9NG08WwMXAif9YtopnoodtyxprIJ8hsex557cfhUeagx5r5bhgMk2ixXizIVM X-Received: by 10.98.131.139 with SMTP id h133mr2241807pfe.155.1517919697942; Tue, 06 Feb 2018 04:21:37 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1517919697; cv=none; d=google.com; s=arc-20160816; b=lJe85d1O2BXdFyx7zDndc2V2PNxiMGYAUmu9ut8Pwm7I3n7xfnJjbi6HQMnxnfAYm9 bzXLHB0fnedpT6YY5nFz6/JQB/eyOgUrol8G4J/qvLw3SYvlyDFsz+tkujWI67qrvWB1 XTXFUx4tWc5fGjcVItrCmi4oZvE5fyOzXqkQ4p90/wruFWiQz1dTylbkUAjXrZRO+FEc TkVw4KgihbwZdlZzBqIImAlx5a4U6lsNGdsurKJ4ogjhLG8djSuH/iLJxlBFNNpo77I+ Kq3vx70zqBU5N7k51SjVgqxDpIsT1UW/Uq64SIO95SzFgmhlYL6Pxqvso9CSTBNgLBWk 03hg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:date:from:arc-authentication-results; bh=NTmwnyEXj5Ez/mGITX05GfVPhJVRHrjE5qvtTlZ/IjY=; b=J5gohnrQf4qCVp+1U45MbdeOExOXksGbCMo//Ps6N9lMUi+nU0T/9Vtw/h0xA8dDM4 PWuC0yWojG3vT+dp8VeKD3vIMw/yD+yHfEdEMvitbFGdJL/IJJUtezsjPW2gOQFy6Tu2 5UZwvpm06AeovqqF3inZryBXRFqXHTntgU4AGkTsi0iIghSEGfJfYvksVbPTWt2dY+v1 5VJOcEWGcS9NlV86ertk7TrIHTV8Yj8KEnJUjpNXaayzDFn3396cIz28laJ1G3lPrkLL ZR4mReNDrWk6fAnICLpCqx4/tVS+o/7ww8OdSNwM6ri8nhMJ0KGIo1n9n7Dm5Jzq8Y9J DLNw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=canonical.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b5-v6si1301640pls.587.2018.02.06.04.21.23; Tue, 06 Feb 2018 04:21:37 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=canonical.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752885AbeBFMS5 (ORCPT + 99 others); Tue, 6 Feb 2018 07:18:57 -0500 Received: from youngberry.canonical.com ([91.189.89.112]:45716 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752849AbeBFMSs (ORCPT ); Tue, 6 Feb 2018 07:18:48 -0500 Received: from mail-wr0-f197.google.com ([209.85.128.197]) by youngberry.canonical.com with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1ej2D0-0002dL-6W for linux-kernel@vger.kernel.org; Tue, 06 Feb 2018 12:18:46 +0000 Received: by mail-wr0-f197.google.com with SMTP id a63so1124462wrc.15 for ; Tue, 06 Feb 2018 04:18:46 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:date:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=NTmwnyEXj5Ez/mGITX05GfVPhJVRHrjE5qvtTlZ/IjY=; b=I4EScO3A+4euVbyDZT1ZmChHkjeKbnT0uBB4FCpLSA+u3Bs9Exn+ijaYLWOSdypPGE V21PUUU0wSRmyeiyzu3OBu69DnQ3nAtCceaAF/DLCWMFaPRp+p9gQGsX+LGo17bdrPQG thlU6+4stST4+OZYC7DPSbY9DYD3ex/ImX4oOE7DxEUyH/HJJADfP4z5MmYvI30ygM/i 45Hpo/NFLwNkMDen7/CEZxM0uxYieKy97bdsD+Ihlag8BXf/p3BkWHGOwZA7HKFRRO0l 41ZXRUEy3l+QN4zc4rI2IAEW+fnDhXQa2OQnVaTvG3b0BNafm8HNNVj8XG6qVcf8udAA lo7Q== X-Gm-Message-State: APf1xPBWROQS4MLynUxEuDQxSHPK0saq8F4XQmpHmFafQ8qJ4uRUwSsA v5sEnzECndEMLjMf3ErAqBrWWlxJpsHtclqOuStAEzIq9WfrexFYkuKdGf7IreRULDJP9L4RYzx KILXX7+pGgFnDX/I3THGvvDi/JplWI7tQTsSndb425A== X-Received: by 10.28.137.85 with SMTP id l82mr1731738wmd.109.1517919525786; Tue, 06 Feb 2018 04:18:45 -0800 (PST) X-Received: by 10.28.137.85 with SMTP id l82mr1731717wmd.109.1517919525460; Tue, 06 Feb 2018 04:18:45 -0800 (PST) Received: from gmail.com (eap104082.extern.uni-tuebingen.de. [134.2.104.82]) by smtp.gmail.com with ESMTPSA id 77sm14153945wmt.37.2018.02.06.04.18.44 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 06 Feb 2018 04:18:44 -0800 (PST) From: Christian Brauner X-Google-Original-From: Christian Brauner Date: Tue, 6 Feb 2018 13:18:44 +0100 To: Kirill Tkhai Cc: Christian Brauner , netdev@vger.kernel.org, stephen@networkplumber.org, w.bumiller@proxmox.com, ebiederm@xmission.com, jbenc@redhat.com, nicolas.dichtel@6wind.com, linux-kernel@vger.kernel.org, dsahern@gmail.com, davem@davemloft.net Subject: Re: [PATCH net 1/1 v2] rtnetlink: require unique netns identifier Message-ID: <20180206121843.GA23417@gmail.com> References: <20180205155550.21432-1-christian.brauner@ubuntu.com> <20180205155550.21432-2-christian.brauner@ubuntu.com> <2eac607b-e847-1b21-b3cb-6a45130138ee@virtuozzo.com> <20180205232438.GA8695@gmail.com> <2ca51c73-71c3-7d4c-d6fb-a550038518bc@virtuozzo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <2ca51c73-71c3-7d4c-d6fb-a550038518bc@virtuozzo.com> User-Agent: Mutt/1.9.3 (2018-01-21) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Feb 06, 2018 at 01:49:10PM +0300, Kirill Tkhai wrote: > Hi, Christian, > > On 06.02.2018 02:24, Christian Brauner wrote: > > On Tue, Feb 06, 2018 at 12:47:46AM +0300, Kirill Tkhai wrote: > >> On 05.02.2018 18:55, Christian Brauner wrote: > >>> Since we've added support for IFLA_IF_NETNSID for RTM_{DEL,GET,SET,NEW}LINK > >>> it is possible for userspace to send us requests with three different > >>> properties to identify a target network namespace. This affects at least > >>> RTM_{NEW,SET}LINK. Each of them could potentially refer to a different > >>> network namespace which is confusing. For legacy reasons the kernel will > >>> pick the IFLA_NET_NS_PID property first and then look for the > >>> IFLA_NET_NS_FD property but there is no reason to extend this type of > >>> behavior to network namespace ids. The regression potential is quite > >>> minimal since the rtnetlink requests in question either won't allow > >>> IFLA_IF_NETNSID requests before 4.16 is out (RTM_{NEW,SET}LINK) or don't > >>> support IFLA_NET_NS_{PID,FD} (RTM_{DEL,GET}LINK) in the first place. > >>>> Signed-off-by: Christian Brauner > >>> --- > >>> ChangeLog v1->v2: > >>> * return errno when the specified network namespace id is invalid > >>> * fill in struct netlink_ext_ack if the network namespace id is invalid > >>> * rename rtnl_ensure_unique_netns_attr() to rtnl_ensure_unique_netns() to > >>> indicate that a request without any network namespace identifying attributes > >>> is also considered valid. > >>> > >>> ChangeLog v0->v1: > >>> * report a descriptive error to userspace via struct netlink_ext_ack > >>> * do not fail when multiple properties specifiy the same network namespace > >>> --- > >>> net/core/rtnetlink.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > >>> 1 file changed, 69 insertions(+) > >>> > >>> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > >>> index 56af8e41abfc..c096c4ff9a00 100644 > >>> --- a/net/core/rtnetlink.c > >>> +++ b/net/core/rtnetlink.c > >>> @@ -1951,6 +1951,59 @@ static struct net *rtnl_link_get_net_capable(const struct sk_buff *skb, > >>> return net; > >>> } > >>> > >>> +/* Verify that rtnetlink requests supporting network namespace ids > >>> + * do not pass additional properties referring to different network > >>> + * namespaces. > >>> + */ > >>> +static int rtnl_ensure_unique_netns(const struct sock *sk, struct nlattr *tb[], > >>> + struct netlink_ext_ack *extack) > >>> +{ > >>> + int ret = -EINVAL; > >>> + struct net *net = NULL, *unique_net = NULL; > >>> + > >>> + /* Requests without network namespace ids have been able to specify > >>> + * multiple properties referring to different network namespaces so > >>> + * don't regress them. > >>> + */ > >>> + if (!tb[IFLA_IF_NETNSID]) > >>> + return 0; > >>> + > >>> + /* Caller operates on the current network namespace. */ > >>> + if (!tb[IFLA_NET_NS_PID] && !tb[IFLA_NET_NS_FD]) > >>> + return 0; > >>> + > >>> + unique_net = get_net_ns_by_id(sock_net(sk), nla_get_s32(tb[IFLA_IF_NETNSID])); > >>> + if (!unique_net) { > >>> + NL_SET_ERR_MSG(extack, "invalid network namespace id"); > >>> + return ret; > >>> + } > >>> + > >>> + if (tb[IFLA_NET_NS_PID]) { > >>> + net = get_net_ns_by_pid(nla_get_u32(tb[IFLA_NET_NS_PID])); > >>> + if (net != unique_net) > >>> + goto on_error; > >>> + } > >>> + > >>> + if (tb[IFLA_NET_NS_FD]) { > >>> + net = get_net_ns_by_fd(nla_get_u32(tb[IFLA_NET_NS_FD])); > >>> + if (net != unique_net) > >>> + goto on_error; > >>> + } > >>> + > >>> + ret = 0; > >>> + > >>> +on_error: > >>> + put_net(unique_net); > >>> + > >>> + if (net && !IS_ERR(net)) > >>> + put_net(net); > >> > >> 1)When we have tb[IFLA_NET_NS_PID and tb[IFLA_NET_NS_FD] both set and pointing > >> to the same net, this function increments net::count in get_net_ns_by_pid() and > >> in get_net_ns_by_fd(), i.e. twice. But only single put_net(net) will be called. > >> So, after this function net::count will be incremented by 1, and it never will > >> die. > > > > Thanks for spotting this, Kirill. > > > >> > >> 2)The whole approach does not seem good for me. The first reason is it's racy. > >> Even if rtnl_ensure_unique_netns() returns 0, this does not guarantees that > >> tb[IFLA_IF_NETNSID] and tb[IFLA_NET_NS_PID] will be point the same net later, > >> as the pid may die or do setns(). Racy check is worse than no check at all. > >> > >> The second reason is after this patch get_net_ns_by_id/get_net_ns_by_pid()/ > >> get_net_ns_by_fd() will be called twice: the first time is in your check > >> and the second time is where they are actually used. This is not good for > >> performance. > > > > If this is really a performance problem we can simply fix this by > > performing the check when the target network namespace is retrieved in > > each request. The intention for doing it in one function at the > > beginning of each request was to make it generic and easily > > understandable. > > I haven't measured the performance with stopwatch, of course, but this is > additional operations, and we should not use them unless they are really need. > The approach with get_net()/put_net() is racy and it does not solve the > problem. So it does not seem good for me despite it is generic. > > >> > >> What is the problem people pass several different tb[xxx] in one call? We > >> may just describe the order of tb[xxx] in man page and their priorities, > >> and ignore the rest after the first not zero tb[xxx] is found, and do that > >> in the place, where net from tb[xxx] in actually used. This is the thing > >> we already do. > >> > >> Comparing to classic Linux interface such as syscalls, it's usual behavior > >> for them to ignore one argument, when another is set. Nobody confuses. > > > > From what I gather from recent discussions I had here using pids and > > fds to perform operations on network namespaces in netlink requests is > > not the future. Specifically, using pids and fds will not be extended to > > existing or future requests that do not already support it. > > > > It also very much smells like a security liability if what you've > > outlined above is true: a user sends a request with a pid and the task > > dies and the pid gets recycled. Now, we can't easily fix this by simply > > ignoring pids and fds from here on since this would likely break a bunch > > of userspace programs but we can ensure that if a network namespace > > identifier is passed that no other way of retrieving the target network > > namespace is passed. Especially with requests that already support pids > > and fds. It's either that or reversing the order meaning that if a > > network namespace identifier is passed then it should take precedence > > over the other identifiers. Furthermore, this would also clearly > > indicate that netns ids are the preferred way to perform operations on > > network namespaces via netlink requests. > > If we really need this, can't we simply zero excess identifiers instead? > > void rtnl_kill_them_all(struct nlattr *tb[]) > { > if (!tb[IFLA_IF_NETNSID]) > return; > tb[IFLA_NET_NS_PID] = tb[IFLA_NET_NS_FD] = NULL; > } > > It's not racy and solves the problem you are solving. I take it that you haven't seen the first version of my patch which does exactly that: https://patchwork.kernel.org/patch/10196409/ :) I'm going to resend this one with the extack fixes added. Thanks! Christian > > > I also do not think that your suggestion of making guarantees in what > > order additional netlink properties are evaluated is a good one. I don't > > think we want to give userspace the impression that sticking a pid, fd, > > and netnsid into the same netlink request is something that we actively > > support. > > > > What is certainly a good point is that if pids and fds are as you said > > inherently racy then we shouldn't perform the check but do what my > > original patch did and simply refuse to combine netns ids with pids > > and/or fds. > > Thanks, > Kirill