From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Daniel Borkmann, Mahesh Bandewar, "David S. Miller"
Subject: [PATCH 4.14 27/34] ipvlan: disallow userns cap_net_admin to change global mode/flags
Date: Mon, 18 Mar 2019 10:25:51 +0100
Message-Id: <20190318084148.560958376@linuxfoundation.org>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20190318084144.657740413@linuxfoundation.org>
References: <20190318084144.657740413@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
X-Patchwork-Hint: ignore
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann

[ Upstream commit 7cc9f7003a969d359f608ebb701d42cafe75b84a ]

When running Docker with userns isolation e.g. --userns-remap="default"
and spawning up some containers with CAP_NET_ADMIN under this realm, I
noticed that link changes on an ipvlan slave device inside that container
can affect all devices from this ipvlan group which are in other net
namespaces where the container should have no permission to make changes
to, such as the init netns, for example.

This effectively allows undoing ipvlan private mode and switching globally
to bridge mode where slaves can communicate directly without going through
hostns, or it allows switching between global operation modes (l2/l3/l3s)
for everyone bound to the given ipvlan master device. The libnetwork plugin
here is creating an ipvlan master and ipvlan slave in hostns and a slave
each that is moved into the container's netns upon creation event.

* In hostns:

  # ip -d a
  [...]
  8: cilium_host@bond0: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.0.1/32 scope link cilium_host
         valid_lft forever preferred_lft forever
  [...]

* Spawn container & change ipvlan mode setting inside of it:

  # docker run -dt --cap-add=NET_ADMIN --network cilium-net --name client -l app=test cilium/netperf
  9fff485d69dcb5ce37c9e33ca20a11ccafc236d690105aadbfb77e4f4170879c

  # docker exec -ti client ip -d a
  [...]
  10: cilium0@if4: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
         valid_lft forever preferred_lft forever

  # docker exec -ti client ip link change link cilium0 name cilium0 type ipvlan mode l2

  # docker exec -ti client ip -d a
  [...]
  10: cilium0@if4: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
         valid_lft forever preferred_lft forever

* In hostns (mode switched to l2):

  # ip -d a
  [...]
  8: cilium_host@bond0: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.0.1/32 scope link cilium_host
         valid_lft forever preferred_lft forever
  [...]

The same l3 -> l2 switch would also happen by creating another slave inside
the container's network namespace when specifying the existing cilium0
link to derive the actual (bond0) master:

  # docker exec -ti client ip link add link cilium0 name cilium1 type ipvlan mode l2

  # docker exec -ti client ip -d a
  [...]
  2: cilium1@if4: mtu 1500 qdisc noop state DOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
  10: cilium0@if4: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
         valid_lft forever preferred_lft forever

* In hostns:

  # ip -d a
  [...]
  8: cilium_host@bond0: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.0.1/32 scope link cilium_host
         valid_lft forever preferred_lft forever
  [...]

One way to mitigate it is to check CAP_NET_ADMIN permissions of
the ipvlan master device's ns, and only then allow changing
mode or flags for all devices bound to it. The above two cases are
then disallowed after the patch.

Signed-off-by: Daniel Borkmann
Acked-by: Mahesh Bandewar
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 drivers/net/ipvlan/ipvlan_main.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -482,7 +482,12 @@ static int ipvlan_nl_changelink(struct n
 	struct ipvl_port *port = ipvlan_port_get_rtnl(ipvlan->phy_dev);
 	int err = 0;
 
-	if (data && data[IFLA_IPVLAN_MODE]) {
+	if (!data)
+		return 0;
+	if (!ns_capable(dev_net(ipvlan->phy_dev)->user_ns, CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (data[IFLA_IPVLAN_MODE]) {
 		u16 nmode = nla_get_u16(data[IFLA_IPVLAN_MODE]);
 
 		err = ipvlan_set_port_mode(port, nmode);
@@ -551,6 +556,8 @@ int ipvlan_link_new(struct net *src_net,
 		struct ipvl_dev *tmp = netdev_priv(phy_dev);
 
 		phy_dev = tmp->phy_dev;
+		if (!ns_capable(dev_net(phy_dev)->user_ns, CAP_NET_ADMIN))
+			return -EPERM;
 	} else if (!netif_is_ipvlan_port(phy_dev)) {
 		err = ipvlan_port_create(phy_dev);
 		if (err < 0)
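
For readers following the check added in ipvlan_nl_changelink(), here is a
minimal userspace sketch (not kernel code) of why testing CAP_NET_ADMIN
against the master device's user namespace rejects the container case.
Every model_* name is hypothetical, and the capability rule is deliberately
simplified: a capability held in a namespace is assumed to cover that
namespace and its descendants only. The real ns_capable() has further
rules, for example for the owner of a child namespace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a user namespace: only the parent link is modelled. */
struct model_userns {
	const struct model_userns *parent;	/* NULL for the initial namespace */
};

/* Hypothetical caller: the namespace it lives in and whether it holds CAP_NET_ADMIN there. */
struct model_task {
	const struct model_userns *userns;
	bool has_net_admin;
};

/*
 * Simplified assumption: the capability applies to the caller's own
 * namespace and to its descendants, never to an ancestor.
 */
static bool model_ns_capable(const struct model_task *task,
			     const struct model_userns *target)
{
	const struct model_userns *ns;

	assert(task != NULL && target != NULL);
	if (!task->has_net_admin)
		return false;
	/* Walk from the target namespace up towards the initial one. */
	for (ns = target; ns != NULL; ns = ns->parent)
		if (ns == task->userns)
			return true;
	return false;
}

/* Follows the order of checks the patch adds to ipvlan_nl_changelink(). */
static int model_changelink(const struct model_task *task,
			    const struct model_userns *master_userns,
			    bool has_data)
{
	if (!has_data)
		return 0;		/* nothing to change, nothing to check */
	if (!model_ns_capable(task, master_userns))
		return -EPERM;		/* caller has no say over the master's ns */
	return 0;			/* mode/flags change would proceed here */
}
```

With an initial namespace and a child namespace for the container, a
container task holding CAP_NET_ADMIN gets -EPERM when the master lives in
the initial namespace, while a host task with the same capability is
allowed.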