From: ebiederm@xmission.com (Eric W. Biederman)
To: Christian Brauner
Cc: David Miller, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, avagin@virtuozzo.com, ktkhai@virtuozzo.com, serge@hallyn.com, gregkh@linuxfoundation.org
Subject: Re: [PATCH net-next 1/2 v2] netns: restrict uevents
Date: Thu, 26 Apr 2018 12:10:30 -0500
Message-ID: <878t99opvd.fsf@xmission.com>
In-Reply-To: <20180426170324.GA10061@gmail.com> (Christian Brauner's message of "Thu, 26 Apr 2018 19:03:26 +0200")
References: <20180424204335.12904-1-christian.brauner@ubuntu.com> <20180424204335.12904-2-christian.brauner@ubuntu.com> <87po2oz0s8.fsf@xmission.com> <87wowww6p8.fsf@xmission.com> <20180426161353.GA2014@gmail.com> <871sf1q5ig.fsf@xmission.com> <20180426170324.GA10061@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Christian Brauner writes:

> On Thu, Apr 26, 2018 at 11:47:19AM -0500, Eric W. Biederman wrote:
>> Christian Brauner writes:
>>
>> > On Tue, Apr 24, 2018 at 06:00:35PM -0500, Eric W. Biederman wrote:
>> >> Christian Brauner writes:
>> >>
>> >> > On Wed, Apr 25, 2018, 00:41 Eric W. Biederman wrote:
>> >> >
>> >> > Bah. This code is obviously correct and probably wrong.
>> >> >
>> >> > How do we deliver uevents for network devices that are outside of the
>> >> > initial user namespace? The kernel still needs to deliver those.
>> >> >
>> >> > The logic to figure out which network namespace a device needs to be
>> >> > delivered to is present in kobj_bcast_filter.
>> >> > That logic will almost
>> >> > certainly need to be turned inside out. Sigh, not as easy as I would
>> >> > have hoped.
>> >> >
>> >> > My first patch that we discussed put additional filtering logic into
>> >> > kobj_bcast_filter for that very reason. But I can move that logic
>> >> > out and come up with a new patch.
>> >>
>> >> I may have misunderstood.
>> >>
>> >> I heard and am still hearing additional filtering to reduce the places
>> >> the packet is delivered.
>> >>
>> >> I am saying something needs to change to increase the number of places
>> >> the packet is delivered.
>> >>
>> >> For the special class of devices that kobj_bcast_filter would apply to
>> >> those need to be delivered to network namespaces that are no longer on
>> >> uevent_sock_list.
>> >>
>> >> So the code fundamentally needs to split into two paths. Ordinary
>> >> devices that use uevent_sock_list. Network devices that are just
>> >> delivered in their own network namespace.
>> >>
>> >> netlink_broadcast_filtered gets to go away completely.
>> >
>> > The split *might* make sense but I think you're wrong about removing the
>> > kobj_bcast_filter. The current filter doesn't operate on the uevent
>> > socket in uevent_sock_list itself; it rather operates on the sockets in
>> > mc_list. And if a socket in mc_list can have a different network namespace
>> > than the uevent_socket itself then your way won't work. That's why my
>> > original patch added additional filtering in there. The way I see it we
>> > need something like:
>>
>> We already filter the sockets in the mc_list by network namespace.
>
> Oh really? That's good to know. I haven't found where in the code this
> actually happens. I thought that when netlink_bind() is called anyone
> could register themselves in mc_list.

The code in af_netlink.c does:
> static void do_one_broadcast(struct sock *sk,
>                              struct netlink_broadcast_data *p)
> {
>         struct netlink_sock *nlk = nlk_sk(sk);
>         int val;
>
>         if (p->exclude_sk == sk)
>                 return;
>
>         if (nlk->portid == p->portid || p->group - 1 >= nlk->ngroups ||
>             !test_bit(p->group - 1, nlk->groups))
>                 return;
>
>         if (!net_eq(sock_net(sk), p->net)) {
              ^^^^^^^^^^^^ Here
>                 if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
>                         return;
                          ^^^^^^^^^^^ Here
>
>                 if (!peernet_has_id(sock_net(sk), p->net))
>                         return;
>
>                 if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>                                      CAP_NET_BROADCAST))
>                         return;
>         }

Which, if you are not a magic NETLINK_F_LISTEN_ALL_NSID socket, filters
you out if you are in the wrong network namespace.

>> When a packet is transmitted with netlink_broadcast it is only
>> transmitted within a single network namespace.
>>
>> Even in the case of a NETLINK_F_LISTEN_ALL_NSID socket the skb is tagged
>> with its source network namespace so no confusion will result, and the
>> permission checks have been done to make it safe. So you can safely
>> ignore that case. Please ignore that case. It only needs to be
>> considered if refactoring af_netlink.c.
>>
>> When I added netlink_broadcast_filtered I imagined that we would need
>> code that worked across network namespaces that worked for different
>> namespaces. So it looked like we would need the level of granularity
>> that you can get with netlink_broadcast_filtered. It turns out we don't
>> and that it was a case of over-design. As the only split we care about
>> is per network namespace there is no need for
>> netlink_broadcast_filtered.
>>
>> > init_user_ns_broadcast_filtered(uevent_sock_list, kobj_bcast_filter);
>> > user_ns_broadcast_filtered(uevent_sock_list, kobj_bcast_filter);
>> >
>> > The question that remains is whether we can rely on the network
>> > namespace information we can gather from the kobject_ns_type_operations
>> > to decide where we want to broadcast that event to.
>> > So something
>> > *like*:
>>
>> We can. We already do. That is what kobj_bcast_filter implements.
>>
>> > ops = kobj_ns_ops(kobj);
>> > if (!ops && kobj->kset) {
>> >         struct kobject *ksobj = &kobj->kset->kobj;
>> >         if (ksobj->parent != NULL)
>> >                 ops = kobj_ns_ops(ksobj->parent);
>> > }
>> >
>> > if (ops && ops->netlink_ns && kobj->ktype->namespace)
>> >         if (ops->type == KOBJ_NS_TYPE_NET)
>> >                 net = kobj->ktype->namespace(kobj);
>>
>> Please note the only entry in the kobj_ns_type enumeration other than
>> KOBJ_NS_TYPE_NONE is KOBJ_NS_TYPE_NET. So the check for ops->type in
>> this case is redundant.
>
> Yes, I know. The reason for doing it explicitly is to block the case
> where kobjects get tagged with other namespaces. So we'd need to be
> vigilant should that ever happen but fine.

It is fine to keep the check. I was intending to point out that
it is much more likely that we remove the enumeration and remove
some of the extra abstraction than that another namespace is
implemented there.

>> That is something else that could be simplified. At the time it was
>> necessary to get the sysfs changes merged.
>>
>> > if (!net || net->user_ns == &init_user_ns)
>> >         ret = init_user_ns_broadcast(env, action_string, devpath);
>> > else
>> >         ret = user_ns_broadcast(net->uevent_sock->sk, env,
>> >                                 action_string, devpath);
>>
>> Almost.
>>
>>         if (!net)
>>                 kobject_uevent_net_broadcast(kobj, env, action_string,
>>                                              dev_path);
>>         else
>>                 netlink_broadcast(net->uevent_sock->sk, skb, 0, 1, GFP_KERNEL);
>>
>>
>> I am handwaving to get the skb in the netlink_broadcast case but that
>> should be enough for you to see what I am thinking.
>
> I have added a helper alloc_uevent_skb() that can be used in both cases.
>
> static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
>                                         const char *action_string,
>                                         const char *devpath)
> {
>         struct sk_buff *skb = NULL;
>         char *scratch;
>         size_t len;
>
>         /* allocate message with maximum possible size */
>         len = strlen(action_string) + strlen(devpath) + 2;
>         skb = alloc_skb(len + env->buflen, GFP_KERNEL);
>         if (!skb)
>                 return NULL;
>
>         /* add header */
>         scratch = skb_put(skb, len);
>         sprintf(scratch, "%s@%s", action_string, devpath);
>
>         skb_put_data(skb, env->buf, env->buflen);
>
>         NETLINK_CB(skb).dst_group = 1;
>
>         return skb;
> }
>
>>
>> My only concern with the above is that we almost certainly need to fix
>> the credentials on the skb so that userspace does not drop the packet
>> sent to a network namespace because it has the credentials that will
>> cause userspace to drop the packet today.
>>
>> But it should be straightforward to look at net->user_ns to fix the
>> credentials.
>
> Yes, afaict, the only thing that needs to be updated is the uid.

I suspect there may also be a gid.

Eric
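
For readers following the thread, the namespace check quoted from
do_one_broadcast() above can be modeled in userspace roughly as follows.
This is only a sketch under stated assumptions: struct model_net,
struct model_sock and model_should_deliver() are hypothetical names, not
kernel API, and the three boolean fields stand in for the results of the
NETLINK_F_LISTEN_ALL_NSID flag test, peernet_has_id() and
file_ns_capable(). The exclude_sk, portid and group checks are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a network namespace; not the kernel's struct net. */
struct model_net {
        int id;
};

/* Hypothetical stand-in for a listening netlink socket. */
struct model_sock {
        const struct model_net *net; /* namespace the listener lives in */
        bool listen_all_nsid;        /* models NETLINK_F_LISTEN_ALL_NSID */
        bool peer_has_id;            /* models the peernet_has_id() result */
        bool has_cap;                /* models the CAP_NET_BROADCAST check */
};

/*
 * Returns true when a broadcast that originates in namespace src would
 * pass the namespace part of the filter for listener sk.
 */
static bool model_should_deliver(const struct model_sock *sk,
                                 const struct model_net *src)
{
        /* Same namespace: the extra checks are skipped entirely. */
        if (sk->net == src)
                return true;

        /* Different namespace: only the opt-in listeners get further. */
        if (!sk->listen_all_nsid)
                return false;

        return sk->peer_has_id && sk->has_cap;
}
```

The point of the model is the first early return: an ordinary listener
only ever sees broadcasts from its own network namespace, which is why
sending on net->uevent_sock->sk needs no additional filter.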
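
The message layout that alloc_uevent_skb() produces can also be shown
without an skb. The sketch below is a userspace approximation, not the
patch itself: model_build_uevent() is a hypothetical helper that writes
into a caller-supplied buffer instead of calling alloc_skb() and
skb_put(). It keeps the same length arithmetic, where the "+ 2" covers
the '@' separator and the NUL that terminates the header.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace model of the uevent payload:
 *   "<action>@<devpath>\0" followed by the raw environment buffer.
 * Returns the number of bytes written, or 0 if out is too small.
 */
static size_t model_build_uevent(char *out, size_t outlen,
                                 const char *action, const char *devpath,
                                 const char *envbuf, size_t envlen)
{
        /* Header length: both strings, plus '@', plus the trailing NUL. */
        size_t hdr = strlen(action) + strlen(devpath) + 2;

        if (hdr + envlen > outlen)
                return 0;

        /* snprintf writes hdr - 1 characters and then the NUL. */
        snprintf(out, hdr, "%s@%s", action, devpath);

        /* The environment already consists of NUL-separated KEY=value pairs. */
        memcpy(out + hdr, envbuf, envlen);

        return hdr + envlen;
}
```

A listener reads the header up to the first NUL and then walks the
remaining NUL-separated KEY=value strings.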