Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp500223imu; Thu, 22 Nov 2018 00:37:20 -0800 (PST) X-Google-Smtp-Source: AFSGD/UhyfD3dsILYCtZhZpb6NV0xA2NpdYQi9fK+UqgpQVGjKjwpo9+4n1mBA8FyBOrEUxM5mYy X-Received: by 2002:a17:902:4281:: with SMTP id h1-v6mr10365995pld.114.1542875840367; Thu, 22 Nov 2018 00:37:20 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1542875840; cv=none; d=google.com; s=arc-20160816; b=lG3BgOXPr0ysBbkPbl9a/MO+M4OvBf/zKTvOxpOAXBC7ShZq1IebLX2CttvI8ssoK7 XMitwxKvEde1afSIMOmKUm92l9qwCIxAcDvCT2jIVqIey+Ud9Soqo50Nths/Pkl26X38 aDl/DQE/Te5Q/TmFnPxt/oDipj8AlpVYgDpkj/LStcVYSkJF0JlAEOcCy61oE1jM5r8I yAzlv4urI5ACwEnt0iL2/IEVANsr/2X3mc2PwGxjrbuUUPW0i0tOuXkwKA+an0RFEwYz Lv2SPk9w5F5wxYRldBOHavgf3BQ+xkuvT+yVYVHwANly8uNSphcr0JZ8USB/xaKj4flU jOTw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=keBV+bNnXPKxs5hdujsRQQtl7Pd1+mBd2tSp7M+GSFk=; b=DTgf2o/9pe+oESscdvWfNdezx6ftQ/HeRDhvo7lg+JqzGkmM1feC4213Et/Kn9a7T9 NaVus/to6phRixuIN6rhmo2EYgTNo1k08xtzjGd/2XqLnYuBvjPZHUabtRdDL3OSpPWg NO//PDbGdOfCpo09J/uVrTLm7+h0BbMBNU3pzkfGT8NdEGfj3xrmxaTyoxQKl0JQUCMN OGnbBiaof8MNCeP9YWsVTpaDdubv+Q/TZxqoWlz6NhjyNUzCG3teNjvqtQM9YWJKV7ec 2DYTxHWyRXCcql0W9UYiFzs92tPl2bP7rIWun7HtBWC0km8r+G7c3fLC9oQfA/XsPxxh PcjA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 38si48627097pgx.460.2018.11.22.00.37.05; Thu, 22 Nov 2018 00:37:20 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389887AbeKVIP4 (ORCPT + 99 others); Thu, 22 Nov 2018 03:15:56 -0500 Received: from mail.hallyn.com ([178.63.66.53]:49560 "EHLO mail.hallyn.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729727AbeKVIPz (ORCPT ); Thu, 22 Nov 2018 03:15:55 -0500 Received: by mail.hallyn.com (Postfix, from userid 1001) id 2336588A; Wed, 21 Nov 2018 15:39:46 -0600 (CST) Date: Wed, 21 Nov 2018 15:39:46 -0600 From: "Serge E. Hallyn" To: Christian Brauner Cc: "Eric W. Biederman" , Daniel Colascione , Aleksa Sarai , linux-kernel , "Serge E. Hallyn" , Jann Horn , Andy Lutomirski , Andrew Morton , Oleg Nesterov , Al Viro , Linux FS Devel , Linux API , Tim Murray , linux-man , Kees Cook Subject: Re: [PATCH v1 2/2] signal: add procfd_signal() syscall Message-ID: <20181121213946.GA10795@mail.hallyn.com> References: <20181119103241.5229-3-christian@brauner.io> <20181119202857.k5zw742xjfrw677j@yavin> <20181119205518.btew3vxwgva4w3zh@brauner.io> <20181119211810.73ptfhnwdmkngfi4@yavin> <20181119212126.u2nkijmula6wcfqi@brauner.io> <20181119213722.z54huio5g4kuldxk@brauner.io> <87muq4xs2n.fsf@xmission.com> <20181120103111.etlqp7zop34v6nv4@brauner.io> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20181120103111.etlqp7zop34v6nv4@brauner.io> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Nov 20, 2018 at 11:31:13AM +0100, Christian Brauner wrote: > On Mon, Nov 19, 2018 at 10:59:12PM -0600, Eric W. Biederman wrote: > > Daniel Colascione writes: > > > > > On Mon, Nov 19, 2018 at 1:37 PM Christian Brauner wrote: > > >> > > >> On Mon, Nov 19, 2018 at 01:26:22PM -0800, Daniel Colascione wrote: > > >> > On Mon, Nov 19, 2018 at 1:21 PM, Christian Brauner wrote: > > >> > > That can be done without a loop by comparing the level counter for the > > >> > > two pid namespaces. > > >> > > > > >> > >> > > >> > >> And you can rewrite pidns_get_parent to use it. So you would instead be > > >> > >> doing: > > >> > >> > > >> > >> if (pidns_is_descendant(proc_pid_ns, task_active_pid_ns(current))) > > >> > >> return -EPERM; > > >> > >> > > >> > >> (Or you can just copy the 5-line loop into procfd_signal -- though I > > >> > >> imagine we'll need this for all of the procfd_* APIs.) > > >> > > > >> > Why is any of this even necessary? Why does the child namespace we're > > >> > considering even have a file descriptor to its ancestor's procfs? If > > >> > > >> Because you can send file descriptors between processes and container > > >> runtimes tend to do that. > > > > > > Right. But why *would* a container runtime send one of these procfs > > > FDs to a container? > > > > > >> > it has one of these FDs, it can already *read* all sorts of > > >> > information it really shouldn't be able to acquire, so the additional > > >> > ability to send a signal (subject to the usual permission checks) > > >> > feels like sticking a finger in a dike that's already well-perforated. > > >> > IMHO, we shouldn't bother with this check. The patch would be simpler > > >> > without it. > > >> > > >> We will definitely not allow signaling processes in an ancestor pid > > >> namespace! That is a security issue! I can imagine container runtimes > > >> killing their monitoring process etc. pp. Not happening, unless someone > > >> with deep expertise in signals can convince me otherwise. > > > > > > If parent namespace procfs FDs or mounts really can leak into child > > > namespaces as easily as Aleksa says, then I don't mind adding the > > > check. I was under the impression that if you find yourself in this > > > situation, you already have a big problem. > > > > There is one big reason to have the check, and I have not seen it > > mentioned yet in this thread. > > > > When SI_USER is set we report the pid of the sender of the signal in > > si_pid. When the signal comes from the kernel si_pid == 0. When signal > > is sent from an ancestor pid namespace si_pid also equals 0 (which is > > reasonable). > > > > A signal out to a process in a parent pid namespace such as SIGCHLD is > > reasonable as we can map the pid. I really don't see the point of > > forbidding that. From the perspective of the process in the parent pid > > namespace it is just another process in it's pid namespace. So it > > should pose no problem from the perspective of the receiving process. > > > > A signal to a process in a pid namespace that is neither a parent nor a > > descendent pid namespace would be a problem, as there is no well defined > > notion of what si_pid should be set to. So for that case perhaps we > > should have something like a noprocess pid that we can set. Perhaps we > > could set si_pid to 0xffffffff. That would take a small extension to > > pid_nr_ns. > > > > File descriptors are not namespaced. It is completely legitimate to use > > file descriptors to get around limitations of namespaces. > > Frankly, I don't see a good argument for why we would allow that even if > safe. I have not heard a legitimate use-case or need for this. > At this point I care about very simple semantics. Being able to signal > into ancestor pid namespaces and cousin namespaces is interesting but > makes the syscall more brittle and harder to understand. Yeah, I'm with you on that. We can always open that door later if a good use case comes up, but I prefer simple at first. > Changing pid_nr_ns() might be the solution but this function is called > all over the place in the kernel and I'm not going to risk breaking > something by changing it for a feature that no one so far has ever > asked for. > If you are ok with this then we should hold off on this. We can always > add this feature later by removing the check when someone has a use-case > for it. > I'll send a v2 of the patch that keeps the restriction for now. If you > insist on it being removed we can make the change in a follow-up > iteration. > > Christian > > > > > Adding limitations to a file descriptor based api because someone else > > can't set up their processes in such a way as to get the restrictions > > they are looking for seems very sad. > > > > Frankly I think it is one of the better features of namespaces that we > > have to carefully handle and define these cases so that when the > > inevitable leaks happen you are not immediately in a world of hurt. All > > of the other permission checks etc continue to do their job. Plus you > > are prepared for the case when someone wants their containers to have an > > interesting communication primitive. > > > > Eric > > > > > > > >