Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755614AbbGPBUL (ORCPT ); Wed, 15 Jul 2015 21:20:11 -0400 Received: from mail-la0-f51.google.com ([209.85.215.51]:34184 "EHLO mail-la0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755592AbbGPBUH (ORCPT ); Wed, 15 Jul 2015 21:20:07 -0400 MIME-Version: 1.0 In-Reply-To: <87wpy1camr.fsf@x220.int.ebiederm.org> References: <1436989569-69582-1-git-send-email-seth.forshee@canonical.com> <1436989569-69582-4-git-send-email-seth.forshee@canonical.com> <20150715214848.GA24204@mail.hallyn.com> <87wpy1camr.fsf@x220.int.ebiederm.org> From: Andy Lutomirski Date: Wed, 15 Jul 2015 18:19:46 -0700 Message-ID: Subject: Re: [PATCH 3/7] fs: Ignore file caps in mounts from other user namespaces To: "Eric W. Biederman" Cc: "Serge E. Hallyn" , Seth Forshee , Alexander Viro , Serge Hallyn , James Morris , Linux FS Devel , LSM List , SELinux-NSA , "linux-kernel@vger.kernel.org" Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5135 Lines: 117 On Wed, Jul 15, 2015 at 3:35 PM, Eric W. Biederman wrote: > Andy Lutomirski writes: > >> On Wed, Jul 15, 2015 at 2:48 PM, Serge E. Hallyn wrote: >>> On Wed, Jul 15, 2015 at 02:46:04PM -0500, Seth Forshee wrote: >>>> Capability sets attached to files must be ignored except in the >>>> user namespaces where the mounter is privileged, i.e. s_user_ns >>>> and its descendants. Otherwise a vector exists for gaining >>>> privileges in namespaces where a user is not already privileged. >>>> >>>> Add a new helper function, in_user_ns(), to test whether a user >>>> namespace is the same as or a descendant of another namespace. >>>> Use this helper to determine whether a file's capability set >>>> should be applied to the caps constructed during exec. >>>> >>>> Signed-off-by: Seth Forshee >>> >>> Acked-by: Serge Hallyn >>> >>> I think it's an ok behavior, though let's just go over the >>> alternatives. >>> >>> It might actually be ok to simply require that the user_ns be >>> equal. If I unshare a new userns in which a different uid is >>> mapped to root, I may not want file capabilities to be granted >>> to tasks in that ns. (On the other hand, I might be creating >>> a new user_ns specifically to not have a uid 0 mapped into it >>> at all, and only have file capabilities grant privilege) >>> >>> Conversely, if I unshare one user_ns with a MS_SHARED mnt_ns, mount >>> an ext4fs, and then (from the parent shell) unshare another user_ns >>> with the same mapping, intending it to be a "peer" to the first one >>> I'd unshared and be able to use the ext4fs it mounted. This won't >>> work here. That's probably best - the appropriate thing to do was >>> to attach to the existing user_ns. But it could end up being >>> limiting in some special cases, so I'm bringing it up here. >>> >>> Again I think what you have here is the simplest and most sensible >>> choice, so ack. >>> >> >> I think I'm missing something. Why is this separate from mount_may_suid? >> >> I can see why it would make sense to check s_user_ns (or maybe >> s_user_ns *and* the vfsmount namespace) in mount_may_suid, but I don't >> see why we need separate checks. > > So I don't quite understand your concerns that lead to the mnt_may_suid > patch. But in my limited understanding there are two distinct issues. The issue is that we need some kind of control for whether a given operation should trust a given mounted filesystem. There are two kinds of trust: trusting the fs for execve security context (nosuid controls this) and trusting it for LSM access restrictions. I think that, in an unprivileged namespace context, the latter is a bit silly -- the creator of the fs owns it, full stop. I'm talking about the former. In particular, If I unshare everything, mount a fresh FUSE, shove a setuid, fcapped, LSM-labeled thing in it, pass a file descriptor out, and have someone in the root ns execve it, and *pwned*. My suggestion is to use a single function to control this, and I called it mnt_may_suid. We can certainly debate when that function should return true, but I'm unconvinced that the conditions for LSM and for regular setuid should be different. > > 1) What do file capabilities mean on a filesystem mounted with user > namespace privileges. Where the unprivileged user can control what > resides on disk. > > That is what this patch should be about. > > Meaning and restricting those permissions to unprivileged users. I think that file caps should mean what they usually do if the execve caller's userns should trust the file. Otherwise file caps should do nothing. My original idea was that a namespace trusts a vfsmount if the namespace or one of its ancestors created the mount. Doing the same thing but with s_user_ns might also make sense. > > 2) The second issue that I think your mnt_may_suid patch is about seems > to be what to do if a mount winds up in a place we never intended. > > Aka leaks. I don't think any changes to mnt_may_suid are necessary > in that sense. However they may be useful. > > So I think your mnt_may_suid change may be worth having but so far it > seems unnecessary. There's that, too. For one thing, with my mnt_may_suid patch (or a variant that checks the vfsmount and s_user_ns), we could drop the bind-mount nosuid restrictions. If you want to bind-mount an MS_NOSUID mount without MS_NOSUID, then that's fine -- you can't do any harm. > > Which is a long way of saying this patch is fundamentally necessary, > and I am not certain about the mnt_may_suid patch. > > Am I right in understanding it's purpose? Or does this patch actually > succeed in obsoleting it? Other way around. I think that an improved mnt_may_suid patch might render this patch unnecessary. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/