Sorry folks, I was traveling, and it seems like a lot happened on this
thread. :p
I will try to respond to a few of these comments selectively -
> The thing that makes me hesitate with this set is that it is a
> permanent new feature to address what (I hope) is a temporary
> problem.
I agree this is a permanent new feature, but it's not solving a
temporary problem. It's impossible to predict what new vulnerability
could show up, or when. I think Daniel summed it up appropriately in
his response.
> Seems like there are two naive ways to do it, the first being to just
> look at all code under ns_capable() plus code called from there. It
> seems like looking at the result of that could be fruitful.
This is really hard. The main issue is that there were features
designed and developed before the user-ns days with the assumption that
unprivileged users would never get certain capabilities that only the
root user gets. That is no longer true now that any process can create
a user-ns with root mapped. At the same time, blocking user-ns creation
for everyone is a big hammer which is not needed either. So it's not
that easy to just perform a code walk-through and correct those
decisions now.
> It seems to me that the existing control in
> /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> in that case.
This solution essentially blocks unprivileged users from using
user-namespaces entirely. This is not really a solution that can
work. The solution that this patch-set adds still allows unprivileged
users to create user-namespaces. The proposed solution is actually a
more fine-grained approach than the unprivileged_userns_clone solution,
since you can selectively block capabilities rather than completely
blocking the functionality.
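
To make the difference concrete, here is a rough userspace sketch (not
the patch code). The capability numbers match <linux/capability.h>;
everything else (userns_allowed(), userns_caps(), has_cap() and the
CAP_FULL_SET value) is a hypothetical name I made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define CAP_NET_ADMIN 12
#define CAP_SYS_ADMIN 21

#define CAP_BIT(cap) (UINT64_C(1) << (cap))

/* Illustrative "all capabilities" set (bits 0..37); an assumption,
 * the real upper bound depends on the kernel version. */
#define CAP_FULL_SET (CAP_BIT(38) - 1)

/* Hypothetical model of the on/off switch: user-ns creation is either
 * allowed for every unprivileged user or for none of them. */
bool userns_allowed(bool unprivileged_userns_clone)
{
	return unprivileged_userns_clone;
}

/* Hypothetical model of the fine-grained control: creation still
 * succeeds, but the capabilities available inside the new user-ns are
 * masked by an admin-controlled set. */
uint64_t userns_caps(uint64_t requested, uint64_t allowed_mask)
{
	return requested & allowed_mask;
}

/* True if 'cap' is present in the capability set 'set'. */
bool has_cap(uint64_t set, int cap)
{
	return (set & CAP_BIT(cap)) != 0;
}
```

With a mask of CAP_FULL_SET & ~CAP_BIT(CAP_NET_ADMIN), the new user-ns
loses only CAP_NET_ADMIN and keeps everything else, whereas the on/off
switch can only refuse the user-ns altogether.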
> I meant each task has a perm_cap_bset next to the cap_bset. So task
> p1 (if it has privilege) can drop CAP_SYS_ADMIN from perm_cap_bset,
> p2 (if it has privilege) can drop CAP_NET_ADMIN. When p1 creates a
> new user_ns, that init task has its cap_bset set to all caps but
> CAP_SYS_ADMIN.
>
> I think for simplicity perm_cap_bset would *only* affect the filling
> of cap_bset at user namespace creation. So if you wanted to drop a
> capability from your own cap_bset as well, you'd have to do that
> separately.
My original intention is to reduce the attack surface when
vulnerabilities are discovered / published, but I don't see how this
solves that issue. Also, the reason to have a sysctl is to have
simple control across the board to contain the situation. If that
is not addressed, then we might need some other solution on top of
this.