In-Reply-To: <20171109172201.GA26229@mail.hallyn.com>
References: <20171103004433.39954-1-mahesh@bandewar.net> <20171109172201.GA26229@mail.hallyn.com>
From: Mahesh Bandewar (महेश बंडेवार)
Date: Fri, 10 Nov 2017 12:31:09 +0900
Subject: Re: [PATCH resend 1/2] capability: introduce sysctl for controlled user-ns capability whitelist
To: "Serge E. Hallyn"
Cc: Mahesh Bandewar, LKML, Netdev, Kernel-hardening, Linux API, Kees Cook, "Eric W. Biederman", Eric Dumazet, David Miller
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Nov 10, 2017 at 2:22 AM, Serge E. Hallyn wrote:
> Quoting Mahesh Bandewar (mahesh@bandewar.net):
>> From: Mahesh Bandewar
>>
>> Add a sysctl variable kernel.controlled_userns_caps_whitelist. This
>> takes input as capability mask expressed as two comma separated hex
>> u32 words. The mask, however, is stored in kernel as kernel_cap_t type.
>>
>> Any capabilities that are not part of this mask will be controlled and
>> will not be allowed to processes in controlled user-ns.
>>
>> Signed-off-by: Mahesh Bandewar
>> ---
>>  Documentation/sysctl/kernel.txt | 21 ++++++++++++++++++
>>  include/linux/capability.h      |  3 +++
>>  kernel/capability.c             | 47 +++++++++++++++++++++++++++++++++++++++++
>>  kernel/sysctl.c                 |  5 +++++
>>  4 files changed, 76 insertions(+)
>>
>> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
>> index 694968c7523c..a1d39dbae847 100644
>> --- a/Documentation/sysctl/kernel.txt
>> +++ b/Documentation/sysctl/kernel.txt
>> @@ -25,6 +25,7 @@ show up in /proc/sys/kernel:
>>  - bootloader_version [ X86 only ]
>>  - callhome [ S390 only ]
>>  - cap_last_cap
>> +- controlled_userns_caps_whitelist
>>  - core_pattern
>>  - core_pipe_limit
>>  - core_uses_pid
>> @@ -187,6 +188,26 @@ CAP_LAST_CAP from the kernel.
>>
>>  ==============================================================
>>
>> +controlled_userns_caps_whitelist
>> +
>> +Capability mask that is whitelisted for "controlled" user namespaces.
>> +Any capability that is missing from this mask will not be allowed to
>> +any process that is attached to a controlled-userns. e.g. if CAP_NET_RAW
>> +is not part of this mask, then processes running inside any controlled
>> +userns's will not be allowed to perform action that needs CAP_NET_RAW
>> +capability. However, processes that are attached to a parent user-ns
>> +hierarchy that is *not* controlled and has CAP_NET_RAW can continue
>> +performing those actions. User-namespaces are marked "controlled" at
>> +the time of their creation based on the capabilities of the creator.
>> +A process that does not have CAP_SYS_ADMIN will create user-namespaces
>> +that are controlled.
>
> Hm. I think that's fine (the way 'controlled' user namespaces are
> defined), but that is design decision in itself, and should perhaps be
> discussed.
>
> Did you consider other ways? What about using CAP_SETPCAP?
>
I did try other ways, e.g. using another bounding set, but eventually
settled on this approach because of two main properties:
(a) it has creation-time settings which can be turned on/off at
    runtime, and
(b) the run-time knob actually controls the behavior, which can range
    from a no-op to very drastic, without needing the applications to
    change, and it is controlled by the admin.
Also, there is always more than one way of solving a problem, and there
could well be a better alternative; I don't deny that. :/

Controlling individual capabilities is going to give a very different
experience, so how a process should behave for a specific capability is
probably out of scope for this patch set. I would like to offload that
responsibility to the admin, as he/she would be the best judge of, and
the most knowledgeable about, the situation / environment. This should
be used as a tool to gain control.

>> +The value is expressed as two comma separated hex words (u32). This
>
> Why comma separated? whitespace ok? Leading 0x ok? What is the
> default at boot? (Obviously the patch tells me, I'm asking for it
> to be spelled out in the doc)
>
I tried multiple ways, including representing capabilities in
string/name form for better readability, but didn't want to add the
complexity of dealing with strings and possible string-related issues
for this. I also didn't want to invent a new format, so I settled on
something that is widely used (cpu bounding/affinity/irq mapping etc.)
and is capable of handling a growing bit set (currently 37 but possibly
more later).

> Otherwise looks good, thanks!
>
> Serge
>
>> +sysctl is avaialble in init-ns and users with CAP_SYS_ADMIN in init-ns
>> +are allowed to make changes.
>> +
>> +==============================================================
>> +
>>  core_pattern:
>>
>>  core_pattern is used to specify a core dumpfile pattern name.
>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>> index b52e278e4744..6c0b9677c03f 100644
>> --- a/include/linux/capability.h
>> +++ b/include/linux/capability.h
>> @@ -13,6 +13,7 @@
>>  #define _LINUX_CAPABILITY_H
>>
>>  #include
>> +#include
>>
>>
>>  #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3
>> @@ -247,6 +248,8 @@ extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>>
>>  /* audit system wants to get cap info from files as well */
>>  extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
>> +int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
>> +				 void __user *buff, size_t *lenp, loff_t *ppos);
>>
>>  extern int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size);
>>
>> diff --git a/kernel/capability.c b/kernel/capability.c
>> index f97fe77ceb88..62dbe3350c1b 100644
>> --- a/kernel/capability.c
>> +++ b/kernel/capability.c
>> @@ -28,6 +28,8 @@ EXPORT_SYMBOL(__cap_empty_set);
>>
>>  int file_caps_enabled = 1;
>>
>> +kernel_cap_t controlled_userns_caps_whitelist = CAP_FULL_SET;
>> +
>>  static int __init file_caps_disable(char *str)
>>  {
>>  	file_caps_enabled = 0;
>> @@ -506,3 +508,48 @@ bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns)
>>  	rcu_read_unlock();
>>  	return (ret == 0);
>>  }
>> +
>> +/* Controlled-userns capabilities routines */
>> +#ifdef CONFIG_SYSCTL
>> +int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
>> +				 void __user *buff, size_t *lenp, loff_t *ppos)
>> +{
>> +	DECLARE_BITMAP(caps_bitmap, CAP_LAST_CAP);
>> +	struct ctl_table caps_table;
>> +	char tbuf[NAME_MAX];
>> +	int ret;
>> +
>> +	ret = bitmap_from_u32array(caps_bitmap, CAP_LAST_CAP,
>> +				   controlled_userns_caps_whitelist.cap,
>> +				   _KERNEL_CAPABILITY_U32S);
>> +	if (ret != CAP_LAST_CAP)
>> +		return -1;
>> +
>> +	scnprintf(tbuf, NAME_MAX, "%*pb", CAP_LAST_CAP, caps_bitmap);
>> +
>> +	caps_table.data = tbuf;
>> +	caps_table.maxlen = NAME_MAX;
>> +	caps_table.mode = table->mode;
>> +	ret = proc_dostring(&caps_table, write, buff, lenp, ppos);
>> +	if (ret)
>> +		return ret;
>> +	if (write) {
>> +		kernel_cap_t tmp;
>> +
>> +		if (!capable(CAP_SYS_ADMIN))
>> +			return -EPERM;
>> +
>> +		ret = bitmap_parse_user(buff, *lenp, caps_bitmap, CAP_LAST_CAP);
>> +		if (ret)
>> +			return ret;
>> +
>> +		ret = bitmap_to_u32array(tmp.cap, _KERNEL_CAPABILITY_U32S,
>> +					 caps_bitmap, CAP_LAST_CAP);
>> +		if (ret != CAP_LAST_CAP)
>> +			return -1;
>> +
>> +		controlled_userns_caps_whitelist = tmp;
>> +	}
>> +	return 0;
>> +}
>> +#endif /* CONFIG_SYSCTL */
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index d9c31bc2eaea..25c3f7b76ece 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -1226,6 +1226,11 @@ static struct ctl_table kern_table[] = {
>>  		.extra2		= &one,
>>  	},
>>  #endif
>> +	{
>> +		.procname	= "controlled_userns_caps_whitelist",
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_douserns_caps_whitelist,
>> +	},
>>  	{ }
>>  };
>>
>> --
>> 2.15.0.403.gc27cc4dac6-goog
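
For anyone following the format question in this thread, here is a
minimal userspace sketch of how a capability mask could map to the
"two comma separated hex words (u32)" form. It is not code from the
patch: the helper names caps_mask_format() and caps_mask_parse() are
hypothetical, and it assumes the most significant word is written
first, which is how the kernel's "%*pb" bitmap printing orders its
chunks. CAP_NET_RAW is 13 in the UAPI capability header and is
repeated here only so the sketch builds without kernel headers.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Value from the Linux UAPI capability header, repeated for the sketch. */
#define CAP_NET_RAW 13

/*
 * Hypothetical helper: format a 64-bit capability mask as
 * "<high>,<low>" hex u32 words.
 * Assumption: the high word comes first.
 */
static int caps_mask_format(uint64_t mask, char *buf, size_t len)
{
	return snprintf(buf, len, "%" PRIx32 ",%08" PRIx32,
			(uint32_t)(mask >> 32), (uint32_t)mask);
}

/*
 * Hypothetical helper: parse "<high>,<low>" back into a 64-bit mask.
 * Trailing whitespace (e.g. a newline) is tolerated; any other
 * trailing character, or a missing comma, is rejected.
 * Returns 0 on success, -1 on malformed input.
 */
static int caps_mask_parse(const char *s, uint64_t *mask)
{
	uint32_t hi, lo;
	char trailing;

	if (sscanf(s, "%" SCNx32 ",%" SCNx32 " %c", &hi, &lo, &trailing) != 2)
		return -1;

	*mask = ((uint64_t)hi << 32) | lo;
	return 0;
}
```

With a mask covering bits 0..36 and CAP_NET_RAW cleared, i.e.
((1ULL << 37) - 1) & ~(1ULL << CAP_NET_RAW), caps_mask_format()
produces "1f,ffffdfff", and caps_mask_parse() turns that string back
into the same mask. The in-kernel parser in the patch is
bitmap_parse_user(); its exact tolerance for whitespace or a leading
0x is not something this sketch tries to reproduce.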