2017-04-19 22:21:16

by Djalal Harouni

[permalink] [raw]
Subject: [PATCH v3 0/2] modules:capabilities: automatic module loading restrictions

Hi List,

This is an update of the previous two RFCs that implemented module
auto-load restriction as a stackable LSM [1] [2].

The previous versions were presented as a stackable LSM, this new
version is implemented as a core kernel feature inside the capability
subsystem.

This new version is clean and smaller compared to previous versions.
Kees Cook suggested to implement this as a core kernel feature so it is
easy to enable and to be more consistent with "modules_disabled" sysctl
that restrict all module operations.

These patches are against next-20170419

==============

Currently, an explicit call to load or unload kernel modules require
CAP_SYS_MODULE capability. However unprivileged users have always been
able to load some modules using the implicit auto-load operation. An
automatic module loading happens when programs request a kernel feature
from a module that is not loaded. In order to satisfy userspace, the
kernel then automatically load all these required modules.

However, some programs may abuse the interface to load vulnerable or
buggy modules where system administrators still did not have a chance to
blacklist these modules. This affects the global state of the machine,
especially with containers where some applications may use it to exploit
the vulnerable parts and escape the container sandbox. Not to mention
that some devices which one may call IoT also started to use containers
semantics as a deployment workflow, but as an isolation tool too where the
base system image can be any generic distro or other root filesystem with
its own kernel. These setups may include unnecessary modules that the
final applications will not need. Untrusted access may abuse the module
auto-load feature to expose those vulnerabilities.

As every code contains bugs or vulnerabilties, the following
vulnerabilities that affected some features that are often compiled as
modules could have been completely blocked, by restricting autoloading
modules if the system does not need them.

Past months:
* DCCP use after free CVE-2017-6074
* n_hldc CVE-2017-2636
* XFRM framework CVE-2017-7184
* L2TPv3 CVE-2016-10200

Some of these bugs where advertised, websites claim that they have been
used against some distros in security contests. Other devices may also
be subject to such abuses, so lets protect our systems.


This patch introduces "modules_autoload" kernel sysctl flag. The flag
controls modules auto-load feature and complements "modules_disabled" which
apply to all modules operations. This new flag allows to control only
automatic module loading and if it is allowed or not. This allows to
align implicit module loading with the explicit one where both now are
covered by capabilities checks.

The "modules_autoload" sysctl was inspired from grsecurity
'GRKERNSEC_MODHARDEN'.

/proc/sys/kernel/modules_autoload takes three values:

*) When set to (0), the default, there are no restrictions.

*) When set to (1), processes must have CAP_SYS_MODULE to be able to
trigger a module auto-load operation, or CAP_NET_ADMIN for modules with
a 'netdev-%s' alias. Maybe in future more capabilities will allow to
load the specific related modules.

*) When set to (2), automatic module loading is disabled for all. Once
set, this value can not be changed.


The patches also support process trees, containers, and sandboxes by
providing an inherited per-task "modules_autoload" flag that cannot be
re-enabled once disabled. Any task can set its "modules_autoload" by
using:

prctl(PR_SET_MODULES_AUTOLOAD, value, 0, 0, 0).

*) When value is (0), the default, automatic modules loading is allowed.

*) When value is (1), task must have CAP_SYS_MODULE to be able to
trigger a module auto-load operation, or CAP_NET_ADMIN for modules with
a 'netdev-%s' alias. The capabilities checks are in the initial user
namespace.

*) When value is (2), automatic modules loading is disabled for the
current task.

The per-task "modules_autoload" value may only be increased, never
decreased, thus ensuring that once applied, processes can never relax
their setting.

The prctl() interface allows to restrict automatic module loading for
untrusted users without affecting the functionality of the rest of the
system.

When a request to a kernel module is denied, the module name with the
corresponding process name and its pid are logged. Administrators can use
such information to explicitly load the appropriate modules.


# Testing:

The following tool can be used to test the feature:
https://gist.githubusercontent.com/tixxdz/ed1ed92b890cc7fd268b5bdcf578c460/raw/6fd42c2cda8aae94e1b8fbaf1ec217fe8e76a3b8/pr_modules_autoload.c


Before:
$ lsmod | grep ipip -
$ sudo ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
$ lsmod | grep ipip -
ipip 16384 0
tunnel4 16384 1 ipip
ip_tunnel 28672 1 ipip


After:
$ lsmod | grep ipip -
$ ./pr_modules_autoload
$ grep "Modules" /proc/self/status
ModulesAutoload: 2
$ sudo ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
add tunnel "tunl0" failed: No such device
$ lsmod | grep ipip
$ dmesg | tail -3
[ 16.363903] virbr0: port 1(virbr0-nic) entered disabled state
[ 823.565958] Automatic module loading of netdev-tunl0 by "ip"[1362] was denied
[ 823.565967] Automatic module loading of tunl0 by "ip"[1362] was denied


Finally we already have a use case for the prctl() interface to enforce
some systemd services [3], and we plan to use it for our containers and
sandboxes.


# Changes since v2:
*) Implemented as a core kernel feature inside capabilities subsystem
*) Renamed sysctl to "modules_autoload" to align with "modules_disabled"
*) Improved documentation.
*) Removed unused code.


# Changes since v1:
*) Renamed module to ModAutoRestrict
*) Improved documentation to explicity refer to module autoloading.
*) Switched to use the new task_security_alloc() hook.
*) Switched from rhash tables to use task->security since it is in
linux-security/next branch now.
*) Check all parameters passed to prctl() syscall.
*) Many other bug fixes and documentation improvements.


[1] http://www.openwall.com/lists/kernel-hardening/2017/02/02/21
[2] http://www.openwall.com/lists/kernel-hardening/2017/04/09/1
[3] https://github.com/systemd/systemd/pull/5736


2017-04-19 22:21:21

by Djalal Harouni

[permalink] [raw]
Subject: [PATCH v3 1/2] modules:capabilities: automatic module loading restriction

Currently, an explicit call to load or unload kernel modules require
CAP_SYS_MODULE capability. However unprivileged users have always been
able to load some modules using the implicit auto-load operation. An
automatic module loading happens when programs request a kernel feature
from a module that is not loaded. In order to satisfy userspace, the
kernel then automatically load all these required modules.

However, some programs may abuse the interface to load vulnerable or
buggy modules where system administrators still did not have a chance to
blacklist these modules. This affects the global state of the machine,
especially with containers where some applications may use it to exploit
the vulnerable parts and escape the container sandbox. Not to mention
that some devices which one may call IoT also started to use containers
semantics as a deployment workflow, but as an isolation tool too where the
base system image can be any generic distro or other root filesystem with
its own kernel. These setups may include unnecessary modules that the
final applications will not need. Untrusted access may abuse the module
auto-load feature to expose those vulnerabilities.

As every code contains bugs or vulnerabilties, the following
vulnerabilities that affected some features that are often compiled as
modules could have been completely blocked, by restricting autoloading
modules if the system does not need them.

Past months:
* DCCP use after free CVE-2017-6074
* n_hldc CVE-2017-2636
* XFRM framework CVE-2017-7184
* L2TPv3 CVE-2016-10200

This patch introduces "modules_autoload" kernel sysctl flag. The flag
controls modules auto-load feature and complements "modules_disabled" which
apply to all modules operations. This new flag allows to control only
automatic module loading and if it is allowed or not. This allows to
align implicit module loading with the explicit one where both now are
covered by capabilities checks.

The "modules_autoload" sysctl was inspired from grsecurity
'GRKERNSEC_MODHARDEN'.

/proc/sys/kernel/modules_autoload takes three values:

*) When set to (0), the default, there are no restrictions.

*) When set to (1), processes must have CAP_SYS_MODULE to be able to
trigger a module auto-load operation, or CAP_NET_ADMIN for modules with
a 'netdev-%s' alias. Maybe in future more capabilities will allow to
load the specific related modules.

*) When set to (2), automatic module loading is disabled for all. Once
set, this value can not be changed.

The patch adds cap_kernel_module_request() as a
security_kernel_module_request() hook in the capabilities subsystem
to enforce the permission checks. Each request_module() call, the
"modules_autoload" value is checked to allow or deny automatic module
loading.

Before:
$ lsmod | grep ipip -
$ sudo ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
$ lsmod | grep ipip -
ipip 16384 0
tunnel4 16384 1 ipip
ip_tunnel 28672 1 ipip

After:
$ lsmod | grep ipip -
$ sudo ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
add tunnel "tunl0" failed: No such device
$ dmesg | tail -3
[ 158.832544] ipip: IPv4 and MPLS over IPv4 tunneling driver
[ 416.040569] Automatic module loading of netdev-tunl0 by "ip"[1398] was denied
[ 416.040578] Automatic module loading of tunl0 by "ip"[1398] was denied
$ cat /proc/sys/kernel/modules_autoload
2

Cc: Serge Hallyn <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Djalal Harouni <[email protected]>
---
Documentation/sysctl/kernel.txt | 23 +++++++++++++++++++++++
include/linux/module.h | 18 +++++++++++++++++-
include/linux/security.h | 3 ++-
kernel/module.c | 36 ++++++++++++++++++++++++++++++++++++
kernel/sysctl.c | 40 ++++++++++++++++++++++++++++++++++++++++
security/commoncap.c | 23 +++++++++++++++++++++++
6 files changed, 141 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index bac23c1..318ba30 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -43,6 +43,7 @@ show up in /proc/sys/kernel:
- l2cr [ PPC only ]
- modprobe ==> Documentation/debugging-modules.txt
- modules_disabled
+- modules_autoload
- msg_next_id [ sysv ipc ]
- msgmax
- msgmnb
@@ -411,6 +412,28 @@ to false. Generally used with the "kexec_load_disabled" toggle.

==============================================================

+modules_autoload:
+
+A sysctl to control if modules auto-load feature is allowed or not.
+This sysctl complements "modules_disabled" which is for all module
+operations where this flag applies only to automatic module loading.
+Automatic module loading happens when programs request a kernel feature
+that is implemented by an unloaded module, the kernel automatically
+runs the program pointed by "modprobe" sysctl in order to load the
+corresponding module.
+
+When modules_autoload is set to (0), the default, there are no
+restrictions.
+
+When modules_autoload is set to (1), processes must have CAP_SYS_MODULE
+to be able to trigger a module auto-load operation, or CAP_NET_ADMIN
+for modules with a 'netdev-%s' alias.
+
+When modules_autoload is set to (2), automatic module loading is
+disabled for all. Once set, this value can not be changed.
+
+==============================================================
+
msg_next_id, sem_next_id, and shm_next_id:

These three toggles allows to specify desired id for next allocated IPC
diff --git a/include/linux/module.h b/include/linux/module.h
index 9ad6856..4b96c10 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -261,7 +261,16 @@ struct notifier_block;

#ifdef CONFIG_MODULES

-extern int modules_disabled; /* for sysctl */
+enum {
+ MODULES_AUTOLOAD_ALLOWED = 0,
+ MODULES_AUTOLOAD_PRIVILEGED = 1,
+ MODULES_AUTOLOAD_DISABLED = 2,
+};
+
+extern int modules_disabled; /* sysctl for explicit module load/unload */
+extern int modules_autoload; /* sysctl for automatic module loading */
+extern const int modules_autoload_max; /* sysctl max value for modules_autoload */
+
/* Get/put a kernel symbol (calls must be symmetric) */
void *__symbol_get(const char *symbol);
void *__symbol_get_gpl(const char *symbol);
@@ -497,6 +506,8 @@ bool __is_module_percpu_address(unsigned long addr, unsigned long *can_addr);
bool is_module_percpu_address(unsigned long addr);
bool is_module_text_address(unsigned long addr);

+int modules_autoload_access(char *kmod_name);
+
static inline bool within_module_core(unsigned long addr,
const struct module *mod)
{
@@ -641,6 +652,11 @@ static inline bool is_livepatch_module(struct module *mod)

#else /* !CONFIG_MODULES... */

+static inline int modules_autoload_access(char *kmod_name)
+{
+ return 0;
+}
+
static inline struct module *__module_address(unsigned long addr)
{
return NULL;
diff --git a/include/linux/security.h b/include/linux/security.h
index af675b5..e274bb11 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -96,6 +96,7 @@ extern int cap_task_setscheduler(struct task_struct *p);
extern int cap_task_setioprio(struct task_struct *p, int ioprio);
extern int cap_task_setnice(struct task_struct *p, int nice);
extern int cap_vm_enough_memory(struct mm_struct *mm, long pages);
+extern int cap_kernel_module_request(char *kmod_name);

struct msghdr;
struct sk_buff;
@@ -904,7 +905,7 @@ static inline int security_kernel_create_files_as(struct cred *cred,

static inline int security_kernel_module_request(char *kmod_name)
{
- return 0;
+ return cap_kernel_module_request(kmod_name);
}

static inline int security_kernel_read_file(struct file *file,
diff --git a/kernel/module.c b/kernel/module.c
index f953df9..54cb6e0 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -282,6 +282,8 @@ module_param(sig_enforce, bool_enable_only, 0644);

/* Block module loading/unloading? */
int modules_disabled = 0;
+int modules_autoload = MODULES_AUTOLOAD_ALLOWED;
+const int modules_autoload_max = MODULES_AUTOLOAD_DISABLED;
core_param(nomodule, modules_disabled, bint, 0);

/* Waiting for a module to finish initializing? */
@@ -4296,6 +4298,40 @@ struct module *__module_text_address(unsigned long addr)
}
EXPORT_SYMBOL_GPL(__module_text_address);

+/*
+ * Return 0 if CAP_SYS_MODULE or if CAP_NET_ADMIN and the module is
+ * with a 'netdev-%s' alias. Otherwise -EPERM is returned.
+ */
+static int modules_autoload_privileged_access(const char *name)
+{
+ if (capable(CAP_SYS_MODULE))
+ return 0;
+ else if (name && strstr(name, "netdev-") && capable(CAP_NET_ADMIN))
+ return 0;
+
+ return -EPERM;
+}
+
+/**
+ * modules_autoload_access - Determine whether a module auto-load is permitted
+ * @kmod_name: The module name
+ *
+ * Determine whether a module should be automatically loaded or not. The check
+ * uses the sysctl "modules_autoload" value.
+ *
+ * Returns 0 if the module request is allowed or -EPERM if not.
+ */
+int modules_autoload_access(char *kmod_name)
+{
+ if (modules_autoload == MODULES_AUTOLOAD_ALLOWED)
+ return 0;
+ else if (modules_autoload == MODULES_AUTOLOAD_PRIVILEGED)
+ return modules_autoload_privileged_access(kmod_name);
+
+ /* MODULES_AUTOLOAD_DISABLED */
+ return -EPERM;
+}
+
/* Don't grab lock, we're oopsing. */
void print_modules(void)
{
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 3359739..a3f6dfd 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -186,6 +186,11 @@ static int proc_taint(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos);
#endif

+#ifdef CONFIG_MODULES
+static int modules_autoload_dointvec_minmax(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos);
+#endif
+
#ifdef CONFIG_PRINTK
static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos);
@@ -661,6 +666,16 @@ static struct ctl_table kern_table[] = {
.extra1 = &one,
.extra2 = &one,
},
+ {
+ .procname = "modules_autoload",
+ .data = &modules_autoload,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ /* Handle pinning to max value */
+ .proc_handler = modules_autoload_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = (void *)&modules_autoload_max,
+ },
#endif
#ifdef CONFIG_UEVENT_HELPER
{
@@ -2332,6 +2347,31 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
}
#endif

+#ifdef CONFIG_MODULES
+static int modules_autoload_dointvec_minmax(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct ctl_table t;
+
+ /*
+ * Only CAP_SYS_MODULE in init user namespace are allowed to change this
+ */
+ if (write && !capable(CAP_SYS_MODULE))
+ return -EPERM;
+
+ t = *table;
+ /*
+ * If "modules_autoload" already equals max value
+ * MODULES_AUTOLOAD_DISABLED which means it is disabled,
+ * then its value can not be changed anymore.
+ */
+ if (modules_autoload == modules_autoload_max)
+ t.extra1 = t.extra2;
+
+ return proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
+}
+#endif
+
struct do_proc_dointvec_minmax_conv_param {
int *min;
int *max;
diff --git a/security/commoncap.c b/security/commoncap.c
index 7abebd7..67a6cfe 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -1069,6 +1069,28 @@ int cap_mmap_file(struct file *file, unsigned long reqprot,
return 0;
}

+/**
+ * cap_kernel_module_request - Determine whether a module auto-load is permitted
+ * @kmod_name: The module name
+ *
+ * Determine whether a module should be automatically loaded due to a request
+ * by the current task. Returns 0 if the module request should be allowed
+ * -EPERM if not.
+ */
+int cap_kernel_module_request(char *kmod_name)
+{
+ int ret;
+ char comm[sizeof(current->comm)];
+
+ ret = modules_autoload_access(kmod_name);
+ if (ret < 0)
+ pr_notice_ratelimited(
+ "Automatic module loading of %.64s by \"%s\"[%d] was denied\n",
+ kmod_name, get_task_comm(comm, current), current->pid);
+
+ return ret;
+}
+
#ifdef CONFIG_SECURITY

struct security_hook_list capability_hooks[] __lsm_ro_after_init = {
@@ -1090,6 +1112,7 @@ struct security_hook_list capability_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(task_setioprio, cap_task_setioprio),
LSM_HOOK_INIT(task_setnice, cap_task_setnice),
LSM_HOOK_INIT(vm_enough_memory, cap_vm_enough_memory),
+ LSM_HOOK_INIT(kernel_module_request, cap_kernel_module_request),
};

void __init capability_add_hooks(void)
--
2.10.2

2017-04-19 22:21:24

by Djalal Harouni

[permalink] [raw]
Subject: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

Previous patches added the global "modules_autoload" restriction. This patch
make it possible to support process trees, containers, and sandboxes by
providing an inherited per-task "modules_autoload" flag that cannot be
re-enabled once disabled. This allows to restrict automatic module
loading without affecting the rest of the system.

Any task can set its "modules_autoload". Once set, this setting is inherited
across fork, clone and execve. With "modules_autoload" set, automatic
module loading will have first to satisfy the per-task access permissions
before attempting to implicitly load the module. For example, automatic
loading of modules that contain bugs or vulnerabilities can be
restricted and untrusted users can no longer abuse such interfaces

To set modules_autoload, use prctl(PR_SET_MODULES_AUTOLOAD, value, 0, 0, 0).

When value is (0), the default, automatic modules loading is allowed.

When value is (1), task must have CAP_SYS_MODULE to be able to trigger a
module auto-load operation, or CAP_NET_ADMIN for modules with a
'netdev-%s' alias.

When value is (2), automatic modules loading is disabled for the current
task.

The 'modules_autoload' value may only be increased, never decreased, thus
ensuring that once applied, processes can never relax their setting.

When a request to a kernel module is denied, the module name with the
corresponding process name and its pid are logged. Administrators can use
such information to explicitly load the appropriate modules.

The per-task "modules_autoload" restriction:

Before:
$ lsmod | grep ipip -
$ sudo ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
$ lsmod | grep ipip -
ipip 16384 0
tunnel4 16384 1 ipip
ip_tunnel 28672 1 ipip

After:
$ lsmod | grep ipip -
$ ./pr_modules_autoload
$ grep "Modules" /proc/self/status
ModulesAutoload: 2
$ cat /proc/sys/kernel/modules_autoload
0
$ sudo ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
add tunnel "tunl0" failed: No such device
$ lsmod | grep ipip
$ dmesg | tail -3
[ 16.363903] virbr0: port 1(virbr0-nic) entered disabled state
[ 823.565958] Automatic module loading of netdev-tunl0 by "ip"[1362] was denied
[ 823.565967] Automatic module loading of tunl0 by "ip"[1362] was denied

Cc: Serge Hallyn <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Djalal Harouni <[email protected]>
---
Documentation/filesystems/proc.txt | 3 ++
Documentation/prctl/modules_autoload.txt | 49 +++++++++++++++++++++++++++++++
fs/proc/array.c | 6 ++++
include/linux/module.h | 48 ++++++++++++++++++++++++++++--
include/linux/sched.h | 5 ++++
include/linux/security.h | 2 +-
include/uapi/linux/prctl.h | 8 +++++
kernel/fork.c | 4 +++
kernel/module.c | 17 +++++++----
security/commoncap.c | 50 ++++++++++++++++++++++++++++----
10 files changed, 178 insertions(+), 14 deletions(-)
create mode 100644 Documentation/prctl/modules_autoload.txt

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 4cddbce..df4d145 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -194,6 +194,7 @@ read the file /proc/PID/status:
CapBnd: ffffffffffffffff
NoNewPrivs: 0
Seccomp: 0
+ ModulesAutoload: 0
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 1

@@ -267,6 +268,8 @@ Table 1-2: Contents of the status files (as of 4.8)
CapBnd bitmap of capabilities bounding set
NoNewPrivs no_new_privs, like prctl(PR_GET_NO_NEW_PRIV, ...)
Seccomp seccomp mode, like prctl(PR_GET_SECCOMP, ...)
+ ModulesAutoload modules autoload, like
+ prctl(PR_GET_MODULES_AUTOLOAD, ...)
Cpus_allowed mask of CPUs on which this process may run
Cpus_allowed_list Same as previous, but in "list format"
Mems_allowed mask of memory nodes allowed to this process
diff --git a/Documentation/prctl/modules_autoload.txt b/Documentation/prctl/modules_autoload.txt
new file mode 100644
index 0000000..242852e
--- /dev/null
+++ b/Documentation/prctl/modules_autoload.txt
@@ -0,0 +1,49 @@
+A request to a kernel feature that is implemented by a module that is
+not loaded may trigger the module auto-load feature, allowing to
+transparently satisfy userspace. In this case an implicit kernel module
+load operation happens.
+
+Usually to load or unload a kernel module, an explicit operation happens
+where programs are required to have some capabilities in order to perform
+such operations. However, with the implicit module loading, no
+capabilities are required, anyone who is able to request a certain kernel
+feature, may also implicitly load its corresponding kernel module. This
+operation can be abused by unprivileged users to expose kernel interfaces
+that maybe privileged users did not want to be made available for various
+reasons: resources, bugs, vulnerabilties, etc. The DCCP vulnerability is
+(CVE-2017-6074) is one real example.
+
+The new per-task "modules_autoload" flag, is a new way to restrict
+automatic module loading, preventing the kernel from exposing more of
+its interface. This particularly useful for containers and sandboxes
+where sandboxed processes should affect the rest of the system.
+
+Any task can set "modules_autoload". Once set, this setting is inherited
+across fork, clone and execve. With "modules_autoload" set, automatic
+module loading will have first to satisfy the per-task access permissions
+before attempting to implicitly load the module. For example, automatic
+loading of modules that contain bugs or vulnerabilities can be
+restricted and imprivileged users can no longer abuse such interfaces.
+
+To set modules_autoload, use prctl(PR_SET_MODULES_AUTOLOAD, value, 0, 0, 0).
+
+When value is (0), the default, automatic modules loading is allowed.
+
+When value is (1), task must have CAP_SYS_MODULE to be able to trigger a
+module auto-load operation, or CAP_NET_ADMIN for modules with a
+'netdev-%s' alias.
+
+When value is (2), automatic modules loading is disabled for the current
+task.
+
+The 'modules_autoload' value may only be increased, never decreased, thus
+ensuring that once applied, processes can never relax their setting.
+
+When a request to a kernel module is denied, the module name with the
+corresponding process name and its pid are logged. Administrators can use
+such information to explicitly load the appropriate modules.
+
+Please note that even if the per-task "modules_autoload" value allows to
+auto-load the corresponding module, automatic module loading may still
+fail due to the global "modules_autoload" sysctl. For more details please
+see "modules_autoload" in Documentation/sysctl/kernel.txt
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 88c3555..cbcf087 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -88,6 +88,7 @@
#include <linux/string_helpers.h>
#include <linux/user_namespace.h>
#include <linux/fs_struct.h>
+#include <linux/module.h>

#include <asm/pgtable.h>
#include <asm/processor.h>
@@ -346,10 +347,15 @@ static inline void task_cap(struct seq_file *m, struct task_struct *p)

static inline void task_seccomp(struct seq_file *m, struct task_struct *p)
{
+ int autoload = task_modules_autoload(p);
+
seq_put_decimal_ull(m, "NoNewPrivs:\t", task_no_new_privs(p));
#ifdef CONFIG_SECCOMP
seq_put_decimal_ull(m, "\nSeccomp:\t", p->seccomp.mode);
#endif
+ if (autoload != -ENOSYS)
+ seq_put_decimal_ull(m, "\nModulesAutoload:\t", autoload);
+
seq_putc(m, '\n');
}

diff --git a/include/linux/module.h b/include/linux/module.h
index 4b96c10..595800f 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -13,6 +13,7 @@
#include <linux/kmod.h>
#include <linux/init.h>
#include <linux/elf.h>
+#include <linux/sched.h>
#include <linux/stringify.h>
#include <linux/kobject.h>
#include <linux/moduleparam.h>
@@ -506,7 +507,33 @@ bool __is_module_percpu_address(unsigned long addr, unsigned long *can_addr);
bool is_module_percpu_address(unsigned long addr);
bool is_module_text_address(unsigned long addr);

-int modules_autoload_access(char *kmod_name);
+int modules_autoload_access(struct task_struct *task, char *kmod_name);
+
+/* Sets task's modules_autoload */
+static inline int task_set_modules_autoload(struct task_struct *task,
+ unsigned long value)
+{
+ if (value > MODULES_AUTOLOAD_DISABLED)
+ return -EINVAL;
+ else if (task->modules_autoload > value)
+ return -EPERM;
+ else if (task->modules_autoload < value)
+ task->modules_autoload = value;
+
+ return 0;
+}
+
+/* Returns task's modules_autoload */
+static inline void task_copy_modules_autoload(struct task_struct *dest,
+ struct task_struct *src)
+{
+ dest->modules_autoload = src->modules_autoload;
+}
+
+static inline int task_modules_autoload(struct task_struct *task)
+{
+ return task->modules_autoload;
+}

static inline bool within_module_core(unsigned long addr,
const struct module *mod)
@@ -652,11 +679,28 @@ static inline bool is_livepatch_module(struct module *mod)

#else /* !CONFIG_MODULES... */

-static inline int modules_autoload_access(char *kmod_name)
+static inline int modules_autoload_access(struct task_struct *task,
+ char *kmod_name)
{
return 0;
}

+static inline int task_set_modules_autoload(struct task_struct *task,
+ unsigned long value)
+{
+ return -ENOSYS;
+}
+
+static inline void task_copy_modules_autoload(struct task_struct *dest,
+ struct task_struct *src)
+{
+}
+
+static inline int task_modules_autoload(struct task_struct *task)
+{
+ return -ENOSYS;
+}
+
static inline struct module *__module_address(unsigned long addr)
{
return NULL;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 48fb8bc..7264e62 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -613,6 +613,11 @@ struct task_struct {

struct restart_block restart_block;

+#ifdef CONFIG_MODULES
+ /* per-task modules autoload access */
+ unsigned modules_autoload:2;
+#endif
+
pid_t pid;
pid_t tgid;

diff --git a/include/linux/security.h b/include/linux/security.h
index e274bb11..9581cc5 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -866,7 +866,7 @@ static inline int security_task_create(unsigned long clone_flags)
static inline int security_task_alloc(struct task_struct *task,
unsigned long clone_flags)
{
- return 0;
+ return cap_task_alloc(task, clone_flags);
}

static inline void security_task_free(struct task_struct *task)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759..0244264 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,12 @@ struct prctl_mm_map {
# define PR_CAP_AMBIENT_LOWER 3
# define PR_CAP_AMBIENT_CLEAR_ALL 4

+/*
+ * Control the per-task "modules_autoload" access.
+ *
+ * See Documentation/prctl/modules_autoload.txt for more details.
+ */
+#define PR_SET_MODULES_AUTOLOAD 48
+#define PR_GET_MODULES_AUTOLOAD 49
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 81347bd..141e06b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1695,6 +1695,10 @@ static __latent_entropy struct task_struct *copy_process(
p->sequential_io_avg = 0;
#endif

+#ifdef CONFIG_MODULES
+ p->modules_autoload = 0;
+#endif
+
/* Perform scheduler related setup. Assign this task to a CPU. */
retval = sched_fork(clone_flags, p);
if (retval)
diff --git a/kernel/module.c b/kernel/module.c
index 54cb6e0..e1eca74 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -4313,19 +4313,24 @@ static int modules_autoload_privileged_access(const char *name)
}

/**
- * modules_autoload_access - Determine whether a module auto-load is permitted
+ * modules_autoload_access - Determine whether the task is allowed to perform a
+ * module auto-load request
+ * @task: The task performing the request
* @kmod_name: The module name
*
- * Determine whether a module should be automatically loaded or not. The check
- * uses the sysctl "modules_autoload" value.
+ * Determine whether the task is allowed to perform a module auto-load request.
+ * This checks the per-task "modules_autoload" flag, if the access is not denied,
+ * then the global sysctl "modules_autoload" is evaluated.
*
* Returns 0 if the module request is allowed or -EPERM if not.
*/
-int modules_autoload_access(char *kmod_name)
+int modules_autoload_access(struct task_struct *task, char *kmod_name)
{
- if (modules_autoload == MODULES_AUTOLOAD_ALLOWED)
+ unsigned int autoload = max_t(unsigned int,
+ modules_autoload, task->modules_autoload);
+ if (autoload == MODULES_AUTOLOAD_ALLOWED)
return 0;
- else if (modules_autoload == MODULES_AUTOLOAD_PRIVILEGED)
+ else if (autoload == MODULES_AUTOLOAD_PRIVILEGED)
return modules_autoload_privileged_access(kmod_name);

/* MODULES_AUTOLOAD_DISABLED */
diff --git a/security/commoncap.c b/security/commoncap.c
index 67a6cfe..bcc1e09 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -886,6 +886,40 @@ static int cap_prctl_drop(unsigned long cap)
return commit_creds(new);
}

+static int pr_set_mod_autoload(unsigned long arg2, unsigned long arg3,
+ unsigned long arg4, unsigned long arg5)
+{
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+
+ return task_set_modules_autoload(current, arg2);
+}
+
+static inline int pr_get_mod_autoload(unsigned long arg2, unsigned long arg3,
+ unsigned long arg4, unsigned long arg5)
+{
+ if (arg2 || arg3 || arg4 || arg5)
+ return -EINVAL;
+
+ return task_modules_autoload(current);
+}
+
+/**
+ * cap_task_alloc - Implement process context allocation for this security module
+ * @task: task being allocated
+ * @clone_flags: contains the clone flags indicating what should be shared.
+ *
+ * Allocate or initialize the task context for this security module.
+ *
+ * Returns 0.
+ */
+int cap_task_alloc(struct task_struct *task, unsigned long clone_flags)
+{
+ task_copy_modules_autoload(task, current);
+
+ return 0;
+}
+
/**
* cap_task_prctl - Implement process control functions for this security module
* @option: The process control function requested
@@ -1015,6 +1049,11 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
cap_lower(new->cap_ambient, arg3);
return commit_creds(new);
}
+ case PR_SET_MODULES_AUTOLOAD:
+ return pr_set_mod_autoload(arg2, arg3, arg4, arg5);
+
+ case PR_GET_MODULES_AUTOLOAD:
+ return pr_get_mod_autoload(arg2, arg3, arg4, arg5);

default:
/* No functionality available - continue with default */
@@ -1070,19 +1109,19 @@ int cap_mmap_file(struct file *file, unsigned long reqprot,
}

/**
- * cap_kernel_module_request - Determine whether a module auto-load is permitted
+ * cap_kernel_module_request - Determine whether current task is allowed to
+ * automatically load the specified module.
* @kmod_name: The module name
*
- * Determine whether a module should be automatically loaded due to a request
- * by the current task. Returns 0 if the module request should be allowed
- * -EPERM if not.
+ * Determine whether current task is allowed to automatically load the module.
+ * Returns 0 if current task is allowed to auto-load the module, -EPERM if not.
*/
int cap_kernel_module_request(char *kmod_name)
{
int ret;
char comm[sizeof(current->comm)];

- ret = modules_autoload_access(kmod_name);
+ ret = modules_autoload_access(current, kmod_name);
if (ret < 0)
pr_notice_ratelimited(
"Automatic module loading of %.64s by \"%s\"[%d] was denied\n",
@@ -1106,6 +1145,7 @@ struct security_hook_list capability_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(inode_killpriv, cap_inode_killpriv),
LSM_HOOK_INIT(mmap_addr, cap_mmap_addr),
LSM_HOOK_INIT(mmap_file, cap_mmap_file),
+ LSM_HOOK_INIT(task_alloc, cap_task_alloc),
LSM_HOOK_INIT(task_fix_setuid, cap_task_fix_setuid),
LSM_HOOK_INIT(task_prctl, cap_task_prctl),
LSM_HOOK_INIT(task_setscheduler, cap_task_setscheduler),
--
2.10.2

2017-04-19 22:39:06

by Djalal Harouni

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Thu, Apr 20, 2017 at 12:20 AM, Djalal Harouni <[email protected]> wrote:
[...]
> +/* Returns task's modules_autoload */
> +static inline void task_copy_modules_autoload(struct task_struct *dest,
> + struct task_struct *src)
> +{
> + dest->modules_autoload = src->modules_autoload;
> +}

Kees Cook just pointed out that this hook is not needed since we
already dup everything from parent to child.


[...]
>
> diff --git a/include/linux/security.h b/include/linux/security.h
> index e274bb11..9581cc5 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -866,7 +866,7 @@ static inline int security_task_create(unsigned long clone_flags)
> static inline int security_task_alloc(struct task_struct *task,
> unsigned long clone_flags)
> {
> - return 0;
> + return cap_task_alloc(task, clone_flags);
> }

Will remove it in next iteration.


> static inline void security_task_free(struct task_struct *task)
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index a8d0759..0244264 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -197,4 +197,12 @@ struct prctl_mm_map {
> # define PR_CAP_AMBIENT_LOWER 3
> # define PR_CAP_AMBIENT_CLEAR_ALL 4
>
> +/*
> + * Control the per-task "modules_autoload" access.
> + *
> + * See Documentation/prctl/modules_autoload.txt for more details.
> + */
> +#define PR_SET_MODULES_AUTOLOAD 48
> +#define PR_GET_MODULES_AUTOLOAD 49
> +
> #endif /* _LINUX_PRCTL_H */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 81347bd..141e06b 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1695,6 +1695,10 @@ static __latent_entropy struct task_struct *copy_process(
> p->sequential_io_avg = 0;
> #endif
>
> +#ifdef CONFIG_MODULES
> + p->modules_autoload = 0;
> +#endif
> +
> /* Perform scheduler related setup. Assign this task to a CPU. */
> retval = sched_fork(clone_flags, p);
> if (retval)
> diff --git a/kernel/module.c b/kernel/module.c
> index 54cb6e0..e1eca74 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -4313,19 +4313,24 @@ static int modules_autoload_privileged_access(const char *name)
> }
>
> /**
> - * modules_autoload_access - Determine whether a module auto-load is permitted
> + * modules_autoload_access - Determine whether the task is allowed to perform a
> + * module auto-load request
> + * @task: The task performing the request
> * @kmod_name: The module name
> *
> - * Determine whether a module should be automatically loaded or not. The check
> - * uses the sysctl "modules_autoload" value.
> + * Determine whether the task is allowed to perform a module auto-load request.
> + * This checks the per-task "modules_autoload" flag, if the access is not denied,
> + * then the global sysctl "modules_autoload" is evaluated.
> *
> * Returns 0 if the module request is allowed or -EPERM if not.
> */
> -int modules_autoload_access(char *kmod_name)
> +int modules_autoload_access(struct task_struct *task, char *kmod_name)
> {
> - if (modules_autoload == MODULES_AUTOLOAD_ALLOWED)
> + unsigned int autoload = max_t(unsigned int,
> + modules_autoload, task->modules_autoload);
> + if (autoload == MODULES_AUTOLOAD_ALLOWED)
> return 0;
> - else if (modules_autoload == MODULES_AUTOLOAD_PRIVILEGED)
> + else if (autoload == MODULES_AUTOLOAD_PRIVILEGED)
> return modules_autoload_privileged_access(kmod_name);
>
> /* MODULES_AUTOLOAD_DISABLED */
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 67a6cfe..bcc1e09 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -886,6 +886,40 @@ static int cap_prctl_drop(unsigned long cap)
> return commit_creds(new);
> }
>
> +static int pr_set_mod_autoload(unsigned long arg2, unsigned long arg3,
> + unsigned long arg4, unsigned long arg5)
> +{
> + if (arg3 || arg4 || arg5)
> + return -EINVAL;
> +
> + return task_set_modules_autoload(current, arg2);
> +}
> +
> +static inline int pr_get_mod_autoload(unsigned long arg2, unsigned long arg3,
> + unsigned long arg4, unsigned long arg5)
> +{
> + if (arg2 || arg3 || arg4 || arg5)
> + return -EINVAL;
> +
> + return task_modules_autoload(current);
> +}
> +
> +/**
> + * cap_task_alloc - Implement process context allocation for this security module
> + * @task: task being allocated
> + * @clone_flags: contains the clone flags indicating what should be shared.
> + *
> + * Allocate or initialize the task context for this security module.
> + *
> + * Returns 0.
> + */
> +int cap_task_alloc(struct task_struct *task, unsigned long clone_flags)
> +{
> + task_copy_modules_autoload(task, current);
> +
> + return 0;
> +}

All the calls to initialize task->modules_autoload or dup its value
using the task_alloc will be removed in next iteration as pointed out
by Kees.

Thanks!

--
tixxdz

2017-04-19 23:16:03

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Wed, Apr 19, 2017 at 3:20 PM, Djalal Harouni <[email protected]> wrote:
> Previous patches added the global "modules_autoload" restriction. This patch
> make it possible to support process trees, containers, and sandboxes by
> providing an inherited per-task "modules_autoload" flag that cannot be
> re-enabled once disabled. This allows to restrict automatic module
> loading without affecting the rest of the system.
>
> Any task can set its "modules_autoload". Once set, this setting is inherited
> across fork, clone and execve. With "modules_autoload" set, automatic
> module loading will have first to satisfy the per-task access permissions
> before attempting to implicitly load the module. For example, automatic
> loading of modules that contain bugs or vulnerabilities can be
> restricted and untrusted users can no longer abuse such interfaces
>
> To set modules_autoload, use prctl(PR_SET_MODULES_AUTOLOAD, value, 0, 0, 0).
>
> When value is (0), the default, automatic modules loading is allowed.
>
> When value is (1), task must have CAP_SYS_MODULE to be able to trigger a
> module auto-load operation, or CAP_NET_ADMIN for modules with a
> 'netdev-%s' alias.
>
> When value is (2), automatic modules loading is disabled for the current
> task.
>
> The 'modules_autoload' value may only be increased, never decreased, thus
> ensuring that once applied, processes can never relax their setting.
>
> When a request to a kernel module is denied, the module name with the
> corresponding process name and its pid are logged. Administrators can use
> such information to explicitly load the appropriate modules.
>
> The per-task "modules_autoload" restriction:
>
> Before:
> $ lsmod | grep ipip -
> $ sudo ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
> $ lsmod | grep ipip -
> ipip 16384 0
> tunnel4 16384 1 ipip
> ip_tunnel 28672 1 ipip
>
> After:
> $ lsmod | grep ipip -
> $ ./pr_modules_autoload
> $ grep "Modules" /proc/self/status
> ModulesAutoload: 2
> $ cat /proc/sys/kernel/modules_autoload
> 0
> $ sudo ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
> add tunnel "tunl0" failed: No such device
> $ lsmod | grep ipip
> $ dmesg | tail -3
> [ 16.363903] virbr0: port 1(virbr0-nic) entered disabled state
> [ 823.565958] Automatic module loading of netdev-tunl0 by "ip"[1362] was denied
> [ 823.565967] Automatic module loading of tunl0 by "ip"[1362] was denied
>
> Cc: Serge Hallyn <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Suggested-by: Kees Cook <[email protected]>
> Signed-off-by: Djalal Harouni <[email protected]>
> ---
> Documentation/filesystems/proc.txt | 3 ++
> Documentation/prctl/modules_autoload.txt | 49 +++++++++++++++++++++++++++++++
> fs/proc/array.c | 6 ++++
> include/linux/module.h | 48 ++++++++++++++++++++++++++++--
> include/linux/sched.h | 5 ++++
> include/linux/security.h | 2 +-
> include/uapi/linux/prctl.h | 8 +++++
> kernel/fork.c | 4 +++
> kernel/module.c | 17 +++++++----
> security/commoncap.c | 50 ++++++++++++++++++++++++++++----
> 10 files changed, 178 insertions(+), 14 deletions(-)
> create mode 100644 Documentation/prctl/modules_autoload.txt
>
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 4cddbce..df4d145 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -194,6 +194,7 @@ read the file /proc/PID/status:
> CapBnd: ffffffffffffffff
> NoNewPrivs: 0
> Seccomp: 0
> + ModulesAutoload: 0
> voluntary_ctxt_switches: 0
> nonvoluntary_ctxt_switches: 1
>
> @@ -267,6 +268,8 @@ Table 1-2: Contents of the status files (as of 4.8)
> CapBnd bitmap of capabilities bounding set
> NoNewPrivs no_new_privs, like prctl(PR_GET_NO_NEW_PRIV, ...)
> Seccomp seccomp mode, like prctl(PR_GET_SECCOMP, ...)
> + ModulesAutoload modules autoload, like
> + prctl(PR_GET_MODULES_AUTOLOAD, ...)
> Cpus_allowed mask of CPUs on which this process may run
> Cpus_allowed_list Same as previous, but in "list format"
> Mems_allowed mask of memory nodes allowed to this process
> diff --git a/Documentation/prctl/modules_autoload.txt b/Documentation/prctl/modules_autoload.txt
> new file mode 100644
> index 0000000..242852e
> --- /dev/null
> +++ b/Documentation/prctl/modules_autoload.txt
> @@ -0,0 +1,49 @@
> +A request to a kernel feature that is implemented by a module that is
> +not loaded may trigger the module auto-load feature, allowing to
> +transparently satisfy userspace. In this case an implicit kernel module
> +load operation happens.
> +
> +Usually to load or unload a kernel module, an explicit operation happens
> +where programs are required to have some capabilities in order to perform
> +such operations. However, with the implicit module loading, no
> +capabilities are required, anyone who is able to request a certain kernel
> +feature, may also implicitly load its corresponding kernel module. This
> +operation can be abused by unprivileged users to expose kernel interfaces
> +that maybe privileged users did not want to be made available for various
> +reasons: resources, bugs, vulnerabilties, etc. The DCCP vulnerability is
> +(CVE-2017-6074) is one real example.
> +
> +The new per-task "modules_autoload" flag, is a new way to restrict
> +automatic module loading, preventing the kernel from exposing more of
> +its interface. This particularly useful for containers and sandboxes
> +where sandboxed processes should affect the rest of the system.
> +
> +Any task can set "modules_autoload". Once set, this setting is inherited
> +across fork, clone and execve. With "modules_autoload" set, automatic
> +module loading will have first to satisfy the per-task access permissions
> +before attempting to implicitly load the module. For example, automatic
> +loading of modules that contain bugs or vulnerabilities can be
> +restricted and imprivileged users can no longer abuse such interfaces.
> +
> +To set modules_autoload, use prctl(PR_SET_MODULES_AUTOLOAD, value, 0, 0, 0).
> +
> +When value is (0), the default, automatic modules loading is allowed.
> +
> +When value is (1), task must have CAP_SYS_MODULE to be able to trigger a
> +module auto-load operation, or CAP_NET_ADMIN for modules with a
> +'netdev-%s' alias.
> +
> +When value is (2), automatic modules loading is disabled for the current
> +task.
> +
> +The 'modules_autoload' value may only be increased, never decreased, thus
> +ensuring that once applied, processes can never relax their setting.
> +
> +When a request to a kernel module is denied, the module name with the
> +corresponding process name and its pid are logged. Administrators can use
> +such information to explicitly load the appropriate modules.
> +
> +Please note that even if the per-task "modules_autoload" value allows to
> +auto-load the corresponding module, automatic module loading may still
> +fail due to the global "modules_autoload" sysctl. For more details please
> +see "modules_autoload" in Documentation/sysctl/kernel.txt
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index 88c3555..cbcf087 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -88,6 +88,7 @@
> #include <linux/string_helpers.h>
> #include <linux/user_namespace.h>
> #include <linux/fs_struct.h>
> +#include <linux/module.h>
>
> #include <asm/pgtable.h>
> #include <asm/processor.h>
> @@ -346,10 +347,15 @@ static inline void task_cap(struct seq_file *m, struct task_struct *p)
>
> static inline void task_seccomp(struct seq_file *m, struct task_struct *p)
> {
> + int autoload = task_modules_autoload(p);
> +
> seq_put_decimal_ull(m, "NoNewPrivs:\t", task_no_new_privs(p));
> #ifdef CONFIG_SECCOMP
> seq_put_decimal_ull(m, "\nSeccomp:\t", p->seccomp.mode);
> #endif
> + if (autoload != -ENOSYS)
> + seq_put_decimal_ull(m, "\nModulesAutoload:\t", autoload);
> +
> seq_putc(m, '\n');
> }
>
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 4b96c10..595800f 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -13,6 +13,7 @@
> #include <linux/kmod.h>
> #include <linux/init.h>
> #include <linux/elf.h>
> +#include <linux/sched.h>
> #include <linux/stringify.h>
> #include <linux/kobject.h>
> #include <linux/moduleparam.h>
> @@ -506,7 +507,33 @@ bool __is_module_percpu_address(unsigned long addr, unsigned long *can_addr);
> bool is_module_percpu_address(unsigned long addr);
> bool is_module_text_address(unsigned long addr);
>
> -int modules_autoload_access(char *kmod_name);
> +int modules_autoload_access(struct task_struct *task, char *kmod_name);
> +
> +/* Sets task's modules_autoload */
> +static inline int task_set_modules_autoload(struct task_struct *task,
> + unsigned long value)
> +{
> + if (value > MODULES_AUTOLOAD_DISABLED)
> + return -EINVAL;
> + else if (task->modules_autoload > value)
> + return -EPERM;
> + else if (task->modules_autoload < value)
> + task->modules_autoload = value;
> +
> + return 0;
> +}

This needs to be more locked down. Otherwise someone could set this
and then run a setuid program. Admittedly, it would be quite odd if
this particular thing causes a problem, but the issue exists
nonetheless.

More generally, I think this feature would fit in fairly nicely with
my "implicit rights" idea. Unfortunately, Linus hated it, but maybe
if I actually implemented it, he wouldn't hate it so much.

2017-04-19 23:16:48

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] modules:capabilities: automatic module loading restriction

On Wed, Apr 19, 2017 at 3:20 PM, Djalal Harouni <[email protected]> wrote:
> Currently, an explicit call to load or unload kernel modules require
> CAP_SYS_MODULE capability. However unprivileged users have always been
> able to load some modules using the implicit auto-load operation. An
> automatic module loading happens when programs request a kernel feature
> from a module that is not loaded. In order to satisfy userspace, the
> kernel then automatically load all these required modules.

I like this feature.

--Andy

2017-04-19 23:43:25

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Wed, Apr 19, 2017 at 4:15 PM, Andy Lutomirski <[email protected]> wrote:
> On Wed, Apr 19, 2017 at 3:20 PM, Djalal Harouni <[email protected]> wrote:
>> +/* Sets task's modules_autoload */
>> +static inline int task_set_modules_autoload(struct task_struct *task,
>> + unsigned long value)
>> +{
>> + if (value > MODULES_AUTOLOAD_DISABLED)
>> + return -EINVAL;
>> + else if (task->modules_autoload > value)
>> + return -EPERM;
>> + else if (task->modules_autoload < value)
>> + task->modules_autoload = value;
>> +
>> + return 0;
>> +}
>
> This needs to be more locked down. Otherwise someone could set this
> and then run a setuid program. Admittedly, it would be quite odd if
> this particular thing causes a problem, but the issue exists
> nonetheless.

Eeeh, I don't agree this needs to be changed. APIs provided by modules
are different than the existing privilege-manipulation syscalls this
concern stems from. Applications are already forced to deal with
things being missing like this in the face of it simply not being
built into the kernel.

Having to hide this behind nnp seems like it'd reduce its utility...

-Kees

--
Kees Cook
Pixel Security

2017-04-20 01:58:41

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

Hi Djalal,

[auto build test ERROR on security/next]
[also build test ERROR on next-20170419]
[cannot apply to linus/master v4.11-rc7]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Djalal-Harouni/modules-capabilities-automatic-module-loading-restrictions/20170420-093603
base: https://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git next
config: x86_64-randconfig-x017-201716 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All errors (new ones prefixed by >>):

In file included from include/net/scm.h:7:0,
from include/linux/netlink.h:8,
from include/uapi/linux/neighbour.h:5,
from include/linux/netdevice.h:51,
from include/net/sock.h:51,
from fs//ocfs2/cluster/tcp.h:32,
from fs//ocfs2/cluster/heartbeat.c:41:
include/linux/security.h: In function 'security_task_alloc':
>> include/linux/security.h:869:9: error: implicit declaration of function 'cap_task_alloc' [-Werror=implicit-function-declaration]
return cap_task_alloc(task, clone_flags);
^~~~~~~~~~~~~~
cc1: some warnings being treated as errors

vim +/cap_task_alloc +869 include/linux/security.h

863 return 0;
864 }
865
866 static inline int security_task_alloc(struct task_struct *task,
867 unsigned long clone_flags)
868 {
> 869 return cap_task_alloc(task, clone_flags);
870 }
871
872 static inline void security_task_free(struct task_struct *task)

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (1.86 kB)
.config.gz (26.64 kB)
Download all attachments

2017-04-20 02:22:44

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] modules:capabilities: automatic module loading restriction

On Thu, 2017-04-20 at 00:20 +0200, Djalal Harouni wrote:
[...] 
> +modules_autoload:
> +
> +A sysctl to control if modules auto-load feature is allowed or not.
> +This sysctl complements "modules_disabled" which is for all module
> +operations where this flag applies only to automatic module loading.
> +Automatic module loading happens when programs request a kernel feature
> +that is implemented by an unloaded module, the kernel automatically
> +runs the program pointed by "modprobe" sysctl in order to load the
> +corresponding module.
> +
> +When modules_autoload is set to (0), the default, there are no
> +restrictions.
> +
> +When modules_autoload is set to (1), processes must have CAP_SYS_MODULE
> +to be able to trigger a module auto-load operation, or CAP_NET_ADMIN
> +for modules with a 'netdev-%s' alias.
> +
> +When modules_autoload is set to (2), automatic module loading is
> +disabled for all. Once set, this value can not be changed.

I would expect a parameter 'modules_autoload' to be a boolean, so this
behaviour would be surprising.

What is the point of mode 2? Why would someone want to set
modules_disabled=0 and modules_autoload=2?

[...]
> --- a/kernel/module.c
> +++ b/kernel/module.c
[...]
> +static int modules_autoload_privileged_access(const char *name)
> +{
> + if (capable(CAP_SYS_MODULE))
> + return 0;
> + else if (name && strstr(name, "netdev-") && capable(CAP_NET_ADMIN))
[...]

We want a prefix match, so use strncmp() not strstr().

Ben.

--
Ben Hutchings
It is easier to change the specification to fit the program than vice
versa.


Attachments:
signature.asc (833.00 B)
This is a digitally signed message part

2017-04-20 02:42:14

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Wed, Apr 19, 2017 at 4:43 PM, Kees Cook <[email protected]> wrote:
> On Wed, Apr 19, 2017 at 4:15 PM, Andy Lutomirski <[email protected]> wrote:
>> On Wed, Apr 19, 2017 at 3:20 PM, Djalal Harouni <[email protected]> wrote:
>>> +/* Sets task's modules_autoload */
>>> +static inline int task_set_modules_autoload(struct task_struct *task,
>>> + unsigned long value)
>>> +{
>>> + if (value > MODULES_AUTOLOAD_DISABLED)
>>> + return -EINVAL;
>>> + else if (task->modules_autoload > value)
>>> + return -EPERM;
>>> + else if (task->modules_autoload < value)
>>> + task->modules_autoload = value;
>>> +
>>> + return 0;
>>> +}
>>
>> This needs to be more locked down. Otherwise someone could set this
>> and then run a setuid program. Admittedly, it would be quite odd if
>> this particular thing causes a problem, but the issue exists
>> nonetheless.
>
> Eeeh, I don't agree this needs to be changed. APIs provided by modules
> are different than the existing privilege-manipulation syscalls this
> concern stems from. Applications are already forced to deal with
> things being missing like this in the face of it simply not being
> built into the kernel.
>
> Having to hide this behind nnp seems like it'd reduce its utility...
>

I think that adding an inherited boolean to task_struct that can be
set by unprivileged tasks and passed to privileged tasks is a terrible
precedent. Ideally someone would try to find all the existing things
like this and kill them off.

I agree that I don't see how one would exploit this particular
feature, but I still think I dislike the approach. This is a slippery
slope to adding a boolean for perf_event_open(), unshare(), etc, and
we should solve these for real rather than half-arsing them IMO.

2017-04-20 12:45:01

by Djalal Harouni

[permalink] [raw]
Subject: Re: [kernel-hardening] Re: [PATCH v3 1/2] modules:capabilities: automatic module loading restriction

On Thu, Apr 20, 2017 at 4:22 AM, Ben Hutchings <[email protected]> wrote:
> On Thu, 2017-04-20 at 00:20 +0200, Djalal Harouni wrote:
> [...]
>> +modules_autoload:
>> +
>> +A sysctl to control if modules auto-load feature is allowed or not.
>> +This sysctl complements "modules_disabled" which is for all module
>> +operations where this flag applies only to automatic module loading.
>> +Automatic module loading happens when programs request a kernel feature
>> +that is implemented by an unloaded module, the kernel automatically
>> +runs the program pointed by "modprobe" sysctl in order to load the
>> +corresponding module.
>> +
>> +When modules_autoload is set to (0), the default, there are no
>> +restrictions.
>> +
>> +When modules_autoload is set to (1), processes must have CAP_SYS_MODULE
>> +to be able to trigger a module auto-load operation, or CAP_NET_ADMIN
>> +for modules with a 'netdev-%s' alias.
>> +
>> +When modules_autoload is set to (2), automatic module loading is
>> +disabled for all. Once set, this value can not be changed.
>
> I would expect a parameter 'modules_autoload' to be a boolean, so this
> behaviour would be surprising.
>
> What is the point of mode 2? Why would someone want to set
> modules_disabled=0 and modules_autoload=2?

modules_disabled is too restrictive and once set it can't be changed,
maybe that's why not all users use it.

With modules_disabled=0 and modules_autoload=2

* The functionality of the system can still be made available.
* You only disable automatic module loading
* Explicit module load/unload can still happen. Administrators or
privileged programs can still explicitly load modules provide a
feature without rebooting.
* You are able to restrict some applications from inserting new
modules at all by also applying a seccomp filter and removing their
CAP_SYS_MODULE, where explicit load/unload is still available to
others.
* You are able to unload an old bad version of the module without
rebooting, and maybe load the new version.


> [...]
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
> [...]
>> +static int modules_autoload_privileged_access(const char *name)
>> +{
>> + if (capable(CAP_SYS_MODULE))
>> + return 0;
>> + else if (name && strstr(name, "netdev-") && capable(CAP_NET_ADMIN))
> [...]
>
> We want a prefix match, so use strncmp() not strstr().

Indeed, will fix it.

Thanks!


> Ben.
>
> --
> Ben Hutchings
> It is easier to change the specification to fit the program than vice
> versa.
>



--
tixxdz

2017-04-20 15:02:44

by Ben Hutchings

[permalink] [raw]
Subject: Re: [kernel-hardening] Re: [PATCH v3 1/2] modules:capabilities: automatic module loading restriction

On Thu, 2017-04-20 at 14:44 +0200, Djalal Harouni wrote:
> > On Thu, Apr 20, 2017 at 4:22 AM, Ben Hutchings <[email protected]> wrote:
> > On Thu, 2017-04-20 at 00:20 +0200, Djalal Harouni wrote:
> > [...]
> > > +modules_autoload:
> > > +
> > > +A sysctl to control if modules auto-load feature is allowed or not.
> > > +This sysctl complements "modules_disabled" which is for all module
> > > +operations where this flag applies only to automatic module loading.
> > > +Automatic module loading happens when programs request a kernel feature
> > > +that is implemented by an unloaded module, the kernel automatically
> > > +runs the program pointed by "modprobe" sysctl in order to load the
> > > +corresponding module.
> > > +
> > > +When modules_autoload is set to (0), the default, there are no
> > > +restrictions.
> > > +
> > > +When modules_autoload is set to (1), processes must have CAP_SYS_MODULE
> > > +to be able to trigger a module auto-load operation, or CAP_NET_ADMIN
> > > +for modules with a 'netdev-%s' alias.
> > > +
> > > +When modules_autoload is set to (2), automatic module loading is
> > > +disabled for all. Once set, this value can not be changed.
> >
> > I would expect a parameter 'modules_autoload' to be a boolean, so this
> > behaviour would be surprising.
> >
> > What is the point of mode 2?  Why would someone want to set
> > modules_disabled=0 and modules_autoload=2?
>
> modules_disabled is too restrictive and once set it can't be changed,
> maybe that's why not all users use it.
>
> With modules_disabled=0 and modules_autoload=2
[...]

Hmm, OK. How about naming this modules_autoload_mode, then, so that
it's obviously not a boolean?

Ben.

--
Ben Hutchings
It is easier to change the specification to fit the program than vice
versa.


Attachments:
signature.asc (833.00 B)
This is a digitally signed message part

2017-04-20 20:39:27

by Djalal Harouni

[permalink] [raw]
Subject: Re: [kernel-hardening] Re: [PATCH v3 1/2] modules:capabilities: automatic module loading restriction

On Thu, Apr 20, 2017 at 5:02 PM, Ben Hutchings <[email protected]> wrote:
> On Thu, 2017-04-20 at 14:44 +0200, Djalal Harouni wrote:
>> > On Thu, Apr 20, 2017 at 4:22 AM, Ben Hutchings <[email protected]> wrote:
>> > On Thu, 2017-04-20 at 00:20 +0200, Djalal Harouni wrote:
>> > [...]
[...]
>> modules_disabled is too restrictive and once set it can't be changed,
>> maybe that's why not all users use it.
>>
>> With modules_disabled=0 and modules_autoload=2
> [...]
>
> Hmm, OK. How about naming this modules_autoload_mode, then, so that
> it's obviously not a boolean?

Yes that's fine by me, kees already suggested to rename it to
"modules_autoload" I can change it to that if it's the best
suggestion!

Thanks!

> Ben.
>
> --
> Ben Hutchings
> It is easier to change the specification to fit the program than vice
> versa.
>



--
tixxdz

2017-04-20 21:29:06

by Kees Cook

[permalink] [raw]
Subject: Re: [kernel-hardening] Re: [PATCH v3 1/2] modules:capabilities: automatic module loading restriction

On Thu, Apr 20, 2017 at 1:39 PM, Djalal Harouni <[email protected]> wrote:
> On Thu, Apr 20, 2017 at 5:02 PM, Ben Hutchings <[email protected]> wrote:
>> On Thu, 2017-04-20 at 14:44 +0200, Djalal Harouni wrote:
>>> > On Thu, Apr 20, 2017 at 4:22 AM, Ben Hutchings <[email protected]> wrote:
>>> > On Thu, 2017-04-20 at 00:20 +0200, Djalal Harouni wrote:
>>> > [...]
> [...]
>>> modules_disabled is too restrictive and once set it can't be changed,
>>> maybe that's why not all users use it.
>>>
>>> With modules_disabled=0 and modules_autoload=2
>> [...]
>>
>> Hmm, OK. How about naming this modules_autoload_mode, then, so that
>> it's obviously not a boolean?
>
> Yes that's fine by me, kees already suggested to rename it to
> "modules_autoload" I can change it to that if it's the best
> suggestion!

That's fine by me.

-Kees

--
Kees Cook
Pixel Security

2017-04-21 23:19:14

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Wed, Apr 19, 2017 at 7:41 PM, Andy Lutomirski <[email protected]> wrote:
> On Wed, Apr 19, 2017 at 4:43 PM, Kees Cook <[email protected]> wrote:
>> On Wed, Apr 19, 2017 at 4:15 PM, Andy Lutomirski <[email protected]> wrote:
>>> On Wed, Apr 19, 2017 at 3:20 PM, Djalal Harouni <[email protected]> wrote:
>>>> +/* Sets task's modules_autoload */
>>>> +static inline int task_set_modules_autoload(struct task_struct *task,
>>>> + unsigned long value)
>>>> +{
>>>> + if (value > MODULES_AUTOLOAD_DISABLED)
>>>> + return -EINVAL;
>>>> + else if (task->modules_autoload > value)
>>>> + return -EPERM;
>>>> + else if (task->modules_autoload < value)
>>>> + task->modules_autoload = value;
>>>> +
>>>> + return 0;
>>>> +}
>>>
>>> This needs to be more locked down. Otherwise someone could set this
>>> and then run a setuid program. Admittedly, it would be quite odd if
>>> this particular thing causes a problem, but the issue exists
>>> nonetheless.
>>
>> Eeeh, I don't agree this needs to be changed. APIs provided by modules
>> are different than the existing privilege-manipulation syscalls this
>> concern stems from. Applications are already forced to deal with
>> things being missing like this in the face of it simply not being
>> built into the kernel.
>>
>> Having to hide this behind nnp seems like it'd reduce its utility...
>>
>
> I think that adding an inherited boolean to task_struct that can be
> set by unprivileged tasks and passed to privileged tasks is a terrible
> precedent. Ideally someone would try to find all the existing things
> like this and kill them off.

(Tristate, not boolean, but yeah.)

I see two others besides seccomp and nnp:

PR_MCE_KILL
PR_SET_THP_DISABLE

I really don't think this needs nnp protection.

> I agree that I don't see how one would exploit this particular
> feature, but I still think I dislike the approach. This is a slippery
> slope to adding a boolean for perf_event_open(), unshare(), etc, and
> we should solve these for real rather than half-arsing them IMO.

I disagree (obviously); this would be protecting the entire module
autoload attack surface. That's hardly a specific control, and it's a
demonstrably needed flag.

-Kees

--
Kees Cook
Pixel Security

2017-04-21 23:29:24

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Fri, Apr 21, 2017 at 4:19 PM, Kees Cook <[email protected]> wrote:
> On Wed, Apr 19, 2017 at 7:41 PM, Andy Lutomirski <[email protected]> wrote:
>> On Wed, Apr 19, 2017 at 4:43 PM, Kees Cook <[email protected]> wrote:
>>> On Wed, Apr 19, 2017 at 4:15 PM, Andy Lutomirski <[email protected]> wrote:
>>>> On Wed, Apr 19, 2017 at 3:20 PM, Djalal Harouni <[email protected]> wrote:
>>>>> +/* Sets task's modules_autoload */
>>>>> +static inline int task_set_modules_autoload(struct task_struct *task,
>>>>> + unsigned long value)
>>>>> +{
>>>>> + if (value > MODULES_AUTOLOAD_DISABLED)
>>>>> + return -EINVAL;
>>>>> + else if (task->modules_autoload > value)
>>>>> + return -EPERM;
>>>>> + else if (task->modules_autoload < value)
>>>>> + task->modules_autoload = value;
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>
>>>> This needs to be more locked down. Otherwise someone could set this
>>>> and then run a setuid program. Admittedly, it would be quite odd if
>>>> this particular thing causes a problem, but the issue exists
>>>> nonetheless.
>>>
>>> Eeeh, I don't agree this needs to be changed. APIs provided by modules
>>> are different than the existing privilege-manipulation syscalls this
>>> concern stems from. Applications are already forced to deal with
>>> things being missing like this in the face of it simply not being
>>> built into the kernel.
>>>
>>> Having to hide this behind nnp seems like it'd reduce its utility...
>>>
>>
>> I think that adding an inherited boolean to task_struct that can be
>> set by unprivileged tasks and passed to privileged tasks is a terrible
>> precedent. Ideally someone would try to find all the existing things
>> like this and kill them off.
>
> (Tristate, not boolean, but yeah.)
>
> I see two others besides seccomp and nnp:
>
> PR_MCE_KILL

Well, that's interesting. That should presumably be reset on setuid
exec or something.

> PR_SET_THP_DISABLE

Um. At least that's just a performance issue.

>
> I really don't think this needs nnp protection.
>
>> I agree that I don't see how one would exploit this particular
>> feature, but I still think I dislike the approach. This is a slippery
>> slope to adding a boolean for perf_event_open(), unshare(), etc, and
>> we should solve these for real rather than half-arsing them IMO.
>
> I disagree (obviously); this would be protecting the entire module
> autoload attack surface. That's hardly a specific control, and it's a
> demonstrably needed flag.
>

The list is just going to get longer. We should probably have controls for:

- Use of perf. Unclear how fine grained they should be.

- Creation of new user namespaces. Possibly also use of things like
iptables without global privilege.

- Ability to look up tasks owned by different uids (or maybe other
tasks *at all*) by pid/tid. Conceptually, this is easy. The API is
the only hard part, I think.

- Ability to bind ports, maybe?

My point is that all of these need some way to handle configuration
and inheritance, and I don't think that a bunch of per-task prctls is
the right way. As just an example, saying that interactive users can
autoload modules but other users can't, or that certain systemd
services can, etc, might be nice. Linus already complained that he
(i.e. user "torvalds" or whatever) should be able to profile the
kernel but that other uids should not be able to.

I personally like my implicit_rights idea, and it might be interesting
to prototype it.

2017-04-21 23:41:53

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Fri, Apr 21, 2017 at 4:28 PM, Andy Lutomirski <[email protected]> wrote:
> On Fri, Apr 21, 2017 at 4:19 PM, Kees Cook <[email protected]> wrote:
>> On Wed, Apr 19, 2017 at 7:41 PM, Andy Lutomirski <[email protected]> wrote:
>>> On Wed, Apr 19, 2017 at 4:43 PM, Kees Cook <[email protected]> wrote:
>>>> On Wed, Apr 19, 2017 at 4:15 PM, Andy Lutomirski <[email protected]> wrote:
>>>>> On Wed, Apr 19, 2017 at 3:20 PM, Djalal Harouni <[email protected]> wrote:
>>>>>> +/* Sets task's modules_autoload */
>>>>>> +static inline int task_set_modules_autoload(struct task_struct *task,
>>>>>> + unsigned long value)
>>>>>> +{
>>>>>> + if (value > MODULES_AUTOLOAD_DISABLED)
>>>>>> + return -EINVAL;
>>>>>> + else if (task->modules_autoload > value)
>>>>>> + return -EPERM;
>>>>>> + else if (task->modules_autoload < value)
>>>>>> + task->modules_autoload = value;
>>>>>> +
>>>>>> + return 0;
>>>>>> +}
>>>>>
>>>>> This needs to be more locked down. Otherwise someone could set this
>>>>> and then run a setuid program. Admittedly, it would be quite odd if
>>>>> this particular thing causes a problem, but the issue exists
>>>>> nonetheless.
>>>>
>>>> Eeeh, I don't agree this needs to be changed. APIs provided by modules
>>>> are different than the existing privilege-manipulation syscalls this
>>>> concern stems from. Applications are already forced to deal with
>>>> things being missing like this in the face of it simply not being
>>>> built into the kernel.
>>>>
>>>> Having to hide this behind nnp seems like it'd reduce its utility...
>>>>
>>>
>>> I think that adding an inherited boolean to task_struct that can be
>>> set by unprivileged tasks and passed to privileged tasks is a terrible
>>> precedent. Ideally someone would try to find all the existing things
>>> like this and kill them off.
>>
>> (Tristate, not boolean, but yeah.)
>>
>> I see two others besides seccomp and nnp:
>>
>> PR_MCE_KILL
>
> Well, that's interesting. That should presumably be reset on setuid
> exec or something.
>
>> PR_SET_THP_DISABLE
>
> Um. At least that's just a performance issue.
>
>>
>> I really don't think this needs nnp protection.
>>
>>> I agree that I don't see how one would exploit this particular
>>> feature, but I still think I dislike the approach. This is a slippery
>>> slope to adding a boolean for perf_event_open(), unshare(), etc, and
>>> we should solve these for real rather than half-arsing them IMO.
>>
>> I disagree (obviously); this would be protecting the entire module
>> autoload attack surface. That's hardly a specific control, and it's a
>> demonstrably needed flag.
>>
>
> The list is just going to get longer. We should probably have controls for:
>
> - Use of perf. Unclear how fine grained they should be.

This can already be "given up" by a process by using seccomp. The
system-wide setting is what's missing here, and that's a whole other
thread already even though basically every distro has implemented the
= 3 sysctl knob level.

> - Creation of new user namespaces. Possibly also use of things like
> iptables without global privilege.

This is another one that can be controlled by seccomp. The system-wide
setting already exists in /proc/sys/user/max_user_namespaces.

> - Ability to look up tasks owned by different uids (or maybe other
> tasks *at all*) by pid/tid. Conceptually, this is easy. The API is
> the only hard part, I think.

The attack surface here is relatively small compared to the other examples.

> - Ability to bind ports, maybe?

seccomp and maybe a sysctl? I'd have to look at that more carefully,
but again, this isn't a comparable attack-surface/confinement issue.

> My point is that all of these need some way to handle configuration
> and inheritance, and I don't think that a bunch of per-task prctls is
> the right way. As just an example, saying that interactive users can
> autoload modules but other users can't, or that certain systemd
> services can, etc, might be nice. Linus already complained that he
> (i.e. user "torvalds" or whatever) should be able to profile the
> kernel but that other uids should not be able to.
>
> I personally like my implicit_rights idea, and it might be interesting
> to prototype it.

I don't like blocking a needed feature behind a large super-feature
that doesn't exist yet. We'd be able to refactor this code into using
such a thing in the future, so I'd prefer to move ahead with this
since it would stop actual exploits.

-Kees

--
Kees Cook
Pixel Security

2017-04-21 23:52:38

by Casey Schaufler

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On 4/21/2017 4:28 PM, Andy Lutomirski wrote:
> On Fri, Apr 21, 2017 at 4:19 PM, Kees Cook <[email protected]> wrote:
>> On Wed, Apr 19, 2017 at 7:41 PM, Andy Lutomirski <[email protected]> wrote:
>>> On Wed, Apr 19, 2017 at 4:43 PM, Kees Cook <[email protected]> wrote:
>>>> On Wed, Apr 19, 2017 at 4:15 PM, Andy Lutomirski <[email protected]> wrote:
>>>>> On Wed, Apr 19, 2017 at 3:20 PM, Djalal Harouni <[email protected]> wrote:
>>>>>> +/* Sets task's modules_autoload */
>>>>>> +static inline int task_set_modules_autoload(struct task_struct *task,
>>>>>> + unsigned long value)
>>>>>> +{
>>>>>> + if (value > MODULES_AUTOLOAD_DISABLED)
>>>>>> + return -EINVAL;
>>>>>> + else if (task->modules_autoload > value)
>>>>>> + return -EPERM;
>>>>>> + else if (task->modules_autoload < value)
>>>>>> + task->modules_autoload = value;
>>>>>> +
>>>>>> + return 0;
>>>>>> +}
>>>>> This needs to be more locked down. Otherwise someone could set this
>>>>> and then run a setuid program. Admittedly, it would be quite odd if
>>>>> this particular thing causes a problem, but the issue exists
>>>>> nonetheless.
>>>> Eeeh, I don't agree this needs to be changed. APIs provided by modules
>>>> are different than the existing privilege-manipulation syscalls this
>>>> concern stems from. Applications are already forced to deal with
>>>> things being missing like this in the face of it simply not being
>>>> built into the kernel.
>>>>
>>>> Having to hide this behind nnp seems like it'd reduce its utility...
>>>>
>>> I think that adding an inherited boolean to task_struct that can be
>>> set by unprivileged tasks and passed to privileged tasks is a terrible
>>> precedent. Ideally someone would try to find all the existing things
>>> like this and kill them off.
>> (Tristate, not boolean, but yeah.)
>>
>> I see two others besides seccomp and nnp:
>>
>> PR_MCE_KILL
> Well, that's interesting. That should presumably be reset on setuid
> exec or something.
>
>> PR_SET_THP_DISABLE
> Um. At least that's just a performance issue.
>
>> I really don't think this needs nnp protection.
>>
>>> I agree that I don't see how one would exploit this particular
>>> feature, but I still think I dislike the approach. This is a slippery
>>> slope to adding a boolean for perf_event_open(), unshare(), etc, and
>>> we should solve these for real rather than half-arsing them IMO.
>> I disagree (obviously); this would be protecting the entire module
>> autoload attack surface. That's hardly a specific control, and it's a
>> demonstrably needed flag.
>>
> The list is just going to get longer. We should probably have controls for:
>
> - Use of perf. Unclear how fine grained they should be.
>
> - Creation of new user namespaces. Possibly also use of things like
> iptables without global privilege.
>
> - Ability to look up tasks owned by different uids (or maybe other
> tasks *at all*) by pid/tid. Conceptually, this is easy. The API is
> the only hard part, I think.
>
> - Ability to bind ports, maybe?

One of my longer term (i.e. after stacking) projects
is to create sensible access control on ports. Why shouldn't
they have owners and mode bits (or ACLs, if you prefer)
or real names. I kind of think we should be able to eliminate
the need for dbus without resorting to kdbus.

So I don't like the idea of treating that as a special case.
I'd rather see ports controlled properly. (Of course, the
SELinux crowd will point out they have this handled, but I
remain unconvinced of the overall solution)

>
> My point is that all of these need some way to handle configuration
> and inheritance, and I don't think that a bunch of per-task prctls is
> the right way. As just an example, saying that interactive users can
> autoload modules but other users can't, or that certain systemd
> services can, etc, might be nice. Linus already complained that he
> (i.e. user "torvalds" or whatever) should be able to profile the
> kernel but that other uids should not be able to.
>
> I personally like my implicit_rights idea, and it might be interesting
> to prototype it.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2017-04-21 23:51:59

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Fri, Apr 21, 2017 at 4:40 PM, Kees Cook <[email protected]> wrote:
> On Fri, Apr 21, 2017 at 4:28 PM, Andy Lutomirski <[email protected]> wrote:
>> On Fri, Apr 21, 2017 at 4:19 PM, Kees Cook <[email protected]> wrote:
>>> On Wed, Apr 19, 2017 at 7:41 PM, Andy Lutomirski <[email protected]> wrote:
>>>> On Wed, Apr 19, 2017 at 4:43 PM, Kees Cook <[email protected]> wrote:
>>>>> On Wed, Apr 19, 2017 at 4:15 PM, Andy Lutomirski <[email protected]> wrote:
>>>>>> On Wed, Apr 19, 2017 at 3:20 PM, Djalal Harouni <[email protected]> wrote:
>>>>>>> +/* Sets task's modules_autoload */
>>>>>>> +static inline int task_set_modules_autoload(struct task_struct *task,
>>>>>>> + unsigned long value)
>>>>>>> +{
>>>>>>> + if (value > MODULES_AUTOLOAD_DISABLED)
>>>>>>> + return -EINVAL;
>>>>>>> + else if (task->modules_autoload > value)
>>>>>>> + return -EPERM;
>>>>>>> + else if (task->modules_autoload < value)
>>>>>>> + task->modules_autoload = value;
>>>>>>> +
>>>>>>> + return 0;
>>>>>>> +}
>>>>>>
>>>>>> This needs to be more locked down. Otherwise someone could set this
>>>>>> and then run a setuid program. Admittedly, it would be quite odd if
>>>>>> this particular thing causes a problem, but the issue exists
>>>>>> nonetheless.
>>>>>
>>>>> Eeeh, I don't agree this needs to be changed. APIs provided by modules
>>>>> are different than the existing privilege-manipulation syscalls this
>>>>> concern stems from. Applications are already forced to deal with
>>>>> things being missing like this in the face of it simply not being
>>>>> built into the kernel.
>>>>>
>>>>> Having to hide this behind nnp seems like it'd reduce its utility...
>>>>>
>>>>
>>>> I think that adding an inherited boolean to task_struct that can be
>>>> set by unprivileged tasks and passed to privileged tasks is a terrible
>>>> precedent. Ideally someone would try to find all the existing things
>>>> like this and kill them off.
>>>
>>> (Tristate, not boolean, but yeah.)
>>>
>>> I see two others besides seccomp and nnp:
>>>
>>> PR_MCE_KILL
>>
>> Well, that's interesting. That should presumably be reset on setuid
>> exec or something.
>>
>>> PR_SET_THP_DISABLE
>>
>> Um. At least that's just a performance issue.
>>
>>>
>>> I really don't think this needs nnp protection.
>>>
>>>> I agree that I don't see how one would exploit this particular
>>>> feature, but I still think I dislike the approach. This is a slippery
>>>> slope to adding a boolean for perf_event_open(), unshare(), etc, and
>>>> we should solve these for real rather than half-arsing them IMO.
>>>
>>> I disagree (obviously); this would be protecting the entire module
>>> autoload attack surface. That's hardly a specific control, and it's a
>>> demonstrably needed flag.
>>>
>>
>> The list is just going to get longer. We should probably have controls for:
>>
>> - Use of perf. Unclear how fine grained they should be.
>
> This can already be "given up" by a process by using seccomp. The
> system-wide setting is what's missing here, and that's a whole other
> thread already even though basically every distro has implemented the
> = 3 sysctl knob level.

But it can't be done the way Linus wants it, and I don't blame him for
complaining.

>
>> - Creation of new user namespaces. Possibly also use of things like
>> iptables without global privilege.
>
> This is another one that can be controlled by seccomp. The system-wide
> setting already exists in /proc/sys/user/max_user_namespaces.

Awkwardly, though.

>
>> - Ability to look up tasks owned by different uids (or maybe other
>> tasks *at all*) by pid/tid. Conceptually, this is easy. The API is
>> the only hard part, I think.
>
> The attack surface here is relatively small compared to the other examples.
>
>> - Ability to bind ports, maybe?
>
> seccomp and maybe a sysctl? I'd have to look at that more carefully,
> but again, this isn't a comparable attack-surface/confinement issue.
>
>> My point is that all of these need some way to handle configuration
>> and inheritance, and I don't think that a bunch of per-task prctls is
>> the right way. As just an example, saying that interactive users can
>> autoload modules but other users can't, or that certain systemd
>> services can, etc, might be nice. Linus already complained that he
>> (i.e. user "torvalds" or whatever) should be able to profile the
>> kernel but that other uids should not be able to.
>>
>> I personally like my implicit_rights idea, and it might be interesting
>> to prototype it.
>
> I don't like blocking a needed feature behind a large super-feature
> that doesn't exist yet. We'd be able to refactor this code into using
> such a thing in the future, so I'd prefer to move ahead with this
> since it would stop actual exploits.

I don't think the super-feature is so hard, and I think we should not
add the per-task thing the way it's done in this patch. Let's not add
per-task things where the best argument for their security is "not
sure how it would be exploited".

Anyway, I think the sysctl is really the important bit. The per-task
setting is icing on the cake IMO. One upon a time autoload was more
important, but these days modaliases are supposed to do most of the
work. I bet that modern distros don't need unprivileged autoload at
all.

--Andy

2017-04-22 00:01:05

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Fri, Apr 21, 2017 at 4:52 PM, Casey Schaufler <[email protected]> wrote:
> On 4/21/2017 4:28 PM, Andy Lutomirski wrote:
>> On Fri, Apr 21, 2017 at 4:19 PM, Kees Cook <[email protected]> wrote:
>>> On Wed, Apr 19, 2017 at 7:41 PM, Andy Lutomirski <[email protected]> wrote:
>>>> On Wed, Apr 19, 2017 at 4:43 PM, Kees Cook <[email protected]> wrote:
>>>>> On Wed, Apr 19, 2017 at 4:15 PM, Andy Lutomirski <[email protected]> wrote:
>>>>>> On Wed, Apr 19, 2017 at 3:20 PM, Djalal Harouni <[email protected]> wrote:
>>>>>>> +/* Sets task's modules_autoload */
>>>>>>> +static inline int task_set_modules_autoload(struct task_struct *task,
>>>>>>> + unsigned long value)
>>>>>>> +{
>>>>>>> + if (value > MODULES_AUTOLOAD_DISABLED)
>>>>>>> + return -EINVAL;
>>>>>>> + else if (task->modules_autoload > value)
>>>>>>> + return -EPERM;
>>>>>>> + else if (task->modules_autoload < value)
>>>>>>> + task->modules_autoload = value;
>>>>>>> +
>>>>>>> + return 0;
>>>>>>> +}
>>>>>> This needs to be more locked down. Otherwise someone could set this
>>>>>> and then run a setuid program. Admittedly, it would be quite odd if
>>>>>> this particular thing causes a problem, but the issue exists
>>>>>> nonetheless.
>>>>> Eeeh, I don't agree this needs to be changed. APIs provided by modules
>>>>> are different than the existing privilege-manipulation syscalls this
>>>>> concern stems from. Applications are already forced to deal with
>>>>> things being missing like this in the face of it simply not being
>>>>> built into the kernel.
>>>>>
>>>>> Having to hide this behind nnp seems like it'd reduce its utility...
>>>>>
>>>> I think that adding an inherited boolean to task_struct that can be
>>>> set by unprivileged tasks and passed to privileged tasks is a terrible
>>>> precedent. Ideally someone would try to find all the existing things
>>>> like this and kill them off.
>>> (Tristate, not boolean, but yeah.)
>>>
>>> I see two others besides seccomp and nnp:
>>>
>>> PR_MCE_KILL
>> Well, that's interesting. That should presumably be reset on setuid
>> exec or something.
>>
>>> PR_SET_THP_DISABLE
>> Um. At least that's just a performance issue.
>>
>>> I really don't think this needs nnp protection.
>>>
>>>> I agree that I don't see how one would exploit this particular
>>>> feature, but I still think I dislike the approach. This is a slippery
>>>> slope to adding a boolean for perf_event_open(), unshare(), etc, and
>>>> we should solve these for real rather than half-arsing them IMO.
>>> I disagree (obviously); this would be protecting the entire module
>>> autoload attack surface. That's hardly a specific control, and it's a
>>> demonstrably needed flag.
>>>
>> The list is just going to get longer. We should probably have controls for:
>>
>> - Use of perf. Unclear how fine grained they should be.
>>
>> - Creation of new user namespaces. Possibly also use of things like
>> iptables without global privilege.
>>
>> - Ability to look up tasks owned by different uids (or maybe other
>> tasks *at all*) by pid/tid. Conceptually, this is easy. The API is
>> the only hard part, I think.
>>
>> - Ability to bind ports, maybe?
>
> One of my longer term (i.e. after stacking) projects
> is to create sensible access control on ports. Why shouldn't
> they have owners and mode bits (or ACLs, if you prefer)
> or real names. I kind of think we should be able to eliminate
> the need for dbus without resorting to kdbus.

My implicit_rights concept gives any type of access control you can
use on inodes because they *are* inodes. So you get ACLs, etc.

Brief summary for those who didn't read my old email: We add a new
kind of filesystem object called a "right". It's a special kind of
socket inode that can't be bound or connected but is instead created
by a new syscall. It has a name, so "port:1234" might be a name of a
right.

To use an implicit right, you do whatever syscall you would do
normally. The kernel looks for a right object at
/dev/implicit_rights/<name>. If that object exists, is a right of the
correct type (i.e. the right's name matches <name>) and you have
execute access, you win. Otherwise you lose.

To avoid breaking existing distros, for things like modules_autoload,
you would set a sysctl
/proc/sys/kernel/required_implicit_rights/modules_autoload=1. With
that set, to autoload a module without CAP_SYS_MODULE, you need the
/dev/implicit_rights/modules_autoload.

>
> So I don't like the idea of treating that as a special case.
> I'd rather see ports controlled properly. (Of course, the
> SELinux crowd will point out they have this handled, but I
> remain unconvinced of the overall solution)

Agreed. But I think we should address all of these things together.

2017-04-22 00:12:34

by Djalal Harouni

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Sat, Apr 22, 2017 at 1:51 AM, Andy Lutomirski <[email protected]> wrote:
[...]
>>> I personally like my implicit_rights idea, and it might be interesting
>>> to prototype it.
>>
>> I don't like blocking a needed feature behind a large super-feature
>> that doesn't exist yet. We'd be able to refactor this code into using
>> such a thing in the future, so I'd prefer to move ahead with this
>> since it would stop actual exploits.
>
> I don't think the super-feature is so hard, and I think we should not
> add the per-task thing the way it's done in this patch. Let's not add
> per-task things where the best argument for their security is "not
> sure how it would be exploited".

Actually the XFRM framework CVE-2017-7184 [1] is one real example, of
course there are others. The exploit was used on a generic distro
during a security contest that distro is Ubuntu. That distro will
never provide a module autoloading restriction by default to not harm
it's users. Consumers or containers/sandboxes then can run their
confined apps using such facilities.

These bugs will stay in embedded devices that use these generic
distros for ever.

> Anyway, I think the sysctl is really the important bit. The per-task
> setting is icing on the cake IMO. One upon a time autoload was more
> important, but these days modaliases are supposed to do most of the
> work. I bet that modern distros don't need unprivileged autoload at
> all.

Actually I think they do and we can't just change that. Users may
depend on it, it is a well established facility.

Now the other problem is CAP_NET_ADMIN which does lot of things, it is
more like the CAP_SYS_ADMIN.

This is a quick list that I got from only the past months, I'm pretty
sure there are more:

* DCCP use after free CVE-2017-6074
* n_hldc CVE-2017-2636
* XFRM framework CVE-2017-7184
* L2TPv3 CVE-2016-10200

Most of these need CAP_NET_ADMIN to be autoloaded, however we also
need CAP_NET_ADMIN for other things... therefore it is better to have
an extra facility that could coexist with CAP_NET_ADMIN and other
sandbox features.


[1] http://www.openwall.com/lists/oss-security/2017/03/29/2


--
tixxdz

2017-04-22 00:13:39

by Casey Schaufler

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On 4/21/2017 5:00 PM, Andy Lutomirski wrote:
> On Fri, Apr 21, 2017 at 4:52 PM, Casey Schaufler <[email protected]> wrote:
>> On 4/21/2017 4:28 PM, Andy Lutomirski wrote:
>>> On Fri, Apr 21, 2017 at 4:19 PM, Kees Cook <[email protected]> wrote:
>>>> On Wed, Apr 19, 2017 at 7:41 PM, Andy Lutomirski <[email protected]> wrote:
>>>>> On Wed, Apr 19, 2017 at 4:43 PM, Kees Cook <[email protected]> wrote:
>>>>>> On Wed, Apr 19, 2017 at 4:15 PM, Andy Lutomirski <[email protected]> wrote:
>>>>>>> On Wed, Apr 19, 2017 at 3:20 PM, Djalal Harouni <[email protected]> wrote:
>>>>>>>> +/* Sets task's modules_autoload */
>>>>>>>> +static inline int task_set_modules_autoload(struct task_struct *task,
>>>>>>>> + unsigned long value)
>>>>>>>> +{
>>>>>>>> + if (value > MODULES_AUTOLOAD_DISABLED)
>>>>>>>> + return -EINVAL;
>>>>>>>> + else if (task->modules_autoload > value)
>>>>>>>> + return -EPERM;
>>>>>>>> + else if (task->modules_autoload < value)
>>>>>>>> + task->modules_autoload = value;
>>>>>>>> +
>>>>>>>> + return 0;
>>>>>>>> +}
>>>>>>> This needs to be more locked down. Otherwise someone could set this
>>>>>>> and then run a setuid program. Admittedly, it would be quite odd if
>>>>>>> this particular thing causes a problem, but the issue exists
>>>>>>> nonetheless.
>>>>>> Eeeh, I don't agree this needs to be changed. APIs provided by modules
>>>>>> are different than the existing privilege-manipulation syscalls this
>>>>>> concern stems from. Applications are already forced to deal with
>>>>>> things being missing like this in the face of it simply not being
>>>>>> built into the kernel.
>>>>>>
>>>>>> Having to hide this behind nnp seems like it'd reduce its utility...
>>>>>>
>>>>> I think that adding an inherited boolean to task_struct that can be
>>>>> set by unprivileged tasks and passed to privileged tasks is a terrible
>>>>> precedent. Ideally someone would try to find all the existing things
>>>>> like this and kill them off.
>>>> (Tristate, not boolean, but yeah.)
>>>>
>>>> I see two others besides seccomp and nnp:
>>>>
>>>> PR_MCE_KILL
>>> Well, that's interesting. That should presumably be reset on setuid
>>> exec or something.
>>>
>>>> PR_SET_THP_DISABLE
>>> Um. At least that's just a performance issue.
>>>
>>>> I really don't think this needs nnp protection.
>>>>
>>>>> I agree that I don't see how one would exploit this particular
>>>>> feature, but I still think I dislike the approach. This is a slippery
>>>>> slope to adding a boolean for perf_event_open(), unshare(), etc, and
>>>>> we should solve these for real rather than half-arsing them IMO.
>>>> I disagree (obviously); this would be protecting the entire module
>>>> autoload attack surface. That's hardly a specific control, and it's a
>>>> demonstrably needed flag.
>>>>
>>> The list is just going to get longer. We should probably have controls for:
>>>
>>> - Use of perf. Unclear how fine grained they should be.
>>>
>>> - Creation of new user namespaces. Possibly also use of things like
>>> iptables without global privilege.
>>>
>>> - Ability to look up tasks owned by different uids (or maybe other
>>> tasks *at all*) by pid/tid. Conceptually, this is easy. The API is
>>> the only hard part, I think.
>>>
>>> - Ability to bind ports, maybe?
>> One of my longer term (i.e. after stacking) projects
>> is to create sensible access control on ports. Why shouldn't
>> they have owners and mode bits (or ACLs, if you prefer)
>> or real names. I kind of think we should be able to eliminate
>> the need for dbus without resorting to kdbus.
> My implicit_rights concept gives any type of access control you can
> use on inodes because they *are* inodes. So you get ACLs, etc.
>
> Brief summary for those who didn't read my old email: We add a new
> kind of filesystem object called a "right". It's a special kind of
> socket inode that can't be bound or connected but is instead created
> by a new syscall. It has a name, so "port:1234" might be a name of a
> right.
>
> To use an implicit right, you do whatever syscall you would do
> normally. The kernel looks for a right object at
> /dev/implicit_rights/<name>. If that object exists, is a right of the
> correct type (i.e. the right's name matches <name>) and you have
> execute access, you win. Otherwise you lose.
>
> To avoid breaking existing distros, for things like modules_autoload,
> you would set a sysctl
> /proc/sys/kernel/required_implicit_rights/modules_autoload=1. With
> that set, to autoload a module without CAP_SYS_MODULE, you need the
> /dev/implicit_rights/modules_autoload.

Sounds good.

>> So I don't like the idea of treating that as a special case.
>> I'd rather see ports controlled properly. (Of course, the
>> SELinux crowd will point out they have this handled, but I
>> remain unconvinced of the overall solution)
> Agreed. But I think we should address all of these things together.

What I don't want is to have to buy into a hundred things I
don't want in order to get the one thing I do. A General mechanism
is dandy, but I don't want to have to write a gazillion policy
lines for features I don't want in order to get a simple control.
The problem with SELinux is not the effort required to protect
what you care about, it's the effort required to do everything else.

2017-04-22 01:19:59

by Djalal Harouni

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Sat, Apr 22, 2017 at 2:12 AM, Djalal Harouni <[email protected]> wrote:
> On Sat, Apr 22, 2017 at 1:51 AM, Andy Lutomirski <[email protected]> wrote:
> [...]
>>>> I personally like my implicit_rights idea, and it might be interesting
>>>> to prototype it.
>>>
>>> I don't like blocking a needed feature behind a large super-feature
>>> that doesn't exist yet. We'd be able to refactor this code into using
>>> such a thing in the future, so I'd prefer to move ahead with this
>>> since it would stop actual exploits.
>>
>> I don't think the super-feature is so hard, and I think we should not
>> add the per-task thing the way it's done in this patch. Let's not add
>> per-task things where the best argument for their security is "not
>> sure how it would be exploited".
>
> Actually the XFRM framework CVE-2017-7184 [1] is one real example, of
> course there are others. The exploit was used on a generic distro
> during a security contest that distro is Ubuntu. That distro will
> never provide a module autoloading restriction by default to not harm
> it's users. Consumers or containers/sandboxes then can run their
> confined apps using such facilities.
>
> These bugs will stay in embedded devices that use these generic
> distros for ever.

The DCCP CVE-2017-6074 exploit:
http://seclists.org/oss-sec/2017/q1/503

Well, pretty sure there is more... the bugs are real, as their
exploits. Anyway I think these features can coexist as they are
optional, and most process trees protections can get along by design.


--
tixxdz

2017-04-22 06:45:53

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Fri, Apr 21, 2017 at 5:13 PM, Casey Schaufler <[email protected]> wrote:
> On 4/21/2017 5:00 PM, Andy Lutomirski wrote:
>> On Fri, Apr 21, 2017 at 4:52 PM, Casey Schaufler <[email protected]> wrote:
>>> On 4/21/2017 4:28 PM, Andy Lutomirski wrote:
>>>> On Fri, Apr 21, 2017 at 4:19 PM, Kees Cook <[email protected]> wrote:
>>>>> On Wed, Apr 19, 2017 at 7:41 PM, Andy Lutomirski <[email protected]> wrote:
>>>>>> On Wed, Apr 19, 2017 at 4:43 PM, Kees Cook <[email protected]> wrote:
>>>>>>> On Wed, Apr 19, 2017 at 4:15 PM, Andy Lutomirski <[email protected]> wrote:
>>>>>>>> On Wed, Apr 19, 2017 at 3:20 PM, Djalal Harouni <[email protected]> wrote:
>>>>>>>>> +/* Sets task's modules_autoload */
>>>>>>>>> +static inline int task_set_modules_autoload(struct task_struct *task,
>>>>>>>>> + unsigned long value)
>>>>>>>>> +{
>>>>>>>>> + if (value > MODULES_AUTOLOAD_DISABLED)
>>>>>>>>> + return -EINVAL;
>>>>>>>>> + else if (task->modules_autoload > value)
>>>>>>>>> + return -EPERM;
>>>>>>>>> + else if (task->modules_autoload < value)
>>>>>>>>> + task->modules_autoload = value;
>>>>>>>>> +
>>>>>>>>> + return 0;
>>>>>>>>> +}
>>>>>>>> This needs to be more locked down. Otherwise someone could set this
>>>>>>>> and then run a setuid program. Admittedly, it would be quite odd if
>>>>>>>> this particular thing causes a problem, but the issue exists
>>>>>>>> nonetheless.
>>>>>>> Eeeh, I don't agree this needs to be changed. APIs provided by modules
>>>>>>> are different than the existing privilege-manipulation syscalls this
>>>>>>> concern stems from. Applications are already forced to deal with
>>>>>>> things being missing like this in the face of it simply not being
>>>>>>> built into the kernel.
>>>>>>>
>>>>>>> Having to hide this behind nnp seems like it'd reduce its utility...
>>>>>>>
>>>>>> I think that adding an inherited boolean to task_struct that can be
>>>>>> set by unprivileged tasks and passed to privileged tasks is a terrible
>>>>>> precedent. Ideally someone would try to find all the existing things
>>>>>> like this and kill them off.
>>>>> (Tristate, not boolean, but yeah.)
>>>>>
>>>>> I see two others besides seccomp and nnp:
>>>>>
>>>>> PR_MCE_KILL
>>>> Well, that's interesting. That should presumably be reset on setuid
>>>> exec or something.
>>>>
>>>>> PR_SET_THP_DISABLE
>>>> Um. At least that's just a performance issue.
>>>>
>>>>> I really don't think this needs nnp protection.
>>>>>
>>>>>> I agree that I don't see how one would exploit this particular
>>>>>> feature, but I still think I dislike the approach. This is a slippery
>>>>>> slope to adding a boolean for perf_event_open(), unshare(), etc, and
>>>>>> we should solve these for real rather than half-arsing them IMO.
>>>>> I disagree (obviously); this would be protecting the entire module
>>>>> autoload attack surface. That's hardly a specific control, and it's a
>>>>> demonstrably needed flag.
>>>>>
>>>> The list is just going to get longer. We should probably have controls for:
>>>>
>>>> - Use of perf. Unclear how fine grained they should be.
>>>>
>>>> - Creation of new user namespaces. Possibly also use of things like
>>>> iptables without global privilege.
>>>>
>>>> - Ability to look up tasks owned by different uids (or maybe other
>>>> tasks *at all*) by pid/tid. Conceptually, this is easy. The API is
>>>> the only hard part, I think.
>>>>
>>>> - Ability to bind ports, maybe?
>>> One of my longer term (i.e. after stacking) projects
>>> is to create sensible access control on ports. Why shouldn't
>>> they have owners and mode bits (or ACLs, if you prefer)
>>> or real names. I kind of think we should be able to eliminate
>>> the need for dbus without resorting to kdbus.
>> My implicit_rights concept gives any type of access control you can
>> use on inodes because they *are* inodes. So you get ACLs, etc.
>>
>> Brief summary for those who didn't read my old email: We add a new
>> kind of filesystem object called a "right". It's a special kind of
>> socket inode that can't be bound or connected but is instead created
>> by a new syscall. It has a name, so "port:1234" might be a name of a
>> right.
>>
>> To use an implicit right, you do whatever syscall you would do
>> normally. The kernel looks for a right object at
>> /dev/implicit_rights/<name>. If that object exists, is a right of the
>> correct type (i.e. the right's name matches <name>) and you have
>> execute access, you win. Otherwise you lose.
>>
>> To avoid breaking existing distros, for things like modules_autoload,
>> you would set a sysctl
>> /proc/sys/kernel/required_implicit_rights/modules_autoload=1. With
>> that set, to autoload a module without CAP_SYS_MODULE, you need the
>> /dev/implicit_rights/modules_autoload.
>
> Sounds good.
>
>>> So I don't like the idea of treating that as a special case.
>>> I'd rather see ports controlled properly. (Of course, the
>>> SELinux crowd will point out they have this handled, but I
>>> remain unconvinced of the overall solution)
>> Agreed. But I think we should address all of these things together.
>
> What I don't want is to have to buy into a hundred things I
> don't want in order to get the one thing I do. A General mechanism
> is dandy, but I don't want to have to write a gazillion policy
> lines for features I don't want in order to get a simple control.
> The problem with SELinux is not the effort required to protect
> what you care about, it's the effort required to do everything else.
>

The point is to make it super simple. chown, chmod and, if you want
to get fancy, setfacl. You'll need a mkright tool, but that's
trivial.

2017-04-22 06:52:15

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Fri, Apr 21, 2017 at 5:12 PM, Djalal Harouni <[email protected]> wrote:
> On Sat, Apr 22, 2017 at 1:51 AM, Andy Lutomirski <[email protected]> wrote:
> [...]
>>>> I personally like my implicit_rights idea, and it might be interesting
>>>> to prototype it.
>>>
>>> I don't like blocking a needed feature behind a large super-feature
>>> that doesn't exist yet. We'd be able to refactor this code into using
>>> such a thing in the future, so I'd prefer to move ahead with this
>>> since it would stop actual exploits.
>>
>> I don't think the super-feature is so hard, and I think we should not
>> add the per-task thing the way it's done in this patch. Let's not add
>> per-task things where the best argument for their security is "not
>> sure how it would be exploited".
>
> Actually the XFRM framework CVE-2017-7184 [1] is one real example, of
> course there are others. The exploit was used on a generic distro
> during a security contest that distro is Ubuntu. That distro will
> never provide a module autoloading restriction by default to not harm
> it's users. Consumers or containers/sandboxes then can run their
> confined apps using such facilities.
>
> These bugs will stay in embedded devices that use these generic
> distros for ever.
>
>> Anyway, I think the sysctl is really the important bit. The per-task
>> setting is icing on the cake IMO. One upon a time autoload was more
>> important, but these days modaliases are supposed to do most of the
>> work. I bet that modern distros don't need unprivileged autoload at
>> all.
>
> Actually I think they do and we can't just change that. Users may
> depend on it, it is a well established facility.
>
> Now the other problem is CAP_NET_ADMIN which does lot of things, it is
> more like the CAP_SYS_ADMIN.
>
> This is a quick list that I got from only the past months, I'm pretty
> sure there are more:
>
> * DCCP use after free CVE-2017-6074
> * n_hldc CVE-2017-2636
> * XFRM framework CVE-2017-7184
> * L2TPv3 CVE-2016-10200
>
> Most of these need CAP_NET_ADMIN to be autoloaded, however we also
> need CAP_NET_ADMIN for other things... therefore it is better to have
> an extra facility that could coexist with CAP_NET_ADMIN and other
> sandbox features.
>

I agree that the feature is important, but I think your implementation
is needlessly dangerous. I imagine that the main uses that you care
about involve containers. How about doing it in a safer way that
works for containers? I can think of a few. For example:

1. A sysctl that, if set, prevents autoloading outside the root
userns. This isn't very flexible at all, but it might work.

2. Your patch, but require privilege within the calling namespace to
set the prctl.

3. A per-user-ns sysctl.

2017-04-22 12:17:48

by Djalal Harouni

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Sat, Apr 22, 2017 at 1:28 AM, Andy Lutomirski <[email protected]> wrote:
> On Fri, Apr 21, 2017 at 4:19 PM, Kees Cook <[email protected]> wrote:
>> On Wed, Apr 19, 2017 at 7:41 PM, Andy Lutomirski <[email protected]> wrote:
>>> On Wed, Apr 19, 2017 at 4:43 PM, Kees Cook <[email protected]> wrote:
>>>> On Wed, Apr 19, 2017 at 4:15 PM, Andy Lutomirski <[email protected]> wrote:
>>>>> On Wed, Apr 19, 2017 at 3:20 PM, Djalal Harouni <[email protected]> wrote:
>>>>>> +/* Sets task's modules_autoload */
>>>>>> +static inline int task_set_modules_autoload(struct task_struct *task,
>>>>>> + unsigned long value)
>>>>>> +{
>>>>>> + if (value > MODULES_AUTOLOAD_DISABLED)
>>>>>> + return -EINVAL;
>>>>>> + else if (task->modules_autoload > value)
>>>>>> + return -EPERM;
>>>>>> + else if (task->modules_autoload < value)
>>>>>> + task->modules_autoload = value;
>>>>>> +
>>>>>> + return 0;
>>>>>> +}
>>>>>
>>>>> This needs to be more locked down. Otherwise someone could set this
>>>>> and then run a setuid program. Admittedly, it would be quite odd if
>>>>> this particular thing causes a problem, but the issue exists
>>>>> nonetheless.
>>>>
>>>> Eeeh, I don't agree this needs to be changed. APIs provided by modules
>>>> are different than the existing privilege-manipulation syscalls this
>>>> concern stems from. Applications are already forced to deal with
>>>> things being missing like this in the face of it simply not being
>>>> built into the kernel.
>>>>
>>>> Having to hide this behind nnp seems like it'd reduce its utility...
>>>>
>>>
>>> I think that adding an inherited boolean to task_struct that can be
>>> set by unprivileged tasks and passed to privileged tasks is a terrible
>>> precedent. Ideally someone would try to find all the existing things
>>> like this and kill them off.
>>
>> (Tristate, not boolean, but yeah.)
>>
>> I see two others besides seccomp and nnp:
>>
>> PR_MCE_KILL
>
> Well, that's interesting. That should presumably be reset on setuid
> exec or something.
>
>> PR_SET_THP_DISABLE
>
> Um. At least that's just a performance issue.
>
>>
>> I really don't think this needs nnp protection.
>>
>>> I agree that I don't see how one would exploit this particular
>>> feature, but I still think I dislike the approach. This is a slippery
>>> slope to adding a boolean for perf_event_open(), unshare(), etc, and
>>> we should solve these for real rather than half-arsing them IMO.
>>
>> I disagree (obviously); this would be protecting the entire module
>> autoload attack surface. That's hardly a specific control, and it's a
>> demonstrably needed flag.
>>
>
> The list is just going to get longer. We should probably have controls for:
>
> - Use of perf. Unclear how fine grained they should be.
>
> - Creation of new user namespaces. Possibly also use of things like
> iptables without global privilege.
>
> - Ability to look up tasks owned by different uids (or maybe other
> tasks *at all*) by pid/tid. Conceptually, this is easy. The API is
> the only hard part, I think.
>
> - Ability to bind ports, maybe?
>
> My point is that all of these need some way to handle configuration
> and inheritance, and I don't think that a bunch of per-task prctls is
> the right way. As just an example, saying that interactive users can
> autoload modules but other users can't, or that certain systemd
> services can, etc, might be nice. Linus already complained that he
> (i.e. user "torvalds" or whatever) should be able to profile the
> kernel but that other uids should not be able to.

Neat, maybe this could already be achieved with this interface and
systemd-logind, "ModulesAutoloadUsers=andy" in logind.conf where
"andy" is the only logged-in user able to trigger and autoload kernel
modules. However maybe we should not restrict too much other bits or
functionality of the other users, please let me will follow up later
on it.

> I personally like my implicit_rights idea, and it might be interesting
> to prototype it.

Yes I would use that if it is available, in mean time there was
several bugs and 2 public exploits last months.. and we should use
available features.

--
tixxdz

2017-04-22 19:29:43

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Fri, Apr 21, 2017 at 11:51 PM, Andy Lutomirski <[email protected]> wrote:
> On Fri, Apr 21, 2017 at 5:12 PM, Djalal Harouni <[email protected]> wrote:
>> On Sat, Apr 22, 2017 at 1:51 AM, Andy Lutomirski <[email protected]> wrote:
>> [...]
>>>>> I personally like my implicit_rights idea, and it might be interesting
>>>>> to prototype it.
>>>>
>>>> I don't like blocking a needed feature behind a large super-feature
>>>> that doesn't exist yet. We'd be able to refactor this code into using
>>>> such a thing in the future, so I'd prefer to move ahead with this
>>>> since it would stop actual exploits.
>>>
>>> I don't think the super-feature is so hard, and I think we should not
>>> add the per-task thing the way it's done in this patch. Let's not add
>>> per-task things where the best argument for their security is "not
>>> sure how it would be exploited".
>>
>> Actually the XFRM framework CVE-2017-7184 [1] is one real example, of
>> course there are others. The exploit was used on a generic distro
>> during a security contest that distro is Ubuntu. That distro will
>> never provide a module autoloading restriction by default to not harm
>> it's users. Consumers or containers/sandboxes then can run their
>> confined apps using such facilities.
>>
>> These bugs will stay in embedded devices that use these generic
>> distros for ever.
>>
>>> Anyway, I think the sysctl is really the important bit. The per-task
>>> setting is icing on the cake IMO. One upon a time autoload was more
>>> important, but these days modaliases are supposed to do most of the
>>> work. I bet that modern distros don't need unprivileged autoload at
>>> all.
>>
>> Actually I think they do and we can't just change that. Users may
>> depend on it, it is a well established facility.
>>
>> Now the other problem is CAP_NET_ADMIN which does lot of things, it is
>> more like the CAP_SYS_ADMIN.
>>
>> This is a quick list that I got from only the past months, I'm pretty
>> sure there are more:
>>
>> * DCCP use after free CVE-2017-6074
>> * n_hldc CVE-2017-2636
>> * XFRM framework CVE-2017-7184
>> * L2TPv3 CVE-2016-10200
>>
>> Most of these need CAP_NET_ADMIN to be autoloaded, however we also
>> need CAP_NET_ADMIN for other things... therefore it is better to have
>> an extra facility that could coexist with CAP_NET_ADMIN and other
>> sandbox features.
>>
>
> I agree that the feature is important, but I think your implementation
> is needlessly dangerous. I imagine that the main uses that you care
> about involve containers. How about doing it in a safer way that
> works for containers? I can think of a few. For example:
>
> 1. A sysctl that, if set, prevents autoloading outside the root
> userns. This isn't very flexible at all, but it might work.
>
> 2. Your patch, but require privilege within the calling namespace to
> set the prctl.

How about CAP_SYS_ADMIN || no_new_privs?

-Kees

>
> 3. A per-user-ns sysctl.



--
Kees Cook
Pixel Security

2017-04-24 04:32:20

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

Djalal Harouni <[email protected]> writes:
> When value is (1), task must have CAP_SYS_MODULE to be able to trigger a
> module auto-load operation, or CAP_NET_ADMIN for modules with a
> 'netdev-%s' alias.

Sorry, the magic 'netdev-' prefix is a crawling horror. To do this
properly, you need to hand the capability (if any) from the
request_module() call. Probably by adding a new request_module_cap and
making request_module() call that, then fixing up the callers.

Cheers,
Rusty.

2017-04-24 14:25:53

by Djalal Harouni

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Sat, Apr 22, 2017 at 9:29 PM, Kees Cook <[email protected]> wrote:
> On Fri, Apr 21, 2017 at 11:51 PM, Andy Lutomirski <[email protected]> wrote:
>> On Fri, Apr 21, 2017 at 5:12 PM, Djalal Harouni <[email protected]> wrote:
>>> On Sat, Apr 22, 2017 at 1:51 AM, Andy Lutomirski <[email protected]> wrote:
>>>
[...]
>>> * DCCP use after free CVE-2017-6074
>>> * n_hldc CVE-2017-2636
>>> * XFRM framework CVE-2017-7184
>>> * L2TPv3 CVE-2016-10200
>>>
>>> Most of these need CAP_NET_ADMIN to be autoloaded, however we also
>>> need CAP_NET_ADMIN for other things... therefore it is better to have
>>> an extra facility that could coexist with CAP_NET_ADMIN and other
>>> sandbox features.
>>>
>>
>> I agree that the feature is important, but I think your implementation
>> is needlessly dangerous. I imagine that the main uses that you care
>> about involve containers. How about doing it in a safer way that
>> works for containers? I can think of a few. For example:
>>
>> 1. A sysctl that, if set, prevents autoloading outside the root
>> userns. This isn't very flexible at all, but it might work.
>>
>> 2. Your patch, but require privilege within the calling namespace to
>> set the prctl.
>
> How about CAP_SYS_ADMIN || no_new_privs?
>
> -Kees
>

Yes I can update as per Andy suggestion to require privileges inside
the calling namespace to set prctl. Other options that are not prctl
based have more variants, that make them hard to use.

So I would got with CAP_SYS_ADMIN in the calling userns ||
no_new_privs , I would have said CAP_SYS_MODULE in the userns but it
seems better to standardize on CAP_SYS_ADMIN to set the prctl.

--
tixxdz

2017-04-24 18:02:53

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Mon, Apr 24, 2017 at 7:25 AM, Djalal Harouni <[email protected]> wrote:
> On Sat, Apr 22, 2017 at 9:29 PM, Kees Cook <[email protected]> wrote:
>> On Fri, Apr 21, 2017 at 11:51 PM, Andy Lutomirski <[email protected]> wrote:
>>> On Fri, Apr 21, 2017 at 5:12 PM, Djalal Harouni <[email protected]> wrote:
>>>> On Sat, Apr 22, 2017 at 1:51 AM, Andy Lutomirski <[email protected]> wrote:
>>>>
> [...]
>>>> * DCCP use after free CVE-2017-6074
>>>> * n_hldc CVE-2017-2636
>>>> * XFRM framework CVE-2017-7184
>>>> * L2TPv3 CVE-2016-10200
>>>>
>>>> Most of these need CAP_NET_ADMIN to be autoloaded, however we also
>>>> need CAP_NET_ADMIN for other things... therefore it is better to have
>>>> an extra facility that could coexist with CAP_NET_ADMIN and other
>>>> sandbox features.
>>>>
>>>
>>> I agree that the feature is important, but I think your implementation
>>> is needlessly dangerous. I imagine that the main uses that you care
>>> about involve containers. How about doing it in a safer way that
>>> works for containers? I can think of a few. For example:
>>>
>>> 1. A sysctl that, if set, prevents autoloading outside the root
>>> userns. This isn't very flexible at all, but it might work.
>>>
>>> 2. Your patch, but require privilege within the calling namespace to
>>> set the prctl.
>>
>> How about CAP_SYS_ADMIN || no_new_privs?
>>
>> -Kees
>>
>
> Yes I can update as per Andy suggestion to require privileges inside
> the calling namespace to set prctl. Other options that are not prctl
> based have more variants, that make them hard to use.
>
> So I would got with CAP_SYS_ADMIN in the calling userns ||
> no_new_privs , I would have said CAP_SYS_MODULE in the userns but it
> seems better to standardize on CAP_SYS_ADMIN to set the prctl.

Andy's concern is that it would provide an escalation from SYS_MODULE
to SYS_ADMIN through some privileged process being tricked through a
missing API provided by modules, so we have to use either SYS_ADMIN ||
nnp.

-Kees

--
Kees Cook
Pixel Security

2017-04-24 18:35:20

by Djalal Harouni

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Mon, Apr 24, 2017 at 8:02 PM, Kees Cook <[email protected]> wrote:
> On Mon, Apr 24, 2017 at 7:25 AM, Djalal Harouni <[email protected]> wrote:
>> On Sat, Apr 22, 2017 at 9:29 PM, Kees Cook <[email protected]> wrote:
>>> On Fri, Apr 21, 2017 at 11:51 PM, Andy Lutomirski <[email protected]> wrote:
>>>> On Fri, Apr 21, 2017 at 5:12 PM, Djalal Harouni <[email protected]> wrote:
>>>>> On Sat, Apr 22, 2017 at 1:51 AM, Andy Lutomirski <[email protected]> wrote:
>>>>>
>> [...]
>>>>> * DCCP use after free CVE-2017-6074
>>>>> * n_hldc CVE-2017-2636
>>>>> * XFRM framework CVE-2017-7184
>>>>> * L2TPv3 CVE-2016-10200
>>>>>
>>>>> Most of these need CAP_NET_ADMIN to be autoloaded, however we also
>>>>> need CAP_NET_ADMIN for other things... therefore it is better to have
>>>>> an extra facility that could coexist with CAP_NET_ADMIN and other
>>>>> sandbox features.
>>>>>
>>>>
>>>> I agree that the feature is important, but I think your implementation
>>>> is needlessly dangerous. I imagine that the main uses that you care
>>>> about involve containers. How about doing it in a safer way that
>>>> works for containers? I can think of a few. For example:
>>>>
>>>> 1. A sysctl that, if set, prevents autoloading outside the root
>>>> userns. This isn't very flexible at all, but it might work.
>>>>
>>>> 2. Your patch, but require privilege within the calling namespace to
>>>> set the prctl.
>>>
>>> How about CAP_SYS_ADMIN || no_new_privs?
>>>
>>> -Kees
>>>
>>
>> Yes I can update as per Andy suggestion to require privileges inside
>> the calling namespace to set prctl. Other options that are not prctl
>> based have more variants, that make them hard to use.
>>
>> So I would got with CAP_SYS_ADMIN in the calling userns ||
>> no_new_privs , I would have said CAP_SYS_MODULE in the userns but it
>> seems better to standardize on CAP_SYS_ADMIN to set the prctl.
>
> Andy's concern is that it would provide an escalation from SYS_MODULE
> to SYS_ADMIN through some privileged process being tricked through a
> missing API provided by modules, so we have to use either SYS_ADMIN ||
> nnp.

Yes, I would say that programs expect that maybe such functionality is
not provided, but we don't know. I will update.

Thanks!


--
tixxdz

2017-04-26 09:25:44

by Djalal Harouni

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

Hi Rusty,

On Mon, Apr 24, 2017 at 6:29 AM, Rusty Russell <[email protected]> wrote:
> Djalal Harouni <[email protected]> writes:
>> When value is (1), task must have CAP_SYS_MODULE to be able to trigger a
>> module auto-load operation, or CAP_NET_ADMIN for modules with a
>> 'netdev-%s' alias.
>
> Sorry, the magic 'netdev-' prefix is a crawling horror. To do this

Yes I agree, that's the not the best part. I added it for backward
compatibility since I did notice that some network daemon retain
CAP_NET_ADMIN for this case. The aim of the patches is to get modules
autoload covered with CAP_SYS_MODULE and make it more like explicit
modules loading.

> properly, you need to hand the capability (if any) from the
> request_module() call. Probably by adding a new request_module_cap and
> making request_module() call that, then fixing up the callers.

Hmm, sorry Rusty I'm a bit confused, when you refer to "callers", you
mean request_module() callers ?

If so, I'm not sure that the best thing here since it may defeat the
purpose of this feature if we allow to pass capabilities.

Right now we have modules autoload that can be triggered without
privileges, or with CAP_SYS_ADMIN, CAP_NET_ADMIN, CAP_SYS_MODULE...
and some caps may allow to load other modules to resolve symbols etc.

The public exploits did target CAP_NET_ADMIN, if we offer a way to
pass in capabilities it will be used it as it is the case now without
it, not to mention that some capabilities are overloaded, exploits
will continue for these ones.

The goal is to narrow modules autoload only to CAP_SYS_MODULE or
disable it completely for process trees. Later you can give
CAP_SYS_MODULE and you are sure that only /lib/modules/ will be
autoloaded, no arbitrary loads by using seccomp filter on module
syscalls, or separate the process in two one with CAP_SYS_MODULE and
the other with its own capabilities and feature use.

I also don't like that 'netdev-%s' but it is for compatibility for
specific cases, maybe there are others that we may have to add. But I
would prefer if we narrow it down to only CAP_SYS_MODULE.

How about I move all permission checks to security/commoncap.c
helpers, the module logic part will only contain set and read sysctl
"modules_autoload" and "task->modules_autoload" ?

I already added cap_kernel_module_request() which is called by
request_module(), so instead of counting on module to do the
permission check I will open code it inside
security/commoncap.c:cap_kernel_module_request() like other capability
helpers and note that CAP_NET_ADMIN 'netdev-%s' alias is only for
compatibility. This way we prevent that other capabilities get exposed
or targeted for exploitation. Do you think that this will work ?

Thanks!

> Cheers,
> Rusty.

--
tixxdz

2017-04-27 03:17:25

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

Djalal Harouni <[email protected]> writes:
> Hi Rusty,
>
> On Mon, Apr 24, 2017 at 6:29 AM, Rusty Russell <[email protected]> wrote:
>> Djalal Harouni <[email protected]> writes:
>>> When value is (1), task must have CAP_SYS_MODULE to be able to trigger a
>>> module auto-load operation, or CAP_NET_ADMIN for modules with a
>>> 'netdev-%s' alias.
>>
>> Sorry, the magic 'netdev-' prefix is a crawling horror. To do this
>
> Yes I agree, that's the not the best part. I added it for backward
> compatibility since I did notice that some network daemon retain
> CAP_NET_ADMIN for this case. The aim of the patches is to get modules
> autoload covered with CAP_SYS_MODULE and make it more like explicit
> modules loading.
>
>> properly, you need to hand the capability (if any) from the
>> request_module() call. Probably by adding a new request_module_cap and
>> making request_module() call that, then fixing up the callers.
>
> Hmm, sorry Rusty I'm a bit confused, when you refer to "callers", you
> mean request_module() callers ?

Yes.

> If so, I'm not sure that the best thing here since it may defeat the
> purpose of this feature if we allow to pass capabilities.
>
> Right now we have modules autoload that can be triggered without
> privileges, or with CAP_SYS_ADMIN, CAP_NET_ADMIN, CAP_SYS_MODULE...
> and some caps may allow to load other modules to resolve symbols etc.
>
> The public exploits did target CAP_NET_ADMIN, if we offer a way to
> pass in capabilities it will be used it as it is the case now without
> it, not to mention that some capabilities are overloaded, exploits
> will continue for these ones.
>
> The goal is to narrow modules autoload only to CAP_SYS_MODULE or
> disable it completely for process trees. Later you can give
> CAP_SYS_MODULE and you are sure that only /lib/modules/ will be
> autoloaded, no arbitrary loads by using seccomp filter on module
> syscalls, or separate the process in two one with CAP_SYS_MODULE and
> the other with its own capabilities and feature use.
>
> I also don't like that 'netdev-%s' but it is for compatibility for
> specific cases, maybe there are others that we may have to add. But I
> would prefer if we narrow it down to only CAP_SYS_MODULE.

There's one place where this is called, net/core/dev_ioctl.c:

if (no_module && capable(CAP_NET_ADMIN))
no_module = request_module("netdev-%s", name);

*That's the place* you want to add the exception, and the cleanest way
is probably to add another arg to __request_module:

(incomplete patch, but you get the idea):

diff --git a/include/linux/kmod.h b/include/linux/kmod.h
index c4e441e00db5..2ea82d5d20af 100644
--- a/include/linux/kmod.h
+++ b/include/linux/kmod.h
@@ -33,15 +33,16 @@ extern char modprobe_path[]; /* for sysctl */
/* modprobe exit status on success, -ve on error. Return value
* usually useless though. */
extern __printf(2, 3)
-int __request_module(bool wait, const char *name, ...);
-#define request_module(mod...) __request_module(true, mod)
-#define request_module_nowait(mod...) __request_module(false, mod)
-#define try_then_request_module(x, mod...) \
- ((x) ?: (__request_module(true, mod), (x)))
+int __request_module(bool wait, int allow_cap, const char *name, ...);
#else
-static inline int request_module(const char *name, ...) { return -ENOSYS; }
-static inline int request_module_nowait(const char *name, ...) { return -ENOSYS; }
-#define try_then_request_module(x, mod...) (x)
+static inline __printf(2,3)
+int __request_module(bool wait, int allow_cap, const char *name, ...)
+{ return -ENOSYS; }
+#endif
+#define request_module(mod...) __request_module(true, -1, mod)
+#define request_module_nowait(mod...) __request_module(false, -1, mod)
+#define try_then_request_module(x, mod...) \
+ ((x) ?: (__request_module(true, -1, mod), (x)))
#endif


diff --git a/include/linux/security.h b/include/linux/security.h
index 96899fad7016..9f1217c7cb23 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -889,9 +890,9 @@ static inline int security_kernel_create_files_as(struct cred *cred,
return 0;
}

-static inline int security_kernel_module_request(char *kmod_name)
+static inline int security_kernel_module_request(char *kmod_name, int allow_cap)
{
- return 0;
+ return cap_kernel_module_request(kmod_name, allow_cap);
}

static inline int security_kernel_read_file(struct file *file,
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 563f97e2be36..b2d2f525c80b 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -110,6 +110,7 @@ static int call_modprobe(char *module_name, int wait)
/**
* __request_module - try to load a kernel module
* @wait: wait (or not) for the operation to complete
+ * @allow_cap: if positive, always allow modprobe if this capability set.
* @fmt: printf style format string for the name of the module
* @...: arguments as specified in the format string
*
@@ -123,7 +124,8 @@ static int call_modprobe(char *module_name, int wait)
* If module auto-loading support is disabled then this function
* becomes a no-operation.
*/
-int __request_module(bool wait, const char *fmt, ...)
+
+int __request_module(bool wait, int allow_cap, const char *fmt, ...)
{
va_list args;
char module_name[MODULE_NAME_LEN];
@@ -150,7 +152,7 @@ int __request_module(bool wait, const char *fmt, ...)
if (ret >= MODULE_NAME_LEN)
return -ENAMETOOLONG;

- ret = security_kernel_module_request(module_name);
+ ret = security_kernel_module_request(module_name, allow_cap);
if (ret)
return ret;

diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index b94b1d293506..e7a7dc28761d 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -367,7 +367,8 @@ void dev_load(struct net *net, const char *name)

no_module = !dev;
if (no_module && capable(CAP_NET_ADMIN))
- no_module = request_module("netdev-%s", name);
+ no_module = __request_module(true, CAP_NET_ADMIN,
+ "netdev-%s", name);
if (no_module && capable(CAP_SYS_MODULE))
request_module("%s", name);
}

Hope that helps,
Rusty.

2017-04-27 13:16:24

by Djalal Harouni

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] modules:capabilities: add a per-task modules autoload restriction

On Thu, Apr 27, 2017 at 4:07 AM, Rusty Russell <[email protected]> wrote:
> Djalal Harouni <[email protected]> writes:
>> Hi Rusty,
>>
>> On Mon, Apr 24, 2017 at 6:29 AM, Rusty Russell <[email protected]> wrote:
>>> Djalal Harouni <[email protected]> writes:
>>>> When value is (1), task must have CAP_SYS_MODULE to be able to trigger a
>>>> module auto-load operation, or CAP_NET_ADMIN for modules with a
>>>> 'netdev-%s' alias.
>>>
>>> Sorry, the magic 'netdev-' prefix is a crawling horror. To do this
>>
>> Yes I agree, that's the not the best part. I added it for backward
>> compatibility since I did notice that some network daemon retain
>> CAP_NET_ADMIN for this case. The aim of the patches is to get modules
>> autoload covered with CAP_SYS_MODULE and make it more like explicit
>> modules loading.
>>
>>> properly, you need to hand the capability (if any) from the
>>> request_module() call. Probably by adding a new request_module_cap and
>>> making request_module() call that, then fixing up the callers.
>>
>> Hmm, sorry Rusty I'm a bit confused, when you refer to "callers", you
>> mean request_module() callers ?
>
> Yes.
>
>> If so, I'm not sure that the best thing here since it may defeat the
>> purpose of this feature if we allow to pass capabilities.
>>
>> Right now we have modules autoload that can be triggered without
>> privileges, or with CAP_SYS_ADMIN, CAP_NET_ADMIN, CAP_SYS_MODULE...
>> and some caps may allow to load other modules to resolve symbols etc.
>>
>> The public exploits did target CAP_NET_ADMIN, if we offer a way to
>> pass in capabilities it will be used it as it is the case now without
>> it, not to mention that some capabilities are overloaded, exploits
>> will continue for these ones.
>>
>> The goal is to narrow modules autoload only to CAP_SYS_MODULE or
>> disable it completely for process trees. Later you can give
>> CAP_SYS_MODULE and you are sure that only /lib/modules/ will be
>> autoloaded, no arbitrary loads by using seccomp filter on module
>> syscalls, or separate the process in two one with CAP_SYS_MODULE and
>> the other with its own capabilities and feature use.
>>
>> I also don't like that 'netdev-%s' but it is for compatibility for
>> specific cases, maybe there are others that we may have to add. But I
>> would prefer if we narrow it down to only CAP_SYS_MODULE.
>
> There's one place where this is called, net/core/dev_ioctl.c:
>
> if (no_module && capable(CAP_NET_ADMIN))
> no_module = request_module("netdev-%s", name);
>
> *That's the place* you want to add the exception, and the cleanest way
> is probably to add another arg to __request_module:

Ok I see. I guess we have to add comments that this is for netdev
only, and other parts should continue to go with standard
request_module().


> (incomplete patch, but you get the idea):
>
> diff --git a/include/linux/kmod.h b/include/linux/kmod.h
> index c4e441e00db5..2ea82d5d20af 100644
> --- a/include/linux/kmod.h
> +++ b/include/linux/kmod.h
> @@ -33,15 +33,16 @@ extern char modprobe_path[]; /* for sysctl */
> /* modprobe exit status on success, -ve on error. Return value
> * usually useless though. */
> extern __printf(2, 3)
> -int __request_module(bool wait, const char *name, ...);
> -#define request_module(mod...) __request_module(true, mod)
> -#define request_module_nowait(mod...) __request_module(false, mod)
> -#define try_then_request_module(x, mod...) \
> - ((x) ?: (__request_module(true, mod), (x)))
> +int __request_module(bool wait, int allow_cap, const char *name, ...);
> #else
> -static inline int request_module(const char *name, ...) { return -ENOSYS; }
> -static inline int request_module_nowait(const char *name, ...) { return -ENOSYS; }
> -#define try_then_request_module(x, mod...) (x)
> +static inline __printf(2,3)
> +int __request_module(bool wait, int allow_cap, const char *name, ...)
> +{ return -ENOSYS; }
> +#endif
> +#define request_module(mod...) __request_module(true, -1, mod)
> +#define request_module_nowait(mod...) __request_module(false, -1, mod)
> +#define try_then_request_module(x, mod...) \
> + ((x) ?: (__request_module(true, -1, mod), (x)))
> #endif
>
>
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 96899fad7016..9f1217c7cb23 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -889,9 +890,9 @@ static inline int security_kernel_create_files_as(struct cred *cred,
> return 0;
> }
>
> -static inline int security_kernel_module_request(char *kmod_name)
> +static inline int security_kernel_module_request(char *kmod_name, int allow_cap)
> {
> - return 0;
> + return cap_kernel_module_request(kmod_name, allow_cap);
> }
>
> static inline int security_kernel_read_file(struct file *file,
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 563f97e2be36..b2d2f525c80b 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -110,6 +110,7 @@ static int call_modprobe(char *module_name, int wait)
> /**
> * __request_module - try to load a kernel module
> * @wait: wait (or not) for the operation to complete
> + * @allow_cap: if positive, always allow modprobe if this capability set.

Alright! and just to make sure that this will be true only if the
sysctl "modules_autoload_mode" or "task->modules_autoload_mode" are
not in 'disabled' mode. If disabled, then capability is ignored and
modules autoload is disabled for all or for the related process tree.
Ok I have to document that.



> * @fmt: printf style format string for the name of the module
> * @...: arguments as specified in the format string
> *
> @@ -123,7 +124,8 @@ static int call_modprobe(char *module_name, int wait)
> * If module auto-loading support is disabled then this function
> * becomes a no-operation.
> */
> -int __request_module(bool wait, const char *fmt, ...)
> +
> +int __request_module(bool wait, int allow_cap, const char *fmt, ...)
> {
> va_list args;
> char module_name[MODULE_NAME_LEN];
> @@ -150,7 +152,7 @@ int __request_module(bool wait, const char *fmt, ...)
> if (ret >= MODULE_NAME_LEN)
> return -ENAMETOOLONG;
>
> - ret = security_kernel_module_request(module_name);
> + ret = security_kernel_module_request(module_name, allow_cap);
> if (ret)
> return ret;
>
> diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
> index b94b1d293506..e7a7dc28761d 100644
> --- a/net/core/dev_ioctl.c
> +++ b/net/core/dev_ioctl.c
> @@ -367,7 +367,8 @@ void dev_load(struct net *net, const char *name)
>
> no_module = !dev;
> if (no_module && capable(CAP_NET_ADMIN))
> - no_module = request_module("netdev-%s", name);
> + no_module = __request_module(true, CAP_NET_ADMIN,
> + "netdev-%s", name);
> if (no_module && capable(CAP_SYS_MODULE))
> request_module("%s", name);
> }
>
> Hope that helps,

Indeed. Thank you!

So I'll wait for more feedback maybe other maintainers want to comment
on this bit, I'll re-send after the merge window.


> Rusty.

Thanks!


--
tixxdz