Hello,
I'm in the process of writing a Smack namespace that is based on LSM
namespace hooks, which I'm implementing as well. The work is almost
finished and the patches are currently undergoing an internal review.
The Smack namespace was designed in collaboration with the Smack
maintainer, Casey Schaufler.
Meanwhile I'd like to request some comments on the LSM hooks patch
I have here. I realize that it may be difficult to evaluate this
without a live usage example (which the Smack namespace will serve as),
but at this point any comments would be great.
The Smack namespace has been successfully implemented using these hooks
and, to give it some context, I paste here preliminary kernel
documentation on what the Smack namespace wants to achieve.
The LSM hooks themselves are documented in the security.h file inside
the patch.
======================================================================
=== What is Smack namespace ===
Smack namespace was developed to make it possible for Smack to work
nicely with Linux containers where there is a full operating system
with its own init inside the namespace. Such a system working with
Smack expects to have an at least partially working CAP_MAC_ADMIN to be
able to change the labels of processes and files. This is required to be
able to securely start applications under the control of Smack and
manage their access rights.
It was implemented using LSM namespace hooks that were developed
together with Smack namespace.
=== Design ideas ===
"Smack namespace" is rather "Smack labels namespace" as not the whole
MAC is namespaced, only the labels. There is a great analogy between
Smack labels namespace and the user namespace part that remaps UIDs.
The idea is to create a map of labels for a namespace so the namespace
is only allowed to use those labels. Smack rules are always the same
as in the init namespace (limited only by what labels are mapped) and
cannot be manipulated from the child namespace. The map is actually
only for labels' names. The underlying structures for labels remain
the same. The filesystem also stores the "unmapped" labels from the
init namespace.
Let's say we have those labels in the init namespace:
label1
label2
label3
and those rules:
label1 label2 rwx
label1 label3 rwx
label2 label3 rwx
We create a map for a namespace:
label1 -> mapped1
label2 -> mapped2
This means that 'label3' is completely invisible in the namespace, as if
it didn't exist. All the rules that include it are ignored.
Effectively in the namespace we have only one rule:
mapped1 mapped2 rwx
Which in reality is:
label1 label2 rwx
All requests to access an object labeled 'label3' will be denied. If it
ever comes to a situation where 'label3' would have to be printed
(e.g. reading an exec or mmap label from a file to which we have
access) then the huh sign '?' will be printed instead.
All the operations in the namespace on the remaining labels will have
to be performed using their mapped names, e.g. changing the process's
own label or changing a filesystem label. Labels will also be
printed with their mapped names.
You cannot import new labels in a namespace. Every operation that
would do so in the init namespace will return an error in the child
namespace. You cannot assign an unmapped or nonexistent label to an
object. You can only operate on labels that have been explicitly
mapped.
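To make the mapping rules above more concrete, here is a minimal
user-space sketch in C. It is not the kernel implementation: the names
ns_label_map, ns_label_lookup(), ns_label_print() and ns_rule_visible()
are hypothetical and exist only for this illustration. It shows that an
unmapped label is printed as '?' and that a rule is effective inside
the namespace only when both of its labels are mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration only, not the kernel structures. */
struct ns_label_pair {
	const char *unmapped;	/* label name in the init namespace */
	const char *mapped;	/* label name visible inside the namespace */
};

struct ns_label_map {
	const struct ns_label_pair *pairs;
	size_t count;
};

/* Return the mapped name of an init namespace label, or NULL if unmapped. */
static const char *ns_label_lookup(const struct ns_label_map *map,
				   const char *unmapped)
{
	size_t i;

	assert(map != NULL && unmapped != NULL);
	for (i = 0; i < map->count; i++)
		if (strcmp(map->pairs[i].unmapped, unmapped) == 0)
			return map->pairs[i].mapped;
	return NULL;
}

/* Name to print inside the namespace: unmapped labels show up as "?". */
static const char *ns_label_print(const struct ns_label_map *map,
				  const char *unmapped)
{
	const char *mapped = ns_label_lookup(map, unmapped);

	return mapped ? mapped : "?";
}

/* A rule is effective in the namespace only if both labels are mapped. */
static int ns_rule_visible(const struct ns_label_map *map,
			   const char *subject, const char *object)
{
	return ns_label_lookup(map, subject) != NULL &&
	       ns_label_lookup(map, object) != NULL;
}
```

With the example map (label1 -> mapped1, label2 -> mapped2),
ns_rule_visible() accepts the label1/label2 rule and rejects both rules
that involve label3.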
=== Capabilities ===
Smack related capabilities work to some extent. In several places
where capabilities are checked, compatibility with the Smack namespace
has been introduced. Capabilities are of course also limited to operate
only on mapped labels.
CAP_MAC_OVERRIDE works fully: it allows you to ignore Smack access
rules, but only between objects that have mapped labels. So in the
example above having this capability will allow e.g. label2 to write to
label1, but will not allow any access to label3.
With CAP_MAC_ADMIN the following operations have been allowed inside
the namespace:
- setting and removing xattr on files, including the security ones
- setting process's own label (/proc/self/attr/current)
- mounting in a privileged Smack mode, which means one can specify
additional mount options like: smackfsdef, smackfsfloor etc.
Again this is also allowed only on the mapped labels. Labels on the
filesystem will be stored in unmapped form so they are preserved
through reboots.
Such a namespace construct allows e.g. systemd (that supports Smack)
working in a container to assign labels properly to daemons and other
processes.
=== Usage ===
Smack namespace is written using LSM hooks. It's a normal namespace
that behaves similarly to all the others existing right now.
You can create a new Smack/LSM namespace using e.g. unshare(). The
labels' map is in the file /proc/$PID/attr_map. By default it's empty,
so it has to be filled before any other operation is performed (no
mapped labels means no labels inside the namespace, which means that no
operation will be permitted).
Due to the way Smack works, only a process with CAP_MAC_ADMIN in the
init_user_ns is allowed to fill the map. That means that for now an
unprivileged user is in theory allowed to create the namespace, but it
will not allow any operation inside. An administrator's intervention
to fill the labels' map is required. A possibility similar to the user
namespace, where a process could at least remap its own label, will be
re-investigated later on.
The attr_map write format is:
unmapped_label mapped_label
Reading the file shows the current map of the namespace that the
process in question is in, in the format:
unmapped_label -> mapped_label
Writing to the map file is not disabled after the first write as it is
in the user namespace. For Smack we have no means to map ranges of
labels, hence it can really be advantageous to be able to expand the map
later on. But you can only add to the map. You cannot remove already
mapped labels. You cannot change the already existing mappings. Also,
mappings have to be 1-1. Any attempt to create a mapping where either
the unmapped or the mapped label already exists in the map will be denied.
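The append-only, 1-1 behaviour described above can be sketched in
user-space C roughly as follows. This is only an illustration under
stated assumptions: struct attr_map, attr_map_add(), the size limits
and the specific error codes (-EEXIST, -EINVAL, -ENOSPC) are all
hypothetical and are not taken from the Smack namespace patches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define ATTR_MAP_MAX	16	/* arbitrary limit for this sketch */
#define ATTR_LABEL_LEN	64	/* arbitrary buffer size for this sketch */

/* Hypothetical types for illustration only. */
struct attr_map_pair {
	char unmapped[ATTR_LABEL_LEN];
	char mapped[ATTR_LABEL_LEN];
};

struct attr_map {
	struct attr_map_pair pairs[ATTR_MAP_MAX];
	size_t count;
};

/*
 * Add one "unmapped_label mapped_label" pair. Entries can only be
 * appended; there is deliberately no remove or update operation.
 */
static int attr_map_add(struct attr_map *map, const char *unmapped,
			const char *mapped)
{
	size_t i;

	assert(map != NULL && unmapped != NULL && mapped != NULL);
	if (strlen(unmapped) >= ATTR_LABEL_LEN ||
	    strlen(mapped) >= ATTR_LABEL_LEN)
		return -EINVAL;

	/* Keep the map 1-1: neither side may already be present. */
	for (i = 0; i < map->count; i++) {
		if (strcmp(map->pairs[i].unmapped, unmapped) == 0 ||
		    strcmp(map->pairs[i].mapped, mapped) == 0)
			return -EEXIST;
	}
	if (map->count == ATTR_MAP_MAX)
		return -ENOSPC;

	/* Lengths were checked above, so strcpy() cannot overflow. */
	strcpy(map->pairs[map->count].unmapped, unmapped);
	strcpy(map->pairs[map->count].mapped, mapped);
	map->count++;
	return 0;
}
```

Adding "label1 mapped1" twice, or adding "label3 mapped1" after it,
fails in this sketch, while adding further unrelated pairs later on
keeps working.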
setns is also allowed, but the label of a process that is calling
setns has to be already mapped in the target Smack namespace for the
call to succeed.
=== Special labels ===
Smack uses some special labels that have built-in rules, such as
floor '_', hat '^', star '*', etc. Those labels are not
automatically mapped to the namespace. Moreover, you can choose to map
a different label from the init namespace to behave e.g. like floor
inside the namespace.
Let's say we have no rules and those labels in the init namespace:
_
floor_to_be
label
Both label and floor_to_be can read objects with _. But they have no
access rights to each other.
Now let's create a map like this:
_ ordinary_label
floor_to_be _
label mapped
Right now label 'mapped' can read label '_', which means that
effectively inside this namespace label 'label' has gained read access
to 'floor_to_be'. The label 'ordinary_label' is exactly that, an
ordinary label that the built-in rules no longer apply to inside the
namespace.
To sum up, special labels in the namespace behave the same as in the
init namespace. Not the original special labels though, but the ones
we map to specials. This is the only case where a namespace can have
access rights the init namespace does not have (like 'label' to
'floor_to_be' in the example above).
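The special label example can be modelled with a small user-space
sketch in C. It is deliberately simplified and the names special_pair,
special_lookup() and ns_may_read() are hypothetical. It assumes no
explicit rules are loaded and models only two built-in behaviours: a
subject may read an object with its own label, and any subject may read
an object labeled floor '_'. The other special labels are ignored. The
point is that the built-in rule is applied to the mapped names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type for illustration only. */
struct special_pair {
	const char *unmapped;	/* label name in the init namespace */
	const char *mapped;	/* label name inside the namespace */
};

/* Return the mapped name of an init namespace label, or NULL if unmapped. */
static const char *special_lookup(const struct special_pair *map,
				  size_t count, const char *unmapped)
{
	size_t i;

	assert(map != NULL && unmapped != NULL);
	for (i = 0; i < count; i++)
		if (strcmp(map[i].unmapped, unmapped) == 0)
			return map[i].mapped;
	return NULL;
}

/*
 * Read check as seen from inside the namespace. Both arguments are
 * init namespace (unmapped) label names.
 */
static int ns_may_read(const struct special_pair *map, size_t count,
		       const char *subject, const char *object)
{
	const char *subj = special_lookup(map, count, subject);
	const char *obj = special_lookup(map, count, object);

	if (subj == NULL || obj == NULL)
		return 0;	/* unmapped labels are invisible */
	if (strcmp(subj, obj) == 0)
		return 1;	/* same label */
	return strcmp(obj, "_") == 0;	/* floor rule on the mapped name */
}
```

With the map from the example, 'label' may read 'floor_to_be' (which
is '_' inside the namespace) but may no longer read the original '_'
(which is just 'ordinary_label' there).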
=== Current limitations ===
The Smack namespace is not hierarchical yet. It is possible to create
a Smack namespace within a Smack namespace, but the creating
namespace will be denied the right to fill the map. Only CAP_MAC_ADMIN
in the init_user_ns can do that now. When hierarchy is implemented,
the process creating another namespace will be allowed to map only
labels that it has access to itself (those that it has in its own
map).
Special files inside the virtual smackfs need to be reviewed to see
whether it's beneficial to have some of their functionality namespaced
as well (e.g. onlycap, syslog, ambient, etc.). This would increase
CAP_MAC_ADMIN privileges inside the namespace.
Lukasz Pawelczyk (1):
lsm: namespace hooks
fs/proc/namespaces.c | 4 ++
include/linux/lsm_namespace.h | 68 +++++++++++++++++++
include/linux/nsproxy.h | 2 +
include/linux/proc_ns.h | 2 +
include/linux/security.h | 80 +++++++++++++++++++++++
include/uapi/linux/sched.h | 3 +-
kernel/fork.c | 2 +-
kernel/nsproxy.c | 22 ++++++-
security/Kconfig | 8 +++
security/Makefile | 1 +
security/capability.c | 33 ++++++++++
security/lsm_namespace.c | 147 ++++++++++++++++++++++++++++++++++++++++++
security/security.c | 42 ++++++++++++
13 files changed, 408 insertions(+), 6 deletions(-)
create mode 100644 include/linux/lsm_namespace.h
create mode 100644 security/lsm_namespace.c
--
1.9.3
This commit implements an empty LSM namespace that provides 5 hooks for
LSM modules to implement. Using those, an LSM module can implement its
own namespace. The first one to take advantage of this mechanism is
Smack.
See the comments in security.h below for information about the specific
hooks.
Signed-off-by: Lukasz Pawelczyk <[email protected]>
---
fs/proc/namespaces.c | 4 ++
include/linux/lsm_namespace.h | 68 +++++++++++++++++++
include/linux/nsproxy.h | 2 +
include/linux/proc_ns.h | 2 +
include/linux/security.h | 80 +++++++++++++++++++++++
include/uapi/linux/sched.h | 3 +-
kernel/fork.c | 2 +-
kernel/nsproxy.c | 22 ++++++-
security/Kconfig | 8 +++
security/Makefile | 1 +
security/capability.c | 33 ++++++++++
security/lsm_namespace.c | 147 ++++++++++++++++++++++++++++++++++++++++++
security/security.c | 42 ++++++++++++
13 files changed, 408 insertions(+), 6 deletions(-)
create mode 100644 include/linux/lsm_namespace.h
create mode 100644 security/lsm_namespace.c
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index 8902609..266e8d8 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -12,10 +12,14 @@
#include <linux/ipc_namespace.h>
#include <linux/pid_namespace.h>
#include <linux/user_namespace.h>
+#include <linux/lsm_namespace.h>
#include "internal.h"
static const struct proc_ns_operations *ns_entries[] = {
+#ifdef CONFIG_SECURITY_NS
+ &lsmns_operations,
+#endif
#ifdef CONFIG_NET_NS
&netns_operations,
#endif
diff --git a/include/linux/lsm_namespace.h b/include/linux/lsm_namespace.h
new file mode 100644
index 0000000..7caf3db
--- /dev/null
+++ b/include/linux/lsm_namespace.h
@@ -0,0 +1,68 @@
+/*
+ * Copyright (C) 2014 Samsung Electronics.
+ *
+ * LSM namespaces
+ *
+ * Author(s):
+ * Lukasz Pawelczyk <[email protected]>
+ *
+ * This program is free software, you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _LINUX_LSM_S_H
+#define _LINUX_LSM_S_H
+
+#include <linux/sched.h>
+#include <linux/kref.h>
+
+struct lsm_namespace {
+ struct kref kref;
+ struct user_namespace *user_ns;
+ unsigned int proc_inum;
+ /* for LSM usage */
+ void *private;
+};
+
+extern struct lsm_namespace init_lsm_ns;
+
+#ifdef CONFIG_SECURITY_NS
+
+struct lsm_namespace *copy_lsm_ns(unsigned long flags,
+ struct user_namespace *user_ns,
+ struct lsm_namespace *ns);
+void free_lsm_ns(struct kref *kref);
+
+static inline struct lsm_namespace *get_lsm_ns(struct lsm_namespace *ns)
+{
+ kref_get(&ns->kref);
+ return ns;
+}
+
+static inline void put_lsm_ns(struct lsm_namespace *ns)
+{
+ kref_put(&ns->kref, free_lsm_ns);
+}
+
+#else /* CONFIG_SECURITY_NS */
+
+static inline struct lsm_namespace *copy_lsm_ns(unsigned long flags,
+ struct user_namespace *user_ns,
+ struct lsm_namespace *ns)
+{
+ return ns;
+}
+
+static inline struct lsm_namespace *get_lsm_ns(struct lsm_namespace *ns)
+{
+ return ns;
+}
+
+static inline void put_lsm_ns(struct lsm_namespace *ns)
+{
+}
+
+#endif /* CONFIG_SECURITY_NS */
+
+#endif /* _LINUX_LSM_S_H */
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index b4ec59d..58c7de1 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -8,6 +8,7 @@ struct mnt_namespace;
struct uts_namespace;
struct ipc_namespace;
struct pid_namespace;
+struct lsm_namespace;
struct fs_struct;
/*
@@ -33,6 +34,7 @@ struct nsproxy {
struct mnt_namespace *mnt_ns;
struct pid_namespace *pid_ns_for_children;
struct net *net_ns;
+ struct lsm_namespace *lsm_ns;
};
extern struct nsproxy init_nsproxy;
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index 34a1e10..b5eaa50 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -27,6 +27,7 @@ extern const struct proc_ns_operations ipcns_operations;
extern const struct proc_ns_operations pidns_operations;
extern const struct proc_ns_operations userns_operations;
extern const struct proc_ns_operations mntns_operations;
+extern const struct proc_ns_operations lsmns_operations;
/*
* We always define these enumerators
@@ -37,6 +38,7 @@ enum {
PROC_UTS_INIT_INO = 0xEFFFFFFEU,
PROC_USER_INIT_INO = 0xEFFFFFFDU,
PROC_PID_INIT_INO = 0xEFFFFFFCU,
+ PROC_LSM_INIT_INO = 0xEFFFFFFBU,
};
#ifdef CONFIG_PROC_FS
diff --git a/include/linux/security.h b/include/linux/security.h
index 3b3aeb1..82a6ece 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -144,6 +144,7 @@ extern unsigned long dac_mmap_min_addr;
/* forward declares to avoid warnings */
struct sched_param;
struct request_sock;
+struct lsm_namespace;
/* bprm->unsafe reasons */
#define LSM_UNSAFE_SHARE 1
@@ -1400,6 +1401,37 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
* audit_rule_init.
* @rule contains the allocated rule
*
+ * @lsmns_init:
+ * Initialize initial LSM namespace. This function should allocate and
+ * fill the private part of the init_lsm_ns.
+ *
+ * @lsmns_allow:
+ * Run during a request to create/copy new lsm namespace while creating
+ * new nsproxy structure. Returning failure here will block the whole
+ * operation requested from userspace (e.g. unshare() or clone()).
+ * This function can be effectively used to disallow new namespaces
+ * creation.
+ * @flags contains flags passed to the userspace syscall (e.g. CLONE_*)
+ * @user_ns points to the associated user namespace
+ * @old_ns points to the lsm namespace under which the operation happens
+ * Return 0 to allow or -ERRNO to disallow.
+ *
+ * @lsmns_setns:
+ * Run during a setns syscall to add a process to an already existing
+ * lsm namespace. Returning failure here will block the operation
+ * requested from userspace (setns() with CLONE_NEWLSM).
+ * @nsproxy contains a nsproxy to which the lsm namespace will be assigned.
+ * @ns contains lsm namespace that is to be incorporated to the nsproxy.
+ *
+ * @lsmns_create:
+ * Allocates and fills the private part of a new lsm namespace.
+ * @ns points to a newly created lsm namespace.
+ * Return 0 on success or -ERRNO on error.
+ *
+ * @lsmns_free:
+ * Deallocates the private part of an lsm namespace.
+ * @ns points to a lsm namespace about to be destroyed.
+ *
* @inode_notifysecctx:
* Notify the security module of what the security context of an inode
* should be. Initializes the incore security context managed by the
@@ -1729,6 +1761,15 @@ struct security_operations {
struct audit_context *actx);
void (*audit_rule_free) (void *lsmrule);
#endif /* CONFIG_AUDIT */
+
+#ifdef CONFIG_SECURITY_NS
+ void (*lsmns_init)(void);
+ int (*lsmns_allow)(unsigned long flags, struct user_namespace *user_ns,
+ struct lsm_namespace *old_ns);
+ int (*lsmns_setns)(struct nsproxy *nsproxy, struct lsm_namespace *ns);
+ int (*lsmns_create)(struct lsm_namespace *ns);
+ void (*lsmns_free)(struct lsm_namespace *ns);
+#endif /* CONFIG_SECURITY_NS */
};
/* prototypes */
@@ -3116,6 +3157,45 @@ static inline void security_audit_rule_free(void *lsmrule)
#endif /* CONFIG_SECURITY */
#endif /* CONFIG_AUDIT */
+#ifdef CONFIG_SECURITY_NS
+
+void security_lsmns_init(void);
+int security_lsmns_allow(unsigned long flags, struct user_namespace *user_ns,
+ struct lsm_namespace *old_ns);
+int security_lsmns_setns(struct nsproxy *nsproxy, struct lsm_namespace *ns);
+int security_lsmns_create(struct lsm_namespace *ns);
+void security_lsmns_free(struct lsm_namespace *ns);
+
+#else /* CONFIG_SECURITY_NS */
+
+static inline void security_lsmns_init(void)
+{
+}
+
+static inline int security_lsmns_allow(unsigned long flags,
+ struct user_namespace *user_ns,
+ struct lsm_namespace *old_ns)
+{
+ return 0;
+}
+
+static inline int security_lsmns_setns(struct nsproxy *nsproxy,
+ struct lsm_namespace *ns)
+{
+ return 0;
+}
+
+static inline int security_lsmns_create(struct lsm_namespace *ns)
+{
+ return 0;
+}
+
+static inline void security_lsmns_free(struct lsm_namespace *ns)
+{
+}
+
+#endif /* CONFIG_SECURITY_NS */
+
#ifdef CONFIG_SECURITYFS
extern struct dentry *securityfs_create_file(const char *name, umode_t mode,
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 34f9d73..5ac7fb9 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -21,8 +21,7 @@
#define CLONE_DETACHED 0x00400000 /* Unused, ignored */
#define CLONE_UNTRACED 0x00800000 /* set if the tracing process can't force CLONE_PTRACE on this clone */
#define CLONE_CHILD_SETTID 0x01000000 /* set the TID in the child */
-/* 0x02000000 was previously the unused CLONE_STOPPED (Start in stopped state)
- and is now available for re-use. */
+#define CLONE_NEWLSM 0x02000000 /* New LSM namespace */
#define CLONE_NEWUTS 0x04000000 /* New utsname group? */
#define CLONE_NEWIPC 0x08000000 /* New ipcs */
#define CLONE_NEWUSER 0x10000000 /* New user namespace */
diff --git a/kernel/fork.c b/kernel/fork.c
index ed4bc33..a5ac809 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1796,7 +1796,7 @@ static int check_unshare_flags(unsigned long unshare_flags)
if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET|
- CLONE_NEWUSER|CLONE_NEWPID))
+ CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWLSM))
return -EINVAL;
/*
* Not implemented, but pretend it works if there is nothing to
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 8e78110..841228d 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -25,6 +25,7 @@
#include <linux/proc_ns.h>
#include <linux/file.h>
#include <linux/syscalls.h>
+#include <linux/lsm_namespace.h>
static struct kmem_cache *nsproxy_cachep;
@@ -39,6 +40,9 @@ struct nsproxy init_nsproxy = {
#ifdef CONFIG_NET
.net_ns = &init_net,
#endif
+#ifdef CONFIG_SECURITY
+ .lsm_ns = &init_lsm_ns,
+#endif
};
static inline struct nsproxy *create_nsproxy(void)
@@ -67,6 +71,12 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
if (!new_nsp)
return ERR_PTR(-ENOMEM);
+ new_nsp->lsm_ns = copy_lsm_ns(flags, user_ns, tsk->nsproxy->lsm_ns);
+ if (IS_ERR(new_nsp->lsm_ns)) {
+ err = PTR_ERR(new_nsp->lsm_ns);
+ goto out_lsm;
+ }
+
new_nsp->mnt_ns = copy_mnt_ns(flags, tsk->nsproxy->mnt_ns, user_ns, new_fs);
if (IS_ERR(new_nsp->mnt_ns)) {
err = PTR_ERR(new_nsp->mnt_ns);
@@ -113,6 +123,9 @@ out_uts:
if (new_nsp->mnt_ns)
put_mnt_ns(new_nsp->mnt_ns);
out_ns:
+ if (new_nsp->lsm_ns)
+ put_lsm_ns(new_nsp->lsm_ns);
+out_lsm:
kmem_cache_free(nsproxy_cachep, new_nsp);
return ERR_PTR(err);
}
@@ -128,7 +141,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
struct nsproxy *new_ns;
if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
- CLONE_NEWPID | CLONE_NEWNET)))) {
+ CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWLSM)))) {
get_nsproxy(old_ns);
return 0;
}
@@ -157,6 +170,8 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
void free_nsproxy(struct nsproxy *ns)
{
+ if (ns->lsm_ns)
+ put_lsm_ns(ns->lsm_ns);
if (ns->mnt_ns)
put_mnt_ns(ns->mnt_ns);
if (ns->uts_ns)
@@ -165,7 +180,8 @@ void free_nsproxy(struct nsproxy *ns)
put_ipc_ns(ns->ipc_ns);
if (ns->pid_ns_for_children)
put_pid_ns(ns->pid_ns_for_children);
- put_net(ns->net_ns);
+ if (ns->net_ns)
+ put_net(ns->net_ns);
kmem_cache_free(nsproxy_cachep, ns);
}
@@ -180,7 +196,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
int err = 0;
if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
- CLONE_NEWNET | CLONE_NEWPID)))
+ CLONE_NEWNET | CLONE_NEWPID | CLONE_NEWLSM)))
return 0;
user_ns = new_cred ? new_cred->user_ns : current_user_ns();
diff --git a/security/Kconfig b/security/Kconfig
index beb86b5..7b2118b 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -70,6 +70,14 @@ config SECURITY_PATH
implement pathname based access controls.
If you are unsure how to answer this question, answer N.
+config SECURITY_NS
+ bool "Security namespace for LSM modules"
+ depends on SECURITY
+ help
+ This enables security namespaces for Linux Security Modules.
+ The implementation is specific to currently selected LSM module.
+ If you are unsure how to answer this question, answer N.
+
config INTEL_TXT
bool "Enable Intel(R) Trusted Execution Technology (Intel(R) TXT)"
depends on HAVE_INTEL_TXT
diff --git a/security/Makefile b/security/Makefile
index 05f1c93..f333265 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -23,6 +23,7 @@ obj-$(CONFIG_SECURITY_TOMOYO) += tomoyo/
obj-$(CONFIG_SECURITY_APPARMOR) += apparmor/
obj-$(CONFIG_SECURITY_YAMA) += yama/
obj-$(CONFIG_CGROUP_DEVICE) += device_cgroup.o
+obj-$(CONFIG_SECURITY_NS) += lsm_namespace.o
# Object integrity file lists
subdir-$(CONFIG_INTEGRITY) += integrity
diff --git a/security/capability.c b/security/capability.c
index a74fde6..e043cc9 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -919,6 +919,32 @@ static void cap_audit_rule_free(void *lsmrule)
}
#endif /* CONFIG_AUDIT */
+#ifdef CONFIG_SECURITY_NS
+void cap_lsmns_init(void)
+{
+}
+
+int cap_lsmns_allow(unsigned long flags, struct user_namespace *user_ns,
+ struct lsm_namespace *old_ns)
+{
+ return 0;
+}
+
+int cap_lsmns_setns(struct nsproxy *nsproxy, struct lsm_namespace *ns)
+{
+ return 0;
+}
+
+int cap_lsmns_create(struct lsm_namespace *ns)
+{
+ return 0;
+}
+
+void cap_lsmns_free(struct lsm_namespace *ns)
+{
+}
+#endif /* CONFIG_SECURITY_NS */
+
#define set_to_cap_if_null(ops, function) \
do { \
if (!ops->function) { \
@@ -1134,4 +1160,11 @@ void __init security_fixup_ops(struct security_operations *ops)
set_to_cap_if_null(ops, audit_rule_match);
set_to_cap_if_null(ops, audit_rule_free);
#endif
+#ifdef CONFIG_SECURITY_NS
+ set_to_cap_if_null(ops, lsmns_init);
+ set_to_cap_if_null(ops, lsmns_allow);
+ set_to_cap_if_null(ops, lsmns_setns);
+ set_to_cap_if_null(ops, lsmns_create);
+ set_to_cap_if_null(ops, lsmns_free);
+#endif
}
diff --git a/security/lsm_namespace.c b/security/lsm_namespace.c
new file mode 100644
index 0000000..8f7ec3b
--- /dev/null
+++ b/security/lsm_namespace.c
@@ -0,0 +1,147 @@
+/*
+ * Copyright (C) 2014 Samsung Electronics.
+ *
+ * LSM namespaces
+ *
+ * Author(s):
+ * Lukasz Pawelczyk <[email protected]>
+ *
+ * This program is free software, you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/lsm_namespace.h>
+#include <linux/user_namespace.h>
+#include <linux/nsproxy.h>
+#include <linux/proc_ns.h>
+#include <linux/slab.h>
+#include <linux/security.h>
+
+
+/* private functions */
+
+static struct lsm_namespace *create_lsm_ns(struct user_namespace *user_ns,
+ struct lsm_namespace *old_ns)
+{
+ struct lsm_namespace *ns;
+ int err;
+
+ ns = kmalloc(sizeof(struct lsm_namespace), GFP_KERNEL);
+ if (ns == NULL)
+ return ERR_PTR(-ENOMEM);
+
+ err = proc_alloc_inum(&ns->proc_inum);
+ if (err)
+ goto freeout;
+
+ kref_init(&ns->kref);
+ ns->user_ns = get_user_ns(user_ns);
+ ns->private = NULL;
+
+ err = security_lsmns_create(ns);
+ if (err) {
+ put_user_ns(user_ns);
+ goto freeout;
+ }
+
+ return ns;
+
+freeout:
+ kfree(ns);
+ return ERR_PTR(err);
+}
+
+/* public functions */
+
+struct lsm_namespace *copy_lsm_ns(unsigned long flags,
+ struct user_namespace *user_ns,
+ struct lsm_namespace *ns)
+{
+ int err;
+
+ err = security_lsmns_allow(flags, user_ns, ns);
+ if (err)
+ return ERR_PTR(err);
+
+ if (!(flags & CLONE_NEWLSM))
+ return get_lsm_ns(ns);
+ return create_lsm_ns(user_ns, ns);
+}
+
+void free_lsm_ns(struct kref *kref)
+{
+ struct lsm_namespace *ns;
+
+ ns = container_of(kref, struct lsm_namespace, kref);
+
+ security_lsmns_free(ns);
+
+ put_user_ns(ns->user_ns);
+ proc_free_inum(ns->proc_inum);
+ kfree(ns);
+}
+
+/* proc_ns_operations hooks */
+
+static void *lsmns_get(struct task_struct *task)
+{
+ struct lsm_namespace *ns = NULL;
+ struct nsproxy *nsproxy;
+
+ task_lock(task);
+ nsproxy = task->nsproxy;
+ if (nsproxy)
+ ns = get_lsm_ns(nsproxy->lsm_ns);
+ task_unlock(task);
+
+ return ns;
+}
+
+static void lsmns_put(void *ns)
+{
+ put_lsm_ns(ns);
+}
+
+static int lsmns_install(struct nsproxy *nsproxy, void *ns)
+{
+ struct lsm_namespace *new = ns;
+ int err;
+
+ if (!ns_capable(new->user_ns, CAP_SYS_ADMIN) ||
+ !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
+ return -EPERM;
+
+ err = security_lsmns_setns(nsproxy, new);
+ if (err)
+ return err;
+
+ put_lsm_ns(nsproxy->lsm_ns);
+ nsproxy->lsm_ns = get_lsm_ns(new);
+
+ return 0;
+}
+
+static unsigned int lsmns_inum(void *ns)
+{
+ struct lsm_namespace *lsm_ns = ns;
+
+ return lsm_ns->proc_inum;
+}
+
+const struct proc_ns_operations lsmns_operations = {
+ .name = "lsm",
+ .type = CLONE_NEWLSM,
+ .get = lsmns_get,
+ .put = lsmns_put,
+ .install = lsmns_install,
+ .inum = lsmns_inum,
+};
+
+static __init int lsm_namespaces_init(void)
+{
+ security_lsmns_init();
+
+ return 0;
+}
+subsys_initcall(lsm_namespaces_init);
diff --git a/security/security.c b/security/security.c
index e41b1a8..200365f 100644
--- a/security/security.c
+++ b/security/security.c
@@ -25,10 +25,22 @@
#include <linux/mount.h>
#include <linux/personality.h>
#include <linux/backing-dev.h>
+#include <linux/lsm_namespace.h>
+#include <linux/proc_ns.h>
#include <net/flow.h>
#define MAX_LSM_EVM_XATTR 2
+struct lsm_namespace init_lsm_ns = {
+ .kref = {
+ .refcount = ATOMIC_INIT(1),
+ },
+ .user_ns = &init_user_ns,
+ .proc_inum = PROC_LSM_INIT_INO,
+ /* module specific and to be initialized in lsmns_init() */
+ .private = NULL,
+};
+
/* Boot-time LSM user choice */
static __initdata char chosen_lsm[SECURITY_NAME_MAX + 1] =
CONFIG_DEFAULT_SECURITY;
@@ -1472,3 +1484,33 @@ int security_audit_rule_match(u32 secid, u32 field, u32 op, void *lsmrule,
}
#endif /* CONFIG_AUDIT */
+
+#ifdef CONFIG_SECURITY_NS
+
+void security_lsmns_init(void)
+{
+ security_ops->lsmns_init();
+}
+
+int security_lsmns_allow(unsigned long flags, struct user_namespace *user_ns,
+ struct lsm_namespace *old_ns)
+{
+ return security_ops->lsmns_allow(flags, user_ns, old_ns);
+}
+
+int security_lsmns_setns(struct nsproxy *nsproxy, struct lsm_namespace *ns)
+{
+ return security_ops->lsmns_setns(nsproxy, ns);
+}
+
+int security_lsmns_create(struct lsm_namespace *ns)
+{
+ return security_ops->lsmns_create(ns);
+}
+
+void security_lsmns_free(struct lsm_namespace *ns)
+{
+ security_ops->lsmns_free(ns);
+}
+
+#endif /* CONFIG_SECURITY_NS */
--
1.9.3
On Thu, Nov 27, 2014 at 3:01 PM, Lukasz Pawelczyk
<[email protected]> wrote:
> extern struct dentry *securityfs_create_file(const char *name, umode_t mode,
> diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> index 34f9d73..5ac7fb9 100644
> --- a/include/uapi/linux/sched.h
> +++ b/include/uapi/linux/sched.h
> @@ -21,8 +21,7 @@
> #define CLONE_DETACHED 0x00400000 /* Unused, ignored */
> #define CLONE_UNTRACED 0x00800000 /* set if the tracing process can't force CLONE_PTRACE on this clone */
> #define CLONE_CHILD_SETTID 0x01000000 /* set the TID in the child */
> -/* 0x02000000 was previously the unused CLONE_STOPPED (Start in stopped state)
> - and is now available for re-use. */
> +#define CLONE_NEWLSM 0x02000000 /* New LSM namespace */
FYI, CLONE_NEWCGROUP also claims the last flag [1].
It looks like we will get more and more namespaces, more than clone() can handle.
[1] https://lkml.org/lkml/2014/7/17/588
--
Thanks,
//richard
On Thu, 2014-11-27 at 15:18 +0100, Richard Weinberger wrote:
> On Thu, Nov 27, 2014 at 3:01 PM, Lukasz Pawelczyk
> <[email protected]> wrote:
> > -/* 0x02000000 was previously the unused CLONE_STOPPED (Start in stopped state)
> > - and is now available for re-use. */
> > +#define CLONE_NEWLSM 0x02000000 /* New LSM namespace */
>
> FYI, CLONE_NEWCGROUP also claims last flag [1].
Yes, I'm perfectly aware of that. I've seen those patches.
This is an RFC for now and the cgroup namespace is not merged yet. I'll
rebase when the time comes.
> As it looks we will get more and more namespaces, more than clone() can handle.
>
> [1] https://lkml.org/lkml/2014/7/17/588
>
--
Lukasz Pawelczyk
Samsung R&D Institute Poland
Samsung Electronics
On Thu, 2014-11-27 at 15:38 +0100, Richard Weinberger wrote:
> On 27.11.2014 at 15:35, Lukasz Pawelczyk wrote:
> > On Thu, 2014-11-27 at 15:18 +0100, Richard Weinberger wrote:
> >> On Thu, Nov 27, 2014 at 3:01 PM, Lukasz Pawelczyk
> >> <[email protected]> wrote:
> >>> -/* 0x02000000 was previously the unused CLONE_STOPPED (Start in stopped state)
> >>> - and is now available for re-use. */
> >>> +#define CLONE_NEWLSM 0x02000000 /* New LSM namespace */
> >>
> >> FYI, CLONE_NEWCGROUP also claims last flag [1].
> >
> > Yes, I'm perfectly aware of that. I've seen those patches.
> > This is RFC for now and CGROUP NS is not merged yet. I'll rebase when
> > time comes.
>
> Just wanted to indicate that we run out of constants. :)
True, the last one is 0x80000000. I did not notice that. Thanks for
pointing that out.
Any suggestion on what can be done here? A new syscall with flags2?
--
Lukasz Pawelczyk
Samsung R&D Institute Poland
Samsung Electronics
On Thu, 2014-11-27 at 16:01 +0100, Richard Weinberger wrote:
> On 27.11.2014 at 15:44, Lukasz Pawelczyk wrote:
> > True, the last one is 0x80000000. I did not notice that. Thanks for
> > pointing out.
>
> Isn't this CLONE_IO?
Yes, I was merely noticing out loud that it's the last bit of 32 bits.
After a closer look, though, 0x00001000 appears to be unused.
> > Any suggestion on what can be done here? New syscall with flags2?
>
> I'm not sure. But a new syscall would be a candidate.
--
Lukasz Pawelczyk
Samsung R&D Institute Poland
Samsung Electronics
On Thu, 2014-11-27 at 16:17 +0100, Richard Weinberger wrote:
> On 27.11.2014 at 16:11, Lukasz Pawelczyk wrote:
> > On Thu, 2014-11-27 at 16:01 +0100, Richard Weinberger wrote:
> >> On 27.11.2014 at 15:44, Lukasz Pawelczyk wrote:
> >>> True, the last one is 0x80000000. I did not notice that. Thanks for
> >>> pointing out.
> >>
> >> Isn't this CLONE_IO?
> >
> > Yes, I was merely noticing out loud that it's the last bit of 32bit.
> >
> > After close look though the 0x00001000 appears to be unused
>
> This was CLONE_PID.
> I'm not sure if we can reuse this. man 2 clone states "It disappeared in Linux 2.5.16.".
> Maybe one of the CC'd parties can tell more...
I would really like someone to comment on this. I'd like to avoid
creating a new syscall at this point.
According to clone(2):
CLONE_STOPPED was removed in 2.6.38 and can be reused.
CLONE_PID, as you mentioned, was removed in 2.5.16, but since 2.3.21
it could only be used by the boot process (PID 0).
So this was a really long time ago, and effectively regular user space
has not been able to use it since 2.3.21.
--
Lukasz Pawelczyk
Samsung R&D Institute Poland
Samsung Electronics
Lukasz Pawelczyk <[email protected]> writes:
> On Thu, 2014-11-27 at 16:01 +0100, Richard Weinberger wrote:
>> On 27.11.2014 at 15:44, Lukasz Pawelczyk wrote:
>> > True, the last one is 0x80000000. I did not notice that. Thanks for
>> > pointing out.
>>
>> Isn't this CLONE_IO?
>
> Yes, I was merely noticing out loud that it's the last bit of 32bit.
>
> After close look though the 0x00001000 appears to be unused
>
>> > Any suggestion on what can be done here? New syscall with flags2?
>>
>> I'm not sure. But a new syscall would be a candidate.
We are probably going to need to go a couple rounds with this but at
first approximation I think this functionality needs to be tied to the
user namespace. This functionality already looks half tied to it.
Once mounting filesystems with user namespace privileges matures a
little more, you should be able to use unmapped labels. In the near term
we are looking at filesystems such as tmpfs, fuse and possibly extN.
Eric
On Thu, 2014-11-27 at 09:42 -0600, Eric W. Biederman wrote:
> Lukasz Pawelczyk writes:
>
> > On Thu, 2014-11-27 at 16:01 +0100, Richard Weinberger wrote:
> >> On 27.11.2014 at 15:44, Lukasz Pawelczyk wrote:
> >> > True, the last one is 0x80000000. I did not notice that. Thanks for
> >> > pointing out.
> >>
> >> Isn't this CLONE_IO?
> >
> > Yes, I was merely noticing out loud that it's the last bit of 32bit.
> >
> > After close look though the 0x00001000 appears to be unused
> >
> >> > Any suggestion on what can be done here? A new syscall with flags2?
> >>
> >> I'm not sure. But a new syscall would be a candidate.
>
> We are probably going to need to go a couple rounds with this but at
> first approximation I think this functionality needs to be tied to the
> user namespace. This functionality already looks half tied to it.
>
> Once mounting filesystems with user namespace privileges matures a
> little more, you should be able to use unmapped labels. In the near term
> we are looking at filesystems such as tmpfs, fuse and possibly extN.
I presume you are referring to the Smack namespace readme, where I
mentioned mounts that specify Smack labels in the mount options, not
to the quote above?
I was referring to the check here, which has been changed to
smack_ns_privileged() using ns_capable():
http://lxr.free-electrons.com/source/security/smack/smack_lsm.c#L462
And you can't use an unmapped Smack label inside the namespace; that
would go completely against its idea.
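As an illustration of why an unmapped label cannot be used, here is a minimal user-space sketch of a labels' map lookup. It is not the kernel implementation; the struct, the function name and the label names in the usage note are hypothetical. The only property it is meant to show is that a label name with no map entry resolves to nothing, so the namespace has no way to refer to it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry of a (hypothetical) per-namespace labels' map. */
struct sketch_label_map_entry {
    const char *ns_label;   /* label name visible inside the namespace */
    const char *init_label; /* corresponding name in the init namespace */
};

/*
 * Translate a label name used inside the namespace to the init
 * namespace name. Returns NULL when the label is not mapped, in
 * which case the caller has to refuse the operation.
 */
const char *sketch_map_label(const struct sketch_label_map_entry *map,
                             size_t n, const char *ns_label)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(map[i].ns_label, ns_label) == 0)
            return map[i].init_label;
    }
    return NULL; /* unmapped: not usable from this namespace */
}
```

For example, with a map containing only { "app", "container1_app" }, looking up "app" yields "container1_app", while looking up any other name yields NULL.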
Anyway, at this point I'm more interested in the LSM namespace. I'll be
doing an RFC for Smack namespace later.
Unless I misunderstood your mail.
--
Lukasz Pawelczyk
Samsung R&D Institute Poland
Samsung Electronics
Lukasz Pawelczyk writes:
> On Thu, 2014-11-27 at 09:42 -0600, Eric W. Biederman wrote:
>> Lukasz Pawelczyk writes:
>>
>> > On Thu, 2014-11-27 at 16:01 +0100, Richard Weinberger wrote:
>> >> On 27.11.2014 at 15:44, Lukasz Pawelczyk wrote:
>> >> > True, the last one is 0x80000000. I did not notice that. Thanks for
>> >> > pointing out.
>> >>
>> >> Isn't this CLONE_IO?
>> >
>> > Yes, I was merely noticing out loud that it's the last bit of 32bit.
>> >
>> > After close look though the 0x00001000 appears to be unused
>> >
>> >> > Any suggestion on what can be done here? A new syscall with flags2?
>> >>
>> >> I'm not sure. But a new syscall would be a candidate.
>>
>> We are probably going to need to go a couple rounds with this but at
>> first approximation I think this functionality needs to be tied to the
>> user namespace. This functionality already looks half tied to it.
>>
>> Once mounting filesystems with user namespace privileges matures a
>> little more, you should be able to use unmapped labels. In the near term
>> we are looking at filesystems such as tmpfs, fuse and possibly extN.
>
> I presume you are referring to the Smack namespace readme where I
> mentioned mounts with specifying smack labels in the mount options, not
> to the quote above?
>
> I was referring to the check here that has been changed to
> smack_ns_privileged() using ns_capable():
> http://lxr.free-electrons.com/source/security/smack/smack_lsm.c#L462
>
> And you can't use an unmapped Smack label inside the namespace, this
> would be completely against its idea.
>
> Anyway, at this point I'm more interested in the LSM namespace. I'll be
> doing an RFC for Smack namespace later.
>
> Unless I misunderstood your mail.
I had two points.
a) Tie the label mapping to the user namespace, then we don't need any
new namespaces.
Is there a reason not to tie the label mapping to the user namespace?
Needing to modify every userspace that creates containers to know
about every different LSM looks like a maintenance difficulty I would
prefer to avoid.
b) For filesystems that don't need uid mapping (say ext2 mounted with
user namespace permissions) we shouldn't need LSM mapping either.
Eric
On Thu, 2014-11-27 at 10:44 -0600, Eric W. Biederman wrote:
> Lukasz Pawelczyk writes:
>
> > On Thu, 2014-11-27 at 09:42 -0600, Eric W. Biederman wrote:
> >> We are probably going to need to go a couple rounds with this but at
> >> first approximation I think this functionality needs to be tied to the
> >> user namespace. This functionality already looks half tied to it.
Actually it's not. You can create LSM/Smack namespace without user
namespace and it works properly.
> >> Once mounting filesystems with user namespace privileges matures a
> >> little more, you should be able to use unmapped labels. In the near term
> >> we are looking at filesystems such as tmpfs, fuse and possibly extN.
OK, I get the idea now. But I still think it wouldn't work well with the
Smack namespace. It would basically allow you to operate on something
that the administrator did not allow you to (by filling the labels'
map).
If the user namespace allows such a thing now, I was not aware of it.
I'll have a look.
> I had two points.
> a) Tie the label mapping to the user namespace, then we don't need any
> new namespaces.
>
> Is there a reason not to tie the label mapping to the user namespace?
I remember that I entertained the idea when I started this work
and for some reason decided against it.
Right now the major issue I see is that LSM by itself does not define
how it's going to behave. That is up to the specific LSM module.
E.g. within the Smack namespace filling the map is a privileged
operation. So by tying them together you cripple the ability to create
a fully working user namespace as an unprivileged process.
I want the Smack namespace to be able to map its own label without
privileges (as the user namespace can do with its own UID), but for now
that's not the case and I'm not sure it ever will be.
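For comparison, this is roughly what the unprivileged case looks like on the user namespace side: after unshare(CLONE_NEWUSER), a process may write a single-entry map of its own UID to /proc/self/uid_map. The sketch below only builds and writes that one line; the function names are made up for illustration, and the path is passed in as a parameter so the formatting can be exercised without creating a namespace.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format a single-entry uid_map line: "<inside> <outside> 1\n".
 * Returns the line length, or -1 if the buffer is too small.
 */
int sketch_format_uid_map(char *buf, size_t len, uid_t inside, uid_t outside)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "%u %u 1\n",
                     (unsigned int)inside, (unsigned int)outside);
    return (n > 0 && (size_t)n < len) ? n : -1;
}

/*
 * Write the single-entry map to the given path, which would be
 * "/proc/self/uid_map" once the process is in a new user namespace.
 * Returns 0 on success, -1 on failure.
 */
int sketch_write_uid_map(const char *path, uid_t inside, uid_t outside)
{
    char line[64];
    if (sketch_format_uid_map(line, sizeof(line), inside, outside) < 0)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fputs(line, f) >= 0;
    if (fclose(f) != 0) /* the buffered line is flushed here */
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would read getuid() before unshare(), then call sketch_write_uid_map("/proc/self/uid_map", 0, saved_uid) to appear as root inside the namespace. The Smack labels' map has no such unprivileged path today, which is the asymmetry described above.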
With other LSM implementations other limitations might apply.
Besides, a use case (with other LSM modules) where someone does not want
to create an LSM namespace might be valid as well.
>
> Needing to modify every userspace that creates containers to know
> about every different lsm looks like a maintenance difficulty I would
> prefer to avoid.
There is only one LSM namespace; it's not like every LSM module creates
a different namespace. The LSM namespace is created for the LSM module
that is active at the moment. And user space might need to be aware of
the module anyway, as e.g. Smack requires you to create a labels' map.
Other modules might require something different.
BTW: have you read the Smack namespace readme I pasted in the cover
letter? It describes the idea behind the namespace implementation in
that particular module.
--
Lukasz Pawelczyk
Samsung R&D Institute Poland
Samsung Electronics
On Thu, 2014-11-27 at 18:38 +0100, Lukasz Pawelczyk wrote:
> Right now the major issue I see is that LSM by itself is not defined how
> it's going to behave. It's up to a specific LSM module.
>
> E.g. within the Smack namespace filling the map is a privileged
> operation. So by tying them up you cripple the ability to create a fully
> working user namespace as an unprivileged process.
Entertaining the idea that the LSM namespace would be tied to the user
namespace (as you suggested), how do you see the limitation I described
above?
--
Lukasz Pawelczyk
Samsung R&D Institute Poland
Samsung Electronics
Lukasz Pawelczyk writes:
> On Thu, 2014-11-27 at 18:38 +0100, Lukasz Pawelczyk wrote:
>> Right now the major issue I see is that LSM by itself is not defined how
>> it's going to behave. It's up to a specific LSM module.
>>
>> E.g. within the Smack namespace filling the map is a privileged
>> operation. So by tying them up you cripple the ability to create a fully
>> working user namespace as an unprivileged process.
>
> Entertaining the idea that LSM namespace would be tied to user namespace
> (as you suggested) how do you see the limitation I described above?
If they are tied it means you wind up in a situation where there are no
labels you can set.
In general setting the uid and gid maps is also a privileged operation.
I really don't know what makes sense to do with LSMs and namespaces
generically, but I do know that your LSM namespace patches were awkward
and weird and seemed to be taking things in the wrong direction.
Eric