2015-04-17 07:36:58

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH V6 00/10] namespaces: log namespaces per task

The purpose is to track namespace instances in use by logged processes from the
perspective of init_*_ns by logging the namespace IDs (device ID and namespace
inode - offset).


1/10 exposes proc's ns entries structure which lists a number of useful
operations per namespace type for other subsystems to use.

2/10 proc_ns: define PROC_*_INIT_INO in terms of PROC_DYNAMIC_FIRST

3/10 provides an example of usage for audit_log_task_info() which is used by
syscall audits, among others. audit_log_task() and audit_common_recv_message()
would be other potential use cases.

Proposed output format:
This differs slightly from Aristeu's patch because of the label conflict with
"pid=" due to including it in existing records rather than it being a seperate
record. It has now returned to being a seperate record. The proc device
major/minor are listed in hexadecimal and namespace IDs are the proc inode
minus the base offset.
type=NS_INFO msg=audit(1408577535.306:82): dev=00:03 netns=3 utsns=-3 ipcns=-4 pidns=-1 userns=-2 mntns=0

4/10 change audit startup from __initcall to subsys_initcall to get it started
earlier to be able to receive initial namespace log messages.

5/10 tracks the creation and deletion of namespaces, listing the type of
namespace instance, proc device ID, related namespace id if there is one and
the newly minted namespace ID.

Proposed output format for initial namespace creation:
type=AUDIT_NS_INIT_UTS msg=audit(1408577534.868:5): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_utsns=(none) utsns=-3 res=1
type=AUDIT_NS_INIT_USER msg=audit(1408577534.868:6): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_userns=(none) userns=-2 res=1
type=AUDIT_NS_INIT_PID msg=audit(1408577534.868:7): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_pidns=(none) pidns=-1 res=1
type=AUDIT_NS_INIT_MNT msg=audit(1408577534.868:8): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_mntns=(none) mntns=0 res=1
type=AUDIT_NS_INIT_IPC msg=audit(1408577534.868:9): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_ipcns=(none) ipcns=-4 res=1
type=AUDIT_NS_INIT_NET msg=audit(1408577533.500:10): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_netns=(none) netns=2 res=1

And a CLONE action would result in:
type=type=AUDIT_NS_INIT_NET msg=audit(1408577535.306:81): pid=481 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 dev=00:03 old_netns=2 netns=3 res=1

While deleting a namespace would result in:
type=type=AUDIT_NS_DEL_MNT msg=audit(1408577552.221:85): pid=481 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 dev=00:03 mntns=4 res=1

6/10 accepts a PID from userspace and requests logging an AUDIT_NS_INFO record
type (CAP_AUDIT_CONTROL required).

7/10 is a macro for CLONE_NEW_* flags.

8/10 adds auditing on creation of namespace(s) in fork.

9/10 adds auditing a change of namespace on setns.

10/10 attaches a AUDIT_NS_INFO record to AUDIT_VIRT_CONTROL records
(CAP_AUDIT_WRITE required).


v5 -> v6:
Switch to using namespace ID based on namespace proc inode minus base offset
Added proc device ID to qualify proc inode reference
Eliminate exposed /proc interface

v4 -> v5:
Clean up prototypes for dependencies on CONFIG_NAMESPACES.
Add AUDIT_NS_INFO record type to AUDIT_VIRT_CONTROL record.
Log AUDIT_NS_INFO with PID.
Move /proc/<pid>/ns_* patches to end of patchset to deprecate them.
Log on changing ns (setns).
Log on creating new namespaces when forking.
Added a macro for CLONE_NEW*.

v3 -> v4:
Seperate out the NS_INFO message from the SYSCALL message.
Moved audit_log_namespace_info() out of audit_log_task_info().
Use a seperate message type per namespace type for each of INIT/DEL.
Make ns= easier to search across NS_INFO and NS_INIT/DEL_XXX msg types.
Add /proc/<pid>/ns/ documentation.
Fix dynamic initial ns logging.

v2 -> v3:
Use atomic64_t in ns_serial to simplify it.
Avoid funciton duplication in proc, keying on dentry.
Squash down audit patch to avoid rcu sleep issues.
Add tracking for creation and deletion of namespace instances.

v1 -> v2:
Avoid rollover by switching from an int to a long long.
Change rollover behaviour from simply avoiding zero to raising a BUG.
Expose serial numbers in /proc/<pid>/ns/*_snum.
Expose ns_entries and use it in audit.


Notes:
As for CAP_AUDIT_READ, a patchset has been accepted upstream to check
capabilities of userspace processes that try to join netlink broadcast groups.

This set does not try to solve the non-init namespace audit messages and
auditd problem yet. That will come later, likely with additional auditd
instances running in another namespace with a limited ability to influence the
master auditd. I echo Eric B's idea that messages destined for different
namespaces would have to be tailored for that namespace with references that
make sense (such as the right pid number reported to that pid namespace, and
not leaking info about parents or peers).

Questions:
Is there a way to link serial numbers of namespaces involved in migration of a
container to another kernel? It sounds like what is needed is a part of a
mangement application that is able to pull the audit records from constituent
hosts to build an audit trail of a container.

What additional events should list this information?

Does this present any problematic information leaks? Only CAP_AUDIT_CONTROL
(and now CAP_AUDIT_READ) in init_user_ns can get to this information in
the init namespace at the moment from audit.


Richard Guy Briggs (10):
namespaces: expose ns_entries
proc_ns: define PROC_*_INIT_INO in terms of PROC_DYNAMIC_FIRST
audit: log namespace ID numbers
audit: initialize at subsystem time rather than device time
audit: log creation and deletion of namespace instances
audit: dump namespace IDs for pid on receipt of AUDIT_NS_INFO
sched: add a macro to ref all CLONE_NEW* flags
fork: audit on creation of new namespace(s)
audit: log on switching namespace (setns)
audit: emit AUDIT_NS_INFO record with AUDIT_VIRT_CONTROL record

fs/namespace.c | 13 +++
fs/proc/generic.c | 3 +-
fs/proc/namespaces.c | 2 +-
include/linux/audit.h | 20 +++++
include/linux/proc_ns.h | 10 ++-
include/uapi/linux/audit.h | 21 +++++
include/uapi/linux/sched.h | 6 ++
ipc/namespace.c | 12 +++
kernel/audit.c | 169 +++++++++++++++++++++++++++++++++++++-
kernel/auditsc.c | 2 +
kernel/fork.c | 3 +
kernel/nsproxy.c | 4 +
kernel/pid_namespace.c | 13 +++
kernel/user_namespace.c | 13 +++
kernel/utsname.c | 12 +++
net/core/net_namespace.c | 12 +++
security/integrity/ima/ima_api.c | 2 +
17 files changed, 309 insertions(+), 8 deletions(-)


2015-04-17 07:47:51

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH V6 01/10] namespaces: expose ns_entries

Expose ns_entries so subsystems other than proc can use this set of namespace
operations.

Signed-off-by: Richard Guy Briggs <[email protected]>
---
fs/proc/namespaces.c | 2 +-
include/linux/proc_ns.h | 1 +
2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index 8902609..310da74 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -15,7 +15,7 @@
#include "internal.h"


-static const struct proc_ns_operations *ns_entries[] = {
+const struct proc_ns_operations *ns_entries[] = {
#ifdef CONFIG_NET_NS
&netns_operations,
#endif
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index 34a1e10..09ff93c 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -27,6 +27,7 @@ extern const struct proc_ns_operations ipcns_operations;
extern const struct proc_ns_operations pidns_operations;
extern const struct proc_ns_operations userns_operations;
extern const struct proc_ns_operations mntns_operations;
+extern const struct proc_ns_operations *ns_entries[];

/*
* We always define these enumerators
--
1.7.1

2015-04-17 07:48:00

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH V6 02/10] proc_ns: define PROC_*_INIT_INO in terms of PROC_DYNAMIC_FIRST

Since PROC_*_INIT_INO are all defined relative to PROC_DYNAMIC_FIRST, make it
explicit.

Signed-off-by: Richard Guy Briggs <[email protected]>
---
fs/proc/generic.c | 3 +--
include/linux/proc_ns.h | 9 +++++----
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index b7f268e..9f7726a 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -11,6 +11,7 @@
#include <linux/errno.h>
#include <linux/time.h>
#include <linux/proc_fs.h>
+#include <linux/proc_ns.h>
#include <linux/stat.h>
#include <linux/mm.h>
#include <linux/module.h>
@@ -121,8 +122,6 @@ static int xlate_proc_name(const char *name, struct proc_dir_entry **ret,
static DEFINE_IDA(proc_inum_ida);
static DEFINE_SPINLOCK(proc_inum_lock); /* protects the above */

-#define PROC_DYNAMIC_FIRST 0xF0000000U
-
/*
* Return an inode number between PROC_DYNAMIC_FIRST and
* 0xffffffff, or zero on failure.
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index 09ff93c..340372b 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -32,12 +32,13 @@ extern const struct proc_ns_operations *ns_entries[];
/*
* We always define these enumerators
*/
+#define PROC_DYNAMIC_FIRST 0xF0000000U
enum {
PROC_ROOT_INO = 1,
- PROC_IPC_INIT_INO = 0xEFFFFFFFU,
- PROC_UTS_INIT_INO = 0xEFFFFFFEU,
- PROC_USER_INIT_INO = 0xEFFFFFFDU,
- PROC_PID_INIT_INO = 0xEFFFFFFCU,
+ PROC_IPC_INIT_INO = PROC_DYNAMIC_FIRST - 1,
+ PROC_UTS_INIT_INO = PROC_DYNAMIC_FIRST - 2,
+ PROC_USER_INIT_INO = PROC_DYNAMIC_FIRST - 3,
+ PROC_PID_INIT_INO = PROC_DYNAMIC_FIRST - 4,
};

#ifdef CONFIG_PROC_FS
--
1.7.1

2015-04-17 07:37:27

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH V6 03/10] audit: log namespace ID numbers

Log the namespace identifiers (device ID and proc inode minus base offset) of a
task in a new record type (1329) (usually accompanies audit_log_task_info()
type=SYSCALL record) which is used by syscall audits, among others..

Idea first presented:
https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html

Typical output format would look something like:
type=NS_INFO msg=audit(1408577535.306:82): pid=374 dev=00:03 netns=116 utsns=-2 ipcns=-1 pidns=-4 userns=-3 mntns=0

The namespace identifier values are printed relative to PROC_DYNAMIC_FIRST
(0xF000000) (so the first 4 are negative).

Suggested-by: Aristeu Rozanski <[email protected]>
Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
---
include/linux/audit.h | 8 ++++++++
include/uapi/linux/audit.h | 1 +
kernel/audit.c | 35 +++++++++++++++++++++++++++++++++++
kernel/auditsc.c | 2 ++
security/integrity/ima/ima_api.c | 2 ++
5 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index b481779..71698ec 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -478,6 +478,12 @@ static inline void audit_log_secctx(struct audit_buffer *ab, u32 secid)
extern int audit_log_task_context(struct audit_buffer *ab);
extern void audit_log_task_info(struct audit_buffer *ab,
struct task_struct *tsk);
+#ifdef CONFIG_NAMESPACES
+extern void audit_log_ns_info(struct task_struct *tsk);
+#else
+static inline void audit_log_ns_info(struct task_struct *tsk)
+{ }
+#endif

extern int audit_update_lsm_rules(void);

@@ -534,6 +540,8 @@ static inline int audit_log_task_context(struct audit_buffer *ab)
static inline void audit_log_task_info(struct audit_buffer *ab,
struct task_struct *tsk)
{ }
+static inline void audit_log_ns_info(struct task_struct *tsk)
+{ }
#define audit_enabled 0
#endif /* CONFIG_AUDIT */
static inline void audit_log_string(struct audit_buffer *ab, const char *buf)
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 2ccf19e..1ffb151 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -110,6 +110,7 @@
#define AUDIT_SECCOMP 1326 /* Secure Computing event */
#define AUDIT_PROCTITLE 1327 /* Proctitle emit event */
#define AUDIT_FEATURE_CHANGE 1328 /* audit log listing feature changes */
+#define AUDIT_NS_INFO 1329 /* Record process namespace IDs */

#define AUDIT_AVC 1400 /* SE Linux avc denial or grant */
#define AUDIT_SELINUX_ERR 1401 /* Internal SE Linux Errors */
diff --git a/kernel/audit.c b/kernel/audit.c
index d5a1220..68200ce 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -66,7 +66,9 @@
#include <linux/freezer.h>
#include <linux/tty.h>
#include <linux/pid_namespace.h>
+#include <linux/proc_ns.h>
#include <net/netns/generic.h>
+#include <linux/mount.h> /* struct vfsmount */

#include "audit.h"

@@ -745,6 +747,8 @@ static void audit_log_feature_change(int which, u32 old_feature, u32 new_feature
audit_feature_names[which], !!old_feature, !!new_feature,
!!old_lock, !!new_lock, res);
audit_log_end(ab);
+
+ audit_log_ns_info(current);
}

static int audit_set_feature(struct sk_buff *skb)
@@ -1653,6 +1657,35 @@ void audit_log_session_info(struct audit_buffer *ab)
audit_log_format(ab, " auid=%u ses=%u", auid, sessionid);
}

+#ifdef CONFIG_NAMESPACES
+void audit_log_ns_info(struct task_struct *tsk)
+{
+ const struct proc_ns_operations **entry;
+ bool end = false;
+ struct audit_buffer *ab;
+ struct vfsmount *mnt = task_active_pid_ns(tsk)->proc_mnt;
+ struct super_block *sb = mnt->mnt_sb;
+
+ if (!tsk)
+ return;
+ ab = audit_log_start(tsk->audit_context, GFP_KERNEL,
+ AUDIT_NS_INFO);
+ if (!ab)
+ return;
+ audit_log_format(ab, "pid=%d", task_pid_nr(tsk));
+ audit_log_format(ab, " dev=%02x:%02x", MAJOR(sb->s_dev), MINOR(sb->s_dev));
+ for (entry = ns_entries; !end; entry++) {
+ void *ns = (*entry)->get(tsk);
+
+ audit_log_format(ab, " %sns=%d", (*entry)->name,
+ (*entry)->inum(ns) - PROC_DYNAMIC_FIRST);
+ (*entry)->put(ns);
+ end = (*entry)->type == CLONE_NEWNS;
+ }
+ audit_log_end(ab);
+}
+#endif /* CONFIG_NAMESPACES */
+
void audit_log_key(struct audit_buffer *ab, char *key)
{
audit_log_format(ab, " key=");
@@ -1935,6 +1968,8 @@ void audit_log_link_denied(const char *operation, struct path *link)
audit_log_format(ab, " res=0");
audit_log_end(ab);

+ audit_log_ns_info(current);
+
/* Generate AUDIT_PATH record with object. */
name->type = AUDIT_TYPE_NORMAL;
audit_copy_inode(name, link->dentry, link->dentry->d_inode);
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 4b89f7f..3ca3416 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1378,6 +1378,8 @@ static void audit_log_exit(struct audit_context *context, struct task_struct *ts
audit_log_key(ab, context->filterkey);
audit_log_end(ab);

+ audit_log_ns_info(tsk);
+
for (aux = context->aux; aux; aux = aux->next) {

ab = audit_log_start(context, GFP_KERNEL, aux->type);
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index d9cd5ce..58ac695 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -323,6 +323,8 @@ void ima_audit_measurement(struct integrity_iint_cache *iint,
audit_log_task_info(ab, current);
audit_log_end(ab);

+ audit_log_ns_info(current);
+
iint->flags |= IMA_AUDITED;
}

--
1.7.1

2015-04-17 07:37:04

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH V6 04/10] audit: initialize at subsystem time rather than device time

The audit subsystem should be initialized a bit earlier so that it is in place
in time for initial namespace ID number logging.

Signed-off-by: Richard Guy Briggs <[email protected]>
---
kernel/audit.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 68200ce..63f32f4 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1188,7 +1188,7 @@ static int __init audit_init(void)

return 0;
}
-__initcall(audit_init);
+subsys_initcall(audit_init);

/* Process kernel command-line parameter at boot time. audit=0 or audit=1. */
static int __init audit_enable(char *str)
--
1.7.1

2015-04-17 07:38:47

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

Log the creation and deletion of namespace instances in all 6 types of
namespaces.

Twelve new audit message types have been introduced:
AUDIT_NS_INIT_MNT 1330 /* Record mount namespace instance creation */
AUDIT_NS_INIT_UTS 1331 /* Record UTS namespace instance creation */
AUDIT_NS_INIT_IPC 1332 /* Record IPC namespace instance creation */
AUDIT_NS_INIT_USER 1333 /* Record USER namespace instance creation */
AUDIT_NS_INIT_PID 1334 /* Record PID namespace instance creation */
AUDIT_NS_INIT_NET 1335 /* Record NET namespace instance creation */
AUDIT_NS_DEL_MNT 1336 /* Record mount namespace instance deletion */
AUDIT_NS_DEL_UTS 1337 /* Record UTS namespace instance deletion */
AUDIT_NS_DEL_IPC 1338 /* Record IPC namespace instance deletion */
AUDIT_NS_DEL_USER 1339 /* Record USER namespace instance deletion */
AUDIT_NS_DEL_PID 1340 /* Record PID namespace instance deletion */
AUDIT_NS_DEL_NET 1341 /* Record NET namespace instance deletion */

As suggested by Eric Paris, there are 12 message types, one for each of
creation and deletion, one for each type of namespace so that text searches are
easier in conjunction with the AUDIT_NS_INFO message type, being able to search
for all records such as "netns=4 " and to avoid fields disappearing per message
type to make ausearch more efficient.

A typical startup would look roughly like:

type=AUDIT_NS_INIT_UTS msg=audit(1408577534.868:5): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_utsns=(none) utsns=-2 res=1
type=AUDIT_NS_INIT_USER msg=audit(1408577534.868:6): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_userns=(none) userns=-3 res=1
type=AUDIT_NS_INIT_PID msg=audit(1408577534.868:7): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_pidns=(none) pidns=-4 res=1
type=AUDIT_NS_INIT_MNT msg=audit(1408577534.868:8): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_mntns=(none) mntns=0 res=1
type=AUDIT_NS_INIT_IPC msg=audit(1408577534.868:9): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_ipcns=(none) ipcns=-1 res=1
type=AUDIT_NS_INIT_NET msg=audit(1408577533.500:10): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_netns=(none) netns=2 res=1

And a CLONE action would result in:
type=type=AUDIT_NS_INIT_NET msg=audit(1408577535.306:81): pid=481 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 dev=00:03 old_netns=2 netns=3 res=1

While deleting a namespace would result in:
type=type=AUDIT_NS_DEL_MNT msg=audit(1408577552.221:85): pid=481 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 dev=00:03 mntns=4 res=1

If not "(none)", old_XXXns lists the namespace from which it was cloned.

Signed-off-by: Richard Guy Briggs <[email protected]>
---
fs/namespace.c | 13 +++++++++
include/linux/audit.h | 8 +++++
include/uapi/linux/audit.h | 12 ++++++++
ipc/namespace.c | 12 ++++++++
kernel/audit.c | 64 ++++++++++++++++++++++++++++++++++++++++++++
kernel/pid_namespace.c | 13 +++++++++
kernel/user_namespace.c | 13 +++++++++
kernel/utsname.c | 12 ++++++++
net/core/net_namespace.c | 12 ++++++++
9 files changed, 159 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 182bc41..7b62543 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -24,6 +24,7 @@
#include <linux/proc_ns.h>
#include <linux/magic.h>
#include <linux/bootmem.h>
+#include <linux/audit.h>
#include "pnode.h"
#include "internal.h"

@@ -2459,6 +2460,7 @@ dput_out:

static void free_mnt_ns(struct mnt_namespace *ns)
{
+ audit_log_ns_del(AUDIT_NS_DEL_MNT, ns->proc_inum);
proc_free_inum(ns->proc_inum);
put_user_ns(ns->user_ns);
kfree(ns);
@@ -2518,6 +2520,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
new_ns = alloc_mnt_ns(user_ns);
if (IS_ERR(new_ns))
return new_ns;
+ audit_log_ns_init(AUDIT_NS_INIT_MNT, ns->proc_inum, new_ns->proc_inum);

namespace_lock();
/* First pass: copy the tree topology */
@@ -2830,6 +2833,16 @@ static void __init init_mount_tree(void)
set_fs_root(current->fs, &root);
}

+/* log the ID of init mnt namespace after audit service starts */
+static int __init mnt_ns_init_log(void)
+{
+ struct mnt_namespace *init_mnt_ns = init_task.nsproxy->mnt_ns;
+
+ audit_log_ns_init(AUDIT_NS_INIT_MNT, 0, init_mnt_ns->proc_inum);
+ return 0;
+}
+late_initcall(mnt_ns_init_log);
+
void __init mnt_init(void)
{
unsigned u;
diff --git a/include/linux/audit.h b/include/linux/audit.h
index 71698ec..b28dfb0 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -484,6 +484,9 @@ extern void audit_log_ns_info(struct task_struct *tsk);
static inline void audit_log_ns_info(struct task_struct *tsk)
{ }
#endif
+extern void audit_log_ns_init(int type, unsigned int old_inum,
+ unsigned int inum);
+extern void audit_log_ns_del(int type, unsigned int inum);

extern int audit_update_lsm_rules(void);

@@ -542,6 +545,11 @@ static inline void audit_log_task_info(struct audit_buffer *ab,
{ }
static inline void audit_log_ns_info(struct task_struct *tsk)
{ }
+static inline int audit_log_ns_init(int type, unsigned int old_inum,
+ unsigned int inum)
+{ }
+static inline int audit_log_ns_del(int type, unsigned int inum)
+{ }
#define audit_enabled 0
#endif /* CONFIG_AUDIT */
static inline void audit_log_string(struct audit_buffer *ab, const char *buf)
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 1ffb151..487cad6 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -111,6 +111,18 @@
#define AUDIT_PROCTITLE 1327 /* Proctitle emit event */
#define AUDIT_FEATURE_CHANGE 1328 /* audit log listing feature changes */
#define AUDIT_NS_INFO 1329 /* Record process namespace IDs */
+#define AUDIT_NS_INIT_MNT 1330 /* Record mount namespace instance creation */
+#define AUDIT_NS_INIT_UTS 1331 /* Record UTS namespace instance creation */
+#define AUDIT_NS_INIT_IPC 1332 /* Record IPC namespace instance creation */
+#define AUDIT_NS_INIT_USER 1333 /* Record USER namespace instance creation */
+#define AUDIT_NS_INIT_PID 1334 /* Record PID namespace instance creation */
+#define AUDIT_NS_INIT_NET 1335 /* Record NET namespace instance creation */
+#define AUDIT_NS_DEL_MNT 1336 /* Record mount namespace instance deletion */
+#define AUDIT_NS_DEL_UTS 1337 /* Record UTS namespace instance deletion */
+#define AUDIT_NS_DEL_IPC 1338 /* Record IPC namespace instance deletion */
+#define AUDIT_NS_DEL_USER 1339 /* Record USER namespace instance deletion */
+#define AUDIT_NS_DEL_PID 1340 /* Record PID namespace instance deletion */
+#define AUDIT_NS_DEL_NET 1341 /* Record NET namespace instance deletion */

#define AUDIT_AVC 1400 /* SE Linux avc denial or grant */
#define AUDIT_SELINUX_ERR 1401 /* Internal SE Linux Errors */
diff --git a/ipc/namespace.c b/ipc/namespace.c
index 59451c1..73727ce 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -13,6 +13,7 @@
#include <linux/mount.h>
#include <linux/user_namespace.h>
#include <linux/proc_ns.h>
+#include <linux/audit.h>

#include "util.h"

@@ -41,6 +42,8 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
}
atomic_inc(&nr_ipc_ns);

+ audit_log_ns_init(AUDIT_NS_INIT_IPC, old_ns->proc_inum, ns->proc_inum);
+
sem_init_ns(ns);
msg_init_ns(ns);
shm_init_ns(ns);
@@ -119,6 +122,7 @@ static void free_ipc_ns(struct ipc_namespace *ns)
*/
ipcns_notify(IPCNS_REMOVED);
put_user_ns(ns->user_ns);
+ audit_log_ns_del(AUDIT_NS_DEL_IPC, ns->proc_inum);
proc_free_inum(ns->proc_inum);
kfree(ns);
}
@@ -197,3 +201,11 @@ const struct proc_ns_operations ipcns_operations = {
.install = ipcns_install,
.inum = ipcns_inum,
};
+
+/* log the ID of init IPC namespace after audit service starts */
+static int __init ipc_namespaces_init(void)
+{
+ audit_log_ns_init(AUDIT_NS_INIT_IPC, 0, init_ipc_ns.proc_inum);
+ return 0;
+}
+late_initcall(ipc_namespaces_init);
diff --git a/kernel/audit.c b/kernel/audit.c
index 63f32f4..e6230c4 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1978,6 +1978,70 @@ out:
kfree(name);
}

+#ifdef CONFIG_NAMESPACES
+static char *ns_name[] = {
+ "mnt",
+ "uts",
+ "ipc",
+ "user",
+ "pid",
+ "net",
+};
+
+/**
+ * audit_log_ns_init - report a namespace instance creation
+ * @type: type of audit namespace instance created message
+ * @old_inum: the ID number of the cloned namespace instance
+ * @inum: the ID number of the new namespace instance
+ */
+void audit_log_ns_init(int type, unsigned int old_inum, unsigned int inum)
+{
+ struct audit_buffer *ab;
+ char *audit_ns_name = ns_name[type - AUDIT_NS_INIT_MNT];
+ struct vfsmount *mnt = task_active_pid_ns(current)->proc_mnt;
+ struct super_block *sb = mnt->mnt_sb;
+ char old_ns[16];
+
+ if (type < AUDIT_NS_INIT_MNT || type > AUDIT_NS_INIT_NET) {
+ WARN(1, "audit_log_ns_init: type:%d out of range", type);
+ return;
+ }
+ if (!old_inum)
+ sprintf(old_ns, "(none)");
+ else
+ sprintf(old_ns, "%d", old_inum - PROC_DYNAMIC_FIRST);
+ audit_log_common_recv_msg(&ab, type);
+ audit_log_format(ab, " dev=%02x:%02x old_%sns=%s %sns=%d res=1",
+ MAJOR(sb->s_dev), MINOR(sb->s_dev),
+ audit_ns_name, old_ns,
+ audit_ns_name, inum - PROC_DYNAMIC_FIRST);
+ audit_log_end(ab);
+}
+
+/**
+ * audit_log_ns_del - report a namespace instance deleted
+ * @type: type of audit namespace instance deleted message
+ * @inum: the ID number of the namespace instance
+ */
+void audit_log_ns_del(int type, unsigned int inum)
+{
+ struct audit_buffer *ab;
+ char *audit_ns_name = ns_name[type - AUDIT_NS_DEL_MNT];
+ struct vfsmount *mnt = task_active_pid_ns(current)->proc_mnt;
+ struct super_block *sb = mnt->mnt_sb;
+
+ if (type < AUDIT_NS_DEL_MNT || type > AUDIT_NS_DEL_NET) {
+ WARN(1, "audit_log_ns_del: type:%d out of range", type);
+ return;
+ }
+ audit_log_common_recv_msg(&ab, type);
+ audit_log_format(ab, " dev=%02x:%02x %sns=%d res=1",
+ MAJOR(sb->s_dev), MINOR(sb->s_dev), audit_ns_name,
+ inum - PROC_DYNAMIC_FIRST);
+ audit_log_end(ab);
+}
+#endif /* CONFIG_NAMESPACES */
+
/**
* audit_log_end - end one audit record
* @ab: the audit_buffer
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index db95d8e..d28fd14 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -18,6 +18,7 @@
#include <linux/proc_ns.h>
#include <linux/reboot.h>
#include <linux/export.h>
+#include <linux/audit.h>

struct pid_cache {
int nr_ids;
@@ -109,6 +110,9 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
if (err)
goto out_free_map;

+ audit_log_ns_init(AUDIT_NS_INIT_PID, parent_pid_ns->proc_inum,
+ ns->proc_inum);
+
kref_init(&ns->kref);
ns->level = level;
ns->parent = get_pid_ns(parent_pid_ns);
@@ -142,6 +146,7 @@ static void destroy_pid_namespace(struct pid_namespace *ns)
{
int i;

+ audit_log_ns_del(AUDIT_NS_DEL_PID, ns->proc_inum);
proc_free_inum(ns->proc_inum);
for (i = 0; i < PIDMAP_ENTRIES; i++)
kfree(ns->pidmap[i].page);
@@ -388,3 +393,11 @@ static __init int pid_namespaces_init(void)
}

__initcall(pid_namespaces_init);
+
+/* log the ID of init PID namespace after audit service starts */
+static __init int pid_namespaces_late_init(void)
+{
+ audit_log_ns_init(AUDIT_NS_INIT_PID, 0, init_pid_ns.proc_inum);
+ return 0;
+}
+late_initcall(pid_namespaces_late_init);
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index fcc0256..89c2517 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -22,6 +22,7 @@
#include <linux/ctype.h>
#include <linux/projid.h>
#include <linux/fs_struct.h>
+#include <linux/audit.h>

static struct kmem_cache *user_ns_cachep __read_mostly;

@@ -92,6 +93,9 @@ int create_user_ns(struct cred *new)
return ret;
}

+ audit_log_ns_init(AUDIT_NS_INIT_USER, parent_ns->proc_inum,
+ ns->proc_inum);
+
atomic_set(&ns->count, 1);
/* Leave the new->user_ns reference with the new user namespace. */
ns->parent = parent_ns;
@@ -136,6 +140,7 @@ void free_user_ns(struct user_namespace *ns)
#ifdef CONFIG_PERSISTENT_KEYRINGS
key_put(ns->persistent_keyring_register);
#endif
+ audit_log_ns_del(AUDIT_NS_DEL_USER, ns->proc_inum);
proc_free_inum(ns->proc_inum);
kmem_cache_free(user_ns_cachep, ns);
ns = parent;
@@ -909,3 +914,11 @@ static __init int user_namespaces_init(void)
return 0;
}
subsys_initcall(user_namespaces_init);
+
+/* log the ID of init user namespace after audit service starts */
+static __init int user_namespaces_late_init(void)
+{
+ audit_log_ns_init(AUDIT_NS_INIT_USER, 0, init_user_ns.proc_inum);
+ return 0;
+}
+late_initcall(user_namespaces_late_init);
diff --git a/kernel/utsname.c b/kernel/utsname.c
index fd39312..fa21e8d 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/user_namespace.h>
#include <linux/proc_ns.h>
+#include <linux/audit.h>

static struct uts_namespace *create_uts_ns(void)
{
@@ -48,6 +49,8 @@ static struct uts_namespace *clone_uts_ns(struct user_namespace *user_ns,
return ERR_PTR(err);
}

+ audit_log_ns_init(AUDIT_NS_INIT_UTS, old_ns->proc_inum, ns->proc_inum);
+
down_read(&uts_sem);
memcpy(&ns->name, &old_ns->name, sizeof(ns->name));
ns->user_ns = get_user_ns(user_ns);
@@ -84,6 +87,7 @@ void free_uts_ns(struct kref *kref)

ns = container_of(kref, struct uts_namespace, kref);
put_user_ns(ns->user_ns);
+ audit_log_ns_del(AUDIT_NS_DEL_UTS, ns->proc_inum);
proc_free_inum(ns->proc_inum);
kfree(ns);
}
@@ -138,3 +142,11 @@ const struct proc_ns_operations utsns_operations = {
.install = utsns_install,
.inum = utsns_inum,
};
+
+/* log the ID of init UTS namespace after audit service starts */
+static int __init uts_namespaces_init(void)
+{
+ audit_log_ns_init(AUDIT_NS_INIT_UTS, 0, init_uts_ns.proc_inum);
+ return 0;
+}
+late_initcall(uts_namespaces_init);
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 85b6269..562eb85 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -17,6 +17,7 @@
#include <linux/user_namespace.h>
#include <net/net_namespace.h>
#include <net/netns/generic.h>
+#include <linux/audit.h>

/*
* Our network namespace constructor/destructor lists
@@ -253,6 +254,8 @@ struct net *copy_net_ns(unsigned long flags,
mutex_lock(&net_mutex);
rv = setup_net(net, user_ns);
if (rv == 0) {
+ audit_log_ns_init(AUDIT_NS_INIT_NET, old_net->proc_inum,
+ net->proc_inum);
rtnl_lock();
list_add_tail_rcu(&net->list, &net_namespace_list);
rtnl_unlock();
@@ -389,6 +392,7 @@ static __net_init int net_ns_net_init(struct net *net)

static __net_exit void net_ns_net_exit(struct net *net)
{
+ audit_log_ns_del(AUDIT_NS_DEL_NET, net->proc_inum);
proc_free_inum(net->proc_inum);
}

@@ -435,6 +439,14 @@ static int __init net_ns_init(void)

pure_initcall(net_ns_init);

+/* log the ID of init_net namespace after audit service starts */
+static int __init net_ns_init_log(void)
+{
+ audit_log_ns_init(AUDIT_NS_INIT_NET, 0, init_net.proc_inum);
+ return 0;
+}
+late_initcall(net_ns_init_log);
+
#ifdef CONFIG_NET_NS
static int __register_pernet_operations(struct list_head *list,
struct pernet_operations *ops)
--
1.7.1

2015-04-17 07:38:39

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH V6 06/10] audit: dump namespace IDs for pid on receipt of AUDIT_NS_INFO

When a task with CAP_AUDIT_CONTROL sends a NETLINK_AUDIT message of type
AUDIT_NS_INFO with a PID of interest, dump the namespace IDs of that task to
the audit log.

Signed-off-by: Richard Guy Briggs <[email protected]>
---
kernel/audit.c | 14 ++++++++++++++
1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index e6230c4..b7f10e9 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -674,6 +674,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
case AUDIT_TTY_SET:
case AUDIT_TRIM:
case AUDIT_MAKE_EQUIV:
+ case AUDIT_NS_INFO:
/* Only support auditd and auditctl in initial pid namespace
* for now. */
if (task_active_pid_ns(current) != &init_pid_ns)
@@ -1070,6 +1071,19 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
audit_log_end(ab);
break;
}
+ case AUDIT_NS_INFO:
+#ifdef CONFIG_NAMESPACES
+ {
+ struct task_struct *tsk;
+
+ rcu_read_lock();
+ tsk = find_task_by_vpid(*(pid_t *)data);
+ rcu_read_unlock();
+ audit_log_ns_info(tsk);
+ }
+#else /* CONFIG_NAMESPACES */
+ err = -EOPNOTSUPP;
+#endif /* CONFIG_NAMESPACES */
default:
err = -EINVAL;
break;
--
1.7.1

2015-04-17 07:37:19

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH V6 07/10] sched: add a macro to ref all CLONE_NEW* flags

Added the macro CLONE_NEW_MASK_ALL to refer to all CLONE_NEW* flags.

Signed-off-by: Richard Guy Briggs <[email protected]>
---
include/uapi/linux/sched.h | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 34f9d73..21ed8f4 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -28,6 +28,12 @@
#define CLONE_NEWUSER 0x10000000 /* New user namespace */
#define CLONE_NEWPID 0x20000000 /* New pid namespace */
#define CLONE_NEWNET 0x40000000 /* New network namespace */
+#define CLONE_NEW_MASK_ALL (CLONE_NEWNS \
+ | CLONE_NEWUTS \
+ | CLONE_NEWIPC \
+ | CLONE_NEWUSER \
+ | CLONE_NEWPID \
+ | CLONE_NEWNET) /* mask of all namespace type flags */
#define CLONE_IO 0x80000000 /* Clone io context */

/*
--
1.7.1

2015-04-17 07:37:10

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH V6 08/10] fork: audit on creation of new namespace(s)

When clone(2) is called to fork a new process creating one or more namespaces,
audit the event to tie the new pid with the namespace IDs.

Signed-off-by: Richard Guy Briggs <[email protected]>
---
kernel/fork.c | 3 +++
kernel/nsproxy.c | 1 +
2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 6a13c46..2ea1225 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1624,6 +1624,9 @@ long do_fork(unsigned long clone_flags,
get_task_struct(p);
}

+ if (unlikely(clone_flags & CLONE_NEW_MASK_ALL))
+ audit_log_ns_info(p);
+
wake_up_new_task(p);

/* forking complete and child started to run, tell ptracer */
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 8e78110..d5353c2 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -25,6 +25,7 @@
#include <linux/proc_ns.h>
#include <linux/file.h>
#include <linux/syscalls.h>
+#include <linux/audit.h>

static struct kmem_cache *nsproxy_cachep;

--
1.7.1

2015-04-17 07:38:02

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH V6 09/10] audit: log on switching namespace (setns)

Added six new audit message types, AUDIT_NS_SET_* and function
audit_log_ns_set() to log a switch of namespace.

Signed-off-by: Richard Guy Briggs <[email protected]>
---
include/linux/audit.h | 4 +++
include/uapi/linux/audit.h | 6 +++++
kernel/audit.c | 52 ++++++++++++++++++++++++++++++++++++++++++++
kernel/nsproxy.c | 3 ++
4 files changed, 65 insertions(+), 0 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index b28dfb0..c71c819 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -26,6 +26,7 @@
#include <linux/sched.h>
#include <linux/ptrace.h>
#include <uapi/linux/audit.h>
+#include <linux/proc_ns.h>

struct audit_sig_info {
uid_t uid;
@@ -487,6 +488,7 @@ static inline void audit_log_ns_info(struct task_struct *tsk)
extern void audit_log_ns_init(int type, unsigned int old_inum,
unsigned int inum);
extern void audit_log_ns_del(int type, unsigned int inum);
+extern void audit_log_ns_set(const struct proc_ns_operations *ops, void *ns);

extern int audit_update_lsm_rules(void);

@@ -550,6 +552,8 @@ static inline int audit_log_ns_init(int type, unsigned int old_inum,
{ }
static inline int audit_log_ns_del(int type, unsigned int inum)
{ }
+static inline void audit_log_ns_set(const struct proc_ns_operations *ops, void *ns)
+{ }
#define audit_enabled 0
#endif /* CONFIG_AUDIT */
static inline void audit_log_string(struct audit_buffer *ab, const char *buf)
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 487cad6..567b45f 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -123,6 +123,12 @@
#define AUDIT_NS_DEL_USER 1339 /* Record USER namespace instance deletion */
#define AUDIT_NS_DEL_PID 1340 /* Record PID namespace instance deletion */
#define AUDIT_NS_DEL_NET 1341 /* Record NET namespace instance deletion */
+#define AUDIT_NS_SET_MNT 1342 /* Record mount namespace instance deletion */
+#define AUDIT_NS_SET_UTS 1343 /* Record UTS namespace instance deletion */
+#define AUDIT_NS_SET_IPC 1344 /* Record IPC namespace instance deletion */
+#define AUDIT_NS_SET_USER 1345 /* Record USER namespace instance deletion */
+#define AUDIT_NS_SET_PID 1346 /* Record PID namespace instance deletion */
+#define AUDIT_NS_SET_NET 1347 /* Record NET namespace instance deletion */

#define AUDIT_AVC 1400 /* SE Linux avc denial or grant */
#define AUDIT_SELINUX_ERR 1401 /* Internal SE Linux Errors */
diff --git a/kernel/audit.c b/kernel/audit.c
index b7f10e9..a7b1b61 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2054,6 +2054,58 @@ void audit_log_ns_del(int type, unsigned int inum)
inum - PROC_DYNAMIC_FIRST);
audit_log_end(ab);
}
+
+/**
+ * audit_log_ns_set - report a namespace set change
+ * @ops: the ops structure for the namespace to be changed
+ * @ns: the new namespace
+ */
+void audit_log_ns_set(const struct proc_ns_operations *ops, void *ns)
+{
+ struct audit_buffer *ab;
+ void *old_ns;
+ int msg_type;
+ struct vfsmount *mnt = task_active_pid_ns(current)->proc_mnt;
+ struct super_block *sb = mnt->mnt_sb;
+ char old_ns_s[16];
+
+ switch (ops->type) {
+ case CLONE_NEWNS:
+ msg_type = AUDIT_NS_SET_MNT;
+ break;
+ case CLONE_NEWUTS:
+ msg_type = AUDIT_NS_SET_UTS;
+ break;
+ case CLONE_NEWIPC:
+ msg_type = AUDIT_NS_SET_IPC;
+ break;
+ case CLONE_NEWUSER:
+ msg_type = AUDIT_NS_SET_USER;
+ break;
+ case CLONE_NEWPID:
+ msg_type = AUDIT_NS_SET_PID;
+ break;
+ case CLONE_NEWNET:
+ msg_type = AUDIT_NS_SET_NET;
+ break;
+ default:
+ return;
+ }
+ audit_log_common_recv_msg(&ab, ops->type);
+ if (!ab)
+ return;
+ old_ns = ops->get(current);
+ if (!ops->inum(old_ns))
+ sprintf(old_ns_s, "(none)");
+ else
+ sprintf(old_ns_s, "%d", ops->inum(old_ns) - PROC_DYNAMIC_FIRST);
+ audit_log_format(ab, " dev=%02x:%02x old_%sns=%s %sns=%d res=1",
+ MAJOR(sb->s_dev), MINOR(sb->s_dev),
+ ops->name, old_ns_s,
+ ops->name, ops->inum(ns));
+ ops->put(old_ns);
+ audit_log_end(ab);
+}
#endif /* CONFIG_NAMESPACES */

/**
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index d5353c2..2ca86cf 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -257,6 +257,9 @@ SYSCALL_DEFINE2(setns, int, fd, int, nstype)
goto out;
}
switch_task_namespaces(tsk, new_nsproxy);
+
+ audit_log_ns_set(ops, ei->ns);
+
out:
fput(file);
return err;
--
1.7.1

2015-04-17 07:37:58

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH V6 10/10] audit: emit AUDIT_NS_INFO record with AUDIT_VIRT_CONTROL record

Signed-off-by: Richard Guy Briggs <[email protected]>
---
include/uapi/linux/audit.h | 2 ++
kernel/audit.c | 2 ++
2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 567b45f..b6a55fe 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -163,6 +163,8 @@

#define AUDIT_KERNEL 2000 /* Asynchronous audit record. NOT A REQUEST. */

+#define AUDIT_VIRT_CONTROL 2500 /* Start, Pause, Stop VM */
+
/* Rule flags */
#define AUDIT_FILTER_USER 0x00 /* Apply rule to user-generated messages */
#define AUDIT_FILTER_TASK 0x01 /* Apply rule at task creation (not syscall) */
diff --git a/kernel/audit.c b/kernel/audit.c
index a7b1b61..8a01d88 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -943,6 +943,8 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
}
audit_set_portid(ab, NETLINK_CB(skb).portid);
audit_log_end(ab);
+ if (msg_type == AUDIT_VIRT_CONTROL)
+ audit_log_ns_info(NULL);
mutex_lock(&audit_cmd_mutex);
}
break;
--
1.7.1

2015-04-17 08:19:05

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V6 07/10] sched: add a macro to ref all CLONE_NEW* flags

On Fri, Apr 17, 2015 at 03:35:54AM -0400, Richard Guy Briggs wrote:
> Added the macro CLONE_NEW_MASK_ALL to refer to all CLONE_NEW* flags.

A wee bit about why might be nice..

2015-04-17 15:48:26

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH V6 07/10] sched: add a macro to ref all CLONE_NEW* flags

On 15/04/17, Peter Zijlstra wrote:
> On Fri, Apr 17, 2015 at 03:35:54AM -0400, Richard Guy Briggs wrote:
> > Added the macro CLONE_NEW_MASK_ALL to refer to all CLONE_NEW* flags.
>
> A wee bit about why might be nice..

It makes the following patch much cleaner to read:
[PATCH V6 08/10] fork: audit on creation of new namespace(s)
https://lkml.org/lkml/2015/4/17/50

I was hoping it might also make a lot of other code cleaner, but most of
the other places where multiple CLONE_NEW* flags are used, not all six
are used together, but only 5 are used. Ok, so it is helpful in 1 of 3:

It would actually be useful in check_unshare_flags():
https://github.com/torvalds/linux/blob/v3.17/kernel/fork.c#L1791

but not in copy_namespaces() or unshare_nsproxy_namespaces():
https://github.com/torvalds/linux/blob/v3.17/kernel/nsproxy.c#L130
https://github.com/torvalds/linux/blob/v3.17/kernel/nsproxy.c#L183

- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

2015-04-17 17:41:48

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V6 07/10] sched: add a macro to ref all CLONE_NEW* flags

On Fri, Apr 17, 2015 at 11:42:50AM -0400, Richard Guy Briggs wrote:
> On 15/04/17, Peter Zijlstra wrote:
> > On Fri, Apr 17, 2015 at 03:35:54AM -0400, Richard Guy Briggs wrote:
> > > Added the macro CLONE_NEW_MASK_ALL to refer to all CLONE_NEW* flags.
> >
> > A wee bit about why might be nice..
>
> It makes the following patch much cleaner to read:
> [PATCH V6 08/10] fork: audit on creation of new namespace(s)
> https://lkml.org/lkml/2015/4/17/50
>
> I was hoping it might also make a lot of other code cleaner, but most of
> the other places where multiple CLONE_NEW* flags are used, not all six
> are used together, but only 5 are used. Ok, so it is helpful in 1 of 3:
>
> It would actually be useful in check_unshare_flags():
> https://github.com/torvalds/linux/blob/v3.17/kernel/fork.c#L1791
>
> but not in copy_namespaces() or unshare_nsproxy_namespaces():
> https://github.com/torvalds/linux/blob/v3.17/kernel/nsproxy.c#L130
> https://github.com/torvalds/linux/blob/v3.17/kernel/nsproxy.c#L183
>

Right, so no objections from me on this, its just that I only saw this
one patch in isolation without context and the changelog failed on
rationale.

Does it perchance make sense to fold this patch into the next patch that
actually makes use of it?

2015-04-17 22:00:45

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH V6 07/10] sched: add a macro to ref all CLONE_NEW* flags

On 15/04/17, Peter Zijlstra wrote:
> On Fri, Apr 17, 2015 at 11:42:50AM -0400, Richard Guy Briggs wrote:
> > On 15/04/17, Peter Zijlstra wrote:
> > > On Fri, Apr 17, 2015 at 03:35:54AM -0400, Richard Guy Briggs wrote:
> > > > Added the macro CLONE_NEW_MASK_ALL to refer to all CLONE_NEW* flags.
> > >
> > > A wee bit about why might be nice..
> >
> > It makes the following patch much cleaner to read:
> > [PATCH V6 08/10] fork: audit on creation of new namespace(s)
> > https://lkml.org/lkml/2015/4/17/50
> >
> > I was hoping it might also make a lot of other code cleaner, but most of
> > the other places where multiple CLONE_NEW* flags are used, not all six
> > are used together, but only 5 are used. Ok, so it is helpful in 1 of 3:
> >
> > It would actually be useful in check_unshare_flags():
> > https://github.com/torvalds/linux/blob/v3.17/kernel/fork.c#L1791
> >
> > but not in copy_namespaces() or unshare_nsproxy_namespaces():
> > https://github.com/torvalds/linux/blob/v3.17/kernel/nsproxy.c#L130
> > https://github.com/torvalds/linux/blob/v3.17/kernel/nsproxy.c#L183
>
> Right, so no objections from me on this, its just that I only saw this
> one patch in isolation without context and the changelog failed on
> rationale.

I realize you only saw a small window of this patchset, but this feels
like bike shedding about the main objective of the set...

I'll add a bit more justification and context if/when I respin for the
rest of the set.

> Does it perchance make sense to fold this patch into the next patch that
> actually makes use of it?

It would if it were the only potential user. I don't want to bury a
surprise in something bigger. Is there a preferred way to use such a
macro to make the other three examples cleaner, or is that just useless
churn and obfuscation? Would there be a concise way to express all
CLONE_NEW* flags *except* user?

- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

2015-04-21 04:37:45

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH V6 00/10] namespaces: log namespaces per task

Richard Guy Briggs <[email protected]> writes:

> The purpose is to track namespace instances in use by logged processes from the
> perspective of init_*_ns by logging the namespace IDs (device ID and namespace
> inode - offset).

In broad strokes the user interface appears correct.

Things that I see that concern me:

- After Als most recent changes these inodes no longer live in the proc
superblock so the device number reported in these patches is
incorrect.

- I am nervous about audit logs being flooded with users creating lots
of namespaces. But that is more your lookout than mine.

- unshare is not logging when it creates new namespaces.

As small numbers are nice and these inodes all live in their own
superblock now we should be able to remove the games with
PROC_DYNAMIC_FIRST and just use small numbers for these inodes
everywhere.

I have answered your comments below.

> 1/10 exposes proc's ns entries structure which lists a number of useful
> operations per namespace type for other subsystems to use.
>
> 2/10 proc_ns: define PROC_*_INIT_INO in terms of PROC_DYNAMIC_FIRST
>
> 3/10 provides an example of usage for audit_log_task_info() which is used by
> syscall audits, among others. audit_log_task() and audit_common_recv_message()
> would be other potential use cases.
>
> Proposed output format:
> This differs slightly from Aristeu's patch because of the label conflict with
> "pid=" due to including it in existing records rather than it being a seperate
> record. It has now returned to being a seperate record. The proc device
> major/minor are listed in hexadecimal and namespace IDs are the proc inode
> minus the base offset.
> type=NS_INFO msg=audit(1408577535.306:82): dev=00:03 netns=3 utsns=-3 ipcns=-4 pidns=-1 userns=-2 mntns=0
>
> 4/10 change audit startup from __initcall to subsys_initcall to get it started
> earlier to be able to receive initial namespace log messages.
>
> 5/10 tracks the creation and deletion of namespaces, listing the type of
> namespace instance, proc device ID, related namespace id if there is one and
> the newly minted namespace ID.
>
> Proposed output format for initial namespace creation:
> type=AUDIT_NS_INIT_UTS msg=audit(1408577534.868:5): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_utsns=(none) utsns=-3 res=1
> type=AUDIT_NS_INIT_USER msg=audit(1408577534.868:6): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_userns=(none) userns=-2 res=1
> type=AUDIT_NS_INIT_PID msg=audit(1408577534.868:7): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_pidns=(none) pidns=-1 res=1
> type=AUDIT_NS_INIT_MNT msg=audit(1408577534.868:8): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_mntns=(none) mntns=0 res=1
> type=AUDIT_NS_INIT_IPC msg=audit(1408577534.868:9): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_ipcns=(none) ipcns=-4 res=1
> type=AUDIT_NS_INIT_NET msg=audit(1408577533.500:10): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_netns=(none) netns=2 res=1
>
> And a CLONE action would result in:
> type=type=AUDIT_NS_INIT_NET msg=audit(1408577535.306:81): pid=481 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 dev=00:03 old_netns=2 netns=3 res=1
>
> While deleting a namespace would result in:
> type=type=AUDIT_NS_DEL_MNT msg=audit(1408577552.221:85): pid=481 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 dev=00:03 mntns=4 res=1
>
> 6/10 accepts a PID from userspace and requests logging an AUDIT_NS_INFO record
> type (CAP_AUDIT_CONTROL required).
>
> 7/10 is a macro for CLONE_NEW_* flags.
>
> 8/10 adds auditing on creation of namespace(s) in fork.
>
> 9/10 adds auditing a change of namespace on setns.
>
> 10/10 attaches a AUDIT_NS_INFO record to AUDIT_VIRT_CONTROL records
> (CAP_AUDIT_WRITE required).
>
>
> v5 -> v6:
> Switch to using namespace ID based on namespace proc inode minus base offset
> Added proc device ID to qualify proc inode reference
> Eliminate exposed /proc interface
>
> v4 -> v5:
> Clean up prototypes for dependencies on CONFIG_NAMESPACES.
> Add AUDIT_NS_INFO record type to AUDIT_VIRT_CONTROL record.
> Log AUDIT_NS_INFO with PID.
> Move /proc/<pid>/ns_* patches to end of patchset to deprecate them.
> Log on changing ns (setns).
> Log on creating new namespaces when forking.
> Added a macro for CLONE_NEW*.
>
> v3 -> v4:
> Seperate out the NS_INFO message from the SYSCALL message.
> Moved audit_log_namespace_info() out of audit_log_task_info().
> Use a seperate message type per namespace type for each of INIT/DEL.
> Make ns= easier to search across NS_INFO and NS_INIT/DEL_XXX msg types.
> Add /proc/<pid>/ns/ documentation.
> Fix dynamic initial ns logging.
>
> v2 -> v3:
> Use atomic64_t in ns_serial to simplify it.
> Avoid funciton duplication in proc, keying on dentry.
> Squash down audit patch to avoid rcu sleep issues.
> Add tracking for creation and deletion of namespace instances.
>
> v1 -> v2:
> Avoid rollover by switching from an int to a long long.
> Change rollover behaviour from simply avoiding zero to raising a BUG.
> Expose serial numbers in /proc/<pid>/ns/*_snum.
> Expose ns_entries and use it in audit.
>
>
> Notes:
> As for CAP_AUDIT_READ, a patchset has been accepted upstream to check
> capabilities of userspace processes that try to join netlink broadcast groups.
>
> This set does not try to solve the non-init namespace audit messages and
> auditd problem yet. That will come later, likely with additional auditd
> instances running in another namespace with a limited ability to influence the
> master auditd. I echo Eric B's idea that messages destined for different
> namespaces would have to be tailored for that namespace with references that
> make sense (such as the right pid number reported to that pid namespace, and
> not leaking info about parents or peers).
>
> Questions:
> Is there a way to link serial numbers of namespaces involved in migration of a
> container to another kernel? It sounds like what is needed is a part of a
> mangement application that is able to pull the audit records from constituent
> hosts to build an audit trail of a container.

I honestly don't know how much we are going to care about namespace ids
during migration. So far this is not a problem that has come up.

I don't think migration becomes a practical concern (other than
interface wise) until achieve a non-init namespace auditd. The easy way
to handle migration would be to log a setns of every process from their
old namespaces to their new namespaces. As you appear to have a setns
event defined.

How to handle the more general case beyond audit remains unclear. I
think it will be a little while yet before we start dealing with
migrating applications that care. When we do we will either need to
generate some kind of hot-plug event that userspace can respond to and
discover all of the appropriate file-system nodes have changed, or we
will need to build a mechanism in the kernel to preserve these numbers.

I really don't know which solution we will wind up with in the kernel at
this point.

> What additional events should list this information?

At least unshare.

> Does this present any problematic information leaks? Only CAP_AUDIT_CONTROL
> (and now CAP_AUDIT_READ) in init_user_ns can get to this information in
> the init namespace at the moment from audit.

Good question. Today access to this information is generally guarded
with CAP_SYS_PTRACE.

I suspect for some of audits tracing features like this one we should
also use CAP_SYS_PTRACE so that we have a consistent set of checks for
getting information about applications.

Eric


> Richard Guy Briggs (10):
> namespaces: expose ns_entries
> proc_ns: define PROC_*_INIT_INO in terms of PROC_DYNAMIC_FIRST
> audit: log namespace ID numbers
> audit: initialize at subsystem time rather than device time
> audit: log creation and deletion of namespace instances
> audit: dump namespace IDs for pid on receipt of AUDIT_NS_INFO
> sched: add a macro to ref all CLONE_NEW* flags
> fork: audit on creation of new namespace(s)
> audit: log on switching namespace (setns)
> audit: emit AUDIT_NS_INFO record with AUDIT_VIRT_CONTROL record
>
> fs/namespace.c | 13 +++
> fs/proc/generic.c | 3 +-
> fs/proc/namespaces.c | 2 +-
> include/linux/audit.h | 20 +++++
> include/linux/proc_ns.h | 10 ++-
> include/uapi/linux/audit.h | 21 +++++
> include/uapi/linux/sched.h | 6 ++
> ipc/namespace.c | 12 +++
> kernel/audit.c | 169 +++++++++++++++++++++++++++++++++++++-
> kernel/auditsc.c | 2 +
> kernel/fork.c | 3 +
> kernel/nsproxy.c | 4 +
> kernel/pid_namespace.c | 13 +++
> kernel/user_namespace.c | 13 +++
> kernel/utsname.c | 12 +++
> net/core/net_namespace.c | 12 +++
> security/integrity/ima/ima_api.c | 2 +
> 17 files changed, 309 insertions(+), 8 deletions(-)

2015-04-23 03:08:12

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH V6 00/10] namespaces: log namespaces per task

On 15/04/20, Eric W. Biederman wrote:
> Richard Guy Briggs <[email protected]> writes:
>
> > The purpose is to track namespace instances in use by logged processes from the
> > perspective of init_*_ns by logging the namespace IDs (device ID and namespace
> > inode - offset).
>
> In broad strokes the user interface appears correct.
>
> Things that I see that concern me:
>
> - After Als most recent changes these inodes no longer live in the proc
> superblock so the device number reported in these patches is
> incorrect.

Ok, found the patchset you're talking about:
3d3d35b kill proc_ns completely
e149ed2 take the targets of /proc/*/ns/* symlinks to separate fs
f77c801 bury struct proc_ns in fs/proc
33c4294 copy address of proc_ns_ops into ns_common
6344c43 new helpers: ns_alloc_inum/ns_free_inum
6496452 make proc_ns_operations work with struct ns_common * instead of void *
3c04118 switch the rest of proc_ns_operations to working with &...->ns
ff24870 netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
58be2825 make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
435d5f4 common object embedded into various struct ....ns

Ok, I've got some minor jigging to do to get inum too...

> - I am nervous about audit logs being flooded with users creating lots
> of namespaces. But that is more your lookout than mine.

There was a thought to create a filter to en/disable this logging...
It is an auxiliary record to syscalls, so they can be ignored by userspace tools.

> - unshare is not logging when it creates new namespaces.

They are all covered:
sys_unshare > unshare_userns > create_user_ns
sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_mnt_ns
sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_utsname > clone_uts_ns
sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_ipcs > get_ipc_ns
sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_pid_ns > create_pid_namespace
sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_net_ns

> As small numbers are nice and these inodes all live in their own
> superblock now we should be able to remove the games with
> PROC_DYNAMIC_FIRST and just use small numbers for these inodes
> everywhere.

That is compelling if I can untangle the proc inode allocation code from the
ida/idr. Should be as easy as defining a new ns_alloc_inum (and ns_free_inum)
to use instead of proc_alloc_inum with its own ns_inum_ida and ns_inum_lock,
then defining a NS_DYNAMIC_FIRST and defining NS_{IPC,UTS,USER,PID}_INIT_INO in
the place of the existing PROC_*_INIT_INO.

> I have answered your comments below.

More below...

> > 1/10 exposes proc's ns entries structure which lists a number of useful
> > operations per namespace type for other subsystems to use.
> >
> > 2/10 proc_ns: define PROC_*_INIT_INO in terms of PROC_DYNAMIC_FIRST
> >
> > 3/10 provides an example of usage for audit_log_task_info() which is used by
> > syscall audits, among others. audit_log_task() and audit_common_recv_message()
> > would be other potential use cases.
> >
> > Proposed output format:
> > This differs slightly from Aristeu's patch because of the label conflict with
> > "pid=" due to including it in existing records rather than it being a seperate
> > record. It has now returned to being a seperate record. The proc device
> > major/minor are listed in hexadecimal and namespace IDs are the proc inode
> > minus the base offset.
> > type=NS_INFO msg=audit(1408577535.306:82): dev=00:03 netns=3 utsns=-3 ipcns=-4 pidns=-1 userns=-2 mntns=0
> >
> > 4/10 change audit startup from __initcall to subsys_initcall to get it started
> > earlier to be able to receive initial namespace log messages.
> >
> > 5/10 tracks the creation and deletion of namespaces, listing the type of
> > namespace instance, proc device ID, related namespace id if there is one and
> > the newly minted namespace ID.
> >
> > Proposed output format for initial namespace creation:
> > type=AUDIT_NS_INIT_UTS msg=audit(1408577534.868:5): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_utsns=(none) utsns=-3 res=1
> > type=AUDIT_NS_INIT_USER msg=audit(1408577534.868:6): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_userns=(none) userns=-2 res=1
> > type=AUDIT_NS_INIT_PID msg=audit(1408577534.868:7): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_pidns=(none) pidns=-1 res=1
> > type=AUDIT_NS_INIT_MNT msg=audit(1408577534.868:8): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_mntns=(none) mntns=0 res=1
> > type=AUDIT_NS_INIT_IPC msg=audit(1408577534.868:9): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_ipcns=(none) ipcns=-4 res=1
> > type=AUDIT_NS_INIT_NET msg=audit(1408577533.500:10): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_netns=(none) netns=2 res=1
> >
> > And a CLONE action would result in:
> > type=type=AUDIT_NS_INIT_NET msg=audit(1408577535.306:81): pid=481 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 dev=00:03 old_netns=2 netns=3 res=1
> >
> > While deleting a namespace would result in:
> > type=type=AUDIT_NS_DEL_MNT msg=audit(1408577552.221:85): pid=481 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 dev=00:03 mntns=4 res=1
> >
> > 6/10 accepts a PID from userspace and requests logging an AUDIT_NS_INFO record
> > type (CAP_AUDIT_CONTROL required).
> >
> > 7/10 is a macro for CLONE_NEW_* flags.
> >
> > 8/10 adds auditing on creation of namespace(s) in fork.
> >
> > 9/10 adds auditing a change of namespace on setns.
> >
> > 10/10 attaches a AUDIT_NS_INFO record to AUDIT_VIRT_CONTROL records
> > (CAP_AUDIT_WRITE required).
> >
> >
> > v5 -> v6:
> > Switch to using namespace ID based on namespace proc inode minus base offset
> > Added proc device ID to qualify proc inode reference
> > Eliminate exposed /proc interface
> >
> > v4 -> v5:
> > Clean up prototypes for dependencies on CONFIG_NAMESPACES.
> > Add AUDIT_NS_INFO record type to AUDIT_VIRT_CONTROL record.
> > Log AUDIT_NS_INFO with PID.
> > Move /proc/<pid>/ns_* patches to end of patchset to deprecate them.
> > Log on changing ns (setns).
> > Log on creating new namespaces when forking.
> > Added a macro for CLONE_NEW*.
> >
> > v3 -> v4:
> > Seperate out the NS_INFO message from the SYSCALL message.
> > Moved audit_log_namespace_info() out of audit_log_task_info().
> > Use a seperate message type per namespace type for each of INIT/DEL.
> > Make ns= easier to search across NS_INFO and NS_INIT/DEL_XXX msg types.
> > Add /proc/<pid>/ns/ documentation.
> > Fix dynamic initial ns logging.
> >
> > v2 -> v3:
> > Use atomic64_t in ns_serial to simplify it.
> > Avoid funciton duplication in proc, keying on dentry.
> > Squash down audit patch to avoid rcu sleep issues.
> > Add tracking for creation and deletion of namespace instances.
> >
> > v1 -> v2:
> > Avoid rollover by switching from an int to a long long.
> > Change rollover behaviour from simply avoiding zero to raising a BUG.
> > Expose serial numbers in /proc/<pid>/ns/*_snum.
> > Expose ns_entries and use it in audit.
> >
> >
> > Notes:
> > As for CAP_AUDIT_READ, a patchset has been accepted upstream to check
> > capabilities of userspace processes that try to join netlink broadcast groups.
> >
> > This set does not try to solve the non-init namespace audit messages and
> > auditd problem yet. That will come later, likely with additional auditd
> > instances running in another namespace with a limited ability to influence the
> > master auditd. I echo Eric B's idea that messages destined for different
> > namespaces would have to be tailored for that namespace with references that
> > make sense (such as the right pid number reported to that pid namespace, and
> > not leaking info about parents or peers).
> >
> > Questions:
> > Is there a way to link serial numbers of namespaces involved in migration of a
> > container to another kernel? It sounds like what is needed is a part of a
> > mangement application that is able to pull the audit records from constituent
> > hosts to build an audit trail of a container.
>
> I honestly don't know how much we are going to care about namespace ids
> during migration. So far this is not a problem that has come up.

Not for CRIU, but it will be an issue for a container auditor that aggregates
information from individually auditted hosts.

> I don't think migration becomes a practical concern (other than
> interface wise) until achieve a non-init namespace auditd. The easy way
> to handle migration would be to log a setns of every process from their
> old namespaces to their new namespaces. As you appear to have a setns
> event defined.

Again, this would be taken care of by a layer above that is container-aware
across multiple hosts.

> How to handle the more general case beyond audit remains unclear. I
> think it will be a little while yet before we start dealing with
> migrating applications that care. When we do we will either need to
> generate some kind of hot-plug event that userspace can respond to and
> discover all of the appropriate file-system nodes have changed, or we
> will need to build a mechanism in the kernel to preserve these numbers.

I don't expect to need to preserve these numbers. The higher layer application
will be able to do that translation.

> I really don't know which solution we will wind up with in the kernel at
> this point.
>
> > What additional events should list this information?
>
> At least unshare.

Already covered as noted above. If it is a brand new namespace, it will show
the old one as "(none)" (or maybe zero now that we are looking at renumbering
the NS inodes). If it is an unshared one, it will show the old one from which
it was unshared.

> > Does this present any problematic information leaks? Only CAP_AUDIT_CONTROL
> > (and now CAP_AUDIT_READ) in init_user_ns can get to this information in
> > the init namespace at the moment from audit.
>
> Good question. Today access to this information is generally guarded
> with CAP_SYS_PTRACE.
>
> I suspect for some of audits tracing features like this one we should
> also use CAP_SYS_PTRACE so that we have a consistent set of checks for
> getting information about applications.

I assume CAP_SYS_PTRACE is orthogonal to CAP_AUDIT_{CONTROL,READ} and that
CAP_SYS_PTRACE would need to be insufficient to get that information.


Thanks for your thoughtful feedback, Eric.

> Eric
>
> > Richard Guy Briggs (10):
> > namespaces: expose ns_entries
> > proc_ns: define PROC_*_INIT_INO in terms of PROC_DYNAMIC_FIRST
> > audit: log namespace ID numbers
> > audit: initialize at subsystem time rather than device time
> > audit: log creation and deletion of namespace instances
> > audit: dump namespace IDs for pid on receipt of AUDIT_NS_INFO
> > sched: add a macro to ref all CLONE_NEW* flags
> > fork: audit on creation of new namespace(s)
> > audit: log on switching namespace (setns)
> > audit: emit AUDIT_NS_INFO record with AUDIT_VIRT_CONTROL record
> >
> > fs/namespace.c | 13 +++
> > fs/proc/generic.c | 3 +-
> > fs/proc/namespaces.c | 2 +-
> > include/linux/audit.h | 20 +++++
> > include/linux/proc_ns.h | 10 ++-
> > include/uapi/linux/audit.h | 21 +++++
> > include/uapi/linux/sched.h | 6 ++
> > ipc/namespace.c | 12 +++
> > kernel/audit.c | 169 +++++++++++++++++++++++++++++++++++++-
> > kernel/auditsc.c | 2 +
> > kernel/fork.c | 3 +
> > kernel/nsproxy.c | 4 +
> > kernel/pid_namespace.c | 13 +++
> > kernel/user_namespace.c | 13 +++
> > kernel/utsname.c | 12 +++
> > net/core/net_namespace.c | 12 +++
> > security/integrity/ima/ima_api.c | 2 +
> > 17 files changed, 309 insertions(+), 8 deletions(-)

- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

2015-04-23 20:44:47

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH V6 00/10] namespaces: log namespaces per task

On 15/04/22, Richard Guy Briggs wrote:
> On 15/04/20, Eric W. Biederman wrote:
> > Richard Guy Briggs <[email protected]> writes:
> >
> > > The purpose is to track namespace instances in use by logged processes from the
> > > perspective of init_*_ns by logging the namespace IDs (device ID and namespace
> > > inode - offset).
> >
> > In broad strokes the user interface appears correct.
> >
> > Things that I see that concern me:
> >
> > - After Als most recent changes these inodes no longer live in the proc
> > superblock so the device number reported in these patches is
> > incorrect.
>
> Ok, found the patchset you're talking about:
> 3d3d35b kill proc_ns completely
> e149ed2 take the targets of /proc/*/ns/* symlinks to separate fs
> f77c801 bury struct proc_ns in fs/proc
> 33c4294 copy address of proc_ns_ops into ns_common
> 6344c43 new helpers: ns_alloc_inum/ns_free_inum
> 6496452 make proc_ns_operations work with struct ns_common * instead of void *
> 3c04118 switch the rest of proc_ns_operations to working with &...->ns
> ff24870 netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
> 58be2825 make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
> 435d5f4 common object embedded into various struct ....ns
>
> Ok, I've got some minor jigging to do to get inum too...

Do I even need to report the device number anymore since I am concluding
s_dev is never set (or always zero) in the nsfs filesystem by
mount_pseudo() and isn't even mountable? In fact, I never needed to
report the device since proc ida/idr and inodes are kernel-global and
namespace-oblivious.

> > - I am nervous about audit logs being flooded with users creating lots
> > of namespaces. But that is more your lookout than mine.
>
> There was a thought to create a filter to en/disable this logging...
> It is an auxiliary record to syscalls, so they can be ignored by userspace tools.
>
> > - unshare is not logging when it creates new namespaces.
>
> They are all covered:
> sys_unshare > unshare_userns > create_user_ns
> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_mnt_ns
> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_utsname > clone_uts_ns
> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_ipcs > get_ipc_ns
> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_pid_ns > create_pid_namespace
> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_net_ns
>
> > As small numbers are nice and these inodes all live in their own
> > superblock now we should be able to remove the games with
> > PROC_DYNAMIC_FIRST and just use small numbers for these inodes
> > everywhere.
>
> That is compelling if I can untangle the proc inode allocation code from the
> ida/idr. Should be as easy as defining a new ns_alloc_inum (and ns_free_inum)
> to use instead of proc_alloc_inum with its own ns_inum_ida and ns_inum_lock,
> then defining a NS_DYNAMIC_FIRST and defining NS_{IPC,UTS,USER,PID}_INIT_INO in
> the place of the existing PROC_*_INIT_INO.
>
> > I have answered your comments below.
>
> More below...
>
> > > 1/10 exposes proc's ns entries structure which lists a number of useful
> > > operations per namespace type for other subsystems to use.
> > >
> > > 2/10 proc_ns: define PROC_*_INIT_INO in terms of PROC_DYNAMIC_FIRST
> > >
> > > 3/10 provides an example of usage for audit_log_task_info() which is used by
> > > syscall audits, among others. audit_log_task() and audit_common_recv_message()
> > > would be other potential use cases.
> > >
> > > Proposed output format:
> > > This differs slightly from Aristeu's patch because of the label conflict with
> > > "pid=" due to including it in existing records rather than it being a seperate
> > > record. It has now returned to being a seperate record. The proc device
> > > major/minor are listed in hexadecimal and namespace IDs are the proc inode
> > > minus the base offset.
> > > type=NS_INFO msg=audit(1408577535.306:82): dev=00:03 netns=3 utsns=-3 ipcns=-4 pidns=-1 userns=-2 mntns=0
> > >
> > > 4/10 change audit startup from __initcall to subsys_initcall to get it started
> > > earlier to be able to receive initial namespace log messages.
> > >
> > > 5/10 tracks the creation and deletion of namespaces, listing the type of
> > > namespace instance, proc device ID, related namespace id if there is one and
> > > the newly minted namespace ID.
> > >
> > > Proposed output format for initial namespace creation:
> > > type=AUDIT_NS_INIT_UTS msg=audit(1408577534.868:5): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_utsns=(none) utsns=-3 res=1
> > > type=AUDIT_NS_INIT_USER msg=audit(1408577534.868:6): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_userns=(none) userns=-2 res=1
> > > type=AUDIT_NS_INIT_PID msg=audit(1408577534.868:7): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_pidns=(none) pidns=-1 res=1
> > > type=AUDIT_NS_INIT_MNT msg=audit(1408577534.868:8): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_mntns=(none) mntns=0 res=1
> > > type=AUDIT_NS_INIT_IPC msg=audit(1408577534.868:9): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_ipcns=(none) ipcns=-4 res=1
> > > type=AUDIT_NS_INIT_NET msg=audit(1408577533.500:10): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_netns=(none) netns=2 res=1
> > >
> > > And a CLONE action would result in:
> > > type=type=AUDIT_NS_INIT_NET msg=audit(1408577535.306:81): pid=481 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 dev=00:03 old_netns=2 netns=3 res=1
> > >
> > > While deleting a namespace would result in:
> > > type=type=AUDIT_NS_DEL_MNT msg=audit(1408577552.221:85): pid=481 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 dev=00:03 mntns=4 res=1
> > >
> > > 6/10 accepts a PID from userspace and requests logging an AUDIT_NS_INFO record
> > > type (CAP_AUDIT_CONTROL required).
> > >
> > > 7/10 is a macro for CLONE_NEW_* flags.
> > >
> > > 8/10 adds auditing on creation of namespace(s) in fork.
> > >
> > > 9/10 adds auditing a change of namespace on setns.
> > >
> > > 10/10 attaches a AUDIT_NS_INFO record to AUDIT_VIRT_CONTROL records
> > > (CAP_AUDIT_WRITE required).
> > >
> > >
> > > v5 -> v6:
> > > Switch to using namespace ID based on namespace proc inode minus base offset
> > > Added proc device ID to qualify proc inode reference
> > > Eliminate exposed /proc interface
> > >
> > > v4 -> v5:
> > > Clean up prototypes for dependencies on CONFIG_NAMESPACES.
> > > Add AUDIT_NS_INFO record type to AUDIT_VIRT_CONTROL record.
> > > Log AUDIT_NS_INFO with PID.
> > > Move /proc/<pid>/ns_* patches to end of patchset to deprecate them.
> > > Log on changing ns (setns).
> > > Log on creating new namespaces when forking.
> > > Added a macro for CLONE_NEW*.
> > >
> > > v3 -> v4:
> > > Seperate out the NS_INFO message from the SYSCALL message.
> > > Moved audit_log_namespace_info() out of audit_log_task_info().
> > > Use a seperate message type per namespace type for each of INIT/DEL.
> > > Make ns= easier to search across NS_INFO and NS_INIT/DEL_XXX msg types.
> > > Add /proc/<pid>/ns/ documentation.
> > > Fix dynamic initial ns logging.
> > >
> > > v2 -> v3:
> > > Use atomic64_t in ns_serial to simplify it.
> > > Avoid funciton duplication in proc, keying on dentry.
> > > Squash down audit patch to avoid rcu sleep issues.
> > > Add tracking for creation and deletion of namespace instances.
> > >
> > > v1 -> v2:
> > > Avoid rollover by switching from an int to a long long.
> > > Change rollover behaviour from simply avoiding zero to raising a BUG.
> > > Expose serial numbers in /proc/<pid>/ns/*_snum.
> > > Expose ns_entries and use it in audit.
> > >
> > >
> > > Notes:
> > > As for CAP_AUDIT_READ, a patchset has been accepted upstream to check
> > > capabilities of userspace processes that try to join netlink broadcast groups.
> > >
> > > This set does not try to solve the non-init namespace audit messages and
> > > auditd problem yet. That will come later, likely with additional auditd
> > > instances running in another namespace with a limited ability to influence the
> > > master auditd. I echo Eric B's idea that messages destined for different
> > > namespaces would have to be tailored for that namespace with references that
> > > make sense (such as the right pid number reported to that pid namespace, and
> > > not leaking info about parents or peers).
> > >
> > > Questions:
> > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > container to another kernel? It sounds like what is needed is a part of a
> > > mangement application that is able to pull the audit records from constituent
> > > hosts to build an audit trail of a container.
> >
> > I honestly don't know how much we are going to care about namespace ids
> > during migration. So far this is not a problem that has come up.
>
> Not for CRIU, but it will be an issue for a container auditor that aggregates
> information from individually auditted hosts.
>
> > I don't think migration becomes a practical concern (other than
> > interface wise) until achieve a non-init namespace auditd. The easy way
> > to handle migration would be to log a setns of every process from their
> > old namespaces to their new namespaces. As you appear to have a setns
> > event defined.
>
> Again, this would be taken care of by a layer above that is container-aware
> across multiple hosts.
>
> > How to handle the more general case beyond audit remains unclear. I
> > think it will be a little while yet before we start dealing with
> > migrating applications that care. When we do we will either need to
> > generate some kind of hot-plug event that userspace can respond to and
> > discover all of the appropriate file-system nodes have changed, or we
> > will need to build a mechanism in the kernel to preserve these numbers.
>
> I don't expect to need to preserve these numbers. The higher layer application
> will be able to do that translation.
>
> > I really don't know which solution we will wind up with in the kernel at
> > this point.
> >
> > > What additional events should list this information?
> >
> > At least unshare.
>
> Already covered as noted above. If it is a brand new namespace, it will show
> the old one as "(none)" (or maybe zero now that we are looking at renumbering
> the NS inodes). If it is an unshared one, it will show the old one from which
> it was unshared.
>
> > > Does this present any problematic information leaks? Only CAP_AUDIT_CONTROL
> > > (and now CAP_AUDIT_READ) in init_user_ns can get to this information in
> > > the init namespace at the moment from audit.
> >
> > Good question. Today access to this information is generally guarded
> > with CAP_SYS_PTRACE.
> >
> > I suspect for some of audits tracing features like this one we should
> > also use CAP_SYS_PTRACE so that we have a consistent set of checks for
> > getting information about applications.
>
> I assume CAP_SYS_PTRACE is orthogonal to CAP_AUDIT_{CONTROL,READ} and that
> CAP_SYS_PTRACE would need to be insufficient to get that information.
>
>
> Thanks for your thoughtful feedback, Eric.
>
> > Eric
> >
> > > Richard Guy Briggs (10):
> > > namespaces: expose ns_entries
> > > proc_ns: define PROC_*_INIT_INO in terms of PROC_DYNAMIC_FIRST
> > > audit: log namespace ID numbers
> > > audit: initialize at subsystem time rather than device time
> > > audit: log creation and deletion of namespace instances
> > > audit: dump namespace IDs for pid on receipt of AUDIT_NS_INFO
> > > sched: add a macro to ref all CLONE_NEW* flags
> > > fork: audit on creation of new namespace(s)
> > > audit: log on switching namespace (setns)
> > > audit: emit AUDIT_NS_INFO record with AUDIT_VIRT_CONTROL record
> > >
> > > fs/namespace.c | 13 +++
> > > fs/proc/generic.c | 3 +-
> > > fs/proc/namespaces.c | 2 +-
> > > include/linux/audit.h | 20 +++++
> > > include/linux/proc_ns.h | 10 ++-
> > > include/uapi/linux/audit.h | 21 +++++
> > > include/uapi/linux/sched.h | 6 ++
> > > ipc/namespace.c | 12 +++
> > > kernel/audit.c | 169 +++++++++++++++++++++++++++++++++++++-
> > > kernel/auditsc.c | 2 +
> > > kernel/fork.c | 3 +
> > > kernel/nsproxy.c | 4 +
> > > kernel/pid_namespace.c | 13 +++
> > > kernel/user_namespace.c | 13 +++
> > > kernel/utsname.c | 12 +++
> > > net/core/net_namespace.c | 12 +++
> > > security/integrity/ima/ima_api.c | 2 +
> > > 17 files changed, 309 insertions(+), 8 deletions(-)
>
> - RGB

- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

2015-04-24 19:40:41

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH V6 00/10] namespaces: log namespaces per task

Richard Guy Briggs <[email protected]> writes:

> On 15/04/22, Richard Guy Briggs wrote:
>> On 15/04/20, Eric W. Biederman wrote:
>> > Richard Guy Briggs <[email protected]> writes:
>> >
>> > > The purpose is to track namespace instances in use by logged processes from the
>> > > perspective of init_*_ns by logging the namespace IDs (device ID and namespace
>> > > inode - offset).
>> >
>> > In broad strokes the user interface appears correct.
>> >
>> > Things that I see that concern me:
>> >
>> > - After Als most recent changes these inodes no longer live in the proc
>> > superblock so the device number reported in these patches is
>> > incorrect.
>>
>> Ok, found the patchset you're talking about:
>> 3d3d35b kill proc_ns completely
>> e149ed2 take the targets of /proc/*/ns/* symlinks to separate fs
>> f77c801 bury struct proc_ns in fs/proc
>> 33c4294 copy address of proc_ns_ops into ns_common
>> 6344c43 new helpers: ns_alloc_inum/ns_free_inum
>> 6496452 make proc_ns_operations work with struct ns_common * instead of void *
>> 3c04118 switch the rest of proc_ns_operations to working with &...->ns
>> ff24870 netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
>> 58be2825 make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
>> 435d5f4 common object embedded into various struct ....ns
>>
>> Ok, I've got some minor jigging to do to get inum too...
>
> Do I even need to report the device number anymore since I am concluding
> s_dev is never set (or always zero) in the nsfs filesystem by
> mount_pseudo() and isn't even mountable?

We still need the dev. We do have a device number get_anon_bdev fills it in.

> In fact, I never needed to
> report the device since proc ida/idr and inodes are kernel-global and
> namespace-oblivious.

This is the bit I really want to keep to be forward looking. If we
every need to preserve the inode numbers across a migration we could
have different super blocks with different inode numbers for the same
namespace.

>> > - I am nervous about audit logs being flooded with users creating lots
>> > of namespaces. But that is more your lookout than mine.
>>
>> There was a thought to create a filter to en/disable this logging...
>> It is an auxiliary record to syscalls, so they can be ignored by userspace tools.
>>
>> > - unshare is not logging when it creates new namespaces.
>>
>> They are all covered:
>> sys_unshare > unshare_userns > create_user_ns
>> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_mnt_ns
>> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_utsname > clone_uts_ns
>> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_ipcs > get_ipc_ns
>> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_pid_ns > create_pid_namespace
>> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_net_ns

Then why the special change to fork? That was not reflected on
the unshare path as far as I could see.

>> > As small numbers are nice and these inodes all live in their own
>> > superblock now we should be able to remove the games with
>> > PROC_DYNAMIC_FIRST and just use small numbers for these inodes
>> > everywhere.
>>
>> That is compelling if I can untangle the proc inode allocation code from the
>> ida/idr. Should be as easy as defining a new ns_alloc_inum (and ns_free_inum)
>> to use instead of proc_alloc_inum with its own ns_inum_ida and ns_inum_lock,
>> then defining a NS_DYNAMIC_FIRST and defining NS_{IPC,UTS,USER,PID}_INIT_INO in
>> the place of the existing PROC_*_INIT_INO.

Something like that. Just a new ida/idr allocator specific to that
superblock.

Yeah. It is somewhere on my todo, but I have been prioritizing getting
the bugs that look potentially expoloitable fixed in the mount
namespace. Al made things nice for one case but left a mess for a bunch
of others.

>> > I honestly don't know how much we are going to care about namespace ids
>> > during migration. So far this is not a problem that has come up.
>>
>> Not for CRIU, but it will be an issue for a container auditor that aggregates
>> information from individually auditted hosts.
>>
>> > I don't think migration becomes a practical concern (other than
>> > interface wise) until achieve a non-init namespace auditd. The easy way
>> > to handle migration would be to log a setns of every process from their
>> > old namespaces to their new namespaces. As you appear to have a setns
>> > event defined.
>>
>> Again, this would be taken care of by a layer above that is container-aware
>> across multiple hosts.


>> > How to handle the more general case beyond audit remains unclear. I
>> > think it will be a little while yet before we start dealing with
>> > migrating applications that care. When we do we will either need to
>> > generate some kind of hot-plug event that userspace can respond to and
>> > discover all of the appropriate file-system nodes have changed, or we
>> > will need to build a mechanism in the kernel to preserve these numbers.
>>
>> I don't expect to need to preserve these numbers. The higher layer application
>> will be able to do that translation.

We need to be very aware of what is happening.

The situation I am concerned about looks something like.

Program A:
fd1 = open(/proc/self/ns/net);
fstat(fd1, &stat1)

... later ...

fd2 = open(/var/run/netns/johnny);
fstat(fd2, &stat2);

if ((stat1.st_dev == stat2.st_dev) &&
(stat1.st_ino == stat2.st_ino)) {
/* Same netns do something... */
}


What happens when we migrate Program A with it's cached stat data of
of a network namespace file?

This requires either a hotplug event that Program A listens to or that
the inode number and device number are preserved across migration.

Exactly what we do depends on where we are when it comes up. But this
is not something some layer about the program can abstract it all out so
we don't need to worry about it.

Eric

2015-04-28 02:06:05

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH V6 00/10] namespaces: log namespaces per task

On 15/04/24, Eric W. Biederman wrote:
> Richard Guy Briggs <[email protected]> writes:
> > On 15/04/22, Richard Guy Briggs wrote:
> >> On 15/04/20, Eric W. Biederman wrote:
> >> > Richard Guy Briggs <[email protected]> writes:
> >> >
> >> > > The purpose is to track namespace instances in use by logged processes from the
> >> > > perspective of init_*_ns by logging the namespace IDs (device ID and namespace
> >> > > inode - offset).
> >> >
> >> > In broad strokes the user interface appears correct.
> >> >
> >> > Things that I see that concern me:
> >> >
> >> > - After Als most recent changes these inodes no longer live in the proc
> >> > superblock so the device number reported in these patches is
> >> > incorrect.
> >>
> >> Ok, found the patchset you're talking about:
> >> 3d3d35b kill proc_ns completely
> >> e149ed2 take the targets of /proc/*/ns/* symlinks to separate fs
> >> f77c801 bury struct proc_ns in fs/proc
> >> 33c4294 copy address of proc_ns_ops into ns_common
> >> 6344c43 new helpers: ns_alloc_inum/ns_free_inum
> >> 6496452 make proc_ns_operations work with struct ns_common * instead of void *
> >> 3c04118 switch the rest of proc_ns_operations to working with &...->ns
> >> ff24870 netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
> >> 58be2825 make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
> >> 435d5f4 common object embedded into various struct ....ns
> >>
> >> Ok, I've got some minor jigging to do to get inum too...
> >
> > Do I even need to report the device number anymore since I am concluding
> > s_dev is never set (or always zero) in the nsfs filesystem by
> > mount_pseudo() and isn't even mountable?
>
> We still need the dev. We do have a device number get_anon_bdev fills it in.

Fine, it has a device number. There appears to be only one of these
allocated per kernel. I can get it from &nsfs->fs_supers (and take the
first instance given by hlist_for_each_entry and verify there are no
others). Why do I need it, again?

> > In fact, I never needed to
> > report the device since proc ida/idr and inodes are kernel-global and
> > namespace-oblivious.
>
> This is the bit I really want to keep to be forward looking. If we
> every need to preserve the inode numbers across a migration we could
> have different super blocks with different inode numbers for the same
> namespace.

I don't quite follow your argument here, but can accept that in the
future we might add other namespace devices. I wonder if we might do
that augmentation later and leave out the device number for now...

> >> > - I am nervous about audit logs being flooded with users creating lots
> >> > of namespaces. But that is more your lookout than mine.
> >>
> >> There was a thought to create a filter to en/disable this logging...
> >> It is an auxiliary record to syscalls, so they can be ignored by userspace tools.
> >>
> >> > - unshare is not logging when it creates new namespaces.
> >>
> >> They are all covered:
> >> sys_unshare > unshare_userns > create_user_ns
> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_mnt_ns
> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_utsname > clone_uts_ns
> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_ipcs > get_ipc_ns
> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_pid_ns > create_pid_namespace
> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_net_ns
>
> Then why the special change to fork? That was not reflected on
> the unshare path as far as I could see.

Fork can specify more than one CLONE flag at once, so collecting them
all in one statementn seemed helpful. setns can only set one at a time.

> >> > As small numbers are nice and these inodes all live in their own
> >> > superblock now we should be able to remove the games with
> >> > PROC_DYNAMIC_FIRST and just use small numbers for these inodes
> >> > everywhere.
> >>
> >> That is compelling if I can untangle the proc inode allocation code from the
> >> ida/idr. Should be as easy as defining a new ns_alloc_inum (and ns_free_inum)
> >> to use instead of proc_alloc_inum with its own ns_inum_ida and ns_inum_lock,
> >> then defining a NS_DYNAMIC_FIRST and defining NS_{IPC,UTS,USER,PID}_INIT_INO in
> >> the place of the existing PROC_*_INIT_INO.
>
> Something like that. Just a new ida/idr allocator specific to that
> superblock.
>
> Yeah. It is somewhere on my todo, but I have been prioritizing getting
> the bugs that look potentially expoloitable fixed in the mount
> namespace. Al made things nice for one case but left a mess for a bunch
> of others.
>
> >> > I honestly don't know how much we are going to care about namespace ids
> >> > during migration. So far this is not a problem that has come up.
> >>
> >> Not for CRIU, but it will be an issue for a container auditor that aggregates
> >> information from individually auditted hosts.
> >>
> >> > I don't think migration becomes a practical concern (other than
> >> > interface wise) until achieve a non-init namespace auditd. The easy way
> >> > to handle migration would be to log a setns of every process from their
> >> > old namespaces to their new namespaces. As you appear to have a setns
> >> > event defined.
> >>
> >> Again, this would be taken care of by a layer above that is container-aware
> >> across multiple hosts.
>
>
> >> > How to handle the more general case beyond audit remains unclear. I
> >> > think it will be a little while yet before we start dealing with
> >> > migrating applications that care. When we do we will either need to
> >> > generate some kind of hot-plug event that userspace can respond to and
> >> > discover all of the appropriate file-system nodes have changed, or we
> >> > will need to build a mechanism in the kernel to preserve these numbers.
> >>
> >> I don't expect to need to preserve these numbers. The higher layer application
> >> will be able to do that translation.
>
> We need to be very aware of what is happening.
>
> The situation I am concerned about looks something like.
>
> Program A:
> fd1 = open(/proc/self/ns/net);
> fstat(fd1, &stat1)
>
> ... later ...
>
> fd2 = open(/var/run/netns/johnny);
> fstat(fd2, &stat2);
>
> if ((stat1.st_dev == stat2.st_dev) &&
> (stat1.st_ino == stat2.st_ino)) {
> /* Same netns do something... */
> }
>
>
> What happens when we migrate Program A with it's cached stat data of
> of a network namespace file?
>
> This requires either a hotplug event that Program A listens to or that
> the inode number and device number are preserved across migration.
>
> Exactly what we do depends on where we are when it comes up. But this
> is not something some layer about the program can abstract it all out so
> we don't need to worry about it.

Ok, understood, we can't just punt this one to a higher layer...

So this comes back to a question above, which is how do we determine
which device it is from? Sounds like we need something added to
ns_common or one of the 6 namespace types structs.

> Eric

- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

2015-04-28 02:20:56

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH V6 00/10] namespaces: log namespaces per task

Richard Guy Briggs <[email protected]> writes:

> On 15/04/24, Eric W. Biederman wrote:
>> Richard Guy Briggs <[email protected]> writes:
>> > On 15/04/22, Richard Guy Briggs wrote:
>> >> On 15/04/20, Eric W. Biederman wrote:
>> >> > Richard Guy Briggs <[email protected]> writes:
>> >> >
>> >
>> > Do I even need to report the device number anymore since I am concluding
>> > s_dev is never set (or always zero) in the nsfs filesystem by
>> > mount_pseudo() and isn't even mountable?
>>
>> We still need the dev. We do have a device number get_anon_bdev fills it in.
>
> Fine, it has a device number. There appears to be only one of these
> allocated per kernel. I can get it from &nsfs->fs_supers (and take the
> first instance given by hlist_for_each_entry and verify there are no
> others). Why do I need it, again?

Because if we have to preserve the inode number over a migration event I
want to preserve the fact that we are talking about inode numbers from a
superblock with a device number.

Otherwise known as I am allergic to kernel global identifiers, because
they can be major pains. I don't want to have to go back and implement
a namespace for namespaces.

>> >> They are all covered:
>> >> sys_unshare > unshare_userns > create_user_ns
>> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_mnt_ns
>> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_utsname > clone_uts_ns
>> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_ipcs > get_ipc_ns
>> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_pid_ns > create_pid_namespace
>> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_net_ns
>>
>> Then why the special change to fork? That was not reflected on
>> the unshare path as far as I could see.
>
> Fork can specify more than one CLONE flag at once, so collecting them
> all in one statementn seemed helpful. setns can only set one at a time.

unshare can also specify more than one CLONE flag at once.

I just pointed that out becase that seemed really unsymmetrical.

> Ok, understood, we can't just punt this one to a higher layer...
>
> So this comes back to a question above, which is how do we determine
> which device it is from? Sounds like we need something added to
> ns_common or one of the 6 namespace types structs.

Or we can just hard code reading it off of the appropriate magic
filesystem. Probably what we want is a well named helper function that
does the job.

I just care that when we talk about these things we are talking about
inode numbers from a superblock that is associated with a given device
number. That way I don't have nightmares about dealing with a namespace
for namespaces.

Eric

2015-05-05 16:06:41

by Steve Grubb

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

Hello,

I think there needs to be some more discussion around this. It seems like this
is not exactly recording things that are useful for audit.

On Friday, April 17, 2015 03:35:52 AM Richard Guy Briggs wrote:
> Log the creation and deletion of namespace instances in all 6 types of
> namespaces.
>
> Twelve new audit message types have been introduced:
> AUDIT_NS_INIT_MNT 1330 /* Record mount namespace instance creation
> */ AUDIT_NS_INIT_UTS 1331 /* Record UTS namespace instance
> creation */ AUDIT_NS_INIT_IPC 1332 /* Record IPC namespace
> instance creation */ AUDIT_NS_INIT_USER 1333 /* Record USER
> namespace instance creation */ AUDIT_NS_INIT_PID 1334 /* Record
> PID namespace instance creation */ AUDIT_NS_INIT_NET 1335 /*
> Record NET namespace instance creation */ AUDIT_NS_DEL_MNT 1336
> /* Record mount namespace instance deletion */ AUDIT_NS_DEL_UTS 1337
> /* Record UTS namespace instance deletion */ AUDIT_NS_DEL_IPC
> 1338 /* Record IPC namespace instance deletion */ AUDIT_NS_DEL_USER
> 1339 /* Record USER namespace instance deletion */ AUDIT_NS_DEL_PID
> 1340 /* Record PID namespace instance deletion */ AUDIT_NS_DEL_NET
> 1341 /* Record NET namespace instance deletion */

The requirements for auditing of containers should be derived from VPP. In it,
it asks for selectable auditing, selective audit, and selective audit review.
What this means is that we need the container and all its children to have one
identifier that is inserted into all the events that are associated with the
container.

With this, its possible to do a search for all events related to a container.
Its possible to exclude events from a container. Its possible to not get any
events.

The requirements also call out for the identification of the subject. This
means that the event should be bound to a syscall such as clone, setns, or
unshare.

Also, any user space events originating inside the container needs to have the
container ID added to the user space event - just like auid and session id.

Recording each instance of a name space is giving me something that I cannot
use to do queries required by the security target. Given these events, how do
I locate a web server event where it accesses a watched file? That
authentication failed? That an update within the container failed?

The requirements are that we have to log the creation, suspension, migration,
and termination of a container. The requirements are not on the individual
name space.

Maybe I'm missing how these events give me that. But I'd like to hear how I
would be able to meet requirements with these 12 events.

-Steve


> As suggested by Eric Paris, there are 12 message types, one for each of
> creation and deletion, one for each type of namespace so that text searches
> are easier in conjunction with the AUDIT_NS_INFO message type, being able
> to search for all records such as "netns=4 " and to avoid fields
> disappearing per message type to make ausearch more efficient.
>
> A typical startup would look roughly like:
>
> type=AUDIT_NS_INIT_UTS msg=audit(1408577534.868:5): pid=1 uid=0
> auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_utsns=(none)
> utsns=-2 res=1 type=AUDIT_NS_INIT_USER msg=audit(1408577534.868:6): pid=1
> uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03
> old_userns=(none) userns=-3 res=1 type=AUDIT_NS_INIT_PID
> msg=audit(1408577534.868:7): pid=1 uid=0 auid=4294967295 ses=4294967295
> subj=kernel dev=00:03 old_pidns=(none) pidns=-4 res=1
> type=AUDIT_NS_INIT_MNT msg=audit(1408577534.868:8): pid=1 uid=0
> auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_mntns=(none)
> mntns=0 res=1 type=AUDIT_NS_INIT_IPC msg=audit(1408577534.868:9): pid=1
> uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_ipcns=(none)
> ipcns=-1 res=1 type=AUDIT_NS_INIT_NET msg=audit(1408577533.500:10): pid=1
> uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_netns=(none)
> netns=2 res=1
>
> And a CLONE action would result in:
> type=type=AUDIT_NS_INIT_NET msg=audit(1408577535.306:81): pid=481 uid=0
> auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 dev=00:03
> old_netns=2 netns=3 res=1
>
> While deleting a namespace would result in:
> type=type=AUDIT_NS_DEL_MNT msg=audit(1408577552.221:85): pid=481 uid=0
> auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 dev=00:03
> mntns=4 res=1
>
> If not "(none)", old_XXXns lists the namespace from which it was cloned.
>
> Signed-off-by: Richard Guy Briggs <[email protected]>
> ---
> fs/namespace.c | 13 +++++++++
> include/linux/audit.h | 8 +++++
> include/uapi/linux/audit.h | 12 ++++++++
> ipc/namespace.c | 12 ++++++++
> kernel/audit.c | 64
> ++++++++++++++++++++++++++++++++++++++++++++ kernel/pid_namespace.c |
> 13 +++++++++
> kernel/user_namespace.c | 13 +++++++++
> kernel/utsname.c | 12 ++++++++
> net/core/net_namespace.c | 12 ++++++++
> 9 files changed, 159 insertions(+), 0 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 182bc41..7b62543 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -24,6 +24,7 @@
> #include <linux/proc_ns.h>
> #include <linux/magic.h>
> #include <linux/bootmem.h>
> +#include <linux/audit.h>
> #include "pnode.h"
> #include "internal.h"
>
> @@ -2459,6 +2460,7 @@ dput_out:
>
> static void free_mnt_ns(struct mnt_namespace *ns)
> {
> + audit_log_ns_del(AUDIT_NS_DEL_MNT, ns->proc_inum);
> proc_free_inum(ns->proc_inum);
> put_user_ns(ns->user_ns);
> kfree(ns);
> @@ -2518,6 +2520,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags,
> struct mnt_namespace *ns, new_ns = alloc_mnt_ns(user_ns);
> if (IS_ERR(new_ns))
> return new_ns;
> + audit_log_ns_init(AUDIT_NS_INIT_MNT, ns->proc_inum, new_ns->proc_inum);
>
> namespace_lock();
> /* First pass: copy the tree topology */
> @@ -2830,6 +2833,16 @@ static void __init init_mount_tree(void)
> set_fs_root(current->fs, &root);
> }
>
> +/* log the ID of init mnt namespace after audit service starts */
> +static int __init mnt_ns_init_log(void)
> +{
> + struct mnt_namespace *init_mnt_ns = init_task.nsproxy->mnt_ns;
> +
> + audit_log_ns_init(AUDIT_NS_INIT_MNT, 0, init_mnt_ns->proc_inum);
> + return 0;
> +}
> +late_initcall(mnt_ns_init_log);
> +
> void __init mnt_init(void)
> {
> unsigned u;
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 71698ec..b28dfb0 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -484,6 +484,9 @@ extern void audit_log_ns_info(struct
task_struct
> *tsk); static inline void audit_log_ns_info(struct task_struct *tsk) {
> }
> #endif
> +extern void audit_log_ns_init(int type, unsigned int old_inum,
> + unsigned int inum);
> +extern void audit_log_ns_del(int type, unsigned int inum);
>
> extern int audit_update_lsm_rules(void);
>
> @@ -542,6 +545,11 @@ static inline void audit_log_task_info(struct
> audit_buffer *ab, { }
> static inline void audit_log_ns_info(struct task_struct *tsk)
> { }
> +static inline int audit_log_ns_init(int type, unsigned int old_inum,
> + unsigned int inum)
> +{ }
> +static inline int audit_log_ns_del(int type, unsigned int inum)
> +{ }
> #define audit_enabled 0
> #endif /* CONFIG_AUDIT */
> static inline void audit_log_string(struct audit_buffer *ab, const char
> *buf) diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index 1ffb151..487cad6 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -111,6 +111,18 @@
> #define AUDIT_PROCTITLE 1327 /* Proctitle emit event */
> #define AUDIT_FEATURE_CHANGE 1328 /* audit log listing feature changes
*/
> #define AUDIT_NS_INFO 1329 /* Record process namespace IDs */
> +#define AUDIT_NS_INIT_MNT 1330 /* Record mount namespace instance
creation
> */ +#define AUDIT_NS_INIT_UTS 1331 /* Record UTS namespace instance
> creation */ +#define AUDIT_NS_INIT_IPC 1332 /* Record IPC namespace
> instance creation */ +#define AUDIT_NS_INIT_USER 1333 /* Record USER
> namespace instance creation */ +#define AUDIT_NS_INIT_PID 1334 /* Record
> PID namespace instance creation */ +#define AUDIT_NS_INIT_NET 1335 /*
> Record NET namespace instance creation */ +#define AUDIT_NS_DEL_MNT 1336
/*
> Record mount namespace instance deletion */ +#define
> AUDIT_NS_DEL_UTS 1337 /* Record UTS namespace instance deletion */
+#define
> AUDIT_NS_DEL_IPC 1338 /* Record IPC namespace instance deletion */
+#define
> AUDIT_NS_DEL_USER 1339 /* Record USER namespace instance deletion */
> +#define AUDIT_NS_DEL_PID 1340 /* Record PID namespace instance
deletion */
> +#define AUDIT_NS_DEL_NET 1341 /* Record NET namespace instance deletion
*/
>
> #define AUDIT_AVC 1400 /* SE Linux avc denial or grant */
> #define AUDIT_SELINUX_ERR 1401 /* Internal SE Linux Errors */
> diff --git a/ipc/namespace.c b/ipc/namespace.c
> index 59451c1..73727ce 100644
> --- a/ipc/namespace.c
> +++ b/ipc/namespace.c
> @@ -13,6 +13,7 @@
> #include <linux/mount.h>
> #include <linux/user_namespace.h>
> #include <linux/proc_ns.h>
> +#include <linux/audit.h>
>
> #include "util.h"
>
> @@ -41,6 +42,8 @@ static struct ipc_namespace *create_ipc_ns(struct
> user_namespace *user_ns, }
> atomic_inc(&nr_ipc_ns);
>
> + audit_log_ns_init(AUDIT_NS_INIT_IPC, old_ns->proc_inum, ns->proc_inum);
> +
> sem_init_ns(ns);
> msg_init_ns(ns);
> shm_init_ns(ns);
> @@ -119,6 +122,7 @@ static void free_ipc_ns(struct ipc_namespace *ns)
> */
> ipcns_notify(IPCNS_REMOVED);
> put_user_ns(ns->user_ns);
> + audit_log_ns_del(AUDIT_NS_DEL_IPC, ns->proc_inum);
> proc_free_inum(ns->proc_inum);
> kfree(ns);
> }
> @@ -197,3 +201,11 @@ const struct proc_ns_operations ipcns_operations = {
> .install = ipcns_install,
> .inum = ipcns_inum,
> };
> +
> +/* log the ID of init IPC namespace after audit service starts */
> +static int __init ipc_namespaces_init(void)
> +{
> + audit_log_ns_init(AUDIT_NS_INIT_IPC, 0, init_ipc_ns.proc_inum);
> + return 0;
> +}
> +late_initcall(ipc_namespaces_init);
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 63f32f4..e6230c4 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -1978,6 +1978,70 @@ out:
> kfree(name);
> }
>
> +#ifdef CONFIG_NAMESPACES
> +static char *ns_name[] = {
> + "mnt",
> + "uts",
> + "ipc",
> + "user",
> + "pid",
> + "net",
> +};
> +
> +/**
> + * audit_log_ns_init - report a namespace instance creation
> + * @type: type of audit namespace instance created message
> + * @old_inum: the ID number of the cloned namespace instance
> + * @inum: the ID number of the new namespace instance
> + */
> +void audit_log_ns_init(int type, unsigned int old_inum, unsigned int inum)
> +{
> + struct audit_buffer *ab;
> + char *audit_ns_name = ns_name[type - AUDIT_NS_INIT_MNT];
> + struct vfsmount *mnt = task_active_pid_ns(current)->proc_mnt;
> + struct super_block *sb = mnt->mnt_sb;
> + char old_ns[16];
> +
> + if (type < AUDIT_NS_INIT_MNT || type > AUDIT_NS_INIT_NET) {
> + WARN(1, "audit_log_ns_init: type:%d out of range", type);
> + return;
> + }
> + if (!old_inum)
> + sprintf(old_ns, "(none)");
> + else
> + sprintf(old_ns, "%d", old_inum - PROC_DYNAMIC_FIRST);
> + audit_log_common_recv_msg(&ab, type);
> + audit_log_format(ab, " dev=%02x:%02x old_%sns=%s %sns=%d res=1",
> + MAJOR(sb->s_dev), MINOR(sb->s_dev),
> + audit_ns_name, old_ns,
> + audit_ns_name, inum - PROC_DYNAMIC_FIRST);
> + audit_log_end(ab);
> +}
> +
> +/**
> + * audit_log_ns_del - report a namespace instance deleted
> + * @type: type of audit namespace instance deleted message
> + * @inum: the ID number of the namespace instance
> + */
> +void audit_log_ns_del(int type, unsigned int inum)
> +{
> + struct audit_buffer *ab;
> + char *audit_ns_name = ns_name[type - AUDIT_NS_DEL_MNT];
> + struct vfsmount *mnt = task_active_pid_ns(current)->proc_mnt;
> + struct super_block *sb = mnt->mnt_sb;
> +
> + if (type < AUDIT_NS_DEL_MNT || type > AUDIT_NS_DEL_NET) {
> + WARN(1, "audit_log_ns_del: type:%d out of range", type);
> + return;
> + }
> + audit_log_common_recv_msg(&ab, type);
> + audit_log_format(ab, " dev=%02x:%02x %sns=%d res=1",
> + MAJOR(sb->s_dev), MINOR(sb->s_dev), audit_ns_name,
> + inum - PROC_DYNAMIC_FIRST);
> + audit_log_end(ab);
> +}
> +#endif /* CONFIG_NAMESPACES */
> +
> /**
> * audit_log_end - end one audit record
> * @ab: the audit_buffer
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index db95d8e..d28fd14 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -18,6 +18,7 @@
> #include <linux/proc_ns.h>
> #include <linux/reboot.h>
> #include <linux/export.h>
> +#include <linux/audit.h>
>
> struct pid_cache {
> int nr_ids;
> @@ -109,6 +110,9 @@ static struct pid_namespace *create_pid_namespace(struct
> user_namespace *user_ns if (err)
> goto out_free_map;
>
> + audit_log_ns_init(AUDIT_NS_INIT_PID, parent_pid_ns->proc_inum,
> + ns->proc_inum);
> +
> kref_init(&ns->kref);
> ns->level = level;
> ns->parent = get_pid_ns(parent_pid_ns);
> @@ -142,6 +146,7 @@ static void destroy_pid_namespace(struct pid_namespace
> *ns) {
> int i;
>
> + audit_log_ns_del(AUDIT_NS_DEL_PID, ns->proc_inum);
> proc_free_inum(ns->proc_inum);
> for (i = 0; i < PIDMAP_ENTRIES; i++)
> kfree(ns->pidmap[i].page);
> @@ -388,3 +393,11 @@ static __init int pid_namespaces_init(void)
> }
>
> __initcall(pid_namespaces_init);
> +
> +/* log the ID of init PID namespace after audit service starts */
> +static __init int pid_namespaces_late_init(void)
> +{
> + audit_log_ns_init(AUDIT_NS_INIT_PID, 0, init_pid_ns.proc_inum);
> + return 0;
> +}
> +late_initcall(pid_namespaces_late_init);
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index fcc0256..89c2517 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -22,6 +22,7 @@
> #include <linux/ctype.h>
> #include <linux/projid.h>
> #include <linux/fs_struct.h>
> +#include <linux/audit.h>
>
> static struct kmem_cache *user_ns_cachep __read_mostly;
>
> @@ -92,6 +93,9 @@ int create_user_ns(struct cred *new)
> return ret;
> }
>
> + audit_log_ns_init(AUDIT_NS_INIT_USER, parent_ns->proc_inum,
> + ns->proc_inum);
> +
> atomic_set(&ns->count, 1);
> /* Leave the new->user_ns reference with the new user namespace. */
> ns->parent = parent_ns;
> @@ -136,6 +140,7 @@ void free_user_ns(struct user_namespace *ns)
> #ifdef CONFIG_PERSISTENT_KEYRINGS
> key_put(ns->persistent_keyring_register);
> #endif
> + audit_log_ns_del(AUDIT_NS_DEL_USER, ns->proc_inum);
> proc_free_inum(ns->proc_inum);
> kmem_cache_free(user_ns_cachep, ns);
> ns = parent;
> @@ -909,3 +914,11 @@ static __init int user_namespaces_init(void)
> return 0;
> }
> subsys_initcall(user_namespaces_init);
> +
> +/* log the ID of init user namespace after audit service starts */
> +static __init int user_namespaces_late_init(void)
> +{
> + audit_log_ns_init(AUDIT_NS_INIT_USER, 0, init_user_ns.proc_inum);
> + return 0;
> +}
> +late_initcall(user_namespaces_late_init);
> diff --git a/kernel/utsname.c b/kernel/utsname.c
> index fd39312..fa21e8d 100644
> --- a/kernel/utsname.c
> +++ b/kernel/utsname.c
> @@ -16,6 +16,7 @@
> #include <linux/slab.h>
> #include <linux/user_namespace.h>
> #include <linux/proc_ns.h>
> +#include <linux/audit.h>
>
> static struct uts_namespace *create_uts_ns(void)
> {
> @@ -48,6 +49,8 @@ static struct uts_namespace *clone_uts_ns(struct
> user_namespace *user_ns, return ERR_PTR(err);
> }
>
> + audit_log_ns_init(AUDIT_NS_INIT_UTS, old_ns->proc_inum, ns->proc_inum);
> +
> down_read(&uts_sem);
> memcpy(&ns->name, &old_ns->name, sizeof(ns->name));
> ns->user_ns = get_user_ns(user_ns);
> @@ -84,6 +87,7 @@ void free_uts_ns(struct kref *kref)
>
> ns = container_of(kref, struct uts_namespace, kref);
> put_user_ns(ns->user_ns);
> + audit_log_ns_del(AUDIT_NS_DEL_UTS, ns->proc_inum);
> proc_free_inum(ns->proc_inum);
> kfree(ns);
> }
> @@ -138,3 +142,11 @@ const struct proc_ns_operations utsns_operations = {
> .install = utsns_install,
> .inum = utsns_inum,
> };
> +
> +/* log the ID of init UTS namespace after audit service starts */
> +static int __init uts_namespaces_init(void)
> +{
> + audit_log_ns_init(AUDIT_NS_INIT_UTS, 0, init_uts_ns.proc_inum);
> + return 0;
> +}
> +late_initcall(uts_namespaces_init);
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 85b6269..562eb85 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -17,6 +17,7 @@
> #include <linux/user_namespace.h>
> #include <net/net_namespace.h>
> #include <net/netns/generic.h>
> +#include <linux/audit.h>
>
> /*
> * Our network namespace constructor/destructor lists
> @@ -253,6 +254,8 @@ struct net *copy_net_ns(unsigned long flags,
> mutex_lock(&net_mutex);
> rv = setup_net(net, user_ns);
> if (rv == 0) {
> + audit_log_ns_init(AUDIT_NS_INIT_NET, old_net->proc_inum,
> + net->proc_inum);
> rtnl_lock();
> list_add_tail_rcu(&net->list, &net_namespace_list);
> rtnl_unlock();
> @@ -389,6 +392,7 @@ static __net_init int net_ns_net_init(struct net *net)
>
> static __net_exit void net_ns_net_exit(struct net *net)
> {
> + audit_log_ns_del(AUDIT_NS_DEL_NET, net->proc_inum);
> proc_free_inum(net->proc_inum);
> }
>
> @@ -435,6 +439,14 @@ static int __init net_ns_init(void)
>
> pure_initcall(net_ns_init);
>
> +/* log the ID of init_net namespace after audit service starts */
> +static int __init net_ns_init_log(void)
> +{
> + audit_log_ns_init(AUDIT_NS_INIT_NET, 0, init_net.proc_inum);
> + return 0;
> +}
> +late_initcall(net_ns_init_log);
> +
> #ifdef CONFIG_NET_NS
> static int __register_pernet_operations(struct list_head *list,
> struct pernet_operations *ops)

2015-05-05 16:07:01

by Aristeu Rozanski

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

Hi Steve,
On Tue, May 05, 2015 at 10:22:32AM -0400, Steve Grubb wrote:
> The requirements for auditing of containers should be derived from VPP. In it,
> it asks for selectable auditing, selective audit, and selective audit review.
> What this means is that we need the container and all its children to have one
> identifier that is inserted into all the events that are associated with the
> container.
>
> With this, its possible to do a search for all events related to a container.
> Its possible to exclude events from a container. Its possible to not get any
> events.
>
> The requirements also call out for the identification of the subject. This
> means that the event should be bound to a syscall such as clone, setns, or
> unshare.
>
> Also, any user space events originating inside the container needs to have the
> container ID added to the user space event - just like auid and session id.
>
> Recording each instance of a name space is giving me something that I cannot
> use to do queries required by the security target. Given these events, how do
> I locate a web server event where it accesses a watched file? That
> authentication failed? That an update within the container failed?
>
> The requirements are that we have to log the creation, suspension, migration,
> and termination of a container. The requirements are not on the individual
> name space.
>
> Maybe I'm missing how these events give me that. But I'd like to hear how I
> would be able to meet requirements with these 12 events.

what about cases you don't use lxc, libvirt to create namespaces? It's
easier if the logging is done by namespaces and in case they're created
by any container manager, it can generate a new event notifying it
created a container named "foo" with these namespaces: x, y, z, w and
from that you can piece together everything that happened. Userspace
tools can change to adapt to using namespaces and the idea of container
to make it easier to lookup for events instead of relying on a number
that might not be there (think someone using unshare, ip netns, ...). It
was discussed in the past and having the concept of "container" in
kernel space and it's not going to happen, so userspace should deal with
it.

--
Aristeu

2015-05-05 16:08:09

by Steve Grubb

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On Tuesday, May 05, 2015 10:31:20 AM Aristeu Rozanski wrote:
> Hi Steve,
>
> On Tue, May 05, 2015 at 10:22:32AM -0400, Steve Grubb wrote:
> > The requirements for auditing of containers should be derived from VPP. In
> > it, it asks for selectable auditing, selective audit, and selective audit
> > review. What this means is that we need the container and all its
> > children to have one identifier that is inserted into all the events that
> > are associated with the container.
> >
> > With this, its possible to do a search for all events related to a
> > container. Its possible to exclude events from a container. Its possible
> > to not get any events.
> >
> > The requirements also call out for the identification of the subject. This
> > means that the event should be bound to a syscall such as clone, setns, or
> > unshare.
> >
> > Also, any user space events originating inside the container needs to have
> > the container ID added to the user space event - just like auid and
> > session id.
> >
> > Recording each instance of a name space is giving me something that I
> > cannot use to do queries required by the security target. Given these
> > events, how do I locate a web server event where it accesses a watched
> > file? That authentication failed? That an update within the container
> > failed?
> >
> > The requirements are that we have to log the creation, suspension,
> > migration, and termination of a container. The requirements are not on
> > the individual name space.
> >
> > Maybe I'm missing how these events give me that. But I'd like to hear how
> > I would be able to meet requirements with these 12 events.
>
> what about cases you don't use lxc, libvirt to create namespaces?

There's a pretty good chance that we don't care. We've had file system
namespace for about 8 or 9 years and we never needed to have a namespace
identifier added.

> It's easier if the logging is done by namespaces and in case they're created
> by any container manager, it can generate a new event notifying it
> created a container named "foo" with these namespaces: x, y, z, w and
> from that you can piece together everything that happened.

OK, if they are emitted they should be an auxiliary record to clone, setns, or
unshare system calls. But lets go down this path. We have 6 or so name spaces.
These identifiers will need to be added to every single event in the system so
that I can figure out what event belongs to which container.


> Userspace tools can change to adapt to using namespaces and the idea of
> container to make it easier to lookup for events instead of relying on a
> number that might not be there (think someone using unshare, ip netns, ...).

That's what I am trying to do...figure out how I can these identifiers to see if
this actually solves the problem. This is why I wanted to state the actual
requirements. Its easy to lose the overall view.

Also, I am concerned about how much extra disk space this is going to eat up.


> It was discussed in the past and having the concept of "container" in
> kernel space and it's not going to happen, so userspace should deal with
> it.

This is what I am asking for help with. How do I locate an authentication
event from container using the information in these events?

-Steve

2015-05-05 16:08:30

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

Steve Grubb <[email protected]> writes:

> The requirements for auditing of containers should be derived from VPP. In it,
> it asks for selectable auditing, selective audit, and selective audit review.
> What this means is that we need the container and all its children to have one
> identifier that is inserted into all the events that are associated with the
> container.

That is technically impossible. Nested containers exist.

That is when container G is nested in container F which is in turn
nested in container E which is in turn nested in container D which is in
turn nested in container C which is in turn nested in container B which
is nested in container A there is no one label you can put on audit
messages from container G which is the ``correct'' one.

Or are you proposing that something in container G have labels
A B C D E F G included on every audit message? That introduces enough
complexity in generating and parsing the messages I wouldn't trust those
messages as the least bug in generation and parsing would be a security
issue.

What is the world is VPP? It sounds like something non-public thing.
Certainly it has never been a part of the public container discussion
and as such it appears to be completely ridiculous to bring up in a
public discussion.

Eric

2015-05-05 16:09:47

by Steve Grubb

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On Tuesday, May 05, 2015 09:56:03 AM Eric W. Biederman wrote:
> Steve Grubb <[email protected]> writes:
> > The requirements for auditing of containers should be derived from VPP. In
> > it, it asks for selectable auditing, selective audit, and selective audit
> > review. What this means is that we need the container and all its
> > children to have one identifier that is inserted into all the events that
> > are associated with the container.
>
> That is technically impossible. Nested containers exist.

OK, then lets talk about that, too. When something is 2 layers deep, the
outside world cannot make sense of it. The inner one can be a loopback mounted
file in the outer one. That means that I need the container itself to be
responsible for events so that things are recorded using paths, uids, and pids
that make sense to it. It can enrich the events and send them to the outer
container.


> That is when container G is nested in container F which is in turn
> nested in container E which is in turn nested in container D which is in
> turn nested in container C which is in turn nested in container B which
> is nested in container A there is no one label you can put on audit
> messages from container G which is the ``correct'' one.
>
> Or are you proposing that something in container G have labels
> A B C D E F G included on every audit message?

We need to have audit events to either be globally tagged so that the outside
world understand what happening no matter how deep. Or we need each layer to
be responsible for itself. This means having an audit rule match engine for
each namespace like netfilter is to networking.


> That introduces enough complexity in generating and parsing the messages I
> wouldn't trust those messages as the least bug in generation and parsing
> would be a security issue.

That goes with the territory.


> What is the world is VPP?

Virtualization Protection Profile. Before people say it doesn't apply, it kind
of does. It defines the necessary security mechanisms for either full blown
virt like QEMU/Xen based or it gives enough wiggle room for containers and
other types of VMs. Specifically, it defines the audit requirements needed for
this kind of technology.


> It sounds like something non-public thing. Certainly it has never been a
> part of the public container discussion and as such it appears to be
> completely ridiculous to bring up in a public discussion.

No, its a public thing. Audit requirements start in section 5.2:

https://www.niap-ccevs.org/pp/PP_SV_V1.0/

-Steve

2015-05-08 14:42:58

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH V6 00/10] namespaces: log namespaces per task

On 15/04/27, Eric W. Biederman wrote:
> Richard Guy Briggs <[email protected]> writes:
> > On 15/04/24, Eric W. Biederman wrote:
> >> Richard Guy Briggs <[email protected]> writes:
> >> > On 15/04/22, Richard Guy Briggs wrote:
> >> >> On 15/04/20, Eric W. Biederman wrote:
> >> >> > Richard Guy Briggs <[email protected]> writes:
> >> > Do I even need to report the device number anymore since I am concluding
> >> > s_dev is never set (or always zero) in the nsfs filesystem by
> >> > mount_pseudo() and isn't even mountable?
> >>
> >> We still need the dev. We do have a device number get_anon_bdev fills it in.
> >
> > Fine, it has a device number. There appears to be only one of these
> > allocated per kernel. I can get it from &nsfs->fs_supers (and take the
> > first instance given by hlist_for_each_entry and verify there are no
> > others). Why do I need it, again?
>
> Because if we have to preserve the inode number over a migration event I
> want to preserve the fact that we are talking about inode numbers from a
> superblock with a device number.
>
> Otherwise known as I am allergic to kernel global identifiers, because
> they can be major pains. I don't want to have to go back and implement
> a namespace for namespaces.

Alright, I'll change the device over to that... We can figure out how
to select the correct device number of nsfs instances if it increases
beyond one.

> >> >> They are all covered:
> >> >> sys_unshare > unshare_userns > create_user_ns
> >> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_mnt_ns
> >> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_utsname > clone_uts_ns
> >> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_ipcs > get_ipc_ns
> >> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_pid_ns > create_pid_namespace
> >> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_net_ns
> >>
> >> Then why the special change to fork? That was not reflected on
> >> the unshare path as far as I could see.
> >
> > Fork can specify more than one CLONE flag at once, so collecting them
> > all in one statementn seemed helpful. setns can only set one at a time.
>
> unshare can also specify more than one CLONE flag at once.
> I just pointed that out becase that seemed really unsymmetrical.

Ah sorry, my mistake, I was thinking setns... I've added a call in
sys_unshare().

> > Ok, understood, we can't just punt this one to a higher layer...
> >
> > So this comes back to a question above, which is how do we determine
> > which device it is from? Sounds like we need something added to
> > ns_common or one of the 6 namespace types structs.
>
> Or we can just hard code reading it off of the appropriate magic
> filesystem. Probably what we want is a well named helper function that
> does the job.

There is a bit of overhead to read that, so I've added a dev_t member to
ns_common. Simplest way I found was to call iterate_supers() since
struct file_system_type *nsfs isn't exposed.

> I just care that when we talk about these things we are talking about
> inode numbers from a superblock that is associated with a given device
> number. That way I don't have nightmares about dealing with a namespace
> for namespaces.
>
> Eric

- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

2015-05-12 19:58:09

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On 15/05/05, Steve Grubb wrote:
> Hello,
>
> I think there needs to be some more discussion around this. It seems like this
> is not exactly recording things that are useful for audit.

It seems to me that either audit has to assemble that information, or
the kernel has to do so. The kernel doesn't know about containers
(yet?).

> On Friday, April 17, 2015 03:35:52 AM Richard Guy Briggs wrote:
> > Log the creation and deletion of namespace instances in all 6 types of
> > namespaces.
> >
> > Twelve new audit message types have been introduced:
> > AUDIT_NS_INIT_MNT 1330 /* Record mount namespace instance creation
> > */ AUDIT_NS_INIT_UTS 1331 /* Record UTS namespace instance
> > creation */ AUDIT_NS_INIT_IPC 1332 /* Record IPC namespace
> > instance creation */ AUDIT_NS_INIT_USER 1333 /* Record USER
> > namespace instance creation */ AUDIT_NS_INIT_PID 1334 /* Record
> > PID namespace instance creation */ AUDIT_NS_INIT_NET 1335 /*
> > Record NET namespace instance creation */ AUDIT_NS_DEL_MNT 1336
> > /* Record mount namespace instance deletion */ AUDIT_NS_DEL_UTS 1337
> > /* Record UTS namespace instance deletion */ AUDIT_NS_DEL_IPC
> > 1338 /* Record IPC namespace instance deletion */ AUDIT_NS_DEL_USER
> > 1339 /* Record USER namespace instance deletion */ AUDIT_NS_DEL_PID
> > 1340 /* Record PID namespace instance deletion */ AUDIT_NS_DEL_NET
> > 1341 /* Record NET namespace instance deletion */
>
> The requirements for auditing of containers should be derived from VPP. In it,
> it asks for selectable auditing, selective audit, and selective audit review.
> What this means is that we need the container and all its children to have one
> identifier that is inserted into all the events that are associated with the
> container.

Is that requirement for the records that are sent from the kernel, or
for the records stored by auditd, or by another facility that delivers
those records to a final consumer?

> With this, its possible to do a search for all events related to a container.
> Its possible to exclude events from a container. Its possible to not get any
> events.
>
> The requirements also call out for the identification of the subject. This
> means that the event should be bound to a syscall such as clone, setns, or
> unshare.

Is it useful to have a reference of the init namespace set from which
all others are spawned?

If it isn't bound, I assume the subject should be added to the message
format? I'm thinking of messages without an audit_context such as audit
user messages (such as AUDIT_NS_INFO and AUDIT_VIRT_CONTROL).

For now, we should not need to log namespaces with AUDIT_FEATURE_CHANGE
or AUDIT_CONFIG_CHANGE messages since only initial user namespace with
initial pid namespace has permission to do so. This will need to be
addressed by having non-init config changes be limited to that container
or set of namespaces and possibly its children. The other possibility
is to add the subject to the stand-alone message.

> Also, any user space events originating inside the container needs to have the
> container ID added to the user space event - just like auid and session id.

This sounds like every task needs to record a container ID since that
information is otherwise unknown by the kernel except by what might be
provided by an audit user message such as AUDIT_VIRT_CONTROL or possibly
the new AUDIT_NS_INFO request. It could be stored in struct task_struct
or in struct audit_context. I don't have a suggestion on how to get
that information securely into the kernel.

> Recording each instance of a name space is giving me something that I cannot
> use to do queries required by the security target. Given these events, how do
> I locate a web server event where it accesses a watched file? That
> authentication failed? That an update within the container failed?
>
> The requirements are that we have to log the creation, suspension, migration,
> and termination of a container. The requirements are not on the individual
> name space.

Ok. Do we have a robust definition of a container? Where is that
definition managed? If it is a userspace concept, then I think either
userspace should be assembling this information, or providing that
information to the entity that will be expected to know about and
provide it.

> Maybe I'm missing how these events give me that. But I'd like to hear how I
> would be able to meet requirements with these 12 events.

Adding the infrastructure to give each of those 12 events an audit
context to be able to give meaningful subject fields in audit records
appears to require adding a struct task_struct argument to calls to
copy_mnt_ns(), copy_utsname(), copy_ipcs(), copy_pid_ns(),
copy_net_ns(), create_user_ns() unless I use current. I think we must
use current since the userns is created before the spawned process is
mature or has an audit context in the case of clone.

Either that, or I have mis-understood and I should be stashing this
namespace ID information in an audit_aux_data structure or a more
permanent part of struct audit_context to be printed when required on
syscall exit. I'm trying to think through if it is needed in any
non-syscall audit messages.

Another RFC patch set coming...

> -Steve
>
> > As suggested by Eric Paris, there are 12 message types, one for each of
> > creation and deletion, one for each type of namespace so that text searches
> > are easier in conjunction with the AUDIT_NS_INFO message type, being able
> > to search for all records such as "netns=4 " and to avoid fields
> > disappearing per message type to make ausearch more efficient.
> >
> > A typical startup would look roughly like:
> >
> > type=AUDIT_NS_INIT_UTS msg=audit(1408577534.868:5): pid=1 uid=0
> > auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_utsns=(none)
> > utsns=-2 res=1 type=AUDIT_NS_INIT_USER msg=audit(1408577534.868:6): pid=1
> > uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03
> > old_userns=(none) userns=-3 res=1 type=AUDIT_NS_INIT_PID
> > msg=audit(1408577534.868:7): pid=1 uid=0 auid=4294967295 ses=4294967295
> > subj=kernel dev=00:03 old_pidns=(none) pidns=-4 res=1
> > type=AUDIT_NS_INIT_MNT msg=audit(1408577534.868:8): pid=1 uid=0
> > auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_mntns=(none)
> > mntns=0 res=1 type=AUDIT_NS_INIT_IPC msg=audit(1408577534.868:9): pid=1
> > uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_ipcns=(none)
> > ipcns=-1 res=1 type=AUDIT_NS_INIT_NET msg=audit(1408577533.500:10): pid=1
> > uid=0 auid=4294967295 ses=4294967295 subj=kernel dev=00:03 old_netns=(none)
> > netns=2 res=1
> >
> > And a CLONE action would result in:
> > type=type=AUDIT_NS_INIT_NET msg=audit(1408577535.306:81): pid=481 uid=0
> > auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 dev=00:03
> > old_netns=2 netns=3 res=1
> >
> > While deleting a namespace would result in:
> > type=type=AUDIT_NS_DEL_MNT msg=audit(1408577552.221:85): pid=481 uid=0
> > auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 dev=00:03
> > mntns=4 res=1
> >
> > If not "(none)", old_XXXns lists the namespace from which it was cloned.
> >
> > Signed-off-by: Richard Guy Briggs <[email protected]>
> > ---
> > fs/namespace.c | 13 +++++++++
> > include/linux/audit.h | 8 +++++
> > include/uapi/linux/audit.h | 12 ++++++++
> > ipc/namespace.c | 12 ++++++++
> > kernel/audit.c | 64
> > ++++++++++++++++++++++++++++++++++++++++++++ kernel/pid_namespace.c |
> > 13 +++++++++
> > kernel/user_namespace.c | 13 +++++++++
> > kernel/utsname.c | 12 ++++++++
> > net/core/net_namespace.c | 12 ++++++++
> > 9 files changed, 159 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 182bc41..7b62543 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -24,6 +24,7 @@
> > #include <linux/proc_ns.h>
> > #include <linux/magic.h>
> > #include <linux/bootmem.h>
> > +#include <linux/audit.h>
> > #include "pnode.h"
> > #include "internal.h"
> >
> > @@ -2459,6 +2460,7 @@ dput_out:
> >
> > static void free_mnt_ns(struct mnt_namespace *ns)
> > {
> > + audit_log_ns_del(AUDIT_NS_DEL_MNT, ns->proc_inum);
> > proc_free_inum(ns->proc_inum);
> > put_user_ns(ns->user_ns);
> > kfree(ns);
> > @@ -2518,6 +2520,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags,
> > struct mnt_namespace *ns, new_ns = alloc_mnt_ns(user_ns);
> > if (IS_ERR(new_ns))
> > return new_ns;
> > + audit_log_ns_init(AUDIT_NS_INIT_MNT, ns->proc_inum, new_ns->proc_inum);
> >
> > namespace_lock();
> > /* First pass: copy the tree topology */
> > @@ -2830,6 +2833,16 @@ static void __init init_mount_tree(void)
> > set_fs_root(current->fs, &root);
> > }
> >
> > +/* log the ID of init mnt namespace after audit service starts */
> > +static int __init mnt_ns_init_log(void)
> > +{
> > + struct mnt_namespace *init_mnt_ns = init_task.nsproxy->mnt_ns;
> > +
> > + audit_log_ns_init(AUDIT_NS_INIT_MNT, 0, init_mnt_ns->proc_inum);
> > + return 0;
> > +}
> > +late_initcall(mnt_ns_init_log);
> > +
> > void __init mnt_init(void)
> > {
> > unsigned u;
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index 71698ec..b28dfb0 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -484,6 +484,9 @@ extern void audit_log_ns_info(struct
> task_struct
> > *tsk); static inline void audit_log_ns_info(struct task_struct *tsk) {
> > }
> > #endif
> > +extern void audit_log_ns_init(int type, unsigned int old_inum,
> > + unsigned int inum);
> > +extern void audit_log_ns_del(int type, unsigned int inum);
> >
> > extern int audit_update_lsm_rules(void);
> >
> > @@ -542,6 +545,11 @@ static inline void audit_log_task_info(struct
> > audit_buffer *ab, { }
> > static inline void audit_log_ns_info(struct task_struct *tsk)
> > { }
> > +static inline int audit_log_ns_init(int type, unsigned int old_inum,
> > + unsigned int inum)
> > +{ }
> > +static inline int audit_log_ns_del(int type, unsigned int inum)
> > +{ }
> > #define audit_enabled 0
> > #endif /* CONFIG_AUDIT */
> > static inline void audit_log_string(struct audit_buffer *ab, const char
> > *buf) diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> > index 1ffb151..487cad6 100644
> > --- a/include/uapi/linux/audit.h
> > +++ b/include/uapi/linux/audit.h
> > @@ -111,6 +111,18 @@
> > #define AUDIT_PROCTITLE 1327 /* Proctitle emit event */
> > #define AUDIT_FEATURE_CHANGE 1328 /* audit log listing feature changes
> */
> > #define AUDIT_NS_INFO 1329 /* Record process namespace IDs */
> > +#define AUDIT_NS_INIT_MNT 1330 /* Record mount namespace instance
> creation
> > */ +#define AUDIT_NS_INIT_UTS 1331 /* Record UTS namespace instance
> > creation */ +#define AUDIT_NS_INIT_IPC 1332 /* Record IPC namespace
> > instance creation */ +#define AUDIT_NS_INIT_USER 1333 /* Record USER
> > namespace instance creation */ +#define AUDIT_NS_INIT_PID 1334 /* Record
> > PID namespace instance creation */ +#define AUDIT_NS_INIT_NET 1335 /*
> > Record NET namespace instance creation */ +#define AUDIT_NS_DEL_MNT 1336
> /*
> > Record mount namespace instance deletion */ +#define
> > AUDIT_NS_DEL_UTS 1337 /* Record UTS namespace instance deletion */
> +#define
> > AUDIT_NS_DEL_IPC 1338 /* Record IPC namespace instance deletion */
> +#define
> > AUDIT_NS_DEL_USER 1339 /* Record USER namespace instance deletion */
> > +#define AUDIT_NS_DEL_PID 1340 /* Record PID namespace instance
> deletion */
> > +#define AUDIT_NS_DEL_NET 1341 /* Record NET namespace instance deletion
> */
> >
> > #define AUDIT_AVC 1400 /* SE Linux avc denial or grant */
> > #define AUDIT_SELINUX_ERR 1401 /* Internal SE Linux Errors */
> > diff --git a/ipc/namespace.c b/ipc/namespace.c
> > index 59451c1..73727ce 100644
> > --- a/ipc/namespace.c
> > +++ b/ipc/namespace.c
> > @@ -13,6 +13,7 @@
> > #include <linux/mount.h>
> > #include <linux/user_namespace.h>
> > #include <linux/proc_ns.h>
> > +#include <linux/audit.h>
> >
> > #include "util.h"
> >
> > @@ -41,6 +42,8 @@ static struct ipc_namespace *create_ipc_ns(struct
> > user_namespace *user_ns, }
> > atomic_inc(&nr_ipc_ns);
> >
> > + audit_log_ns_init(AUDIT_NS_INIT_IPC, old_ns->proc_inum, ns->proc_inum);
> > +
> > sem_init_ns(ns);
> > msg_init_ns(ns);
> > shm_init_ns(ns);
> > @@ -119,6 +122,7 @@ static void free_ipc_ns(struct ipc_namespace *ns)
> > */
> > ipcns_notify(IPCNS_REMOVED);
> > put_user_ns(ns->user_ns);
> > + audit_log_ns_del(AUDIT_NS_DEL_IPC, ns->proc_inum);
> > proc_free_inum(ns->proc_inum);
> > kfree(ns);
> > }
> > @@ -197,3 +201,11 @@ const struct proc_ns_operations ipcns_operations = {
> > .install = ipcns_install,
> > .inum = ipcns_inum,
> > };
> > +
> > +/* log the ID of init IPC namespace after audit service starts */
> > +static int __init ipc_namespaces_init(void)
> > +{
> > + audit_log_ns_init(AUDIT_NS_INIT_IPC, 0, init_ipc_ns.proc_inum);
> > + return 0;
> > +}
> > +late_initcall(ipc_namespaces_init);
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 63f32f4..e6230c4 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -1978,6 +1978,70 @@ out:
> > kfree(name);
> > }
> >
> > +#ifdef CONFIG_NAMESPACES
> > +static char *ns_name[] = {
> > + "mnt",
> > + "uts",
> > + "ipc",
> > + "user",
> > + "pid",
> > + "net",
> > +};
> > +
> > +/**
> > + * audit_log_ns_init - report a namespace instance creation
> > + * @type: type of audit namespace instance created message
> > + * @old_inum: the ID number of the cloned namespace instance
> > + * @inum: the ID number of the new namespace instance
> > + */
> > +void audit_log_ns_init(int type, unsigned int old_inum, unsigned int inum)
> > +{
> > + struct audit_buffer *ab;
> > + char *audit_ns_name = ns_name[type - AUDIT_NS_INIT_MNT];
> > + struct vfsmount *mnt = task_active_pid_ns(current)->proc_mnt;
> > + struct super_block *sb = mnt->mnt_sb;
> > + char old_ns[16];
> > +
> > + if (type < AUDIT_NS_INIT_MNT || type > AUDIT_NS_INIT_NET) {
> > + WARN(1, "audit_log_ns_init: type:%d out of range", type);
> > + return;
> > + }
> > + if (!old_inum)
> > + sprintf(old_ns, "(none)");
> > + else
> > + sprintf(old_ns, "%d", old_inum - PROC_DYNAMIC_FIRST);
> > + audit_log_common_recv_msg(&ab, type);
> > + audit_log_format(ab, " dev=%02x:%02x old_%sns=%s %sns=%d res=1",
> > + MAJOR(sb->s_dev), MINOR(sb->s_dev),
> > + audit_ns_name, old_ns,
> > + audit_ns_name, inum - PROC_DYNAMIC_FIRST);
> > + audit_log_end(ab);
> > +}
> > +
> > +/**
> > + * audit_log_ns_del - report a namespace instance deleted
> > + * @type: type of audit namespace instance deleted message
> > + * @inum: the ID number of the namespace instance
> > + */
> > +void audit_log_ns_del(int type, unsigned int inum)
> > +{
> > + struct audit_buffer *ab;
> > + char *audit_ns_name = ns_name[type - AUDIT_NS_DEL_MNT];
> > + struct vfsmount *mnt = task_active_pid_ns(current)->proc_mnt;
> > + struct super_block *sb = mnt->mnt_sb;
> > +
> > + if (type < AUDIT_NS_DEL_MNT || type > AUDIT_NS_DEL_NET) {
> > + WARN(1, "audit_log_ns_del: type:%d out of range", type);
> > + return;
> > + }
> > + audit_log_common_recv_msg(&ab, type);
> > + audit_log_format(ab, " dev=%02x:%02x %sns=%d res=1",
> > + MAJOR(sb->s_dev), MINOR(sb->s_dev), audit_ns_name,
> > + inum - PROC_DYNAMIC_FIRST);
> > + audit_log_end(ab);
> > +}
> > +#endif /* CONFIG_NAMESPACES */
> > +
> > /**
> > * audit_log_end - end one audit record
> > * @ab: the audit_buffer
> > diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> > index db95d8e..d28fd14 100644
> > --- a/kernel/pid_namespace.c
> > +++ b/kernel/pid_namespace.c
> > @@ -18,6 +18,7 @@
> > #include <linux/proc_ns.h>
> > #include <linux/reboot.h>
> > #include <linux/export.h>
> > +#include <linux/audit.h>
> >
> > struct pid_cache {
> > int nr_ids;
> > @@ -109,6 +110,9 @@ static struct pid_namespace *create_pid_namespace(struct
> > user_namespace *user_ns if (err)
> > goto out_free_map;
> >
> > + audit_log_ns_init(AUDIT_NS_INIT_PID, parent_pid_ns->proc_inum,
> > + ns->proc_inum);
> > +
> > kref_init(&ns->kref);
> > ns->level = level;
> > ns->parent = get_pid_ns(parent_pid_ns);
> > @@ -142,6 +146,7 @@ static void destroy_pid_namespace(struct pid_namespace
> > *ns) {
> > int i;
> >
> > + audit_log_ns_del(AUDIT_NS_DEL_PID, ns->proc_inum);
> > proc_free_inum(ns->proc_inum);
> > for (i = 0; i < PIDMAP_ENTRIES; i++)
> > kfree(ns->pidmap[i].page);
> > @@ -388,3 +393,11 @@ static __init int pid_namespaces_init(void)
> > }
> >
> > __initcall(pid_namespaces_init);
> > +
> > +/* log the ID of init PID namespace after audit service starts */
> > +static __init int pid_namespaces_late_init(void)
> > +{
> > + audit_log_ns_init(AUDIT_NS_INIT_PID, 0, init_pid_ns.proc_inum);
> > + return 0;
> > +}
> > +late_initcall(pid_namespaces_late_init);
> > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> > index fcc0256..89c2517 100644
> > --- a/kernel/user_namespace.c
> > +++ b/kernel/user_namespace.c
> > @@ -22,6 +22,7 @@
> > #include <linux/ctype.h>
> > #include <linux/projid.h>
> > #include <linux/fs_struct.h>
> > +#include <linux/audit.h>
> >
> > static struct kmem_cache *user_ns_cachep __read_mostly;
> >
> > @@ -92,6 +93,9 @@ int create_user_ns(struct cred *new)
> > return ret;
> > }
> >
> > + audit_log_ns_init(AUDIT_NS_INIT_USER, parent_ns->proc_inum,
> > + ns->proc_inum);
> > +
> > atomic_set(&ns->count, 1);
> > /* Leave the new->user_ns reference with the new user namespace. */
> > ns->parent = parent_ns;
> > @@ -136,6 +140,7 @@ void free_user_ns(struct user_namespace *ns)
> > #ifdef CONFIG_PERSISTENT_KEYRINGS
> > key_put(ns->persistent_keyring_register);
> > #endif
> > + audit_log_ns_del(AUDIT_NS_DEL_USER, ns->proc_inum);
> > proc_free_inum(ns->proc_inum);
> > kmem_cache_free(user_ns_cachep, ns);
> > ns = parent;
> > @@ -909,3 +914,11 @@ static __init int user_namespaces_init(void)
> > return 0;
> > }
> > subsys_initcall(user_namespaces_init);
> > +
> > +/* log the ID of init user namespace after audit service starts */
> > +static __init int user_namespaces_late_init(void)
> > +{
> > + audit_log_ns_init(AUDIT_NS_INIT_USER, 0, init_user_ns.proc_inum);
> > + return 0;
> > +}
> > +late_initcall(user_namespaces_late_init);
> > diff --git a/kernel/utsname.c b/kernel/utsname.c
> > index fd39312..fa21e8d 100644
> > --- a/kernel/utsname.c
> > +++ b/kernel/utsname.c
> > @@ -16,6 +16,7 @@
> > #include <linux/slab.h>
> > #include <linux/user_namespace.h>
> > #include <linux/proc_ns.h>
> > +#include <linux/audit.h>
> >
> > static struct uts_namespace *create_uts_ns(void)
> > {
> > @@ -48,6 +49,8 @@ static struct uts_namespace *clone_uts_ns(struct
> > user_namespace *user_ns, return ERR_PTR(err);
> > }
> >
> > + audit_log_ns_init(AUDIT_NS_INIT_UTS, old_ns->proc_inum, ns->proc_inum);
> > +
> > down_read(&uts_sem);
> > memcpy(&ns->name, &old_ns->name, sizeof(ns->name));
> > ns->user_ns = get_user_ns(user_ns);
> > @@ -84,6 +87,7 @@ void free_uts_ns(struct kref *kref)
> >
> > ns = container_of(kref, struct uts_namespace, kref);
> > put_user_ns(ns->user_ns);
> > + audit_log_ns_del(AUDIT_NS_DEL_UTS, ns->proc_inum);
> > proc_free_inum(ns->proc_inum);
> > kfree(ns);
> > }
> > @@ -138,3 +142,11 @@ const struct proc_ns_operations utsns_operations = {
> > .install = utsns_install,
> > .inum = utsns_inum,
> > };
> > +
> > +/* log the ID of init UTS namespace after audit service starts */
> > +static int __init uts_namespaces_init(void)
> > +{
> > + audit_log_ns_init(AUDIT_NS_INIT_UTS, 0, init_uts_ns.proc_inum);
> > + return 0;
> > +}
> > +late_initcall(uts_namespaces_init);
> > diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> > index 85b6269..562eb85 100644
> > --- a/net/core/net_namespace.c
> > +++ b/net/core/net_namespace.c
> > @@ -17,6 +17,7 @@
> > #include <linux/user_namespace.h>
> > #include <net/net_namespace.h>
> > #include <net/netns/generic.h>
> > +#include <linux/audit.h>
> >
> > /*
> > * Our network namespace constructor/destructor lists
> > @@ -253,6 +254,8 @@ struct net *copy_net_ns(unsigned long flags,
> > mutex_lock(&net_mutex);
> > rv = setup_net(net, user_ns);
> > if (rv == 0) {
> > + audit_log_ns_init(AUDIT_NS_INIT_NET, old_net->proc_inum,
> > + net->proc_inum);
> > rtnl_lock();
> > list_add_tail_rcu(&net->list, &net_namespace_list);
> > rtnl_unlock();
> > @@ -389,6 +392,7 @@ static __net_init int net_ns_net_init(struct net *net)
> >
> > static __net_exit void net_ns_net_exit(struct net *net)
> > {
> > + audit_log_ns_del(AUDIT_NS_DEL_NET, net->proc_inum);
> > proc_free_inum(net->proc_inum);
> > }
> >
> > @@ -435,6 +439,14 @@ static int __init net_ns_init(void)
> >
> > pure_initcall(net_ns_init);
> >
> > +/* log the ID of init_net namespace after audit service starts */
> > +static int __init net_ns_init_log(void)
> > +{
> > + audit_log_ns_init(AUDIT_NS_INIT_NET, 0, init_net.proc_inum);
> > + return 0;
> > +}
> > +late_initcall(net_ns_init_log);
> > +
> > #ifdef CONFIG_NET_NS
> > static int __register_pernet_operations(struct list_head *list,
> > struct pernet_operations *ops)
>

- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

2015-05-14 14:57:21

by Steve Grubb

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On Tuesday, May 12, 2015 03:57:59 PM Richard Guy Briggs wrote:
> On 15/05/05, Steve Grubb wrote:
> > I think there needs to be some more discussion around this. It seems like
> > this is not exactly recording things that are useful for audit.
>
> It seems to me that either audit has to assemble that information, or
> the kernel has to do so. The kernel doesn't know about containers
> (yet?).

Auditing is something that has a lot of requirements imposed on it by security
standards. There was no requirement to have an auid until audit came along and
said that uid is not good enough to know who is issuing commands because of su
or sudo. There was no requirement for sessionid until we had to track each
action back to a login so we could see if the login came from the expected
place.

What I am saying is we have the same situation. Audit needs to track a
container and we need an ID. The information that is being logged is not
useful for auditing. Maybe someone wants that info in syslog, but I doubt it.
The audit trail's purpose is to allow a security officer to reconstruct the
events to determine what happened during some security incident.

What they would want to know is what resources were assigned; if two
containers shared a resource, what resource and container was it shared with;
if two containers can communicate, we need to see or control information flow
when necessary; and we need to see termination and release of resources.

Also, if the host OS cannot make sense of the information being logged because
the pid maps to another process name, or a uid maps to another user, or a file
access maps to something not in the host's, then we need the container to do
its own auditing and resolve these mappings and optionally pass these to an
aggregation server.

Nothing else makes sense.


> > On Friday, April 17, 2015 03:35:52 AM Richard Guy Briggs wrote:
> > > Log the creation and deletion of namespace instances in all 6 types of
> > > namespaces.
> > >
> > > Twelve new audit message types have been introduced:
> > > AUDIT_NS_INIT_MNT 1330 /* Record mount namespace instance
> > > creation
> > > */ AUDIT_NS_INIT_UTS 1331 /* Record UTS namespace instance
> > > creation */ AUDIT_NS_INIT_IPC 1332 /* Record IPC namespace
> > > instance creation */ AUDIT_NS_INIT_USER 1333 /* Record USER
> > > namespace instance creation */ AUDIT_NS_INIT_PID 1334 /* Record
> > > PID namespace instance creation */ AUDIT_NS_INIT_NET 1335 /*
> > > Record NET namespace instance creation */ AUDIT_NS_DEL_MNT 1336
> > > /* Record mount namespace instance deletion */ AUDIT_NS_DEL_UTS
> > > 1337
> > >
> > > /* Record UTS namespace instance deletion */ AUDIT_NS_DEL_IPC
> > >
> > > 1338 /* Record IPC namespace instance deletion */ AUDIT_NS_DEL_USER
> > >
> > > 1339 /* Record USER namespace instance deletion */ AUDIT_NS_DEL_PID
> > >
> > > 1340 /* Record PID namespace instance deletion */ AUDIT_NS_DEL_NET
> > >
> > > 1341 /* Record NET namespace instance deletion */
> >
> > The requirements for auditing of containers should be derived from VPP. In
> > it, it asks for selectable auditing, selective audit, and selective audit
> > review. What this means is that we need the container and all its
> > children to have one identifier that is inserted into all the events that
> > are associated with the container.
>
> Is that requirement for the records that are sent from the kernel, or
> for the records stored by auditd, or by another facility that delivers
> those records to a final consumer?

A little of both. Selective audit means that you can set rules to include or
exclude an event. This is done in the kernel. Selectable review means that the
user space tools need to be able to skip past records not of interest to a
specific line of inquiry. Also, logging everything and letting user space work
it out later is also not a solution because the needle is harder to find in a
larger haystack. Or, the logs may rotate and its gone forever because the
partition is filled.


> > With this, its possible to do a search for all events related to a
> > container. Its possible to exclude events from a container. Its possible
> > to not get any events.
> >
> > The requirements also call out for the identification of the subject. This
> > means that the event should be bound to a syscall such as clone, setns, or
> > unshare.
>
> Is it useful to have a reference of the init namespace set from which
> all others are spawned?

For things directly observable by the init name space, yes.

> If it isn't bound, I assume the subject should be added to the message
> format? I'm thinking of messages without an audit_context such as audit
> user messages (such as AUDIT_NS_INFO and AUDIT_VIRT_CONTROL).

Making these events auxiliary records to a syscall is all that is needed. The
same way that PATH is added to an open event. If someone wants to have
container/namespace events, they add a rule on clone(2).


> For now, we should not need to log namespaces with AUDIT_FEATURE_CHANGE
> or AUDIT_CONFIG_CHANGE messages since only initial user namespace with
> initial pid namespace has permission to do so. This will need to be
> addressed by having non-init config changes be limited to that container
> or set of namespaces and possibly its children. The other possibility
> is to add the subject to the stand-alone message.
>
> > Also, any user space events originating inside the container needs to have
> > the container ID added to the user space event - just like auid and
> > session id.
>
> This sounds like every task needs to record a container ID since that
> information is otherwise unknown by the kernel except by what might be
> provided by an audit user message such as AUDIT_VIRT_CONTROL or possibly
> the new AUDIT_NS_INFO request.

Right. The same as we record auid and ses on every event. We'll need a
container ID logged with everything. -1 for unset, meaning init namespace.


> It could be stored in struct task_struct or in struct audit_context. I
> don't have a suggestion on how to get that information securely into the
> kernel.

That is where I'd suggest. Its for audit subsystem needs.


> > Recording each instance of a name space is giving me something that I
> > cannot use to do queries required by the security target. Given these
> > events, how do I locate a web server event where it accesses a watched
> > file? That authentication failed? That an update within the container
> > failed?
> >
> > The requirements are that we have to log the creation, suspension,
> > migration, and termination of a container. The requirements are not on
> > the individual name space.
>
> Ok. Do we have a robust definition of a container?

We call the combination of name spaces, cgroups, and seccomp rules a
container.

> Where is that definition managed?

In the thing that invokes a container.

> If it is a userspace concept, then I think either userspace should be
> assembling this information, or providing that information to the entity
> that will be expected to know about and provide it.

Well, uid is a userspace concept, too. But we record an auid and keep it
immutable so that we can check enforcement of system security policy which is
also a user space concept. These things need to be collected to a place that
can be associated with events as needed. That place is the kernel.


> > Maybe I'm missing how these events give me that. But I'd like to hear how
> > I
> > would be able to meet requirements with these 12 events.
>
> Adding the infrastructure to give each of those 12 events an audit
> context to be able to give meaningful subject fields in audit records
> appears to require adding a struct task_struct argument to calls to
> copy_mnt_ns(), copy_utsname(), copy_ipcs(), copy_pid_ns(),
> copy_net_ns(), create_user_ns() unless I use current. I think we must
> use current since the userns is created before the spawned process is
> mature or has an audit context in the case of clone.

I think you are heading down the wrong path. We can tell from syscall flags
what is being done. Try this:

## Optional - log container creation
-a always,exit -F arch=b32 -S clone -F a0&0x7C020000 -F key=container-create
-a always,exit -F arch=b64 -S clone -F a0&0x7C020000 -F key=container-create

## Optional - watch for containers that may change their configuration
-a always,exit -F arch=b32 -S unshare,setns -F key=container-config
-a always,exit -F arch=b64 -S unshare,setns -F key=container-config

Then muck with containers, then use ausearch --start recent -k container -i. I
think you'll see that we know a bit about what's happening. What's needed is
the breadcrumb trail to tie future events back to the container so that we can
check for violations of host security policy.

> Either that, or I have mis-understood and I should be stashing this
> namespace ID information in an audit_aux_data structure or a more
> permanent part of struct audit_context to be printed when required on
> syscall exit. I'm trying to think through if it is needed in any
> non-syscall audit messages.

I think this is what is required. But we also have the issue where an event's
meaning can't be determined outside of a container. (For example, login,
account creation, password change, uid change, file access, etc.) So, I think
auditing needs to be local to the container for enrichment and ultimately
forwarded to an aggregating server.

-Steve

2015-05-14 15:47:34

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

Steve Grubb <[email protected]> writes:

> On Tuesday, May 12, 2015 03:57:59 PM Richard Guy Briggs wrote:
>> On 15/05/05, Steve Grubb wrote:
>> > I think there needs to be some more discussion around this. It seems like
>> > this is not exactly recording things that are useful for audit.
>>
>> It seems to me that either audit has to assemble that information, or
>> the kernel has to do so. The kernel doesn't know about containers
>> (yet?).
>
> Auditing is something that has a lot of requirements imposed on it by security
> standards. There was no requirement to have an auid until audit came along and
> said that uid is not good enough to know who is issuing commands because of su
> or sudo. There was no requirement for sessionid until we had to track each
> action back to a login so we could see if the login came from the expected
> place.

Stop right there.

You want a global identifier in a realm where only relative identifiers
exist, and make sense.

I am sorry that isn't going to happen. EVER.

Square peg, round hole. It doesn't work, it doesn't make sense, and
most especially it doesn't allow anyone to reconstruct anything, because
it does not make sense and does not match what the kernel is doing.

Container IDs do not, and will not exist. There is probably something
reasonable in your request but until you stop talking that nonsense I
can't see it.

Global IDs take us into the namespace of namespaces problem and that
isn't going to happen. I have already bent as far in this direction as
I can go. Further namespace creation is not a privileged event which
makes the requestion for a container ID make even less sense. With
anyone able to create whatever they want it will not be a identifier
that makes any sense to someone reading an audit log.

Eric

2015-05-14 16:22:11

by Steve Grubb

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On Thursday, May 14, 2015 10:42:38 AM Eric W. Biederman wrote:
> Steve Grubb <[email protected]> writes:
> > On Tuesday, May 12, 2015 03:57:59 PM Richard Guy Briggs wrote:
> >> On 15/05/05, Steve Grubb wrote:
> >> > I think there needs to be some more discussion around this. It seems
> >> > like
> >> > this is not exactly recording things that are useful for audit.
> >>
> >> It seems to me that either audit has to assemble that information, or
> >> the kernel has to do so. The kernel doesn't know about containers
> >> (yet?).
> >
> > Auditing is something that has a lot of requirements imposed on it by
> > security standards. There was no requirement to have an auid until audit
> > came along and said that uid is not good enough to know who is issuing
> > commands because of su or sudo. There was no requirement for sessionid
> > until we had to track each action back to a login so we could see if the
> > login came from the expected place.
>
> Stop right there.
>
> You want a global identifier in a realm where only relative identifiers
> exist, and make sense.

Global to a name space for me is I guess relative for you. The ID is needed to
tie events together to check for violations of the security policy of the
container/namespace invoking child container/namespace.

As a concrete example, suppose a container is to have its own /etc/shadow. If
for some reason the container used the host's copy, then that would point to a
misconfiguration or perhaps indicate an escape from the container.

I would imagine that the next layer down has its own set of global identifiers
so that it can verify enforcement of its own security assumptions. This does
not need to be global to the system from top to 9 layers down. Each layer
needs to have a way of locating events common to a child container instance.


> I am sorry that isn't going to happen. EVER.

Then I'd suggest we either scrap this set of patches and forget auditing of
containers. (This would have the effect of disallowing them in a lot of
environments because violations of security policy can't be detected.)

Or someone please explain how what is proposed to be logged allows the tying
together of events. Or even supports the requirements I stated in my last
email.

-Steve

2015-05-14 19:19:48

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On Thursday, May 14, 2015 10:57:14 AM Steve Grubb wrote:
> On Tuesday, May 12, 2015 03:57:59 PM Richard Guy Briggs wrote:
> > On 15/05/05, Steve Grubb wrote:
> > > I think there needs to be some more discussion around this. It seems
> > > like this is not exactly recording things that are useful for audit.
> >
> > It seems to me that either audit has to assemble that information, or
> > the kernel has to do so. The kernel doesn't know about containers
> > (yet?).
>
> Auditing is something that has a lot of requirements imposed on it by
> security standards. There was no requirement to have an auid until audit
> came along and said that uid is not good enough to know who is issuing
> commands because of su or sudo. There was no requirement for sessionid
> until we had to track each action back to a login so we could see if the
> login came from the expected place.
>
> What I am saying is we have the same situation. Audit needs to track a
> container and we need an ID. The information that is being logged is not
> useful for auditing. Maybe someone wants that info in syslog, but I doubt
> it. The audit trail's purpose is to allow a security officer to reconstruct
> the events to determine what happened during some security incident.

As Eric, and others, have stated, the container concept is a userspace idea,
not a kernel idea; the kernel only knows, and cares about, namespaces. This
is unlikely to change.

However, as Steve points out, there is precedence for the kernel to record
userspace tokens for the sake of audit. Personally I'm not a big fan of this
in general, but I do recognize that it does satisfy a legitimate need. Think
of things like auid and the sessionid as necessary evils; audit is already
chock full of evilness I doubt one more will doom us all to hell.

Moving forward, I'd like to see the following:

* Record the creation/removal/mgmt of the individual namespaces as Richard's
patchset currently does. However, I'd suggest using an explicit namespace
value for the init namespace instead of the "unset" value in the V6 patchset
(my apologies if you've already changed this Richard, I haven't looked at V7
yet).

* Create a container ID token (unsigned 32-bit integer?), similar to
auid/sessionid, that is set by userspace and carried by the kernel to be used
in audit records. I'd like to see some discussion on how we manage this, e.g.
how do handle container ID inheritance, how do we handle nested containers
(setting the containerid when it is already set), do we care if multiple
different containers share the same namespace config, etc.?

* When userspace sets the container ID, emit a new audit record with the
associated namespace tokens and the container ID.

* Look at our existing audit records to determine which records should have
namespace and container ID tokens added. We may only want to add the
additional fields in the case where the namespace/container ID tokens are not
the init namespace.

Can we all live with this? If not, please suggest some alternate ideas;
simply shouting "IT'S ALL CRAP!" isn't helpful for anyone ... it may be true,
but it doesn't help us solve the problem ;)

--
paul moore
security @ redhat

2015-05-15 00:49:14

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On 15/05/14, Steve Grubb wrote:
> On Tuesday, May 12, 2015 03:57:59 PM Richard Guy Briggs wrote:
> > On 15/05/05, Steve Grubb wrote:
> > > I think there needs to be some more discussion around this. It seems like
> > > this is not exactly recording things that are useful for audit.
> >
> > It seems to me that either audit has to assemble that information, or
> > the kernel has to do so. The kernel doesn't know about containers
> > (yet?).
>
> Auditing is something that has a lot of requirements imposed on it by security
> standards. There was no requirement to have an auid until audit came along and
> said that uid is not good enough to know who is issuing commands because of su
> or sudo. There was no requirement for sessionid until we had to track each
> action back to a login so we could see if the login came from the expected
> place.
>
> What I am saying is we have the same situation. Audit needs to track a
> container and we need an ID. The information that is being logged is not
> useful for auditing. Maybe someone wants that info in syslog, but I doubt it.
> The audit trail's purpose is to allow a security officer to reconstruct the
> events to determine what happened during some security incident.

I agree the information being logged is not yet useful, but it is a
component of what would be. I wasn't ever thinking about syslog... It
is this trail that I was trying to help create.

> What they would want to know is what resources were assigned; if two
> containers shared a resource, what resource and container was it shared with;
> if two containers can communicate, we need to see or control information flow
> when necessary; and we need to see termination and release of resources.

So, namespaces are a big part of this. I understand how they are
spawned and potentially shared. I have a more vague idea about how
cgroups contribute to this concept of a container. So far, I have very
little idea how seccomp contributes, but I assume that it will also need
to be part of this tracking.

> Also, if the host OS cannot make sense of the information being logged because
> the pid maps to another process name, or a uid maps to another user, or a file
> access maps to something not in the host's, then we need the container to do
> its own auditing and resolve these mappings and optionally pass these to an
> aggregation server.

I'm open to both being possible.

> Nothing else makes sense.
>
> > > On Friday, April 17, 2015 03:35:52 AM Richard Guy Briggs wrote:
> > > > Log the creation and deletion of namespace instances in all 6 types of
> > > > namespaces.
> > > >
> > > > Twelve new audit message types have been introduced:
> > > > AUDIT_NS_INIT_MNT 1330 /* Record mount namespace instance
> > > > creation
> > > > */ AUDIT_NS_INIT_UTS 1331 /* Record UTS namespace instance
> > > > creation */ AUDIT_NS_INIT_IPC 1332 /* Record IPC namespace
> > > > instance creation */ AUDIT_NS_INIT_USER 1333 /* Record USER
> > > > namespace instance creation */ AUDIT_NS_INIT_PID 1334 /* Record
> > > > PID namespace instance creation */ AUDIT_NS_INIT_NET 1335 /*
> > > > Record NET namespace instance creation */ AUDIT_NS_DEL_MNT 1336
> > > > /* Record mount namespace instance deletion */ AUDIT_NS_DEL_UTS
> > > > 1337
> > > >
> > > > /* Record UTS namespace instance deletion */ AUDIT_NS_DEL_IPC
> > > >
> > > > 1338 /* Record IPC namespace instance deletion */ AUDIT_NS_DEL_USER
> > > >
> > > > 1339 /* Record USER namespace instance deletion */ AUDIT_NS_DEL_PID
> > > >
> > > > 1340 /* Record PID namespace instance deletion */ AUDIT_NS_DEL_NET
> > > >
> > > > 1341 /* Record NET namespace instance deletion */
> > >
> > > The requirements for auditing of containers should be derived from VPP. In
> > > it, it asks for selectable auditing, selective audit, and selective audit
> > > review. What this means is that we need the container and all its
> > > children to have one identifier that is inserted into all the events that
> > > are associated with the container.
> >
> > Is that requirement for the records that are sent from the kernel, or
> > for the records stored by auditd, or by another facility that delivers
> > those records to a final consumer?
>
> A little of both. Selective audit means that you can set rules to include or
> exclude an event. This is done in the kernel. Selectable review means that the
> user space tools need to be able to skip past records not of interest to a
> specific line of inquiry. Also, logging everything and letting user space work
> it out later is also not a solution because the needle is harder to find in a
> larger haystack. Or, the logs may rotate and its gone forever because the
> partition is filled.

I agree it needs to be a balance of flexibility and efficiency.

> > > With this, its possible to do a search for all events related to a
> > > container. Its possible to exclude events from a container. Its possible
> > > to not get any events.
> > >
> > > The requirements also call out for the identification of the subject. This
> > > means that the event should be bound to a syscall such as clone, setns, or
> > > unshare.
> >
> > Is it useful to have a reference of the init namespace set from which
> > all others are spawned?
>
> For things directly observable by the init name space, yes.

Ok, so we'll need to have a way to document that initial state on boot
before any other processes start, preferably in one clear brief record.

> > If it isn't bound, I assume the subject should be added to the message
> > format? I'm thinking of messages without an audit_context such as audit
> > user messages (such as AUDIT_NS_INFO and AUDIT_VIRT_CONTROL).
>
> Making these events auxiliary records to a syscall is all that is needed. The
> same way that PATH is added to an open event. If someone wants to have
> container/namespace events, they add a rule on clone(2).

This doesn't make sense. The point of this type of record is to have a
way for a userspace container manager (which maybe should have a new CAP
type) to tie the creation of namespaces to a specific container name or
ID. It might even contain cgroup and/or seccomp info.

> > For now, we should not need to log namespaces with AUDIT_FEATURE_CHANGE
> > or AUDIT_CONFIG_CHANGE messages since only initial user namespace with
> > initial pid namespace has permission to do so. This will need to be
> > addressed by having non-init config changes be limited to that container
> > or set of namespaces and possibly its children. The other possibility
> > is to add the subject to the stand-alone message.
> >
> > > Also, any user space events originating inside the container needs to have
> > > the container ID added to the user space event - just like auid and
> > > session id.
> >
> > This sounds like every task needs to record a container ID since that
> > information is otherwise unknown by the kernel except by what might be
> > provided by an audit user message such as AUDIT_VIRT_CONTROL or possibly
> > the new AUDIT_NS_INFO request.
>
> Right. The same as we record auid and ses on every event. We'll need a
> container ID logged with everything. -1 for unset, meaning init namespace.

Ok, that might remove the need for the reply I just wrote above.

> > It could be stored in struct task_struct or in struct audit_context. I
> > don't have a suggestion on how to get that information securely into the
> > kernel.
>
> That is where I'd suggest. Its for audit subsystem needs.

struct audit_context would be my choice.

> > > Recording each instance of a name space is giving me something that I
> > > cannot use to do queries required by the security target. Given these
> > > events, how do I locate a web server event where it accesses a watched
> > > file? That authentication failed? That an update within the container
> > > failed?
> > >
> > > The requirements are that we have to log the creation, suspension,
> > > migration, and termination of a container. The requirements are not on
> > > the individual name space.
> >
> > Ok. Do we have a robust definition of a container?
>
> We call the combination of name spaces, cgroups, and seccomp rules a
> container.

Can you detail what information is required from each?

> > Where is that definition managed?
>
> In the thing that invokes a container.

I was looking for a reference to a standards document rather than an
application...

> > If it is a userspace concept, then I think either userspace should be
> > assembling this information, or providing that information to the entity
> > that will be expected to know about and provide it.
>
> Well, uid is a userspace concept, too. But we record an auid and keep it
> immutable so that we can check enforcement of system security policy which is
> also a user space concept. These things need to be collected to a place that
> can be associated with events as needed. That place is the kernel.

I am fine with putting that in the kernel if that is what makes most
sense.

> > > Maybe I'm missing how these events give me that. But I'd like to
> > > hear how I would be able to meet requirements with these 12
> > > events.
> >
> > Adding the infrastructure to give each of those 12 events an audit
> > context to be able to give meaningful subject fields in audit records
> > appears to require adding a struct task_struct argument to calls to
> > copy_mnt_ns(), copy_utsname(), copy_ipcs(), copy_pid_ns(),
> > copy_net_ns(), create_user_ns() unless I use current. I think we must
> > use current since the userns is created before the spawned process is
> > mature or has an audit context in the case of clone.
>
> I think you are heading down the wrong path.

That's why I started questioning it...

> We can tell from syscall flags what is being done. Try this:
>
> ## Optional - log container creation
> -a always,exit -F arch=b32 -S clone -F a0&0x7C020000 -F key=container-create
> -a always,exit -F arch=b64 -S clone -F a0&0x7C020000 -F key=container-create
>
> ## Optional - watch for containers that may change their configuration
> -a always,exit -F arch=b32 -S unshare,setns -F key=container-config
> -a always,exit -F arch=b64 -S unshare,setns -F key=container-config
>
> Then muck with containers, then use ausearch --start recent -k container -i. I
> think you'll see that we know a bit about what's happening. What's needed is
> the breadcrumb trail to tie future events back to the container so that we can
> check for violations of host security policy.

Agreed.

> > Either that, or I have mis-understood and I should be stashing this
> > namespace ID information in an audit_aux_data structure or a more
> > permanent part of struct audit_context to be printed when required on
> > syscall exit. I'm trying to think through if it is needed in any
> > non-syscall audit messages.
>
> I think this is what is required. But we also have the issue where an event's
> meaning can't be determined outside of a container. (For example, login,
> account creation, password change, uid change, file access, etc.) So, I think
> auditing needs to be local to the container for enrichment and ultimately
> forwarded to an aggregating server.

There are some events that will mean more to different layers...
They should be determined by the rules in each auditd jurisdiction,
potentially one per user namespace.

> -Steve

- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

2015-05-15 01:36:40

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

Paul Moore <[email protected]> writes:
> As Eric, and others, have stated, the container concept is a userspace idea,
> not a kernel idea; the kernel only knows, and cares about, namespaces. This
> is unlikely to change.
>
> However, as Steve points out, there is precedence for the kernel to record
> userspace tokens for the sake of audit. Personally I'm not a big fan of this
> in general, but I do recognize that it does satisfy a legitimate need. Think
> of things like auid and the sessionid as necessary evils; audit is already
> chock full of evilness I doubt one more will doom us all to hell.
>
> Moving forward, I'd like to see the following:

> * Create a container ID token (unsigned 32-bit integer?), similar to
> auid/sessionid, that is set by userspace and carried by the kernel to be used
> in audit records. I'd like to see some discussion on how we manage this, e.g.
> how do handle container ID inheritance, how do we handle nested containers
> (setting the containerid when it is already set), do we care if multiple
> different containers share the same namespace config, etc.?


> Can we all live with this? If not, please suggest some alternate ideas;
> simply shouting "IT'S ALL CRAP!" isn't helpful for anyone ... it may be true,
> but it doesn't help us solve the problem ;)

Without stopping and defining what someone means by container I think it
is pretty much nonsense.

Should every vsftp connection get a container every? Every chrome tab?

At some of the connections per second numbers I have seen we might
exhaust a 32bit number in an hour or two. Will any of that make sense
to someone reading the audit logs?

Without considerning that container creation is an unprivileged
operation I think it is pretty much nonsense. Do I get to say I am any
container I want? That would seem to invalidate the concept of
userspace setting a container id.

How does any of this interact with setns? AKA entering a container?

I will go as far as looking at patches. If someone comes up with
a mission statement about what they are actually trying to achieve and a
mechanism that actually achieves that, and that allows for containers to
nest we can talk about doing something like that.

But for right now I just hear proposals for things that make no sense
and can not possibly work. Not least because it will require modifying
every program that creates a container and who knows how many of them
there are. Especially since you don't need to be root. Modifying
/usr/bin/unshare seems a little far out to me.

Eric



2015-05-15 02:04:05

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On 15/05/14, Eric W. Biederman wrote:
> Steve Grubb <[email protected]> writes:
> > On Tuesday, May 12, 2015 03:57:59 PM Richard Guy Briggs wrote:
> >> On 15/05/05, Steve Grubb wrote:
> >> > I think there needs to be some more discussion around this. It seems like
> >> > this is not exactly recording things that are useful for audit.
> >>
> >> It seems to me that either audit has to assemble that information, or
> >> the kernel has to do so. The kernel doesn't know about containers
> >> (yet?).
> >
> > Auditing is something that has a lot of requirements imposed on it by security
> > standards. There was no requirement to have an auid until audit came along and
> > said that uid is not good enough to know who is issuing commands because of su
> > or sudo. There was no requirement for sessionid until we had to track each
> > action back to a login so we could see if the login came from the expected
> > place.
>
> Stop right there.
>
> You want a global identifier in a realm where only relative identifiers
> exist, and make sense.

I am assuming he wants an identifier unique per container on one kernel
and what happens on other kernels is a matter for a management
application to take care of. This kernel doesn't have to deal with it
other than taking information from a container management application.

> I am sorry that isn't going to happen. EVER.
>
> Square peg, round hole. It doesn't work, it doesn't make sense, and
> most especially it doesn't allow anyone to reconstruct anything, because
> it does not make sense and does not match what the kernel is doing.
>
> Container IDs do not, and will not exist. There is probably something
> reasonable in your request but until you stop talking that nonsense I
> can't see it.

I didn't see anything in any of what Steve said that suggested it was to
be unique beyond that one kernel.

> Global IDs take us into the namespace of namespaces problem and that
> isn't going to happen. I have already bent as far in this direction as
> I can go. Further namespace creation is not a privileged event which
> makes the requestion for a container ID make even less sense. With
> anyone able to create whatever they want it will not be a identifier
> that makes any sense to someone reading an audit log.

Again, I assume this is up to a container management application that
will manage its pool of container hosts and an audit aggregator.

You keep raising an objection about the unworkability of a "namespace of
namespaces". Just so we are all on the same page here, can you explain
exactly what you mean with "namespace of namespaces"?

> Eric

- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

2015-05-15 02:11:31

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On 15/05/14, Oren Laadan wrote:
> On Thu, May 14, 2015 at 8:48 PM, Richard Guy Briggs <[email protected]> wrote:
>
> >
> > > > > Recording each instance of a name space is giving me something that I
> > > > > cannot use to do queries required by the security target. Given these
> > > > > events, how do I locate a web server event where it accesses a
> > watched
> > > > > file? That authentication failed? That an update within the container
> > > > > failed?
> > > > >
> > > > > The requirements are that we have to log the creation, suspension,
> > > > > migration, and termination of a container. The requirements are not
> > on
> > > > > the individual name space.
> > > >
> > > > Ok. Do we have a robust definition of a container?
> > >
> > > We call the combination of name spaces, cgroups, and seccomp rules a
> > > container.
> >
> > Can you detail what information is required from each?
> >
> > > > Where is that definition managed?
> > >
> > > In the thing that invokes a container.
> >
> > I was looking for a reference to a standards document rather than an
> > application...
> >
> >
> [focusing on "containers id" - snipped the rest away]
>
> I am unfamiliar with the audit subsystem, but work with namespaces in other
> contexts. Perhaps the term "container" is overloaded here. The definition
> suggested by Steve in this thread makes sense to me: "a combination of
> namespaces". I imagine people may want to audit subsets of namespaces.

I assume it would be a bit more than that, including cgroup and seccomp info.

> For namespaces, can use a string like "A:B:C:D:E:F" as an identifier for a
> particular combination, where A-F are respective namespaces identifiers.
> (Can be taken for example from /proc/PID/ns/{mnt,uts,ipc,user,pid,net}).
> That will even be grep-able to locate records related to a particular
> subset
> of namespaces. So a "container" in the classic meaning would have all A-F
> unique and different from the init process, but processes separated only by
> e.g. mnt-ns and net-ns will differ from the init process in A and F.
>
> (If a string is a no go, then perhaps combine the IDs in a unique way into a
> super ID).

I'd be fine with either, even including the nsfs deviceID.

> Oren.

- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

2015-05-15 02:25:33

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On 15/05/14, Eric W. Biederman wrote:
> Paul Moore <[email protected]> writes:
> > As Eric, and others, have stated, the container concept is a userspace idea,
> > not a kernel idea; the kernel only knows, and cares about, namespaces. This
> > is unlikely to change.
> >
> > However, as Steve points out, there is precedence for the kernel to record
> > userspace tokens for the sake of audit. Personally I'm not a big fan of this
> > in general, but I do recognize that it does satisfy a legitimate need. Think
> > of things like auid and the sessionid as necessary evils; audit is already
> > chock full of evilness I doubt one more will doom us all to hell.
> >
> > Moving forward, I'd like to see the following:
>
> > * Create a container ID token (unsigned 32-bit integer?), similar to
> > auid/sessionid, that is set by userspace and carried by the kernel to be used
> > in audit records. I'd like to see some discussion on how we manage this, e.g.
> > how do handle container ID inheritance, how do we handle nested containers
> > (setting the containerid when it is already set), do we care if multiple
> > different containers share the same namespace config, etc.?
>
>
> > Can we all live with this? If not, please suggest some alternate ideas;
> > simply shouting "IT'S ALL CRAP!" isn't helpful for anyone ... it may be true,
> > but it doesn't help us solve the problem ;)
>
> Without stopping and defining what someone means by container I think it
> is pretty much nonsense.

Not complete, but this is why I'm asking for a standards document...

> Should every vsftp connection get a container every? Every chrome tab?
>
> At some of the connections per second numbers I have seen we might
> exhaust a 32bit number in an hour or two. Will any of that make sense
> to someone reading the audit logs?

So making it 64bits buys us some time, but sure... I think your
definition of a container may be a bit more liberal than what we're
trying to understand...

> Without considerning that container creation is an unprivileged
> operation I think it is pretty much nonsense. Do I get to say I am any
> container I want? That would seem to invalidate the concept of
> userspace setting a container id.

Ok, my impression was that we're dealing with a privileged application
as I alluded with the need to create a new CAP_AUDIT_CONTAINER_ID or
something...

> How does any of this interact with setns? AKA entering a container?

You mean entering another namespace that might all be part of one
container? Or an an application attempting to enter the namespace of
another container?

> I will go as far as looking at patches. If someone comes up with
> a mission statement about what they are actually trying to achieve and a
> mechanism that actually achieves that, and that allows for containers to
> nest we can talk about doing something like that.

I don't pretend these patches are anywhere near finished or ready for
upstream.

> But for right now I just hear proposals for things that make no sense
> and can not possibly work. Not least because it will require modifying
> every program that creates a container and who knows how many of them
> there are. Especially since you don't need to be root. Modifying
> /usr/bin/unshare seems a little far out to me.

My understanding is that just spawning or changing namespace doesn't
imply spawning or changing containers. I also don't necessarily assume
that creating a container is an atomic operation, though that concept
might make some sense to understand or predict the boundaries of
actions...

> Eric

- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

2015-05-15 02:32:36

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On 15/05/14, Paul Moore wrote:
> On Thursday, May 14, 2015 10:57:14 AM Steve Grubb wrote:
> > On Tuesday, May 12, 2015 03:57:59 PM Richard Guy Briggs wrote:
> > > On 15/05/05, Steve Grubb wrote:
> > > > I think there needs to be some more discussion around this. It seems
> > > > like this is not exactly recording things that are useful for audit.
> > >
> > > It seems to me that either audit has to assemble that information, or
> > > the kernel has to do so. The kernel doesn't know about containers
> > > (yet?).
> >
> > Auditing is something that has a lot of requirements imposed on it by
> > security standards. There was no requirement to have an auid until audit
> > came along and said that uid is not good enough to know who is issuing
> > commands because of su or sudo. There was no requirement for sessionid
> > until we had to track each action back to a login so we could see if the
> > login came from the expected place.
> >
> > What I am saying is we have the same situation. Audit needs to track a
> > container and we need an ID. The information that is being logged is not
> > useful for auditing. Maybe someone wants that info in syslog, but I doubt
> > it. The audit trail's purpose is to allow a security officer to reconstruct
> > the events to determine what happened during some security incident.
>
> As Eric, and others, have stated, the container concept is a userspace idea,
> not a kernel idea; the kernel only knows, and cares about, namespaces. This
> is unlikely to change.
>
> However, as Steve points out, there is precedence for the kernel to record
> userspace tokens for the sake of audit. Personally I'm not a big fan of this
> in general, but I do recognize that it does satisfy a legitimate need. Think
> of things like auid and the sessionid as necessary evils; audit is already
> chock full of evilness I doubt one more will doom us all to hell.
>
> Moving forward, I'd like to see the following:
>
> * Record the creation/removal/mgmt of the individual namespaces as Richard's
> patchset currently does. However, I'd suggest using an explicit namespace
> value for the init namespace instead of the "unset" value in the V6 patchset
> (my apologies if you've already changed this Richard, I haven't looked at V7
> yet).

The "unset" (none) value is only there before the first namespaces have
been created. After that, any new ones are created relative to the init
namespace of that type.

> * Create a container ID token (unsigned 32-bit integer?), similar to
> auid/sessionid, that is set by userspace and carried by the kernel to be used
> in audit records. I'd like to see some discussion on how we manage this, e.g.
> how do handle container ID inheritance, how do we handle nested containers
> (setting the containerid when it is already set), do we care if multiple
> different containers share the same namespace config, etc.?

(Addressed in another reply.) Nested will need some careful thought...

> * When userspace sets the container ID, emit a new audit record with the
> associated namespace tokens and the container ID.

That was the goal of AUDIT_VIRT_CONTROL or AUDIT_NS_INFO messages from
userspace into the kernel.

> * Look at our existing audit records to determine which records should have
> namespace and container ID tokens added. We may only want to add the
> additional fields in the case where the namespace/container ID tokens are not
> the init namespace.

If we have a record that ties a set of namespace IDs with a container
ID, then I expect we only need to list the containerID along with auid
and sessionID.

> Can we all live with this? If not, please suggest some alternate ideas;
> simply shouting "IT'S ALL CRAP!" isn't helpful for anyone ... it may be true,
> but it doesn't help us solve the problem ;)

Thanks Paul.

> paul moore

- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

2015-05-15 06:23:36

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On Thu, May 14, 2015 at 7:32 PM, Richard Guy Briggs <[email protected]> wrote:
> On 15/05/14, Paul Moore wrote:
>> * Look at our existing audit records to determine which records should have
>> namespace and container ID tokens added. We may only want to add the
>> additional fields in the case where the namespace/container ID tokens are not
>> the init namespace.
>
> If we have a record that ties a set of namespace IDs with a container
> ID, then I expect we only need to list the containerID along with auid
> and sessionID.

The problem here is that the kernel has no concept of a "container", and I
don't think it makes any sense to add one just for audit. "Container" is a
marketing term used by some userspace tools.

I can imagine that both audit could benefit from a concept of a
namespace *path* that understands nesting (e.g. root/2/5/1 or
something along those lines). Mapping these to "containers" belongs
in userspace, I think.

--Andy

2015-05-15 12:38:11

by Steve Grubb

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On Thursday, May 14, 2015 11:23:09 PM Andy Lutomirski wrote:
> On Thu, May 14, 2015 at 7:32 PM, Richard Guy Briggs <[email protected]> wrote:
> > On 15/05/14, Paul Moore wrote:
> >> * Look at our existing audit records to determine which records should
> >> have
> >> namespace and container ID tokens added. We may only want to add the
> >> additional fields in the case where the namespace/container ID tokens are
> >> not the init namespace.
> >
> > If we have a record that ties a set of namespace IDs with a container
> > ID, then I expect we only need to list the containerID along with auid
> > and sessionID.
>
> The problem here is that the kernel has no concept of a "container", and I
> don't think it makes any sense to add one just for audit. "Container" is a
> marketing term used by some userspace tools.

No, its a real thing just like a login. Does the kernel have any concept of a
login? Yet it happens. And it causes us to generate events describing who,
where from, role, success, and time of day. :-)


> I can imagine that both audit could benefit from a concept of a
> namespace *path* that understands nesting (e.g. root/2/5/1 or
> something along those lines). Mapping these to "containers" belongs
> in userspace, I think.

I don't doubt that just as user space sequences the actions that are a login.
I just need the kernel to do some book keeping and associate the necessary
attributes in the event record to be able to reconstruct what is actually
happening.

-Steve

2015-05-15 13:17:39

by Steve Grubb

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On Thursday, May 14, 2015 08:31:45 PM Eric W. Biederman wrote:
> Paul Moore <[email protected]> writes:
> > As Eric, and others, have stated, the container concept is a userspace
> > idea, not a kernel idea; the kernel only knows, and cares about,
> > namespaces. This is unlikely to change.
> >
> > However, as Steve points out, there is precedence for the kernel to record
> > userspace tokens for the sake of audit. Personally I'm not a big fan of
> > this in general, but I do recognize that it does satisfy a legitimate
> > need. Think of things like auid and the sessionid as necessary evils;
> > audit is already chock full of evilness I doubt one more will doom us all
> > to hell.
> >
> > Moving forward, I'd like to see the following:
> >
> > * Create a container ID token (unsigned 32-bit integer?), similar to
> > auid/sessionid, that is set by userspace and carried by the kernel to be
> > used in audit records. I'd like to see some discussion on how we manage
> > this, e.g. how do handle container ID inheritance, how do we handle
> > nested containers (setting the containerid when it is already set), do we
> > care if multiple different containers share the same namespace config,
> > etc.?
> >
> > Can we all live with this? If not, please suggest some alternate ideas;
> > simply shouting "IT'S ALL CRAP!" isn't helpful for anyone ... it may be
> > true, but it doesn't help us solve the problem ;)
>
> Without stopping and defining what someone means by container I think it
> is pretty much nonsense.

Maybe this is what's hanging everyone up? Its easy to get lost when your view
is down at the syscall level and what is happening in the kernel. Starting a
container is akin to the idea of login. Not every call to setresuid is a
login. It could be a setuid program starting or a daemon dropping privileges.
The idea of a container is a higher level concept that starting a name space.
I think comparing a login with a container is a useful analogy because both
are higher level concepts but employ low level ideas. A login is a collection
of chdir, setuid, setgid, allocating a tty, associating the first 3 file
descriptors, setting a process group, and starting a specific executable. All
these low level concepts each by itself is not special.

A container is what we need auditing events around not creation of namespaces.
If we want creation of namespaces, we can audit the clone/unshare/setns
syscalls. The container is when a managing program such as docker, lxc, or
sometimes systemd creates a special operating environment for the express
purpose of running programs disassociated in some way from the parent
namespaces, cgroups, and security assumptions. Its this orchestration, just as
sshd orchestrates a login, that makes it different.


> Should every vsftp connection get a container every? Every chrome tab?

No. Also, note that not every program that grants a user session constitutes a
login.


> At some of the connections per second numbers I have seen we might
> exhaust a 32bit number in an hour or two. Will any of that make sense
> to someone reading the audit logs?

I would agree if we were auditing creation of name spaces. But going back to
the concept of login, these could occur at a high rate. This is a bruteforce
login attack. We put countermeasures in place to prevent it. But it is
possible for the session id to wrap. But in our case, things like lxc or
docker don't start hundreds of these a minute.


> Without considerning that container creation is an unprivileged
> operation I think it is pretty much nonsense. Do I get to say I am any
> container I want? That would seem to invalidate the concept of
> userspace setting a container id.

It would need to be a privileged operation just as setuid is.


> How does any of this interact with setns? AKA entering a container?

We have to audit this. For the moment, auditing the setns syscall may be
enough. I'd have to look at the lifecycle of the application that's doing this
to determine if we need more.


> I will go as far as looking at patches. If someone comes up with
> a mission statement about what they are actually trying to achieve and a
> mechanism that actually achieves that, and that allows for containers to
> nest we can talk about doing something like that.

Auditing wouldn't impose any restrictions on this. We just need a way to
observe actions within and associate them as needed to investigate violations
of security policy.

> But for right now I just hear proposals for things that make no sense
> and can not possibly work. Not least because it will require modifying
> every program that creates a container and who knows how many of them
> there are.

We only care about a couple programs doing the orchestration. They will need
to have the right support added to them. I'm hoping the analogy of a login
helps demonstrate what we are after.

-Steve

2015-05-15 13:18:22

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On May 15, 2015 9:38 PM, "Steve Grubb" <[email protected]> wrote:
>
> On Thursday, May 14, 2015 11:23:09 PM Andy Lutomirski wrote:
> > On Thu, May 14, 2015 at 7:32 PM, Richard Guy Briggs <[email protected]> wrote:
> > > On 15/05/14, Paul Moore wrote:
> > >> * Look at our existing audit records to determine which records should
> > >> have
> > >> namespace and container ID tokens added. We may only want to add the
> > >> additional fields in the case where the namespace/container ID tokens are
> > >> not the init namespace.
> > >
> > > If we have a record that ties a set of namespace IDs with a container
> > > ID, then I expect we only need to list the containerID along with auid
> > > and sessionID.
> >
> > The problem here is that the kernel has no concept of a "container", and I
> > don't think it makes any sense to add one just for audit. "Container" is a
> > marketing term used by some userspace tools.
>
> No, its a real thing just like a login. Does the kernel have any concept of a
> login? Yet it happens. And it causes us to generate events describing who,
> where from, role, success, and time of day. :-)
>

I really hope those records come from userspace, not the kernel. I
also wonder what happens when a user logs in and types "sudo agetty
/dev/ttyS0 115200". If a user does that and then someone logs in on
/dev/ttyS0, which login are they?

>
> > I can imagine that both audit could benefit from a concept of a
> > namespace *path* that understands nesting (e.g. root/2/5/1 or
> > something along those lines). Mapping these to "containers" belongs
> > in userspace, I think.
>
> I don't doubt that just as user space sequences the actions that are a login.
> I just need the kernel to do some book keeping and associate the necessary
> attributes in the event record to be able to reconstruct what is actually
> happening.

A precondition for that is having those records have some
correspondence to what is actually happening. Since the kernel has no
concept of a container, and since the same kernel mechanisms could be
used for things that are probably not whatever the Common Criteria
rules think a container is, this could be quite difficult to define in
a meaningful manner.

Hence my suggestion to add only minimal support in the kernel and to
do this in userspace.

--Andy

2015-05-15 13:19:30

by Daniel Walsh

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances



On 05/14/2015 10:11 PM, Richard Guy Briggs wrote:
> On 15/05/14, Oren Laadan wrote:
>> On Thu, May 14, 2015 at 8:48 PM, Richard Guy Briggs <[email protected]> wrote:
>>
>>>>>> Recording each instance of a name space is giving me something that I
>>>>>> cannot use to do queries required by the security target. Given these
>>>>>> events, how do I locate a web server event where it accesses a
>>> watched
>>>>>> file? That authentication failed? That an update within the container
>>>>>> failed?
>>>>>>
>>>>>> The requirements are that we have to log the creation, suspension,
>>>>>> migration, and termination of a container. The requirements are not
>>> on
>>>>>> the individual name space.
>>>>> Ok. Do we have a robust definition of a container?
>>>> We call the combination of name spaces, cgroups, and seccomp rules a
>>>> container.
>>> Can you detail what information is required from each?
>>>
>>>>> Where is that definition managed?
>>>> In the thing that invokes a container.
>>> I was looking for a reference to a standards document rather than an
>>> application...
>>>
>>>
>> [focusing on "containers id" - snipped the rest away]
>>
>> I am unfamiliar with the audit subsystem, but work with namespaces in other
>> contexts. Perhaps the term "container" is overloaded here. The definition
>> suggested by Steve in this thread makes sense to me: "a combination of
>> namespaces". I imagine people may want to audit subsets of namespaces.
> I assume it would be a bit more than that, including cgroup and seccomp info.
I don't see why seccomp versus other Security mechanism come into this.
Not really
sure of cgroup. That stuff would all be associated with the process. I
would guess
you could look at the process that modified these for logging, but that
should happen
at the time they get changed, Not recorded for every process.
>> For namespaces, can use a string like "A:B:C:D:E:F" as an identifier for a
>> particular combination, where A-F are respective namespaces identifiers.
>> (Can be taken for example from /proc/PID/ns/{mnt,uts,ipc,user,pid,net}).
>> That will even be grep-able to locate records related to a particular
>> subset
>> of namespaces. So a "container" in the classic meaning would have all A-F
>> unique and different from the init process, but processes separated only by
>> e.g. mnt-ns and net-ns will differ from the init process in A and F.
>>
>> (If a string is a no go, then perhaps combine the IDs in a unique way into a
>> super ID).
> I'd be fine with either, even including the nsfs deviceID.
>
>> Oren.
> - RGB
>
> --
> Richard Guy Briggs <[email protected]>
> Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
> Remote, Ottawa, Canada
> Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
>
> --
> Linux-audit mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/linux-audit

2015-05-15 14:56:05

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

Steve Grubb <[email protected]> writes:

> On Thursday, May 14, 2015 08:31:45 PM Eric W. Biederman wrote:
>> Paul Moore <[email protected]> writes:
>> > As Eric, and others, have stated, the container concept is a userspace
>> > idea, not a kernel idea; the kernel only knows, and cares about,
>> > namespaces. This is unlikely to change.
>> >
>> > However, as Steve points out, there is precedence for the kernel to record
>> > userspace tokens for the sake of audit. Personally I'm not a big fan of
>> > this in general, but I do recognize that it does satisfy a legitimate
>> > need. Think of things like auid and the sessionid as necessary evils;
>> > audit is already chock full of evilness I doubt one more will doom us all
>> > to hell.
>> >
>> > Moving forward, I'd like to see the following:
>> >
>> > * Create a container ID token (unsigned 32-bit integer?), similar to
>> > auid/sessionid, that is set by userspace and carried by the kernel to be
>> > used in audit records. I'd like to see some discussion on how we manage
>> > this, e.g. how do handle container ID inheritance, how do we handle
>> > nested containers (setting the containerid when it is already set), do we
>> > care if multiple different containers share the same namespace config,
>> > etc.?
>> >
>> > Can we all live with this? If not, please suggest some alternate ideas;
>> > simply shouting "IT'S ALL CRAP!" isn't helpful for anyone ... it may be
>> > true, but it doesn't help us solve the problem ;)
>>
>> Without stopping and defining what someone means by container I think it
>> is pretty much nonsense.
>
> Maybe this is what's hanging everyone up? Its easy to get lost when your view
> is down at the syscall level and what is happening in the kernel. Starting a
> container is akin to the idea of login. Not every call to setresuid is a
> login. It could be a setuid program starting or a daemon dropping privileges.
> The idea of a container is a higher level concept that starting a name space.
> I think comparing a login with a container is a useful analogy because both
> are higher level concepts but employ low level ideas. A login is a collection
> of chdir, setuid, setgid, allocating a tty, associating the first 3 file
> descriptors, setting a process group, and starting a specific executable. All
> these low level concepts each by itself is not special.

Except login and setresuid are privileged operation.

CREATING A CONTAINER IS NOT A PRIVILGED OPERATION.
Your analagy fails rather badly with respect to that fact.

> A container is what we need auditing events around not creation of namespaces.
> If we want creation of namespaces, we can audit the clone/unshare/setns
> syscalls. The container is when a managing program such as docker, lxc, or
> sometimes systemd creates a special operating environment for the express
> purpose of running programs disassociated in some way from the parent
> namespaces, cgroups, and security assumptions. Its this orchestration, just as
> sshd orchestrates a login, that makes it different.

What do you define as a container? From what I can tell we share
a similiar understanding of the term, and running lxc is not a
privileged operation. Running sandstorm.io is not a privileged
operation.

>> Should every vsftp connection get a container every? Every chrome tab?
>
> No. Also, note that not every program that grants a user session constitutes a
> login.

>> At some of the connections per second numbers I have seen we might
>> exhaust a 32bit number in an hour or two. Will any of that make sense
>> to someone reading the audit logs?
>
> I would agree if we were auditing creation of name spaces. But going back to
> the concept of login, these could occur at a high rate. This is a bruteforce
> login attack. We put countermeasures in place to prevent it. But it is
> possible for the session id to wrap. But in our case, things like lxc or
> docker don't start hundreds of these a minute.

Except there are reasonable situtations where container creation does
happen at fast rates. Outside of a container per network connection
(which is likely to happen at some point) I have seen builds fire up
more containers than I can count as part of automated testing.

>> Without considerning that container creation is an unprivileged
>> operation I think it is pretty much nonsense. Do I get to say I am any
>> container I want? That would seem to invalidate the concept of
>> userspace setting a container id.
>
> It would need to be a privileged operation just as setuid is.

CONTAINER CREATION IS NOT A PRIVILEGED OPERATION.

That is today. That is talking about lxc.

CONTAINER CREATION IS NOT A PRIVILEGED OPERATION.

And ultimately we don't want it to be, as if you can safely create a
container without privilege your system is safer.

>> How does any of this interact with setns? AKA entering a container?
>
> We have to audit this. For the moment, auditing the setns syscall may be
> enough. I'd have to look at the lifecycle of the application that's doing this
> to determine if we need more.

Frequently it will be sysadmins for some arbitrary reason calling
nsenter or a similar program that is more aware of their favorite
container flavor.

>> I will go as far as looking at patches. If someone comes up with
>> a mission statement about what they are actually trying to achieve and a
>> mechanism that actually achieves that, and that allows for containers to
>> nest we can talk about doing something like that.
>
> Auditing wouldn't impose any restrictions on this. We just need a way to
> observe actions within and associate them as needed to investigate violations
> of security policy.

*Rolls eyes* But the rest of the container tool kit in the kernel will
impose limitations on those identifiers.

>> But for right now I just hear proposals for things that make no sense
>> and can not possibly work. Not least because it will require modifying
>> every program that creates a container and who knows how many of them
>> there are.
>
> We only care about a couple programs doing the orchestration. They will need
> to have the right support added to them. I'm hoping the analogy of a login
> helps demonstrate what we are after.

All I see is that (a) you have not defined what you see a container as
(b) you have failed to acknowledge I can create a container without
privilege (which breaks your analogy with login).

But I think I am with Andy. If you only care about privileged events
and privileged containers, it is unlikely you need to do anything in the
kernel and you can perform whatever logging you see fit in your
privileged userspace applications.

Of course in the log run I don't see what good that will do you as I
expect increasingly there will not need to be any special permissions to
create containers.

Eric

2015-05-15 20:26:55

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On Thursday, May 14, 2015 08:48:55 PM Richard Guy Briggs wrote:
> On 15/05/14, Steve Grubb wrote:
> > What they would want to know is what resources were assigned; if two
> > containers shared a resource, what resource and container was it shared
> > with; if two containers can communicate, we need to see or control
> > information flow when necessary; and we need to see termination and
> > release of resources.
>
> So, namespaces are a big part of this. I understand how they are
> spawned and potentially shared. I have a more vague idea about how
> cgroups contribute to this concept of a container. So far, I have very
> little idea how seccomp contributes, but I assume that it will also need
> to be part of this tracking.

It doesn't, really. We shouldn't worry about seccomp from a
namespace/container auditing perspective. The normal seccomp auditing should
be sufficient for namespaces/containers.

--
paul moore
security @ redhat

2015-05-15 20:42:50

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On Thursday, May 14, 2015 09:10:56 PM Oren Laadan wrote:
> [focusing on "containers id" - snipped the rest away]
>
> I am unfamiliar with the audit subsystem, but work with namespaces in other
> contexts. Perhaps the term "container" is overloaded here. The definition
> suggested by Steve in this thread makes sense to me: "a combination of
> namespaces". I imagine people may want to audit subsets of namespaces.
>
> For namespaces, can use a string like "A:B:C:D:E:F" as an identifier for a
> particular combination, where A-F are respective namespaces identifiers.
> (Can be taken for example from /proc/PID/ns/{mnt,uts,ipc,user,pid,net}).
> That will even be grep-able to locate records related to a particular
> subset
> of namespaces. So a "container" in the classic meaning would have all A-F
> unique and different from the init process, but processes separated only by
> e.g. mnt-ns and net-ns will differ from the init process in A and F.
>
> (If a string is a no go, then perhaps combine the IDs in a unique way into a
> super ID).

As has been mentioned in every other email in this thread, the kernel has no
concept of a container, it is a userspace idea and trying to generate a
meaningful value in the kernel is a mistake in my opinion. My current opinion
is that we allow userspace to set a container ID token as it sees fit and the
kernel will just use the value provided by userspace.

--
paul moore
security @ redhat

2015-05-15 21:01:37

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On Thursday, May 14, 2015 08:31:45 PM Eric W. Biederman wrote:
> Paul Moore <[email protected]> writes:
> > As Eric, and others, have stated, the container concept is a userspace
> > idea, not a kernel idea; the kernel only knows, and cares about,
> > namespaces. This is unlikely to change.
> >
> > However, as Steve points out, there is precedence for the kernel to record
> > userspace tokens for the sake of audit. Personally I'm not a big fan of
> > this in general, but I do recognize that it does satisfy a legitimate
> > need. Think of things like auid and the sessionid as necessary evils;
> > audit is already chock full of evilness I doubt one more will doom us all
> > to hell.
> >
> > Moving forward, I'd like to see the following:
> >
> > * Create a container ID token (unsigned 32-bit integer?), similar to
> > auid/sessionid, that is set by userspace and carried by the kernel to be
> > used in audit records. I'd like to see some discussion on how we manage
> > this, e.g. how do handle container ID inheritance, how do we handle
> > nested containers (setting the containerid when it is already set), do we
> > care if multiple different containers share the same namespace config,
> > etc.?
> >
> >
> > Can we all live with this? If not, please suggest some alternate ideas;
> > simply shouting "IT'S ALL CRAP!" isn't helpful for anyone ... it may be
> > true, but it doesn't help us solve the problem ;)
>
> Without stopping and defining what someone means by container I think it
> is pretty much nonsense.

For what it is worth, I doubt we will ever arrive at a consistent definition
of a container. This is one of the reasons why I don't think we want the
kernel generating a container ID token, although I understand the real world
desire to have the kernel report such information back in the audit logs.

> Should every vsftp connection get a container every? Every chrome tab?

That's up to the individual system. I would argue that's a pretty silly
configuration, but one persons silliness is another's best practice. It's a
mad, mad world.

> At some of the connections per second numbers I have seen we might
> exhaust a 32bit number in an hour or two. Will any of that make sense
> to someone reading the audit logs?

If someone if going to spawn each process in a container then they will need
to live with the fallout of that decision.

Also, if folks thing 32-bits is too small, we can always do 64-bits, but I
don't think that was the point you were trying to make (I could be wrong).

> Without considerning that container creation is an unprivileged
> operation I think it is pretty much nonsense. Do I get to say I am any
> container I want? That would seem to invalidate the concept of
> userspace setting a container id.
>
> How does any of this interact with setns? AKA entering a container?

As I said in my email, I think we need some discussion around this; I don't
pretend to think we have this sorted at this point. I just want to make sure
were working towards some common ground instead of shouting the same stuff
back and forth at each other.

> I will go as far as looking at patches. If someone comes up with
> a mission statement about what they are actually trying to achieve and a
> mechanism that actually achieves that, and that allows for containers to
> nest we can talk about doing something like that.

I think Steve has posted some requirements that Richard is trying to satisfy
with these patches; we've also heard from at least one person who is looking
at how to deploy this in the Real World. Perhaps in the next round of patches
Richard can list the requirements in the 0/X patch and describe how they are
satisfied in the patchset.

Beyond that, and ignoring for a moment the whole "a container is not a
*thing*" argument, can I assume that the auditing of nested "containers" are
your main remaining concern at this point?

> But for right now I just hear proposals for things that make no sense
> and can not possibly work. Not least because it will require modifying
> every program that creates a container and who knows how many of them
> there are. Especially since you don't need to be root. Modifying
> /usr/bin/unshare seems a little far out to me.

I think it is very reasonable that there will be some container infrastructure
tools which would handle this, we're already seeing this happening now; asking
for minor changes to these infrastructure applications to support container
auditing doesn't seem like a significant ask to me. Also, to be perfectly
clear, if the applications aren't updated it isn't as if they will fail to
work, it is just that they won't be able to take advantage of the new
container auditing capabilities. That seems reasonable to me.

--
paul moore
security @ redhat

2015-05-15 21:05:30

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On Thursday, May 14, 2015 11:23:09 PM Andy Lutomirski wrote:
> On Thu, May 14, 2015 at 7:32 PM, Richard Guy Briggs <[email protected]> wrote:
> > On 15/05/14, Paul Moore wrote:
> >> * Look at our existing audit records to determine which records should
> >> have
> >> namespace and container ID tokens added. We may only want to add the
> >> additional fields in the case where the namespace/container ID tokens are
> >> not the init namespace.
> >
> > If we have a record that ties a set of namespace IDs with a container
> > ID, then I expect we only need to list the containerID along with auid
> > and sessionID.
>
> The problem here is that the kernel has no concept of a "container", and I
> don't think it makes any sense to add one just for audit. "Container" is a
> marketing term used by some userspace tools.
>
> I can imagine that both audit could benefit from a concept of a
> namespace *path* that understands nesting (e.g. root/2/5/1 or
> something along those lines). Mapping these to "containers" belongs
> in userspace, I think.

It might be helpful to climb up a few levels in this thread ...

I think we all agree that containers are not a kernel construct. I further
believe that the kernel has no business generating container IDs, those should
come from userspace and will likely be different depending on how you define
"container". However, what is less clear to me at this point is how the
kernel should handle the setting, reporting, and general management of this
container ID token.

--
paul moore
security @ redhat

2015-05-16 09:48:06

by Daniel Walsh

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances



On 05/15/2015 05:05 PM, Paul Moore wrote:
> On Thursday, May 14, 2015 11:23:09 PM Andy Lutomirski wrote:
>> On Thu, May 14, 2015 at 7:32 PM, Richard Guy Briggs <[email protected]> wrote:
>>> On 15/05/14, Paul Moore wrote:
>>>> * Look at our existing audit records to determine which records should
>>>> have
>>>> namespace and container ID tokens added. We may only want to add the
>>>> additional fields in the case where the namespace/container ID tokens are
>>>> not the init namespace.
>>> If we have a record that ties a set of namespace IDs with a container
>>> ID, then I expect we only need to list the containerID along with auid
>>> and sessionID.
>> The problem here is that the kernel has no concept of a "container", and I
>> don't think it makes any sense to add one just for audit. "Container" is a
>> marketing term used by some userspace tools.
>>
>> I can imagine that both audit could benefit from a concept of a
>> namespace *path* that understands nesting (e.g. root/2/5/1 or
>> something along those lines). Mapping these to "containers" belongs
>> in userspace, I think.
> It might be helpful to climb up a few levels in this thread ...
>
> I think we all agree that containers are not a kernel construct. I further
> believe that the kernel has no business generating container IDs, those should
> come from userspace and will likely be different depending on how you define
> "container". However, what is less clear to me at this point is how the
> kernel should handle the setting, reporting, and general management of this
> container ID token.
>
Wouldn't the easiest thing be to just treat add a containerid to the
process context like auid. Then
make it a privileged operation to set it. Then tools that care about
auditing like docker can set the ID
and remove the Capability from it sub processes if it cares. All
processes adopt parent processes containerid.
Now containers can be audited and as long as userspace is written
correctly nested containers can either override the containerid or not
depending on what the audit rules are.

No special handling inside of namespaces.

2015-05-16 12:16:59

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On Sat, May 16, 2015 at 5:46 AM, Daniel J Walsh <[email protected]> wrote:
> On 05/15/2015 05:05 PM, Paul Moore wrote:
>> On Thursday, May 14, 2015 11:23:09 PM Andy Lutomirski wrote:
>>> On Thu, May 14, 2015 at 7:32 PM, Richard Guy Briggs <[email protected]> wrote:
>>>> On 15/05/14, Paul Moore wrote:
>>>>> * Look at our existing audit records to determine which records should
>>>>> have
>>>>> namespace and container ID tokens added. We may only want to add the
>>>>> additional fields in the case where the namespace/container ID tokens are
>>>>> not the init namespace.
>>>> If we have a record that ties a set of namespace IDs with a container
>>>> ID, then I expect we only need to list the containerID along with auid
>>>> and sessionID.
>>> The problem here is that the kernel has no concept of a "container", and I
>>> don't think it makes any sense to add one just for audit. "Container" is a
>>> marketing term used by some userspace tools.
>>>
>>> I can imagine that both audit could benefit from a concept of a
>>> namespace *path* that understands nesting (e.g. root/2/5/1 or
>>> something along those lines). Mapping these to "containers" belongs
>>> in userspace, I think.
>> It might be helpful to climb up a few levels in this thread ...
>>
>> I think we all agree that containers are not a kernel construct. I further
>> believe that the kernel has no business generating container IDs, those should
>> come from userspace and will likely be different depending on how you define
>> "container". However, what is less clear to me at this point is how the
>> kernel should handle the setting, reporting, and general management of this
>> container ID token.
>>
> Wouldn't the easiest thing be to just treat add a containerid to the
> process context like auid.

I believe so. At least that was the point I was trying to get across
when I first jumped into this thread.

> Then make it a privileged operation to set it. Then tools that care about
> auditing like docker can set the ID
> and remove the Capability from it sub processes if it cares. All
> processes adopt parent processes containerid.
> Now containers can be audited and as long as userspace is written
> correctly nested containers can either override the containerid or not
> depending on what the audit rules are.

This part I'm still less certain on. I agree that setting the
container ID should be privileged in some sense, but the kernel
shouldn't *require* privilege to create a new container (however the
user chooses to define it). Simply requiring privilege to set the
container ID and failing silently may be sufficient.

--
paul moore
http://www.paul-moore.com

2015-05-16 14:51:25

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

Paul Moore <[email protected]> writes:

> On Sat, May 16, 2015 at 5:46 AM, Daniel J Walsh <[email protected]> wrote:
>> On 05/15/2015 05:05 PM, Paul Moore wrote:
>>> On Thursday, May 14, 2015 11:23:09 PM Andy Lutomirski wrote:
>>>> On Thu, May 14, 2015 at 7:32 PM, Richard Guy Briggs <[email protected]> wrote:
>>>>> On 15/05/14, Paul Moore wrote:
>>>>>> * Look at our existing audit records to determine which records should
>>>>>> have
>>>>>> namespace and container ID tokens added. We may only want to add the
>>>>>> additional fields in the case where the namespace/container ID tokens are
>>>>>> not the init namespace.
>>>>> If we have a record that ties a set of namespace IDs with a container
>>>>> ID, then I expect we only need to list the containerID along with auid
>>>>> and sessionID.
>>>> The problem here is that the kernel has no concept of a "container", and I
>>>> don't think it makes any sense to add one just for audit. "Container" is a
>>>> marketing term used by some userspace tools.
>>>>
>>>> I can imagine that both audit could benefit from a concept of a
>>>> namespace *path* that understands nesting (e.g. root/2/5/1 or
>>>> something along those lines). Mapping these to "containers" belongs
>>>> in userspace, I think.
>>> It might be helpful to climb up a few levels in this thread ...
>>>
>>> I think we all agree that containers are not a kernel construct. I further
>>> believe that the kernel has no business generating container IDs, those should
>>> come from userspace and will likely be different depending on how you define
>>> "container". However, what is less clear to me at this point is how the
>>> kernel should handle the setting, reporting, and general management of this
>>> container ID token.
>>>
>> Wouldn't the easiest thing be to just treat add a containerid to the
>> process context like auid.
>
> I believe so. At least that was the point I was trying to get across
> when I first jumped into this thread.

It sounds nice but containers are not just a per process construct.
Sometimes you might know anamespace but not which process instigated
action to happen on that namespace.

>> Then make it a privileged operation to set it. Then tools that care about
>> auditing like docker can set the ID
>> and remove the Capability from it sub processes if it cares. All
>> processes adopt parent processes containerid.
>> Now containers can be audited and as long as userspace is written
>> correctly nested containers can either override the containerid or not
>> depending on what the audit rules are.
>
> This part I'm still less certain on. I agree that setting the
> container ID should be privileged in some sense, but the kernel
> shouldn't *require* privilege to create a new container (however the
> user chooses to define it). Simply requiring privilege to set the
> container ID and failing silently may be sufficient.

My hope is as things mature fewer and fewer container things will need
any special privilege to create.

I think it needs to start with a clear definition of what is wanted and
then working backwards through which messages in which contexts you want
to have your magic bits.

Eric

2015-05-16 22:49:43

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On Sat, May 16, 2015 at 10:46 AM, Eric W. Biederman
<[email protected]> wrote:
> Paul Moore <[email protected]> writes:
>> On Sat, May 16, 2015 at 5:46 AM, Daniel J Walsh <[email protected]> wrote:
>>> On 05/15/2015 05:05 PM, Paul Moore wrote:
>>>> On Thursday, May 14, 2015 11:23:09 PM Andy Lutomirski wrote:
>>>>> On Thu, May 14, 2015 at 7:32 PM, Richard Guy Briggs <[email protected]> wrote:
>>>>>> On 15/05/14, Paul Moore wrote:
>>>>>>> * Look at our existing audit records to determine which records should
>>>>>>> have
>>>>>>> namespace and container ID tokens added. We may only want to add the
>>>>>>> additional fields in the case where the namespace/container ID tokens are
>>>>>>> not the init namespace.
>>>>>> If we have a record that ties a set of namespace IDs with a container
>>>>>> ID, then I expect we only need to list the containerID along with auid
>>>>>> and sessionID.
>>>>> The problem here is that the kernel has no concept of a "container", and I
>>>>> don't think it makes any sense to add one just for audit. "Container" is a
>>>>> marketing term used by some userspace tools.
>>>>>
>>>>> I can imagine that both audit could benefit from a concept of a
>>>>> namespace *path* that understands nesting (e.g. root/2/5/1 or
>>>>> something along those lines). Mapping these to "containers" belongs
>>>>> in userspace, I think.
>>>> It might be helpful to climb up a few levels in this thread ...
>>>>
>>>> I think we all agree that containers are not a kernel construct. I further
>>>> believe that the kernel has no business generating container IDs, those should
>>>> come from userspace and will likely be different depending on how you define
>>>> "container". However, what is less clear to me at this point is how the
>>>> kernel should handle the setting, reporting, and general management of this
>>>> container ID token.
>>>>
>>> Wouldn't the easiest thing be to just treat add a containerid to the
>>> process context like auid.
>>
>> I believe so. At least that was the point I was trying to get across
>> when I first jumped into this thread.
>
> It sounds nice but containers are not just a per process construct.
> Sometimes you might know anamespace but not which process instigated
> action to happen on that namespace.

>From an auditing perspective I'm not sure we will ever hit those
cases; did you have a particular example in mind?

--
paul moore
http://www.paul-moore.com

2015-05-19 13:09:30

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On 15/05/16, Paul Moore wrote:
> On Sat, May 16, 2015 at 10:46 AM, Eric W. Biederman
> <[email protected]> wrote:
> > Paul Moore <[email protected]> writes:
> >> On Sat, May 16, 2015 at 5:46 AM, Daniel J Walsh <[email protected]> wrote:
> >>> On 05/15/2015 05:05 PM, Paul Moore wrote:
> >>>> On Thursday, May 14, 2015 11:23:09 PM Andy Lutomirski wrote:
> >>>>> On Thu, May 14, 2015 at 7:32 PM, Richard Guy Briggs <[email protected]> wrote:
> >>>>>> On 15/05/14, Paul Moore wrote:
> >>>>>>> * Look at our existing audit records to determine which records should
> >>>>>>> have
> >>>>>>> namespace and container ID tokens added. We may only want to add the
> >>>>>>> additional fields in the case where the namespace/container ID tokens are
> >>>>>>> not the init namespace.
> >>>>>> If we have a record that ties a set of namespace IDs with a container
> >>>>>> ID, then I expect we only need to list the containerID along with auid
> >>>>>> and sessionID.
> >>>>> The problem here is that the kernel has no concept of a "container", and I
> >>>>> don't think it makes any sense to add one just for audit. "Container" is a
> >>>>> marketing term used by some userspace tools.
> >>>>>
> >>>>> I can imagine that both audit could benefit from a concept of a
> >>>>> namespace *path* that understands nesting (e.g. root/2/5/1 or
> >>>>> something along those lines). Mapping these to "containers" belongs
> >>>>> in userspace, I think.
> >>>> It might be helpful to climb up a few levels in this thread ...
> >>>>
> >>>> I think we all agree that containers are not a kernel construct. I further
> >>>> believe that the kernel has no business generating container IDs, those should
> >>>> come from userspace and will likely be different depending on how you define
> >>>> "container". However, what is less clear to me at this point is how the
> >>>> kernel should handle the setting, reporting, and general management of this
> >>>> container ID token.
> >>>>
> >>> Wouldn't the easiest thing be to just treat add a containerid to the
> >>> process context like auid.
> >>
> >> I believe so. At least that was the point I was trying to get across
> >> when I first jumped into this thread.
> >
> > It sounds nice but containers are not just a per process construct.
> > Sometimes you might know anamespace but not which process instigated
> > action to happen on that namespace.
>
> >From an auditing perspective I'm not sure we will ever hit those
> cases; did you have a particular example in mind?

The example that immediately came to mind when I first read Eric's
comment was a packet coming in off a network in a particular network
namespace. That could narrow it down to a subset of containers based on
which network namespace it inhabits, but since it isn't associated with
a particular task yet (other than a kernel thread) it will not be
possible to select the precise nsproxy, let alone the container.

> paul moore

- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

2015-05-19 14:27:40

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances

On Tue, May 19, 2015 at 9:09 AM, Richard Guy Briggs <[email protected]> wrote:
> On 15/05/16, Paul Moore wrote:
>> On Sat, May 16, 2015 at 10:46 AM, Eric W. Biederman wrote:
>> > It sounds nice but containers are not just a per process construct.
>> > Sometimes you might know anamespace but not which process instigated
>> > action to happen on that namespace.
>>
>> From an auditing perspective I'm not sure we will ever hit those
>> cases; did you have a particular example in mind?
>
> The example that immediately came to mind when I first read Eric's
> comment was a packet coming in off a network in a particular network
> namespace. That could narrow it down to a subset of containers based on
> which network namespace it inhabits, but since it isn't associated with
> a particular task yet (other than a kernel thread) it will not be
> possible to select the precise nsproxy, let alone the container.

Thanks, I was stuck thinking about syscall based auditing and forgot
about the various LSM based audit records. Of all people you would
think I would remember per-packet audit records ;)

Anyway, in this case I think including the namespace ID is sufficient,
largely because the container userspace doesn't have access to the
packet at this point. In order to actually receive the data the
container's userspace will need to issue a syscall where we can
include the container ID. An overly zealous security officer who
wants to trace all the kernel level audit events, like the one you
describe, can match up the namespace to a container in post-processing
if needed.

--
paul moore
http://www.paul-moore.com