2019-04-09 03:40:50

by Richard Guy Briggs

Subject: [PATCH ghak90 V6 00/10] audit: implement container identifier

Implement kernel audit container identifier.

This is the sixth revision (V6) of the patchset based on the proposal
document (V3) posted:
https://www.redhat.com/archives/linux-audit/2018-January/msg00014.html

The first patch was the last patch of ghak81; it was absorbed into this
patchset because its primary justification is the rest of this
patchset.

The second patch implements the proc fs write to set the audit container
identifier of a process, emitting an AUDIT_CONTAINER_OP record to
announce the registration of that audit container identifier on that
process. This patch requires userspace support for record acceptance
and proper type display.

The third implements reading the audit container identifier from the
proc filesystem for debugging. This patch wasn't planned for upstream
inclusion, but its inclusion is becoming more likely.

The fourth implements the auxiliary record AUDIT_CONTAINER_ID if an audit
container identifier is associated with an event. This patch requires
userspace support for proper type display.

The 5th adds audit daemon signalling provenance through audit_sig_info2.

The 6th creates a local audit context to be able to bind a standalone
record with a locally created auxiliary record.

The 7th patch adds audit container identifier records to the user
standalone records.

The 8th adds audit container identifier filtering to the exit,
exclude and user lists. This patch adds the AUDIT_CONTID field and
requires auditctl userspace support for the --contid option.

The 9th adds network namespace audit container identifier labelling
based on member tasks' audit container identifier labels.

The 10th adds audit container identifier support to standalone netfilter
records that don't have a task context and lists each container to which
that net namespace belongs.

Example: Set an audit container identifier of 123456 on the "sleep" task:

sleep 2&
child=$!
echo 123456 > /proc/$child/audit_containerid; echo $?
ausearch -ts recent -m container_op
echo child:$child contid:$( cat /proc/$child/audit_containerid)

This should produce a record such as:

type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615 pid=628 auid=root uid=root tty=ttyS0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm=bash exe=/usr/bin/bash res=yes


Example: Set a filter on audit container identifier 123459 for writes to /tmp/tmpcontainerid:

contid=123459
key=tmpcontainerid
auditctl -a exit,always -F dir=/tmp -F perm=wa -F contid=$contid -F key=$key
perl -e "sleep 1; open(my \$tmpfile, '>', \"/tmp/$key\"); close(\$tmpfile);" &
child=$!
echo $contid > /proc/$child/audit_containerid
sleep 2
ausearch -i -ts recent -k $key
auditctl -d exit,always -F dir=/tmp -F perm=wa -F contid=$contid -F key=$key
rm -f /tmp/$key

This should produce an event such as:

type=CONTAINER_ID msg=audit(2018-06-06 12:46:31.707:26953) : contid=123459
type=PROCTITLE msg=audit(2018-06-06 12:46:31.707:26953) : proctitle=perl -e sleep 1; open(my $tmpfile, '>', "/tmp/tmpcontainerid"); close($tmpfile);
type=PATH msg=audit(2018-06-06 12:46:31.707:26953) : item=1 name=/tmp/tmpcontainerid inode=25656 dev=00:26 mode=file,644 ouid=root ogid=root rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=none cap_fi=none cap_fe=0 cap_fver=0
type=PATH msg=audit(2018-06-06 12:46:31.707:26953) : item=0 name=/tmp/ inode=8985 dev=00:26 mode=dir,sticky,777 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype=PARENT cap_fp=none cap_fi=none cap_fe=0 cap_fver=0
type=CWD msg=audit(2018-06-06 12:46:31.707:26953) : cwd=/root
type=SYSCALL msg=audit(2018-06-06 12:46:31.707:26953) : arch=x86_64 syscall=openat success=yes exit=3 a0=0xffffffffffffff9c a1=0x5621f2b81900 a2=O_WRONLY|O_CREAT|O_TRUNC a3=0x1b6 items=2 ppid=628 pid=2232 auid=root uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=ttyS0 ses=1 comm=perl exe=/usr/bin/perl subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=tmpcontainerid

Example: Test multiple containers on one netns:

sleep 5 &
child1=$!
containerid1=123451
echo $containerid1 > /proc/$child1/audit_containerid
sleep 5 &
child2=$!
containerid2=123452
echo $containerid2 > /proc/$child2/audit_containerid
iptables -I INPUT -i lo -p icmp --icmp-type echo-request -j AUDIT --type accept
iptables -I INPUT -t mangle -i lo -p icmp --icmp-type echo-request -j MARK --set-mark 0x12345555
sleep 1;
bash -c "ping -q -c 1 127.0.0.1 >/dev/null 2>&1"
sleep 1;
ausearch -i -m NETFILTER_PKT -ts boot|grep mark=0x12345555
ausearch -i -m NETFILTER_PKT -ts boot|grep contid=|grep $containerid1|grep $containerid2

This should produce an event such as:

type=NETFILTER_PKT msg=audit(03/15/2019 14:16:13.369:244) : mark=0x12345555 saddr=127.0.0.1 daddr=127.0.0.1 proto=icmp
type=CONTAINER_ID msg=audit(03/15/2019 14:16:13.369:244) : contid=123452,123451
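A consumer of these records has to split the comma-separated contid list shown above. The following is a minimal userspace sketch of that step, not part of the patchset; the function name contid_list_parse() and its error codes are hypothetical and chosen only for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a field of the form "contid=a,b,c" into out[].
 * Returns the number of identifiers parsed, -EINVAL on malformed input,
 * or -E2BIG if the list holds more than max entries.
 * (Hypothetical helper, not an audit-userspace API.)
 */
static int contid_list_parse(const char *field, uint64_t *out, int max)
{
	static const char prefix[] = "contid=";
	const char *p;
	int n = 0;

	if (strncmp(field, prefix, sizeof(prefix) - 1) != 0)
		return -EINVAL;
	p = field + sizeof(prefix) - 1;

	while (n < max) {
		char *end;
		unsigned long long v;

		errno = 0;
		v = strtoull(p, &end, 10);
		if (errno || end == p)
			return -EINVAL;	/* overflow or no digits */
		out[n++] = v;
		if (*end != ',') {
			/* the list ends at end of string or end of field */
			if (*end == '\0' || *end == ' ' || *end == '\n')
				return n;
			return -EINVAL;
		}
		p = end + 1;
	}
	return -E2BIG;
}
```

Called on the field "contid=123452,123451" from the record above, this would fill out[] with the two identifiers in the order the kernel listed them.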


Includes the last patch of https://github.com/linux-audit/audit-kernel/issues/81
Please see the github audit kernel issue for the main feature:
https://github.com/linux-audit/audit-kernel/issues/90
and the kernel filter code:
https://github.com/linux-audit/audit-kernel/issues/91
and the network support:
https://github.com/linux-audit/audit-kernel/issues/92
Please see the github audit userspace issue for supporting record types:
https://github.com/linux-audit/audit-userspace/issues/51
and filter code:
https://github.com/linux-audit/audit-userspace/issues/40
Please see the github audit testsuite issue for the test case:
https://github.com/linux-audit/audit-testsuite/issues/64
Please see the github audit wiki for the feature overview:
https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID


Changelog:

v6
- change TMPBUFLEN from 11 to 21 to cover the decimal value of contid
u64 (nhorman)
- fix bug overwriting ctx in struct audit_sig_info, move cid above
ctx[0] (nhorman)
- fix bug skipping remaining fields and not advancing bufp when copying
out contid in audit_krule_to_data (omosnacec)
- add acks, tidy commit descriptions, other formatting fixes (checkpatch
wrong on audit_log_lost)
- cast ull for u64 prints
- target_cid tracking was moved from the ptrace/signal patch to
container_op
- target ptrace and signal records were moved from the ptrace/signal
patch to container_id
- auditd signaller tracking was moved to a new AUDIT_SIGNAL_INFO2
request and record
- ditch unnecessary list_empty() checks
- check for null net and aunet in audit_netns_contid_add()
- swap CONTAINER_OP contid/old-contid order to ease parsing

v5
- address loginuid and sessionid syscall scope in ghak104
- address audit_context in CONFIG_AUDIT vs CONFIG_AUDITSYSCALL in ghak105
- remove tty patch, addressed in ghak106
- rebase on audit/next v5.0-rc1
w/ghak59/ghak104/ghak103/ghak100/ghak107/ghak105/ghak106/ghak105sup
- update CONTAINER_ID to CONTAINER_OP in patch description
- move audit_context in audit_task_info to CONFIG_AUDITSYSCALL
- move audit_alloc() and audit_free() out of CONFIG_AUDITSYSCALL and into
CONFIG_AUDIT and create audit_{alloc,free}_syscall
- use plain kmem_cache_alloc() rather than kmem_cache_zalloc() in audit_alloc()
- fix audit_get_contid() declaration type error
- move audit_set_contid() from auditsc.c to audit.c
- audit_log_contid() returns void
- audit_log_contid() handed contid rather than tsk
- switch from AUDIT_CONTAINER to AUDIT_CONTAINER_ID for aux record
- move audit_log_contid(tsk/contid) & audit_contid_set(tsk)/audit_contid_valid(contid)
- switch from tsk to current
- audit_alloc_local() calls audit_log_lost() on failure to allocate a context
- add AUDIT_USER* non-syscall contid record
- cosmetic cleanup double parens, goto out on err
- ditch audit_get_ns_contid_list_lock(), fix aunet lock race
- switch from all-cpu read spinlock to rcu, keep spinlock for write
- update audit_alloc_local() to use ktime_get_coarse_real_ts64()
- add nft_log support
- add call from do_exit() in audit_free() to remove contid from netns
- relegate AUDIT_CONTAINER ref= field (was op=) to debug patch

v4
- preface set with ghak81:"collect audit task parameters"
- add shallyn and sgrubb acks
- rename feature bitmap macro
- rename cid_valid() to audit_contid_valid()
- rename AUDIT_CONTAINER_ID to AUDIT_CONTAINER_OP
- delete audit_get_contid_list() from headers
- move work into inner if, delete "found"
- change netns contid list function names
- move exports for audit_log_contid audit_alloc_local audit_free_context to non-syscall patch
- list contids CSV
- pass in gfp flags to audit_alloc_local() (fix audit_alloc_context callers)
- use "local" in lieu of abusing in_syscall for auditsc_get_stamp()
- read_lock(&tasklist_lock) around children and thread check
- task_lock(tsk) should be taken before first check of tsk->audit
- add spin lock to contid list in aunet
- restrict /proc read to CAP_AUDIT_CONTROL
- remove set again prohibition and inherited flag
- delete contidion spelling fix from patchset, send to netdev/linux-wireless

v3
- switched from containerid in task_struct to audit_task_info (depends on ghak81)
- drop INVALID_CID in favour of only AUDIT_CID_UNSET
- check for !audit_task_info, throw -ENOPROTOOPT on set
- changed -EPERM to -EEXIST for parent check
- return AUDIT_CID_UNSET if !audit_enabled
- squash child/thread check patch into AUDIT_CONTAINER_ID patch
- changed -EPERM to -EBUSY for child check
- separate child and thread checks, use -EALREADY for latter
- move addition of op= from ptrace/signal patch to AUDIT_CONTAINER patch
- fix && to || bashism in ptrace/signal patch
- uninline and export function for audit_free_context()
- drop CONFIG_CHANGE, FEATURE_CHANGE, ANOM_ABEND, ANOM_SECCOMP patches
- move audit_enabled check (xt_AUDIT)
- switched from containerid list in struct net to net_generic's struct audit_net
- move containerid list iteration into audit (xt_AUDIT)
- create function to move namespace switch into audit
- switched /proc/PID/ entry from containerid to audit_containerid
- call kzalloc with GFP_ATOMIC on in_atomic() in audit_alloc_context()
- call kzalloc with GFP_ATOMIC on in_atomic() in audit_log_container_info()
- use xt_net(par) instead of sock_net(skb->sk) to get net
- switched record and field names: initial CONTAINER_ID, aux CONTAINER, field CONTID
- allow to set own contid
- open code audit_set_containerid
- add contid inherited flag
- ccontainerid and pcontainerid eliminated due to inherited flag
- change name of container list functions
- rename containerid to contid
- convert initial container record to syscall aux
- fix spelling mistake of contidion in net/rfkill/core.c to avoid contid name collision

v2
- add check for children and threads
- add network namespace container identifier list
- add NETFILTER_PKT audit container identifier logging
- patch description and documentation clean-up and example
- reap unused ppid

Richard Guy Briggs (10):
audit: collect audit task parameters
audit: add container id
audit: read container ID of a process
audit: log container info of syscalls
audit: add contid support for signalling the audit daemon
audit: add support for non-syscall auxiliary records
audit: add containerid support for user records
audit: add containerid filtering
audit: add support for containerid to network namespaces
audit: NETFILTER_PKT: record each container ID associated with a netNS

fs/proc/base.c | 57 +++++++-
include/linux/audit.h | 113 +++++++++++++--
include/linux/sched.h | 7 +-
include/uapi/linux/audit.h | 9 +-
init/init_task.c | 3 +-
init/main.c | 2 +
kernel/audit.c | 325 ++++++++++++++++++++++++++++++++++++++++++--
kernel/audit.h | 9 ++
kernel/auditfilter.c | 47 +++++++
kernel/auditsc.c | 90 ++++++++----
kernel/fork.c | 1 -
kernel/nsproxy.c | 4 +
net/netfilter/nft_log.c | 11 +-
net/netfilter/xt_AUDIT.c | 11 +-
security/selinux/nlmsgtab.c | 1 +
15 files changed, 627 insertions(+), 63 deletions(-)

--
1.8.3.1


2019-04-09 03:41:08

by Richard Guy Briggs

Subject: [PATCH ghak90 V6 02/10] audit: add container id

Implement the proc fs write to set the audit container identifier of a
process, emitting an AUDIT_CONTAINER_OP record to document the event.

This is a write from the container orchestrator task to a proc entry of
the form /proc/PID/audit_containerid where PID is the process ID of the
newly created task that is to become the first task in a container, or
an additional task added to a container.

The write expects up to a u64 value (unset: 18446744073709551615).

The writer must have capability CAP_AUDIT_CONTROL.
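From the orchestrator's side, the write described above could look like the following userspace sketch. It is an illustration under stated assumptions, not code from this patchset: contid_path() and contid_set() are hypothetical names, and the "root" parameter (normally "/proc") exists only so the path logic can be exercised without a patched kernel.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Unset value quoted in the patch description: (u64)-1. */
#define CONTID_UNSET UINT64_MAX

/* Build "<root>/<pid>/audit_containerid"; root is normally "/proc". */
static int contid_path(char *buf, size_t len, const char *root, pid_t pid)
{
	int n = snprintf(buf, len, "%s/%ld/audit_containerid", root, (long)pid);

	if (n < 0 || (size_t)n >= len)
		return -ENAMETOOLONG;
	return 0;
}

/*
 * Write contid in decimal to the target's proc entry.
 * The value is small enough for stdio to flush it in one write(),
 * which matters because the handler rejects non-zero file offsets.
 * Returns 0 on success or a negative errno.
 */
static int contid_set(const char *root, pid_t pid, uint64_t contid)
{
	char path[128];
	FILE *f;
	int rc;

	if (contid == CONTID_UNSET)
		return -EINVAL;	/* unsetting is not permitted */
	rc = contid_path(path, sizeof(path), root, pid);
	if (rc)
		return rc;
	f = fopen(path, "w");
	if (!f)
		return -errno;
	if (fprintf(f, "%" PRIu64, contid) < 0)
		rc = -EIO;
	if (fclose(f) != 0 && !rc)
		rc = -errno;	/* the kernel's verdict arrives on flush */
	return rc;
}
```

A call such as contid_set("/proc", child, 123456) corresponds to the "echo 123456 > /proc/$child/audit_containerid" step in the cover letter examples.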

This will produce a record such as this:
type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615 pid=628 auid=root uid=root tty=ttyS0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm=bash exe=/usr/bin/bash res=yes

The "op" field indicates an initial set. The "pid" through "ses" fields
describe the orchestrator, while the "opid" field is the PID of the object,
the process being "contained". The new and old audit container identifier
values are given in the "contid" and "old-contid" fields, and "res"
indicates whether the operation succeeded.
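Because "contid" is a suffix of "old-contid" (and "pid" of "opid"), a naive substring search can pick the wrong field. The sketch below shows one way a log consumer might extract a numeric field while respecting field boundaries; record_get_u64() is a hypothetical helper, not part of the audit userspace library.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find the field "key=<decimal>" in an audit record and parse its value.
 * A match only counts if key starts the record or follows a space, so
 * "contid" does not match inside "old-contid".
 * Returns 0 on success, -ENOENT if absent, -EINVAL if not a number.
 */
static int record_get_u64(const char *rec, const char *key, uint64_t *out)
{
	size_t klen = strlen(key);
	const char *p = rec;

	while ((p = strstr(p, key)) != NULL) {
		if ((p == rec || p[-1] == ' ') && p[klen] == '=') {
			const char *val = p + klen + 1;
			char *end;
			unsigned long long v;

			errno = 0;
			v = strtoull(val, &end, 10);
			if (errno || end == val)
				return -EINVAL;
			*out = v;
			return 0;
		}
		p += klen;	/* not a field boundary, keep searching */
	}
	return -ENOENT;
}
```

Applied to the sample record above, looking up "contid" would yield 123456 and "old-contid" would yield 18446744073709551615, the unset value.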

It is not permitted to unset the audit container identifier.
A child inherits its parent's audit container identifier.
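The write can fail for several distinct reasons, which the diff below distinguishes by errno. As a rough sketch, an orchestrator might map them to messages as follows; the function name and the message wording are hypothetical, while the errno values are those returned by proc_contid_write() and audit_set_contid() in this patch.

```c
#include <errno.h>

/*
 * Translate the positive errno from a failed write to
 * /proc/PID/audit_containerid into a human-readable reason.
 * (Illustrative helper only; the strings are not kernel output.)
 */
static const char *contid_set_strerror(int err)
{
	switch (err) {
	case 0:
		return "contid set";
	case ESRCH:
		return "no such task";
	case ENOPROTOOPT:
		return "target task has no audit info";
	case EINVAL:
		return "unset value, partial write or unparsable input";
	case EPERM:
		return "writer lacks CAP_AUDIT_CONTROL";
	case EBUSY:
		return "target task has children";
	case EALREADY:
		return "target task is not single-threaded";
	default:
		return "unexpected error";
	}
}
```

Keeping -EBUSY and -EALREADY separate lets the caller tell a task that has already forked from one that has already spawned threads.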

Please see the github audit kernel issue for the main feature:
https://github.com/linux-audit/audit-kernel/issues/90
Please see the github audit userspace issue for supporting additions:
https://github.com/linux-audit/audit-userspace/issues/51
Please see the github audit testsuite issue for the test case:
https://github.com/linux-audit/audit-testsuite/issues/64
Please see the github audit wiki for the feature overview:
https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID

Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Acked-by: Steve Grubb <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
fs/proc/base.c | 36 ++++++++++++++++++++++++
include/linux/audit.h | 25 +++++++++++++++++
include/uapi/linux/audit.h | 2 ++
kernel/audit.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++
kernel/audit.h | 1 +
kernel/auditsc.c | 4 +++
6 files changed, 137 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index ddef482f1334..43fd0c4b87de 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1294,6 +1294,40 @@ static ssize_t proc_sessionid_read(struct file * file, char __user * buf,
.read = proc_sessionid_read,
.llseek = generic_file_llseek,
};
+
+static ssize_t proc_contid_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct inode *inode = file_inode(file);
+ u64 contid;
+ int rv;
+ struct task_struct *task = get_proc_task(inode);
+
+ if (!task)
+ return -ESRCH;
+ if (*ppos != 0) {
+ /* No partial writes. */
+ put_task_struct(task);
+ return -EINVAL;
+ }
+
+ rv = kstrtou64_from_user(buf, count, 10, &contid);
+ if (rv < 0) {
+ put_task_struct(task);
+ return rv;
+ }
+
+ rv = audit_set_contid(task, contid);
+ put_task_struct(task);
+ if (rv < 0)
+ return rv;
+ return count;
+}
+
+static const struct file_operations proc_contid_operations = {
+ .write = proc_contid_write,
+ .llseek = generic_file_llseek,
+};
#endif

#ifdef CONFIG_FAULT_INJECTION
@@ -3033,6 +3067,7 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
#ifdef CONFIG_AUDIT
REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
REG("sessionid", S_IRUGO, proc_sessionid_operations),
+ REG("audit_containerid", S_IWUSR, proc_contid_operations),
#endif
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
@@ -3431,6 +3466,7 @@ static int proc_tid_comm_permission(struct inode *inode, int mask)
#ifdef CONFIG_AUDIT
REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
REG("sessionid", S_IRUGO, proc_sessionid_operations),
+ REG("audit_containerid", S_IWUSR, proc_contid_operations),
#endif
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
diff --git a/include/linux/audit.h b/include/linux/audit.h
index bde346e73f0c..301337776193 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -89,6 +89,7 @@ struct audit_field {
struct audit_task_info {
kuid_t loginuid;
unsigned int sessionid;
+ u64 contid;
#ifdef CONFIG_AUDITSYSCALL
struct audit_context *ctx;
#endif
@@ -189,6 +190,15 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
return tsk->audit->sessionid;
}

+extern int audit_set_contid(struct task_struct *tsk, u64 contid);
+
+static inline u64 audit_get_contid(struct task_struct *tsk)
+{
+ if (!tsk->audit)
+ return AUDIT_CID_UNSET;
+ return tsk->audit->contid;
+}
+
extern u32 audit_enabled;
#else /* CONFIG_AUDIT */
static inline int audit_alloc(struct task_struct *task)
@@ -250,6 +260,11 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
return AUDIT_SID_UNSET;
}

+static inline u64 audit_get_contid(struct task_struct *tsk)
+{
+ return AUDIT_CID_UNSET;
+}
+
#define audit_enabled AUDIT_OFF
#endif /* CONFIG_AUDIT */

@@ -606,6 +621,16 @@ static inline bool audit_loginuid_set(struct task_struct *tsk)
return uid_valid(audit_get_loginuid(tsk));
}

+static inline bool audit_contid_valid(u64 contid)
+{
+ return contid != AUDIT_CID_UNSET;
+}
+
+static inline bool audit_contid_set(struct task_struct *tsk)
+{
+ return audit_contid_valid(audit_get_contid(tsk));
+}
+
static inline void audit_log_string(struct audit_buffer *ab, const char *buf)
{
audit_log_n_string(ab, buf, strlen(buf));
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 3901c51c0b93..4a6a8bf1de32 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -71,6 +71,7 @@
#define AUDIT_TTY_SET 1017 /* Set TTY auditing status */
#define AUDIT_SET_FEATURE 1018 /* Turn an audit feature on or off */
#define AUDIT_GET_FEATURE 1019 /* Get which features are enabled */
+#define AUDIT_CONTAINER_OP 1020 /* Define the container id and info */

#define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */
#define AUDIT_USER_AVC 1107 /* We filter this differently */
@@ -485,6 +486,7 @@ struct audit_tty_status {

#define AUDIT_UID_UNSET (unsigned int)-1
#define AUDIT_SID_UNSET ((unsigned int)-1)
+#define AUDIT_CID_UNSET ((u64)-1)

/* audit_rule_data supports filter rules with both integer and string
* fields. It corresponds with AUDIT_ADD_RULE, AUDIT_DEL_RULE and
diff --git a/kernel/audit.c b/kernel/audit.c
index 3fb09783cd4a..182b0f2c183d 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -244,6 +244,7 @@ int audit_alloc(struct task_struct *tsk)
}
info->loginuid = audit_get_loginuid(current);
info->sessionid = audit_get_sessionid(current);
+ info->contid = audit_get_contid(current);
tsk->audit = info;

ret = audit_alloc_syscall(tsk);
@@ -258,6 +259,7 @@ int audit_alloc(struct task_struct *tsk)
struct audit_task_info init_struct_audit = {
.loginuid = INVALID_UID,
.sessionid = AUDIT_SID_UNSET,
+ .contid = AUDIT_CID_UNSET,
#ifdef CONFIG_AUDITSYSCALL
.ctx = NULL,
#endif
@@ -2341,6 +2343,73 @@ int audit_set_loginuid(kuid_t loginuid)
}

/**
+ * audit_set_contid - set a task's audit contid
+ * @contid: contid value
+ *
+ * Returns 0 on success, negative errno on failure.
+ *
+ * Called (set) from fs/proc/base.c::proc_contid_write().
+ */
+int audit_set_contid(struct task_struct *task, u64 contid)
+{
+ u64 oldcontid;
+ int rc = 0;
+ struct audit_buffer *ab;
+ uid_t uid;
+ struct tty_struct *tty;
+ char comm[sizeof(current->comm)];
+
+ task_lock(task);
+ /* Can't set if audit disabled */
+ if (!task->audit) {
+ task_unlock(task);
+ return -ENOPROTOOPT;
+ }
+ oldcontid = audit_get_contid(task);
+ read_lock(&tasklist_lock);
+ /* Don't allow the audit containerid to be unset */
+ if (!audit_contid_valid(contid))
+ rc = -EINVAL;
+ /* if we don't have caps, reject */
+ else if (!capable(CAP_AUDIT_CONTROL))
+ rc = -EPERM;
+ /* if task has children or is not single-threaded, deny */
+ else if (!list_empty(&task->children))
+ rc = -EBUSY;
+ else if (!(thread_group_leader(task) && thread_group_empty(task)))
+ rc = -EALREADY;
+ read_unlock(&tasklist_lock);
+ if (!rc)
+ task->audit->contid = contid;
+ task_unlock(task);
+
+ if (!audit_enabled)
+ return rc;
+
+ ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_CONTAINER_OP);
+ if (!ab)
+ return rc;
+
+ uid = from_kuid(&init_user_ns, task_uid(current));
+ tty = audit_get_tty();
+ audit_log_format(ab,
+ "op=set opid=%d contid=%llu old-contid=%llu pid=%d uid=%u auid=%u tty=%s ses=%u",
+ task_tgid_nr(task), contid, oldcontid,
+ task_tgid_nr(current), uid,
+ from_kuid(&init_user_ns, audit_get_loginuid(current)),
+ tty ? tty_name(tty) : "(none)",
+ audit_get_sessionid(current));
+ audit_put_tty(tty);
+ audit_log_task_context(ab);
+ audit_log_format(ab, " comm=");
+ audit_log_untrustedstring(ab, get_task_comm(comm, current));
+ audit_log_d_path_exe(ab, current->mm);
+ audit_log_format(ab, " res=%d", !rc);
+ audit_log_end(ab);
+ return rc;
+}
+
+/**
* audit_log_end - end one audit record
* @ab: the audit_buffer
*
diff --git a/kernel/audit.h b/kernel/audit.h
index c00e2ee3c6b3..e2912924af0d 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -148,6 +148,7 @@ struct audit_context {
kuid_t target_uid;
unsigned int target_sessionid;
u32 target_sid;
+ u64 target_cid;
char target_comm[TASK_COMM_LEN];

struct audit_tree_refs *trees, *first_trees;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index fd7ca983de4f..1f7edf035b16 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -113,6 +113,7 @@ struct audit_aux_data_pids {
kuid_t target_uid[AUDIT_AUX_PIDS];
unsigned int target_sessionid[AUDIT_AUX_PIDS];
u32 target_sid[AUDIT_AUX_PIDS];
+ u64 target_cid[AUDIT_AUX_PIDS];
char target_comm[AUDIT_AUX_PIDS][TASK_COMM_LEN];
int pid_count;
};
@@ -2368,6 +2369,7 @@ void __audit_ptrace(struct task_struct *t)
context->target_uid = task_uid(t);
context->target_sessionid = audit_get_sessionid(t);
security_task_getsecid(t, &context->target_sid);
+ context->target_cid = audit_get_contid(t);
memcpy(context->target_comm, t->comm, TASK_COMM_LEN);
}

@@ -2408,6 +2410,7 @@ int audit_signal_info(int sig, struct task_struct *t)
ctx->target_uid = t_uid;
ctx->target_sessionid = audit_get_sessionid(t);
security_task_getsecid(t, &ctx->target_sid);
+ ctx->target_cid = audit_get_contid(t);
memcpy(ctx->target_comm, t->comm, TASK_COMM_LEN);
return 0;
}
@@ -2429,6 +2432,7 @@ int audit_signal_info(int sig, struct task_struct *t)
axp->target_uid[axp->pid_count] = t_uid;
axp->target_sessionid[axp->pid_count] = audit_get_sessionid(t);
security_task_getsecid(t, &axp->target_sid[axp->pid_count]);
+ axp->target_cid[axp->pid_count] = audit_get_contid(t);
memcpy(axp->target_comm[axp->pid_count], t->comm, TASK_COMM_LEN);
axp->pid_count++;

--
1.8.3.1

2019-04-09 03:41:36

by Richard Guy Briggs

Subject: [PATCH ghak90 V6 03/10] audit: read container ID of a process

Add support for reading the audit container identifier from the proc
filesystem.

This is a read from the proc entry of the form
/proc/PID/audit_containerid where PID is the process ID of the task
whose audit container identifier is sought.

The read returns a u64 value in decimal (unset: 18446744073709551615).

This read requires CAP_AUDIT_CONTROL.
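The diff below raises TMPBUFLEN from 11 to 21 because the largest u64 prints as 20 decimal digits plus a terminating NUL. A small userspace sketch of that arithmetic follows; CONTID_BUFLEN and contid_format() are hypothetical names, and note that userspace snprintf() returns the length that would have been written, unlike the kernel's scnprintf().

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* 20 decimal digits for 2^64 - 1, plus the terminating NUL. */
#define CONTID_BUFLEN 21

/*
 * Format contid in decimal into buf.
 * Returns the number of characters the full value needs (excluding the
 * NUL), so a return value >= len means the output was truncated.
 */
static int contid_format(char *buf, size_t len, uint64_t contid)
{
	return snprintf(buf, len, "%" PRIu64, contid);
}
```

With the old 11-byte buffer the unset value would be cut off after ten digits, which is why the buffer shared with the loginuid and sessionid readers had to grow.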

Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
fs/proc/base.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 43fd0c4b87de..acc70239d0cb 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1211,7 +1211,7 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
};

#ifdef CONFIG_AUDIT
-#define TMPBUFLEN 11
+#define TMPBUFLEN 21
static ssize_t proc_loginuid_read(struct file * file, char __user * buf,
size_t count, loff_t *ppos)
{
@@ -1295,6 +1295,24 @@ static ssize_t proc_sessionid_read(struct file * file, char __user * buf,
.llseek = generic_file_llseek,
};

+static ssize_t proc_contid_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct inode *inode = file_inode(file);
+ struct task_struct *task;
+ ssize_t length;
+ char tmpbuf[TMPBUFLEN];
+
+ if (!capable(CAP_AUDIT_CONTROL))
+ return -EPERM;
+ task = get_proc_task(inode);
+ if (!task)
+ return -ESRCH;
+ length = scnprintf(tmpbuf, TMPBUFLEN, "%llu", audit_get_contid(task));
+ put_task_struct(task);
+ return simple_read_from_buffer(buf, count, ppos, tmpbuf, length);
+}
+
static ssize_t proc_contid_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
@@ -1325,6 +1343,7 @@ static ssize_t proc_contid_write(struct file *file, const char __user *buf,
}

static const struct file_operations proc_contid_operations = {
+ .read = proc_contid_read,
.write = proc_contid_write,
.llseek = generic_file_llseek,
};
@@ -3067,7 +3086,7 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
#ifdef CONFIG_AUDIT
REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
REG("sessionid", S_IRUGO, proc_sessionid_operations),
- REG("audit_containerid", S_IWUSR, proc_contid_operations),
+ REG("audit_containerid", S_IWUSR|S_IRUSR, proc_contid_operations),
#endif
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
@@ -3466,7 +3485,7 @@ static int proc_tid_comm_permission(struct inode *inode, int mask)
#ifdef CONFIG_AUDIT
REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
REG("sessionid", S_IRUGO, proc_sessionid_operations),
- REG("audit_containerid", S_IWUSR, proc_contid_operations),
+ REG("audit_containerid", S_IWUSR|S_IRUSR, proc_contid_operations),
#endif
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
--
1.8.3.1

2019-04-09 03:41:51

by Richard Guy Briggs

Subject: [PATCH ghak90 V6 04/10] audit: log container info of syscalls

Create a new audit record AUDIT_CONTAINER_ID to document the audit
container identifier of a process if it is present.

Because it is called from audit_log_exit(), syscalls are covered.

A sample raw event:
type=SYSCALL msg=audit(1519924845.499:257): arch=c000003e syscall=257 success=yes exit=3 a0=ffffff9c a1=56374e1cef30 a2=241 a3=1b6 items=2 ppid=606 pid=635 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="bash" exe="/usr/bin/bash" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="tmpcontainerid"
type=CWD msg=audit(1519924845.499:257): cwd="/root"
type=PATH msg=audit(1519924845.499:257): item=0 name="/tmp/" inode=13863 dev=00:27 mode=041777 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
type=PATH msg=audit(1519924845.499:257): item=1 name="/tmp/tmpcontainerid" inode=17729 dev=00:27 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
type=PROCTITLE msg=audit(1519924845.499:257): proctitle=62617368002D6300736C65657020313B206563686F2074657374203E202F746D702F746D70636F6E7461696E65726964
type=CONTAINER_ID msg=audit(1519924845.499:257): contid=123458

Please see the github audit kernel issue for the main feature:
https://github.com/linux-audit/audit-kernel/issues/90
Please see the github audit userspace issue for supporting additions:
https://github.com/linux-audit/audit-userspace/issues/51
Please see the github audit testsuite issue for the test case:
https://github.com/linux-audit/audit-testsuite/issues/64
Please see the github audit wiki for the feature overview:
https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID

Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Acked-by: Steve Grubb <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
include/linux/audit.h | 5 +++++
include/uapi/linux/audit.h | 1 +
kernel/audit.c | 20 ++++++++++++++++++++
kernel/auditsc.c | 20 ++++++++++++++------
4 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 301337776193..43438192ca2a 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -199,6 +199,8 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
return tsk->audit->contid;
}

+extern void audit_log_contid(struct audit_context *context, u64 contid);
+
extern u32 audit_enabled;
#else /* CONFIG_AUDIT */
static inline int audit_alloc(struct task_struct *task)
@@ -265,6 +267,9 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
return AUDIT_CID_UNSET;
}

+static inline void audit_log_contid(struct audit_context *context, u64 contid)
+{ }
+
#define audit_enabled AUDIT_OFF
#endif /* CONFIG_AUDIT */

diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 4a6a8bf1de32..55fde9970762 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -115,6 +115,7 @@
#define AUDIT_REPLACE 1329 /* Replace auditd if this packet unanswerd */
#define AUDIT_KERN_MODULE 1330 /* Kernel Module events */
#define AUDIT_FANOTIFY 1331 /* Fanotify access decision */
+#define AUDIT_CONTAINER_ID 1332 /* Container ID */

#define AUDIT_AVC 1400 /* SE Linux avc denial or grant */
#define AUDIT_SELINUX_ERR 1401 /* Internal SE Linux Errors */
diff --git a/kernel/audit.c b/kernel/audit.c
index 182b0f2c183d..3e0af53f3c4d 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2127,6 +2127,26 @@ void audit_log_session_info(struct audit_buffer *ab)
audit_log_format(ab, "auid=%u ses=%u", auid, sessionid);
}

+/*
+ * audit_log_contid - report container info
+ * @context: task or local context for record
+ * @contid: container ID to report
+ */
+void audit_log_contid(struct audit_context *context, u64 contid)
+{
+ struct audit_buffer *ab;
+
+ if (!audit_contid_valid(contid))
+ return;
+ /* Generate AUDIT_CONTAINER_ID record with container ID */
+ ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER_ID);
+ if (!ab)
+ return;
+ audit_log_format(ab, "contid=%llu", (unsigned long long)contid);
+ audit_log_end(ab);
+}
+EXPORT_SYMBOL(audit_log_contid);
+
void audit_log_key(struct audit_buffer *ab, char *key)
{
audit_log_format(ab, " key=");
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 1f7edf035b16..eea445b7a181 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1523,7 +1523,7 @@ static void audit_log_exit(void)
for (aux = context->aux_pids; aux; aux = aux->next) {
struct audit_aux_data_pids *axs = (void *)aux;

- for (i = 0; i < axs->pid_count; i++)
+ for (i = 0; i < axs->pid_count; i++) {
if (audit_log_pid_context(context, axs->target_pid[i],
axs->target_auid[i],
axs->target_uid[i],
@@ -1531,14 +1531,20 @@ static void audit_log_exit(void)
axs->target_sid[i],
axs->target_comm[i]))
call_panic = 1;
+ audit_log_contid(context, axs->target_cid[i]);
+ }
}

- if (context->target_pid &&
- audit_log_pid_context(context, context->target_pid,
- context->target_auid, context->target_uid,
- context->target_sessionid,
- context->target_sid, context->target_comm))
+ if (context->target_pid) {
+ if (audit_log_pid_context(context, context->target_pid,
+ context->target_auid,
+ context->target_uid,
+ context->target_sessionid,
+ context->target_sid,
+ context->target_comm))
call_panic = 1;
+ audit_log_contid(context, context->target_cid);
+ }

if (context->pwd.dentry && context->pwd.mnt) {
ab = audit_log_start(context, GFP_KERNEL, AUDIT_CWD);
@@ -1557,6 +1563,8 @@ static void audit_log_exit(void)

audit_log_proctitle();

+ audit_log_contid(context, audit_get_contid(current));
+
/* Send end of event record to help user space know we are finished */
ab = audit_log_start(context, GFP_KERNEL, AUDIT_EOE);
if (ab)
--
1.8.3.1

2019-04-09 03:42:04

by Richard Guy Briggs

Subject: [PATCH ghak90 V6 05/10] audit: add contid support for signalling the audit daemon

Add audit container identifier support to the action of signalling the
audit daemon.

Since this would need to add an element to the audit_sig_info struct,
a new record type AUDIT_SIGNAL_INFO2 was created with a new
audit_sig_info2 struct. Corresponding support is required in the
userspace code to reflect the new record request and reply type.
An older userspace won't break since it won't know to request this
record type.
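
As a rough illustration of the reply layout described above, the sketch
below shows how a userspace consumer might split an AUDIT_SIGNAL_INFO2
payload into its fixed header and the trailing security context. The
struct name sig_info2_hdr, the helper parse_sig_info2() and the
fixed-width field types are assumptions made for this example; they are
not taken from the audit userspace code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of the kernel's struct audit_sig_info2.
 * Field order follows the patch; the fixed-width types are an assumption. */
struct sig_info2_hdr {
	uint32_t uid;	/* uid of the task that signalled auditd */
	int32_t  pid;	/* pid of that task */
	uint64_t cid;	/* audit container identifier of that task */
};

/* Copy the fixed header out of a reply payload and report where the
 * variable-length security context starts. Returns the context length,
 * or -1 if the payload is too short to hold the header. */
static long parse_sig_info2(const void *payload, size_t len,
			    struct sig_info2_hdr *out, const char **ctx)
{
	assert(payload && out && ctx);
	if (len < sizeof(*out))
		return -1;
	memcpy(out, payload, sizeof(*out));
	*ctx = (const char *)payload + sizeof(*out);
	return (long)(len - sizeof(*out));
}
```

A caller that only understands the older audit_sig_info layout never
requests this type, so it never sees the extra cid field.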

Signed-off-by: Richard Guy Briggs <[email protected]>
---
include/linux/audit.h | 7 +++++++
include/uapi/linux/audit.h | 1 +
kernel/audit.c | 27 +++++++++++++++++++++++++++
kernel/audit.h | 1 +
kernel/auditsc.c | 1 +
security/selinux/nlmsgtab.c | 1 +
6 files changed, 38 insertions(+)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 43438192ca2a..c2dec9157463 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -37,6 +37,13 @@ struct audit_sig_info {
char ctx[0];
};

+struct audit_sig_info2 {
+ uid_t uid;
+ pid_t pid;
+ u64 cid;
+ char ctx[0];
+};
+
struct audit_buffer;
struct audit_context;
struct inode;
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 55fde9970762..10cc67926cf1 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -72,6 +72,7 @@
#define AUDIT_SET_FEATURE 1018 /* Turn an audit feature on or off */
#define AUDIT_GET_FEATURE 1019 /* Get which features are enabled */
#define AUDIT_CONTAINER_OP 1020 /* Define the container id and info */
+#define AUDIT_SIGNAL_INFO2 1021 /* Get info auditd signal sender */

#define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */
#define AUDIT_USER_AVC 1107 /* We filter this differently */
diff --git a/kernel/audit.c b/kernel/audit.c
index 3e0af53f3c4d..87e1d367f98c 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -138,6 +138,7 @@ struct audit_net {
kuid_t audit_sig_uid = INVALID_UID;
pid_t audit_sig_pid = -1;
u32 audit_sig_sid = 0;
+u64 audit_sig_cid = AUDIT_CID_UNSET;

/* Records can be lost in several ways:
0) [suppressed in audit_alloc]
@@ -1097,6 +1098,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
case AUDIT_ADD_RULE:
case AUDIT_DEL_RULE:
case AUDIT_SIGNAL_INFO:
+ case AUDIT_SIGNAL_INFO2:
case AUDIT_TTY_GET:
case AUDIT_TTY_SET:
case AUDIT_TRIM:
@@ -1260,6 +1262,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
struct audit_buffer *ab;
u16 msg_type = nlh->nlmsg_type;
struct audit_sig_info *sig_data;
+ struct audit_sig_info2 *sig_data2;
char *ctx = NULL;
u32 len;

@@ -1519,6 +1522,30 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
sig_data, sizeof(*sig_data) + len);
kfree(sig_data);
break;
+ case AUDIT_SIGNAL_INFO2:
+ len = 0;
+ if (audit_sig_sid) {
+ err = security_secid_to_secctx(audit_sig_sid, &ctx, &len);
+ if (err)
+ return err;
+ }
+ sig_data2 = kmalloc(sizeof(*sig_data2) + len, GFP_KERNEL);
+ if (!sig_data2) {
+ if (audit_sig_sid)
+ security_release_secctx(ctx, len);
+ return -ENOMEM;
+ }
+ sig_data2->uid = from_kuid(&init_user_ns, audit_sig_uid);
+ sig_data2->pid = audit_sig_pid;
+ if (audit_sig_sid) {
+ memcpy(sig_data2->ctx, ctx, len);
+ security_release_secctx(ctx, len);
+ }
+ sig_data2->cid = audit_sig_cid;
+ audit_send_reply(skb, seq, AUDIT_SIGNAL_INFO2, 0, 0,
+ sig_data2, sizeof(*sig_data2) + len);
+ kfree(sig_data2);
+ break;
case AUDIT_TTY_GET: {
struct audit_tty_status s;
unsigned int t;
diff --git a/kernel/audit.h b/kernel/audit.h
index e2912924af0d..c5ac6436317e 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -345,6 +345,7 @@ extern void audit_filter_inodes(struct task_struct *tsk,
extern pid_t audit_sig_pid;
extern kuid_t audit_sig_uid;
extern u32 audit_sig_sid;
+extern u64 audit_sig_cid;

extern int audit_filter(int msgtype, unsigned int listtype);

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index eea445b7a181..0a29a00feaf1 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -2405,6 +2405,7 @@ int audit_signal_info(int sig, struct task_struct *t)
else
audit_sig_uid = uid;
security_task_getsecid(current, &audit_sig_sid);
+ audit_sig_cid = audit_get_contid(current);
}

if (!audit_signals || audit_dummy_context())
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index 9cec81209617..682fe7397762 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -132,6 +132,7 @@ struct nlmsg_perm {
{ AUDIT_DEL_RULE, NETLINK_AUDIT_SOCKET__NLMSG_WRITE },
{ AUDIT_USER, NETLINK_AUDIT_SOCKET__NLMSG_RELAY },
{ AUDIT_SIGNAL_INFO, NETLINK_AUDIT_SOCKET__NLMSG_READ },
+ { AUDIT_SIGNAL_INFO2, NETLINK_AUDIT_SOCKET__NLMSG_READ },
{ AUDIT_TRIM, NETLINK_AUDIT_SOCKET__NLMSG_WRITE },
{ AUDIT_MAKE_EQUIV, NETLINK_AUDIT_SOCKET__NLMSG_WRITE },
{ AUDIT_TTY_GET, NETLINK_AUDIT_SOCKET__NLMSG_READ },
--
1.8.3.1

2019-04-09 03:42:17

by Richard Guy Briggs

Subject: [PATCH ghak90 V6 06/10] audit: add support for non-syscall auxiliary records

Standalone audit records have the timestamp and serial number generated
on the fly and as such are unique, making them standalone. This new
function audit_alloc_local() generates a local audit context that will
be used only for a standalone record and its auxiliary record(s). The
context is discarded immediately after the local associated records are
produced.
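
The effect of the new "local" flag on timestamp and serial lookup can be
modelled in userspace roughly as follows. This is only a sketch:
mock_context, mock_get_stamp() and next_serial are stand-ins invented
for the example, and the real audit_alloc_local() assigns the serial up
front rather than lazily.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for struct audit_context: only the fields the
 * stamp logic looks at. Not the kernel definition. */
struct mock_context {
	bool in_syscall;
	bool local;
	unsigned int serial;
};

/* Stand-in for audit_serial(): hands out increasing serial numbers */
static unsigned int next_serial = 1;

/* Mirrors the shape of auditsc_get_stamp() after this patch: a context
 * supplies the serial only when it is a syscall or a local context. */
static int mock_get_stamp(struct mock_context *ctx, unsigned int *serial)
{
	assert(ctx && serial);
	if (!ctx->in_syscall && !ctx->local)
		return 0;
	if (!ctx->serial)
		ctx->serial = next_serial++;
	*serial = ctx->serial;
	return 1;
}
```

Because every record logged against the same local context gets the same
serial, the standalone record and its auxiliary record(s) are reported as
one event.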

Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
include/linux/audit.h | 8 ++++++++
kernel/audit.h | 1 +
kernel/auditsc.c | 35 ++++++++++++++++++++++++++++++-----
3 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index c2dec9157463..ae03cfd5788a 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -291,6 +291,8 @@ static inline void audit_log_contid(struct audit_context *context, u64 contid)

/* These are defined in auditsc.c */
/* Public API */
+extern struct audit_context *audit_alloc_local(gfp_t gfpflags);
+extern void audit_free_context(struct audit_context *context);
extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
unsigned long a2, unsigned long a3);
extern void __audit_syscall_exit(int ret_success, long ret_value);
@@ -518,6 +520,12 @@ static inline void audit_fanotify(unsigned int response)
extern int audit_n_rules;
extern int audit_signals;
#else /* CONFIG_AUDITSYSCALL */
+static inline struct audit_context *audit_alloc_local(gfp_t gfpflags)
+{
+ return NULL;
+}
+static inline void audit_free_context(struct audit_context *context)
+{ }
static inline void audit_syscall_entry(int major, unsigned long a0,
unsigned long a1, unsigned long a2,
unsigned long a3)
diff --git a/kernel/audit.h b/kernel/audit.h
index c5ac6436317e..2a1a8b8a8019 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -111,6 +111,7 @@ struct audit_proctitle {
struct audit_context {
int dummy; /* must be the first element */
int in_syscall; /* 1 if task is in a syscall */
+ bool local; /* local context needed */
enum audit_state state, current_state;
unsigned int serial; /* serial number for record */
int major; /* syscall number */
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 0a29a00feaf1..b78734878832 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -879,11 +879,13 @@ static inline void audit_free_aux(struct audit_context *context)
}
}

-static inline struct audit_context *audit_alloc_context(enum audit_state state)
+static inline struct audit_context *audit_alloc_context(enum audit_state state,
+ gfp_t gfpflags)
{
struct audit_context *context;

- context = kzalloc(sizeof(*context), GFP_KERNEL);
+ /* We can be called in atomic context via audit_tg() */
+ context = kzalloc(sizeof(*context), gfpflags);
if (!context)
return NULL;
context->state = state;
@@ -919,7 +921,8 @@ int audit_alloc_syscall(struct task_struct *tsk)
return 0;
}

- if (!(context = audit_alloc_context(state))) {
+ context = audit_alloc_context(state, GFP_KERNEL);
+ if (!context) {
kfree(key);
audit_log_lost("out of memory in audit_alloc_syscall");
return -ENOMEM;
@@ -931,8 +934,29 @@ int audit_alloc_syscall(struct task_struct *tsk)
return 0;
}

-static inline void audit_free_context(struct audit_context *context)
+struct audit_context *audit_alloc_local(gfp_t gfpflags)
{
+ struct audit_context *context = NULL;
+
+ if (!audit_ever_enabled)
+ goto out; /* Return if not auditing. */
+ context = audit_alloc_context(AUDIT_RECORD_CONTEXT, gfpflags);
+ if (!context) {
+ audit_log_lost("out of memory in audit_alloc_local");
+ goto out;
+ }
+ context->serial = audit_serial();
+ ktime_get_coarse_real_ts64(&context->ctime);
+ context->local = true;
+out:
+ return context;
+}
+EXPORT_SYMBOL(audit_alloc_local);
+
+void audit_free_context(struct audit_context *context)
+{
+ if (!context)
+ return;
audit_free_module(context);
audit_free_names(context);
unroll_tree_refs(context, NULL, 0);
@@ -943,6 +967,7 @@ static inline void audit_free_context(struct audit_context *context)
audit_proctitle_free(context);
kfree(context);
}
+EXPORT_SYMBOL(audit_free_context);

static int audit_log_pid_context(struct audit_context *context, pid_t pid,
kuid_t auid, kuid_t uid, unsigned int sessionid,
@@ -2173,7 +2198,7 @@ void __audit_inode_child(struct inode *parent,
int auditsc_get_stamp(struct audit_context *ctx,
struct timespec64 *t, unsigned int *serial)
{
- if (!ctx->in_syscall)
+ if (!ctx->in_syscall && !ctx->local)
return 0;
if (!ctx->serial)
ctx->serial = audit_serial();
--
1.8.3.1

2019-04-09 03:42:29

by Richard Guy Briggs

Subject: [PATCH ghak90 V6 08/10] audit: add containerid filtering

Implement audit container identifier filtering using the AUDIT_CONTID
field name. The u64 value is sent as an 8-byte string because the rule's
value field is only u32.

Sending it as two u32 was considered, but gathering and comparing two
fields was more complex.
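
A minimal sketch of that encoding, assuming host byte order on both
sides as the kernel code in this patch does. The helper names
contid_pack() and contid_unpack() are hypothetical and exist only for
this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack a 64-bit contid into the rule's string buffer in host byte order.
 * Returns the number of bytes written, which is always 8. */
static size_t contid_pack(void *buf, uint64_t contid)
{
	assert(buf);
	memcpy(buf, &contid, sizeof(contid));
	return sizeof(contid);
}

/* Recover the contid from the buffer. Returns 0 on success, or -1 if
 * the declared length is not exactly 8 bytes. */
static int contid_unpack(const void *buf, size_t len, uint64_t *contid)
{
	assert(buf && contid);
	if (len != sizeof(*contid))
		return -1;
	memcpy(contid, buf, sizeof(*contid));
	return 0;
}
```

The length check plays the same role as the f->val != sizeof(u64) test
in audit_data_to_entry().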

The feature indicator is AUDIT_FEATURE_BITMAP_CONTAINERID.

Please see the github audit kernel issue for the contid filter feature:
https://github.com/linux-audit/audit-kernel/issues/91
Please see the github audit userspace issue for filter additions:
https://github.com/linux-audit/audit-userspace/issues/40
Please see the github audit testsuite issue for the test case:
https://github.com/linux-audit/audit-testsuite/issues/64
Please see the github audit wiki for the feature overview:
https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
include/linux/audit.h | 1 +
include/uapi/linux/audit.h | 5 ++++-
kernel/audit.h | 1 +
kernel/auditfilter.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
kernel/auditsc.c | 4 ++++
5 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index ae03cfd5788a..6e42e6a10736 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -83,6 +83,7 @@ struct audit_field {
u32 type;
union {
u32 val;
+ u64 val64;
kuid_t uid;
kgid_t gid;
struct {
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 10cc67926cf1..6d32eb1a96fb 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -266,6 +266,7 @@
#define AUDIT_LOGINUID_SET 24
#define AUDIT_SESSIONID 25 /* Session ID */
#define AUDIT_FSTYPE 26 /* FileSystem Type */
+#define AUDIT_CONTID 27 /* Container ID */

/* These are ONLY useful when checking
* at syscall exit time (AUDIT_AT_EXIT). */
@@ -346,6 +347,7 @@ enum {
#define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER 0x00000010
#define AUDIT_FEATURE_BITMAP_LOST_RESET 0x00000020
#define AUDIT_FEATURE_BITMAP_FILTER_FS 0x00000040
+#define AUDIT_FEATURE_BITMAP_CONTAINERID 0x00000080

#define AUDIT_FEATURE_BITMAP_ALL (AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT | \
AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME | \
@@ -353,7 +355,8 @@ enum {
AUDIT_FEATURE_BITMAP_EXCLUDE_EXTEND | \
AUDIT_FEATURE_BITMAP_SESSIONID_FILTER | \
AUDIT_FEATURE_BITMAP_LOST_RESET | \
- AUDIT_FEATURE_BITMAP_FILTER_FS)
+ AUDIT_FEATURE_BITMAP_FILTER_FS | \
+ AUDIT_FEATURE_BITMAP_CONTAINERID)

/* deprecated: AUDIT_VERSION_* */
#define AUDIT_VERSION_LATEST AUDIT_FEATURE_BITMAP_ALL
diff --git a/kernel/audit.h b/kernel/audit.h
index 2a1a8b8a8019..3a40b608bf8d 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -230,6 +230,7 @@ static inline int audit_hash_ino(u32 ino)

extern int audit_match_class(int class, unsigned syscall);
extern int audit_comparator(const u32 left, const u32 op, const u32 right);
+extern int audit_comparator64(const u64 left, const u32 op, const u64 right);
extern int audit_uid_comparator(kuid_t left, u32 op, kuid_t right);
extern int audit_gid_comparator(kgid_t left, u32 op, kgid_t right);
extern int parent_len(const char *path);
diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index 63f8b3f26fab..407b5bb3b4c6 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -410,6 +410,7 @@ static int audit_field_valid(struct audit_entry *entry, struct audit_field *f)
/* FALL THROUGH */
case AUDIT_ARCH:
case AUDIT_FSTYPE:
+ case AUDIT_CONTID:
if (f->op != Audit_not_equal && f->op != Audit_equal)
return -EINVAL;
break;
@@ -582,6 +583,14 @@ static struct audit_entry *audit_data_to_entry(struct audit_rule_data *data,
}
entry->rule.exe = audit_mark;
break;
+ case AUDIT_CONTID:
+ if (f->val != sizeof(u64))
+ goto exit_free;
+ str = audit_unpack_string(&bufp, &remain, f->val);
+ if (IS_ERR(str))
+ goto exit_free;
+ f->val64 = ((u64 *)str)[0];
+ break;
}
}

@@ -664,6 +673,11 @@ static struct audit_rule_data *audit_krule_to_data(struct audit_krule *krule)
data->buflen += data->values[i] =
audit_pack_string(&bufp, audit_mark_path(krule->exe));
break;
+ case AUDIT_CONTID:
+ data->buflen += data->values[i] = sizeof(u64);
+ memcpy(bufp, &f->val64, sizeof(u64));
+ bufp += sizeof(u64);
+ break;
case AUDIT_LOGINUID_SET:
if (krule->pflags & AUDIT_LOGINUID_LEGACY && !f->val) {
data->fields[i] = AUDIT_LOGINUID;
@@ -750,6 +764,10 @@ static int audit_compare_rule(struct audit_krule *a, struct audit_krule *b)
if (!gid_eq(a->fields[i].gid, b->fields[i].gid))
return 1;
break;
+ case AUDIT_CONTID:
+ if (a->fields[i].val64 != b->fields[i].val64)
+ return 1;
+ break;
default:
if (a->fields[i].val != b->fields[i].val)
return 1;
@@ -1206,6 +1224,31 @@ int audit_comparator(u32 left, u32 op, u32 right)
}
}

+int audit_comparator64(u64 left, u32 op, u64 right)
+{
+ switch (op) {
+ case Audit_equal:
+ return (left == right);
+ case Audit_not_equal:
+ return (left != right);
+ case Audit_lt:
+ return (left < right);
+ case Audit_le:
+ return (left <= right);
+ case Audit_gt:
+ return (left > right);
+ case Audit_ge:
+ return (left >= right);
+ case Audit_bitmask:
+ return (left & right);
+ case Audit_bittest:
+ return ((left & right) == right);
+ default:
+ BUG();
+ return 0;
+ }
+}
+
int audit_uid_comparator(kuid_t left, u32 op, kuid_t right)
{
switch (op) {
@@ -1344,6 +1387,10 @@ int audit_filter(int msgtype, unsigned int listtype)
result = audit_comparator(audit_loginuid_set(current),
f->op, f->val);
break;
+ case AUDIT_CONTID:
+ result = audit_comparator64(audit_get_contid(current),
+ f->op, f->val64);
+ break;
case AUDIT_MSGTYPE:
result = audit_comparator(msgtype, f->op, f->val);
break;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index b78734878832..deb3df8b62be 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -616,6 +616,10 @@ static int audit_filter_rules(struct task_struct *tsk,
case AUDIT_LOGINUID_SET:
result = audit_comparator(audit_loginuid_set(tsk), f->op, f->val);
break;
+ case AUDIT_CONTID:
+ result = audit_comparator64(audit_get_contid(tsk),
+ f->op, f->val64);
+ break;
case AUDIT_SUBJ_USER:
case AUDIT_SUBJ_ROLE:
case AUDIT_SUBJ_TYPE:
--
1.8.3.1

2019-04-09 03:42:35

by Richard Guy Briggs

Subject: [PATCH ghak90 V6 10/10] audit: NETFILTER_PKT: record each container ID associated with a netNS

Add audit container identifier auxiliary record(s) to NETFILTER_PKT
event standalone records. Iterate through all potential audit container
identifiers associated with a network namespace.
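
The resulting record body is a comma-separated list. The userspace
sketch below shows the formatting only. The function
format_contid_list() and its buffer handling are assumptions for this
example; the kernel code writes into an audit_buffer while walking an
RCU-protected list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format ids as "contid=1,2,3" into buf. Returns the length written,
 * 0 when there are no ids (no record is emitted), or -1 on truncation. */
static int format_contid_list(char *buf, size_t size,
			      const uint64_t *ids, size_t n)
{
	size_t off = 0;
	size_t i;

	assert(buf && size > 0);
	buf[0] = '\0';
	for (i = 0; i < n; i++) {
		/* First entry carries the field name, later ones a comma */
		int w = snprintf(buf + off, size - off, "%s%llu",
				 i == 0 ? "contid=" : ",",
				 (unsigned long long)ids[i]);
		if (w < 0 || (size_t)w >= size - off)
			return -1;
		off += (size_t)w;
	}
	return (int)off;
}
```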

Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
include/linux/audit.h | 5 +++++
kernel/audit.c | 39 +++++++++++++++++++++++++++++++++++++++
net/netfilter/nft_log.c | 11 +++++++++--
net/netfilter/xt_AUDIT.c | 11 +++++++++--
4 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 4b2503927c37..d43db4491dd1 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -220,6 +220,8 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
extern void audit_netns_contid_del(struct net *net, u64 contid);
extern void audit_switch_task_namespaces(struct nsproxy *ns,
struct task_struct *p);
+extern void audit_log_netns_contid_list(struct net *net,
+ struct audit_context *context);

extern u32 audit_enabled;
#else /* CONFIG_AUDIT */
@@ -296,6 +298,9 @@ static inline void audit_netns_contid_del(struct net *net, u64 contid)
static inline void audit_switch_task_namespaces(struct nsproxy *ns,
struct task_struct *p)
{ }
+static inline void audit_log_netns_contid_list(struct net *net,
+ struct audit_context *context)
+{ }

#define audit_enabled AUDIT_OFF
#endif /* CONFIG_AUDIT */
diff --git a/kernel/audit.c b/kernel/audit.c
index 996213591617..512464a626d1 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -453,6 +453,45 @@ void audit_switch_task_namespaces(struct nsproxy *ns, struct task_struct *p)
audit_netns_contid_add(new->net_ns, contid);
}

+/**
+ * audit_log_netns_contid_list - List contids for the given network namespace
+ * @net: the network namespace of interest
+ * @context: the audit context to use
+ *
+ * Description:
+ * Issues a CONTAINER_ID record with a CSV list of contids associated
+ * with a network namespace to accompany a NETFILTER_PKT record.
+ */
+void audit_log_netns_contid_list(struct net *net, struct audit_context *context)
+{
+ struct audit_buffer *ab = NULL;
+ struct audit_contid *cont;
+ struct audit_net *aunet;
+
+ /* Generate AUDIT_CONTAINER_ID record with container ID CSV list */
+ rcu_read_lock();
+ aunet = net_generic(net, audit_net_id);
+ if (!aunet)
+ goto out;
+ list_for_each_entry_rcu(cont, &aunet->contid_list, list) {
+ if (!ab) {
+ ab = audit_log_start(context, GFP_ATOMIC,
+ AUDIT_CONTAINER_ID);
+ if (!ab) {
+ audit_log_lost("out of memory in audit_log_netns_contid_list");
+ goto out;
+ }
+ audit_log_format(ab, "contid=");
+ } else
+ audit_log_format(ab, ",");
+ audit_log_format(ab, "%llu", (unsigned long long)cont->id);
+ }
+ audit_log_end(ab);
+out:
+ rcu_read_unlock();
+}
+EXPORT_SYMBOL(audit_log_netns_contid_list);
+
void audit_panic(const char *message)
{
switch (audit_failure) {
diff --git a/net/netfilter/nft_log.c b/net/netfilter/nft_log.c
index 655187bed5d8..bdb1ec2368a7 100644
--- a/net/netfilter/nft_log.c
+++ b/net/netfilter/nft_log.c
@@ -69,13 +69,16 @@ static void nft_log_eval_audit(const struct nft_pktinfo *pkt)
struct sk_buff *skb = pkt->skb;
struct audit_buffer *ab;
int fam = -1;
+ struct audit_context *context;
+ struct net *net;

if (!audit_enabled)
return;

- ab = audit_log_start(NULL, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
+ context = audit_alloc_local(GFP_ATOMIC);
+ ab = audit_log_start(context, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
if (!ab)
- return;
+ goto errout;

audit_log_format(ab, "mark=%#x", skb->mark);

@@ -102,6 +105,10 @@ static void nft_log_eval_audit(const struct nft_pktinfo *pkt)
audit_log_format(ab, " saddr=? daddr=? proto=-1");

audit_log_end(ab);
+ net = xt_net(&pkt->xt);
+ audit_log_netns_contid_list(net, context);
+errout:
+ audit_free_context(context);
}

static void nft_log_eval(const struct nft_expr *expr,
diff --git a/net/netfilter/xt_AUDIT.c b/net/netfilter/xt_AUDIT.c
index af883f1b64f9..a3e547435f13 100644
--- a/net/netfilter/xt_AUDIT.c
+++ b/net/netfilter/xt_AUDIT.c
@@ -71,10 +71,13 @@ static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
{
struct audit_buffer *ab;
int fam = -1;
+ struct audit_context *context;
+ struct net *net;

if (audit_enabled == AUDIT_OFF)
- goto errout;
- ab = audit_log_start(NULL, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
+ goto out;
+ context = audit_alloc_local(GFP_ATOMIC);
+ ab = audit_log_start(context, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
if (ab == NULL)
goto errout;

@@ -104,7 +107,11 @@ static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)

audit_log_end(ab);

+ net = xt_net(par);
+ audit_log_netns_contid_list(net, context);
errout:
+ audit_free_context(context);
+out:
return XT_CONTINUE;
}

--
1.8.3.1

2019-04-09 03:44:06

by Richard Guy Briggs

Subject: [PATCH ghak90 V6 09/10] audit: add support for containerid to network namespaces

Audit events could happen in a network namespace outside of a task
context due to packets received from the net that trigger an auditing
rule prior to being associated with a running task. The network
namespace could be in use by multiple containers by association to the
tasks in that network namespace. We still want a way to attribute
these events to any potential containers. Keep a list per network
namespace to track these audit container identifiers.

Add/increment the audit container identifier on:
- initial setting of the audit container identifier via /proc
- clone/fork call that inherits an audit container identifier
- unshare call that inherits an audit container identifier
- setns call that inherits an audit container identifier
Delete/decrement the audit container identifier on:
- an inherited audit container identifier dropped when child set
- process exit
- unshare call that drops a net namespace
- setns call that drops a net namespace
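
The add/increment and delete/decrement rules above amount to a
reference-counted set of identifiers per namespace. The following
userspace model sketches that bookkeeping under simplifying assumptions:
it uses a plain singly linked list and malloc(), with none of the
spinlock or RCU protection the kernel code needs, and the names
contid_node, contid_add() and contid_del() are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One entry per distinct contid seen in a namespace (userspace model) */
struct contid_node {
	struct contid_node *next;
	uint64_t id;
	unsigned int refcount;
};

/* Add a reference to id, creating the entry on first use.
 * Returns the resulting refcount, or 0 if allocation failed. */
static unsigned int contid_add(struct contid_node **head, uint64_t id)
{
	struct contid_node *n;

	assert(head);
	for (n = *head; n; n = n->next)
		if (n->id == id)
			return ++n->refcount;
	n = malloc(sizeof(*n));
	if (!n)
		return 0;
	n->id = id;
	n->refcount = 1;
	n->next = *head;
	*head = n;
	return 1;
}

/* Drop a reference to id, unlinking and freeing the entry at zero.
 * Returns the remaining refcount (0 also when id was not present). */
static unsigned int contid_del(struct contid_node **head, uint64_t id)
{
	struct contid_node **pp, *n;

	assert(head);
	for (pp = head; (n = *pp) != NULL; pp = &n->next) {
		if (n->id != id)
			continue;
		if (--n->refcount)
			return n->refcount;
		*pp = n->next;
		free(n);
		return 0;
	}
	return 0;
}
```

An identifier stays on the list for as long as at least one task in the
namespace carries it, which is what lets a packet event be attributed to
every candidate container.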

Please see the github audit kernel issue for contid net support:
https://github.com/linux-audit/audit-kernel/issues/92
Please see the github audit testsuite issue for the test case:
https://github.com/linux-audit/audit-testsuite/issues/64
Please see the github audit wiki for the feature overview:
https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
include/linux/audit.h | 19 +++++++++++
kernel/audit.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++--
kernel/nsproxy.c | 4 +++
3 files changed, 108 insertions(+), 3 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 6e42e6a10736..4b2503927c37 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -27,6 +27,7 @@
#include <linux/ptrace.h>
#include <linux/namei.h> /* LOOKUP_* */
#include <uapi/linux/audit.h>
+#include <linux/refcount.h>

#define AUDIT_INO_UNSET ((unsigned long)-1)
#define AUDIT_DEV_UNSET ((dev_t)-1)
@@ -105,6 +106,13 @@ struct audit_task_info {

extern struct audit_task_info init_struct_audit;

+struct audit_contid {
+ struct list_head list;
+ u64 id;
+ refcount_t refcount;
+ struct rcu_head rcu;
+};
+
extern int is_audit_feature_set(int which);

extern int __init audit_register_class(int class, unsigned *list);
@@ -208,6 +216,10 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
}

extern void audit_log_contid(struct audit_context *context, u64 contid);
+extern void audit_netns_contid_add(struct net *net, u64 contid);
+extern void audit_netns_contid_del(struct net *net, u64 contid);
+extern void audit_switch_task_namespaces(struct nsproxy *ns,
+ struct task_struct *p);

extern u32 audit_enabled;
#else /* CONFIG_AUDIT */
@@ -277,6 +289,13 @@ static inline u64 audit_get_contid(struct task_struct *tsk)

static inline void audit_log_contid(struct audit_context *context, u64 contid)
{ }
+static inline void audit_netns_contid_add(struct net *net, u64 contid)
+{ }
+static inline void audit_netns_contid_del(struct net *net, u64 contid)
+{ }
+static inline void audit_switch_task_namespaces(struct nsproxy *ns,
+ struct task_struct *p)
+{ }

#define audit_enabled AUDIT_OFF
#endif /* CONFIG_AUDIT */
diff --git a/kernel/audit.c b/kernel/audit.c
index 6c742da66b32..996213591617 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -72,6 +72,7 @@
#include <linux/freezer.h>
#include <linux/pid_namespace.h>
#include <net/netns/generic.h>
+#include <net/net_namespace.h>

#include "audit.h"

@@ -99,9 +100,13 @@
/**
* struct audit_net - audit private network namespace data
* @sk: communication socket
+ * @contid_list: audit container identifier list
+ * @contid_list_lock: audit container identifier list lock
*/
struct audit_net {
struct sock *sk;
+ struct list_head contid_list;
+ spinlock_t contid_list_lock;
};

/**
@@ -275,8 +280,11 @@ struct audit_task_info init_struct_audit = {
void audit_free(struct task_struct *tsk)
{
struct audit_task_info *info = tsk->audit;
+ struct nsproxy *ns = tsk->nsproxy;

audit_free_syscall(tsk);
+ if (ns)
+ audit_netns_contid_del(ns->net_ns, audit_get_contid(tsk));
/* Freeing the audit_task_info struct must be performed after
* audit_log_exit() due to need for loginuid and sessionid.
*/
@@ -376,6 +384,75 @@ static struct sock *audit_get_sk(const struct net *net)
return aunet->sk;
}

+void audit_netns_contid_add(struct net *net, u64 contid)
+{
+ struct audit_net *aunet;
+ struct list_head *contid_list;
+ struct audit_contid *cont;
+
+ if (!net)
+ return;
+ if (!audit_contid_valid(contid))
+ return;
+ aunet = net_generic(net, audit_net_id);
+ if (!aunet)
+ return;
+ contid_list = &aunet->contid_list;
+ spin_lock(&aunet->contid_list_lock);
+ list_for_each_entry_rcu(cont, contid_list, list)
+ if (cont->id == contid) {
+ refcount_inc(&cont->refcount);
+ goto out;
+ }
+ cont = kmalloc(sizeof(struct audit_contid), GFP_ATOMIC);
+ if (cont) {
+ INIT_LIST_HEAD(&cont->list);
+ cont->id = contid;
+ refcount_set(&cont->refcount, 1);
+ list_add_rcu(&cont->list, contid_list);
+ }
+out:
+ spin_unlock(&aunet->contid_list_lock);
+}
+
+void audit_netns_contid_del(struct net *net, u64 contid)
+{
+ struct audit_net *aunet;
+ struct list_head *contid_list;
+ struct audit_contid *cont = NULL;
+
+ if (!net)
+ return;
+ if (!audit_contid_valid(contid))
+ return;
+ aunet = net_generic(net, audit_net_id);
+ if (!aunet)
+ return;
+ contid_list = &aunet->contid_list;
+ spin_lock(&aunet->contid_list_lock);
+ list_for_each_entry_rcu(cont, contid_list, list)
+ if (cont->id == contid) {
+ if (refcount_dec_and_test(&cont->refcount)) {
+ list_del_rcu(&cont->list);
+ kfree_rcu(cont, rcu);
+ }
+ break;
+ }
+ spin_unlock(&aunet->contid_list_lock);
+}
+
+void audit_switch_task_namespaces(struct nsproxy *ns, struct task_struct *p)
+{
+ u64 contid = audit_get_contid(p);
+ struct nsproxy *new = p->nsproxy;
+
+ if (!audit_contid_valid(contid))
+ return;
+ audit_netns_contid_del(ns->net_ns, contid);
+ if (new)
+ audit_netns_contid_add(new->net_ns, contid);
+}
+
void audit_panic(const char *message)
{
switch (audit_failure) {
@@ -1644,7 +1721,6 @@ static int __net_init audit_net_init(struct net *net)
.flags = NL_CFG_F_NONROOT_RECV,
.groups = AUDIT_NLGRP_MAX,
};
-
struct audit_net *aunet = net_generic(net, audit_net_id);

aunet->sk = netlink_kernel_create(net, NETLINK_AUDIT, &cfg);
@@ -1653,7 +1729,8 @@ static int __net_init audit_net_init(struct net *net)
return -ENOMEM;
}
aunet->sk->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
-
+ INIT_LIST_HEAD(&aunet->contid_list);
+ spin_lock_init(&aunet->contid_list_lock);
return 0;
}

@@ -2404,6 +2481,7 @@ int audit_set_contid(struct task_struct *task, u64 contid)
uid_t uid;
struct tty_struct *tty;
char comm[sizeof(current->comm)];
+ struct net *net = task->nsproxy->net_ns;

task_lock(task);
/* Can't set if audit disabled */
@@ -2425,8 +2503,12 @@ int audit_set_contid(struct task_struct *task, u64 contid)
else if (!(thread_group_leader(task) && thread_group_empty(task)))
rc = -EALREADY;
read_unlock(&tasklist_lock);
- if (!rc)
+ if (!rc) {
+ if (audit_contid_valid(oldcontid))
+ audit_netns_contid_del(net, oldcontid);
task->audit->contid = contid;
+ audit_netns_contid_add(net, contid);
+ }
task_unlock(task);

if (!audit_enabled)
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index f6c5d330059a..718b1201ae70 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -27,6 +27,7 @@
#include <linux/syscalls.h>
#include <linux/cgroup.h>
#include <linux/perf_event.h>
+#include <linux/audit.h>

static struct kmem_cache *nsproxy_cachep;

@@ -140,6 +141,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
struct nsproxy *old_ns = tsk->nsproxy;
struct user_namespace *user_ns = task_cred_xxx(tsk, user_ns);
struct nsproxy *new_ns;
+ u64 contid = audit_get_contid(tsk);

if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
CLONE_NEWPID | CLONE_NEWNET |
@@ -167,6 +169,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
return PTR_ERR(new_ns);

tsk->nsproxy = new_ns;
+ audit_netns_contid_add(new_ns->net_ns, contid);
return 0;
}

@@ -224,6 +227,7 @@ void switch_task_namespaces(struct task_struct *p, struct nsproxy *new)
ns = p->nsproxy;
p->nsproxy = new;
task_unlock(p);
+ audit_switch_task_namespaces(ns, p);

if (ns && atomic_dec_and_test(&ns->count))
free_nsproxy(ns);
--
1.8.3.1

2019-04-09 04:44:35

by Richard Guy Briggs

Subject: [PATCH ghak90 V6 07/10] audit: add containerid support for user records

Add audit container identifier auxiliary record to user event standalone
records.

Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
kernel/audit.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 87e1d367f98c..6c742da66b32 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1143,12 +1143,6 @@ static void audit_log_common_recv_msg(struct audit_context *context,
audit_log_task_context(*ab);
}

-static inline void audit_log_user_recv_msg(struct audit_buffer **ab,
- u16 msg_type)
-{
- audit_log_common_recv_msg(NULL, ab, msg_type);
-}
-
int is_audit_feature_set(int i)
{
return af.features & AUDIT_FEATURE_TO_MASK(i);
@@ -1411,13 +1405,16 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)

err = audit_filter(msg_type, AUDIT_FILTER_USER);
if (err == 1) { /* match or error */
+ struct audit_context *context;
+
err = 0;
if (msg_type == AUDIT_USER_TTY) {
err = tty_audit_push();
if (err)
break;
}
- audit_log_user_recv_msg(&ab, msg_type);
+ context = audit_alloc_local(GFP_KERNEL);
+ audit_log_common_recv_msg(context, &ab, msg_type);
if (msg_type != AUDIT_USER_TTY)
audit_log_format(ab, " msg='%.*s'",
AUDIT_MESSAGE_TEXT_MAX,
@@ -1433,6 +1430,8 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
audit_log_n_untrustedstring(ab, data, size);
}
audit_log_end(ab);
+ audit_log_contid(context, audit_get_contid(current));
+ audit_free_context(context);
}
break;
case AUDIT_ADD_RULE:
--
1.8.3.1

2019-04-09 04:45:07

by Richard Guy Briggs

Subject: [PATCH ghak90 V6 01/10] audit: collect audit task parameters

The audit-related parameters in struct task_struct should ideally be
collected together and accessed through a standard audit API.

Collect the existing loginuid, sessionid and audit_context together in a
new struct audit_task_info called "audit" in struct task_struct.

Use kmem_cache to manage this pool of memory.
Un-inline audit_free() to be able to always recover that memory.
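
The accessor pattern this enables can be sketched in userspace as below.
The mock_ types and MOCK_SID_UNSET are stand-ins chosen for the example,
not the kernel definitions. The point is only that callers go through a
getter that tolerates a task whose audit info was never allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's "unset" session id marker */
#define MOCK_SID_UNSET ((unsigned int)-1)

/* Audit parameters grouped in one struct, as the patch does */
struct mock_audit_task_info {
	unsigned int loginuid;
	unsigned int sessionid;
};

/* Task that reaches its audit parameters through a single pointer */
struct mock_task {
	struct mock_audit_task_info *audit;
};

/* Return the session id, or the unset marker if no audit info exists */
static unsigned int mock_get_sessionid(const struct mock_task *tsk)
{
	assert(tsk);
	if (!tsk->audit)
		return MOCK_SID_UNSET;
	return tsk->audit->sessionid;
}
```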

Please see the upstream github issue
https://github.com/linux-audit/audit-kernel/issues/81

Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
include/linux/audit.h | 49 +++++++++++++++++++++++------------
include/linux/sched.h | 7 +----
init/init_task.c | 3 +--
init/main.c | 2 ++
kernel/audit.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++++--
kernel/audit.h | 5 ++++
kernel/auditsc.c | 26 ++++++++++---------
kernel/fork.c | 1 -
8 files changed, 124 insertions(+), 40 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 1e69d9fe16da..bde346e73f0c 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -86,6 +86,16 @@ struct audit_field {
u32 op;
};

+struct audit_task_info {
+ kuid_t loginuid;
+ unsigned int sessionid;
+#ifdef CONFIG_AUDITSYSCALL
+ struct audit_context *ctx;
+#endif
+};
+
+extern struct audit_task_info init_struct_audit;
+
extern int is_audit_feature_set(int which);

extern int __init audit_register_class(int class, unsigned *list);
@@ -122,6 +132,9 @@ struct audit_field {
#ifdef CONFIG_AUDIT
/* These are defined in audit.c */
/* Public API */
+extern int audit_alloc(struct task_struct *task);
+extern void audit_free(struct task_struct *task);
+extern void __init audit_task_init(void);
extern __printf(4, 5)
void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
const char *fmt, ...);
@@ -164,16 +177,28 @@ extern void audit_log_key(struct audit_buffer *ab,

static inline kuid_t audit_get_loginuid(struct task_struct *tsk)
{
- return tsk->loginuid;
+ if (!tsk->audit)
+ return INVALID_UID;
+ return tsk->audit->loginuid;
}

static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
{
- return tsk->sessionid;
+ if (!tsk->audit)
+ return AUDIT_SID_UNSET;
+ return tsk->audit->sessionid;
}

extern u32 audit_enabled;
#else /* CONFIG_AUDIT */
+static inline int audit_alloc(struct task_struct *task)
+{
+ return 0;
+}
+static inline void audit_free(struct task_struct *task)
+{ }
+static inline void __init audit_task_init(void)
+{ }
static inline __printf(4, 5)
void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
const char *fmt, ...)
@@ -239,8 +264,6 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)

/* These are defined in auditsc.c */
/* Public API */
-extern int audit_alloc(struct task_struct *task);
-extern void __audit_free(struct task_struct *task);
extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
unsigned long a2, unsigned long a3);
extern void __audit_syscall_exit(int ret_success, long ret_value);
@@ -263,12 +286,14 @@ extern void audit_seccomp_actions_logged(const char *names,

static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
{
- task->audit_context = ctx;
+ task->audit->ctx = ctx;
}

static inline struct audit_context *audit_context(void)
{
- return current->audit_context;
+ if (!current->audit)
+ return NULL;
+ return current->audit->ctx;
}

static inline bool audit_dummy_context(void)
@@ -276,11 +301,7 @@ static inline bool audit_dummy_context(void)
void *p = audit_context();
return !p || *(int *)p;
}
-static inline void audit_free(struct task_struct *task)
-{
- if (unlikely(task->audit_context))
- __audit_free(task);
-}
+
static inline void audit_syscall_entry(int major, unsigned long a0,
unsigned long a1, unsigned long a2,
unsigned long a3)
@@ -470,12 +491,6 @@ static inline void audit_fanotify(unsigned int response)
extern int audit_n_rules;
extern int audit_signals;
#else /* CONFIG_AUDITSYSCALL */
-static inline int audit_alloc(struct task_struct *task)
-{
- return 0;
-}
-static inline void audit_free(struct task_struct *task)
-{ }
static inline void audit_syscall_entry(int major, unsigned long a0,
unsigned long a1, unsigned long a2,
unsigned long a3)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1549584a1538..3e0699a26dab 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -32,7 +32,6 @@
#include <linux/rseq.h>

/* task_struct member predeclarations (sorted alphabetically): */
-struct audit_context;
struct backing_dev_info;
struct bio_list;
struct blk_plug;
@@ -873,11 +872,7 @@ struct task_struct {
struct callback_head *task_works;

#ifdef CONFIG_AUDIT
-#ifdef CONFIG_AUDITSYSCALL
- struct audit_context *audit_context;
-#endif
- kuid_t loginuid;
- unsigned int sessionid;
+ struct audit_task_info *audit;
#endif
struct seccomp seccomp;

diff --git a/init/init_task.c b/init/init_task.c
index c70ef656d0f4..32420485e54b 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -123,8 +123,7 @@ struct task_struct init_task
.thread_group = LIST_HEAD_INIT(init_task.thread_group),
.thread_node = LIST_HEAD_INIT(init_signals.thread_head),
#ifdef CONFIG_AUDIT
- .loginuid = INVALID_UID,
- .sessionid = AUDIT_SID_UNSET,
+ .audit = &init_struct_audit,
#endif
#ifdef CONFIG_PERF_EVENTS
.perf_event_mutex = __MUTEX_INITIALIZER(init_task.perf_event_mutex),
diff --git a/init/main.c b/init/main.c
index 598e278b46f7..26b6c80f5b1d 100644
--- a/init/main.c
+++ b/init/main.c
@@ -92,6 +92,7 @@
#include <linux/rodata_test.h>
#include <linux/jump_label.h>
#include <linux/mem_encrypt.h>
+#include <linux/audit.h>

#include <asm/io.h>
#include <asm/bugs.h>
@@ -734,6 +735,7 @@ asmlinkage __visible void __init start_kernel(void)
nsfs_init();
cpuset_init();
cgroup_init();
+ audit_task_init();
taskstats_init_early();
delayacct_init();

diff --git a/kernel/audit.c b/kernel/audit.c
index b96bf69183f4..3fb09783cd4a 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -215,6 +215,73 @@ struct audit_reply {
struct sk_buff *skb;
};

+static struct kmem_cache *audit_task_cache;
+
+void __init audit_task_init(void)
+{
+ audit_task_cache = kmem_cache_create("audit_task",
+ sizeof(struct audit_task_info),
+ 0, SLAB_PANIC, NULL);
+}
+
+/**
+ * audit_alloc - allocate an audit info block for a task
+ * @tsk: task
+ *
+ * Call audit_alloc_syscall to filter on the task information and
+ * allocate a per-task audit context if necessary. This is called from
+ * copy_process, so no lock is needed.
+ */
+int audit_alloc(struct task_struct *tsk)
+{
+ int ret = 0;
+ struct audit_task_info *info;
+
+ info = kmem_cache_alloc(audit_task_cache, GFP_KERNEL);
+ if (!info) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ info->loginuid = audit_get_loginuid(current);
+ info->sessionid = audit_get_sessionid(current);
+ tsk->audit = info;
+
+ ret = audit_alloc_syscall(tsk);
+ if (ret) {
+ tsk->audit = NULL;
+ kmem_cache_free(audit_task_cache, info);
+ }
+out:
+ return ret;
+}
+
+struct audit_task_info init_struct_audit = {
+ .loginuid = INVALID_UID,
+ .sessionid = AUDIT_SID_UNSET,
+#ifdef CONFIG_AUDITSYSCALL
+ .ctx = NULL,
+#endif
+};
+
+/**
+ * audit_free - free per-task audit info
+ * @tsk: task whose audit info block to free
+ *
+ * Called from copy_process and do_exit
+ */
+void audit_free(struct task_struct *tsk)
+{
+ struct audit_task_info *info = tsk->audit;
+
+ audit_free_syscall(tsk);
+ /* Freeing the audit_task_info struct must be performed after
+ * audit_log_exit() due to need for loginuid and sessionid.
+ */
+ info = tsk->audit;
+ tsk->audit = NULL;
+ kmem_cache_free(audit_task_cache, info);
+}
+
/**
* auditd_test_task - Check to see if a given task is an audit daemon
* @task: the task to check
@@ -2266,8 +2333,8 @@ int audit_set_loginuid(kuid_t loginuid)
sessionid = (unsigned int)atomic_inc_return(&session_id);
}

- current->sessionid = sessionid;
- current->loginuid = loginuid;
+ current->audit->sessionid = sessionid;
+ current->audit->loginuid = loginuid;
out:
audit_log_set_loginuid(oldloginuid, loginuid, oldsessionid, sessionid, rc);
return rc;
diff --git a/kernel/audit.h b/kernel/audit.h
index 958d5b8fc1b3..c00e2ee3c6b3 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -264,6 +264,8 @@ extern void audit_log_d_path_exe(struct audit_buffer *ab,
extern unsigned int audit_serial(void);
extern int auditsc_get_stamp(struct audit_context *ctx,
struct timespec64 *t, unsigned int *serial);
+extern int audit_alloc_syscall(struct task_struct *tsk);
+extern void audit_free_syscall(struct task_struct *tsk);

extern void audit_put_watch(struct audit_watch *watch);
extern void audit_get_watch(struct audit_watch *watch);
@@ -305,6 +307,9 @@ extern void audit_filter_inodes(struct task_struct *tsk,
extern struct list_head *audit_killed_trees(void);
#else /* CONFIG_AUDITSYSCALL */
#define auditsc_get_stamp(c, t, s) 0
+#define audit_alloc_syscall(t) 0
+#define audit_free_syscall(t) {}
+
#define audit_put_watch(w) {}
#define audit_get_watch(w) {}
#define audit_to_watch(k, p, l, o) (-EINVAL)
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 98a98e6dca05..fd7ca983de4f 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -892,23 +892,25 @@ static inline struct audit_context *audit_alloc_context(enum audit_state state)
return context;
}

-/**
- * audit_alloc - allocate an audit context block for a task
+/*
+ * audit_alloc_syscall - allocate an audit context block for a task
* @tsk: task
*
* Filter on the task information and allocate a per-task audit context
* if necessary. Doing so turns on system call auditing for the
- * specified task. This is called from copy_process, so no lock is
- * needed.
+ * specified task. This is called from copy_process via audit_alloc, so
+ * no lock is needed.
*/
-int audit_alloc(struct task_struct *tsk)
+int audit_alloc_syscall(struct task_struct *tsk)
{
struct audit_context *context;
enum audit_state state;
char *key = NULL;

- if (likely(!audit_ever_enabled))
+ if (likely(!audit_ever_enabled)) {
+ audit_set_context(tsk, NULL);
return 0; /* Return if not auditing. */
+ }

state = audit_filter_task(tsk, &key);
if (state == AUDIT_DISABLED) {
@@ -918,7 +920,7 @@ int audit_alloc(struct task_struct *tsk)

if (!(context = audit_alloc_context(state))) {
kfree(key);
- audit_log_lost("out of memory in audit_alloc");
+ audit_log_lost("out of memory in audit_alloc_syscall");
return -ENOMEM;
}
context->filterkey = key;
@@ -1563,14 +1565,15 @@ static void audit_log_exit(void)
}

/**
- * __audit_free - free a per-task audit context
+ * audit_free_syscall - free per-task audit context info
* @tsk: task whose audit context block to free
*
- * Called from copy_process and do_exit
+ * Called from audit_free
*/
-void __audit_free(struct task_struct *tsk)
+void audit_free_syscall(struct task_struct *tsk)
{
- struct audit_context *context = tsk->audit_context;
+ struct audit_task_info *info = tsk->audit;
+ struct audit_context *context = info->ctx;

if (!context)
return;
@@ -1593,7 +1596,6 @@ void __audit_free(struct task_struct *tsk)
if (context->current_state == AUDIT_RECORD_CONTEXT)
audit_log_exit();
}
-
audit_set_context(tsk, NULL);
audit_free_context(context);
}
diff --git a/kernel/fork.c b/kernel/fork.c
index 9dcd18aa210b..9167a8f3edae 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1834,7 +1834,6 @@ static __latent_entropy struct task_struct *copy_process(
posix_cpu_timers_init(p);

p->io_context = NULL;
- audit_set_context(p, NULL);
cgroup_fork(p);
#ifdef CONFIG_NUMA
p->mempolicy = mpol_dup(p->mempolicy);
--
1.8.3.1

2019-04-09 13:00:26

by Ondrej Mosnacek

Subject: Re: [PATCH ghak90 V6 05/10] audit: add contid support for signalling the audit daemon

On Tue, Apr 9, 2019 at 5:40 AM Richard Guy Briggs <[email protected]> wrote:
> Add audit container identifier support to the action of signalling the
> audit daemon.
>
> Since this would need to add an element to the audit_sig_info struct,
> a new record type AUDIT_SIGNAL_INFO2 was created with a new
> audit_sig_info2 struct. Corresponding support is required in the
> userspace code to reflect the new record request and reply type.
> An older userspace won't break since it won't know to request this
> record type.
>
> Signed-off-by: Richard Guy Briggs <[email protected]>

This looks good to me.

Reviewed-by: Ondrej Mosnacek <[email protected]>

Although I'm wondering if we shouldn't try to future-proof the
AUDIT_SIGNAL_INFO2 format somehow, so that we don't need to add
another AUDIT_SIGNAL_INFO3 when the need arises to add yet-another
identifier to it... The simplest solution I can come up with is to add
a "version" field at the beginning (set to 2 initially), then v<N>_len
at the beginning of data for version <N>. But maybe this is too
complicated for too little gain...
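
To make the idea concrete, here is a minimal userspace sketch of such a versioned layout. It is purely an illustration and not part of the posted patch: the names sig_info_v2, put_block() and parse_sig_info() are hypothetical, and the layout assumed is a u32 "version" followed by one length-prefixed block per version starting at 2. A reader copies the blocks it knows and uses each length to skip the ones it does not.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical version-2 payload; field names mirror audit_sig_info2. */
struct sig_info_v2 {
	uint32_t uid;
	int32_t pid;
	uint64_t cid;
};

/* Append one block: a u32 length followed by "len" bytes of data.
 * Returns the offset just past the block. The caller sizes the buffer. */
static size_t put_block(unsigned char *buf, size_t off,
			const void *data, uint32_t len)
{
	memcpy(buf + off, &len, sizeof(len));
	memcpy(buf + off + sizeof(len), data, len);
	return off + sizeof(len) + len;
}

/* Parse a buffer laid out as: u32 version, then for each version
 * 2..version a length-prefixed block. Returns 0 on success and -1 on
 * a truncated or malformed buffer. */
static int parse_sig_info(const unsigned char *buf, size_t size,
			  struct sig_info_v2 *out)
{
	uint32_t version, len, v;
	size_t off;

	if (size < sizeof(version))
		return -1;
	memcpy(&version, buf, sizeof(version));
	off = sizeof(version);
	if (version < 2)
		return -1;

	for (v = 2; v <= version; v++) {
		if (size - off < sizeof(len))
			return -1;
		memcpy(&len, buf + off, sizeof(len));
		off += sizeof(len);
		if (size - off < len)
			return -1;
		if (v == 2) {
			if (len < sizeof(*out))
				return -1;
			memcpy(out, buf + off, sizeof(*out));
		}
		/* Blocks for versions this reader does not know are skipped. */
		off += len;
	}
	return 0;
}
```

With this shape, a sender at version 3 would append a second block after the version-2 data, and a reader built against version 2 would still extract uid, pid and cid unchanged.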

> ---
> include/linux/audit.h | 7 +++++++
> include/uapi/linux/audit.h | 1 +
> kernel/audit.c | 27 +++++++++++++++++++++++++++
> kernel/audit.h | 1 +
> kernel/auditsc.c | 1 +
> security/selinux/nlmsgtab.c | 1 +
> 6 files changed, 38 insertions(+)
>
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 43438192ca2a..c2dec9157463 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -37,6 +37,13 @@ struct audit_sig_info {
> char ctx[0];
> };
>
> +struct audit_sig_info2 {
> + uid_t uid;
> + pid_t pid;
> + u64 cid;
> + char ctx[0];
> +};
> +
> struct audit_buffer;
> struct audit_context;
> struct inode;
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index 55fde9970762..10cc67926cf1 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -72,6 +72,7 @@
> #define AUDIT_SET_FEATURE 1018 /* Turn an audit feature on or off */
> #define AUDIT_GET_FEATURE 1019 /* Get which features are enabled */
> #define AUDIT_CONTAINER_OP 1020 /* Define the container id and info */
> +#define AUDIT_SIGNAL_INFO2 1021 /* Get info auditd signal sender */
>
> #define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */
> #define AUDIT_USER_AVC 1107 /* We filter this differently */
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 3e0af53f3c4d..87e1d367f98c 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -138,6 +138,7 @@ struct audit_net {
> kuid_t audit_sig_uid = INVALID_UID;
> pid_t audit_sig_pid = -1;
> u32 audit_sig_sid = 0;
> +u64 audit_sig_cid = AUDIT_CID_UNSET;
>
> /* Records can be lost in several ways:
> 0) [suppressed in audit_alloc]
> @@ -1097,6 +1098,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
> case AUDIT_ADD_RULE:
> case AUDIT_DEL_RULE:
> case AUDIT_SIGNAL_INFO:
> + case AUDIT_SIGNAL_INFO2:
> case AUDIT_TTY_GET:
> case AUDIT_TTY_SET:
> case AUDIT_TRIM:
> @@ -1260,6 +1262,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> struct audit_buffer *ab;
> u16 msg_type = nlh->nlmsg_type;
> struct audit_sig_info *sig_data;
> + struct audit_sig_info2 *sig_data2;
> char *ctx = NULL;
> u32 len;
>
> @@ -1519,6 +1522,30 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> sig_data, sizeof(*sig_data) + len);
> kfree(sig_data);
> break;
> + case AUDIT_SIGNAL_INFO2:
> + len = 0;
> + if (audit_sig_sid) {
> + err = security_secid_to_secctx(audit_sig_sid, &ctx, &len);
> + if (err)
> + return err;
> + }
> + sig_data2 = kmalloc(sizeof(*sig_data2) + len, GFP_KERNEL);
> + if (!sig_data2) {
> + if (audit_sig_sid)
> + security_release_secctx(ctx, len);
> + return -ENOMEM;
> + }
> + sig_data2->uid = from_kuid(&init_user_ns, audit_sig_uid);
> + sig_data2->pid = audit_sig_pid;
> + if (audit_sig_sid) {
> + memcpy(sig_data2->ctx, ctx, len);
> + security_release_secctx(ctx, len);
> + }
> + sig_data2->cid = audit_sig_cid;
> + audit_send_reply(skb, seq, AUDIT_SIGNAL_INFO2, 0, 0,
> + sig_data2, sizeof(*sig_data2) + len);
> + kfree(sig_data2);
> + break;
> case AUDIT_TTY_GET: {
> struct audit_tty_status s;
> unsigned int t;
> diff --git a/kernel/audit.h b/kernel/audit.h
> index e2912924af0d..c5ac6436317e 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -345,6 +345,7 @@ extern void audit_filter_inodes(struct task_struct *tsk,
> extern pid_t audit_sig_pid;
> extern kuid_t audit_sig_uid;
> extern u32 audit_sig_sid;
> +extern u64 audit_sig_cid;
>
> extern int audit_filter(int msgtype, unsigned int listtype);
>
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index eea445b7a181..0a29a00feaf1 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2405,6 +2405,7 @@ int audit_signal_info(int sig, struct task_struct *t)
> else
> audit_sig_uid = uid;
> security_task_getsecid(current, &audit_sig_sid);
> + audit_sig_cid = audit_get_contid(current);
> }
>
> if (!audit_signals || audit_dummy_context())
> diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
> index 9cec81209617..682fe7397762 100644
> --- a/security/selinux/nlmsgtab.c
> +++ b/security/selinux/nlmsgtab.c
> @@ -132,6 +132,7 @@ struct nlmsg_perm {
> { AUDIT_DEL_RULE, NETLINK_AUDIT_SOCKET__NLMSG_WRITE },
> { AUDIT_USER, NETLINK_AUDIT_SOCKET__NLMSG_RELAY },
> { AUDIT_SIGNAL_INFO, NETLINK_AUDIT_SOCKET__NLMSG_READ },
> + { AUDIT_SIGNAL_INFO2, NETLINK_AUDIT_SOCKET__NLMSG_READ },
> { AUDIT_TRIM, NETLINK_AUDIT_SOCKET__NLMSG_WRITE },
> { AUDIT_MAKE_EQUIV, NETLINK_AUDIT_SOCKET__NLMSG_WRITE },
> { AUDIT_TTY_GET, NETLINK_AUDIT_SOCKET__NLMSG_READ },
> --
> 1.8.3.1
>


--
Ondrej Mosnacek <omosnace at redhat dot com>
Software Engineer, Security Technologies
Red Hat, Inc.

2019-04-09 13:42:03

by Paul Moore

Subject: Re: [PATCH ghak90 V6 05/10] audit: add contid support for signalling the audit daemon

On Tue, Apr 9, 2019 at 8:58 AM Ondrej Mosnacek <[email protected]> wrote:
>
> On Tue, Apr 9, 2019 at 5:40 AM Richard Guy Briggs <[email protected]> wrote:
> > Add audit container identifier support to the action of signalling the
> > audit daemon.
> >
> > Since this would need to add an element to the audit_sig_info struct,
> > a new record type AUDIT_SIGNAL_INFO2 was created with a new
> > audit_sig_info2 struct. Corresponding support is required in the
> > userspace code to reflect the new record request and reply type.
> > An older userspace won't break since it won't know to request this
> > record type.
> >
> > Signed-off-by: Richard Guy Briggs <[email protected]>
>
> This looks good to me.
>
> Reviewed-by: Ondrej Mosnacek <[email protected]>
>
> Although I'm wondering if we shouldn't try to future-proof the
> AUDIT_SIGNAL_INFO2 format somehow, so that we don't need to add
> another AUDIT_SIGNAL_INFO3 when the need arises to add yet-another
> identifier to it... The simplest solution I can come up with is to add
> a "version" field at the beginning (set to 2 initially), then v<N>_len
> at the beginning of data for version <N>. But maybe this is too
> complicated for too little gain...

FWIW, I believe the long term solution to this is the fabled netlink
attribute approach that we haven't talked about in some time, but I
keep dreaming about (it has been mostly on the back burner because 1)
time and 2) didn't want to impact the audit container ID work). While
I'm not opposed to trying to make things like this a bit more robust
by adding version fields and similar things, there are still so many
(so very many) problems with the audit kernel/userspace interface that
still need to be addressed.

--
paul moore
http://www.paul-moore.com

2019-04-09 13:47:52

by Neil Horman

Subject: Re: [PATCH ghak90 V6 05/10] audit: add contid support for signalling the audit daemon

On Tue, Apr 09, 2019 at 02:57:50PM +0200, Ondrej Mosnacek wrote:
> On Tue, Apr 9, 2019 at 5:40 AM Richard Guy Briggs <[email protected]> wrote:
> > Add audit container identifier support to the action of signalling the
> > audit daemon.
> >
> > Since this would need to add an element to the audit_sig_info struct,
> > a new record type AUDIT_SIGNAL_INFO2 was created with a new
> > audit_sig_info2 struct. Corresponding support is required in the
> > userspace code to reflect the new record request and reply type.
> > An older userspace won't break since it won't know to request this
> > record type.
> >
> > Signed-off-by: Richard Guy Briggs <[email protected]>
>
> This looks good to me.
>
> Reviewed-by: Ondrej Mosnacek <[email protected]>
>
> Although I'm wondering if we shouldn't try to future-proof the
> AUDIT_SIGNAL_INFO2 format somehow, so that we don't need to add
> another AUDIT_SIGNAL_INFO3 when the need arises to add yet-another
> identifier to it... The simplest solution I can come up with is to add
> a "version" field at the beginning (set to 2 initially), then v<N>_len
> at the beginning of data for version <N>. But maybe this is too
> complicated for too little gain...
>
So, I'm not sure how often this needs to be revised (if it's not often, this may
be just fine), but if future proofing is warranted, it might be worthwhile to
just use the netlink TLV encoding that's available today. The kernel has a suite
of nla_put_<type> helpers (like nla_put_u32()), and the userspace netlink library
can parse those messages fairly easily. It would let you send arbitrary length
messages with a terminator type at the end of the array.
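
As a rough illustration of what TLV encoding buys here, the following self-contained userspace sketch mimics the netlink attribute layout: a 16-bit length covering header plus payload, a 16-bit type, and payloads padded to 4 bytes. It does not use the kernel nla_put_*() helpers or any netlink library; attr_put(), attr_find() and the SIG_ATTR_* type numbers are hypothetical names chosen for the example. For simplicity the walk is bounded by the total buffer length, not by a terminator attribute.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Same shape as a netlink attribute header: length, then type. */
struct attr_hdr {
	uint16_t len;	/* header plus payload, excluding padding */
	uint16_t type;
};

/* Hypothetical attribute types for a signal-info reply. */
enum {
	SIG_ATTR_UID = 1,
	SIG_ATTR_PID,
	SIG_ATTR_CONTID,
	SIG_ATTR_SECCTX,
};

/* Round up to the 4-byte boundary used between attributes. */
static size_t attr_align(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Append one attribute at "off" and return the offset of the next one.
 * The caller is responsible for sizing the buffer. */
static size_t attr_put(unsigned char *buf, size_t off, uint16_t type,
		       const void *data, uint16_t len)
{
	struct attr_hdr hdr = { (uint16_t)(sizeof(hdr) + len), type };
	size_t total = attr_align(sizeof(hdr) + len);

	memset(buf + off, 0, total);	/* zero the padding bytes */
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), data, len);
	return off + total;
}

/* Return a pointer to the payload of the first attribute of "type", or
 * NULL if it is absent or the buffer is malformed. Attributes of other
 * types are stepped over using their length. */
static const void *attr_find(const unsigned char *buf, size_t size,
			     uint16_t type, uint16_t *payload_len)
{
	struct attr_hdr hdr;
	size_t off = 0;

	while (off + sizeof(hdr) <= size) {
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > size - off)
			return NULL;
		if (hdr.type == type) {
			*payload_len = (uint16_t)(hdr.len - sizeof(hdr));
			return buf + off + sizeof(hdr);
		}
		off += attr_align(hdr.len);
	}
	return NULL;
}
```

Because a reader looks fields up by type and steps over anything it does not recognise, a new identifier could be added later as one more attribute without defining a new message type.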

That said, I don't think we want to do that right now for just this message. A
better approach would be to take this patch as it is now and, in a subsequent
patch set, create an AUDIT version 2 netlink protocol that converts all the
messages we send to that format for consistency. Such a change would be large
and warrant its own patch set and review.

I'm good with this patch as it is.

Acked-by: Neil Horman <[email protected]>

> > ---
> > include/linux/audit.h | 7 +++++++
> > include/uapi/linux/audit.h | 1 +
> > kernel/audit.c | 27 +++++++++++++++++++++++++++
> > kernel/audit.h | 1 +
> > kernel/auditsc.c | 1 +
> > security/selinux/nlmsgtab.c | 1 +
> > 6 files changed, 38 insertions(+)
> >
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index 43438192ca2a..c2dec9157463 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -37,6 +37,13 @@ struct audit_sig_info {
> > char ctx[0];
> > };
> >
> > +struct audit_sig_info2 {
> > + uid_t uid;
> > + pid_t pid;
> > + u64 cid;
> > + char ctx[0];
> > +};
> > +
> > struct audit_buffer;
> > struct audit_context;
> > struct inode;
> > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> > index 55fde9970762..10cc67926cf1 100644
> > --- a/include/uapi/linux/audit.h
> > +++ b/include/uapi/linux/audit.h
> > @@ -72,6 +72,7 @@
> > #define AUDIT_SET_FEATURE 1018 /* Turn an audit feature on or off */
> > #define AUDIT_GET_FEATURE 1019 /* Get which features are enabled */
> > #define AUDIT_CONTAINER_OP 1020 /* Define the container id and info */
> > +#define AUDIT_SIGNAL_INFO2 1021 /* Get info auditd signal sender */
> >
> > #define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */
> > #define AUDIT_USER_AVC 1107 /* We filter this differently */
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 3e0af53f3c4d..87e1d367f98c 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -138,6 +138,7 @@ struct audit_net {
> > kuid_t audit_sig_uid = INVALID_UID;
> > pid_t audit_sig_pid = -1;
> > u32 audit_sig_sid = 0;
> > +u64 audit_sig_cid = AUDIT_CID_UNSET;
> >
> > /* Records can be lost in several ways:
> > 0) [suppressed in audit_alloc]
> > @@ -1097,6 +1098,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
> > case AUDIT_ADD_RULE:
> > case AUDIT_DEL_RULE:
> > case AUDIT_SIGNAL_INFO:
> > + case AUDIT_SIGNAL_INFO2:
> > case AUDIT_TTY_GET:
> > case AUDIT_TTY_SET:
> > case AUDIT_TRIM:
> > @@ -1260,6 +1262,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> > struct audit_buffer *ab;
> > u16 msg_type = nlh->nlmsg_type;
> > struct audit_sig_info *sig_data;
> > + struct audit_sig_info2 *sig_data2;
> > char *ctx = NULL;
> > u32 len;
> >
> > @@ -1519,6 +1522,30 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> > sig_data, sizeof(*sig_data) + len);
> > kfree(sig_data);
> > break;
> > + case AUDIT_SIGNAL_INFO2:
> > + len = 0;
> > + if (audit_sig_sid) {
> > + err = security_secid_to_secctx(audit_sig_sid, &ctx, &len);
> > + if (err)
> > + return err;
> > + }
> > + sig_data2 = kmalloc(sizeof(*sig_data2) + len, GFP_KERNEL);
> > + if (!sig_data2) {
> > + if (audit_sig_sid)
> > + security_release_secctx(ctx, len);
> > + return -ENOMEM;
> > + }
> > + sig_data2->uid = from_kuid(&init_user_ns, audit_sig_uid);
> > + sig_data2->pid = audit_sig_pid;
> > + if (audit_sig_sid) {
> > + memcpy(sig_data2->ctx, ctx, len);
> > + security_release_secctx(ctx, len);
> > + }
> > + sig_data2->cid = audit_sig_cid;
> > + audit_send_reply(skb, seq, AUDIT_SIGNAL_INFO2, 0, 0,
> > + sig_data2, sizeof(*sig_data2) + len);
> > + kfree(sig_data2);
> > + break;
> > case AUDIT_TTY_GET: {
> > struct audit_tty_status s;
> > unsigned int t;
> > diff --git a/kernel/audit.h b/kernel/audit.h
> > index e2912924af0d..c5ac6436317e 100644
> > --- a/kernel/audit.h
> > +++ b/kernel/audit.h
> > @@ -345,6 +345,7 @@ extern void audit_filter_inodes(struct task_struct *tsk,
> > extern pid_t audit_sig_pid;
> > extern kuid_t audit_sig_uid;
> > extern u32 audit_sig_sid;
> > +extern u64 audit_sig_cid;
> >
> > extern int audit_filter(int msgtype, unsigned int listtype);
> >
> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index eea445b7a181..0a29a00feaf1 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -2405,6 +2405,7 @@ int audit_signal_info(int sig, struct task_struct *t)
> > else
> > audit_sig_uid = uid;
> > security_task_getsecid(current, &audit_sig_sid);
> > + audit_sig_cid = audit_get_contid(current);
> > }
> >
> > if (!audit_signals || audit_dummy_context())
> > diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
> > index 9cec81209617..682fe7397762 100644
> > --- a/security/selinux/nlmsgtab.c
> > +++ b/security/selinux/nlmsgtab.c
> > @@ -132,6 +132,7 @@ struct nlmsg_perm {
> > { AUDIT_DEL_RULE, NETLINK_AUDIT_SOCKET__NLMSG_WRITE },
> > { AUDIT_USER, NETLINK_AUDIT_SOCKET__NLMSG_RELAY },
> > { AUDIT_SIGNAL_INFO, NETLINK_AUDIT_SOCKET__NLMSG_READ },
> > + { AUDIT_SIGNAL_INFO2, NETLINK_AUDIT_SOCKET__NLMSG_READ },
> > { AUDIT_TRIM, NETLINK_AUDIT_SOCKET__NLMSG_WRITE },
> > { AUDIT_MAKE_EQUIV, NETLINK_AUDIT_SOCKET__NLMSG_WRITE },
> > { AUDIT_TTY_GET, NETLINK_AUDIT_SOCKET__NLMSG_READ },
> > --
> > 1.8.3.1
> >
>
>
> --
> Ondrej Mosnacek <omosnace at redhat dot com>
> Software Engineer, Security Technologies
> Red Hat, Inc.
>

2019-04-09 13:50:26

by Neil Horman

Subject: Re: [PATCH ghak90 V6 05/10] audit: add contid support for signalling the audit daemon

On Tue, Apr 09, 2019 at 09:40:58AM -0400, Paul Moore wrote:
> On Tue, Apr 9, 2019 at 8:58 AM Ondrej Mosnacek <[email protected]> wrote:
> >
> > On Tue, Apr 9, 2019 at 5:40 AM Richard Guy Briggs <[email protected]> wrote:
> > > Add audit container identifier support to the action of signalling the
> > > audit daemon.
> > >
> > > Since this would need to add an element to the audit_sig_info struct,
> > > a new record type AUDIT_SIGNAL_INFO2 was created with a new
> > > audit_sig_info2 struct. Corresponding support is required in the
> > > userspace code to reflect the new record request and reply type.
> > > An older userspace won't break since it won't know to request this
> > > record type.
> > >
> > > Signed-off-by: Richard Guy Briggs <[email protected]>
> >
> > This looks good to me.
> >
> > Reviewed-by: Ondrej Mosnacek <[email protected]>
> >
> > Although I'm wondering if we shouldn't try to future-proof the
> > AUDIT_SIGNAL_INFO2 format somehow, so that we don't need to add
> > another AUDIT_SIGNAL_INFO3 when the need arises to add yet-another
> > identifier to it... The simplest solution I can come up with is to add
> > a "version" field at the beginning (set to 2 initially), then v<N>_len
> > at the beginning of data for version <N>. But maybe this is too
> > complicated for too little gain...
>
> FWIW, I believe the long term solution to this is the fabled netlink
> attribute approach that we haven't talked about in some time, but I
> keep dreaming about (it has been mostly on the back burner because 1)
> time and 2) didn't want to impact the audit container ID work). While
> I'm not opposed to trying to make things like this a bit more robust
> by adding version fields and similar things, there are still so many
> (so very many) problems with the audit kernel/userspace interface that
> still need to be addressed.
>

Agreed, this change as-is is in keeping with the message structure that audit
has today, and so is ok with me, but the long term goal should be a conversion
to netlink attributes for all audit messages. That's a big undertaking and
should be addressed separately though.

Neil

> --
> paul moore
> http://www.paul-moore.com
>

2019-04-09 13:54:18

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 05/10] audit: add contid support for signalling the audit daemon

On 2019-04-09 09:40, Paul Moore wrote:
> On Tue, Apr 9, 2019 at 8:58 AM Ondrej Mosnacek <[email protected]> wrote:
> > On Tue, Apr 9, 2019 at 5:40 AM Richard Guy Briggs <[email protected]> wrote:
> > > Add audit container identifier support to the action of signalling the
> > > audit daemon.
> > >
> > > Since this would need to add an element to the audit_sig_info struct,
> > > a new record type AUDIT_SIGNAL_INFO2 was created with a new
> > > audit_sig_info2 struct. Corresponding support is required in the
> > > userspace code to reflect the new record request and reply type.
> > > An older userspace won't break since it won't know to request this
> > > record type.
> > >
> > > Signed-off-by: Richard Guy Briggs <[email protected]>
> >
> > This looks good to me.
> >
> > Reviewed-by: Ondrej Mosnacek <[email protected]>
> >
> > Although I'm wondering if we shouldn't try to future-proof the
> > AUDIT_SIGNAL_INFO2 format somehow, so that we don't need to add
> > another AUDIT_SIGNAL_INFO3 when the need arises to add yet-another
> > identifier to it... The simplest solution I can come up with is to add
> > a "version" field at the beginning (set to 2 initially), then v<N>_len
> > at the beginning of data for version <N>. But maybe this is too
> > complicated for too little gain...
>
> FWIW, I believe the long term solution to this is the fabled netlink
> attribute approach that we haven't talked about in some time, but I
> keep dreaming about (it has been mostly on the back burner because 1)
> time and 2) didn't want to impact the audit container ID work). While
> I'm not opposed to trying to make things like this a bit more robust
> by adding version fields and similar things, there are still so many
> (so very many) problems with the audit kernel/userspace interface that
> still need to be addressed.

While this particular message type is used very infrequently, adding a
version field to every message type strikes me as a huge overhead for
the small likelihood of the format needing to change.

I'd prefer to just key it off the AUDIT_FEATURE_BITMAP or some other
easily detectable change in this distinguishing feature, such as the
presence of /proc/self/audit_containerid, which is what I've done in the
accompanying userspace patchset that I'm preparing to post that works
with this change.
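
A minimal sketch of that kind of detection, assuming userspace simply probes for the proc file and then chooses which request type to send. This is not the code from the userspace patchset: path_exists(), kernel_has_audit_contid() and pick_signal_info_request() are hypothetical helpers, and the two request numbers are copied locally (1010 from the existing uapi header, 1021 as proposed in this series) so the example builds without the patched headers.

```c
#include <unistd.h>

/* Proc file added by this series; its presence implies contid support. */
#define AUDIT_CONTAINERID_PATH "/proc/self/audit_containerid"

/* Local copies of the request types, for illustration only. */
#define SKETCH_AUDIT_SIGNAL_INFO  1010	/* existing request */
#define SKETCH_AUDIT_SIGNAL_INFO2 1021	/* value proposed in this patch */

/* Return 1 if the path exists, 0 otherwise. */
static int path_exists(const char *path)
{
	return access(path, F_OK) == 0;
}

/* Return 1 if the running kernel exposes the audit container id file. */
static int kernel_has_audit_contid(void)
{
	return path_exists(AUDIT_CONTAINERID_PATH);
}

/* Pick the signal-info request an audit daemon would send. */
static int pick_signal_info_request(int have_contid)
{
	return have_contid ? SKETCH_AUDIT_SIGNAL_INFO2
			   : SKETCH_AUDIT_SIGNAL_INFO;
}
```

A daemon could call pick_signal_info_request(kernel_has_audit_contid()) once at startup and fall back to the old request on kernels without the feature.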

Neil, you are right that netlink has useful mechanisms to help with this
sort of thing, but we're not quite there yet.

> paul moore

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-04-09 14:02:20

by Ondrej Mosnacek

Subject: Re: [PATCH ghak90 V6 05/10] audit: add contid support for signalling the audit daemon

On Tue, Apr 9, 2019 at 3:49 PM Neil Horman <[email protected]> wrote:
> On Tue, Apr 09, 2019 at 09:40:58AM -0400, Paul Moore wrote:
> > On Tue, Apr 9, 2019 at 8:58 AM Ondrej Mosnacek <[email protected]> wrote:
> > >
> > > On Tue, Apr 9, 2019 at 5:40 AM Richard Guy Briggs <[email protected]> wrote:
> > > > Add audit container identifier support to the action of signalling the
> > > > audit daemon.
> > > >
> > > > Since this would need to add an element to the audit_sig_info struct,
> > > > a new record type AUDIT_SIGNAL_INFO2 was created with a new
> > > > audit_sig_info2 struct. Corresponding support is required in the
> > > > userspace code to reflect the new record request and reply type.
> > > > An older userspace won't break since it won't know to request this
> > > > record type.
> > > >
> > > > Signed-off-by: Richard Guy Briggs <[email protected]>
> > >
> > > This looks good to me.
> > >
> > > Reviewed-by: Ondrej Mosnacek <[email protected]>
> > >
> > > Although I'm wondering if we shouldn't try to future-proof the
> > > AUDIT_SIGNAL_INFO2 format somehow, so that we don't need to add
> > > another AUDIT_SIGNAL_INFO3 when the need arises to add yet-another
> > > identifier to it... The simplest solution I can come up with is to add
> > > a "version" field at the beginning (set to 2 initially), then v<N>_len
> > > at the beginning of data for version <N>. But maybe this is too
> > > complicated for too little gain...
> >
> > FWIW, I believe the long term solution to this is the fabled netlink
> > attribute approach that we haven't talked about in some time, but I
> > > keep dreaming about (it has been mostly on the back burner because 1)
> > time and 2) didn't want to impact the audit container ID work). While
> > I'm not opposed to trying to make things like this a bit more robust
> > by adding version fields and similar things, there are still so many
> > (so very many) problems with the audit kernel/userspace interface that
> > still need to be addressed.
> >
>
> Agreed, this change as-is is in keeping with the message structure that audit
> has today, and so is ok with me, but the long term goal should be a conversion
> to netlink attributes for all audit messages. That's a big undertaking and
> should be addressed separately though.

Yeah, you both have a good point that doing it now and only for this
message is not necessarily better than not doing it at all. And doing
a general overhaul is out of scope for this series, obviously. I
didn't really mind the current solution before and I mind it even less
now, so consider me satisfied :) I was really just thinking out
loud...

--
Ondrej Mosnacek <omosnace at redhat dot com>
Software Engineer, Security Technologies
Red Hat, Inc.

2019-04-09 14:08:32

by Paul Moore

Subject: Re: [PATCH ghak90 V6 05/10] audit: add contid support for signalling the audit daemon

On Tue, Apr 9, 2019 at 9:49 AM Neil Horman <[email protected]> wrote:
> On Tue, Apr 09, 2019 at 09:40:58AM -0400, Paul Moore wrote:
> > On Tue, Apr 9, 2019 at 8:58 AM Ondrej Mosnacek <[email protected]> wrote:
> > >
> > > On Tue, Apr 9, 2019 at 5:40 AM Richard Guy Briggs <[email protected]> wrote:
> > > > Add audit container identifier support to the action of signalling the
> > > > audit daemon.
> > > >
> > > > Since this would need to add an element to the audit_sig_info struct,
> > > > a new record type AUDIT_SIGNAL_INFO2 was created with a new
> > > > audit_sig_info2 struct. Corresponding support is required in the
> > > > userspace code to reflect the new record request and reply type.
> > > > An older userspace won't break since it won't know to request this
> > > > record type.
> > > >
> > > > Signed-off-by: Richard Guy Briggs <[email protected]>
> > >
> > > This looks good to me.
> > >
> > > Reviewed-by: Ondrej Mosnacek <[email protected]>
> > >
> > > Although I'm wondering if we shouldn't try to future-proof the
> > > AUDIT_SIGNAL_INFO2 format somehow, so that we don't need to add
> > > another AUDIT_SIGNAL_INFO3 when the need arises to add yet-another
> > > identifier to it... The simplest solution I can come up with is to add
> > > a "version" field at the beginning (set to 2 initially), then v<N>_len
> > > at the beginning of data for version <N>. But maybe this is too
> > > complicated for too little gain...
> >
> > FWIW, I believe the long term solution to this is the fabled netlink
> > attribute approach that we haven't talked about in some time, but I
> > > keep dreaming about (it has been mostly on the back burner because 1)
> > time and 2) didn't want to impact the audit container ID work). While
> > I'm not opposed to trying to make things like this a bit more robust
> > by adding version fields and similar things, there are still so many
> > (so very many) problems with the audit kernel/userspace interface that
> > still need to be addressed.
>
> Agreed, this change as-is is in keeping with the message structure that audit
> has today, and so is ok with me, but the long term goal should be a conversion
> to netlink attributes for all audit messages. That's a big undertaking and
> should be addressed separately though.

You've likely missed all the conversations around this from some time
ago, but this is the direction I want us to go towards eventually, and
yes, this is a huge undertaking (much larger than the audit container
ID work) that will need to be done in stages.

The first step is moving away from audit_log_format() to an in-kernel
audit API that separates the data from the record format; I've got a
lot of ideas on that, but as I said earlier, it's mostly on the back
burner so it doesn't hold up the audit container ID work.

--
paul moore
http://www.paul-moore.com

2019-04-09 14:09:39

by Paul Moore

Subject: Re: [PATCH ghak90 V6 05/10] audit: add contid support for signalling the audit daemon

On Tue, Apr 9, 2019 at 9:53 AM Richard Guy Briggs <[email protected]> wrote:
> On 2019-04-09 09:40, Paul Moore wrote:
> > On Tue, Apr 9, 2019 at 8:58 AM Ondrej Mosnacek <[email protected]> wrote:
> > > On Tue, Apr 9, 2019 at 5:40 AM Richard Guy Briggs <[email protected]> wrote:
> > > > Add audit container identifier support to the action of signalling the
> > > > audit daemon.
> > > >
> > > > Since this would need to add an element to the audit_sig_info struct,
> > > > a new record type AUDIT_SIGNAL_INFO2 was created with a new
> > > > audit_sig_info2 struct. Corresponding support is required in the
> > > > userspace code to reflect the new record request and reply type.
> > > > An older userspace won't break since it won't know to request this
> > > > record type.
> > > >
> > > > Signed-off-by: Richard Guy Briggs <[email protected]>
> > >
> > > This looks good to me.
> > >
> > > Reviewed-by: Ondrej Mosnacek <[email protected]>
> > >
> > > Although I'm wondering if we shouldn't try to future-proof the
> > > AUDIT_SIGNAL_INFO2 format somehow, so that we don't need to add
> > > another AUDIT_SIGNAL_INFO3 when the need arises to add yet-another
> > > identifier to it... The simplest solution I can come up with is to add
> > > a "version" field at the beginning (set to 2 initially), then v<N>_len
> > > at the beginning of data for version <N>. But maybe this is too
> > > complicated for too little gain...
> >
> > FWIW, I believe the long term solution to this is the fabled netlink
> > attribute approach that we haven't talked about in some time, but I
> > > keep dreaming about (it has been mostly on the back burner because 1)
> > time and 2) didn't want to impact the audit container ID work). While
> > I'm not opposed to trying to make things like this a bit more robust
> > by adding version fields and similar things, there are still so many
> > (so very many) problems with the audit kernel/userspace interface that
> > still need to be addressed.
>
> While this particular message type is used very infrequently, adding a
> version field to every message type strikes me as a huge overhead for
> the small likelihood of the format needing to change.
>
> I'd prefer to just key it off the AUDIT_FEATURE_BITMAP or some other
> easily detectable change in this distinguishing feature, such as the
> presence of /proc/self/audit_containerid, which is what I've done in the
> accompanying userspace patchset that I'm preparing to post that works
> with this change.

That's fine. As I said, I'm not overly worried about this; I view
this as a bit of a necessary hack.

--
paul moore
http://www.paul-moore.com

2019-04-11 11:32:25

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On 2019-04-08 23:39, Richard Guy Briggs wrote:
> Implement kernel audit container identifier.

Here's a first revision of the conversion of my manual test script from
bash to automated perl in the audit-testsuite:

https://github.com/linux-audit/audit-testsuite/pull/83

It revealed some bugs/limitations in the userspace code. One is an
omission in my userspace support for these features: it does not yet
treat the contid field in the CONTAINER_ID auxiliary record to the
NETFILTER_PKT record as a comma separated list (I already have a patch).
Another is the inability to search on contid in CONTAINER_ID fields
(complicated by the previous issue). A third (already noted in ghau86)
is the failure to group records of the same event if the record number
is in the 1000 block. A fourth is an open question: whether to add an
old-contid search option.

Despite these limitations, the test script works.
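
A minimal sketch of the list handling mentioned above, assuming an
interpreted record line whose contid= field may hold several comma
separated values. The helper name record_has_contid is hypothetical and
is not part of ausearch or the test suite; the sample record is the
NETFILTER_PKT example from the cover letter:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the audit record line in $1 carries
# contid $2, treating the contid= field as a comma separated list.
record_has_contid() {
    [ -n "$2" ] || return 1
    # Take the value of the first field that starts with "contid=".
    # "old-contid=" does not match because fields are split on spaces.
    list=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^contid=//p' | head -n 1)
    # Surround both sides with commas so 12345 cannot match inside 123451.
    case ",$list," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

record='type=CONTAINER_ID msg=audit(03/15/2019 14:16:13.369:244) : contid=123452,123451'
record_has_contid "$record" 123451 && echo "123451 found"
record_has_contid "$record" 12345 || echo "12345 not found"
```

Matching on whole list elements rather than substrings is what a plain
grep for the number cannot guarantee.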

> This patchset is a fifth based on the proposal document (V3)
> posted:
> https://www.redhat.com/archives/linux-audit/2018-January/msg00014.html
>
> The first patch was the last patch from ghak81 that was absorbed into
> this patchset since its primary justification is the rest of this
> patchset.
>
> The second patch implements the proc fs write to set the audit container
> identifier of a process, emitting an AUDIT_CONTAINER_OP record to
> announce the registration of that audit container identifier on that
> process. This patch requires userspace support for record acceptance
> and proper type display.
>
> The third implements reading the audit container identifier from the
> proc filesystem for debugging. This patch wasn't planned for upstream
> inclusion but is starting to become more likely.
>
> The fourth implements the auxiliary record AUDIT_CONTAINER_ID if an audit
> container identifier is associated with an event. This patch requires
> userspace support for proper type display.
>
> The 5th adds audit daemon signalling provenance through audit_sig_info2.
>
> The 6th creates a local audit context to be able to bind a standalone
> record with a locally created auxiliary record.
>
> The 7th patch adds audit container identifier records to the user
> standalone records.
>
> The 8th adds audit container identifier filtering to the exit,
> exclude and user lists. This patch adds the AUDIT_CONTID field and
> requires auditctl userspace support for the --contid option.
>
> The 9th adds network namespace audit container identifier labelling
> based on member tasks' audit container identifier labels.
>
> The 10th adds audit container identifier support to standalone netfilter
> records that don't have a task context and lists each container to which
> that net namespace belongs.
>
> Example: Set an audit container identifier of 123456 to the "sleep" task:
>
> sleep 2&
> child=$!
> echo 123456 > /proc/$child/audit_containerid; echo $?
> ausearch -ts recent -m container_op
> echo child:$child contid:$( cat /proc/$child/audit_containerid)
>
> This should produce a record such as:
>
> type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615 pid=628 auid=root uid=root tty=ttyS0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm=bash exe=/usr/bin/bash res=yes
>
>
> Example: Set a filter on an audit container identifier 123459 on /tmp/tmpcontainerid:
>
> contid=123459
> key=tmpcontainerid
> auditctl -a exit,always -F dir=/tmp -F perm=wa -F contid=$contid -F key=$key
> perl -e "sleep 1; open(my \$tmpfile, '>', \"/tmp/$key\"); close(\$tmpfile);" &
> child=$!
> echo $contid > /proc/$child/audit_containerid
> sleep 2
> ausearch -i -ts recent -k $key
> auditctl -d exit,always -F dir=/tmp -F perm=wa -F contid=$contid -F key=$key
> rm -f /tmp/$key
>
> This should produce an event such as:
>
> type=CONTAINER_ID msg=audit(2018-06-06 12:46:31.707:26953) : contid=123459
> type=PROCTITLE msg=audit(2018-06-06 12:46:31.707:26953) : proctitle=perl -e sleep 1; open(my $tmpfile, '>', "/tmp/tmpcontainerid"); close($tmpfile);
> type=PATH msg=audit(2018-06-06 12:46:31.707:26953) : item=1 name=/tmp/tmpcontainerid inode=25656 dev=00:26 mode=file,644 ouid=root ogid=root rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=none cap_fi=none cap_fe=0 cap_fver=0
> type=PATH msg=audit(2018-06-06 12:46:31.707:26953) : item=0 name=/tmp/ inode=8985 dev=00:26 mode=dir,sticky,777 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype=PARENT cap_fp=none cap_fi=none cap_fe=0 cap_fver=0
> type=CWD msg=audit(2018-06-06 12:46:31.707:26953) : cwd=/root
> type=SYSCALL msg=audit(2018-06-06 12:46:31.707:26953) : arch=x86_64 syscall=openat success=yes exit=3 a0=0xffffffffffffff9c a1=0x5621f2b81900 a2=O_WRONLY|O_CREAT|O_TRUNC a3=0x1b6 items=2 ppid=628 pid=2232 auid=root uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=ttyS0 ses=1 comm=perl exe=/usr/bin/perl subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=tmpcontainerid
>
> Example: Test multiple containers on one netns:
>
> sleep 5 &
> child1=$!
> containerid1=123451
> echo $containerid1 > /proc/$child1/audit_containerid
> sleep 5 &
> child2=$!
> containerid2=123452
> echo $containerid2 > /proc/$child2/audit_containerid
> iptables -I INPUT -i lo -p icmp --icmp-type echo-request -j AUDIT --type accept
> iptables -I INPUT -t mangle -i lo -p icmp --icmp-type echo-request -j MARK --set-mark 0x12345555
> sleep 1;
> bash -c "ping -q -c 1 127.0.0.1 >/dev/null 2>&1"
> sleep 1;
> ausearch -i -m NETFILTER_PKT -ts boot|grep mark=0x12345555
> ausearch -i -m NETFILTER_PKT -ts boot|grep contid=|grep $containerid1|grep $containerid2
>
> This should produce an event such as:
>
> type=NETFILTER_PKT msg=audit(03/15/2019 14:16:13.369:244) : mark=0x12345555 saddr=127.0.0.1 daddr=127.0.0.1 proto=icmp
> type=CONTAINER_ID msg=audit(03/15/2019 14:16:13.369:244) : contid=123452,123451
>
>
> Includes the last patch of https://github.com/linux-audit/audit-kernel/issues/81
> Please see the github audit kernel issue for the main feature:
> https://github.com/linux-audit/audit-kernel/issues/90
> and the kernel filter code:
> https://github.com/linux-audit/audit-kernel/issues/91
> and the network support:
> https://github.com/linux-audit/audit-kernel/issues/92
> Please see the github audit userspace issue for supporting record types:
> https://github.com/linux-audit/audit-userspace/issues/51
> and filter code:
> https://github.com/linux-audit/audit-userspace/issues/40
> Please see the github audit testsuite issue for the test case:
> https://github.com/linux-audit/audit-testsuite/issues/64
> Please see the github audit wiki for the feature overview:
> https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
>
>
> Changelog:
>
> v6
> - change TMPBUFLEN from 11 to 21 to cover the decimal value of contid
> u64 (nhorman)
> - fix bug overwriting ctx in struct audit_sig_info, move cid above
> ctx[0] (nhorman)
> - fix bug skipping remaining fields and not advancing bufp when copying
> out contid in audit_krule_to_data (omosnacek)
> - add acks, tidy commit descriptions, other formatting fixes (checkpatch
> wrong on audit_log_lost)
> - cast ull for u64 prints
> - target_cid tracking was moved from the ptrace/signal patch to
> container_op
> - target ptrace and signal records were moved from the ptrace/signal
> patch to container_id
> - auditd signaller tracking was moved to a new AUDIT_SIGNAL_INFO2
> request and record
> - ditch unnecessary list_empty() checks
> - check for null net and aunet in audit_netns_contid_add()
> - swap CONTAINER_OP contid/old-contid order to ease parsing
>
> v5
> - address loginuid and sessionid syscall scope in ghak104
> - address audit_context in CONFIG_AUDIT vs CONFIG_AUDITSYSCALL in ghak105
> - remove tty patch, addressed in ghak106
> - rebase on audit/next v5.0-rc1
> w/ghak59/ghak104/ghak103/ghak100/ghak107/ghak105/ghak106/ghak105sup
> - update CONTAINER_ID to CONTAINER_OP in patch description
> - move audit_context in audit_task_info to CONFIG_AUDITSYSCALL
> - move audit_alloc() and audit_free() out of CONFIG_AUDITSYSCALL and into
> CONFIG_AUDIT and create audit_{alloc,free}_syscall
> - use plain kmem_cache_alloc() rather than kmem_cache_zalloc() in audit_alloc()
> - fix audit_get_contid() declaration type error
> - move audit_set_contid() from auditsc.c to audit.c
> - audit_log_contid() returns void
> - audit_log_contid() handed contid rather than tsk
> - switch from AUDIT_CONTAINER to AUDIT_CONTAINER_ID for aux record
> - move audit_log_contid(tsk/contid) & audit_contid_set(tsk)/audit_contid_valid(contid)
> - switch from tsk to current
> - audit_alloc_local() calls audit_log_lost() on failure to allocate a context
> - add AUDIT_USER* non-syscall contid record
> - cosmetic cleanup double parens, goto out on err
> - ditch audit_get_ns_contid_list_lock(), fix aunet lock race
> - switch from all-cpu read spinlock to rcu, keep spinlock for write
> - update audit_alloc_local() to use ktime_get_coarse_real_ts64()
> - add nft_log support
> - add call from do_exit() in audit_free() to remove contid from netns
> - relegate AUDIT_CONTAINER ref= field (was op=) to debug patch
>
> v4
> - preface set with ghak81:"collect audit task parameters"
> - add shallyn and sgrubb acks
> - rename feature bitmap macro
> - rename cid_valid() to audit_contid_valid()
> - rename AUDIT_CONTAINER_ID to AUDIT_CONTAINER_OP
> - delete audit_get_contid_list() from headers
> - move work into inner if, delete "found"
> - change netns contid list function names
> - move exports for audit_log_contid audit_alloc_local audit_free_context to non-syscall patch
> - list contids CSV
> - pass in gfp flags to audit_alloc_local() (fix audit_alloc_context callers)
> - use "local" in lieu of abusing in_syscall for auditsc_get_stamp()
> - read_lock(&tasklist_lock) around children and thread check
> - task_lock(tsk) should be taken before first check of tsk->audit
> - add spin lock to contid list in aunet
> - restrict /proc read to CAP_AUDIT_CONTROL
> - remove set again prohibition and inherited flag
> - delete contidion spelling fix from patchset, send to netdev/linux-wireless
>
> v3
> - switched from containerid in task_struct to audit_task_info (depends on ghak81)
> - drop INVALID_CID in favour of only AUDIT_CID_UNSET
> - check for !audit_task_info, throw -ENOPROTOOPT on set
> - changed -EPERM to -EEXIST for parent check
> - return AUDIT_CID_UNSET if !audit_enabled
> - squash child/thread check patch into AUDIT_CONTAINER_ID patch
> - changed -EPERM to -EBUSY for child check
> - separate child and thread checks, use -EALREADY for latter
> - move addition of op= from ptrace/signal patch to AUDIT_CONTAINER patch
> - fix && to || bashism in ptrace/signal patch
> - uninline and export function for audit_free_context()
> - drop CONFIG_CHANGE, FEATURE_CHANGE, ANOM_ABEND, ANOM_SECCOMP patches
> - move audit_enabled check (xt_AUDIT)
> - switched from containerid list in struct net to net_generic's struct audit_net
> - move containerid list iteration into audit (xt_AUDIT)
> - create function to move namespace switch into audit
> - switched /proc/PID/ entry from containerid to audit_containerid
> - call kzalloc with GFP_ATOMIC on in_atomic() in audit_alloc_context()
> - call kzalloc with GFP_ATOMIC on in_atomic() in audit_log_container_info()
> - use xt_net(par) instead of sock_net(skb->sk) to get net
> - switched record and field names: initial CONTAINER_ID, aux CONTAINER, field CONTID
> - allow to set own contid
> - open code audit_set_containerid
> - add contid inherited flag
> - ccontainerid and pcontainerid eliminated due to inherited flag
> - change name of container list functions
> - rename containerid to contid
> - convert initial container record to syscall aux
> - fix spelling mistake of contidion in net/rfkill/core.c to avoid contid name collision
>
> v2
> - add check for children and threads
> - add network namespace container identifier list
> - add NETFILTER_PKT audit container identifier logging
> - patch description and documentation clean-up and example
> - reap unused ppid
>
> Richard Guy Briggs (10):
> audit: collect audit task parameters
> audit: add container id
> audit: read container ID of a process
> audit: log container info of syscalls
> audit: add contid support for signalling the audit daemon
> audit: add support for non-syscall auxiliary records
> audit: add containerid support for user records
> audit: add containerid filtering
> audit: add support for containerid to network namespaces
> audit: NETFILTER_PKT: record each container ID associated with a netNS
>
> fs/proc/base.c | 57 +++++++-
> include/linux/audit.h | 113 +++++++++++++--
> include/linux/sched.h | 7 +-
> include/uapi/linux/audit.h | 9 +-
> init/init_task.c | 3 +-
> init/main.c | 2 +
> kernel/audit.c | 325 ++++++++++++++++++++++++++++++++++++++++++--
> kernel/audit.h | 9 ++
> kernel/auditfilter.c | 47 +++++++
> kernel/auditsc.c | 90 ++++++++----
> kernel/fork.c | 1 -
> kernel/nsproxy.c | 4 +
> net/netfilter/nft_log.c | 11 +-
> net/netfilter/xt_AUDIT.c | 11 +-
> security/selinux/nlmsgtab.c | 1 +
> 15 files changed, 627 insertions(+), 63 deletions(-)
>
> --
> 1.8.3.1
>

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-04-22 11:55:30

by Neil Horman

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
> Implement kernel audit container identifier.
>
> This patchset is a fifth based on the proposal document (V3)
> posted:
> https://www.redhat.com/archives/linux-audit/2018-January/msg00014.html
>
> The first patch was the last patch from ghak81 that was absorbed into
> this patchset since its primary justification is the rest of this
> patchset.
>
> The second patch implements the proc fs write to set the audit container
> identifier of a process, emitting an AUDIT_CONTAINER_OP record to
> announce the registration of that audit container identifier on that
> process. This patch requires userspace support for record acceptance
> and proper type display.
>
> The third implements reading the audit container identifier from the
> proc filesystem for debugging. This patch wasn't planned for upstream
> inclusion but is starting to become more likely.
>
> The fourth implements the auxiliary record AUDIT_CONTAINER_ID if an audit
> container identifier is associated with an event. This patch requires
> userspace support for proper type display.
>
> The 5th adds audit daemon signalling provenance through audit_sig_info2.
>
> The 6th creates a local audit context to be able to bind a standalone
> record with a locally created auxiliary record.
>
> The 7th patch adds audit container identifier records to the user
> standalone records.
>
> The 8th adds audit container identifier filtering to the exit,
> exclude and user lists. This patch adds the AUDIT_CONTID field and
> requires auditctl userspace support for the --contid option.
>
> The 9th adds network namespace audit container identifier labelling
> based on member tasks' audit container identifier labels.
>
> The 10th adds audit container identifier support to standalone netfilter
> records that don't have a task context and lists each container to which
> that net namespace belongs.
>
> Example: Set an audit container identifier of 123456 to the "sleep" task:
>
> sleep 2&
> child=$!
> echo 123456 > /proc/$child/audit_containerid; echo $?
> ausearch -ts recent -m container_op
> echo child:$child contid:$( cat /proc/$child/audit_containerid)
>
> This should produce a record such as:
>
> type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615 pid=628 auid=root uid=root tty=ttyS0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm=bash exe=/usr/bin/bash res=yes
>
>
> Example: Set a filter on an audit container identifier 123459 on /tmp/tmpcontainerid:
>
> contid=123459
> key=tmpcontainerid
> auditctl -a exit,always -F dir=/tmp -F perm=wa -F contid=$contid -F key=$key
> perl -e "sleep 1; open(my \$tmpfile, '>', \"/tmp/$key\"); close(\$tmpfile);" &
> child=$!
> echo $contid > /proc/$child/audit_containerid
> sleep 2
> ausearch -i -ts recent -k $key
> auditctl -d exit,always -F dir=/tmp -F perm=wa -F contid=$contid -F key=$key
> rm -f /tmp/$key
>
> This should produce an event such as:
>
> type=CONTAINER_ID msg=audit(2018-06-06 12:46:31.707:26953) : contid=123459
> type=PROCTITLE msg=audit(2018-06-06 12:46:31.707:26953) : proctitle=perl -e sleep 1; open(my $tmpfile, '>', "/tmp/tmpcontainerid"); close($tmpfile);
> type=PATH msg=audit(2018-06-06 12:46:31.707:26953) : item=1 name=/tmp/tmpcontainerid inode=25656 dev=00:26 mode=file,644 ouid=root ogid=root rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=none cap_fi=none cap_fe=0 cap_fver=0
> type=PATH msg=audit(2018-06-06 12:46:31.707:26953) : item=0 name=/tmp/ inode=8985 dev=00:26 mode=dir,sticky,777 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype=PARENT cap_fp=none cap_fi=none cap_fe=0 cap_fver=0
> type=CWD msg=audit(2018-06-06 12:46:31.707:26953) : cwd=/root
> type=SYSCALL msg=audit(2018-06-06 12:46:31.707:26953) : arch=x86_64 syscall=openat success=yes exit=3 a0=0xffffffffffffff9c a1=0x5621f2b81900 a2=O_WRONLY|O_CREAT|O_TRUNC a3=0x1b6 items=2 ppid=628 pid=2232 auid=root uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=ttyS0 ses=1 comm=perl exe=/usr/bin/perl subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=tmpcontainerid
>
> Example: Test multiple containers on one netns:
>
> sleep 5 &
> child1=$!
> containerid1=123451
> echo $containerid1 > /proc/$child1/audit_containerid
> sleep 5 &
> child2=$!
> containerid2=123452
> echo $containerid2 > /proc/$child2/audit_containerid
> iptables -I INPUT -i lo -p icmp --icmp-type echo-request -j AUDIT --type accept
> iptables -I INPUT -t mangle -i lo -p icmp --icmp-type echo-request -j MARK --set-mark 0x12345555
> sleep 1;
> bash -c "ping -q -c 1 127.0.0.1 >/dev/null 2>&1"
> sleep 1;
> ausearch -i -m NETFILTER_PKT -ts boot|grep mark=0x12345555
> ausearch -i -m NETFILTER_PKT -ts boot|grep contid=|grep $containerid1|grep $containerid2
>
> This should produce an event such as:
>
> type=NETFILTER_PKT msg=audit(03/15/2019 14:16:13.369:244) : mark=0x12345555 saddr=127.0.0.1 daddr=127.0.0.1 proto=icmp
> type=CONTAINER_ID msg=audit(03/15/2019 14:16:13.369:244) : contid=123452,123451
>
>
> Includes the last patch of https://github.com/linux-audit/audit-kernel/issues/81
> Please see the github audit kernel issue for the main feature:
> https://github.com/linux-audit/audit-kernel/issues/90
> and the kernel filter code:
> https://github.com/linux-audit/audit-kernel/issues/91
> and the network support:
> https://github.com/linux-audit/audit-kernel/issues/92
> Please see the github audit userspace issue for supporting record types:
> https://github.com/linux-audit/audit-userspace/issues/51
> and filter code:
> https://github.com/linux-audit/audit-userspace/issues/40
> Please see the github audit testsuite issue for the test case:
> https://github.com/linux-audit/audit-testsuite/issues/64
> Please see the github audit wiki for the feature overview:
> https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
>
>
> Changelog:
>
> v6
> - change TMPBUFLEN from 11 to 21 to cover the decimal value of contid
> u64 (nhorman)
> - fix bug overwriting ctx in struct audit_sig_info, move cid above
> ctx[0] (nhorman)
> - fix bug skipping remaining fields and not advancing bufp when copying
> out contid in audit_krule_to_data (omosnacek)
> - add acks, tidy commit descriptions, other formatting fixes (checkpatch
> wrong on audit_log_lost)
> - cast ull for u64 prints
> - target_cid tracking was moved from the ptrace/signal patch to
> container_op
> - target ptrace and signal records were moved from the ptrace/signal
> patch to container_id
> - auditd signaller tracking was moved to a new AUDIT_SIGNAL_INFO2
> request and record
> - ditch unnecessary list_empty() checks
> - check for null net and aunet in audit_netns_contid_add()
> - swap CONTAINER_OP contid/old-contid order to ease parsing
>
> v5
> - address loginuid and sessionid syscall scope in ghak104
> - address audit_context in CONFIG_AUDIT vs CONFIG_AUDITSYSCALL in ghak105
> - remove tty patch, addressed in ghak106
> - rebase on audit/next v5.0-rc1
> w/ghak59/ghak104/ghak103/ghak100/ghak107/ghak105/ghak106/ghak105sup
> - update CONTAINER_ID to CONTAINER_OP in patch description
> - move audit_context in audit_task_info to CONFIG_AUDITSYSCALL
> - move audit_alloc() and audit_free() out of CONFIG_AUDITSYSCALL and into
> CONFIG_AUDIT and create audit_{alloc,free}_syscall
> - use plain kmem_cache_alloc() rather than kmem_cache_zalloc() in audit_alloc()
> - fix audit_get_contid() declaration type error
> - move audit_set_contid() from auditsc.c to audit.c
> - audit_log_contid() returns void
> - audit_log_contid() handed contid rather than tsk
> - switch from AUDIT_CONTAINER to AUDIT_CONTAINER_ID for aux record
> - move audit_log_contid(tsk/contid) & audit_contid_set(tsk)/audit_contid_valid(contid)
> - switch from tsk to current
> - audit_alloc_local() calls audit_log_lost() on failure to allocate a context
> - add AUDIT_USER* non-syscall contid record
> - cosmetic cleanup double parens, goto out on err
> - ditch audit_get_ns_contid_list_lock(), fix aunet lock race
> - switch from all-cpu read spinlock to rcu, keep spinlock for write
> - update audit_alloc_local() to use ktime_get_coarse_real_ts64()
> - add nft_log support
> - add call from do_exit() in audit_free() to remove contid from netns
> - relegate AUDIT_CONTAINER ref= field (was op=) to debug patch
>
> v4
> - preface set with ghak81:"collect audit task parameters"
> - add shallyn and sgrubb acks
> - rename feature bitmap macro
> - rename cid_valid() to audit_contid_valid()
> - rename AUDIT_CONTAINER_ID to AUDIT_CONTAINER_OP
> - delete audit_get_contid_list() from headers
> - move work into inner if, delete "found"
> - change netns contid list function names
> - move exports for audit_log_contid audit_alloc_local audit_free_context to non-syscall patch
> - list contids CSV
> - pass in gfp flags to audit_alloc_local() (fix audit_alloc_context callers)
> - use "local" in lieu of abusing in_syscall for auditsc_get_stamp()
> - read_lock(&tasklist_lock) around children and thread check
> - task_lock(tsk) should be taken before first check of tsk->audit
> - add spin lock to contid list in aunet
> - restrict /proc read to CAP_AUDIT_CONTROL
> - remove set again prohibition and inherited flag
> - delete contidion spelling fix from patchset, send to netdev/linux-wireless
>
> v3
> - switched from containerid in task_struct to audit_task_info (depends on ghak81)
> - drop INVALID_CID in favour of only AUDIT_CID_UNSET
> - check for !audit_task_info, throw -ENOPROTOOPT on set
> - changed -EPERM to -EEXIST for parent check
> - return AUDIT_CID_UNSET if !audit_enabled
> - squash child/thread check patch into AUDIT_CONTAINER_ID patch
> - changed -EPERM to -EBUSY for child check
> - separate child and thread checks, use -EALREADY for latter
> - move addition of op= from ptrace/signal patch to AUDIT_CONTAINER patch
> - fix && to || bashism in ptrace/signal patch
> - uninline and export function for audit_free_context()
> - drop CONFIG_CHANGE, FEATURE_CHANGE, ANOM_ABEND, ANOM_SECCOMP patches
> - move audit_enabled check (xt_AUDIT)
> - switched from containerid list in struct net to net_generic's struct audit_net
> - move containerid list iteration into audit (xt_AUDIT)
> - create function to move namespace switch into audit
> - switched /proc/PID/ entry from containerid to audit_containerid
> - call kzalloc with GFP_ATOMIC on in_atomic() in audit_alloc_context()
> - call kzalloc with GFP_ATOMIC on in_atomic() in audit_log_container_info()
> - use xt_net(par) instead of sock_net(skb->sk) to get net
> - switched record and field names: initial CONTAINER_ID, aux CONTAINER, field CONTID
> - allow to set own contid
> - open code audit_set_containerid
> - add contid inherited flag
> - ccontainerid and pcontainerid eliminated due to inherited flag
> - change name of container list funcitons
> - rename containerid to contid
> - convert initial container record to syscall aux
> - fix spelling mistake of contidion in net/rfkill/core.c to avoid contid name collision
>
> v2
> - add check for children and threads
> - add network namespace container identifier list
> - add NETFILTER_PKT audit container identifier logging
> - patch description and documentation clean-up and example
> - reap unused ppid
>
> Richard Guy Briggs (10):
> audit: collect audit task parameters
> audit: add container id
> audit: read container ID of a process
> audit: log container info of syscalls
> audit: add contid support for signalling the audit daemon
> audit: add support for non-syscall auxiliary records
> audit: add containerid support for user records
> audit: add containerid filtering
> audit: add support for containerid to network namespaces
> audit: NETFILTER_PKT: record each container ID associated with a netNS
>
> fs/proc/base.c | 57 +++++++-
> include/linux/audit.h | 113 +++++++++++++--
> include/linux/sched.h | 7 +-
> include/uapi/linux/audit.h | 9 +-
> init/init_task.c | 3 +-
> init/main.c | 2 +
> kernel/audit.c | 325 ++++++++++++++++++++++++++++++++++++++++++--
> kernel/audit.h | 9 ++
> kernel/auditfilter.c | 47 +++++++
> kernel/auditsc.c | 90 ++++++++----
> kernel/fork.c | 1 -
> kernel/nsproxy.c | 4 +
> net/netfilter/nft_log.c | 11 +-
> net/netfilter/xt_AUDIT.c | 11 +-
> security/selinux/nlmsgtab.c | 1 +
> 15 files changed, 627 insertions(+), 63 deletions(-)
>
> --
> 1.8.3.1
>
>
I'm sorry, I've lost track of this, where have we landed on it? Are we good for
inclusion?
Neil

2019-04-22 13:52:07

by Paul Moore

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On Mon, Apr 22, 2019 at 7:38 AM Neil Horman <[email protected]> wrote:
> On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
> > Implement kernel audit container identifier.
>
> I'm sorry, I've lost track of this, where have we landed on it? Are we good for
> inclusion?

I haven't finished going through this latest revision, but unless
Richard made any significant changes outside of the feedback from the
v5 patchset I'm guessing we are "close".

Based on discussions Richard and I had some time ago, I have always
envisioned the plan as being get the kernel patchset, tests, docs
ready (which Richard has been doing) and then run the actual
implemented API by the userland container folks, e.g. cri-o/lxc/etc.,
to make sure the actual implementation is sane from their perspective.
They've already seen the design, so I'm not expecting any real
surprises here, but sometimes opinions change when they have actual
code in front of them to play with and review.

Beyond that, while the cri-o/lxc/etc. folks are looking it over,
whatever additional testing we can do would be a big win. I'm
thinking I'll pull it into a separate branch in the audit tree
(audit/working-container ?) and include that in my secnext kernels
that I build/test on a regular basis; this is also a handy way to keep
it based against the current audit/next branch. If any changes are
needed Richard can either choose to base those changes on audit/next or
the separate audit container ID branch; that's up to him. I've done
this with other big changes in other trees, e.g. SELinux, and it has
worked well to get some extra testing in and keep the patchset "merge
ready" while others outside the subsystem look things over.

--
paul moore
http://www.paul-moore.com

2019-04-23 10:31:33

by Neil Horman

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On Mon, Apr 22, 2019 at 09:49:05AM -0400, Paul Moore wrote:
> On Mon, Apr 22, 2019 at 7:38 AM Neil Horman <[email protected]> wrote:
> > On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
> > > Implement kernel audit container identifier.
> >
> > I'm sorry, I've lost track of this, where have we landed on it? Are we good for
> > inclusion?
>
> I haven't finished going through this latest revision, but unless
> Richard made any significant changes outside of the feedback from the
> v5 patchset I'm guessing we are "close".
>
> Based on discussions Richard and I had some time ago, I have always
> envisioned the plan as being get the kernel patchset, tests, docs
> ready (which Richard has been doing) and then run the actual
> implemented API by the userland container folks, e.g. cri-o/lxc/etc.,
> to make sure the actual implementation is sane from their perspective.
> They've already seen the design, so I'm not expecting any real
> surprises here, but sometimes opinions change when they have actual
> code in front of them to play with and review.
>
> Beyond that, while the cri-o/lxc/etc. folks are looking it over,
> whatever additional testing we can do would be a big win. I'm
> thinking I'll pull it into a separate branch in the audit tree
> (audit/working-container ?) and include that in my secnext kernels
> that I build/test on a regular basis; this is also a handy way to keep
> it based against the current audit/next branch. If any changes are
> needed Richard can either chose to base those changes on audit/next or
> the separate audit container ID branch; that's up to him. I've done
> this with other big changes in other trees, e.g. SELinux, and it has
> worked well to get some extra testing in and keep the patchset "merge
> ready" while others outside the subsystem look things over.
>

That all sounds good, thank you Paul. I knew you and Richard were working on
it, but I somehow managed to lose track of exactly where we left this.

Much Appreciated
Neil

> --
> paul moore
> http://www.paul-moore.com
>

2019-05-28 21:55:47

by Daniel Walsh

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On 4/22/19 9:49 AM, Paul Moore wrote:
> On Mon, Apr 22, 2019 at 7:38 AM Neil Horman <[email protected]> wrote:
>> On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
>>> Implement kernel audit container identifier.
>> I'm sorry, I've lost track of this, where have we landed on it? Are we good for
>> inclusion?
> I haven't finished going through this latest revision, but unless
> Richard made any significant changes outside of the feedback from the
> v5 patchset I'm guessing we are "close".
>
> Based on discussions Richard and I had some time ago, I have always
> envisioned the plan as being get the kernel patchset, tests, docs
> ready (which Richard has been doing) and then run the actual
> implemented API by the userland container folks, e.g. cri-o/lxc/etc.,
> to make sure the actual implementation is sane from their perspective.
> They've already seen the design, so I'm not expecting any real
> surprises here, but sometimes opinions change when they have actual
> code in front of them to play with and review.
>
> Beyond that, while the cri-o/lxc/etc. folks are looking it over,
> whatever additional testing we can do would be a big win. I'm
> thinking I'll pull it into a separate branch in the audit tree
> (audit/working-container ?) and include that in my secnext kernels
> that I build/test on a regular basis; this is also a handy way to keep
> it based against the current audit/next branch. If any changes are
> needed Richard can either chose to base those changes on audit/next or
> the separate audit container ID branch; that's up to him. I've done
> this with other big changes in other trees, e.g. SELinux, and it has
> worked well to get some extra testing in and keep the patchset "merge
> ready" while others outside the subsystem look things over.
>
Mrunal Patel (maintainer of CRI-O) and I have reviewed the API, and
believe this is something we can work on in the container runtimes team
to implement the container auditing code in CRI-O and Podman.


2019-05-28 22:27:23

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On 2019-05-28 17:53, Daniel Walsh wrote:
> On 4/22/19 9:49 AM, Paul Moore wrote:
> > On Mon, Apr 22, 2019 at 7:38 AM Neil Horman <[email protected]> wrote:
> >> On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
> >>> Implement kernel audit container identifier.
> >> I'm sorry, I've lost track of this, where have we landed on it? Are we good for
> >> inclusion?
> > I haven't finished going through this latest revision, but unless
> > Richard made any significant changes outside of the feedback from the
> > v5 patchset I'm guessing we are "close".
> >
> > Based on discussions Richard and I had some time ago, I have always
> > envisioned the plan as being get the kernel patchset, tests, docs
> > ready (which Richard has been doing) and then run the actual
> > implemented API by the userland container folks, e.g. cri-o/lxc/etc.,
> > to make sure the actual implementation is sane from their perspective.
> > They've already seen the design, so I'm not expecting any real
> > surprises here, but sometimes opinions change when they have actual
> > code in front of them to play with and review.
> >
> > Beyond that, while the cri-o/lxc/etc. folks are looking it over,
> > whatever additional testing we can do would be a big win. I'm
> > thinking I'll pull it into a separate branch in the audit tree
> > (audit/working-container ?) and include that in my secnext kernels
> > that I build/test on a regular basis; this is also a handy way to keep
> > it based against the current audit/next branch. If any changes are
> > needed Richard can either chose to base those changes on audit/next or
> > the separate audit container ID branch; that's up to him. I've done
> > this with other big changes in other trees, e.g. SELinux, and it has
> > worked well to get some extra testing in and keep the patchset "merge
> > ready" while others outside the subsystem look things over.
> >
> Mrunal Patel (maintainer of CRI-O) and I have reviewed the API, and
> believe this is something we can work on in the container runtimes team
> to implement the container auditing code in CRI-O and Podman.

Thanks Dan, Mrunal!

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-05-28 22:29:59

by Paul Moore

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On Tue, May 28, 2019 at 5:54 PM Daniel Walsh <[email protected]> wrote:
>
> On 4/22/19 9:49 AM, Paul Moore wrote:
> > On Mon, Apr 22, 2019 at 7:38 AM Neil Horman <[email protected]> wrote:
> >> On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
> >>> Implement kernel audit container identifier.
> >> I'm sorry, I've lost track of this, where have we landed on it? Are we good for
> >> inclusion?
> > I haven't finished going through this latest revision, but unless
> > Richard made any significant changes outside of the feedback from the
> > v5 patchset I'm guessing we are "close".
> >
> > Based on discussions Richard and I had some time ago, I have always
> > envisioned the plan as being get the kernel patchset, tests, docs
> > ready (which Richard has been doing) and then run the actual
> > implemented API by the userland container folks, e.g. cri-o/lxc/etc.,
> > to make sure the actual implementation is sane from their perspective.
> > They've already seen the design, so I'm not expecting any real
> > surprises here, but sometimes opinions change when they have actual
> > code in front of them to play with and review.
> >
> > Beyond that, while the cri-o/lxc/etc. folks are looking it over,
> > whatever additional testing we can do would be a big win. I'm
> > thinking I'll pull it into a separate branch in the audit tree
> > (audit/working-container ?) and include that in my secnext kernels
> > that I build/test on a regular basis; this is also a handy way to keep
> > it based against the current audit/next branch. If any changes are
> > needed Richard can either chose to base those changes on audit/next or
> > the separate audit container ID branch; that's up to him. I've done
> > this with other big changes in other trees, e.g. SELinux, and it has
> > worked well to get some extra testing in and keep the patchset "merge
> > ready" while others outside the subsystem look things over.
> >
> Mrunal Patel (maintainer of CRI-O) and I have reviewed the API, and
> believe this is something we can work on in the container runtimes team
> to implement the container auditing code in CRI-O and Podman.

Thanks Dan. If I pulled this into a branch and built you some test
kernels to play with, any idea how long it might take to get a proof
of concept working on the cri-o side?

FWIW, I've also reached out to some of the LXC folks I know to get
their take on the API. If we can get two different container
runtimes to give the API a thumbs-up then I think we are in good shape
with respect to the userspace interface.

I just finished looking over the last of the pending audit kernel
patches that were queued waiting for the merge window to open so this
is next on my list to look at. I plan to start doing that
tonight/tomorrow, and as long as the changes between v5/v6 are not
that big, it shouldn't take too long.

--
paul moore
http://www.paul-moore.com

2019-05-28 23:04:12

by Steve Grubb

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On Tuesday, May 28, 2019 6:26:47 PM EDT Paul Moore wrote:
> On Tue, May 28, 2019 at 5:54 PM Daniel Walsh <[email protected]> wrote:
> > On 4/22/19 9:49 AM, Paul Moore wrote:
> > > On Mon, Apr 22, 2019 at 7:38 AM Neil Horman <[email protected]> wrote:
> > >> On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
> > >>> Implement kernel audit container identifier.
> > >>
> > >> I'm sorry, I've lost track of this, where have we landed on it? Are we
> > >> good for inclusion?
> > >
> > > I haven't finished going through this latest revision, but unless
> > > Richard made any significant changes outside of the feedback from the
> > > v5 patchset I'm guessing we are "close".
> > >
> > > Based on discussions Richard and I had some time ago, I have always
> > > envisioned the plan as being get the kernel patchset, tests, docs
> > > ready (which Richard has been doing) and then run the actual
> > > implemented API by the userland container folks, e.g. cri-o/lxc/etc.,
> > > to make sure the actual implementation is sane from their perspective.
> > > They've already seen the design, so I'm not expecting any real
> > > surprises here, but sometimes opinions change when they have actual
> > > code in front of them to play with and review.
> > >
> > > Beyond that, while the cri-o/lxc/etc. folks are looking it over,
> > > whatever additional testing we can do would be a big win. I'm
> > > thinking I'll pull it into a separate branch in the audit tree
> > > (audit/working-container ?) and include that in my secnext kernels
> > > that I build/test on a regular basis; this is also a handy way to keep
> > > it based against the current audit/next branch. If any changes are
> > > needed Richard can either chose to base those changes on audit/next or
> > > the separate audit container ID branch; that's up to him. I've done
> > > this with other big changes in other trees, e.g. SELinux, and it has
> > > worked well to get some extra testing in and keep the patchset "merge
> > > ready" while others outside the subsystem look things over.
> >
> > Mrunal Patel (maintainer of CRI-O) and I have reviewed the API, and
> > believe this is something we can work on in the container runtimes team
> > to implement the container auditing code in CRI-O and Podman.
>
> Thanks Dan. If I pulled this into a branch and built you some test
> kernels to play with, any idea how long it might take to get a proof
> of concept working on the cri-o side?

We'd need to merge user space patches and let them use that instead of the
raw interface. I'm not going to merge user space until we are pretty sure the
patch is going into the kernel.

-Steve

> FWIW, I've also reached out to some of the LXC folks I know to get
> their take on the API. I think if we can get two different container
> runtimes to give the API a thumbs-up then I think we are in good shape
> with respect to the userspace interface.
>
> I just finished looking over the last of the pending audit kernel
> patches that were queued waiting for the merge window to open so this
> is next on my list to look at. I plan to start doing that
> tonight/tomorrow, and as long as the changes between v5/v6 are not
> that big, it shouldn't take too long.




2019-05-29 00:45:30

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On 2019-05-28 19:00, Steve Grubb wrote:
> On Tuesday, May 28, 2019 6:26:47 PM EDT Paul Moore wrote:
> > On Tue, May 28, 2019 at 5:54 PM Daniel Walsh <[email protected]> wrote:
> > > On 4/22/19 9:49 AM, Paul Moore wrote:
> > > > On Mon, Apr 22, 2019 at 7:38 AM Neil Horman <[email protected]> wrote:
> > > >> On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
> > > >>> Implement kernel audit container identifier.
> > > >>
> > > >> I'm sorry, I've lost track of this, where have we landed on it? Are we
> > > >> good for inclusion?
> > > >
> > > > I haven't finished going through this latest revision, but unless
> > > > Richard made any significant changes outside of the feedback from the
> > > > v5 patchset I'm guessing we are "close".
> > > >
> > > > Based on discussions Richard and I had some time ago, I have always
> > > > envisioned the plan as being get the kernel patchset, tests, docs
> > > > ready (which Richard has been doing) and then run the actual
> > > > implemented API by the userland container folks, e.g. cri-o/lxc/etc.,
> > > > to make sure the actual implementation is sane from their perspective.
> > > > They've already seen the design, so I'm not expecting any real
> > > > surprises here, but sometimes opinions change when they have actual
> > > > code in front of them to play with and review.
> > > >
> > > > Beyond that, while the cri-o/lxc/etc. folks are looking it over,
> > > > whatever additional testing we can do would be a big win. I'm
> > > > thinking I'll pull it into a separate branch in the audit tree
> > > > (audit/working-container ?) and include that in my secnext kernels
> > > > that I build/test on a regular basis; this is also a handy way to keep
> > > > it based against the current audit/next branch. If any changes are
> > > > needed Richard can either chose to base those changes on audit/next or
> > > > the separate audit container ID branch; that's up to him. I've done
> > > > this with other big changes in other trees, e.g. SELinux, and it has
> > > > worked well to get some extra testing in and keep the patchset "merge
> > > > ready" while others outside the subsystem look things over.
> > >
> > > Mrunal Patel (maintainer of CRI-O) and I have reviewed the API, and
> > > believe this is something we can work on in the container runtimes team
> > > to implement the container auditing code in CRI-O and Podman.
> >
> > Thanks Dan. If I pulled this into a branch and built you some test
> > kernels to play with, any idea how long it might take to get a proof
> > of concept working on the cri-o side?
>
> We'd need to merge user space patches and let them use that instead of the
> raw interface. I'm not going to merge user space until we are pretty sure the
> patch is going into the kernel.

I have an f29 test rpm of the userspace bits if that helps for testing:
http://people.redhat.com/~rbriggs/ghak90/git-1db7e21/

Here's what it contains (minus the last patch):
https://github.com/linux-audit/audit-userspace/compare/master...rgbriggs:ghau40-containerid-filter.v7.0

> -Steve
>
> > FWIW, I've also reached out to some of the LXC folks I know to get
> > their take on the API. I think if we can get two different container
> > runtimes to give the API a thumbs-up then I think we are in good shape
> > with respect to the userspace interface.
> >
> > I just finished looking over the last of the pending audit kernel
> > patches that were queued waiting for the merge window to open so this
> > is next on my list to look at. I plan to start doing that
> > tonight/tomorrow, and as long as the changes between v5/v6 are not
> > that big, it shouldn't take too long.

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-05-29 12:05:16

by Daniel Walsh

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On 5/28/19 8:43 PM, Richard Guy Briggs wrote:
> On 2019-05-28 19:00, Steve Grubb wrote:
>> On Tuesday, May 28, 2019 6:26:47 PM EDT Paul Moore wrote:
>>> On Tue, May 28, 2019 at 5:54 PM Daniel Walsh <[email protected]> wrote:
>>>> On 4/22/19 9:49 AM, Paul Moore wrote:
>>>>> On Mon, Apr 22, 2019 at 7:38 AM Neil Horman <[email protected]> wrote:
>>>>>> On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
>>>>>>> Implement kernel audit container identifier.
>>>>>> I'm sorry, I've lost track of this, where have we landed on it? Are we
>>>>>> good for inclusion?
>>>>> I haven't finished going through this latest revision, but unless
>>>>> Richard made any significant changes outside of the feedback from the
>>>>> v5 patchset I'm guessing we are "close".
>>>>>
>>>>> Based on discussions Richard and I had some time ago, I have always
>>>>> envisioned the plan as being get the kernel patchset, tests, docs
>>>>> ready (which Richard has been doing) and then run the actual
>>>>> implemented API by the userland container folks, e.g. cri-o/lxc/etc.,
>>>>> to make sure the actual implementation is sane from their perspective.
>>>>> They've already seen the design, so I'm not expecting any real
>>>>> surprises here, but sometimes opinions change when they have actual
>>>>> code in front of them to play with and review.
>>>>>
>>>>> Beyond that, while the cri-o/lxc/etc. folks are looking it over,
>>>>> whatever additional testing we can do would be a big win. I'm
>>>>> thinking I'll pull it into a separate branch in the audit tree
>>>>> (audit/working-container ?) and include that in my secnext kernels
>>>>> that I build/test on a regular basis; this is also a handy way to keep
>>>>> it based against the current audit/next branch. If any changes are
>>>>> needed Richard can either chose to base those changes on audit/next or
>>>>> the separate audit container ID branch; that's up to him. I've done
>>>>> this with other big changes in other trees, e.g. SELinux, and it has
>>>>> worked well to get some extra testing in and keep the patchset "merge
>>>>> ready" while others outside the subsystem look things over.
>>>> Mrunal Patel (maintainer of CRI-O) and I have reviewed the API, and
>>>> believe this is something we can work on in the container runtimes team
>>>> to implement the container auditing code in CRI-O and Podman.
>>> Thanks Dan. If I pulled this into a branch and built you some test
>>> kernels to play with, any idea how long it might take to get a proof
>>> of concept working on the cri-o side?
>> We'd need to merge user space patches and let them use that instead of the
>> raw interface. I'm not going to merge user space until we are pretty sure the
>> patch is going into the kernel.
> I have an f29 test rpm of the userspace bits if that helps for testing:
> http://people.redhat.com/~rbriggs/ghak90/git-1db7e21/
>
> Here's what it contains (minus the last patch):
> https://github.com/linux-audit/audit-userspace/compare/master...rgbriggs:ghau40-containerid-filter.v7.0
>
>> -Steve
>>
>>> FWIW, I've also reached out to some of the LXC folks I know to get
>>> their take on the API. I think if we can get two different container
>>> runtimes to give the API a thumbs-up then I think we are in good shape
>>> with respect to the userspace interface.
>>>
>>> I just finished looking over the last of the pending audit kernel
>>> patches that were queued waiting for the merge window to open so this
>>> is next on my list to look at. I plan to start doing that
>>> tonight/tomorrow, and as long as the changes between v5/v6 are not
>>> that big, it shouldn't take too long.
> - RGB
>
> --
> Richard Guy Briggs <[email protected]>
> Sr. S/W Engineer, Kernel Security, Base Operating Systems
> Remote, Ottawa, Red Hat Canada
> IRC: rgb, SunRaycer
> Voice: +1.647.777.2635, Internal: (81) 32635

Our current thoughts are to put the setting of the ID inside of conmon,
and then launch the OCI Runtime. In a perfect world this would
happen in the OCI Runtime, but we have no control over different OCI
Runtimes.

By putting it into conmon, CRI-O and Podman will automatically get
the container ID support. After we have this, we have to plumb it back
up through the container engines to be able to easily report the link
between the Container UUID and the Kernel Container Audit ID.
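
For illustration only, here is a rough sketch of what "setting the ID"
from a conmon-like launcher might look like, using the proc interface
this patchset describes. The file name audit_containerid comes from the
v3 changelog; writing the value as a decimal string is an assumption
based on the cover letter example, and both helper names are
hypothetical (they are not taken from conmon or the audit userspace).
The caller is assumed to hold whatever capability the kernel requires
for the write.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: build "/proc/<pid>/audit_containerid" into buf.
 * Returns 0 on success, -1 if the path did not fit in buf.
 */
static int contid_proc_path(pid_t pid, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/proc/%ld/audit_containerid", (long)pid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write contid as a decimal string to path.
 * The decimal format is an assumption. Returns 0 or a negative errno.
 */
static int write_contid(const char *path, uint64_t contid)
{
	char val[32];
	int n = snprintf(val, sizeof(val), "%" PRIu64, contid);
	int fd, err;
	ssize_t w;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	w = write(fd, val, (size_t)n);
	if (w == n)
		err = 0;
	else if (w < 0)
		err = -errno;
	else
		err = -EIO;	/* short write: treat as an I/O error */

	close(fd);
	return err;
}
```

A launcher would call contid_proc_path() for the target pid, then
write_contid() on the result, and only then exec the OCI runtime, so
that the runtime and everything it spawns start out with the ID already
set.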


2019-05-29 13:16:41

by Paul Moore

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On Tue, May 28, 2019 at 8:44 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-05-28 19:00, Steve Grubb wrote:
> > On Tuesday, May 28, 2019 6:26:47 PM EDT Paul Moore wrote:
> > > On Tue, May 28, 2019 at 5:54 PM Daniel Walsh <[email protected]> wrote:

...

> > > > Mrunal Patel (maintainer of CRI-O) and I have reviewed the API, and
> > > > believe this is something we can work on in the container runtimes team
> > > > to implement the container auditing code in CRI-O and Podman.
> > >
> > > Thanks Dan. If I pulled this into a branch and built you some test
> > > kernels to play with, any idea how long it might take to get a proof
> > > of concept working on the cri-o side?
> >
> > We'd need to merge user space patches and let them use that instead of the
> > raw interface. I'm not going to merge user space until we are pretty sure the
> > patch is going into the kernel.
>
> I have an f29 test rpm of the userspace bits if that helps for testing:
> http://people.redhat.com/~rbriggs/ghak90/git-1db7e21/
>
> Here's what it contains (minus the last patch):
> https://github.com/linux-audit/audit-userspace/compare/master...rgbriggs:ghau40-containerid-filter.v7.0

Yes, exactly. Just as I plan to start making some test kernels for
people to play with (assuming v6 looks okay), I think it would be good
if Steve could make a test build of the latest audit userspace with
the audit container ID patches. It really shouldn't be that hard, and
the benefits should far outweigh any time spent generating the
tree/builds.

--
paul moore
http://www.paul-moore.com

2019-05-29 13:20:02

by Paul Moore

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On Wed, May 29, 2019 at 8:03 AM Daniel Walsh <[email protected]> wrote:
>
> On 5/28/19 8:43 PM, Richard Guy Briggs wrote:
> > On 2019-05-28 19:00, Steve Grubb wrote:
> >> On Tuesday, May 28, 2019 6:26:47 PM EDT Paul Moore wrote:
> >>> On Tue, May 28, 2019 at 5:54 PM Daniel Walsh <[email protected]> wrote:
> >>>> On 4/22/19 9:49 AM, Paul Moore wrote:
> >>>>> On Mon, Apr 22, 2019 at 7:38 AM Neil Horman <[email protected]> wrote:
> >>>>>> On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
> >>>>>>> Implement kernel audit container identifier.
> >>>>>> I'm sorry, I've lost track of this, where have we landed on it? Are we
> >>>>>> good for inclusion?
> >>>>> I haven't finished going through this latest revision, but unless
> >>>>> Richard made any significant changes outside of the feedback from the
> >>>>> v5 patchset I'm guessing we are "close".
> >>>>>
> >>>>> Based on discussions Richard and I had some time ago, I have always
> >>>>> envisioned the plan as being get the kernel patchset, tests, docs
> >>>>> ready (which Richard has been doing) and then run the actual
> >>>>> implemented API by the userland container folks, e.g. cri-o/lxc/etc.,
> >>>>> to make sure the actual implementation is sane from their perspective.
> >>>>> They've already seen the design, so I'm not expecting any real
> >>>>> surprises here, but sometimes opinions change when they have actual
> >>>>> code in front of them to play with and review.
> >>>>>
> >>>>> Beyond that, while the cri-o/lxc/etc. folks are looking it over,
> >>>>> whatever additional testing we can do would be a big win. I'm
> >>>>> thinking I'll pull it into a separate branch in the audit tree
> >>>>> (audit/working-container ?) and include that in my secnext kernels
> >>>>> that I build/test on a regular basis; this is also a handy way to keep
> >>>>> it based against the current audit/next branch. If any changes are
> >>>>> needed Richard can either chose to base those changes on audit/next or
> >>>>> the separate audit container ID branch; that's up to him. I've done
> >>>>> this with other big changes in other trees, e.g. SELinux, and it has
> >>>>> worked well to get some extra testing in and keep the patchset "merge
> >>>>> ready" while others outside the subsystem look things over.
> >>>> Mrunal Patel (maintainer of CRI-O) and I have reviewed the API, and
> >>>> believe this is something we can work on in the container runtimes team
> >>>> to implement the container auditing code in CRI-O and Podman.
> >>> Thanks Dan. If I pulled this into a branch and built you some test
> >>> kernels to play with, any idea how long it might take to get a proof
> >>> of concept working on the cri-o side?
> >> We'd need to merge user space patches and let them use that instead of the
> >> raw interface. I'm not going to merge user space until we are pretty sure the
> >> patch is going into the kernel.
> > I have an f29 test rpm of the userspace bits if that helps for testing:
> > http://people.redhat.com/~rbriggs/ghak90/git-1db7e21/
> >
> > Here's what it contains (minus the last patch):
> > https://github.com/linux-audit/audit-userspace/compare/master...rgbriggs:ghau40-containerid-filter.v7.0
> >
> >> -Steve
> >>
> >>> FWIW, I've also reached out to some of the LXC folks I know to get
> >>> their take on the API. I think if we can get two different container
> >>> runtimes to give the API a thumbs-up then I think we are in good shape
> >>> with respect to the userspace interface.
> >>>
> >>> I just finished looking over the last of the pending audit kernel
> >>> patches that were queued waiting for the merge window to open so this
> >>> is next on my list to look at. I plan to start doing that
> >>> tonight/tomorrow, and as long as the changes between v5/v6 are not
> >>> that big, it shouldn't take too long.
> > - RGB
> >
> > --
> > Richard Guy Briggs <[email protected]>
> > Sr. S/W Engineer, Kernel Security, Base Operating Systems
> > Remote, Ottawa, Red Hat Canada
> > IRC: rgb, SunRaycer
> > Voice: +1.647.777.2635, Internal: (81) 32635
>
> Our current thoughts are to put the setting of the ID inside of conmon,
> and then launching the OCI Runtime. In a perfect world this would
> happen in the OCI Runtime, but we have no controls over different OCI
> Runtimes.
>
> By putting it into conmon, then CRI-O and Podman will automatically get
> the container id support. After we have this we have to plumb it back
> up through the contianer engines to be able to easily report the link
> between the Container UUID and The Kernel Container Audit ID.

I'm glad you guys have a plan, that's encouraging, but sadly I have no
idea about the level of complexity/difficulty involved in modifying
the various container bits for a proof-of-concept. Are we talking a
week or two? A month? More?

--
paul moore
http://www.paul-moore.com

2019-05-29 14:11:43

by Daniel Walsh

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On 5/29/19 9:17 AM, Paul Moore wrote:
> On Wed, May 29, 2019 at 8:03 AM Daniel Walsh <[email protected]> wrote:
>> On 5/28/19 8:43 PM, Richard Guy Briggs wrote:
>>> On 2019-05-28 19:00, Steve Grubb wrote:
>>>> On Tuesday, May 28, 2019 6:26:47 PM EDT Paul Moore wrote:
>>>>> On Tue, May 28, 2019 at 5:54 PM Daniel Walsh <[email protected]> wrote:
>>>>>> On 4/22/19 9:49 AM, Paul Moore wrote:
>>>>>>> On Mon, Apr 22, 2019 at 7:38 AM Neil Horman <[email protected]>
>>>> wrote:
>>>>>>>> On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
>>>>>>>>> Implement kernel audit container identifier.
>>>>>>>> I'm sorry, I've lost track of this, where have we landed on it? Are we
>>>>>>>> good for inclusion?
>>>>>>> I haven't finished going through this latest revision, but unless
>>>>>>> Richard made any significant changes outside of the feedback from the
>>>>>>> v5 patchset I'm guessing we are "close".
>>>>>>>
>>>>>>> Based on discussions Richard and I had some time ago, I have always
>>>>>>> envisioned the plan as being get the kernel patchset, tests, docs
>>>>>>> ready (which Richard has been doing) and then run the actual
>>>>>>> implemented API by the userland container folks, e.g. cri-o/lxc/etc.,
>>>>>>> to make sure the actual implementation is sane from their perspective.
>>>>>>> They've already seen the design, so I'm not expecting any real
>>>>>>> surprises here, but sometimes opinions change when they have actual
>>>>>>> code in front of them to play with and review.
>>>>>>>
>>>>>>> Beyond that, while the cri-o/lxc/etc. folks are looking it over,
>>>>>>> whatever additional testing we can do would be a big win. I'm
>>>>>>> thinking I'll pull it into a separate branch in the audit tree
>>>>>>> (audit/working-container ?) and include that in my secnext kernels
>>>>>>> that I build/test on a regular basis; this is also a handy way to keep
>>>>>>> it based against the current audit/next branch. If any changes are
>>>>>>> needed Richard can either choose to base those changes on audit/next or
>>>>>>> the separate audit container ID branch; that's up to him. I've done
>>>>>>> this with other big changes in other trees, e.g. SELinux, and it has
>>>>>>> worked well to get some extra testing in and keep the patchset "merge
>>>>>>> ready" while others outside the subsystem look things over.
>>>>>> Mrunal Patel (maintainer of CRI-O) and I have reviewed the API, and
>>>>>> believe this is something we can work on in the container runtimes team
>>>>>> to implement the container auditing code in CRI-O and Podman.
>>>>> Thanks Dan. If I pulled this into a branch and built you some test
>>>>> kernels to play with, any idea how long it might take to get a proof
>>>>> of concept working on the cri-o side?
>>>> We'd need to merge user space patches and let them use that instead of the
>>>> raw interface. I'm not going to merge user space until we are pretty sure the
>>>> patch is going into the kernel.
>>> I have an f29 test rpm of the userspace bits if that helps for testing:
>>> http://people.redhat.com/~rbriggs/ghak90/git-1db7e21/
>>>
>>> Here's what it contains (minus the last patch):
>>> https://github.com/linux-audit/audit-userspace/compare/master...rgbriggs:ghau40-containerid-filter.v7.0
>>>
>>>> -Steve
>>>>
>>>>> FWIW, I've also reached out to some of the LXC folks I know to get
>>>>> their take on the API. I think if we can get two different container
>>>>> runtimes to give the API a thumbs-up then I think we are in good shape
>>>>> with respect to the userspace interface.
>>>>>
>>>>> I just finished looking over the last of the pending audit kernel
>>>>> patches that were queued waiting for the merge window to open so this
>>>>> is next on my list to look at. I plan to start doing that
>>>>> tonight/tomorrow, and as long as the changes between v5/v6 are not
>>>>> that big, it shouldn't take too long.
>>> - RGB
>>>
>>> --
>>> Richard Guy Briggs <[email protected]>
>>> Sr. S/W Engineer, Kernel Security, Base Operating Systems
>>> Remote, Ottawa, Red Hat Canada
>>> IRC: rgb, SunRaycer
>>> Voice: +1.647.777.2635, Internal: (81) 32635
>> Our current thoughts are to put the setting of the ID inside of conmon,
>> and then launching the OCI Runtime. In a perfect world this would
>> happen in the OCI Runtime, but we have no controls over different OCI
>> Runtimes.
>>
>> By putting it into conmon, then CRI-O and Podman will automatically get
>> the container id support. After we have this we have to plumb it back
>> up through the container engines to be able to easily report the link
>> between the Container UUID and The Kernel Container Audit ID.
> I'm glad you guys have a plan, that's encouraging, but sadly I have no
> idea about the level of complexity/difficulty involved in modifying
> the various container bits for a proof-of-concept? Are we talking a
> week or two? A month? More?
>
If we had the kernel and the libaudit API, it would involve a small
effort in conmon; I would figure a few days for a POC. Getting the
whole wiring into CRI-O and Podman would be a little more effort.


2019-05-29 14:35:51

by Paul Moore

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On Wed, May 29, 2019 at 10:07 AM Daniel Walsh <[email protected]> wrote:
> On 5/29/19 9:17 AM, Paul Moore wrote:
> > On Wed, May 29, 2019 at 8:03 AM Daniel Walsh <[email protected]> wrote:
> >> On 5/28/19 8:43 PM, Richard Guy Briggs wrote:
> >>> On 2019-05-28 19:00, Steve Grubb wrote:
> >>>> On Tuesday, May 28, 2019 6:26:47 PM EDT Paul Moore wrote:
> >>>>> On Tue, May 28, 2019 at 5:54 PM Daniel Walsh <[email protected]> wrote:
> >>>>>> On 4/22/19 9:49 AM, Paul Moore wrote:
> >>>>>>> On Mon, Apr 22, 2019 at 7:38 AM Neil Horman <[email protected]>
> >>>> wrote:
> >>>>>>>> On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
> >>>>>>>>> Implement kernel audit container identifier.
> >>>>>>>> I'm sorry, I've lost track of this, where have we landed on it? Are we
> >>>>>>>> good for inclusion?
> >>>>>>> I haven't finished going through this latest revision, but unless
> >>>>>>> Richard made any significant changes outside of the feedback from the
> >>>>>>> v5 patchset I'm guessing we are "close".
> >>>>>>>
> >>>>>>> Based on discussions Richard and I had some time ago, I have always
> >>>>>>> envisioned the plan as being get the kernel patchset, tests, docs
> >>>>>>> ready (which Richard has been doing) and then run the actual
> >>>>>>> implemented API by the userland container folks, e.g. cri-o/lxc/etc.,
> >>>>>>> to make sure the actual implementation is sane from their perspective.
> >>>>>>> They've already seen the design, so I'm not expecting any real
> >>>>>>> surprises here, but sometimes opinions change when they have actual
> >>>>>>> code in front of them to play with and review.
> >>>>>>>
> >>>>>>> Beyond that, while the cri-o/lxc/etc. folks are looking it over,
> >>>>>>> whatever additional testing we can do would be a big win. I'm
> >>>>>>> thinking I'll pull it into a separate branch in the audit tree
> >>>>>>> (audit/working-container ?) and include that in my secnext kernels
> >>>>>>> that I build/test on a regular basis; this is also a handy way to keep
> >>>>>>> it based against the current audit/next branch. If any changes are
> >>>>>>> needed Richard can either choose to base those changes on audit/next or
> >>>>>>> the separate audit container ID branch; that's up to him. I've done
> >>>>>>> this with other big changes in other trees, e.g. SELinux, and it has
> >>>>>>> worked well to get some extra testing in and keep the patchset "merge
> >>>>>>> ready" while others outside the subsystem look things over.
> >>>>>> Mrunal Patel (maintainer of CRI-O) and I have reviewed the API, and
> >>>>>> believe this is something we can work on in the container runtimes team
> >>>>>> to implement the container auditing code in CRI-O and Podman.
> >>>>> Thanks Dan. If I pulled this into a branch and built you some test
> >>>>> kernels to play with, any idea how long it might take to get a proof
> >>>>> of concept working on the cri-o side?
> >>>> We'd need to merge user space patches and let them use that instead of the
> >>>> raw interface. I'm not going to merge user space until we are pretty sure the
> >>>> patch is going into the kernel.
> >>> I have an f29 test rpm of the userspace bits if that helps for testing:
> >>> http://people.redhat.com/~rbriggs/ghak90/git-1db7e21/
> >>>
> >>> Here's what it contains (minus the last patch):
> >>> https://github.com/linux-audit/audit-userspace/compare/master...rgbriggs:ghau40-containerid-filter.v7.0
> >>>
> >>>> -Steve
> >>>>
> >>>>> FWIW, I've also reached out to some of the LXC folks I know to get
> >>>>> their take on the API. I think if we can get two different container
> >>>>> runtimes to give the API a thumbs-up then I think we are in good shape
> >>>>> with respect to the userspace interface.
> >>>>>
> >>>>> I just finished looking over the last of the pending audit kernel
> >>>>> patches that were queued waiting for the merge window to open so this
> >>>>> is next on my list to look at. I plan to start doing that
> >>>>> tonight/tomorrow, and as long as the changes between v5/v6 are not
> >>>>> that big, it shouldn't take too long.
> >>> - RGB
> >>>
> >>> --
> >>> Richard Guy Briggs <[email protected]>
> >>> Sr. S/W Engineer, Kernel Security, Base Operating Systems
> >>> Remote, Ottawa, Red Hat Canada
> >>> IRC: rgb, SunRaycer
> >>> Voice: +1.647.777.2635, Internal: (81) 32635
> >> Our current thoughts are to put the setting of the ID inside of conmon,
> >> and then launching the OCI Runtime. In a perfect world this would
> >> happen in the OCI Runtime, but we have no controls over different OCI
> >> Runtimes.
> >>
> >> By putting it into conmon, then CRI-O and Podman will automatically get
> >> the container id support. After we have this we have to plumb it back
> >> up through the container engines to be able to easily report the link
> >> between the Container UUID and The Kernel Container Audit ID.
> > I'm glad you guys have a plan, that's encouraging, but sadly I have no
> > idea about the level of complexity/difficulty involved in modifying
> > the various container bits for a proof-of-concept? Are we talking a
> > week or two? A month? More?
> >
> If we had the kernel and the libaudit API, it would involve a small
> effort in conmon; I would figure a few days for a POC. Getting the
> whole wiring into CRI-O and Podman would be a little more effort.

That's great. Stay tuned ...

--
paul moore
http://www.paul-moore.com

2019-05-29 14:59:21

by Tycho Andersen

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Mon, Apr 08, 2019 at 11:39:09PM -0400, Richard Guy Briggs wrote:
> It is not permitted to unset the audit container identifier.
> A child inherits its parent's audit container identifier.

...

> /**
> + * audit_set_contid - set current task's audit contid
> + * @contid: contid value
> + *
> + * Returns 0 on success, -EPERM on permission failure.
> + *
> + * Called (set) from fs/proc/base.c::proc_contid_write().
> + */
> +int audit_set_contid(struct task_struct *task, u64 contid)
> +{
> + u64 oldcontid;
> + int rc = 0;
> + struct audit_buffer *ab;
> + uid_t uid;
> + struct tty_struct *tty;
> + char comm[sizeof(current->comm)];
> +
> + task_lock(task);
> + /* Can't set if audit disabled */
> + if (!task->audit) {
> + task_unlock(task);
> + return -ENOPROTOOPT;
> + }
> + oldcontid = audit_get_contid(task);
> + read_lock(&tasklist_lock);
> + /* Don't allow the audit containerid to be unset */
> + if (!audit_contid_valid(contid))
> + rc = -EINVAL;
> + /* if we don't have caps, reject */
> + else if (!capable(CAP_AUDIT_CONTROL))
> + rc = -EPERM;
> + /* if task has children or is not single-threaded, deny */
> + else if (!list_empty(&task->children))
> + rc = -EBUSY;
> + else if (!(thread_group_leader(task) && thread_group_empty(task)))
> + rc = -EALREADY;
> + read_unlock(&tasklist_lock);
> + if (!rc)
> + task->audit->contid = contid;
> + task_unlock(task);
> +
> + if (!audit_enabled)
> + return rc;

...but it is allowed to change it (assuming
capable(CAP_AUDIT_CONTROL), of course)? Seems like this might be more
immediately useful since we still live in the world of majority
privileged containers if we didn't allow changing it, in addition to
un-setting it.

Tycho

2019-05-29 15:30:56

by Paul Moore

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Wed, May 29, 2019 at 10:57 AM Tycho Andersen <[email protected]> wrote:
>
> On Mon, Apr 08, 2019 at 11:39:09PM -0400, Richard Guy Briggs wrote:
> > It is not permitted to unset the audit container identifier.
> > A child inherits its parent's audit container identifier.
>
> ...
>
> > /**
> > + * audit_set_contid - set current task's audit contid
> > + * @contid: contid value
> > + *
> > + * Returns 0 on success, -EPERM on permission failure.
> > + *
> > + * Called (set) from fs/proc/base.c::proc_contid_write().
> > + */
> > +int audit_set_contid(struct task_struct *task, u64 contid)
> > +{
> > + u64 oldcontid;
> > + int rc = 0;
> > + struct audit_buffer *ab;
> > + uid_t uid;
> > + struct tty_struct *tty;
> > + char comm[sizeof(current->comm)];
> > +
> > + task_lock(task);
> > + /* Can't set if audit disabled */
> > + if (!task->audit) {
> > + task_unlock(task);
> > + return -ENOPROTOOPT;
> > + }
> > + oldcontid = audit_get_contid(task);
> > + read_lock(&tasklist_lock);
> > + /* Don't allow the audit containerid to be unset */
> > + if (!audit_contid_valid(contid))
> > + rc = -EINVAL;
> > + /* if we don't have caps, reject */
> > + else if (!capable(CAP_AUDIT_CONTROL))
> > + rc = -EPERM;
> > + /* if task has children or is not single-threaded, deny */
> > + else if (!list_empty(&task->children))
> > + rc = -EBUSY;
> > + else if (!(thread_group_leader(task) && thread_group_empty(task)))
> > + rc = -EALREADY;
> > + read_unlock(&tasklist_lock);
> > + if (!rc)
> > + task->audit->contid = contid;
> > + task_unlock(task);
> > +
> > + if (!audit_enabled)
> > + return rc;
>
> ...but it is allowed to change it (assuming
> capable(CAP_AUDIT_CONTROL), of course)? Seems like this might be more
> immediately useful since we still live in the world of majority
> privileged containers if we didn't allow changing it, in addition to
> un-setting it.

The idea is that only container orchestrators should be able to
set/modify the audit container ID, and since setting the audit
container ID can have a significant effect on the records captured
(and their routing to multiple daemons when we get there) modifying
the audit container ID is akin to modifying the audit configuration
which is why it is gated by CAP_AUDIT_CONTROL. The current thinking
is that you would only change the audit container ID from one
set/inherited value to another if you were nesting containers, in
which case the nested container orchestrator would need to be granted
CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
compromise). We did consider allowing for a chain of nested audit
container IDs, but the implications of doing so are significant
(implementation mess, runtime cost, etc.) so we are leaving that out
of this effort.

From a practical perspective, un-setting the audit container ID is
pretty much the same as changing it from one set value to another so
most of the above applies to that case as well.

--
paul moore
http://www.paul-moore.com

2019-05-29 15:36:24

by Tycho Andersen

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Wed, May 29, 2019 at 11:29:05AM -0400, Paul Moore wrote:
> On Wed, May 29, 2019 at 10:57 AM Tycho Andersen <[email protected]> wrote:
> >
> > On Mon, Apr 08, 2019 at 11:39:09PM -0400, Richard Guy Briggs wrote:
> > > It is not permitted to unset the audit container identifier.
> > > A child inherits its parent's audit container identifier.
> >
> > ...
> >
> > > /**
> > > + * audit_set_contid - set current task's audit contid
> > > + * @contid: contid value
> > > + *
> > > + * Returns 0 on success, -EPERM on permission failure.
> > > + *
> > > + * Called (set) from fs/proc/base.c::proc_contid_write().
> > > + */
> > > +int audit_set_contid(struct task_struct *task, u64 contid)
> > > +{
> > > + u64 oldcontid;
> > > + int rc = 0;
> > > + struct audit_buffer *ab;
> > > + uid_t uid;
> > > + struct tty_struct *tty;
> > > + char comm[sizeof(current->comm)];
> > > +
> > > + task_lock(task);
> > > + /* Can't set if audit disabled */
> > > + if (!task->audit) {
> > > + task_unlock(task);
> > > + return -ENOPROTOOPT;
> > > + }
> > > + oldcontid = audit_get_contid(task);
> > > + read_lock(&tasklist_lock);
> > > + /* Don't allow the audit containerid to be unset */
> > > + if (!audit_contid_valid(contid))
> > > + rc = -EINVAL;
> > > + /* if we don't have caps, reject */
> > > + else if (!capable(CAP_AUDIT_CONTROL))
> > > + rc = -EPERM;
> > > + /* if task has children or is not single-threaded, deny */
> > > + else if (!list_empty(&task->children))
> > > + rc = -EBUSY;
> > > + else if (!(thread_group_leader(task) && thread_group_empty(task)))
> > > + rc = -EALREADY;
> > > + read_unlock(&tasklist_lock);
> > > + if (!rc)
> > > + task->audit->contid = contid;
> > > + task_unlock(task);
> > > +
> > > + if (!audit_enabled)
> > > + return rc;
> >
> > ...but it is allowed to change it (assuming
> > capable(CAP_AUDIT_CONTROL), of course)? Seems like this might be more
> > immediately useful since we still live in the world of majority
> > privileged containers if we didn't allow changing it, in addition to
> > un-setting it.
>
> The idea is that only container orchestrators should be able to
> set/modify the audit container ID, and since setting the audit
> container ID can have a significant effect on the records captured
> (and their routing to multiple daemons when we get there) modifying
> the audit container ID is akin to modifying the audit configuration
> which is why it is gated by CAP_AUDIT_CONTROL. The current thinking
> is that you would only change the audit container ID from one
> set/inherited value to another if you were nesting containers, in
> which case the nested container orchestrator would need to be granted
> CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> compromise).

But then don't you want some kind of ns_capable() instead (probably
not the obvious one, though...)? With capable(), you can't really nest
using the audit-id and user namespaces together.

Tycho

2019-05-29 16:06:41

by Paul Moore

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Wed, May 29, 2019 at 11:34 AM Tycho Andersen <[email protected]> wrote:
>
> On Wed, May 29, 2019 at 11:29:05AM -0400, Paul Moore wrote:
> > On Wed, May 29, 2019 at 10:57 AM Tycho Andersen <[email protected]> wrote:
> > >
> > > On Mon, Apr 08, 2019 at 11:39:09PM -0400, Richard Guy Briggs wrote:
> > > > It is not permitted to unset the audit container identifier.
> > > > A child inherits its parent's audit container identifier.
> > >
> > > ...
> > >
> > > > /**
> > > > + * audit_set_contid - set current task's audit contid
> > > > + * @contid: contid value
> > > > + *
> > > > + * Returns 0 on success, -EPERM on permission failure.
> > > > + *
> > > > + * Called (set) from fs/proc/base.c::proc_contid_write().
> > > > + */
> > > > +int audit_set_contid(struct task_struct *task, u64 contid)
> > > > +{
> > > > + u64 oldcontid;
> > > > + int rc = 0;
> > > > + struct audit_buffer *ab;
> > > > + uid_t uid;
> > > > + struct tty_struct *tty;
> > > > + char comm[sizeof(current->comm)];
> > > > +
> > > > + task_lock(task);
> > > > + /* Can't set if audit disabled */
> > > > + if (!task->audit) {
> > > > + task_unlock(task);
> > > > + return -ENOPROTOOPT;
> > > > + }
> > > > + oldcontid = audit_get_contid(task);
> > > > + read_lock(&tasklist_lock);
> > > > + /* Don't allow the audit containerid to be unset */
> > > > + if (!audit_contid_valid(contid))
> > > > + rc = -EINVAL;
> > > > + /* if we don't have caps, reject */
> > > > + else if (!capable(CAP_AUDIT_CONTROL))
> > > > + rc = -EPERM;
> > > > + /* if task has children or is not single-threaded, deny */
> > > > + else if (!list_empty(&task->children))
> > > > + rc = -EBUSY;
> > > > + else if (!(thread_group_leader(task) && thread_group_empty(task)))
> > > > + rc = -EALREADY;
> > > > + read_unlock(&tasklist_lock);
> > > > + if (!rc)
> > > > + task->audit->contid = contid;
> > > > + task_unlock(task);
> > > > +
> > > > + if (!audit_enabled)
> > > > + return rc;
> > >
> > > ...but it is allowed to change it (assuming
> > > capable(CAP_AUDIT_CONTROL), of course)? Seems like this might be more
> > > immediately useful since we still live in the world of majority
> > > privileged containers if we didn't allow changing it, in addition to
> > > un-setting it.
> >
> > The idea is that only container orchestrators should be able to
> > set/modify the audit container ID, and since setting the audit
> > container ID can have a significant effect on the records captured
> > (and their routing to multiple daemons when we get there) modifying
> > the audit container ID is akin to modifying the audit configuration
> > which is why it is gated by CAP_AUDIT_CONTROL. The current thinking
> > is that you would only change the audit container ID from one
> > set/inherited value to another if you were nesting containers, in
> > which case the nested container orchestrator would need to be granted
> > CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> > compromise).
>
> But then don't you want some kind of ns_capable() instead (probably
> not the obvious one, though...)? With capable(), you can't really nest
> using the audit-id and user namespaces together.

You want capable() and not ns_capable() because you want to ensure
that the orchestrator has the rights in the init_ns as changes to the
audit container ID could have an auditing impact that spans the entire
system. Setting the audit container ID is equivalent to munging the
kernel's audit configuration, and the audit configuration is not
"namespaced" in any way. The audit container ID work is about
providing the right "container context" (as defined by userspace) with
the audit records so that admins have a better understanding about
what is going on in the system; it is very explicitly not creating an
audit namespace.

At some point in the future we will want to support running multiple
audit daemons, and have a configurable way of routing audit records
based on the audit container ID, which will blur the line regarding
audit namespaces, but even then I would argue we are not creating an
audit namespace.

--
paul moore
http://www.paul-moore.com

2019-05-29 22:17:51

by Paul Moore

Subject: Re: [PATCH ghak90 V6 04/10] audit: log container info of syscalls

On Mon, Apr 8, 2019 at 11:40 PM Richard Guy Briggs <[email protected]> wrote:
>
> Create a new audit record AUDIT_CONTAINER_ID to document the audit
> container identifier of a process if it is present.
>
> Called from audit_log_exit(), syscalls are covered.
>
> A sample raw event:
> type=SYSCALL msg=audit(1519924845.499:257): arch=c000003e syscall=257 success=yes exit=3 a0=ffffff9c a1=56374e1cef30 a2=241 a3=1b6 items=2 ppid=606 pid=635 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="bash" exe="/usr/bin/bash" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="tmpcontainerid"
> type=CWD msg=audit(1519924845.499:257): cwd="/root"
> type=PATH msg=audit(1519924845.499:257): item=0 name="/tmp/" inode=13863 dev=00:27 mode=041777 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype= PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
> type=PATH msg=audit(1519924845.499:257): item=1 name="/tmp/tmpcontainerid" inode=17729 dev=00:27 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
> type=PROCTITLE msg=audit(1519924845.499:257): proctitle=62617368002D6300736C65657020313B206563686F2074657374203E202F746D702F746D70636F6E7461696E65726964
> type=CONTAINER_ID msg=audit(1519924845.499:257): contid=123458
>
> Please see the github audit kernel issue for the main feature:
> https://github.com/linux-audit/audit-kernel/issues/90
> Please see the github audit userspace issue for supporting additions:
> https://github.com/linux-audit/audit-userspace/issues/51
> Please see the github audit testsuite issue for the test case:
> https://github.com/linux-audit/audit-testsuite/issues/64
> Please see the github audit wiki for the feature overview:
> https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> Signed-off-by: Richard Guy Briggs <[email protected]>
> Acked-by: Serge Hallyn <[email protected]>
> Acked-by: Steve Grubb <[email protected]>
> Acked-by: Neil Horman <[email protected]>
> Reviewed-by: Ondrej Mosnacek <[email protected]>
> ---
> include/linux/audit.h | 5 +++++
> include/uapi/linux/audit.h | 1 +
> kernel/audit.c | 20 ++++++++++++++++++++
> kernel/auditsc.c | 20 ++++++++++++++------
> 4 files changed, 40 insertions(+), 6 deletions(-)

...

> diff --git a/kernel/audit.c b/kernel/audit.c
> index 182b0f2c183d..3e0af53f3c4d 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -2127,6 +2127,26 @@ void audit_log_session_info(struct audit_buffer *ab)
> audit_log_format(ab, "auid=%u ses=%u", auid, sessionid);
> }
>
> +/*
> + * audit_log_contid - report container info
> + * @context: task or local context for record
> + * @contid: container ID to report
> + */
> +void audit_log_contid(struct audit_context *context, u64 contid)
> +{
> + struct audit_buffer *ab;
> +
> + if (!audit_contid_valid(contid))
> + return;
> + /* Generate AUDIT_CONTAINER_ID record with container ID */
> + ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER_ID);
> + if (!ab)
> + return;
> + audit_log_format(ab, "contid=%llu", (unsigned long long)contid);

We have a consistency problem regarding how to output the u64 contid
values; this function uses an explicit cast, others do not. According
to Documentation/core-api/printk-formats.rst the recommendation for
u64 is %llu (or %llx, if you want hex). Looking quickly through the
printk code this appears to still be correct. I suggest we get rid of
the cast (like it was in v5).

> + audit_log_end(ab);
> +}
> +EXPORT_SYMBOL(audit_log_contid);
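
For illustration, a small userspace sketch of the cast-free formatting suggested above. It assumes, as the kernel does to my understanding, that u64 is a typedef of unsigned long long, which is why %llu needs no cast; the typedef here and the helper name format_contid() are mine, not part of the patch. (In portable userspace code with uint64_t you would normally reach for PRIu64 instead.)

```c
#include <assert.h>
#include <stdio.h>

/* Assumption for this sketch: same underlying type the kernel uses for u64. */
typedef unsigned long long u64;

/*
 * Hypothetical helper: formats the record payload the same way the
 * quoted audit_log_format() call would, but without the explicit cast.
 * Returns the number of characters snprintf() would have written.
 */
int format_contid(char *buf, size_t len, u64 contid)
{
	assert(buf != NULL);
	return snprintf(buf, len, "contid=%llu", contid);
}
```

Called with the value from the sample record, format_contid(buf, sizeof(buf), 123458) fills buf with "contid=123458".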

--
paul moore
http://www.paul-moore.com

2019-05-29 22:18:48

by Paul Moore

Subject: Re: [PATCH ghak90 V6 09/10] audit: add support for containerid to network namespaces

On Mon, Apr 8, 2019 at 11:41 PM Richard Guy Briggs <[email protected]> wrote:
>
> Audit events could happen in a network namespace outside of a task
> context due to packets received from the net that trigger an auditing
> rule prior to being associated with a running task. The network
> namespace could be in use by multiple containers by association to the
> tasks in that network namespace. We still want a way to attribute
> these events to any potential containers. Keep a list per network
> namespace to track these audit container identifiers.
>
> Add/increment the audit container identifier on:
> - initial setting of the audit container identifier via /proc
> - clone/fork call that inherits an audit container identifier
> - unshare call that inherits an audit container identifier
> - setns call that inherits an audit container identifier
> Delete/decrement the audit container identifier on:
> - an inherited audit container identifier dropped when child set
> - process exit
> - unshare call that drops a net namespace
> - setns call that drops a net namespace
>
> Please see the github audit kernel issue for contid net support:
> https://github.com/linux-audit/audit-kernel/issues/92
> Please see the github audit testsuite issue for the test case:
> https://github.com/linux-audit/audit-testsuite/issues/64
> Please see the github audit wiki for the feature overview:
> https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> Signed-off-by: Richard Guy Briggs <[email protected]>
> Acked-by: Neil Horman <[email protected]>
> Reviewed-by: Ondrej Mosnacek <[email protected]>
> ---
> include/linux/audit.h | 19 +++++++++++
> kernel/audit.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++--
> kernel/nsproxy.c | 4 +++
> 3 files changed, 108 insertions(+), 3 deletions(-)

...

> diff --git a/kernel/audit.c b/kernel/audit.c
> index 6c742da66b32..996213591617 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -376,6 +384,75 @@ static struct sock *audit_get_sk(const struct net *net)
> return aunet->sk;
> }
>
> +void audit_netns_contid_add(struct net *net, u64 contid)
> +{
> + struct audit_net *aunet;
> + struct list_head *contid_list;
> + struct audit_contid *cont;
> +
> + if (!net)
> + return;
> + if (!audit_contid_valid(contid))
> + return;
> + aunet = net_generic(net, audit_net_id);
> + if (!aunet)
> + return;
> + contid_list = &aunet->contid_list;
> + spin_lock(&aunet->contid_list_lock);
> + list_for_each_entry_rcu(cont, contid_list, list)
> + if (cont->id == contid) {
> + refcount_inc(&cont->refcount);
> + goto out;
> + }
> + cont = kmalloc(sizeof(struct audit_contid), GFP_ATOMIC);
> + if (cont) {
> + INIT_LIST_HEAD(&cont->list);

I thought you were going to get rid of this INIT_LIST_HEAD() call?

> + cont->id = contid;
> + refcount_set(&cont->refcount, 1);
> + list_add_rcu(&cont->list, contid_list);
> + }
> +out:
> + spin_unlock(&aunet->contid_list_lock);
> +}
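
As a hedged illustration of why the INIT_LIST_HEAD() call looks redundant: inserting a node at the head of a doubly linked list overwrites both of the node's pointers, so whatever they held beforehand does not matter. The sketch below is a simplified userspace list, not the kernel's list.h; struct list_node, list_init() and list_add_front() are hypothetical names, and the RCU publication done by list_add_rcu() is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified circular doubly linked list node (userspace model). */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

/* Equivalent in spirit to initialising a list head: points at itself. */
void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Insert entry right after head. Both entry->next and entry->prev are
 * assigned here, so a prior list_init(entry) would have no lasting effect.
 */
void list_add_front(struct list_node *entry, struct list_node *head)
{
	assert(entry != NULL && head != NULL);

	struct list_node *first = head->next;

	first->prev = entry;
	entry->next = first;
	entry->prev = head;
	head->next = entry;
}
```

A node whose pointers start out as NULL ends up correctly linked after list_add_front(), which is the behaviour the review comment relies on.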

--
paul moore
http://www.paul-moore.com

2019-05-29 22:19:00

by Paul Moore

Subject: Re: [PATCH ghak90 V6 08/10] audit: add containerid filtering

On Mon, Apr 8, 2019 at 11:41 PM Richard Guy Briggs <[email protected]> wrote:
>
> Implement audit container identifier filtering using the AUDIT_CONTID
> field name to send an 8-character string representing a u64 since the
> value field is only u32.
>
> Sending it as two u32 was considered, but gathering and comparing two
> fields was more complex.
>
> The feature indicator is AUDIT_FEATURE_BITMAP_CONTAINERID.
>
> Please see the github audit kernel issue for the contid filter feature:
> https://github.com/linux-audit/audit-kernel/issues/91
> Please see the github audit userspace issue for filter additions:
> https://github.com/linux-audit/audit-userspace/issues/40
> Please see the github audit testsuite issue for the test case:
> https://github.com/linux-audit/audit-testsuite/issues/64
> Please see the github audit wiki for the feature overview:
> https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> Signed-off-by: Richard Guy Briggs <[email protected]>
> Acked-by: Serge Hallyn <[email protected]>
> Acked-by: Neil Horman <[email protected]>
> Reviewed-by: Ondrej Mosnacek <[email protected]>
> ---
> include/linux/audit.h | 1 +
> include/uapi/linux/audit.h | 5 ++++-
> kernel/audit.h | 1 +
> kernel/auditfilter.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
> kernel/auditsc.c | 4 ++++
> 5 files changed, 57 insertions(+), 1 deletion(-)

...

> diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> index 63f8b3f26fab..407b5bb3b4c6 100644
> --- a/kernel/auditfilter.c
> +++ b/kernel/auditfilter.c
> @@ -1206,6 +1224,31 @@ int audit_comparator(u32 left, u32 op, u32 right)
> }
> }
>
> +int audit_comparator64(u64 left, u32 op, u64 right)
> +{
> + switch (op) {
> + case Audit_equal:
> + return (left == right);
> + case Audit_not_equal:
> + return (left != right);
> + case Audit_lt:
> + return (left < right);
> + case Audit_le:
> + return (left <= right);
> + case Audit_gt:
> + return (left > right);
> + case Audit_ge:
> + return (left >= right);
> + case Audit_bitmask:
> + return (left & right);
> + case Audit_bittest:
> + return ((left & right) == right);
> + default:
> + BUG();

A little birdy mentioned the BUG() here as a potential issue and while
I had ignored it in earlier patches because this is likely a
cut-n-paste from another audit comparator function, I took a closer
look this time. It appears as though we will never have an invalid op
value as audit_data_to_entry()/audit_to_op() ensure that the op value
is a known good value. Removing the BUG() from all the audit
comparators is a separate issue, but I think it would be good to
remove it from this newly added comparator; keeping the default case
so that it simply returns "0" seems reasonable.

> + return 0;
> + }
> +}
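
As a rough illustration of the suggestion above, here is a minimal
userspace sketch of a 64-bit comparator whose default case returns 0
instead of calling BUG(). The enum and function names
(audit_op_sketch, comparator64_sketch) are hypothetical stand-ins and
not the kernel's identifiers. The bitmask case uses an explicit != 0
test as an assumption of this sketch, since returning a u64 value
through an int return type can drop the upper 32 bits.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

typedef uint64_t u64;
typedef uint32_t u32;

static_assert(sizeof(u64) == 8, "u64 is expected to be 64 bits wide");

/* Hypothetical stand-ins for the kernel's Audit_* operator values. */
enum audit_op_sketch {
	Op_equal,
	Op_not_equal,
	Op_bitmask,
	Op_bittest,
	Op_lt,
	Op_gt,
	Op_le,
	Op_ge,
	Op_bad		/* deliberately not handled below */
};

/* Compare two 64-bit values; unknown operators simply do not match. */
int comparator64_sketch(u64 left, u32 op, u64 right)
{
	switch (op) {
	case Op_equal:
		return left == right;
	case Op_not_equal:
		return left != right;
	case Op_lt:
		return left < right;
	case Op_le:
		return left <= right;
	case Op_gt:
		return left > right;
	case Op_ge:
		return left >= right;
	case Op_bitmask:
		/* Explicit test so that high-only bits still yield 1. */
		return (left & right) != 0;
	case Op_bittest:
		return (left & right) == right;
	default:
		/* No BUG(): an unexpected op is treated as "no match". */
		return 0;
	}
}
```

With this shape a caller that somehow passes an unvalidated op gets a
non-matching result rather than a crash, which is only safe as long as
the op value is validated earlier, as described above.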

--
paul moore
http://www.paul-moore.com

2019-05-29 22:27:59

by Paul Moore

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On Mon, Apr 22, 2019 at 9:49 AM Paul Moore <[email protected]> wrote:
> On Mon, Apr 22, 2019 at 7:38 AM Neil Horman <[email protected]> wrote:
> > On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
> > > Implement kernel audit container identifier.
> >
> > I'm sorry, I've lost track of this, where have we landed on it? Are we good for
> > inclusion?
>
> I haven't finished going through this latest revision, but unless
> Richard made any significant changes outside of the feedback from the
> v5 patchset I'm guessing we are "close".
>
> Based on discussions Richard and I had some time ago, I have always
> envisioned the plan as being get the kernel patchset, tests, docs
> ready (which Richard has been doing) and then run the actual
> implemented API by the userland container folks, e.g. cri-o/lxc/etc.,
> to make sure the actual implementation is sane from their perspective.
> They've already seen the design, so I'm not expecting any real
> surprises here, but sometimes opinions change when they have actual
> code in front of them to play with and review.
>
> Beyond that, while the cri-o/lxc/etc. folks are looking it over,
> whatever additional testing we can do would be a big win. I'm
> thinking I'll pull it into a separate branch in the audit tree
> (audit/working-container ?) and include that in my secnext kernels
> that I build/test on a regular basis; this is also a handy way to keep
> it based against the current audit/next branch. If any changes are
> needed Richard can either choose to base those changes on audit/next or
> the separate audit container ID branch; that's up to him. I've done
> this with other big changes in other trees, e.g. SELinux, and it has
> worked well to get some extra testing in and keep the patchset "merge
> ready" while others outside the subsystem look things over.

I just sent my feedback on the v6 patchset, and it's small: basically
three patches with "one-liner" changes needed.

Richard, it's your call on how you want to proceed from here. You can
post a v7 incorporating the feedback, or since the tweaks are so
minor, you can post fixup patches; the former being more
comprehensive, the latter being quicker to review and digest.
Regardless of that, while we are waiting on a prototype from the
container folks, I think it would be good to pull this into a working
branch in the audit repo (as mentioned above), unless you would prefer
to keep it as a patchset on the mailing list? If you want to go with
the working branch approach, I'll keep the branch fresh and (re)based
against audit/next and if we notice any problems you can just submit
fixes against that branch (depending on the issue they can be fixup
patches, or proper patches). My hope is that this will enable the
process to move quicker as we get near the finish line.

--
paul moore
http://www.paul-moore.com

2019-05-29 22:31:19

by Tycho Andersen

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Wed, May 29, 2019 at 12:03:58PM -0400, Paul Moore wrote:
> On Wed, May 29, 2019 at 11:34 AM Tycho Andersen <[email protected]> wrote:
> >
> > On Wed, May 29, 2019 at 11:29:05AM -0400, Paul Moore wrote:
> > > On Wed, May 29, 2019 at 10:57 AM Tycho Andersen <[email protected]> wrote:
> > > >
> > > > On Mon, Apr 08, 2019 at 11:39:09PM -0400, Richard Guy Briggs wrote:
> > > > > It is not permitted to unset the audit container identifier.
> > > > > A child inherits its parent's audit container identifier.
> > > >
> > > > ...
> > > >
> > > > > /**
> > > > > + * audit_set_contid - set current task's audit contid
> > > > > + * @contid: contid value
> > > > > + *
> > > > > + * Returns 0 on success, -EPERM on permission failure.
> > > > > + *
> > > > > + * Called (set) from fs/proc/base.c::proc_contid_write().
> > > > > + */
> > > > > +int audit_set_contid(struct task_struct *task, u64 contid)
> > > > > +{
> > > > > + u64 oldcontid;
> > > > > + int rc = 0;
> > > > > + struct audit_buffer *ab;
> > > > > + uid_t uid;
> > > > > + struct tty_struct *tty;
> > > > > + char comm[sizeof(current->comm)];
> > > > > +
> > > > > + task_lock(task);
> > > > > + /* Can't set if audit disabled */
> > > > > + if (!task->audit) {
> > > > > + task_unlock(task);
> > > > > + return -ENOPROTOOPT;
> > > > > + }
> > > > > + oldcontid = audit_get_contid(task);
> > > > > + read_lock(&tasklist_lock);
> > > > > + /* Don't allow the audit containerid to be unset */
> > > > > + if (!audit_contid_valid(contid))
> > > > > + rc = -EINVAL;
> > > > > + /* if we don't have caps, reject */
> > > > > + else if (!capable(CAP_AUDIT_CONTROL))
> > > > > + rc = -EPERM;
> > > > > + /* if task has children or is not single-threaded, deny */
> > > > > + else if (!list_empty(&task->children))
> > > > > + rc = -EBUSY;
> > > > > + else if (!(thread_group_leader(task) && thread_group_empty(task)))
> > > > > + rc = -EALREADY;
> > > > > + read_unlock(&tasklist_lock);
> > > > > + if (!rc)
> > > > > + task->audit->contid = contid;
> > > > > + task_unlock(task);
> > > > > +
> > > > > + if (!audit_enabled)
> > > > > + return rc;
> > > >
> > > > ...but it is allowed to change it (assuming
> > > > capable(CAP_AUDIT_CONTROL), of course)? Seems like this might be more
> > > > immediately useful since we still live in the world of majority
> > > > privileged containers if we didn't allow changing it, in addition to
> > > > un-setting it.
> > >
> > > The idea is that only container orchestrators should be able to
> > > set/modify the audit container ID, and since setting the audit
> > > container ID can have a significant effect on the records captured
> > > (and their routing to multiple daemons when we get there) modifying
> > > the audit container ID is akin to modifying the audit configuration
> > > which is why it is gated by CAP_AUDIT_CONTROL. The current thinking
> > > is that you would only change the audit container ID from one
> > > set/inherited value to another if you were nesting containers, in
> > > which case the nested container orchestrator would need to be granted
> > > CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> > > compromise).
> >
> > But then don't you want some kind of ns_capable() instead (probably
> > not the obvious one, though...)? With capable(), you can't really nest
> > using the audit-id and user namespaces together.
>
> You want capable() and not ns_capable() because you want to ensure
> that the orchestrator has the rights in the init_ns as changes to the
> audit container ID could have an auditing impact that spans the entire
> system.

Ok but,

> > > The current thinking
> > > is that you would only change the audit container ID from one
> > > set/inherited value to another if you were nesting containers, in
> > > which case the nested container orchestrator would need to be granted
> > > CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> > > compromise).

won't work in user namespaced containers, because they will never be
capable(CAP_AUDIT_CONTROL); so I don't think this will work for
nesting as is. But maybe nobody cares :)
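
To make the point concrete, here is a toy userspace model of the
difference between capable() and ns_capable(). It is a simplified
sketch and not the kernel's implementation: the struct and function
names are hypothetical, only one capability is modelled, and the walk
is an approximation of how a capability check resolves against a
target user namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified user namespace. */
struct user_ns_sketch {
	const struct user_ns_sketch *parent;	/* NULL for the initial ns */
	int level;				/* 0 for the initial ns */
	unsigned int owner_euid;		/* euid that created this ns */
};

/* Hypothetical credentials carrying a single capability bit. */
struct cred_sketch {
	const struct user_ns_sketch *user_ns;
	unsigned int euid;
	bool cap_audit_control;	/* effective within cred->user_ns */
};

/* Does cred hold the capability with respect to the target namespace? */
bool ns_capable_sketch(const struct cred_sketch *cred,
		       const struct user_ns_sketch *target)
{
	const struct user_ns_sketch *ns = target;

	assert(cred && cred->user_ns && target);
	for (;;) {
		/* Same namespace: the effective bit decides. */
		if (ns == cred->user_ns)
			return cred->cap_audit_control;
		/* Target is not below the caller's namespace: deny. */
		if (ns->level <= cred->user_ns->level)
			return false;
		/* The owner of a child namespace has all caps in it. */
		if (ns->parent == cred->user_ns &&
		    ns->owner_euid == cred->euid)
			return true;
		ns = ns->parent;
	}
}

/* capable() modelled as a check against the initial namespace. */
bool capable_sketch(const struct cred_sketch *cred,
		    const struct user_ns_sketch *init_ns)
{
	return ns_capable_sketch(cred, init_ns);
}
```

In this model a task whose credentials live in a child user namespace
fails capable_sketch() even with the bit set, because the initial
namespace is never at or below its own, while the same task passes
ns_capable_sketch() against its own namespace.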

Tycho

2019-05-29 22:41:22

by Paul Moore

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Wed, May 29, 2019 at 6:28 PM Tycho Andersen <[email protected]> wrote:
> On Wed, May 29, 2019 at 12:03:58PM -0400, Paul Moore wrote:
> > On Wed, May 29, 2019 at 11:34 AM Tycho Andersen <[email protected]> wrote:
> > >
> > > On Wed, May 29, 2019 at 11:29:05AM -0400, Paul Moore wrote:
> > > > On Wed, May 29, 2019 at 10:57 AM Tycho Andersen <[email protected]> wrote:
> > > > >
> > > > > On Mon, Apr 08, 2019 at 11:39:09PM -0400, Richard Guy Briggs wrote:
> > > > > > It is not permitted to unset the audit container identifier.
> > > > > > A child inherits its parent's audit container identifier.
> > > > >
> > > > > ...
> > > > >
> > > > > > /**
> > > > > > + * audit_set_contid - set current task's audit contid
> > > > > > + * @contid: contid value
> > > > > > + *
> > > > > > + * Returns 0 on success, -EPERM on permission failure.
> > > > > > + *
> > > > > > + * Called (set) from fs/proc/base.c::proc_contid_write().
> > > > > > + */
> > > > > > +int audit_set_contid(struct task_struct *task, u64 contid)
> > > > > > +{
> > > > > > + u64 oldcontid;
> > > > > > + int rc = 0;
> > > > > > + struct audit_buffer *ab;
> > > > > > + uid_t uid;
> > > > > > + struct tty_struct *tty;
> > > > > > + char comm[sizeof(current->comm)];
> > > > > > +
> > > > > > + task_lock(task);
> > > > > > + /* Can't set if audit disabled */
> > > > > > + if (!task->audit) {
> > > > > > + task_unlock(task);
> > > > > > + return -ENOPROTOOPT;
> > > > > > + }
> > > > > > + oldcontid = audit_get_contid(task);
> > > > > > + read_lock(&tasklist_lock);
> > > > > > + /* Don't allow the audit containerid to be unset */
> > > > > > + if (!audit_contid_valid(contid))
> > > > > > + rc = -EINVAL;
> > > > > > + /* if we don't have caps, reject */
> > > > > > + else if (!capable(CAP_AUDIT_CONTROL))
> > > > > > + rc = -EPERM;
> > > > > > + /* if task has children or is not single-threaded, deny */
> > > > > > + else if (!list_empty(&task->children))
> > > > > > + rc = -EBUSY;
> > > > > > + else if (!(thread_group_leader(task) && thread_group_empty(task)))
> > > > > > + rc = -EALREADY;
> > > > > > + read_unlock(&tasklist_lock);
> > > > > > + if (!rc)
> > > > > > + task->audit->contid = contid;
> > > > > > + task_unlock(task);
> > > > > > +
> > > > > > + if (!audit_enabled)
> > > > > > + return rc;
> > > > >
> > > > > ...but it is allowed to change it (assuming
> > > > > capable(CAP_AUDIT_CONTROL), of course)? Seems like this might be more
> > > > > immediately useful since we still live in the world of majority
> > > > > privileged containers if we didn't allow changing it, in addition to
> > > > > un-setting it.
> > > >
> > > > The idea is that only container orchestrators should be able to
> > > > set/modify the audit container ID, and since setting the audit
> > > > container ID can have a significant effect on the records captured
> > > > (and their routing to multiple daemons when we get there) modifying
> > > > the audit container ID is akin to modifying the audit configuration
> > > > which is why it is gated by CAP_AUDIT_CONTROL. The current thinking
> > > > is that you would only change the audit container ID from one
> > > > set/inherited value to another if you were nesting containers, in
> > > > which case the nested container orchestrator would need to be granted
> > > > CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> > > > compromise).
> > >
> > > But then don't you want some kind of ns_capable() instead (probably
> > > not the obvious one, though...)? With capable(), you can't really nest
> > > using the audit-id and user namespaces together.
> >
> > You want capable() and not ns_capable() because you want to ensure
> > that the orchestrator has the rights in the init_ns as changes to the
> > audit container ID could have an auditing impact that spans the entire
> > system.
>
> Ok but,
>
> > > > The current thinking
> > > > is that you would only change the audit container ID from one
> > > > set/inherited value to another if you were nesting containers, in
> > > > which case the nested container orchestrator would need to be granted
> > > > CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> > > > compromise).
>
> won't work in user namespaced containers, because they will never be
> capable(CAP_AUDIT_CONTROL); so I don't think this will work for
> nesting as is. But maybe nobody cares :)

That's fun :)

To be honest, I've never been a big fan of supporting nested
containers from an audit perspective, so I'm not really too upset
about this. The k8s/cri-o folks seem okay with this, or at least I
haven't heard any objections; lxc folks, what do you have to say?

--
paul moore
http://www.paul-moore.com

2019-05-30 13:09:50

by Ondrej Mosnacek

Subject: Re: [PATCH ghak90 V6 04/10] audit: log container info of syscalls

On Thu, May 30, 2019 at 12:16 AM Paul Moore <[email protected]> wrote:
> On Mon, Apr 8, 2019 at 11:40 PM Richard Guy Briggs <[email protected]> wrote:
> >
> > Create a new audit record AUDIT_CONTAINER_ID to document the audit
> > container identifier of a process if it is present.
> >
> > Called from audit_log_exit(), syscalls are covered.
> >
> > A sample raw event:
> > type=SYSCALL msg=audit(1519924845.499:257): arch=c000003e syscall=257 success=yes exit=3 a0=ffffff9c a1=56374e1cef30 a2=241 a3=1b6 items=2 ppid=606 pid=635 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="bash" exe="/usr/bin/bash" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="tmpcontainerid"
> > type=CWD msg=audit(1519924845.499:257): cwd="/root"
> > type=PATH msg=audit(1519924845.499:257): item=0 name="/tmp/" inode=13863 dev=00:27 mode=041777 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype= PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
> > type=PATH msg=audit(1519924845.499:257): item=1 name="/tmp/tmpcontainerid" inode=17729 dev=00:27 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
> > type=PROCTITLE msg=audit(1519924845.499:257): proctitle=62617368002D6300736C65657020313B206563686F2074657374203E202F746D702F746D70636F6E7461696E65726964
> > type=CONTAINER_ID msg=audit(1519924845.499:257): contid=123458
> >
> > Please see the github audit kernel issue for the main feature:
> > https://github.com/linux-audit/audit-kernel/issues/90
> > Please see the github audit userspace issue for supporting additions:
> > https://github.com/linux-audit/audit-userspace/issues/51
> > > Please see the github audit testsuite issue for the test case:
> > https://github.com/linux-audit/audit-testsuite/issues/64
> > Please see the github audit wiki for the feature overview:
> > https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> > Signed-off-by: Richard Guy Briggs <[email protected]>
> > Acked-by: Serge Hallyn <[email protected]>
> > Acked-by: Steve Grubb <[email protected]>
> > Acked-by: Neil Horman <[email protected]>
> > Reviewed-by: Ondrej Mosnacek <[email protected]>
> > ---
> > include/linux/audit.h | 5 +++++
> > include/uapi/linux/audit.h | 1 +
> > kernel/audit.c | 20 ++++++++++++++++++++
> > kernel/auditsc.c | 20 ++++++++++++++------
> > 4 files changed, 40 insertions(+), 6 deletions(-)
>
> ...
>
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 182b0f2c183d..3e0af53f3c4d 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -2127,6 +2127,26 @@ void audit_log_session_info(struct audit_buffer *ab)
> > audit_log_format(ab, "auid=%u ses=%u", auid, sessionid);
> > }
> >
> > +/*
> > + * audit_log_contid - report container info
> > + * @context: task or local context for record
> > + * @contid: container ID to report
> > + */
> > +void audit_log_contid(struct audit_context *context, u64 contid)
> > +{
> > + struct audit_buffer *ab;
> > +
> > + if (!audit_contid_valid(contid))
> > + return;
> > + /* Generate AUDIT_CONTAINER_ID record with container ID */
> > + ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER_ID);
> > + if (!ab)
> > + return;
> > + audit_log_format(ab, "contid=%llu", (unsigned long long)contid);
>
> We have a consistency problem regarding how to output the u64 contid
> values; this function uses an explicit cast, others do not. According
> to Documentation/core-api/printk-formats.rst the recommendation for
> u64 is %llu (or %llx, if you want hex). Looking quickly through the
> printk code this appears to still be correct. I suggest we get rid of
> the cast (like it was in v5).

IIRC it was me who suggested to add the casts. I didn't realize that
the kernel actually guarantees that "%llu" will always work with u64.
Taking that into account I rescind my request to add the cast. Sorry
for the false alarm.
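
For reference, a minimal userspace sketch of the cast-free form. It
assumes, as the kernel does, that u64 is defined as unsigned long
long; the typedef and the helper name format_contid_sketch are
illustrative only. In ordinary userspace code uint64_t may be
"unsigned long", in which case PRIu64 or a cast would still be needed.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdio.h>

/* Assumption: mirror the kernel's definition instead of uint64_t. */
typedef unsigned long long u64;

static_assert(sizeof(u64) == 8, "u64 is expected to be 64 bits wide");

/* Hypothetical helper: format a container ID as "contid=<decimal>". */
int format_contid_sketch(char *buf, size_t len, u64 contid)
{
	/* No cast needed: the argument type already matches %llu. */
	return snprintf(buf, len, "contid=%llu", contid);
}
```

The return value is the length snprintf() would have written, so a
caller can detect truncation by comparing it against the buffer size.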

>
> > + audit_log_end(ab);
> > +}
> > +EXPORT_SYMBOL(audit_log_contid);
>
> --
> paul moore
> http://www.paul-moore.com

--
Ondrej Mosnacek <omosnace at redhat dot com>
Software Engineer, Security Technologies
Red Hat, Inc.

2019-05-30 13:12:37

by Steve Grubb

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On Wednesday, May 29, 2019 6:26:12 PM EDT Paul Moore wrote:
> On Mon, Apr 22, 2019 at 9:49 AM Paul Moore <[email protected]> wrote:
> > On Mon, Apr 22, 2019 at 7:38 AM Neil Horman <[email protected]> wrote:
> > > On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
> > > > Implement kernel audit container identifier.
> > >
> > > I'm sorry, I've lost track of this, where have we landed on it? Are we
> > > good for inclusion?
> >
> > I haven't finished going through this latest revision, but unless
> > Richard made any significant changes outside of the feedback from the
> > v5 patchset I'm guessing we are "close".
> >
> > Based on discussions Richard and I had some time ago, I have always
> > envisioned the plan as being get the kernel patchset, tests, docs
> > ready (which Richard has been doing) and then run the actual
> > implemented API by the userland container folks, e.g. cri-o/lxc/etc.,
> > to make sure the actual implementation is sane from their perspective.
> > They've already seen the design, so I'm not expecting any real
> > surprises here, but sometimes opinions change when they have actual
> > code in front of them to play with and review.
> >
> > Beyond that, while the cri-o/lxc/etc. folks are looking it over,
> > whatever additional testing we can do would be a big win. I'm
> > thinking I'll pull it into a separate branch in the audit tree
> > (audit/working-container ?) and include that in my secnext kernels
> > that I build/test on a regular basis; this is also a handy way to keep
> > it based against the current audit/next branch. If any changes are
> > needed Richard can either choose to base those changes on audit/next or
> > the separate audit container ID branch; that's up to him. I've done
> > this with other big changes in other trees, e.g. SELinux, and it has
> > worked well to get some extra testing in and keep the patchset "merge
> > ready" while others outside the subsystem look things over.
>
> I just sent my feedback on the v6 patchset, and it's small: basically
> three patches with "one-liner" changes needed.
>
> Richard, it's your call on how you want to proceed from here. You can
> post a v7 incorporating the feedback, or since the tweaks are so
> minor, you can post fixup patches; the former being more
> comprehensive, the latter being quicker to review and digest.
> Regardless of that, while we are waiting on a prototype from the
> container folks, I think it would be good to pull this into a working
> branch in the audit repo (as mentioned above), unless you would prefer
> to keep it as a patchset on the mailing list?

Personally, I'd like to see this on a branch so that it's easier to build a
kernel locally for testing.

-Steve


> If you want to go with
> the working branch approach, I'll keep the branch fresh and (re)based
> against audit/next and if we notice any problems you can just submit
> fixes against that branch (depending on the issue they can be fixup
> patches, or proper patches). My hope is that this will enable the
> process to move quicker as we get near the finish line.




2019-05-30 13:37:31

by Paul Moore

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On Thu, May 30, 2019 at 9:08 AM Steve Grubb <[email protected]> wrote:
> On Wednesday, May 29, 2019 6:26:12 PM EDT Paul Moore wrote:
> > On Mon, Apr 22, 2019 at 9:49 AM Paul Moore <[email protected]> wrote:
> > > On Mon, Apr 22, 2019 at 7:38 AM Neil Horman <[email protected]> wrote:
> > > > On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
> > > > > Implement kernel audit container identifier.
> > > >
> > > > I'm sorry, I've lost track of this, where have we landed on it? Are we
> > > > good for inclusion?
> > >
> > > I haven't finished going through this latest revision, but unless
> > > Richard made any significant changes outside of the feedback from the
> > > v5 patchset I'm guessing we are "close".
> > >
> > > Based on discussions Richard and I had some time ago, I have always
> > > envisioned the plan as being get the kernel patchset, tests, docs
> > > ready (which Richard has been doing) and then run the actual
> > > implemented API by the userland container folks, e.g. cri-o/lxc/etc.,
> > > to make sure the actual implementation is sane from their perspective.
> > > They've already seen the design, so I'm not expecting any real
> > > surprises here, but sometimes opinions change when they have actual
> > > code in front of them to play with and review.
> > >
> > > Beyond that, while the cri-o/lxc/etc. folks are looking it over,
> > > whatever additional testing we can do would be a big win. I'm
> > > thinking I'll pull it into a separate branch in the audit tree
> > > (audit/working-container ?) and include that in my secnext kernels
> > > that I build/test on a regular basis; this is also a handy way to keep
> > > it based against the current audit/next branch. If any changes are
> > > > needed Richard can either choose to base those changes on audit/next or
> > > the separate audit container ID branch; that's up to him. I've done
> > > this with other big changes in other trees, e.g. SELinux, and it has
> > > worked well to get some extra testing in and keep the patchset "merge
> > > ready" while others outside the subsystem look things over.
> >
> > I just sent my feedback on the v6 patchset, and it's small: basically
> > three patches with "one-liner" changes needed.
> >
> > Richard, it's your call on how you want to proceed from here. You can
> > post a v7 incorporating the feedback, or since the tweaks are so
> > minor, you can post fixup patches; the former being more
> > > comprehensive, the latter being quicker to review and digest.
> > Regardless of that, while we are waiting on a prototype from the
> > container folks, I think it would be good to pull this into a working
> > branch in the audit repo (as mentioned above), unless you would prefer
> > to keep it as a patchset on the mailing list?
>
> Personally, I'd like to see this on a branch so that it's easier to build a
> kernel locally for testing.

FWIW, if Richard does prefer for me to pull it into a working branch I
plan to include it in my secnext builds both to make it easier to test
regularly and to make the changes available to people who don't want
to build their own kernel.

* http://www.paul-moore.com/blog/d/2019/04/kernel_secnext_repo.html

--
paul moore
http://www.paul-moore.com

2019-05-30 14:10:15

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 00/10] audit: implement container identifier

On 2019-05-30 09:35, Paul Moore wrote:
> On Thu, May 30, 2019 at 9:08 AM Steve Grubb <[email protected]> wrote:
> > On Wednesday, May 29, 2019 6:26:12 PM EDT Paul Moore wrote:
> > > On Mon, Apr 22, 2019 at 9:49 AM Paul Moore <[email protected]> wrote:
> > > > On Mon, Apr 22, 2019 at 7:38 AM Neil Horman <[email protected]> wrote:
> > > > > On Mon, Apr 08, 2019 at 11:39:07PM -0400, Richard Guy Briggs wrote:
> > > > > > Implement kernel audit container identifier.
> > > > >
> > > > > I'm sorry, I've lost track of this, where have we landed on it? Are we
> > > > > good for inclusion?
> > > >
> > > > I haven't finished going through this latest revision, but unless
> > > > Richard made any significant changes outside of the feedback from the
> > > > v5 patchset I'm guessing we are "close".
> > > >
> > > > Based on discussions Richard and I had some time ago, I have always
> > > > envisioned the plan as being get the kernel patchset, tests, docs
> > > > ready (which Richard has been doing) and then run the actual
> > > > implemented API by the userland container folks, e.g. cri-o/lxc/etc.,
> > > > to make sure the actual implementation is sane from their perspective.
> > > > They've already seen the design, so I'm not expecting any real
> > > > surprises here, but sometimes opinions change when they have actual
> > > > code in front of them to play with and review.
> > > >
> > > > Beyond that, while the cri-o/lxc/etc. folks are looking it over,
> > > > whatever additional testing we can do would be a big win. I'm
> > > > thinking I'll pull it into a separate branch in the audit tree
> > > > (audit/working-container ?) and include that in my secnext kernels
> > > > that I build/test on a regular basis; this is also a handy way to keep
> > > > it based against the current audit/next branch. If any changes are
> > > > > needed Richard can either choose to base those changes on audit/next or
> > > > the separate audit container ID branch; that's up to him. I've done
> > > > this with other big changes in other trees, e.g. SELinux, and it has
> > > > worked well to get some extra testing in and keep the patchset "merge
> > > > ready" while others outside the subsystem look things over.
> > >
> > > I just sent my feedback on the v6 patchset, and it's small: basically
> > > three patches with "one-liner" changes needed.
> > >
> > > Richard, it's your call on how you want to proceed from here. You can
> > > post a v7 incorporating the feedback, or since the tweaks are so
> > > minor, you can post fixup patches; the former being more
> > > > comprehensive, the latter being quicker to review and digest.
> > > Regardless of that, while we are waiting on a prototype from the
> > > container folks, I think it would be good to pull this into a working
> > > branch in the audit repo (as mentioned above), unless you would prefer
> > > to keep it as a patchset on the mailing list?
> >
> > Personally, I'd like to see this on a branch so that it's easier to build a
> > kernel locally for testing.
>
> FWIW, if Richard does prefer for me to pull it into a working branch I
> plan to include it in my secnext builds both to make it easier to test
> regularly and to make the changes available to people who don't want
> to build their own kernel.

Sure, let's do a working branch. I'll answer the issues in respective
threads...

> * http://www.paul-moore.com/blog/d/2019/04/kernel_secnext_repo.html
>
> --
> paul moore
> http://www.paul-moore.com
>
> --
> Linux-audit mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/linux-audit

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-05-30 14:11:36

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 04/10] audit: log container info of syscalls

On 2019-05-30 15:08, Ondrej Mosnacek wrote:
> On Thu, May 30, 2019 at 12:16 AM Paul Moore <[email protected]> wrote:
> > On Mon, Apr 8, 2019 at 11:40 PM Richard Guy Briggs <[email protected]> wrote:
> > >
> > > Create a new audit record AUDIT_CONTAINER_ID to document the audit
> > > container identifier of a process if it is present.
> > >
> > > Called from audit_log_exit(), syscalls are covered.
> > >
> > > A sample raw event:
> > > type=SYSCALL msg=audit(1519924845.499:257): arch=c000003e syscall=257 success=yes exit=3 a0=ffffff9c a1=56374e1cef30 a2=241 a3=1b6 items=2 ppid=606 pid=635 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="bash" exe="/usr/bin/bash" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="tmpcontainerid"
> > > type=CWD msg=audit(1519924845.499:257): cwd="/root"
> > > type=PATH msg=audit(1519924845.499:257): item=0 name="/tmp/" inode=13863 dev=00:27 mode=041777 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype= PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
> > > type=PATH msg=audit(1519924845.499:257): item=1 name="/tmp/tmpcontainerid" inode=17729 dev=00:27 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
> > > type=PROCTITLE msg=audit(1519924845.499:257): proctitle=62617368002D6300736C65657020313B206563686F2074657374203E202F746D702F746D70636F6E7461696E65726964
> > > type=CONTAINER_ID msg=audit(1519924845.499:257): contid=123458
> > >
> > > Please see the github audit kernel issue for the main feature:
> > > https://github.com/linux-audit/audit-kernel/issues/90
> > > Please see the github audit userspace issue for supporting additions:
> > > https://github.com/linux-audit/audit-userspace/issues/51
> > > > Please see the github audit testsuite issue for the test case:
> > > https://github.com/linux-audit/audit-testsuite/issues/64
> > > Please see the github audit wiki for the feature overview:
> > > https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> > > Signed-off-by: Richard Guy Briggs <[email protected]>
> > > Acked-by: Serge Hallyn <[email protected]>
> > > Acked-by: Steve Grubb <[email protected]>
> > > Acked-by: Neil Horman <[email protected]>
> > > Reviewed-by: Ondrej Mosnacek <[email protected]>
> > > ---
> > > include/linux/audit.h | 5 +++++
> > > include/uapi/linux/audit.h | 1 +
> > > kernel/audit.c | 20 ++++++++++++++++++++
> > > kernel/auditsc.c | 20 ++++++++++++++------
> > > 4 files changed, 40 insertions(+), 6 deletions(-)
> >
> > ...
> >
> > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > index 182b0f2c183d..3e0af53f3c4d 100644
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
> > > @@ -2127,6 +2127,26 @@ void audit_log_session_info(struct audit_buffer *ab)
> > > audit_log_format(ab, "auid=%u ses=%u", auid, sessionid);
> > > }
> > >
> > > +/*
> > > + * audit_log_contid - report container info
> > > + * @context: task or local context for record
> > > + * @contid: container ID to report
> > > + */
> > > +void audit_log_contid(struct audit_context *context, u64 contid)
> > > +{
> > > + struct audit_buffer *ab;
> > > +
> > > + if (!audit_contid_valid(contid))
> > > + return;
> > > + /* Generate AUDIT_CONTAINER_ID record with container ID */
> > > + ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER_ID);
> > > + if (!ab)
> > > + return;
> > > + audit_log_format(ab, "contid=%llu", (unsigned long long)contid);
> >
> > We have a consistency problem regarding how to output the u64 contid
> > values; this function uses an explicit cast, others do not. According
> > to Documentation/core-api/printk-formats.rst the recommendation for
> > u64 is %llu (or %llx, if you want hex). Looking quickly through the
> > printk code this appears to still be correct. I suggest we get rid of
> > the cast (like it was in v5).
>
> IIRC it was me who suggested to add the casts. I didn't realize that
> the kernel actually guarantees that "%llu" will always work with u64.
> Taking that into account I rescind my request to add the cast. Sorry
> for the false alarm.

Yeah, just remove the cast.

> > > + audit_log_end(ab);
> > > +}
> > > +EXPORT_SYMBOL(audit_log_contid);
> >
> > --
> > paul moore
> > http://www.paul-moore.com
>
> --
> Ondrej Mosnacek <omosnace at redhat dot com>
> Software Engineer, Security Technologies
> Red Hat, Inc.
>
> --
> Linux-audit mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/linux-audit

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-05-30 14:19:30

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 09/10] audit: add support for containerid to network namespaces

On 2019-05-29 18:17, Paul Moore wrote:
> On Mon, Apr 8, 2019 at 11:41 PM Richard Guy Briggs <[email protected]> wrote:
> >
> > Audit events could happen in a network namespace outside of a task
> > context due to packets received from the net that trigger an auditing
> > rule prior to being associated with a running task. The network
> > namespace could be in use by multiple containers by association to the
> > tasks in that network namespace. We still want a way to attribute
> > these events to any potential containers. Keep a list per network
> > namespace to track these audit container identifiers.
> >
> > Add/increment the audit container identifier on:
> > - initial setting of the audit container identifier via /proc
> > - clone/fork call that inherits an audit container identifier
> > - unshare call that inherits an audit container identifier
> > - setns call that inherits an audit container identifier
> > Delete/decrement the audit container identifier on:
> > - an inherited audit container identifier dropped when child set
> > - process exit
> > - unshare call that drops a net namespace
> > - setns call that drops a net namespace
> >
> > Please see the github audit kernel issue for contid net support:
> > https://github.com/linux-audit/audit-kernel/issues/92
> > Please see the github audit testsuite issue for the test case:
> > https://github.com/linux-audit/audit-testsuite/issues/64
> > Please see the github audit wiki for the feature overview:
> > https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> > Signed-off-by: Richard Guy Briggs <[email protected]>
> > Acked-by: Neil Horman <[email protected]>
> > Reviewed-by: Ondrej Mosnacek <[email protected]>
> > ---
> > include/linux/audit.h | 19 +++++++++++
> > kernel/audit.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++--
> > kernel/nsproxy.c | 4 +++
> > 3 files changed, 108 insertions(+), 3 deletions(-)
>
> ...
>
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 6c742da66b32..996213591617 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -376,6 +384,75 @@ static struct sock *audit_get_sk(const struct net *net)
> > return aunet->sk;
> > }
> >
> > +void audit_netns_contid_add(struct net *net, u64 contid)
> > +{
> > + struct audit_net *aunet;
> > + struct list_head *contid_list;
> > + struct audit_contid *cont;
> > +
> > + if (!net)
> > + return;
> > + if (!audit_contid_valid(contid))
> > + return;
> > + aunet = net_generic(net, audit_net_id);
> > + if (!aunet)
> > + return;
> > + contid_list = &aunet->contid_list;
> > + spin_lock(&aunet->contid_list_lock);
> > + list_for_each_entry_rcu(cont, contid_list, list)
> > + if (cont->id == contid) {
> > + refcount_inc(&cont->refcount);
> > + goto out;
> > + }
> > + cont = kmalloc(sizeof(struct audit_contid), GFP_ATOMIC);
> > + if (cont) {
> > + INIT_LIST_HEAD(&cont->list);
>
> I thought you were going to get rid of this INIT_LIST_HEAD() call?

I was intending to, and then Neil weighed in with this opinion:

https://www.redhat.com/archives/linux-audit/2019-April/msg00014.html

If you feel that isn't important, please remove it.

> > + cont->id = contid;
> > + refcount_set(&cont->refcount, 1);
> > + list_add_rcu(&cont->list, contid_list);
> > + }
> > +out:
> > + spin_unlock(&aunet->contid_list_lock);
> > +}
>
> --
> paul moore
> http://www.paul-moore.com

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-05-30 14:21:33

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 08/10] audit: add containerid filtering

On 2019-05-29 18:16, Paul Moore wrote:
> On Mon, Apr 8, 2019 at 11:41 PM Richard Guy Briggs <[email protected]> wrote:
> >
> > Implement audit container identifier filtering using the AUDIT_CONTID
> > field name to send an 8-character string representing a u64 since the
> > value field is only u32.
> >
> > Sending it as two u32 was considered, but gathering and comparing two
> > fields was more complex.
> >
> > The feature indicator is AUDIT_FEATURE_BITMAP_CONTAINERID.
> >
> > Please see the github audit kernel issue for the contid filter feature:
> > https://github.com/linux-audit/audit-kernel/issues/91
> > Please see the github audit userspace issue for filter additions:
> > https://github.com/linux-audit/audit-userspace/issues/40
> > Please see the github audit testsuite issue for the test case:
> > https://github.com/linux-audit/audit-testsuite/issues/64
> > Please see the github audit wiki for the feature overview:
> > https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> > Signed-off-by: Richard Guy Briggs <[email protected]>
> > Acked-by: Serge Hallyn <[email protected]>
> > Acked-by: Neil Horman <[email protected]>
> > Reviewed-by: Ondrej Mosnacek <[email protected]>
> > ---
> > include/linux/audit.h | 1 +
> > include/uapi/linux/audit.h | 5 ++++-
> > kernel/audit.h | 1 +
> > kernel/auditfilter.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
> > kernel/auditsc.c | 4 ++++
> > 5 files changed, 57 insertions(+), 1 deletion(-)
>
> ...
>
> > diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> > index 63f8b3f26fab..407b5bb3b4c6 100644
> > --- a/kernel/auditfilter.c
> > +++ b/kernel/auditfilter.c
> > @@ -1206,6 +1224,31 @@ int audit_comparator(u32 left, u32 op, u32 right)
> > }
> > }
> >
> > +int audit_comparator64(u64 left, u32 op, u64 right)
> > +{
> > + switch (op) {
> > + case Audit_equal:
> > + return (left == right);
> > + case Audit_not_equal:
> > + return (left != right);
> > + case Audit_lt:
> > + return (left < right);
> > + case Audit_le:
> > + return (left <= right);
> > + case Audit_gt:
> > + return (left > right);
> > + case Audit_ge:
> > + return (left >= right);
> > + case Audit_bitmask:
> > + return (left & right);
> > + case Audit_bittest:
> > + return ((left & right) == right);
> > + default:
> > + BUG();
>
> A little birdy mentioned the BUG() here as a potential issue and while
> I had ignored it in earlier patches because this is likely a
> cut-n-paste from another audit comparator function, I took a closer
> look this time. It appears as though we will never have an invalid op
> value as audit_data_to_entry()/audit_to_op() ensure that the op value
> is a known good value. Removing the BUG() from all the audit
> comparators is a separate issue, but I think it would be good to
> remove it from this newly added comparator; keeping it so that we
> return "0" in the default case seems reasonable.

Fair enough. That BUG(); can be removed.

> > + return 0;
> > + }
> > +}
>
> --
> paul moore
> http://www.paul-moore.com

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-05-30 14:35:54

by Paul Moore

Subject: Re: [PATCH ghak90 V6 09/10] audit: add support for containerid to network namespaces

On Thu, May 30, 2019 at 10:16 AM Richard Guy Briggs <[email protected]> wrote:
>
> On 2019-05-29 18:17, Paul Moore wrote:
> > On Mon, Apr 8, 2019 at 11:41 PM Richard Guy Briggs <[email protected]> wrote:
> > >
> > > Audit events could happen in a network namespace outside of a task
> > > context due to packets received from the net that trigger an auditing
> > > rule prior to being associated with a running task. The network
> > > namespace could be in use by multiple containers by association to the
> > > tasks in that network namespace. We still want a way to attribute
> > > these events to any potential containers. Keep a list per network
> > > > namespace to track these audit container identifiers.
> > >
> > > Add/increment the audit container identifier on:
> > > - initial setting of the audit container identifier via /proc
> > > - clone/fork call that inherits an audit container identifier
> > > - unshare call that inherits an audit container identifier
> > > - setns call that inherits an audit container identifier
> > > Delete/decrement the audit container identifier on:
> > > - an inherited audit container identifier dropped when child set
> > > - process exit
> > > - unshare call that drops a net namespace
> > > - setns call that drops a net namespace
> > >
> > > Please see the github audit kernel issue for contid net support:
> > > https://github.com/linux-audit/audit-kernel/issues/92
> > > > Please see the github audit testsuite issue for the test case:
> > > https://github.com/linux-audit/audit-testsuite/issues/64
> > > Please see the github audit wiki for the feature overview:
> > > https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> > > Signed-off-by: Richard Guy Briggs <[email protected]>
> > > Acked-by: Neil Horman <[email protected]>
> > > Reviewed-by: Ondrej Mosnacek <[email protected]>
> > > ---
> > > include/linux/audit.h | 19 +++++++++++
> > > kernel/audit.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++--
> > > kernel/nsproxy.c | 4 +++
> > > 3 files changed, 108 insertions(+), 3 deletions(-)
> >
> > ...
> >
> > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > index 6c742da66b32..996213591617 100644
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
> > > @@ -376,6 +384,75 @@ static struct sock *audit_get_sk(const struct net *net)
> > > return aunet->sk;
> > > }
> > >
> > > +void audit_netns_contid_add(struct net *net, u64 contid)
> > > +{
> > > + struct audit_net *aunet;
> > > + struct list_head *contid_list;
> > > + struct audit_contid *cont;
> > > +
> > > + if (!net)
> > > + return;
> > > + if (!audit_contid_valid(contid))
> > > + return;
> > > + aunet = net_generic(net, audit_net_id);
> > > + if (!aunet)
> > > + return;
> > > + contid_list = &aunet->contid_list;
> > > + spin_lock(&aunet->contid_list_lock);
> > > + list_for_each_entry_rcu(cont, contid_list, list)
> > > + if (cont->id == contid) {
> > > + refcount_inc(&cont->refcount);
> > > + goto out;
> > > + }
> > > + cont = kmalloc(sizeof(struct audit_contid), GFP_ATOMIC);
> > > + if (cont) {
> > > + INIT_LIST_HEAD(&cont->list);
> >
> > I thought you were going to get rid of this INIT_LIST_HEAD() call?
>
> I was intending to, and then Neil weighed in with this opinion:
>
> https://www.redhat.com/archives/linux-audit/2019-April/msg00014.html
>
> If you feel that isn't important, please remove it.

Okay, I missed/forgot that, it seems like the right thing to do is to
leave it as-is.
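
For readers following the bookkeeping that the commit message describes (add/increment on one set of events, delete/decrement on another), here is a minimal userspace sketch of a reference-counted ID list. It is not the kernel code: the names contid_node, contid_add, contid_del and contid_refs are hypothetical, a plain singly linked list stands in for the RCU list, and the spinlock is omitted entirely.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct audit_contid: one node per ID. */
struct contid_node {
	struct contid_node *next;
	uint64_t id;
	unsigned int refcount;
};

/* Take a reference on id; allocate a node on first use. Returns 0 or -1. */
static int contid_add(struct contid_node **head, uint64_t id)
{
	struct contid_node *n;

	for (n = *head; n; n = n->next) {
		if (n->id == id) {
			n->refcount++;
			return 0;
		}
	}
	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	n->id = id;
	n->refcount = 1;
	n->next = *head;
	*head = n;
	return 0;
}

/* Drop a reference on id; unlink and free the node when it reaches zero. */
static void contid_del(struct contid_node **head, uint64_t id)
{
	struct contid_node **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		struct contid_node *n = *pp;

		if (n->id != id)
			continue;
		if (--n->refcount == 0) {
			*pp = n->next;
			free(n);
		}
		return;
	}
}

/* Report the current reference count for id, or 0 if it is not listed. */
static unsigned int contid_refs(const struct contid_node *head, uint64_t id)
{
	for (; head; head = head->next)
		if (head->id == id)
			return head->refcount;
	return 0;
}
```

The first add for an ID allocates a node with a count of one, later adds only bump the count, and the node is freed when the last reference is dropped.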

--
paul moore
http://www.paul-moore.com

2019-05-30 14:36:11

by Paul Moore

Subject: Re: [PATCH ghak90 V6 04/10] audit: log container info of syscalls

On Thu, May 30, 2019 at 10:09 AM Richard Guy Briggs <[email protected]> wrote:
>
> On 2019-05-30 15:08, Ondrej Mosnacek wrote:
> > On Thu, May 30, 2019 at 12:16 AM Paul Moore <[email protected]> wrote:
> > > On Mon, Apr 8, 2019 at 11:40 PM Richard Guy Briggs <[email protected]> wrote:
> > > >
> > > > Create a new audit record AUDIT_CONTAINER_ID to document the audit
> > > > container identifier of a process if it is present.
> > > >
> > > > Called from audit_log_exit(), syscalls are covered.
> > > >
> > > > A sample raw event:
> > > > type=SYSCALL msg=audit(1519924845.499:257): arch=c000003e syscall=257 success=yes exit=3 a0=ffffff9c a1=56374e1cef30 a2=241 a3=1b6 items=2 ppid=606 pid=635 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="bash" exe="/usr/bin/bash" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="tmpcontainerid"
> > > > type=CWD msg=audit(1519924845.499:257): cwd="/root"
> > > > type=PATH msg=audit(1519924845.499:257): item=0 name="/tmp/" inode=13863 dev=00:27 mode=041777 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype= PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
> > > > type=PATH msg=audit(1519924845.499:257): item=1 name="/tmp/tmpcontainerid" inode=17729 dev=00:27 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
> > > > type=PROCTITLE msg=audit(1519924845.499:257): proctitle=62617368002D6300736C65657020313B206563686F2074657374203E202F746D702F746D70636F6E7461696E65726964
> > > > type=CONTAINER_ID msg=audit(1519924845.499:257): contid=123458
> > > >
> > > > Please see the github audit kernel issue for the main feature:
> > > > https://github.com/linux-audit/audit-kernel/issues/90
> > > > Please see the github audit userspace issue for supporting additions:
> > > > https://github.com/linux-audit/audit-userspace/issues/51
> > > > > Please see the github audit testsuite issue for the test case:
> > > > https://github.com/linux-audit/audit-testsuite/issues/64
> > > > Please see the github audit wiki for the feature overview:
> > > > https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> > > > Signed-off-by: Richard Guy Briggs <[email protected]>
> > > > Acked-by: Serge Hallyn <[email protected]>
> > > > Acked-by: Steve Grubb <[email protected]>
> > > > Acked-by: Neil Horman <[email protected]>
> > > > Reviewed-by: Ondrej Mosnacek <[email protected]>
> > > > ---
> > > > include/linux/audit.h | 5 +++++
> > > > include/uapi/linux/audit.h | 1 +
> > > > kernel/audit.c | 20 ++++++++++++++++++++
> > > > kernel/auditsc.c | 20 ++++++++++++++------
> > > > 4 files changed, 40 insertions(+), 6 deletions(-)
> > >
> > > ...
> > >
> > > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > > index 182b0f2c183d..3e0af53f3c4d 100644
> > > > --- a/kernel/audit.c
> > > > +++ b/kernel/audit.c
> > > > @@ -2127,6 +2127,26 @@ void audit_log_session_info(struct audit_buffer *ab)
> > > > audit_log_format(ab, "auid=%u ses=%u", auid, sessionid);
> > > > }
> > > >
> > > > +/*
> > > > + * audit_log_contid - report container info
> > > > + * @context: task or local context for record
> > > > + * @contid: container ID to report
> > > > + */
> > > > +void audit_log_contid(struct audit_context *context, u64 contid)
> > > > +{
> > > > + struct audit_buffer *ab;
> > > > +
> > > > + if (!audit_contid_valid(contid))
> > > > + return;
> > > > + /* Generate AUDIT_CONTAINER_ID record with container ID */
> > > > + ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER_ID);
> > > > + if (!ab)
> > > > + return;
> > > > + audit_log_format(ab, "contid=%llu", (unsigned long long)contid);
> > >
> > > We have a consistency problem regarding how to output the u64 contid
> > > values; this function uses an explicit cast, others do not. According
> > > to Documentation/core-api/printk-formats.rst the recommendation for
> > > u64 is %llu (or %llx, if you want hex). Looking quickly through the
> > > printk code this appears to still be correct. I suggest we get rid of
> > > the cast (like it was in v5).
> >
> > IIRC it was me who suggested to add the casts. I didn't realize that
> > the kernel actually guarantees that "%llu" will always work with u64.
> > Taking that into account I rescind my request to add the cast. Sorry
> > for the false alarm.
>
> Yeah, just remove the cast.

Okay, this is trivial enough I'll take care of this during the merge
with a note.
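
As a rough illustration of the point settled above, the following userspace sketch formats a 64-bit ID with "%llu" and no cast. It assumes a local typedef that mirrors the kernel's u64 (unsigned long long); format_contid is a hypothetical helper, not a function from the patch. Note that the same would not hold for userspace uint64_t, which is unsigned long on some ABIs.

```c
#include <stdio.h>

/* Assumption: mirrors the kernel, where u64 is unsigned long long on
 * every architecture, so "%llu" matches the argument type directly. */
typedef unsigned long long u64;

/* Hypothetical helper: build the "contid=..." field without a cast. */
static int format_contid(char *buf, size_t len, u64 contid)
{
	return snprintf(buf, len, "contid=%llu", contid);
}
```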

--
paul moore
http://www.paul-moore.com

2019-05-30 14:38:12

by Paul Moore

Subject: Re: [PATCH ghak90 V6 08/10] audit: add containerid filtering

On Thu, May 30, 2019 at 10:20 AM Richard Guy Briggs <[email protected]> wrote:
>
> On 2019-05-29 18:16, Paul Moore wrote:
> > On Mon, Apr 8, 2019 at 11:41 PM Richard Guy Briggs <[email protected]> wrote:
> > >
> > > Implement audit container identifier filtering using the AUDIT_CONTID
> > > field name to send an 8-character string representing a u64 since the
> > > value field is only u32.
> > >
> > > Sending it as two u32 was considered, but gathering and comparing two
> > > fields was more complex.
> > >
> > > The feature indicator is AUDIT_FEATURE_BITMAP_CONTAINERID.
> > >
> > > Please see the github audit kernel issue for the contid filter feature:
> > > https://github.com/linux-audit/audit-kernel/issues/91
> > > Please see the github audit userspace issue for filter additions:
> > > https://github.com/linux-audit/audit-userspace/issues/40
> > > > Please see the github audit testsuite issue for the test case:
> > > https://github.com/linux-audit/audit-testsuite/issues/64
> > > Please see the github audit wiki for the feature overview:
> > > https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> > > Signed-off-by: Richard Guy Briggs <[email protected]>
> > > Acked-by: Serge Hallyn <[email protected]>
> > > Acked-by: Neil Horman <[email protected]>
> > > Reviewed-by: Ondrej Mosnacek <[email protected]>
> > > ---
> > > include/linux/audit.h | 1 +
> > > include/uapi/linux/audit.h | 5 ++++-
> > > kernel/audit.h | 1 +
> > > kernel/auditfilter.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
> > > kernel/auditsc.c | 4 ++++
> > > 5 files changed, 57 insertions(+), 1 deletion(-)
> >
> > ...
> >
> > > diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> > > index 63f8b3f26fab..407b5bb3b4c6 100644
> > > --- a/kernel/auditfilter.c
> > > +++ b/kernel/auditfilter.c
> > > @@ -1206,6 +1224,31 @@ int audit_comparator(u32 left, u32 op, u32 right)
> > > }
> > > }
> > >
> > > +int audit_comparator64(u64 left, u32 op, u64 right)
> > > +{
> > > + switch (op) {
> > > + case Audit_equal:
> > > + return (left == right);
> > > + case Audit_not_equal:
> > > + return (left != right);
> > > + case Audit_lt:
> > > + return (left < right);
> > > + case Audit_le:
> > > + return (left <= right);
> > > + case Audit_gt:
> > > + return (left > right);
> > > + case Audit_ge:
> > > + return (left >= right);
> > > + case Audit_bitmask:
> > > + return (left & right);
> > > + case Audit_bittest:
> > > + return ((left & right) == right);
> > > + default:
> > > + BUG();
> >
> > A little birdy mentioned the BUG() here as a potential issue and while
> > I had ignored it in earlier patches because this is likely a
> > cut-n-paste from another audit comparator function, I took a closer
> > look this time. It appears as though we will never have an invalid op
> > value as audit_data_to_entry()/audit_to_op() ensure that the op value
> > > is a known good value. Removing the BUG() from all the audit
> > comparators is a separate issue, but I think it would be good to
> > remove it from this newly added comparator; keeping it so that we
> > > return "0" in the default case seems reasonable.
>
> Fair enough. That BUG(); can be removed.

Please send a fixup patch for this.
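
For reference, a userspace sketch of what the comparator might look like once the default case simply returns 0. This is not the fixup patch itself: the enum below is a local stand-in for the kernel's Audit_* operators with illustrative values, and uint64_t replaces u64. The sketch also writes the bitmask case as "!= 0", an assumption on my part, since passing a 64-bit AND result back through an int return would drop the upper 32 bits.

```c
#include <stdint.h>

/* Assumption: local stand-ins for the kernel's Audit_* operator values. */
enum audit_op {
	Audit_equal,
	Audit_not_equal,
	Audit_bitmask,
	Audit_bittest,
	Audit_lt,
	Audit_gt,
	Audit_le,
	Audit_ge,
	Audit_bad
};

/* Compare two 64-bit values; an unknown operator matches nothing. */
static int audit_comparator64(uint64_t left, enum audit_op op, uint64_t right)
{
	switch (op) {
	case Audit_equal:
		return left == right;
	case Audit_not_equal:
		return left != right;
	case Audit_lt:
		return left < right;
	case Audit_le:
		return left <= right;
	case Audit_gt:
		return left > right;
	case Audit_ge:
		return left >= right;
	case Audit_bitmask:
		/* "!= 0" keeps a match in the upper 32 bits from being lost */
		return (left & right) != 0;
	case Audit_bittest:
		return (left & right) == right;
	default:
		/* no BUG(): callers are expected to have validated op */
		return 0;
	}
}
```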

--
paul moore
http://www.paul-moore.com

2019-05-30 17:12:23

by Serge E. Hallyn

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Wed, May 29, 2019 at 06:39:48PM -0400, Paul Moore wrote:
> On Wed, May 29, 2019 at 6:28 PM Tycho Andersen <[email protected]> wrote:
> > On Wed, May 29, 2019 at 12:03:58PM -0400, Paul Moore wrote:
> > > On Wed, May 29, 2019 at 11:34 AM Tycho Andersen <[email protected]> wrote:
> > > >
> > > > On Wed, May 29, 2019 at 11:29:05AM -0400, Paul Moore wrote:
> > > > > On Wed, May 29, 2019 at 10:57 AM Tycho Andersen <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Apr 08, 2019 at 11:39:09PM -0400, Richard Guy Briggs wrote:
> > > > > > > It is not permitted to unset the audit container identifier.
> > > > > > > A child inherits its parent's audit container identifier.
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > > /**
> > > > > > > + * audit_set_contid - set current task's audit contid
> > > > > > > + * @contid: contid value
> > > > > > > + *
> > > > > > > + * Returns 0 on success, -EPERM on permission failure.
> > > > > > > + *
> > > > > > > + * Called (set) from fs/proc/base.c::proc_contid_write().
> > > > > > > + */
> > > > > > > +int audit_set_contid(struct task_struct *task, u64 contid)
> > > > > > > +{
> > > > > > > + u64 oldcontid;
> > > > > > > + int rc = 0;
> > > > > > > + struct audit_buffer *ab;
> > > > > > > + uid_t uid;
> > > > > > > + struct tty_struct *tty;
> > > > > > > + char comm[sizeof(current->comm)];
> > > > > > > +
> > > > > > > + task_lock(task);
> > > > > > > + /* Can't set if audit disabled */
> > > > > > > + if (!task->audit) {
> > > > > > > + task_unlock(task);
> > > > > > > + return -ENOPROTOOPT;
> > > > > > > + }
> > > > > > > + oldcontid = audit_get_contid(task);
> > > > > > > + read_lock(&tasklist_lock);
> > > > > > > + /* Don't allow the audit containerid to be unset */
> > > > > > > + if (!audit_contid_valid(contid))
> > > > > > > + rc = -EINVAL;
> > > > > > > + /* if we don't have caps, reject */
> > > > > > > + else if (!capable(CAP_AUDIT_CONTROL))
> > > > > > > + rc = -EPERM;
> > > > > > > + /* if task has children or is not single-threaded, deny */
> > > > > > > + else if (!list_empty(&task->children))
> > > > > > > + rc = -EBUSY;
> > > > > > > + else if (!(thread_group_leader(task) && thread_group_empty(task)))
> > > > > > > + rc = -EALREADY;
> > > > > > > + read_unlock(&tasklist_lock);
> > > > > > > + if (!rc)
> > > > > > > + task->audit->contid = contid;
> > > > > > > + task_unlock(task);
> > > > > > > +
> > > > > > > + if (!audit_enabled)
> > > > > > > + return rc;
> > > > > >
> > > > > > ...but it is allowed to change it (assuming
> > > > > > capable(CAP_AUDIT_CONTROL), of course)? Seems like this might be more
> > > > > > immediately useful since we still live in the world of majority
> > > > > > privileged containers if we didn't allow changing it, in addition to
> > > > > > un-setting it.
> > > > >
> > > > > The idea is that only container orchestrators should be able to
> > > > > set/modify the audit container ID, and since setting the audit
> > > > > container ID can have a significant effect on the records captured
> > > > > (and their routing to multiple daemons when we get there) modifying
> > > > > the audit container ID is akin to modifying the audit configuration
> > > > > which is why it is gated by CAP_AUDIT_CONTROL. The current thinking
> > > > > is that you would only change the audit container ID from one
> > > > > set/inherited value to another if you were nesting containers, in
> > > > > which case the nested container orchestrator would need to be granted
> > > > > CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> > > > > compromise).
> > > >
> > > > But then don't you want some kind of ns_capable() instead (probably
> > > > not the obvious one, though...)? With capable(), you can't really nest
> > > > using the audit-id and user namespaces together.
> > >
> > > You want capable() and not ns_capable() because you want to ensure
> > > that the orchestrator has the rights in the init_ns as changes to the
> > > audit container ID could have an auditing impact that spans the entire
> > > system.
> >
> > Ok but,
> >
> > > > > The current thinking
> > > > > is that you would only change the audit container ID from one
> > > > > set/inherited value to another if you were nesting containers, in
> > > > > which case the nested container orchestrator would need to be granted
> > > > > CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> > > > > compromise).
> >
> > won't work in user namespaced containers, because they will never be
> > capable(CAP_AUDIT_CONTROL); so I don't think this will work for
> > nesting as is. But maybe nobody cares :)
>
> That's fun :)
>
> To be honest, I've never been a big fan of supporting nested
> containers from an audit perspective, so I'm not really too upset
> about this. The k8s/cri-o folks seem okay with this, or at least I
> haven't heard any objections; lxc folks, what do you have to say?

I actually thought the answer to this (when last I looked, "some time" ago)
was that userspace should track an audit message saying "task X in
container Y is changing its auditid to Z", and then decide to also track Z.
This should be doable, but a lot of extra work in userspace.

Per-userns containerids would also work. So task X1 is in containerid
1 on the host and creates a new task Y in new userns; it continues to
be reported in init_user_ns as containerid 1 forever; but in its own
userns it can request to be known as some other containerid. Audit
socks would be per-userns, allowing root in a container to watch for
audit events in its own (and descendent) namespaces.

But again I'm sure we've gone over all this in the last few years.

I suppose we can look at this as a "first step", and talk about
making it user-ns-nestable later. But agreed it's not useful in a
lot of situations as is.

-serge

2019-05-30 19:31:16

by Paul Moore

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Thu, May 30, 2019 at 1:09 PM Serge E. Hallyn <[email protected]> wrote:
> On Wed, May 29, 2019 at 06:39:48PM -0400, Paul Moore wrote:
> > On Wed, May 29, 2019 at 6:28 PM Tycho Andersen <[email protected]> wrote:
> > > On Wed, May 29, 2019 at 12:03:58PM -0400, Paul Moore wrote:
> > > > On Wed, May 29, 2019 at 11:34 AM Tycho Andersen <[email protected]> wrote:
> > > > > On Wed, May 29, 2019 at 11:29:05AM -0400, Paul Moore wrote:
> > > > > > On Wed, May 29, 2019 at 10:57 AM Tycho Andersen <[email protected]> wrote:
> > > > > > > On Mon, Apr 08, 2019 at 11:39:09PM -0400, Richard Guy Briggs wrote:

...

> > > > > > The current thinking
> > > > > > is that you would only change the audit container ID from one
> > > > > > set/inherited value to another if you were nesting containers, in
> > > > > > which case the nested container orchestrator would need to be granted
> > > > > > CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> > > > > > compromise).
> > >
> > > won't work in user namespaced containers, because they will never be
> > > capable(CAP_AUDIT_CONTROL); so I don't think this will work for
> > > nesting as is. But maybe nobody cares :)
> >
> > That's fun :)
> >
> > To be honest, I've never been a big fan of supporting nested
> > containers from an audit perspective, so I'm not really too upset
> > about this. The k8s/cri-o folks seem okay with this, or at least I
> > haven't heard any objections; lxc folks, what do you have to say?
>
> I actually thought the answer to this (when last I looked, "some time" ago)
> was that userspace should track an audit message saying "task X in
> container Y is changing its auditid to Z", and then decide to also track Z.
> This should be doable, but a lot of extra work in userspace.
>
> Per-userns containerids would also work. So task X1 is in containerid
> 1 on the host and creates a new task Y in new userns; it continues to
> be reported in init_user_ns as containerid 1 forever; but in its own
> userns it can request to be known as some other containerid. Audit
> socks would be per-userns, allowing root in a container to watch for
> audit events in its own (and descendent) namespaces.
>
> But again I'm sure we've gone over all this in the last few years.
>
> I suppose we can look at this as a "first step", and talk about
> making it user-ns-nestable later. But agreed it's not useful in a
> lot of situations as is.

[REMINDER: It is an "*audit* container ID" and not a general
"container ID" ;) Smiley aside, I'm not kidding about that part.]

I'm not interested in supporting/merging something that isn't useful;
if this doesn't work for your use case then we need to figure out what
would work. It sounds like nested containers are much more common in
the lxc world, can you elaborate a bit more on this?

As far as the possible solutions you mention above, I'm not sure I
like the per-userns audit container IDs, I'd much rather just emit the
necessary tracking information via the audit record stream and let the
log analysis tools figure it out. However, the bigger question is how
to limit (re)setting the audit container ID when you are in a non-init
userns. For reasons already mentioned, using capable() is a
non-starter for everything but the initial userns, and using ns_capable()
is equally poor as it essentially allows any userns the ability to
munge its audit container ID (obviously not good). It appears we
need a different method for controlling access to the audit container
ID.

Punting this to a LSM hook is an obvious thing to do, and something we
might want to do anyway, but currently audit doesn't rely on the LSM
for proper/safe operation and I'm not sure I want to change that now.

The next obvious thing is to create some sort of access control knob
in audit itself. Perhaps an auditctl operation that would allow the
administrator to specify which containers, via their corresponding
audit container IDs, are allowed to change their audit container ID?
The permission granting would need to be done in the init userns, but
it would allow containers with a non-init userns the ability to change
their audit container ID. We would probably still want a
ns_capable(CAP_AUDIT_CONTROL) restriction in this case.
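
To make the idea above concrete, here is a very rough userspace sketch of such a check. Everything in it is hypothetical: struct contid_policy, contid_is_allowed and may_change_contid are invented for illustration, the two booleans stand in for the userns test and for holding CAP_AUDIT_CONTROL in the caller's own user namespace, and no such auditctl operation exists in this patchset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy: audit container IDs that an administrator in the
 * init userns has allowed to change their own audit container ID. */
struct contid_policy {
	const uint64_t *allowed;
	size_t n_allowed;
};

/* Linear search of the allow list for the given audit container ID. */
static bool contid_is_allowed(const struct contid_policy *p, uint64_t contid)
{
	size_t i;

	for (i = 0; i < p->n_allowed; i++)
		if (p->allowed[i] == contid)
			return true;
	return false;
}

/*
 * in_init_userns: the caller runs in the initial user namespace.
 * has_cap:        the caller holds CAP_AUDIT_CONTROL in its own user
 *                 namespace (in the init userns this is the capable() case).
 * cur_contid:     the caller's current audit container ID.
 */
static bool may_change_contid(const struct contid_policy *p,
			      bool in_init_userns, bool has_cap,
			      uint64_t cur_contid)
{
	if (!has_cap)
		return false;
	if (in_init_userns)
		return true;
	/* non-init userns: also require an explicit grant from the admin */
	return contid_is_allowed(p, cur_contid);
}
```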

Does anyone else have any other ideas?

--
paul moore
http://www.paul-moore.com

2019-05-30 20:39:11

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 08/10] audit: add containerid filtering

On 2019-05-30 10:34, Paul Moore wrote:
> On Thu, May 30, 2019 at 10:20 AM Richard Guy Briggs <[email protected]> wrote:
> >
> > On 2019-05-29 18:16, Paul Moore wrote:
> > > On Mon, Apr 8, 2019 at 11:41 PM Richard Guy Briggs <[email protected]> wrote:
> > > >
> > > > Implement audit container identifier filtering using the AUDIT_CONTID
> > > > field name to send an 8-character string representing a u64 since the
> > > > value field is only u32.
> > > >
> > > > Sending it as two u32 was considered, but gathering and comparing two
> > > > fields was more complex.
> > > >
> > > > The feature indicator is AUDIT_FEATURE_BITMAP_CONTAINERID.
> > > >
> > > > Please see the github audit kernel issue for the contid filter feature:
> > > > https://github.com/linux-audit/audit-kernel/issues/91
> > > > Please see the github audit userspace issue for filter additions:
> > > > https://github.com/linux-audit/audit-userspace/issues/40
> > > > > Please see the github audit testsuite issue for the test case:
> > > > https://github.com/linux-audit/audit-testsuite/issues/64
> > > > Please see the github audit wiki for the feature overview:
> > > > https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> > > > Signed-off-by: Richard Guy Briggs <[email protected]>
> > > > Acked-by: Serge Hallyn <[email protected]>
> > > > Acked-by: Neil Horman <[email protected]>
> > > > Reviewed-by: Ondrej Mosnacek <[email protected]>
> > > > ---
> > > > include/linux/audit.h | 1 +
> > > > include/uapi/linux/audit.h | 5 ++++-
> > > > kernel/audit.h | 1 +
> > > > kernel/auditfilter.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
> > > > kernel/auditsc.c | 4 ++++
> > > > 5 files changed, 57 insertions(+), 1 deletion(-)
> > >
> > > ...
> > >
> > > > diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> > > > index 63f8b3f26fab..407b5bb3b4c6 100644
> > > > --- a/kernel/auditfilter.c
> > > > +++ b/kernel/auditfilter.c
> > > > @@ -1206,6 +1224,31 @@ int audit_comparator(u32 left, u32 op, u32 right)
> > > > }
> > > > }
> > > >
> > > > +int audit_comparator64(u64 left, u32 op, u64 right)
> > > > +{
> > > > + switch (op) {
> > > > + case Audit_equal:
> > > > + return (left == right);
> > > > + case Audit_not_equal:
> > > > + return (left != right);
> > > > + case Audit_lt:
> > > > + return (left < right);
> > > > + case Audit_le:
> > > > + return (left <= right);
> > > > + case Audit_gt:
> > > > + return (left > right);
> > > > + case Audit_ge:
> > > > + return (left >= right);
> > > > + case Audit_bitmask:
> > > > + return (left & right);
> > > > + case Audit_bittest:
> > > > + return ((left & right) == right);
> > > > + default:
> > > > + BUG();
> > >
> > > A little birdy mentioned the BUG() here as a potential issue and while
> > > I had ignored it in earlier patches because this is likely a
> > > cut-n-paste from another audit comparator function, I took a closer
> > > look this time. It appears as though we will never have an invalid op
> > > value as audit_data_to_entry()/audit_to_op() ensure that the op value
> > > is a known good value. Removing the BUG() from all the audit
> > > comparators is a separate issue, but I think it would be good to
> > > remove it from this newly added comparator; keeping it so that we
> > > return "0" in the default case seems reasonable.
> >
> > Fair enough. That BUG(); can be removed.
>
> Please send a fixup patch for this.

The fixup patch is trivial. The rebase to v5.2-rc1 audit/next had merge
conflicts with four recent patchsets. It may be simpler to submit a new
patchset and look at a diff of the two sets. I'm testing the rebase
now.
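
For reference, a minimal userspace sketch of the comparator with the
BUG() dropped, so that an unrecognised op falls through to a 0 return as
suggested above. This is not the fixup patch itself: the u64/u32
typedefs and the Audit_* enum are local stand-ins for the kernel
definitions so that the fragment compiles on its own. The bitmask case
is normalised to 0/1 on the assumption that returning a raw u64 AND
result through an int return type could lose bits above bit 31.

```c
#include <assert.h>	/* only needed for the usage check noted below */
#include <stdint.h>

typedef uint64_t u64;
typedef uint32_t u32;

/* Local stand-ins for the kernel's comparison operators. */
enum audit_op {
	Audit_equal,
	Audit_not_equal,
	Audit_bitmask,
	Audit_bittest,
	Audit_lt,
	Audit_gt,
	Audit_le,
	Audit_ge,
	Audit_bad
};

/* Compare two 64-bit values; an unrecognised op simply does not match. */
int audit_comparator64(u64 left, u32 op, u64 right)
{
	switch (op) {
	case Audit_equal:
		return (left == right);
	case Audit_not_equal:
		return (left != right);
	case Audit_lt:
		return (left < right);
	case Audit_le:
		return (left <= right);
	case Audit_gt:
		return (left > right);
	case Audit_ge:
		return (left >= right);
	case Audit_bitmask:
		/* Normalise to 0/1 so high bits survive the int return. */
		return ((left & right) != 0);
	case Audit_bittest:
		return ((left & right) == right);
	default:
		/* No BUG(): callers are expected to have validated op. */
		return 0;
	}
}
```

A caller could check the default path with something like
assert(audit_comparator64(1, Audit_bad, 1) == 0).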

> paul moore http://www.paul-moore.com

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-05-30 20:47:28

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH ghak90 V6 08/10] audit: add containerid filtering

On Thu, May 30, 2019 at 4:37 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-05-30 10:34, Paul Moore wrote:
> > On Thu, May 30, 2019 at 10:20 AM Richard Guy Briggs <[email protected]> wrote:
> > >
> > > On 2019-05-29 18:16, Paul Moore wrote:
> > > > On Mon, Apr 8, 2019 at 11:41 PM Richard Guy Briggs <[email protected]> wrote:
> > > > >
> > > > > Implement audit container identifier filtering using the AUDIT_CONTID
> > > > > field name to send an 8-character string representing a u64 since the
> > > > > value field is only u32.
> > > > >
> > > > > Sending it as two u32 was considered, but gathering and comparing two
> > > > > fields was more complex.
> > > > >
> > > > > The feature indicator is AUDIT_FEATURE_BITMAP_CONTAINERID.
> > > > >
> > > > > Please see the github audit kernel issue for the contid filter feature:
> > > > > https://github.com/linux-audit/audit-kernel/issues/91
> > > > > Please see the github audit userspace issue for filter additions:
> > > > > https://github.com/linux-audit/audit-userspace/issues/40
> > > > > Please see the github audit testsuite issue for the test case:
> > > > > https://github.com/linux-audit/audit-testsuite/issues/64
> > > > > Please see the github audit wiki for the feature overview:
> > > > > https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> > > > > Signed-off-by: Richard Guy Briggs <[email protected]>
> > > > > Acked-by: Serge Hallyn <[email protected]>
> > > > > Acked-by: Neil Horman <[email protected]>
> > > > > Reviewed-by: Ondrej Mosnacek <[email protected]>
> > > > > ---
> > > > > include/linux/audit.h | 1 +
> > > > > include/uapi/linux/audit.h | 5 ++++-
> > > > > kernel/audit.h | 1 +
> > > > > kernel/auditfilter.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
> > > > > kernel/auditsc.c | 4 ++++
> > > > > 5 files changed, 57 insertions(+), 1 deletion(-)
> > > >
> > > > ...
> > > >
> > > > > diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> > > > > index 63f8b3f26fab..407b5bb3b4c6 100644
> > > > > --- a/kernel/auditfilter.c
> > > > > +++ b/kernel/auditfilter.c
> > > > > @@ -1206,6 +1224,31 @@ int audit_comparator(u32 left, u32 op, u32 right)
> > > > > }
> > > > > }
> > > > >
> > > > > +int audit_comparator64(u64 left, u32 op, u64 right)
> > > > > +{
> > > > > + switch (op) {
> > > > > + case Audit_equal:
> > > > > + return (left == right);
> > > > > + case Audit_not_equal:
> > > > > + return (left != right);
> > > > > + case Audit_lt:
> > > > > + return (left < right);
> > > > > + case Audit_le:
> > > > > + return (left <= right);
> > > > > + case Audit_gt:
> > > > > + return (left > right);
> > > > > + case Audit_ge:
> > > > > + return (left >= right);
> > > > > + case Audit_bitmask:
> > > > > + return (left & right);
> > > > > + case Audit_bittest:
> > > > > + return ((left & right) == right);
> > > > > + default:
> > > > > + BUG();
> > > >
> > > > A little birdy mentioned the BUG() here as a potential issue and while
> > > > I had ignored it in earlier patches because this is likely a
> > > > cut-n-paste from another audit comparator function, I took a closer
> > > > look this time. It appears as though we will never have an invalid op
> > > > value as audit_data_to_entry()/audit_to_op() ensure that the op value
> > > > is a known good value. Removing the BUG() from all the audit
> > > > comparators is a separate issue, but I think it would be good to
> > > > remove it from this newly added comparator; keeping it so that we
> > > > return "0" in the default case seems reasonable.
> > >
> > > Fair enough. That BUG(); can be removed.
> >
> > Please send a fixup patch for this.
>
> The fixup patch is trivial.

Yes, I know.

> The rebase to v5.2-rc1 audit/next had merge
> conflicts with four recent patchsets. It may be simpler to submit a new
> patchset and look at a diff of the two sets. I'm testing the rebase
> now.

Great, thanks. Although you might want to hold off a bit on posting
the next revision until we sort out the discussion which is happening
in patch 02/10; unfortunately I fear we may need to change some of the
logic.

--
paul moore
http://www.paul-moore.com

2019-05-30 21:11:51

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH ghak90 V6 08/10] audit: add containerid filtering

On 2019-05-30 16:45, Paul Moore wrote:
> On Thu, May 30, 2019 at 4:37 PM Richard Guy Briggs <[email protected]> wrote:
> > On 2019-05-30 10:34, Paul Moore wrote:
> > > On Thu, May 30, 2019 at 10:20 AM Richard Guy Briggs <[email protected]> wrote:
> > > >
> > > > On 2019-05-29 18:16, Paul Moore wrote:
> > > > > On Mon, Apr 8, 2019 at 11:41 PM Richard Guy Briggs <[email protected]> wrote:
> > > > > >
> > > > > > Implement audit container identifier filtering using the AUDIT_CONTID
> > > > > > field name to send an 8-character string representing a u64 since the
> > > > > > value field is only u32.
> > > > > >
> > > > > > Sending it as two u32 was considered, but gathering and comparing two
> > > > > > fields was more complex.
> > > > > >
> > > > > > The feature indicator is AUDIT_FEATURE_BITMAP_CONTAINERID.
> > > > > >
> > > > > > Please see the github audit kernel issue for the contid filter feature:
> > > > > > https://github.com/linux-audit/audit-kernel/issues/91
> > > > > > Please see the github audit userspace issue for filter additions:
> > > > > > https://github.com/linux-audit/audit-userspace/issues/40
> > > > > > Please see the github audit testsuite issue for the test case:
> > > > > > https://github.com/linux-audit/audit-testsuite/issues/64
> > > > > > Please see the github audit wiki for the feature overview:
> > > > > > https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> > > > > > Signed-off-by: Richard Guy Briggs <[email protected]>
> > > > > > Acked-by: Serge Hallyn <[email protected]>
> > > > > > Acked-by: Neil Horman <[email protected]>
> > > > > > Reviewed-by: Ondrej Mosnacek <[email protected]>
> > > > > > ---
> > > > > > include/linux/audit.h | 1 +
> > > > > > include/uapi/linux/audit.h | 5 ++++-
> > > > > > kernel/audit.h | 1 +
> > > > > > kernel/auditfilter.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > kernel/auditsc.c | 4 ++++
> > > > > > 5 files changed, 57 insertions(+), 1 deletion(-)
> > > > >
> > > > > ...
> > > > >
> > > > > > diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> > > > > > index 63f8b3f26fab..407b5bb3b4c6 100644
> > > > > > --- a/kernel/auditfilter.c
> > > > > > +++ b/kernel/auditfilter.c
> > > > > > @@ -1206,6 +1224,31 @@ int audit_comparator(u32 left, u32 op, u32 right)
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > +int audit_comparator64(u64 left, u32 op, u64 right)
> > > > > > +{
> > > > > > + switch (op) {
> > > > > > + case Audit_equal:
> > > > > > + return (left == right);
> > > > > > + case Audit_not_equal:
> > > > > > + return (left != right);
> > > > > > + case Audit_lt:
> > > > > > + return (left < right);
> > > > > > + case Audit_le:
> > > > > > + return (left <= right);
> > > > > > + case Audit_gt:
> > > > > > + return (left > right);
> > > > > > + case Audit_ge:
> > > > > > + return (left >= right);
> > > > > > + case Audit_bitmask:
> > > > > > + return (left & right);
> > > > > > + case Audit_bittest:
> > > > > > + return ((left & right) == right);
> > > > > > + default:
> > > > > > + BUG();
> > > > >
> > > > > A little birdy mentioned the BUG() here as a potential issue and while
> > > > > I had ignored it in earlier patches because this is likely a
> > > > > cut-n-paste from another audit comparator function, I took a closer
> > > > > look this time. It appears as though we will never have an invalid op
> > > > > value as audit_data_to_entry()/audit_to_op() ensure that the op value
> > > > > is a known good value. Removing the BUG() from all the audit
> > > > > comparators is a separate issue, but I think it would be good to
> > > > > remove it from this newly added comparator; keeping it so that we
> > > > > return "0" in the default case seems reasonable.
> > > >
> > > > Fair enough. That BUG(); can be removed.
> > >
> > > Please send a fixup patch for this.
> >
> > The fixup patch is trivial.
>
> Yes, I know.
>
> > The rebase to v5.2-rc1 audit/next had merge
> > conflicts with four recent patchsets. It may be simpler to submit a new
> > patchset and look at a diff of the two sets. I'm testing the rebase
> > now.
>
> Great thanks. Although you might want to hold off a bit on posting
> the next revision until we sort out the discussion which is happening
> in patch 02/10; unfortunately I fear we may need to change some of the
> logic.

I'm watching... I have no immediate ideas on how to address that
discussion yet. I'm optimistic it can be adjusted after the initial
commit without changing the API.

> paul moore http://www.paul-moore.com

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-05-30 22:00:36

by Tycho Andersen

[permalink] [raw]
Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Thu, May 30, 2019 at 03:29:32PM -0400, Paul Moore wrote:
>
> [REMINDER: It is an "*audit* container ID" and not a general
> "container ID" ;) Smiley aside, I'm not kidding about that part.]

This sort of seems like a distinction without a difference; presumably
audit is going to want to differentiate between everything that people
in userspace call a container. So you'll have to support all this
insanity anyway, even if it's "not a container ID".

> I'm not interested in supporting/merging something that isn't useful;
> if this doesn't work for your use case then we need to figure out what
> would work. It sounds like nested containers are much more common in
> the lxc world, can you elaborate a bit more on this?
>
> As far as the possible solutions you mention above, I'm not sure I
> like the per-userns audit container IDs, I'd much rather just emit the
> necessary tracking information via the audit record stream and let the
> log analysis tools figure it out. However, the bigger question is how
> to limit (re)setting the audit container ID when you are in a non-init
> userns. For reasons already mentioned, using capable() is a non
> starter for everything but the initial userns, and using ns_capable()
> is equally poor as it essentially allows any userns the ability to
> munge its audit container ID (obviously not good). It appears we
> need a different method for controlling access to the audit container
> ID.

One option would be to make it a string, and have it be append only.
That should be safe with no checks.

I know there was a long thread about what type to make this thing. I
think you could accomplish the append-only-ness with a u64 if you had
some rule about only allowing setting lower order bits than those that
are already set. With 4 bits for simplicity:

1100 # initial container id
1100 -> 1011 # not allowed
1100 -> 1101 # allowed, but now 1101 is set in stone since there are
# no lower order bits left

There are probably fancier ways to do it if you actually understand
math :)

Since userns nesting is limited to 32 levels (right now, IIRC), and
you have 64 bits, this might be reasonable. You could just teach
container engines to use the first, say, N bits for themselves, with a 1
bit for the barrier at the end.
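
To make the rule above concrete, here is a minimal userspace sketch.
The function name contid_append_ok() is hypothetical and nothing like it
exists in the patchset; treating 0 as "no ID set yet" is also an
assumption made only for the example.

```c
#include <assert.h>	/* only needed for the usage check noted below */
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check for the append-only rule: the next value must keep
 * every bit of the old one and may only add bits below the old value's
 * lowest set bit.
 */
bool contid_append_ok(uint64_t old, uint64_t next)
{
	uint64_t lowest, added;

	if (old == 0)			/* assumption: 0 means no ID set yet */
		return true;
	if ((old & next) != old)	/* an existing bit was cleared */
		return false;

	lowest = old & (~old + 1);	/* isolate the lowest set bit */
	added = next & ~old;		/* bits that were not set before */
	return added < lowest;		/* every addition sits below it */
}
```

With the 4-bit values from the example, contid_append_ok(0xC, 0xB) is
false and contid_append_ok(0xC, 0xD) is true; after that, 0xD can no
longer change because bit 0 is already its lowest set bit.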

Tycho

2019-05-30 23:28:39

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Thu, May 30, 2019 at 5:29 PM Tycho Andersen <[email protected]> wrote:
> On Thu, May 30, 2019 at 03:29:32PM -0400, Paul Moore wrote:
> >
> > [REMINDER: It is an "*audit* container ID" and not a general
> > "container ID" ;) Smiley aside, I'm not kidding about that part.]
>
> This sort of seems like a distinction without a difference; presumably
> audit is going to want to differentiate between everything that people
> in userspace call a container. So you'll have to support all this
> insanity anyway, even if it's "not a container ID".

That's not quite right. Audit doesn't care about what a container is,
or is not; it also doesn't care if the "audit container ID" actually
matches the ID used by the container engine in userspace, and I think
that is a very important line to draw. Audit is simply given a value
which it calls the "audit container ID", it ensures that the value is
inherited appropriately (e.g. children inherit their parent's audit
container ID), and it uses the value in audit records to provide some
additional context for log analysis. The distinction isn't limited to
the value itself, but also to how it is used; it is an "audit
container ID" and not a "container ID" because this value is
exclusively for use by the audit subsystem. We are very intentionally
not adding a generic container ID to the kernel. If the kernel does
ever grow a general purpose container ID we will be one of the first
ones in line to make use of it, but we are not going to be the ones to
generically add containers to the kernel. Enough people already hate
audit ;)

> > I'm not interested in supporting/merging something that isn't useful;
> > if this doesn't work for your use case then we need to figure out what
> > would work. It sounds like nested containers are much more common in
> > the lxc world, can you elaborate a bit more on this?
> >
> > As far as the possible solutions you mention above, I'm not sure I
> > like the per-userns audit container IDs, I'd much rather just emit the
> > necessary tracking information via the audit record stream and let the
> > log analysis tools figure it out. However, the bigger question is how
> > to limit (re)setting the audit container ID when you are in a non-init
> > userns. For reasons already mentioned, using capable() is a non
> > starter for everything but the initial userns, and using ns_capable()
> > is equally poor as it essentially allows any userns the ability to
> > munge its audit container ID (obviously not good). It appears we
> > need a different method for controlling access to the audit container
> > ID.
>
> One option would be to make it a string, and have it be append only.
> That should be safe with no checks.
>
> I know there was a long thread about what type to make this thing. I
> think you could accomplish the append-only-ness with a u64 if you had
> some rule about only allowing setting lower order bits than those that
> are already set. With 4 bits for simplicity:
>
> 1100 # initial container id
> 1100 -> 1011 # not allowed
> 1100 -> 1101 # allowed, but now 1101 is set in stone since there are
> # no lower order bits left
>
> There are probably fancier ways to do it if you actually understand
> math :)

;)

> Since userns nesting is limited to 32 levels (right now, IIRC), and
> you have 64 bits, this might be reasonable. You could just teach
> container engines to use the first say N bits for themselves, with a 1
> bit for the barrier at the end.

I like the creativity, but I worry that at some point these
limitations are going to be raised (limits have a funny way of doing
that over time) and we will be in trouble. I say "trouble" because I
want to be able to quickly do an audit container ID comparison and
we're going to pay a penalty for these larger values (we'll need this
when we add multiple auditd support and the requisite record routing).

Thinking about this makes me also realize we probably need to think a
bit longer about audit container ID conflicts between orchestrators.
Right now we just take the value that is given to us by the
orchestrator, but if we want to allow multiple container orchestrators
to work without some form of cooperation in userspace (I think we have
to assume the orchestrators will not talk to each other) we likely
need to have some way to block reuse of an audit container ID. We
would either need to prevent the orchestrator from explicitly setting
an audit container ID to a currently in use value, or instead generate
the audit container ID in the kernel upon an event triggered by the
orchestrator (e.g. a write to a /proc file). I suspect we should
start looking at the idr code; I think we will need to make use of it.

--
paul moore
http://www.paul-moore.com

2019-05-31 00:22:36

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On 2019-05-30 19:26, Paul Moore wrote:
> On Thu, May 30, 2019 at 5:29 PM Tycho Andersen <[email protected]> wrote:
> > On Thu, May 30, 2019 at 03:29:32PM -0400, Paul Moore wrote:
> > >
> > > [REMINDER: It is an "*audit* container ID" and not a general
> > > "container ID" ;) Smiley aside, I'm not kidding about that part.]
> >
> > This sort of seems like a distinction without a difference; presumably
> > audit is going to want to differentiate between everything that people
> > in userspace call a container. So you'll have to support all this
> > insanity anyway, even if it's "not a container ID".
>
> That's not quite right. Audit doesn't care about what a container is,
> or is not, it also doesn't care if the "audit container ID" actually
> matches the ID used by the container engine in userspace and I think
> that is a very important line to draw. Audit is simply given a value
> which it calls the "audit container ID", it ensures that the value is
> inherited appropriately (e.g. children inherit their parent's audit
> container ID), and it uses the value in audit records to provide some
> additional context for log analysis. The distinction isn't limited to
> the value itself, but also to how it is used; it is an "audit
> container ID" and not a "container ID" because this value is
> exclusively for use by the audit subsystem. We are very intentionally
> not adding a generic container ID to the kernel. If the kernel does
> ever grow a general purpose container ID we will be one of the first
> ones in line to make use of it, but we are not going to be the ones to
> generically add containers to the kernel. Enough people already hate
> audit ;)
>
> > > I'm not interested in supporting/merging something that isn't useful;
> > > if this doesn't work for your use case then we need to figure out what
> > > would work. It sounds like nested containers are much more common in
> > > the lxc world, can you elaborate a bit more on this?
> > >
> > > As far as the possible solutions you mention above, I'm not sure I
> > > like the per-userns audit container IDs, I'd much rather just emit the
> > > necessary tracking information via the audit record stream and let the
> > > log analysis tools figure it out. However, the bigger question is how
> > > to limit (re)setting the audit container ID when you are in a non-init
> > > userns. For reasons already mentioned, using capable() is a non
> > > starter for everything but the initial userns, and using ns_capable()
> > > is equally poor as it essentially allows any userns the ability to
> > > munge its audit container ID (obviously not good). It appears we
> > > need a different method for controlling access to the audit container
> > > ID.
> >
> > One option would be to make it a string, and have it be append only.
> > That should be safe with no checks.
> >
> > I know there was a long thread about what type to make this thing. I
> > think you could accomplish the append-only-ness with a u64 if you had
> > some rule about only allowing setting lower order bits than those that
> > are already set. With 4 bits for simplicity:
> >
> > 1100 # initial container id
> > 1100 -> 1011 # not allowed
> > 1100 -> 1101 # allowed, but now 1101 is set in stone since there are
> > # no lower order bits left
> >
> > There are probably fancier ways to do it if you actually understand
> > math :)
>
> ;)
>
> > Since userns nesting is limited to 32 levels (right now, IIRC), and
> > you have 64 bits, this might be reasonable. You could just teach
> > container engines to use the first say N bits for themselves, with a 1
> > bit for the barrier at the end.
>
> I like the creativity, but I worry that at some point these
> limitations are going to be raised (limits have a funny way of doing
> that over time) and we will be in trouble. I say "trouble" because I
> want to be able to quickly do an audit container ID comparison and
> we're going to pay a penalty for these larger values (we'll need this
> when we add multiple auditd support and the requisite record routing).
>
> Thinking about this makes me also realize we probably need to think a
> bit longer about audit container ID conflicts between orchestrators.
> Right now we just take the value that is given to us by the
> orchestrator, but if we want to allow multiple container orchestrators
> to work without some form of cooperation in userspace (I think we have
> to assume the orchestrators will not talk to each other) we likely
> need to have some way to block reuse of an audit container ID. We
> would either need to prevent the orchestrator from explicitly setting
> an audit container ID to a currently in use value, or instead generate
> the audit container ID in the kernel upon an event triggered by the
> orchestrator (e.g. a write to a /proc file). I suspect we should
> start looking at the idr code, I think we will need to make use of it.

My first reaction to using the IDR code is that once an idr is given up,
it can be reused. I suppose we request IDRs and then never give them up
to avoid reuse...
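
The reuse concern shows up with any lowest-free allocator. The sketch
below is a toy fixed-size pool with hypothetical names (pool_alloc(),
pool_free()); it is not the kernel's idr interface. It only illustrates
that once an ID is released, the very next allocation can hand it out
again.

```c
#include <assert.h>	/* only needed for the usage check noted below */
#include <stdbool.h>

#define POOL_SIZE 8	/* tiny pool, for illustration only */

static bool in_use[POOL_SIZE];

/* Hand out the lowest free ID, or -1 if the pool is exhausted. */
int pool_alloc(void)
{
	for (int id = 0; id < POOL_SIZE; id++) {
		if (!in_use[id]) {
			in_use[id] = true;
			return id;
		}
	}
	return -1;
}

/* Return an ID to the pool; it becomes eligible for reuse at once. */
void pool_free(int id)
{
	if (id >= 0 && id < POOL_SIZE)
		in_use[id] = false;
}
```

Allocating twice, freeing the first ID and allocating again returns the
first ID a second time, which is exactly the reuse that never freeing
IDs would avoid.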

I already had some ideas of preventing an existing ID from being reused,
but that makes the practice of some container engines injecting
processes into existing containers difficult if not impossible.

> paul moore
> http://www.paul-moore.com

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-05-31 12:47:37

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Thu, May 30, 2019 at 8:21 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-05-30 19:26, Paul Moore wrote:
> > On Thu, May 30, 2019 at 5:29 PM Tycho Andersen <[email protected]> wrote:
> > > On Thu, May 30, 2019 at 03:29:32PM -0400, Paul Moore wrote:
> > > >
> > > > [REMINDER: It is an "*audit* container ID" and not a general
> > > > "container ID" ;) Smiley aside, I'm not kidding about that part.]
> > >
> > > This sort of seems like a distinction without a difference; presumably
> > > audit is going to want to differentiate between everything that people
> > > in userspace call a container. So you'll have to support all this
> > > insanity anyway, even if it's "not a container ID".
> >
> > That's not quite right. Audit doesn't care about what a container is,
> > or is not, it also doesn't care if the "audit container ID" actually
> > matches the ID used by the container engine in userspace and I think
> > that is a very important line to draw. Audit is simply given a value
> > which it calls the "audit container ID", it ensures that the value is
> > inherited appropriately (e.g. children inherit their parent's audit
> > container ID), and it uses the value in audit records to provide some
> > additional context for log analysis. The distinction isn't limited to
> > the value itself, but also to how it is used; it is an "audit
> > container ID" and not a "container ID" because this value is
> > exclusively for use by the audit subsystem. We are very intentionally
> > not adding a generic container ID to the kernel. If the kernel does
> > ever grow a general purpose container ID we will be one of the first
> > ones in line to make use of it, but we are not going to be the ones to
> > generically add containers to the kernel. Enough people already hate
> > audit ;)
> >
> > > > I'm not interested in supporting/merging something that isn't useful;
> > > > if this doesn't work for your use case then we need to figure out what
> > > > would work. It sounds like nested containers are much more common in
> > > > the lxc world, can you elaborate a bit more on this?
> > > >
> > > > As far as the possible solutions you mention above, I'm not sure I
> > > > like the per-userns audit container IDs, I'd much rather just emit the
> > > > necessary tracking information via the audit record stream and let the
> > > > log analysis tools figure it out. However, the bigger question is how
> > > > to limit (re)setting the audit container ID when you are in a non-init
> > > > userns. For reasons already mentioned, using capable() is a non
> > > > starter for everything but the initial userns, and using ns_capable()
> > > > is equally poor as it essentially allows any userns the ability to
> > > > munge its audit container ID (obviously not good). It appears we
> > > > need a different method for controlling access to the audit container
> > > > ID.
> > >
> > > One option would be to make it a string, and have it be append only.
> > > That should be safe with no checks.
> > >
> > > I know there was a long thread about what type to make this thing. I
> > > think you could accomplish the append-only-ness with a u64 if you had
> > > some rule about only allowing setting lower order bits than those that
> > > are already set. With 4 bits for simplicity:
> > >
> > > 1100 # initial container id
> > > 1100 -> 1011 # not allowed
> > > 1100 -> 1101 # allowed, but now 1101 is set in stone since there are
> > > # no lower order bits left
> > >
> > > There are probably fancier ways to do it if you actually understand
> > > math :)
> >
> > ;)
> >
> > > Since userns nesting is limited to 32 levels (right now, IIRC), and
> > > you have 64 bits, this might be reasonable. You could just teach
> > > container engines to use the first say N bits for themselves, with a 1
> > > bit for the barrier at the end.
> >
> > I like the creativity, but I worry that at some point these
> > limitations are going to be raised (limits have a funny way of doing
> > that over time) and we will be in trouble. I say "trouble" because I
> > want to be able to quickly do an audit container ID comparison and
> > we're going to pay a penalty for these larger values (we'll need this
> > when we add multiple auditd support and the requisite record routing).
> >
> > Thinking about this makes me also realize we probably need to think a
> > bit longer about audit container ID conflicts between orchestrators.
> > Right now we just take the value that is given to us by the
> > orchestrator, but if we want to allow multiple container orchestrators
> > to work without some form of cooperation in userspace (I think we have
> > to assume the orchestrators will not talk to each other) we likely
> > need to have some way to block reuse of an audit container ID. We
> > would either need to prevent the orchestrator from explicitly setting
> > an audit container ID to a currently in use value, or instead generate
> > the audit container ID in the kernel upon an event triggered by the
> > orchestrator (e.g. a write to a /proc file). I suspect we should
> > start looking at the idr code, I think we will need to make use of it.
>
> My first reaction to using the IDR code is that once an idr is given up,
> it can be reused. I suppose we request IDRs and then never give them up
> to avoid reuse...

I'm not sure we ever want to guarantee that an audit container ID
won't be reused during the lifetime of the system, it is a discrete
integer after all. What I think we do want to guarantee is that we
won't allow an unintentional audit container ID collision between
different orchestrators; if a single orchestrator wants to reuse an
audit container ID then that is its choice.

> I already had some ideas of preventing an existing ID from being reused,

Cool. I only made the idr suggestion since it is used for PIDs and
solves a very similar problem.

> but that makes the practice of some container engines injecting
> processes into existing containers difficult if not impossible.

Yes, we'll need some provision to indicate which orchestrator
"controls" that particular audit container ID, and allow that
orchestrator to reuse that particular audit container ID (until all
those containers disappear and the audit container ID is given back to
the pool).
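
As a rough illustration of that provision, here is a userspace sketch
of an ownership-aware table. Everything in it is hypothetical: the
names contid_claim() and contid_release(), the integer "owner" handle
standing in for an orchestrator, and the fixed-size array are
assumptions for the example, not a proposed kernel interface.

```c
#include <assert.h>	/* only needed for the usage check noted below */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IDS 16	/* small fixed table, for illustration only */

struct contid_entry {
	uint64_t contid;	/* audit container ID being tracked */
	int owner;		/* hypothetical orchestrator handle */
	unsigned int users;	/* containers currently using the ID */
};

static struct contid_entry table[MAX_IDS];

/* Claim contid for owner: 0 on success, -EEXIST if another owner holds it. */
int contid_claim(uint64_t contid, int owner)
{
	struct contid_entry *free_slot = NULL;

	for (int i = 0; i < MAX_IDS; i++) {
		struct contid_entry *e = &table[i];

		if (e->users == 0) {
			if (!free_slot)
				free_slot = e;	/* remember first free slot */
			continue;
		}
		if (e->contid == contid) {
			if (e->owner != owner)
				return -EEXIST;	/* collision between owners */
			e->users++;	/* same orchestrator may reuse it */
			return 0;
		}
	}
	if (!free_slot)
		return -ENOSPC;
	free_slot->contid = contid;
	free_slot->owner = owner;
	free_slot->users = 1;
	return 0;
}

/* Drop one use; the ID returns to the pool when the last user is gone. */
void contid_release(uint64_t contid)
{
	for (int i = 0; i < MAX_IDS; i++) {
		if (table[i].users && table[i].contid == contid) {
			table[i].users--;
			return;
		}
	}
}
```

In this sketch a second orchestrator gets -EEXIST while the ID is in
use, the owning orchestrator can claim it again (for example to inject
a process into an existing container), and any orchestrator can claim
it once every user has released it.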

--
paul moore
http://www.paul-moore.com

2019-06-03 20:27:07

by Steve Grubb

[permalink] [raw]
Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

Hello Paul,

I am curious about this. We seemed to be close to getting this patch pulled
in. A lot of people are waiting for it. Can you summarize what you think the
patches need and who we think needs to do it? I'm lost. Do the LXC people need
to propose something? Does Richard? Someone else? Who's got the ball?

Thanks,
-Steve

On Friday, May 31, 2019 8:44:45 AM EDT Paul Moore wrote:
> On Thu, May 30, 2019 at 8:21 PM Richard Guy Briggs <[email protected]> wrote:
> > On 2019-05-30 19:26, Paul Moore wrote:
> > > On Thu, May 30, 2019 at 5:29 PM Tycho Andersen <[email protected]> wrote:
> > > > On Thu, May 30, 2019 at 03:29:32PM -0400, Paul Moore wrote:
> > > > > [REMINDER: It is an "*audit* container ID" and not a general
> > > > > "container ID" ;) Smiley aside, I'm not kidding about that part.]
> > > >
> > > > This sort of seems like a distinction without a difference;
> > > > presumably
> > > > audit is going to want to differentiate between everything that
> > > > people
> > > > in userspace call a container. So you'll have to support all this
> > > > insanity anyway, even if it's "not a container ID".
> > >
> > > That's not quite right. Audit doesn't care about what a container is,
> > > or is not, it also doesn't care if the "audit container ID" actually
> > > matches the ID used by the container engine in userspace and I think
> > > that is a very important line to draw. Audit is simply given a value
> > > which it calls the "audit container ID", it ensures that the value is
> > > inherited appropriately (e.g. children inherit their parent's audit
> > > container ID), and it uses the value in audit records to provide some
> > > additional context for log analysis. The distinction isn't limited to
> > > the value itself, but also to how it is used; it is an "audit
> > > container ID" and not a "container ID" because this value is
> > > exclusively for use by the audit subsystem. We are very intentionally
> > > not adding a generic container ID to the kernel. If the kernel does
> > > ever grow a general purpose container ID we will be one of the first
> > > ones in line to make use of it, but we are not going to be the ones to
> > > generically add containers to the kernel. Enough people already hate
> > > audit ;)
> > >
> > > > > I'm not interested in supporting/merging something that isn't
> > > > > useful;
> > > > > if this doesn't work for your use case then we need to figure out
> > > > > what
> > > > > would work. It sounds like nested containers are much more common
> > > > > in
> > > > > the lxc world, can you elaborate a bit more on this?
> > > > >
> > > > > As far as the possible solutions you mention above, I'm not sure I
> > > > > like the per-userns audit container IDs, I'd much rather just emit
> > > > > the
> > > > > necessary tracking information via the audit record stream and let
> > > > > the
> > > > > log analysis tools figure it out. However, the bigger question is
> > > > > how
> > > > > to limit (re)setting the audit container ID when you are in a
> > > > > non-init
> > > > > userns. For reasons already mentioned, using capable() is a non
> > > > > starter for everything but the initial userns, and using
> > > > > ns_capable()
> > > > > is equally poor as it essentially allows any userns the ability to
> > > > > munge it's audit container ID (obviously not good). It appears we
> > > > > need a different method for controlling access to the audit
> > > > > container
> > > > > ID.
> > > >
> > > > One option would be to make it a string, and have it be append only.
> > > > That should be safe with no checks.
> > > >
> > > > I know there was a long thread about what type to make this thing. I
> > > > think you could accomplish the append-only-ness with a u64 if you had
> > > > some rule about only allowing setting lower order bits than those
> > > > that
> > > > are already set. With 4 bits for simplicity:
> > > >
> > > > 1100 # initial container id
> > > > 1100 -> 1011 # not allowed
> > > > 1100 -> 1101 # allowed, but now 1101 is set in stone since there are
> > > >
> > > > # no lower order bits left
> > > >
> > > > There are probably fancier ways to do it if you actually understand
> > > > math :)
> > >
> > > ;)
> > >
> > > > Since userns nesting is limited to 32 levels (right now, IIRC), and
> > > > you have 64 bits, this might be reasonable. You could just teach
> > > > container engines to use the first say N bits for themselves, with a
> > > > 1
> > > > bit for the barrier at the end.
> > >
> > > I like the creativity, but I worry that at some point these
> > > limitations are going to be raised (limits have a funny way of doing
> > > that over time) and we will be in trouble. I say "trouble" because I
> > > want to be able to quickly do an audit container ID comparison and
> > > we're going to pay a penalty for these larger values (we'll need this
> > > when we add multiple auditd support and the requisite record routing).
> > >
> > > Thinking about this makes me also realize we probably need to think a
> > > bit longer about audit container ID conflicts between orchestrators.
> > > Right now we just take the value that is given to us by the
> > > orchestrator, but if we want to allow multiple container orchestrators
> > > to work without some form of cooperation in userspace (I think we have
> > > to assume the orchestrators will not talk to each other) we likely
> > > need to have some way to block reuse of an audit container ID. We
> > > would either need to prevent the orchestrator from explicitly setting
> > > an audit container ID to a currently in use value, or instead generate
> > > the audit container ID in the kernel upon an event triggered by the
> > > orchestrator (e.g. a write to a /proc file). I suspect we should
> > > start looking at the idr code, I think we will need to make use of it.
> >
> > My first reaction to using the IDR code is that once an idr is given up,
> > it can be reused. I suppose we request IDRs and then never give them up
> > to avoid reuse...
>
> I'm not sure we ever what to guarantee that an audit container ID
> won't be reused during the lifetime of the system, it is a discrete
> integer after all. What I think we do want to guarantee is that we
> won't allow an unintentional audit container ID collision between
> different orchestrators; if a single orchestrator wants to reuse an
> audit container ID then that is its choice.
>
> > I already had some ideas of preventing an existing ID from being reused,
>
> Cool. I only made the idr suggestion since it is used for PIDs and
> solves a very similar problem.
>
> > but that makes the practice of some container engines injecting
> > processes into existing containers difficult if not impossible.
>
> Yes, we'll need some provision to indicate which orchestrator
> "controls" that particular audit container ID, and allow that
> orchestrator to reuse that particular audit container ID (until all
> those containers disappear and the audit container ID is given back to
> the pool).




2019-06-18 22:13:17

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Mon, Jun 3, 2019 at 4:24 PM Steve Grubb <[email protected]> wrote:
> Hello Paul,
>
> I am curious about this. We seemed to be close to getting this patch pulled
> in. A lot of people are waiting for it. Can you summarize what you think the
> patches need and who we think needs to do it? I'm lost. Does LXC people need
> to propose something? Does Richard? Someone else? Who's got the ball?

[My apologies, this was lost in my inbox and I just now noticed it.]

Please don't top post on things sent to the mailing lists Steve, you
know better than that.

Yes, things were moving along well, but upon talking with the LXC
folks it appears we underestimated the importance of nested
orchestrators. I suspect my reply to Dan on the 4th covered your
questions; if you didn't see it, here is the relevant snippet:

"To be clear, that's where we are at: we need to figure out what the
kernel API would look like to support nested container orchestrators.
My gut feeling is that this isn't going to be terribly complicated
compared to the rest of the audit container ID work, but it is going
to be some work. We had a discussion about some potential solutions
in the cover letter and it sounds like Richard is working up some
ideas now, let's wait to see what that looks like."

... and that is where we are at. I'm looking forward to seeing
Richard's next patchset.

--
paul moore
http://www.paul-moore.com

2019-06-18 22:47:30

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On 2019-06-18 18:12, Paul Moore wrote:
> On Mon, Jun 3, 2019 at 4:24 PM Steve Grubb <[email protected]> wrote:
> > Hello Paul,
> >
> > I am curious about this. We seemed to be close to getting this patch pulled
> > in. A lot of people are waiting for it. Can you summarize what you think the
> > patches need and who we think needs to do it? I'm lost. Does LXC people need
> > to propose something? Does Richard? Someone else? Who's got the ball?
>
> [My apologies, this was lost in my inbox and I just not noticed it.]
>
> Please don't top post on things sent to the mailing lists Steve, you
> know better than that.
>
> Yes, things were moving along well, but upon talking with the LXC
> folks it appears we underestimated the importance of nested
> orchestrators. I suspect my reply to Dan on the 4th covered your
> questions, if you didn't see it, here is the relevant snippet:
>
> "To be clear, that's where we are at: we need to figure out what the
> kernel API would look like to support nested container orchestrators.
> My gut feeling is that this isn't going to be terribly complicated
> compared to the rest of the audit container ID work, but it is going
> to be some work. We had a discussion about some potential solutions
> in the cover letter and it sounds like Richard is working up some
> ideas now, let's wait to see what that looks like."
>
> ... and that is where we are at. I'm looking forward to seeing
> Richard's next patchset.

I've rebased everything and am trying out some code to see if it will
address the concerns raised... There will be more overhead on contid
write, and a tiny bit more for normal operations...

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-07-08 22:45:34

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On 2019-05-29 11:29, Paul Moore wrote:
> On Wed, May 29, 2019 at 10:57 AM Tycho Andersen <[email protected]> wrote:
> >
> > On Mon, Apr 08, 2019 at 11:39:09PM -0400, Richard Guy Briggs wrote:
> > > It is not permitted to unset the audit container identifier.
> > > A child inherits its parent's audit container identifier.
> >
> > ...
> >
> > > /**
> > > + * audit_set_contid - set current task's audit contid
> > > + * @contid: contid value
> > > + *
> > > + * Returns 0 on success, -EPERM on permission failure.
> > > + *
> > > + * Called (set) from fs/proc/base.c::proc_contid_write().
> > > + */
> > > +int audit_set_contid(struct task_struct *task, u64 contid)
> > > +{
> > > + u64 oldcontid;
> > > + int rc = 0;
> > > + struct audit_buffer *ab;
> > > + uid_t uid;
> > > + struct tty_struct *tty;
> > > + char comm[sizeof(current->comm)];
> > > +
> > > + task_lock(task);
> > > + /* Can't set if audit disabled */
> > > + if (!task->audit) {
> > > + task_unlock(task);
> > > + return -ENOPROTOOPT;
> > > + }
> > > + oldcontid = audit_get_contid(task);
> > > + read_lock(&tasklist_lock);
> > > + /* Don't allow the audit containerid to be unset */
> > > + if (!audit_contid_valid(contid))
> > > + rc = -EINVAL;
> > > + /* if we don't have caps, reject */
> > > + else if (!capable(CAP_AUDIT_CONTROL))
> > > + rc = -EPERM;
> > > + /* if task has children or is not single-threaded, deny */
> > > + else if (!list_empty(&task->children))
> > > + rc = -EBUSY;
> > > + else if (!(thread_group_leader(task) && thread_group_empty(task)))
> > > + rc = -EALREADY;
> > > + read_unlock(&tasklist_lock);
> > > + if (!rc)
> > > + task->audit->contid = contid;
> > > + task_unlock(task);
> > > +
> > > + if (!audit_enabled)
> > > + return rc;
> >
> > ...but it is allowed to change it (assuming
> > capable(CAP_AUDIT_CONTROL), of course)? Seems like this might be more
> > immediately useful since we still live in the world of majority
> > privileged containers if we didn't allow changing it, in addition to
> > un-setting it.
>
> The idea is that only container orchestrators should be able to
> set/modify the audit container ID, and since setting the audit
> container ID can have a significant effect on the records captured
> (and their routing to multiple daemons when we get there) modifying
> the audit container ID is akin to modifying the audit configuration
> which is why it is gated by CAP_AUDIT_CONTROL. The current thinking
> is that you would only change the audit container ID from one
> set/inherited value to another if you were nesting containers, in
> which case the nested container orchestrator would need to be granted
> CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> compromise). We did consider allowing for a chain of nested audit
> container IDs, but the implications of doing so are significant
> (implementation mess, runtime cost, etc.) so we are leaving that out
> of this effort.

We had previously discussed the idea of restricting
orchestrators/engines to setting the audit container identifier only on
their own descendants, but it was discarded. I've now added a check to
enforce this.

I've also added a check to ensure that a process can't set its own audit
container identifier and that, if the identifier is already set, the
orchestrator/engine must be in a descendant user namespace of the
orchestrator that set the previously inherited audit container
identifier.
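
For readers following along, the three checks described above can be
modelled in userspace roughly as follows. This is only an illustrative
sketch, not the code from the patchset: the `model_*` structures, the
`MODEL_CONTID_UNSET` sentinel, the `contid_setter_ns` field, and the
choice of error codes are all assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel meaning "no audit container ID set" in this model. */
#define MODEL_CONTID_UNSET UINT64_MAX

/* Hypothetical model of a user namespace; parent is NULL for the initial one. */
struct model_userns {
	const struct model_userns *parent;
};

/* Hypothetical model of a task; this is not struct task_struct. */
struct model_task {
	const struct model_task *parent;             /* NULL for the root task */
	const struct model_userns *userns;           /* userns the task runs in */
	uint64_t contid;                             /* inherited or explicitly set */
	const struct model_userns *contid_setter_ns; /* userns of whoever set contid */
};

/* True if t is a strict descendant of anc in the process tree. */
static bool model_task_is_descendant(const struct model_task *anc,
				     const struct model_task *t)
{
	for (t = t->parent; t != NULL; t = t->parent)
		if (t == anc)
			return true;
	return false;
}

/* True if ns is a strict descendant of anc in the userns tree. */
static bool model_userns_is_descendant(const struct model_userns *anc,
				       const struct model_userns *ns)
{
	for (ns = ns->parent; ns != NULL; ns = ns->parent)
		if (ns == anc)
			return true;
	return false;
}

/* Apply the three checks, then record the new ID and who set it. */
int model_set_contid(const struct model_task *setter,
		     struct model_task *target, uint64_t contid)
{
	assert(setter != NULL && target != NULL);

	if (contid == MODEL_CONTID_UNSET)
		return -EINVAL;	/* unsetting is not permitted */
	if (setter == target)
		return -EPERM;	/* a process can't set its own ID */
	if (!model_task_is_descendant(setter, target))
		return -EPERM;	/* only descendants may be labelled */
	if (target->contid != MODEL_CONTID_UNSET &&
	    !model_userns_is_descendant(target->contid_setter_ns,
					setter->userns))
		return -EPERM;	/* re-setting needs a descendant userns */

	target->contid = contid;
	target->contid_setter_ns = setter->userns;
	return 0;
}
```

In this model a child is expected to be created with its parent's
`contid` and `contid_setter_ns` copied in, which mirrors the inheritance
rule stated in the patch description.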

> From a practical perspective, un-setting the audit container ID is
> pretty much the same as changing it from one set value to another so
> most of the above applies to that case as well.
>
> --
> paul moore
> http://www.paul-moore.com

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-07-08 22:46:08

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On 2019-05-30 19:26, Paul Moore wrote:
> On Thu, May 30, 2019 at 5:29 PM Tycho Andersen <[email protected]> wrote:
> > On Thu, May 30, 2019 at 03:29:32PM -0400, Paul Moore wrote:
> > >
> > > [REMINDER: It is an "*audit* container ID" and not a general
> > > "container ID" ;) Smiley aside, I'm not kidding about that part.]
> >
> > This sort of seems like a distinction without a difference; presumably
> > audit is going to want to differentiate between everything that people
> > in userspace call a container. So you'll have to support all this
> > insanity anyway, even if it's "not a container ID".
>
> That's not quite right. Audit doesn't care about what a container is,
> or is not, it also doesn't care if the "audit container ID" actually
> matches the ID used by the container engine in userspace and I think
> that is a very important line to draw. Audit is simply given a value
> which it calls the "audit container ID", it ensures that the value is
> inherited appropriately (e.g. children inherit their parent's audit
> container ID), and it uses the value in audit records to provide some
> additional context for log analysis. The distinction isn't limited to
> the value itself, but also to how it is used; it is an "audit
> container ID" and not a "container ID" because this value is
> exclusively for use by the audit subsystem. We are very intentionally
> not adding a generic container ID to the kernel. If the kernel does
> ever grow a general purpose container ID we will be one of the first
> ones in line to make use of it, but we are not going to be the ones to
> generically add containers to the kernel. Enough people already hate
> audit ;)
>
> > > I'm not interested in supporting/merging something that isn't useful;
> > > if this doesn't work for your use case then we need to figure out what
> > > would work. It sounds like nested containers are much more common in
> > > the lxc world, can you elaborate a bit more on this?
> > >
> > > As far as the possible solutions you mention above, I'm not sure I
> > > like the per-userns audit container IDs, I'd much rather just emit the
> > > necessary tracking information via the audit record stream and let the
> > > log analysis tools figure it out. However, the bigger question is how
> > > to limit (re)setting the audit container ID when you are in a non-init
> > > userns. For reasons already mentioned, using capable() is a non
> > > starter for everything but the initial userns, and using ns_capable()
> > > is equally poor as it essentially allows any userns the ability to
> > > munge it's audit container ID (obviously not good). It appears we
> > > need a different method for controlling access to the audit container
> > > ID.
> >
> > One option would be to make it a string, and have it be append only.
> > That should be safe with no checks.
> >
> > I know there was a long thread about what type to make this thing. I
> > think you could accomplish the append-only-ness with a u64 if you had
> > some rule about only allowing setting lower order bits than those that
> > are already set. With 4 bits for simplicity:
> >
> > 1100 # initial container id
> > 1100 -> 1011 # not allowed
> > 1100 -> 1101 # allowed, but now 1101 is set in stone since there are
> > # no lower order bits left
> >
> > There are probably fancier ways to do it if you actually understand
> > math :)
>
> ;)
>
> > Since userns nesting is limited to 32 levels (right now, IIRC), and
> > you have 64 bits, this might be reasonable. You could just teach
> > container engines to use the first say N bits for themselves, with a 1
> > bit for the barrier at the end.
>
> I like the creativity, but I worry that at some point these
> limitations are going to be raised (limits have a funny way of doing
> that over time) and we will be in trouble. I say "trouble" because I
> want to be able to quickly do an audit container ID comparison and
> we're going to pay a penalty for these larger values (we'll need this
> when we add multiple auditd support and the requisite record routing).
>
> Thinking about this makes me also realize we probably need to think a
> bit longer about audit container ID conflicts between orchestrators.
> Right now we just take the value that is given to us by the
> orchestrator, but if we want to allow multiple container orchestrators
> to work without some form of cooperation in userspace (I think we have
> to assume the orchestrators will not talk to each other) we likely
> need to have some way to block reuse of an audit container ID. We
> would either need to prevent the orchestrator from explicitly setting
> an audit container ID to a currently in use value, or instead generate
> the audit container ID in the kernel upon an event triggered by the
> orchestrator (e.g. a write to a /proc file). I suspect we should
> start looking at the idr code, I think we will need to make use of it.
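
For reference, the append-only u64 rule quoted above (never clear a bit, and only set bits below the lowest bit already set) can be written as a single check. This is a minimal userspace sketch, not code from the patchset: the helper name contid_append_ok() is hypothetical, and treating 0 as "nothing set yet" is an assumption made only for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when changing the ID from old_id to
 * new_id keeps every bit that was already set and only adds bits that sit
 * strictly below the lowest bit set in old_id.
 */
static bool contid_append_ok(uint64_t old_id, uint64_t new_id)
{
	/* Lowest set bit of old_id (0 when old_id is 0). */
	uint64_t lowest = old_id & (~old_id + 1);
	/* Bits present in new_id that were not present in old_id. */
	uint64_t added = new_id & ~old_id;

	if ((new_id & old_id) != old_id)	/* a set bit was cleared */
		return false;
	if (added == 0)				/* nothing was appended */
		return false;
	if (old_id == 0)			/* assumption: 0 means unset */
		return true;
	return added < lowest;			/* all new bits are lower order */
}
```

With the 4-bit values from the example, contid_append_ok(0xC, 0xB) is refused and contid_append_ok(0xC, 0xD) is accepted; once the ID is 0xD no further change passes the check, because no bit lies below bit 0.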

To address this, I'd suggest enforcing that an orchestrator may only set
the audit container identifier of its descendants, and maintaining a
master list of audit container identifiers (with a hash table if
necessary later) that includes the container owner.

This also allows the orchestrator/engine to inject processes into
existing containers by checking that the audit container identifier is
only used again by the same owner.

I have working code for both.
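
A rough userspace sketch of that policy follows. It is not the working code referred to above, only an illustration under assumptions: struct task, struct contid_registry and contid_set() are hypothetical names, the "master list" is a fixed-size array, and ownership is tracked by task pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONTIDS 64			/* arbitrary limit for the sketch */

struct task {				/* hypothetical stand-in for a task */
	int pid;
	const struct task *parent;	/* NULL for the root task */
};

struct contid_entry {
	uint64_t contid;
	const struct task *owner;	/* orchestrator that first set it */
};

struct contid_registry {
	struct contid_entry entries[MAX_CONTIDS];
	size_t count;
};

/* True when t is a strict descendant of ancestor (a task is not its own). */
static bool is_descendant(const struct task *t, const struct task *ancestor)
{
	for (t = t->parent; t; t = t->parent)
		if (t == ancestor)
			return true;
	return false;
}

/* Returns 0 when the set is permitted (and recorded), -1 when refused. */
static int contid_set(struct contid_registry *reg, const struct task *setter,
		      const struct task *target, uint64_t contid)
{
	size_t i;

	/* Rule 1: an orchestrator may only label its own descendants. */
	if (!is_descendant(target, setter))
		return -1;

	/* Rule 2: an ID already in use may only be reused by its owner. */
	for (i = 0; i < reg->count; i++)
		if (reg->entries[i].contid == contid)
			return reg->entries[i].owner == setter ? 0 : -1;

	if (reg->count == MAX_CONTIDS)
		return -1;
	reg->entries[reg->count].contid = contid;
	reg->entries[reg->count].owner = setter;
	reg->count++;
	return 0;
}
```

In this model an orchestrator can place a second child into container 123456 after creating it, while a different orchestrator asking for 123456 is refused.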

> paul moore
> http://www.paul-moore.com

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-07-08 22:46:29

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On 2019-05-30 15:29, Paul Moore wrote:
> On Thu, May 30, 2019 at 1:09 PM Serge E. Hallyn <[email protected]> wrote:
> > On Wed, May 29, 2019 at 06:39:48PM -0400, Paul Moore wrote:
> > > On Wed, May 29, 2019 at 6:28 PM Tycho Andersen <[email protected]> wrote:
> > > > On Wed, May 29, 2019 at 12:03:58PM -0400, Paul Moore wrote:
> > > > > On Wed, May 29, 2019 at 11:34 AM Tycho Andersen <[email protected]> wrote:
> > > > > > On Wed, May 29, 2019 at 11:29:05AM -0400, Paul Moore wrote:
> > > > > > > On Wed, May 29, 2019 at 10:57 AM Tycho Andersen <[email protected]> wrote:
> > > > > > > > On Mon, Apr 08, 2019 at 11:39:09PM -0400, Richard Guy Briggs wrote:
>
> ...
>
> > > > > > > The current thinking
> > > > > > > is that you would only change the audit container ID from one
> > > > > > > set/inherited value to another if you were nesting containers, in
> > > > > > > which case the nested container orchestrator would need to be granted
> > > > > > > CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> > > > > > > compromise).
> > > >
> > > > won't work in user namespaced containers, because they will never be
> > > > capable(CAP_AUDIT_CONTROL); so I don't think this will work for
> > > > nesting as is. But maybe nobody cares :)
> > >
> > > That's fun :)
> > >
> > > To be honest, I've never been a big fan of supporting nested
> > > containers from an audit perspective, so I'm not really too upset
> > > about this. The k8s/cri-o folks seem okay with this, or at least I
> > > haven't heard any objections; lxc folks, what do you have to say?
> >
> > I actually thought the answer to this (when last I looked, "some time" ago)
> > was that userspace should track an audit message saying "task X in
> > container Y is changing its auditid to Z", and then decide to also track Z.
> > This should be doable, but a lot of extra work in userspace.
> >
> > Per-userns containerids would also work. So task X1 is in containerid
> > 1 on the host and creates a new task Y in new userns; it continues to
> > be reported in init_user_ns as containerid 1 forever; but in its own
> > userns it can request to be known as some other containerid. Audit
> > socks would be per-userns, allowing root in a container to watch for
> > audit events in its own (and descendent) namespaces.
> >
> > But again I'm sure we've gone over all this in the last few years.
> >
> > I suppose we can look at this as a "first step", and talk about
> > making it user-ns-nestable later. But agreed it's not useful in a
> > lot of situations as is.
>
> [REMINDER: It is an "*audit* container ID" and not a general
> "container ID" ;) Smiley aside, I'm not kidding about that part.]
>
> I'm not interested in supporting/merging something that isn't useful;
> if this doesn't work for your use case then we need to figure out what
> would work. It sounds like nested containers are much more common in
> the lxc world, can you elaborate a bit more on this?
>
> As far as the possible solutions you mention above, I'm not sure I
> like the per-userns audit container IDs, I'd much rather just emit the
> necessary tracking information via the audit record stream and let the
> log analysis tools figure it out. However, the bigger question is how
> to limit (re)setting the audit container ID when you are in a non-init
> userns. For reasons already mentioned, using capable() is a
> non-starter for everything but the initial userns, and using ns_capable()
> is equally poor as it essentially allows any userns the ability to
> munge its audit container ID (obviously not good). It appears we
> need a different method for controlling access to the audit container
> ID.

We're not quite ready yet for multiple audit daemons and possibly not
yet for audit namespaces, but this is starting to look a lot like the
latter.

If we can't trust ns_capable() then why are we passing on
CAP_AUDIT_CONTROL? It is being passed down, purposely not stripped,
by the orchestrator/engine. If ns_capable() isn't inherited how is it
gained otherwise? Can it be inserted by a container image? I think the
answer is "no". Either we trust ns_capable() or we have audit
namespaces (recommended, based on the user namespace) (or both).

At this point I would say we are at an impasse unless we trust
ns_capable() or we implement audit namespaces.

I don't think another mechanism to trust nested orchestrators/engines
will buy us anything.

Am I missing something?

> Punting this to a LSM hook is an obvious thing to do, and something we
> might want to do anyway, but currently audit doesn't rely on the LSM
> for proper/safe operation and I'm not sure I want to change that now.
>
> The next obvious thing is to create some sort of access control knob
> in audit itself. Perhaps an auditctl operation that would allow the
> administrator to specify which containers, via their corresponding
> audit container IDs, are allowed to change their audit container ID?
> The permission granting would need to be done in the init userns, but
> it would allow containers with a non-init userns the ability to change
> their audit container ID. We would probably still want a
> ns_capable(CAP_AUDIT_CONTROL) restriction in this case.

This auditctl knob of which you speak is an additional API, not changing
the existing proposed one.

> Does anyone else have any other ideas?
>
> --
> paul moore
> http://www.paul-moore.com

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-07-08 23:26:25

by Paul Moore

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On July 8, 2019 8:12:56 PM Richard Guy Briggs <[email protected]> wrote:

> On 2019-05-30 19:26, Paul Moore wrote:
>> On Thu, May 30, 2019 at 5:29 PM Tycho Andersen <[email protected]> wrote:
>>> On Thu, May 30, 2019 at 03:29:32PM -0400, Paul Moore wrote:
>>>>
>>>>
>>>> [REMINDER: It is an "*audit* container ID" and not a general
>>>> "container ID" ;) Smiley aside, I'm not kidding about that part.]
>>>
>>> This sort of seems like a distinction without a difference; presumably
>>> audit is going to want to differentiate between everything that people
>>> in userspace call a container. So you'll have to support all this
>>> insanity anyway, even if it's "not a container ID".
>>
>> That's not quite right. Audit doesn't care about what a container is,
>> or is not, it also doesn't care if the "audit container ID" actually
>> matches the ID used by the container engine in userspace and I think
>> that is a very important line to draw. Audit is simply given a value
>> which it calls the "audit container ID", it ensures that the value is
>> inherited appropriately (e.g. children inherit their parent's audit
>> container ID), and it uses the value in audit records to provide some
>> additional context for log analysis. The distinction isn't limited to
>> the value itself, but also to how it is used; it is an "audit
>> container ID" and not a "container ID" because this value is
>> exclusively for use by the audit subsystem. We are very intentionally
>> not adding a generic container ID to the kernel. If the kernel does
>> ever grow a general purpose container ID we will be one of the first
>> ones in line to make use of it, but we are not going to be the ones to
>> generically add containers to the kernel. Enough people already hate
>> audit ;)
>>
>>>> I'm not interested in supporting/merging something that isn't useful;
>>>> if this doesn't work for your use case then we need to figure out what
>>>> would work. It sounds like nested containers are much more common in
>>>> the lxc world, can you elaborate a bit more on this?
>>>>
>>>>
>>>> As far as the possible solutions you mention above, I'm not sure I
>>>> like the per-userns audit container IDs, I'd much rather just emit the
>>>> necessary tracking information via the audit record stream and let the
>>>> log analysis tools figure it out. However, the bigger question is how
>>>> to limit (re)setting the audit container ID when you are in a non-init
>>>> userns. For reasons already mentioned, using capable() is a
>>>> non-starter for everything but the initial userns, and using ns_capable()
>>>> is equally poor as it essentially allows any userns the ability to
>>>> munge its audit container ID (obviously not good). It appears we
>>>> need a different method for controlling access to the audit container
>>>> ID.
>>>
>>> One option would be to make it a string, and have it be append only.
>>> That should be safe with no checks.
>>>
>>> I know there was a long thread about what type to make this thing. I
>>> think you could accomplish the append-only-ness with a u64 if you had
>>> some rule about only allowing setting lower order bits than those that
>>> are already set. With 4 bits for simplicity:
>>>
>>> 1100 # initial container id
>>> 1100 -> 1011 # not allowed
>>> 1100 -> 1101 # allowed, but now 1101 is set in stone since there are
>>> # no lower order bits left
>>>
>>> There are probably fancier ways to do it if you actually understand
>>> math :)
>>
>> ;)
>>
>>> Since userns nesting is limited to 32 levels (right now, IIRC), and
>>> you have 64 bits, this might be reasonable. You could just teach
>>> container engines to use the first say N bits for themselves, with a 1
>>> bit for the barrier at the end.
>>
>> I like the creativity, but I worry that at some point these
>> limitations are going to be raised (limits have a funny way of doing
>> that over time) and we will be in trouble. I say "trouble" because I
>> want to be able to quickly do an audit container ID comparison and
>> we're going to pay a penalty for these larger values (we'll need this
>> when we add multiple auditd support and the requisite record routing).
>>
>> Thinking about this makes me also realize we probably need to think a
>> bit longer about audit container ID conflicts between orchestrators.
>> Right now we just take the value that is given to us by the
>> orchestrator, but if we want to allow multiple container orchestrators
>> to work without some form of cooperation in userspace (I think we have
>> to assume the orchestrators will not talk to each other) we likely
>> need to have some way to block reuse of an audit container ID. We
>> would either need to prevent the orchestrator from explicitly setting
>> an audit container ID to a currently in use value, or instead generate
>> the audit container ID in the kernel upon an event triggered by the
>> orchestrator (e.g. a write to a /proc file). I suspect we should
>> start looking at the idr code, I think we will need to make use of it.
>
> To address this, I'd suggest that it is enforced to only allow the
> setting of descendants and to maintain a master list of audit container
> identifiers (with a hash table if necessary later) that includes the
> container owner.
>
> This also allows the orchestrator/engine to inject processes into
> existing containers by checking that the audit container identifier is
> only used again by the same owner.
>
> I have working code for both.

Just a quick note that due to some holiday travel I'm not going to be
able to adequately respond to your latest messages on this thread for
at least another week, likely a bit more. I'm only checking mail to put
out fires, and the audit container ID work tends to be something that
starts them ;)

--
paul moore
http://www.paul-moore.com




2019-07-15 20:39:52

by Paul Moore

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Mon, Jul 8, 2019 at 1:51 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-05-29 11:29, Paul Moore wrote:

...

> > The idea is that only container orchestrators should be able to
> > set/modify the audit container ID, and since setting the audit
> > container ID can have a significant effect on the records captured
> > (and their routing to multiple daemons when we get there) modifying
> > the audit container ID is akin to modifying the audit configuration
> > which is why it is gated by CAP_AUDIT_CONTROL. The current thinking
> > is that you would only change the audit container ID from one
> > set/inherited value to another if you were nesting containers, in
> > which case the nested container orchestrator would need to be granted
> > CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> > compromise). We did consider allowing for a chain of nested audit
> > container IDs, but the implications of doing so are significant
> > (implementation mess, runtime cost, etc.) so we are leaving that out
> > of this effort.
>
> We had previously discussed the idea of restricting
> orchestrators/engines to setting the audit container identifier only
> on their own descendants, but it was discarded. I've added a
> check to ensure this is now enforced.

When we weren't allowing nested orchestrators it wasn't necessary, but
with the move to support nesting I believe this will be a requirement.
We might also need/want to restrict audit container ID changes if a
descendant is acting as a container orchestrator and managing one or
more audit container IDs; although I'm less certain of the need for
this.

> I've also added a check to ensure that a process can't set its own audit
> container identifier ...

What does this protect against, or what problem does this solve?
Considering how easy it is to fork/exec, it seems like this could be
trivially bypassed.

> ... and that if the identifier is already set, then the
> orchestrator/engine must be in a descendant user namespace from the
> orchestrator that set the previously inherited audit container
> identifier.

You lost me here ... although I don't like the idea of relying on X
namespace inheritance for a hard coded policy on setting the audit
container ID; we've worked hard to keep this independent of any
definition of a "container" and it would sadden me greatly if we had
to go back on that.

--
paul moore
http://www.paul-moore.com

2019-07-15 21:05:22

by Paul Moore

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Mon, Jul 8, 2019 at 2:06 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-05-30 15:29, Paul Moore wrote:

...

> > [REMINDER: It is an "*audit* container ID" and not a general
> > "container ID" ;) Smiley aside, I'm not kidding about that part.]
> >
> > I'm not interested in supporting/merging something that isn't useful;
> > if this doesn't work for your use case then we need to figure out what
> > would work. It sounds like nested containers are much more common in
> > the lxc world, can you elaborate a bit more on this?
> >
> > As far as the possible solutions you mention above, I'm not sure I
> > like the per-userns audit container IDs, I'd much rather just emit the
> > necessary tracking information via the audit record stream and let the
> > log analysis tools figure it out. However, the bigger question is how
> > to limit (re)setting the audit container ID when you are in a non-init
> > userns. For reasons already mentioned, using capable() is a
> > non-starter for everything but the initial userns, and using ns_capable()
> > is equally poor as it essentially allows any userns the ability to
> > munge its audit container ID (obviously not good). It appears we
> > need a different method for controlling access to the audit container
> > ID.
>
> We're not quite ready yet for multiple audit daemons and possibly not
> yet for audit namespaces, but this is starting to look a lot like the
> latter.

A few quick comments on audit namespaces: the audit container ID is
not envisioned as a new namespace (even in nested form) and neither do
I consider running multiple audit daemons to be a new namespace.

From my perspective we create namespaces to allow us to redefine a
global resource for some subset of the system, e.g. providing a unique
/tmp for some number of processes on the system. While it may be
tempting to think of the audit container ID as something we could
"namespace", especially when multiple audit daemons are concerned, in
some ways this would be counterproductive; the audit container ID is
intended to be a global ID that can be used to associate audit event
records with a "container" where the "container" is defined by an
orchestrator outside the audit subsystem. The global nature of the
audit container ID allows us to maintain a sane(ish) view of the
system in the audit log; if we were to "namespace" the audit container
ID such that the value was no longer guaranteed to be unique
throughout the system, we would need to additionally track the audit
namespace along with the audit container ID, which starts to border on
insanity IMHO.

> If we can't trust ns_capable() then why are we passing on
> CAP_AUDIT_CONTROL? It is being passed down, purposely not stripped,
> by the orchestrator/engine. If ns_capable() isn't inherited how is it
> gained otherwise? Can it be inserted by a container image? I think the
> answer is "no". Either we trust ns_capable() or we have audit
> namespaces (recommended, based on the user namespace) (or both).

My thinking is that since ns_capable() checks the credentials with
respect to the current user namespace we can't rely on it to control
access since it would be possible for a privileged process running
inside an unprivileged container to manipulate the audit container ID
(containerized process has CAP_AUDIT_CONTROL, e.g. running as root in
the container, while the container itself does not).
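
To illustrate the distinction, here is a simplified userspace model of the two checks. It is a sketch under assumptions, not kernel code: the struct layouts and the model_*() names are hypothetical, and the namespace walk is a pared-down rendering of how a capability is evaluated against a target user namespace. Only the CAP_AUDIT_CONTROL number is taken from linux/capability.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAP_AUDIT_CONTROL 30	/* value from linux/capability.h */

struct user_ns {			/* hypothetical, simplified */
	const struct user_ns *parent;	/* NULL for the initial namespace */
	int level;			/* 0 for the initial namespace */
	unsigned int owner_uid;		/* creator's uid in the parent */
};

struct cred {				/* hypothetical, simplified */
	const struct user_ns *user_ns;
	unsigned int euid;
	uint64_t cap_effective;		/* one bit per capability */
};

static const struct user_ns init_user_ns = { NULL, 0, 0 };

/* Does cred hold cap with respect to the target user namespace? */
static bool model_ns_capable(const struct cred *cred,
			     const struct user_ns *target, int cap)
{
	const struct user_ns *ns = target;

	for (;;) {
		/* Same namespace: the effective set decides. */
		if (ns == cred->user_ns)
			return (cred->cap_effective >> cap) & 1;
		/* Target is not below cred's namespace: no privilege. */
		if (ns->level <= cred->user_ns->level)
			return false;
		/* The creator of a child namespace holds all caps in it. */
		if (ns->parent == cred->user_ns &&
		    ns->owner_uid == cred->euid)
			return true;
		ns = ns->parent;
	}
}

/* capable() is the same check made against the initial namespace. */
static bool model_capable(const struct cred *cred, int cap)
{
	return model_ns_capable(cred, &init_user_ns, cap);
}
```

In this model, root inside a user namespace created by an unprivileged uid passes model_ns_capable() against its own namespace but fails model_capable(), which is the gap described above.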

> At this point I would say we are at an impasse unless we trust
> ns_capable() or we implement audit namespaces.

I'm not sure how we can trust ns_capable(), but if you can think of a
way I would love to hear it. I'm also not sure how namespacing audit
is helpful (see my above comments), but if you think it is please
explain.

> I don't think another mechanism to trust nested orchestrators/engines
> will buy us anything.
>
> Am I missing something?

Based on your questions/comments above it looks like your
understanding of ns_capable() does not match mine; if I'm thinking
about ns_capable() incorrectly, please educate me.

--
paul moore
http://www.paul-moore.com

2019-07-15 21:10:50

by Paul Moore

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Mon, Jul 8, 2019 at 2:12 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-05-30 19:26, Paul Moore wrote:

...

> > I like the creativity, but I worry that at some point these
> > limitations are going to be raised (limits have a funny way of doing
> > that over time) and we will be in trouble. I say "trouble" because I
> > want to be able to quickly do an audit container ID comparison and
> > we're going to pay a penalty for these larger values (we'll need this
> > when we add multiple auditd support and the requisite record routing).
> >
> > Thinking about this makes me also realize we probably need to think a
> > bit longer about audit container ID conflicts between orchestrators.
> > Right now we just take the value that is given to us by the
> > orchestrator, but if we want to allow multiple container orchestrators
> > to work without some form of cooperation in userspace (I think we have
> > to assume the orchestrators will not talk to each other) we likely
> > need to have some way to block reuse of an audit container ID. We
> > would either need to prevent the orchestrator from explicitly setting
> > an audit container ID to a currently in use value, or instead generate
> > the audit container ID in the kernel upon an event triggered by the
> > orchestrator (e.g. a write to a /proc file). I suspect we should
> > start looking at the idr code, I think we will need to make use of it.
>
> To address this, I'd suggest that it is enforced to only allow the
> setting of descendants and to maintain a master list of audit container
> identifiers (with a hash table if necessary later) that includes the
> container owner.

We're discussing the audit container ID management policy elsewhere in
this thread so I won't comment on that here, but I did want to say
that we will likely need something better than a simple list of audit
container IDs from the start. It's common for systems to have
thousands of containers now (or multiple thousands), which tells me
that a list is a poor choice. You mentioned a hash table, so I would
suggest starting with that over the list for the initial patchset.

--
paul moore
http://www.paul-moore.com

2019-07-16 15:38:19

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On 2019-07-15 17:09, Paul Moore wrote:
> On Mon, Jul 8, 2019 at 2:12 PM Richard Guy Briggs <[email protected]> wrote:
> > On 2019-05-30 19:26, Paul Moore wrote:
>
> ...
>
> > > I like the creativity, but I worry that at some point these
> > > limitations are going to be raised (limits have a funny way of doing
> > > that over time) and we will be in trouble. I say "trouble" because I
> > > want to be able to quickly do an audit container ID comparison and
> > > we're going to pay a penalty for these larger values (we'll need this
> > > when we add multiple auditd support and the requisite record routing).
> > >
> > > Thinking about this makes me also realize we probably need to think a
> > > bit longer about audit container ID conflicts between orchestrators.
> > > Right now we just take the value that is given to us by the
> > > orchestrator, but if we want to allow multiple container orchestrators
> > > to work without some form of cooperation in userspace (I think we have
> > > to assume the orchestrators will not talk to each other) we likely
> > > need to have some way to block reuse of an audit container ID. We
> > > would either need to prevent the orchestrator from explicitly setting
> > > an audit container ID to a currently in use value, or instead generate
> > > the audit container ID in the kernel upon an event triggered by the
> > > orchestrator (e.g. a write to a /proc file). I suspect we should
> > > start looking at the idr code, I think we will need to make use of it.
> >
> > To address this, I'd suggest that it is enforced to only allow the
> > setting of descendants and to maintain a master list of audit container
> > identifiers (with a hash table if necessary later) that includes the
> > container owner.
>
> We're discussing the audit container ID management policy elsewhere in
> this thread so I won't comment on that here, but I did want to say
> that we will likely need something better than a simple list of audit
> container IDs from the start. It's common for systems to have
> thousands of containers now (or multiple thousands), which tells me
> that a list is a poor choice. You mentioned a hash table, so I would
> suggest starting with that over the list for the initial patchset.

I saw that as an internal incremental improvement that did not affect
the API, so I wanted to keep things a bit simpler (as you've requested
in the past) to get this going, and add that enhancement later.

I'll start working on it now. The hash table would simply point to
lists anyways unless you can recommend a better approach.

> paul moore

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-07-16 16:09:42

by Paul Moore

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Tue, Jul 16, 2019 at 11:37 AM Richard Guy Briggs <[email protected]> wrote:
> On 2019-07-15 17:09, Paul Moore wrote:
> > On Mon, Jul 8, 2019 at 2:12 PM Richard Guy Briggs <[email protected]> wrote:
> > > On 2019-05-30 19:26, Paul Moore wrote:
> >
> > ...
> >
> > > > I like the creativity, but I worry that at some point these
> > > > limitations are going to be raised (limits have a funny way of doing
> > > > that over time) and we will be in trouble. I say "trouble" because I
> > > > want to be able to quickly do an audit container ID comparison and
> > > > we're going to pay a penalty for these larger values (we'll need this
> > > > when we add multiple auditd support and the requisite record routing).
> > > >
> > > > Thinking about this makes me also realize we probably need to think a
> > > > bit longer about audit container ID conflicts between orchestrators.
> > > > Right now we just take the value that is given to us by the
> > > > orchestrator, but if we want to allow multiple container orchestrators
> > > > to work without some form of cooperation in userspace (I think we have
> > > > to assume the orchestrators will not talk to each other) we likely
> > > > need to have some way to block reuse of an audit container ID. We
> > > > would either need to prevent the orchestrator from explicitly setting
> > > > an audit container ID to a currently in use value, or instead generate
> > > > the audit container ID in the kernel upon an event triggered by the
> > > > orchestrator (e.g. a write to a /proc file). I suspect we should
> > > > start looking at the idr code, I think we will need to make use of it.
> > >
> > > To address this, I'd suggest that it is enforced to only allow the
> > > setting of descendants and to maintain a master list of audit container
> > > identifiers (with a hash table if necessary later) that includes the
> > > container owner.
> >
> > We're discussing the audit container ID management policy elsewhere in
> > this thread so I won't comment on that here, but I did want to say
> > that we will likely need something better than a simple list of audit
> > container IDs from the start. It's common for systems to have
> > thousands of containers now (or multiple thousands), which tells me
> > that a list is a poor choice. You mentioned a hash table, so I would
> > suggest starting with that over the list for the initial patchset.
>
> I saw that as an internal incremental improvement that did not affect
> the API, so I wanted to keep things a bit simpler (as you've requested
> in the past) to get this going, and add that enhancement later.

In general a simple approach is a good way to start when the
problem/use-case is not very well understood; in other words, don't
spend a lot of time/effort optimizing something you don't yet
understand. In this case we know that people want to deploy a *lot*
of containers on a single system so we should design the data
structures appropriately. A list is simply not a good fit here, I
believe/hope you know that too.

> I'll start working on it now. The hash table would simply point to
> lists anyways unless you can recommend a better approach.

I assume when you say "point to lists" you are talking about using
lists for the hash buckets? If so, yes that should be fine at this
point. In general if the per-bucket lists become a bottleneck we can
look at the size of the table (or make it tunable) or even use a
different approach entirely. Ultimately the data store is an
implementation detail private to the audit subsystem in the kernel so
we should be able to change it as necessary without breaking anything.
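
As a rough illustration of list-backed hash buckets, here is a userspace sketch. Everything in it is an assumption made for the example: the names (contid_table, contid_find(), contid_insert()), the 64-bucket size, the multiplicative hash constant, and the use of malloc() in place of kernel allocation and locking.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CONTID_HASH_BITS 6	/* assumption: 64 buckets, meant to be tunable */
#define CONTID_BUCKETS (1u << CONTID_HASH_BITS)

struct contid_node {
	uint64_t contid;
	int owner_pid;			/* hypothetical owner tracking */
	struct contid_node *next;	/* singly linked per-bucket list */
};

struct contid_table {
	struct contid_node *buckets[CONTID_BUCKETS];
};

/* Multiplicative hash: keep the top CONTID_HASH_BITS bits of the product. */
static unsigned int contid_hash(uint64_t contid)
{
	return (unsigned int)((contid * 0x61C8864680B583EBull) >>
			      (64 - CONTID_HASH_BITS));
}

/* Walk a single bucket instead of every ID known to the system. */
static struct contid_node *contid_find(const struct contid_table *t,
				       uint64_t contid)
{
	struct contid_node *n;

	for (n = t->buckets[contid_hash(contid)]; n; n = n->next)
		if (n->contid == contid)
			return n;
	return NULL;
}

/* Returns 0 on success, -1 if the ID is already in use or on allocation failure. */
static int contid_insert(struct contid_table *t, uint64_t contid,
			 int owner_pid)
{
	unsigned int b = contid_hash(contid);
	struct contid_node *n;

	if (contid_find(t, contid))
		return -1;
	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	n->contid = contid;
	n->owner_pid = owner_pid;
	n->next = t->buckets[b];	/* push onto the head of the bucket */
	t->buckets[b] = n;
	return 0;
}
```

Because callers only see contid_find() and contid_insert(), the bucket count or the whole data store could be swapped later without touching them, which is the point made above about it being a private implementation detail.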

--
paul moore
http://www.paul-moore.com

2019-07-16 16:27:07

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On 2019-07-16 12:08, Paul Moore wrote:
> On Tue, Jul 16, 2019 at 11:37 AM Richard Guy Briggs <[email protected]> wrote:
> > On 2019-07-15 17:09, Paul Moore wrote:
> > > On Mon, Jul 8, 2019 at 2:12 PM Richard Guy Briggs <[email protected]> wrote:
> > > > On 2019-05-30 19:26, Paul Moore wrote:
> > >
> > > ...
> > >
> > > > > I like the creativity, but I worry that at some point these
> > > > > limitations are going to be raised (limits have a funny way of doing
> > > > > that over time) and we will be in trouble. I say "trouble" because I
> > > > > want to be able to quickly do an audit container ID comparison and
> > > > > we're going to pay a penalty for these larger values (we'll need this
> > > > > when we add multiple auditd support and the requisite record routing).
> > > > >
> > > > > Thinking about this makes me also realize we probably need to think a
> > > > > bit longer about audit container ID conflicts between orchestrators.
> > > > > Right now we just take the value that is given to us by the
> > > > > orchestrator, but if we want to allow multiple container orchestrators
> > > > > to work without some form of cooperation in userspace (I think we have
> > > > > to assume the orchestrators will not talk to each other) we likely
> > > > > need to have some way to block reuse of an audit container ID. We
> > > > > would either need to prevent the orchestrator from explicitly setting
> > > > > an audit container ID to a currently in use value, or instead generate
> > > > > the audit container ID in the kernel upon an event triggered by the
> > > > > orchestrator (e.g. a write to a /proc file). I suspect we should
> > > > > start looking at the idr code, I think we will need to make use of it.
> > > >
> > > > To address this, I'd suggest that it is enforced to only allow the
> > > > setting of descendants and to maintain a master list of audit container
> > > > identifiers (with a hash table if necessary later) that includes the
> > > > container owner.
> > >
> > > We're discussing the audit container ID management policy elsewhere in
> > > this thread so I won't comment on that here, but I did want to say
> > > that we will likely need something better than a simple list of audit
> > > container IDs from the start. It's common for systems to have
> > > thousands of containers now (or multiple thousands), which tells me
> > > that a list is a poor choice. You mentioned a hash table, so I would
> > > suggest starting with that over the list for the initial patchset.
> >
> > I saw that as an internal incremental improvement that did not affect
> > the API, so I wanted to keep things a bit simpler (as you've requested
> > in the past) to get this going, and add that enhancement later.
>
> In general a simple approach is a good way to start when the
> problem/use-case is not very well understood; in other words, don't
> spend a lot of time/effort optimizing something you don't yet
> understand. In this case we know that people want to deploy a *lot*
> of containers on a single system so we should design the data
> structures appropriately. A list is simply not a good fit here, I
> believe/hope you know that too.

Yes, I knew that, which is why I alluded to a hash table...

> > I'll start working on it now. The hash table would simply point to
> > lists anyways unless you can recommend a better approach.
>
> I assume when you say "point to lists" you are talking about using
> lists for the hash buckets? If so, yes that should be fine at this
> point. In general if the per-bucket lists become a bottleneck we can
> look at the size of the table (or make it tunable) or even use a
> different approach entirely. Ultimately the data store is an
> implementation detail private to the audit subsystem in the kernel so
> we should be able to change it as necessary without breaking anything.

Yes, this is what I had in mind. The table size would be tunable either
by a macro or a config option, so the exact value isn't critical; it is
an implementation detail that can easily be tuned as we gain experience
with it. And yes, the intent was that it be an implementation choice that
is not user-perceivable other than through performance metrics.
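
For illustration only, the "hash table whose buckets are lists" idea
discussed above could look roughly like the userspace sketch below. All
names (contid_table, contid_add, CONTID_HASH_BITS, and so on) are
hypothetical and are not taken from the patchset; the hash constant is
just a common multiplicative-hash choice, and the bucket count is the
macro-tunable value mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the patchset's code. */
#define CONTID_HASH_BITS 8                        /* tunable table size */
#define CONTID_BUCKETS   (1u << CONTID_HASH_BITS)

struct contid_entry {
	uint64_t contid;             /* audit container identifier */
	int owner_pid;               /* orchestrator that registered it */
	struct contid_entry *next;   /* per-bucket singly linked list */
};

struct contid_table {
	struct contid_entry *bucket[CONTID_BUCKETS];
};

/* Multiplicative hash: multiply by a 64-bit constant, keep the top bits. */
static unsigned int contid_hash(uint64_t contid)
{
	return (unsigned int)((contid * 0x9E3779B97F4A7C15ull) >>
			      (64 - CONTID_HASH_BITS));
}

/* Walk only the one bucket the identifier hashes to. */
static struct contid_entry *contid_find(const struct contid_table *t,
					uint64_t contid)
{
	struct contid_entry *e;

	assert(t);
	for (e = t->bucket[contid_hash(contid)]; e; e = e->next)
		if (e->contid == contid)
			return e;
	return NULL;
}

/* Returns 0 on success, -1 if already registered or out of memory. */
static int contid_add(struct contid_table *t, uint64_t contid, int owner_pid)
{
	unsigned int h = contid_hash(contid);
	struct contid_entry *e;

	if (contid_find(t, contid))
		return -1;
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->contid = contid;
	e->owner_pid = owner_pid;
	e->next = t->bucket[h];      /* push onto the head of the bucket list */
	t->bucket[h] = e;
	return 0;
}

/* Returns 0 if the identifier was found and removed, -1 otherwise. */
static int contid_del(struct contid_table *t, uint64_t contid)
{
	struct contid_entry **pp = &t->bucket[contid_hash(contid)];

	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->contid == contid) {
			struct contid_entry *victim = *pp;

			*pp = victim->next;
			free(victim);
			return 0;
		}
	}
	return -1;
}
```

Because callers only see find/add/del, the table size or the whole data
structure could be swapped later without touching them, which is the
"implementation detail private to the audit subsystem" point made above.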

> paul moore

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-07-16 19:39:42

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On 2019-07-15 16:38, Paul Moore wrote:
> On Mon, Jul 8, 2019 at 1:51 PM Richard Guy Briggs <[email protected]> wrote:
> > On 2019-05-29 11:29, Paul Moore wrote:
>
> ...
>
> > > The idea is that only container orchestrators should be able to
> > > set/modify the audit container ID, and since setting the audit
> > > container ID can have a significant effect on the records captured
> > > (and their routing to multiple daemons when we get there) modifying
> > > the audit container ID is akin to modifying the audit configuration
> > > which is why it is gated by CAP_AUDIT_CONTROL. The current thinking
> > > is that you would only change the audit container ID from one
> > > set/inherited value to another if you were nesting containers, in
> > > which case the nested container orchestrator would need to be granted
> > > CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> > > compromise). We did consider allowing for a chain of nested audit
> > > container IDs, but the implications of doing so are significant
> > > (implementation mess, runtime cost, etc.) so we are leaving that out
> > > of this effort.
> >
> > We had previously discussed the idea of restricting
> > orchestrators/engines to setting the audit container identifier
> > only on their own descendants, but it was discarded. I've added a
> > check to ensure this is now enforced.
>
> When we weren't allowing nested orchestrators it wasn't necessary, but
> with the move to support nesting I believe this will be a requirement.
> We might also need/want to restrict audit container ID changes if a
> descendant is acting as a container orchestrator and managing one or
> more audit container IDs; although I'm less certain of the need for
> this.

I was of the opinion it was necessary before with single-layer parallel
orchestrators/engines.

> > I've also added a check to ensure that a process can't set its own audit
> > container identifier ...
>
> What does this protect against, or what problem does this solve?
> Considering how easy it is to fork/exec, it seems like this could be
> trivially bypassed.

Well, for starters, it would remove one layer of nesting. It would
separate the functional layers of processes. Other than that, it seems
like a gut feeling that it is just wrong to allow it. It seems like a
layer violation that one container orchestrator/engine could set its own
audit container identifier and then set its children as well. It would
be its own parent. It would make it harder to verify adherence to
descendancy and inheritance rules.
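
As a rough illustration of the two checks being debated here (a task may
not set its own audit container identifier, and may only set it on its
own descendants), a minimal userspace sketch follows. The struct and
function names are hypothetical and do not come from the patch; a real
kernel implementation would walk task_struct parent links under the
appropriate locking, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; only the parent link matters here. */
struct task {
	int pid;
	struct task *parent;   /* NULL for the root of the process tree */
};

/* True if 'child' is a strict descendant of 'ancestor' (never itself). */
static bool is_strict_descendant(const struct task *ancestor,
				 const struct task *child)
{
	const struct task *t;

	assert(ancestor && child);
	for (t = child->parent; t; t = t->parent)
		if (t == ancestor)
			return true;
	return false;
}

/* Sketch of the policy under discussion: not on self, only on descendants. */
static bool may_set_contid(const struct task *setter,
			   const struct task *target)
{
	if (setter == target)          /* "can't set its own" check */
		return false;
	return is_strict_descendant(setter, target);
}
```

Note that the explicit self check is already implied by the strict
descendant test; it is spelled out only because the thread treats the
two restrictions as separate policy questions.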

> > ... and that if the identifier is already set, then the
> > orchestrator/engine must be in a descendant user namespace from the
> > orchestrator that set the previously inherited audit container
> > identifier.
>
> You lost me here ... although I don't like the idea of relying on X
> namespace inheritance for a hard coded policy on setting the audit
> container ID; we've worked hard to keep this independent of any
> definition of a "container" and it would sadden me greatly if we had
> to go back on that.

This would seem to be the one concession I'm reluctantly making to try
to solve this nested container orchestrator/engine challenge.

Would backing off on that descendant user namespace requirement and only
requiring that a nested audit container identifier be permitted on a
descendant task be sufficient? It may be for this use case, but I suspect
not for additional audit daemons (we're not there yet) and message
routing to those daemons.

The one difference here is that it does not depend on this if the audit
container identifier has not already been set.

> paul moore

- RGB

2019-07-16 21:40:12

by Paul Moore

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Tue, Jul 16, 2019 at 3:38 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-07-15 16:38, Paul Moore wrote:
> > On Mon, Jul 8, 2019 at 1:51 PM Richard Guy Briggs <[email protected]> wrote:
> > > On 2019-05-29 11:29, Paul Moore wrote:
> >
> > ...
> >
> > > > The idea is that only container orchestrators should be able to
> > > > set/modify the audit container ID, and since setting the audit
> > > > container ID can have a significant effect on the records captured
> > > > (and their routing to multiple daemons when we get there) modifying
> > > > the audit container ID is akin to modifying the audit configuration
> > > > which is why it is gated by CAP_AUDIT_CONTROL. The current thinking
> > > > is that you would only change the audit container ID from one
> > > > set/inherited value to another if you were nesting containers, in
> > > > which case the nested container orchestrator would need to be granted
> > > > CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> > > > compromise). We did consider allowing for a chain of nested audit
> > > > container IDs, but the implications of doing so are significant
> > > > (implementation mess, runtime cost, etc.) so we are leaving that out
> > > > of this effort.
> > >
> > > We had previously discussed the idea of restricting
> > > orchestrators/engines to setting the audit container identifier
> > > only on their own descendants, but it was discarded. I've added a
> > > check to ensure this is now enforced.
> >
> > When we weren't allowing nested orchestrators it wasn't necessary, but
> > with the move to support nesting I believe this will be a requirement.
> > We might also need/want to restrict audit container ID changes if a
> > descendant is acting as a container orchestrator and managing one or
> > more audit container IDs; although I'm less certain of the need for
> > this.
>
> I was of the opinion it was necessary before with single-layer parallel
> orchestrators/engines.

One of the many things we've disagreed on, but it doesn't really
matter at this point.

> > > I've also added a check to ensure that a process can't set its own audit
> > > container identifier ...
> >
> > What does this protect against, or what problem does this solve?
> > Considering how easy it is to fork/exec, it seems like this could be
> > trivially bypassed.
>
> Well, for starters, it would remove one layer of nesting. It would
> separate the functional layers of processes.

This doesn't seem like something we need to protect against, what's
the harm? My opinion at this point is that we should only add
restrictions to protect against problematic or dangerous situations; I
don't believe one extra layer of nesting counts as either.

Perhaps the container folks on the To/CC line can comment on this? If
there is a valid reason for this restriction, great, let's do it,
otherwise it seems like an unnecessary hard coded policy to me.

> Other than that, it seems
> like a gut feeling that it is just wrong to allow it. It seems like a
> layer violation that one container orchestrator/engine could set its own
> audit container identifier and then set its children as well. It would
> be its own parent.

I suspect you are right that the current crop of container engines
won't do this, but who knows what we'll be doing with "containers" 5,
or even 10, years from now. With that in mind, let me ask the
question again: is allowing an orchestrator the ability to set its own
audit container ID problematic and/or dangerous?

> It would make it harder to verify adherence to descendancy and inheritance rules.

The audit log should contain all the information needed to track that,
right? If it doesn't, then I think we have a problem with the
information we are logging. Right?

> > > ... and that if the identifier is already set, then the
> > > orchestrator/engine must be in a descendant user namespace from the
> > > orchestrator that set the previously inherited audit container
> > > identifier.
> >
> > You lost me here ... although I don't like the idea of relying on X
> > namespace inheritance for a hard coded policy on setting the audit
> > container ID; we've worked hard to keep this independent of any
> > definition of a "container" and it would sadden me greatly if we had
> > to go back on that.
>
> This would seem to be the one concession I'm reluctantly making to try
> to solve this nested container orchestrator/engine challenge.

As I said, you lost me on this - how does this help? A more detailed
explanation of how this helps resolve the nesting problem would be
useful.

> Would backing off on that descendant user namespace requirement and only
> requiring that a nested audit container identifier be permitted on a
> descendant task be sufficient? It may be for this use case, but I suspect
> not for additional audit daemons (we're not there yet) and message
> routing to those daemons.
>
> The one difference here is that it does not depend on this if the audit
> container identifier has not already been set.

--
paul moore
http://www.paul-moore.com

2019-07-16 22:04:41

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On 2019-07-15 17:04, Paul Moore wrote:
> On Mon, Jul 8, 2019 at 2:06 PM Richard Guy Briggs <[email protected]> wrote:
> > On 2019-05-30 15:29, Paul Moore wrote:
>
> ...
>
> > > [REMINDER: It is an "*audit* container ID" and not a general
> > > "container ID" ;) Smiley aside, I'm not kidding about that part.]
> > >
> > > I'm not interested in supporting/merging something that isn't useful;
> > > if this doesn't work for your use case then we need to figure out what
> > > would work. It sounds like nested containers are much more common in
> > > the lxc world, can you elaborate a bit more on this?
> > >
> > > As far as the possible solutions you mention above, I'm not sure I
> > > like the per-userns audit container IDs, I'd much rather just emit the
> > > necessary tracking information via the audit record stream and let the
> > > log analysis tools figure it out. However, the bigger question is how
> > > to limit (re)setting the audit container ID when you are in a non-init
> > > userns. For reasons already mentioned, using capable() is a
> > > non-starter for everything but the initial userns, and using ns_capable()
> > > is equally poor as it essentially allows any userns the ability to
> > > munge its audit container ID (obviously not good). It appears we
> > > need a different method for controlling access to the audit container
> > > ID.
> >
> > We're not quite ready yet for multiple audit daemons and possibly not
> > yet for audit namespaces, but this is starting to look a lot like the
> > latter.
>
> A few quick comments on audit namespaces: the audit container ID is
> not envisioned as a new namespace (even in nested form) and neither do
> I consider running multiple audit daemons to be a new namespace.

I can picture either one.

> From my perspective we create namespaces to allow us to redefine a
> global resource for some subset of the system, e.g. providing a unique
> /tmp for some number of processes on the system. While it may be
> tempting to think of the audit container ID as something we could
> "namespace", especially when multiple audit daemons are concerned, in
> some ways this would be counter productive; the audit container ID is
> intended to be a global ID that can be used to associate audit event
> records with a "container" where the "container" is defined by an
> orchestrator outside the audit subsystem. The global nature of the
> audit container ID allows us to maintain a sane(ish) view of the
> system in the audit log, if we were to "namespace" the audit container
> ID such that the value was no longer guaranteed to be unique
> throughout the system, we would need to additionally track the audit
> namespace along with the audit container ID which starts to border on
> insanity IMHO.

Understood. And mostly agree. Any audit namespace would have to be a
hybrid anyways, since only the init one would have full access to audit
resources. All the others would be somewhat neutered. And in the case
of checking for previous usage of a contid, if it was not already in use
in the hypothetical audit namespace but was in use elsewhere in the
system and we blocked its usage in this namespace, it would leak that
information by blocking it.

I saw it as a way of permitting layering with the natural descendancy
structure showing that hierarchy. The potential flaw with my reasoning
is that a parent could exit and its children would get re-parented onto
its next ancestor, so the intermediate task with an intermediate contid
would break that contid documentation chain.

> > If we can't trust ns_capable() then why are we passing on
> > CAP_AUDIT_CONTROL? It is being passed down and not stripped purposely
> > by the orchestrator/engine. If ns_capable() isn't inherited how is it
> > gained otherwise? Can it be inserted by a container image? I think the
> > answer is "no". Either we trust ns_capable() or we have audit
> > namespaces (recommend based on user namespace) (or both).
>
> My thinking is that since ns_capable() checks the credentials with
> respect to the current user namespace we can't rely on it to control
> access since it would be possible for a privileged process running
> inside an unprivileged container to manipulate the audit container ID
> (containerized process has CAP_AUDIT_CONTROL, e.g. running as root in
> the container, while the container itself does not).

What makes an unprivileged container unprivileged? "root", or "CAP_*"?

If CAP_AUDIT_CONTROL is granted, does "root" matter? Does it matter
what user namespace it is in? I understand that root is *gained* in an
unprivileged user namespace, but capabilities are inherited or permitted
and that process either has it or it doesn't and an unprivileged user
namespace can't gain a capability that has been rescinded. Different
subsystems use the userid or capabilities or both to determine
privileges. In this case, is the userid relevant?

> > At this point I would say we are at an impasse unless we trust
> > ns_capable() or we implement audit namespaces.
>
> I'm not sure how we can trust ns_capable(), but if you can think of a
> way I would love to hear it. I'm also not sure how namespacing audit
> is helpful (see my above comments), but if you think it is please
> explain.

So if we are not namespacing, why do we not trust capabilities?

> > I don't think another mechanism to trust nested orchestrators/engines
> > will buy us anything.
> >
> > Am I missing something?
>
> Based on your questions/comments above it looks like your
> understanding of ns_capable() does not match mine; if I'm thinking
> about ns_capable() incorrectly, please educate me.
>
> --
> paul moore
> http://www.paul-moore.com

- RGB

2019-07-16 23:31:07

by Paul Moore

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Tue, Jul 16, 2019 at 6:03 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-07-15 17:04, Paul Moore wrote:
> > On Mon, Jul 8, 2019 at 2:06 PM Richard Guy Briggs <[email protected]> wrote:

...

> > > If we can't trust ns_capable() then why are we passing on
> > > CAP_AUDIT_CONTROL? It is being passed down and not stripped purposely
> > > by the orchestrator/engine. If ns_capable() isn't inherited how is it
> > > gained otherwise? Can it be inserted by a container image? I think the
> > > answer is "no". Either we trust ns_capable() or we have audit
> > > namespaces (recommend based on user namespace) (or both).
> >
> > My thinking is that since ns_capable() checks the credentials with
> > respect to the current user namespace we can't rely on it to control
> > access since it would be possible for a privileged process running
> > inside an unprivileged container to manipulate the audit container ID
> > (containerized process has CAP_AUDIT_CONTROL, e.g. running as root in
> > the container, while the container itself does not).
>
> What makes an unprivileged container unprivileged? "root", or "CAP_*"?

My understanding is that when most people refer to an unprivileged
container they are referring to a container run without capabilities
or a container run by a user other than root. I'm sure there are
better definitions out there, by folks much smarter than me on these
things, but that's my working definition.

> If CAP_AUDIT_CONTROL is granted, does "root" matter?

Our discussions here have been about capabilities, not UIDs. The only
reason root might matter is that it generally has the full capability
set.

> Does it matter what user namespace it is in?

What likely matters is what check is called: capable() or
ns_capable(). Those can yield very different results.

> I understand that root is *gained* in an
> unprivileged user namespace, but capabilities are inherited or permitted
> and that process either has it or it doesn't and an unprivileged user
> namespace can't gain a capability that has been rescinded. Different
> subsystems use the userid or capabilities or both to determine
> privileges.

Once again, I believe the important thing to focus on here is
capable() vs ns_capable(). We can't safely rely on ns_capable() for
the audit container ID policy since that is easily met inside the
container regardless of the process' creds which started the
container.

> In this case, is the userid relevant?

We don't do UID checks, we do capability checks, so yes, the UID is irrelevant.

> > > At this point I would say we are at an impasse unless we trust
> > > ns_capable() or we implement audit namespaces.
> >
> > I'm not sure how we can trust ns_capable(), but if you can think of a
> > way I would love to hear it. I'm also not sure how namespacing audit
> > is helpful (see my above comments), but if you think it is please
> > explain.
>
> So if we are not namespacing, why do we not trust capabilities?

We can trust capable(CAP_AUDIT_CONTROL) for enforcing audit container
ID policy, we can not trust ns_capable(CAP_AUDIT_CONTROL).

--
paul moore
http://www.paul-moore.com

2019-07-18 00:54:07

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On 2019-07-16 19:30, Paul Moore wrote:
> On Tue, Jul 16, 2019 at 6:03 PM Richard Guy Briggs <[email protected]> wrote:
> > On 2019-07-15 17:04, Paul Moore wrote:
> > > On Mon, Jul 8, 2019 at 2:06 PM Richard Guy Briggs <[email protected]> wrote:
>
> ...
>
> > > > If we can't trust ns_capable() then why are we passing on
> > > > CAP_AUDIT_CONTROL? It is being passed down and not stripped purposely
> > > > by the orchestrator/engine. If ns_capable() isn't inherited how is it
> > > > gained otherwise? Can it be inserted by a container image? I think the
> > > > answer is "no". Either we trust ns_capable() or we have audit
> > > > namespaces (recommend based on user namespace) (or both).
> > >
> > > My thinking is that since ns_capable() checks the credentials with
> > > respect to the current user namespace we can't rely on it to control
> > > access since it would be possible for a privileged process running
> > > inside an unprivileged container to manipulate the audit container ID
> > > (containerized process has CAP_AUDIT_CONTROL, e.g. running as root in
> > > the container, while the container itself does not).
> >
> > What makes an unprivileged container unprivileged? "root", or "CAP_*"?
>
> My understanding is that when most people refer to an unprivileged
> container they are referring to a container run without capabilities
> or a container run by a user other than root. I'm sure there are
> better definitions out there, by folks much smarter than me on these
> things, but that's my working definition.

Close enough to my understanding...

> > If CAP_AUDIT_CONTROL is granted, does "root" matter?
>
> Our discussions here have been about capabilities, not UIDs. The only
> reason root might matter is that it generally has the full capability
> set.

Good, that's my understanding.

> > Does it matter what user namespace it is in?
>
> What likely matters is what check is called: capable() or
> ns_capable(). Those can yield very different results.

Ok, I finally found what I was looking for to better understand the
challenge with trusting ns_capable(). Sorry for being so dense and slow
on this one. I thought I had gone through the code carefully enough,
but this time I finally found it. set_cred_user_ns() sets a full set of
capabilities rather than inheriting them from the parent user_ns, called
from userns_install() or create_userns(). Even if the container
orchestrator/engine restricts those capabilities on its own containers,
they could easily unshare a userns and get a full set unless it also
restricted CAP_SYS_ADMIN, which is used in too many other places to be
practical to restrict.
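
To make the capable() vs ns_capable() distinction concrete, here is a
deliberately simplified userspace model of it. Everything prefixed with
model_ is hypothetical; the lookup loosely mirrors the kernel's
capability check but omits details such as namespace levels, and
model_unshare_userns() stands in for the set_cred_user_ns() behaviour
described above (a fresh user namespace comes with a full capability
set that is only meaningful inside that namespace).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CAP_AUDIT_CONTROL 30            /* CAP_AUDIT_CONTROL's number */
#define MODEL_CAP_FULL_SET      (~(uint64_t)0)

struct model_user_ns {
	struct model_user_ns *parent;   /* NULL only for the initial namespace */
	unsigned int owner_uid;         /* euid of the task that created it */
};

struct model_cred {
	struct model_user_ns *user_ns;  /* namespace the capabilities apply to */
	unsigned int euid;
	uint64_t cap_effective;
};

static struct model_user_ns model_init_user_ns = { NULL, 0 };

/* Does 'c' hold 'cap' with respect to namespace 'target'? */
static bool model_ns_capable(const struct model_cred *c,
			     const struct model_user_ns *target, int cap)
{
	const struct model_user_ns *ns = target;

	assert(c && target);
	for (;;) {
		if (ns == c->user_ns)           /* same ns: consult the cap set */
			return (c->cap_effective >> cap) & 1;
		if (!ns->parent)                /* hit the initial ns: no match */
			return false;
		if (ns->parent == c->user_ns && ns->owner_uid == c->euid)
			return true;            /* owner of a child namespace */
		ns = ns->parent;
	}
}

/* capable() is the same check made against the initial user namespace. */
static bool model_capable(const struct model_cred *c, int cap)
{
	return model_ns_capable(c, &model_init_user_ns, cap);
}

/* Model of unsharing a user namespace: full caps, but only in 'new_ns'. */
static void model_unshare_userns(struct model_cred *c,
				 struct model_user_ns *new_ns)
{
	assert(c && new_ns);
	new_ns->parent = c->user_ns;
	new_ns->owner_uid = c->euid;
	c->user_ns = new_ns;
	c->cap_effective = MODEL_CAP_FULL_SET;
}
```

In this model, a task that starts with no capabilities and unshares a
user namespace passes model_ns_capable() for its own namespace yet still
fails model_capable(), which is why a check against the current user
namespace cannot gate who may set the audit container identifier.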

> > I understand that root is *gained* in an
> > unprivileged user namespace, but capabilities are inherited or permitted
> > and that process either has it or it doesn't and an unprivileged user
> > namespace can't gain a capability that has been rescinded. Different
> > subsystems use the userid or capabilities or both to determine
> > privileges.
>
> Once again, I believe the important thing to focus on here is
> capable() vs ns_capable(). We can't safely rely on ns_capable() for
> the audit container ID policy since that is easily met inside the
> container regardless of the process' creds which started the
> container.

Agreed.

> > In this case, is the userid relevant?
>
> We don't do UID checks, we do capability checks, so yes, the UID is irrelevant.

Agreed.

> > > > At this point I would say we are at an impasse unless we trust
> > > > ns_capable() or we implement audit namespaces.
> > >
> > > I'm not sure how we can trust ns_capable(), but if you can think of a
> > > way I would love to hear it. I'm also not sure how namespacing audit
> > > is helpful (see my above comments), but if you think it is please
> > > explain.
> >
> > So if we are not namespacing, why do we not trust capabilities?
>
> We can trust capable(CAP_AUDIT_CONTROL) for enforcing audit container
> ID policy, we can not trust ns_capable(CAP_AUDIT_CONTROL).

Ok. So does a process in a non-init user namespace have two (or more)
sets of capabilities stored in creds, one in the init_user_ns, and one
in current_user_ns? Or does it get stripped of all its capabilities in
init_user_ns once it has its own set in current_user_ns? If the former,
then we can use capable(). If the latter, we need another mechanism, as
you have suggested might be needed.

If some random unprivileged user wants to fire up a container
orchestrator/engine in his own user namespace, then audit needs to be
namespaced. Can we safely discard this scenario for now? That user can
use a VM.

> paul moore

- RGB

2019-07-18 21:53:57

by Paul Moore

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Wed, Jul 17, 2019 at 8:52 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-07-16 19:30, Paul Moore wrote:

...

> > We can trust capable(CAP_AUDIT_CONTROL) for enforcing audit container
> > ID policy, we can not trust ns_capable(CAP_AUDIT_CONTROL).
>
> Ok. So does a process in a non-init user namespace have two (or more)
> sets of capabilities stored in creds, one in the init_user_ns, and one
> in current_user_ns? Or does it get stripped of all its capabilities in
> init_user_ns once it has its own set in current_user_ns? If the former,
> then we can use capable(). If the latter, we need another mechanism, as
> you have suggested might be needed.

Unfortunately I think the problem is that ultimately we need to allow
any container orchestrator that has been given privileges to manage
the audit container ID to also grant that privilege to any of the
child process/containers it manages. I don't believe we can do that
with capabilities based on the code I've looked at, and the
discussions I've had, but if you find a way I would love to hear it.

> If some random unprivileged user wants to fire up a container
> orchestrator/engine in his own user namespace, then audit needs to be
> namespaced. Can we safely discard this scenario for now?

I think the only time we want to allow a container orchestrator to
manage the audit container ID is if it has been granted that privilege
by someone who has that privilege already. In the zero-container, or
single-level of containers, case this is relatively easy, and we can
accomplish it using CAP_AUDIT_CONTROL as the privilege. If we start
nesting container orchestrators it becomes more complicated as we need
to be able to support granting and inheriting this privilege in a
manner; this is why I suggested a new mechanism *may* be necessary.

--
paul moore
http://www.paul-moore.com

2019-07-19 16:49:10

by Eric W. Biederman

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

Paul Moore <[email protected]> writes:

> On Wed, Jul 17, 2019 at 8:52 PM Richard Guy Briggs <[email protected]> wrote:
>> On 2019-07-16 19:30, Paul Moore wrote:
>
> ...
>
>> > We can trust capable(CAP_AUDIT_CONTROL) for enforcing audit container
>> > ID policy, we can not trust ns_capable(CAP_AUDIT_CONTROL).
>>
>> Ok. So does a process in a non-init user namespace have two (or more)
>> sets of capabilities stored in creds, one in the init_user_ns, and one
>> in current_user_ns? Or does it get stripped of all its capabilities in
>> init_user_ns once it has its own set in current_user_ns? If the former,
>> then we can use capable(). If the latter, we need another mechanism, as
>> you have suggested might be needed.
>
> Unfortunately I think the problem is that ultimately we need to allow
> any container orchestrator that has been given privileges to manage
> the audit container ID to also grant that privilege to any of the
> child process/containers it manages. I don't believe we can do that
> with capabilities based on the code I've looked at, and the
> discussions I've had, but if you find a way I would love to hear it.

>> If some random unprivileged user wants to fire up a container
>> orchestrator/engine in his own user namespace, then audit needs to be
>> namespaced. Can we safely discard this scenario for now?
>
> I think the only time we want to allow a container orchestrator to
> manage the audit container ID is if it has been granted that privilege
> by someone who has that privilege already. In the zero-container, or
> single-level of containers, case this is relatively easy, and we can
> accomplish it using CAP_AUDIT_CONTROL as the privilege. If we start
> nesting container orchestrators it becomes more complicated as we need
> to be able to support granting and inheriting this privilege in a
> manner; this is why I suggested a new mechanism *may* be necessary.


Let me segue a bit and see if I can get this conversation out of the
rut it seems to have drifted into.

Unprivileged containers and nested containers exist today and are going
to become increasingly common. Let that be a given.

As I recall the interesting thing for audit to log is actions by
privileged processes. Audit can log more, but generally configuring
logging of the actions of unprivileged users is effectively a self-inflicted
DoS.

So I think the initial implementation can safely ignore actions of
nested containers and unprivileged containers because you don't care
about their actions.

If we start allowing audit to run in a container then we need to deal with
all of the nesting issues, but until then I don't think you folks care.

Or am I wrong? Do the requirements for securely auditing things from
the kernel care about the actions of unprivileged users?

Eric

2019-07-19 16:55:55

by Eric W. Biederman

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

Richard Guy Briggs <[email protected]> writes:

> Implement the proc fs write to set the audit container identifier of a
> process, emitting an AUDIT_CONTAINER_OP record to document the event.
>
> This is a write from the container orchestrator task to a proc entry of
> the form /proc/PID/audit_containerid where PID is the process ID of the
> newly created task that is to become the first task in a container, or
> an additional task added to a container.
>
> The write expects up to a u64 value (unset: 18446744073709551615).
>
> The writer must have capability CAP_AUDIT_CONTROL.
>
> This will produce a record such as this:
> type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615 pid=628 auid=root uid=root tty=ttyS0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm=bash exe=/usr/bin/bash res=yes
>
> The "op" field indicates an initial set. The "pid" to "ses" fields are
> the orchestrator while the "opid" field is the object's PID, the process
> being "contained". New and old audit container identifier values are
> given in the "contid" fields, while res indicates its success.
>
> It is not permitted to unset the audit container identifier.
> A child inherits its parent's audit container identifier.

Why get proc involved in this? I know it more or less fits, as
this is about a process and its descendants. But this seems to
encourage being able to read this value, and being able to read
this value seems to encourage misuse.

So I am not a fan of using proc for this.
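
For readers following along, the interface described in the quoted
commit message (an orchestrator writing a decimal u64 to
/proc/PID/audit_containerid) might be driven from userspace roughly as
sketched below. This is only a sketch of the proposal as quoted: the
file exists only on a kernel carrying this patchset, the helper names
are hypothetical, and the value is sent in a single write() at offset 0
because the quoted handler rejects a nonzero offset and parses the whole
buffer at once.

```c
#include <assert.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "/proc/<pid>/audit_containerid"; 0 on success, -1 if buf too small. */
static int contid_proc_path(char *buf, size_t len, pid_t pid)
{
	int n;

	assert(buf);
	n = snprintf(buf, len, "/proc/%ld/audit_containerid", (long)pid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write 'contid' as a decimal string with one write() call.
 * Returns 0 on success, -1 on failure (errno set by open/write/close).
 */
static int contid_write(const char *path, uint64_t contid)
{
	char val[32];
	size_t n;
	ssize_t w;
	int fd;

	assert(path);
	snprintf(val, sizeof(val), "%" PRIu64, contid);
	n = strlen(val);

	fd = open(path, O_WRONLY);     /* no O_CREAT: the proc entry must exist */
	if (fd < 0)
		return -1;
	w = write(fd, val, n);
	if (close(fd) < 0 || w != (ssize_t)n)
		return -1;
	return 0;
}
```

Per the quoted description, the writer would also need CAP_AUDIT_CONTROL,
so on a patched kernel an unprivileged caller should expect the write to
fail rather than the open.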

> Please see the github audit kernel issue for the main feature:
> https://github.com/linux-audit/audit-kernel/issues/90
> Please see the github audit userspace issue for supporting additions:
> https://github.com/linux-audit/audit-userspace/issues/51
> Please see the github audit testsuite issue for the test case:
> https://github.com/linux-audit/audit-testsuite/issues/64
> Please see the github audit wiki for the feature overview:
> https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
>
> Signed-off-by: Richard Guy Briggs <[email protected]>
> Acked-by: Serge Hallyn <[email protected]>
> Acked-by: Steve Grubb <[email protected]>
> Acked-by: Neil Horman <[email protected]>
> Reviewed-by: Ondrej Mosnacek <[email protected]>
> ---
> fs/proc/base.c | 36 ++++++++++++++++++++++++
> include/linux/audit.h | 25 +++++++++++++++++
> include/uapi/linux/audit.h | 2 ++
> kernel/audit.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++
> kernel/audit.h | 1 +
> kernel/auditsc.c | 4 +++
> 6 files changed, 137 insertions(+)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index ddef482f1334..43fd0c4b87de 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1294,6 +1294,40 @@ static ssize_t proc_sessionid_read(struct file * file, char __user * buf,
> .read = proc_sessionid_read,
> .llseek = generic_file_llseek,
> };
> +
> +static ssize_t proc_contid_write(struct file *file, const char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + struct inode *inode = file_inode(file);
> + u64 contid;
> + int rv;
> + struct task_struct *task = get_proc_task(inode);
> +
> + if (!task)
> + return -ESRCH;
> + if (*ppos != 0) {
> + /* No partial writes. */
> + put_task_struct(task);
> + return -EINVAL;
> + }
> +
> + rv = kstrtou64_from_user(buf, count, 10, &contid);
> + if (rv < 0) {
> + put_task_struct(task);
> + return rv;
> + }
> +
> + rv = audit_set_contid(task, contid);
> + put_task_struct(task);
> + if (rv < 0)
> + return rv;
> + return count;
> +}
> +
> +static const struct file_operations proc_contid_operations = {
> + .write = proc_contid_write,
> + .llseek = generic_file_llseek,
> +};
> #endif
>
> #ifdef CONFIG_FAULT_INJECTION
> @@ -3033,6 +3067,7 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
> #ifdef CONFIG_AUDIT
> REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
> REG("sessionid", S_IRUGO, proc_sessionid_operations),
> + REG("audit_containerid", S_IWUSR, proc_contid_operations),
> #endif
> #ifdef CONFIG_FAULT_INJECTION
> REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
> @@ -3431,6 +3466,7 @@ static int proc_tid_comm_permission(struct inode *inode, int mask)
> #ifdef CONFIG_AUDIT
> REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
> REG("sessionid", S_IRUGO, proc_sessionid_operations),
> + REG("audit_containerid", S_IWUSR, proc_contid_operations),
> #endif
> #ifdef CONFIG_FAULT_INJECTION
> REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index bde346e73f0c..301337776193 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -89,6 +89,7 @@ struct audit_field {
> struct audit_task_info {
> kuid_t loginuid;
> unsigned int sessionid;
> + u64 contid;
> #ifdef CONFIG_AUDITSYSCALL
> struct audit_context *ctx;
> #endif
> @@ -189,6 +190,15 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
> return tsk->audit->sessionid;
> }
>
> +extern int audit_set_contid(struct task_struct *tsk, u64 contid);
> +
> +static inline u64 audit_get_contid(struct task_struct *tsk)
> +{
> + if (!tsk->audit)
> + return AUDIT_CID_UNSET;
> + return tsk->audit->contid;
> +}
> +
> extern u32 audit_enabled;
> #else /* CONFIG_AUDIT */
> static inline int audit_alloc(struct task_struct *task)
> @@ -250,6 +260,11 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
> return AUDIT_SID_UNSET;
> }
>
> +static inline u64 audit_get_contid(struct task_struct *tsk)
> +{
> + return AUDIT_CID_UNSET;
> +}
> +
> #define audit_enabled AUDIT_OFF
> #endif /* CONFIG_AUDIT */
>
> @@ -606,6 +621,16 @@ static inline bool audit_loginuid_set(struct task_struct *tsk)
> return uid_valid(audit_get_loginuid(tsk));
> }
>
> +static inline bool audit_contid_valid(u64 contid)
> +{
> + return contid != AUDIT_CID_UNSET;
> +}
> +
> +static inline bool audit_contid_set(struct task_struct *tsk)
> +{
> + return audit_contid_valid(audit_get_contid(tsk));
> +}
> +
> static inline void audit_log_string(struct audit_buffer *ab, const char *buf)
> {
> audit_log_n_string(ab, buf, strlen(buf));
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index 3901c51c0b93..4a6a8bf1de32 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -71,6 +71,7 @@
> #define AUDIT_TTY_SET 1017 /* Set TTY auditing status */
> #define AUDIT_SET_FEATURE 1018 /* Turn an audit feature on or off */
> #define AUDIT_GET_FEATURE 1019 /* Get which features are enabled */
> +#define AUDIT_CONTAINER_OP 1020 /* Define the container id and info */
>
> #define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */
> #define AUDIT_USER_AVC 1107 /* We filter this differently */
> @@ -485,6 +486,7 @@ struct audit_tty_status {
>
> #define AUDIT_UID_UNSET (unsigned int)-1
> #define AUDIT_SID_UNSET ((unsigned int)-1)
> +#define AUDIT_CID_UNSET ((u64)-1)
>
> /* audit_rule_data supports filter rules with both integer and string
> * fields. It corresponds with AUDIT_ADD_RULE, AUDIT_DEL_RULE and
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 3fb09783cd4a..182b0f2c183d 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -244,6 +244,7 @@ int audit_alloc(struct task_struct *tsk)
> }
> info->loginuid = audit_get_loginuid(current);
> info->sessionid = audit_get_sessionid(current);
> + info->contid = audit_get_contid(current);
> tsk->audit = info;
>
> ret = audit_alloc_syscall(tsk);
> @@ -258,6 +259,7 @@ int audit_alloc(struct task_struct *tsk)
> struct audit_task_info init_struct_audit = {
> .loginuid = INVALID_UID,
> .sessionid = AUDIT_SID_UNSET,
> + .contid = AUDIT_CID_UNSET,
> #ifdef CONFIG_AUDITSYSCALL
> .ctx = NULL,
> #endif
> @@ -2341,6 +2343,73 @@ int audit_set_loginuid(kuid_t loginuid)
> }
>
> /**
> + * audit_set_contid - set current task's audit contid
> + * @contid: contid value
> + *
> + * Returns 0 on success, -EPERM on permission failure.
> + *
> + * Called (set) from fs/proc/base.c::proc_contid_write().
> + */
> +int audit_set_contid(struct task_struct *task, u64 contid)
> +{
> + u64 oldcontid;
> + int rc = 0;
> + struct audit_buffer *ab;
> + uid_t uid;
> + struct tty_struct *tty;
> + char comm[sizeof(current->comm)];
> +
> + task_lock(task);
> + /* Can't set if audit disabled */
> + if (!task->audit) {
> + task_unlock(task);
> + return -ENOPROTOOPT;
> + }
> + oldcontid = audit_get_contid(task);
> + read_lock(&tasklist_lock);
> + /* Don't allow the audit containerid to be unset */
> + if (!audit_contid_valid(contid))
> + rc = -EINVAL;
> + /* if we don't have caps, reject */
> + else if (!capable(CAP_AUDIT_CONTROL))
> + rc = -EPERM;
> + /* if task has children or is not single-threaded, deny */
> + else if (!list_empty(&task->children))
> + rc = -EBUSY;
> + else if (!(thread_group_leader(task) && thread_group_empty(task)))
> + rc = -EALREADY;
> + read_unlock(&tasklist_lock);
> + if (!rc)
> + task->audit->contid = contid;
> + task_unlock(task);
> +
> + if (!audit_enabled)
> + return rc;
> +
> + ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_CONTAINER_OP);
> + if (!ab)
> + return rc;
> +
> + uid = from_kuid(&init_user_ns, task_uid(current));
> + tty = audit_get_tty();
> + audit_log_format(ab,
> + "op=set opid=%d contid=%llu old-contid=%llu pid=%d uid=%u auid=%u tty=%s ses=%u",
> + task_tgid_nr(task), contid, oldcontid,
> + task_tgid_nr(current), uid,
> + from_kuid(&init_user_ns, audit_get_loginuid(current)),
> + tty ? tty_name(tty) : "(none)",
> + audit_get_sessionid(current));
> + audit_put_tty(tty);
> + audit_log_task_context(ab);
> + audit_log_format(ab, " comm=");
> + audit_log_untrustedstring(ab, get_task_comm(comm, current));
> + audit_log_d_path_exe(ab, current->mm);
> + audit_log_format(ab, " res=%d", !rc);
> + audit_log_end(ab);
> + return rc;
> +}
> +
> +/**
> * audit_log_end - end one audit record
> * @ab: the audit_buffer
> *
> diff --git a/kernel/audit.h b/kernel/audit.h
> index c00e2ee3c6b3..e2912924af0d 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -148,6 +148,7 @@ struct audit_context {
> kuid_t target_uid;
> unsigned int target_sessionid;
> u32 target_sid;
> + u64 target_cid;
> char target_comm[TASK_COMM_LEN];
>
> struct audit_tree_refs *trees, *first_trees;
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index fd7ca983de4f..1f7edf035b16 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -113,6 +113,7 @@ struct audit_aux_data_pids {
> kuid_t target_uid[AUDIT_AUX_PIDS];
> unsigned int target_sessionid[AUDIT_AUX_PIDS];
> u32 target_sid[AUDIT_AUX_PIDS];
> + u64 target_cid[AUDIT_AUX_PIDS];
> char target_comm[AUDIT_AUX_PIDS][TASK_COMM_LEN];
> int pid_count;
> };
> @@ -2368,6 +2369,7 @@ void __audit_ptrace(struct task_struct *t)
> context->target_uid = task_uid(t);
> context->target_sessionid = audit_get_sessionid(t);
> security_task_getsecid(t, &context->target_sid);
> + context->target_cid = audit_get_contid(t);
> memcpy(context->target_comm, t->comm, TASK_COMM_LEN);
> }
>
> @@ -2408,6 +2410,7 @@ int audit_signal_info(int sig, struct task_struct *t)
> ctx->target_uid = t_uid;
> ctx->target_sessionid = audit_get_sessionid(t);
> security_task_getsecid(t, &ctx->target_sid);
> + ctx->target_cid = audit_get_contid(t);
> memcpy(ctx->target_comm, t->comm, TASK_COMM_LEN);
> return 0;
> }
> @@ -2429,6 +2432,7 @@ int audit_signal_info(int sig, struct task_struct *t)
> axp->target_uid[axp->pid_count] = t_uid;
> axp->target_sessionid[axp->pid_count] = audit_get_sessionid(t);
> security_task_getsecid(t, &axp->target_sid[axp->pid_count]);
> + axp->target_cid[axp->pid_count] = audit_get_contid(t);
> memcpy(axp->target_comm[axp->pid_count], t->comm, TASK_COMM_LEN);
> axp->pid_count++;
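
The order of checks in audit_set_contid() above can be modelled in userspace. The following is only a sketch, assuming a simplified task description: struct task_model, model_set_contid() and MODEL_CID_UNSET are hypothetical names that do not exist in the kernel, and locking and audit record emission are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for AUDIT_CID_UNSET from the patch. */
#define MODEL_CID_UNSET ((uint64_t)-1)

/* Hypothetical, simplified view of the target task. */
struct task_model {
	bool has_children;     /* models !list_empty(&task->children) */
	bool single_threaded;  /* models thread_group_leader() && thread_group_empty() */
	uint64_t contid;       /* models task->audit->contid */
};

/*
 * Mirror the decision chain of the patch: the first failing check
 * determines the error code, and the value is stored only on success.
 * caller_capable models capable(CAP_AUDIT_CONTROL) for the writer.
 */
static int model_set_contid(struct task_model *task, bool caller_capable,
			    uint64_t contid)
{
	int rc = 0;

	assert(task != NULL);

	if (contid == MODEL_CID_UNSET)      /* unsetting is not permitted */
		rc = -EINVAL;
	else if (!caller_capable)           /* writer lacks the capability */
		rc = -EPERM;
	else if (task->has_children)        /* target already has children */
		rc = -EBUSY;
	else if (!task->single_threaded)    /* target is multi-threaded */
		rc = -EALREADY;

	if (!rc)
		task->contid = contid;
	return rc;
}
```

In this model, a capable writer setting 123456 on a childless, single-threaded task gets 0 back and the value is stored; every other combination returns a negative errno and leaves the stored value unchanged.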

2019-07-19 18:48:21

by Eric W. Biederman

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

Richard Guy Briggs <[email protected]> writes:

> On 2019-07-16 19:30, Paul Moore wrote:
>> On Tue, Jul 16, 2019 at 6:03 PM Richard Guy Briggs <[email protected]> wrote:
>> > On 2019-07-15 17:04, Paul Moore wrote:
>> > > On Mon, Jul 8, 2019 at 2:06 PM Richard Guy Briggs <[email protected]> wrote:
>>
>> > > > At this point I would say we are at an impasse unless we trust
>> > > > ns_capable() or we implement audit namespaces.
>> > >
>> > > I'm not sure how we can trust ns_capable(), but if you can think of a
>> > > way I would love to hear it. I'm also not sure how namespacing audit
>> > > is helpful (see my above comments), but if you think it is please
>> > > explain.
>> >
>> > So if we are not namespacing, why do we not trust capabilities?
>>
>> We can trust capable(CAP_AUDIT_CONTROL) for enforcing audit container
>> ID policy, we can not trust ns_capable(CAP_AUDIT_CONTROL).
>
> Ok. So does a process in a non-init user namespace have two (or more)
> sets of capabilities stored in creds, one in the init_user_ns, and one
> in current_user_ns? Or does it get stripped of all its capabilities in
> init_user_ns once it has its own set in current_user_ns? If the former,
> then we can use capable(). If the latter, we need another mechanism, as
> you have suggested might be needed.

The latter. There is only one set of capabilities and it is in the
process's current user namespace.

Eric
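
The difference between capable() and ns_capable() discussed above can be illustrated with a small userspace model. This is a deliberately simplified sketch of the namespace walk, not the kernel implementation: struct user_ns_model, struct cred_model, model_ns_capable() and model_capable() are hypothetical names, and the bit position used for CAP_AUDIT_CONTROL (30) is taken from linux/capability.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CAP_AUDIT_CONTROL 30

/* Hypothetical, simplified user namespace: parent link, owner, depth. */
struct user_ns_model {
	const struct user_ns_model *parent;
	unsigned int owner_uid;
	int level;
};

/* Hypothetical credentials: ONE capability set, tied to ONE namespace. */
struct cred_model {
	const struct user_ns_model *user_ns;
	unsigned int euid;
	uint64_t cap_effective;
};

static const struct user_ns_model model_init_user_ns = { NULL, 0, 0 };

/* Does cred hold cap with respect to the target namespace? */
static bool model_ns_capable(const struct cred_model *cred,
			     const struct user_ns_model *target, int cap)
{
	const struct user_ns_model *ns;

	assert(cred != NULL && target != NULL);

	for (ns = target; ns != NULL; ns = ns->parent) {
		/* Same namespace: the single capability set decides. */
		if (ns == cred->user_ns)
			return (cred->cap_effective >> cap) & 1;
		/* Target is not below the cred's namespace: no access. */
		if (ns->level <= cred->user_ns->level)
			return false;
		/* The owner of a child namespace has all caps in it. */
		if (ns->parent == cred->user_ns && ns->owner_uid == cred->euid)
			return true;
	}
	return false;
}

/* capable() is the same check against the initial user namespace. */
static bool model_capable(const struct cred_model *cred, int cap)
{
	return model_ns_capable(cred, &model_init_user_ns, cap);
}
```

In this model a process that is root inside a child user namespace passes model_ns_capable() for that namespace but fails model_capable(), because its only capability set lives in the child namespace.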

2019-07-19 19:31:59

by Eric W. Biederman

Subject: Re: [PATCH ghak90 V6 03/10] audit: read container ID of a process

Richard Guy Briggs <[email protected]> writes:

> Add support for reading the audit container identifier from the proc
> filesystem.
>
> This is a read from the proc entry of the form
> /proc/PID/audit_containerid where PID is the process ID of the task
> whose audit container identifier is sought.
>
> The read expects up to a u64 value (unset: 18446744073709551615).
>
> This read requires CAP_AUDIT_CONTROL.

This scares me. As this seems to make it easy to reuse an audit
containerid for non-audit purposes.

I would think it would be safer and easier to poke audit and ask it to
log a message with your audit container id.

Eric


> Signed-off-by: Richard Guy Briggs <[email protected]>
> Acked-by: Serge Hallyn <[email protected]>
> Acked-by: Neil Horman <[email protected]>
> Reviewed-by: Ondrej Mosnacek <[email protected]>
> ---
> fs/proc/base.c | 25 ++++++++++++++++++++++---
> 1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 43fd0c4b87de..acc70239d0cb 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1211,7 +1211,7 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
> };
>
> #ifdef CONFIG_AUDIT
> -#define TMPBUFLEN 11
> +#define TMPBUFLEN 21
> static ssize_t proc_loginuid_read(struct file * file, char __user * buf,
> size_t count, loff_t *ppos)
> {
> @@ -1295,6 +1295,24 @@ static ssize_t proc_sessionid_read(struct file * file, char __user * buf,
> .llseek = generic_file_llseek,
> };
>
> +static ssize_t proc_contid_read(struct file *file, char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + struct inode *inode = file_inode(file);
> + struct task_struct *task = get_proc_task(inode);
> + ssize_t length;
> + char tmpbuf[TMPBUFLEN];
> +
> + if (!task)
> + return -ESRCH;
> + /* if we don't have caps, reject */
> + if (!capable(CAP_AUDIT_CONTROL))
> + return -EPERM;
> + length = scnprintf(tmpbuf, TMPBUFLEN, "%llu", audit_get_contid(task));
> + put_task_struct(task);
> + return simple_read_from_buffer(buf, count, ppos, tmpbuf, length);
> +}
> +
> static ssize_t proc_contid_write(struct file *file, const char __user *buf,
> size_t count, loff_t *ppos)
> {
> @@ -1325,6 +1343,7 @@ static ssize_t proc_contid_write(struct file *file, const char __user *buf,
> }
>
> static const struct file_operations proc_contid_operations = {
> + .read = proc_contid_read,
> .write = proc_contid_write,
> .llseek = generic_file_llseek,
> };
> @@ -3067,7 +3086,7 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
> #ifdef CONFIG_AUDIT
> REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
> REG("sessionid", S_IRUGO, proc_sessionid_operations),
> - REG("audit_containerid", S_IWUSR, proc_contid_operations),
> + REG("audit_containerid", S_IWUSR|S_IRUSR, proc_contid_operations),
> #endif
> #ifdef CONFIG_FAULT_INJECTION
> REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
> @@ -3466,7 +3485,7 @@ static int proc_tid_comm_permission(struct inode *inode, int mask)
> #ifdef CONFIG_AUDIT
> REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
> REG("sessionid", S_IRUGO, proc_sessionid_operations),
> - REG("audit_containerid", S_IWUSR, proc_contid_operations),
> + REG("audit_containerid", S_IWUSR|S_IRUSR, proc_contid_operations),
> #endif
> #ifdef CONFIG_FAULT_INJECTION
> REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
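
The TMPBUFLEN change from 11 to 21 in the patch above follows from the width of a u64 printed in decimal: the unset value 18446744073709551615 has 20 digits, plus one byte for the terminating NUL. A minimal userspace sketch, assuming snprintf() in place of the kernel's scnprintf() (snprintf() returns the length that would have been written, scnprintf() the length actually written); format_contid() and MODEL_TMPBUFLEN are hypothetical names:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the patch's TMPBUFLEN: 20 digits + NUL. */
#define MODEL_TMPBUFLEN 21

/*
 * Format a container ID as unsigned decimal into buf.
 * Returns the number of characters the full value needs, excluding
 * the NUL, even if buf was too small and the output was truncated.
 */
static int format_contid(char *buf, size_t len, uint64_t contid)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%" PRIu64, contid);
}
```

With a 21-byte buffer the unset value fits exactly; with the previous 11-byte size, which is enough for a 32-bit loginuid or sessionid, the same value would be cut off after 10 digits.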

2019-07-19 19:45:56

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V6 03/10] audit: read container ID of a process

On 2019-07-19 11:03, Eric W. Biederman wrote:
> Richard Guy Briggs <[email protected]> writes:
>
> > Add support for reading the audit container identifier from the proc
> > filesystem.
> >
> > This is a read from the proc entry of the form
> > /proc/PID/audit_containerid where PID is the process ID of the task
> > whose audit container identifier is sought.
> >
> > The read expects up to a u64 value (unset: 18446744073709551615).
> >
> > This read requires CAP_AUDIT_CONTROL.
>
> This scares me. As this seems to make it easy to reuse an audit
> containerid for non-audit purposes.

At this point, given that capable(CAP_AUDIT_CONTROL) is not available to
any userspaced container orchestrator/engine, it is moot anyways and we
will need another method.

> I would think it would be safer and easier to poke audit and ask it to
> log a message with your audit container id.

For it to be useful to a container orchestrator/engine, I think that
would depend on whether we are setting the value, or it is being
assigned by the kernel. At this stage it is set by the orchestrator so
this could work.

> Eric
>
> > Signed-off-by: Richard Guy Briggs <[email protected]>
> > Acked-by: Serge Hallyn <[email protected]>
> > Acked-by: Neil Horman <[email protected]>
> > Reviewed-by: Ondrej Mosnacek <[email protected]>
> > ---
> > fs/proc/base.c | 25 ++++++++++++++++++++++---
> > 1 file changed, 22 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > index 43fd0c4b87de..acc70239d0cb 100644
> > --- a/fs/proc/base.c
> > +++ b/fs/proc/base.c
> > @@ -1211,7 +1211,7 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
> > };
> >
> > #ifdef CONFIG_AUDIT
> > -#define TMPBUFLEN 11
> > +#define TMPBUFLEN 21
> > static ssize_t proc_loginuid_read(struct file * file, char __user * buf,
> > size_t count, loff_t *ppos)
> > {
> > @@ -1295,6 +1295,24 @@ static ssize_t proc_sessionid_read(struct file * file, char __user * buf,
> > .llseek = generic_file_llseek,
> > };
> >
> > +static ssize_t proc_contid_read(struct file *file, char __user *buf,
> > + size_t count, loff_t *ppos)
> > +{
> > + struct inode *inode = file_inode(file);
> > + struct task_struct *task = get_proc_task(inode);
> > + ssize_t length;
> > + char tmpbuf[TMPBUFLEN];
> > +
> > + if (!task)
> > + return -ESRCH;
> > + /* if we don't have caps, reject */
> > + if (!capable(CAP_AUDIT_CONTROL))
> > + return -EPERM;
> > + length = scnprintf(tmpbuf, TMPBUFLEN, "%llu", audit_get_contid(task));
> > + put_task_struct(task);
> > + return simple_read_from_buffer(buf, count, ppos, tmpbuf, length);
> > +}
> > +
> > static ssize_t proc_contid_write(struct file *file, const char __user *buf,
> > size_t count, loff_t *ppos)
> > {
> > @@ -1325,6 +1343,7 @@ static ssize_t proc_contid_write(struct file *file, const char __user *buf,
> > }
> >
> > static const struct file_operations proc_contid_operations = {
> > + .read = proc_contid_read,
> > .write = proc_contid_write,
> > .llseek = generic_file_llseek,
> > };
> > @@ -3067,7 +3086,7 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
> > #ifdef CONFIG_AUDIT
> > REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
> > REG("sessionid", S_IRUGO, proc_sessionid_operations),
> > - REG("audit_containerid", S_IWUSR, proc_contid_operations),
> > + REG("audit_containerid", S_IWUSR|S_IRUSR, proc_contid_operations),
> > #endif
> > #ifdef CONFIG_FAULT_INJECTION
> > REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
> > @@ -3466,7 +3485,7 @@ static int proc_tid_comm_permission(struct inode *inode, int mask)
> > #ifdef CONFIG_AUDIT
> > REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
> > REG("sessionid", S_IRUGO, proc_sessionid_operations),
> > - REG("audit_containerid", S_IWUSR, proc_contid_operations),
> > + REG("audit_containerid", S_IWUSR|S_IRUSR, proc_contid_operations),
> > #endif
> > #ifdef CONFIG_FAULT_INJECTION
> > REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-07-20 12:53:49

by James Bottomley

Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id

On Fri, 2019-07-19 at 11:00 -0500, Eric W. Biederman wrote:
> Paul Moore <[email protected]> writes:
>
> > On Wed, Jul 17, 2019 at 8:52 PM Richard Guy Briggs <[email protected]>
> > wrote:
> > > On 2019-07-16 19:30, Paul Moore wrote:
> >
> > ...
> >
> > > > We can trust capable(CAP_AUDIT_CONTROL) for enforcing audit
> > > > container ID policy, we can not trust
> > > > ns_capable(CAP_AUDIT_CONTROL).
> > >
> > > Ok. So does a process in a non-init user namespace have two (or
> > > more) sets of capabilities stored in creds, one in the
> > > init_user_ns, and one in current_user_ns? Or does it get
> > > stripped of all its capabilities in init_user_ns once it has its
> > > own set in current_user_ns? If the former, then we can use
> > > capable(). If the latter, we need another mechanism, as
> > > you have suggested might be needed.
> >
> > Unfortunately I think the problem is that ultimately we need to
> > allow any container orchestrator that has been given privileges to
> > manage the audit container ID to also grant that privilege to any
> > of the child process/containers it manages. I don't believe we can
> > do that with capabilities based on the code I've looked at, and the
> > discussions I've had, but if you find a way I would love to hear
> > it.
> > > If some random unprivileged user wants to fire up a container
> > > orchestrator/engine in his own user namespace, then audit needs
> > > to be namespaced. Can we safely discard this scenario for now?
> >
> > I think the only time we want to allow a container orchestrator to
> > manage the audit container ID is if it has been granted that
> > privilege by someone who has that privilege already. In the zero-
> > container, or single-level of containers, case this is relatively
> > easy, and we can accomplish it using CAP_AUDIT_CONTROL as the
> > privilege. If we start nesting container orchestrators it becomes
> > more complicated as we need to be able to support granting and
> > inheriting this privilege in a manner; this is why I suggested a
> > new mechanism *may* be necessary.
>
>
> Let me segue a bit and see if I can get this conversation out of the
> rut it seems to have drifted into.
>
> Unprivileged containers and nested containers exist today and are
> going to become increasingly common. Let that be a given.

Agree fully.

> As I recall the interesting thing for audit to log is actions by
> privileged processes. Audit can log more but generally configuring
> logging of the actions of unprivileged users is effectively a self
> DOS.
>
> So I think the initial implementation can safely ignore actions of
> nested containers and unprivileged containers because you don't care
> about their actions.

I don't entirely agree here: remember there might be two consumers for
the audit data: the physical system owner (checking up on the tenants)
and the tenant themselves who might be watching either their sub
tenants or their users (and who, obviously, won't get the full audit
stream). In either case, the tenant may or may not be privileged, and
if they're privileged, it might be through the user_ns in which case
the physical system owner and the kernel would see them as "not
privileged". So I think we are ultimately going to need the ability to
audit unprivileged containers.

I also think audit has a role to play in intrusion detection and
forensic analysis for fully unprivileged containers running external
services, but I don't think we have to solve that case immediately.

> If we start allowing audit to run in a container then we need to deal
> with all of the nesting issues but until then I don't think you folks
> care.
>
> Or am I wrong? Do the requirements for securely auditing things from
> the kernel care about the actions of unprivileged users?

I think ultimately we have to care, but it could be three phases: first
would be genuinely privileged containers (i.e. with real root inside,
being our most dangerous problem) the second would be user_ns
privileged containers (i.e. with both user_ns and an interior root
mapping) and the third would be unprivileged containers (with or
without user_ns but no interior root).

James