2019-09-19 01:29:39

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 00/21] audit: implement container identifier

Implement kernel audit container identifier.

This is the seventh revision of the patchset, based on the proposal
document (V3) posted at:
https://www.redhat.com/archives/linux-audit/2018-January/msg00014.html

The first patch was originally the last patch of ghak81. It was absorbed
into this patchset because its primary justification is the rest of this
patchset.

The second patch implements the proc fs write to set the audit container
identifier of a process, emitting an AUDIT_CONTAINER_OP record to
announce the registration of that audit container identifier on that
process. This patch requires userspace support for record acceptance
and proper type display.

The third implements reading the audit container identifier from the
proc filesystem for debugging. This patch wasn't planned for upstream
inclusion, but its inclusion is starting to look more likely.

The fourth converts over from a simple u64 to a list member that includes
owner information to check for descendancy, allow process injection into
a container and prevent id reuse by other orchestrators.

The fifth logs the drop of an audit container identifier once all tasks
using that audit container identifier have exited.

The 6th limits the total number of containers on a system.

The 7th implements the auxiliary record AUDIT_CONTAINER_ID if an audit
container identifier is associated with an event. This patch requires
userspace support for proper type display.

The 8th adds audit daemon signalling provenance through audit_sig_info2.

The 9th creates a local audit context to be able to bind a standalone
record with a locally created auxiliary record.

The 10th patch adds audit container identifier records to the user
standalone records.

The 11th adds audit container identifier filtering to the exit,
exclude and user lists. This patch adds the AUDIT_CONTID field and
requires auditctl userspace support for the --contid option.

The 12th adds network namespace audit container identifier labelling
based on member tasks' audit container identifier labels.

The 13th adds audit container identifier support to standalone netfilter
records that don't have a task context and lists each container to which
that net namespace belongs.

The 14th checks that the target is a descendant for nesting and the 15th
refactors to avoid a duplicate of the copied function.

The 16th and 17th add audit netlink interfaces for the /proc
audit_containerid, loginuid and sessionid.

The 18th adds tracking and reporting for container nesting. This patch
could be split up and the chunks applied to earlier patches if this
nesting tracking and reporting approach is acceptable. Arguably this is
the only way to be able to report activity in a nested container that
also affects its parent containers.

The 19th limits the container nesting depth.

The 20th adds a mechanism to allow a process to be designated as a
container orchestrator/engine in non-init user namespaces and the 21st
adds a /proc interface for testing only.


Example: Set an audit container identifier of 123456 on the "sleep" task:

sleep 2&
child=$!
echo 123456 > /proc/$child/audit_containerid; echo $?
ausearch -ts recent -m container_op
echo child:$child contid:$( cat /proc/$child/audit_containerid)

This should produce a record such as:

type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615 pid=628 auid=root uid=root tty=ttyS0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm=bash exe=/usr/bin/bash res=yes
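
A minimal sketch for checking such a record from a script, assuming the
single-line layout shown above. The helper name audit_field is
hypothetical and not part of the audit tools; it only does plain text
matching on key=value pairs, and the sample record is abridged from the
one above.

```shell
# audit_field RECORD KEY: print the value of the first KEY=... pair in RECORD.
# Hypothetical helper for illustration; it splits on spaces and matches text only.
audit_field() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^$2=//p" | head -n 1
}

# Abridged copy of the CONTAINER_OP record shown above.
record='type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615 pid=628 res=yes'

audit_field "$record" contid      # the newly set identifier
audit_field "$record" old-contid  # the previous identifier (here: unset)
```

Anchoring the match at the start of each field keeps "contid" from also
matching "old-contid".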


Example: Set a filter on an audit container identifier 123459 on /tmp/tmpcontainerid:

contid=123459
key=tmpcontainerid
auditctl -a exit,always -F dir=/tmp -F perm=wa -F contid=$contid -F key=$key
perl -e "sleep 1; open(my \$tmpfile, '>', \"/tmp/$key\"); close(\$tmpfile);" &
child=$!
echo $contid > /proc/$child/audit_containerid
sleep 2
ausearch -i -ts recent -k $key
auditctl -d exit,always -F dir=/tmp -F perm=wa -F contid=$contid -F key=$key
rm -f /tmp/$key

This should produce an event such as:

type=CONTAINER_ID msg=audit(2018-06-06 12:46:31.707:26953) : contid=123459
type=PROCTITLE msg=audit(2018-06-06 12:46:31.707:26953) : proctitle=perl -e sleep 1; open(my $tmpfile, '>', "/tmp/tmpcontainerid"); close($tmpfile);
type=PATH msg=audit(2018-06-06 12:46:31.707:26953) : item=1 name=/tmp/tmpcontainerid inode=25656 dev=00:26 mode=file,644 ouid=root ogid=root rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=none cap_fi=none cap_fe=0 cap_fver=0
type=PATH msg=audit(2018-06-06 12:46:31.707:26953) : item=0 name=/tmp/ inode=8985 dev=00:26 mode=dir,sticky,777 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype=PARENT cap_fp=none cap_fi=none cap_fe=0 cap_fver=0
type=CWD msg=audit(2018-06-06 12:46:31.707:26953) : cwd=/root
type=SYSCALL msg=audit(2018-06-06 12:46:31.707:26953) : arch=x86_64 syscall=openat success=yes exit=3 a0=0xffffffffffffff9c a1=0x5621f2b81900 a2=O_WRONLY|O_CREAT|O_TRUNC a3=0x1b6 items=2 ppid=628 pid=2232 auid=root uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=ttyS0 ses=1 comm=perl exe=/usr/bin/perl subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=tmpcontainerid

Example: Test multiple containers on one netns:

sleep 5 &
child1=$!
containerid1=123451
echo $containerid1 > /proc/$child1/audit_containerid
sleep 5 &
child2=$!
containerid2=123452
echo $containerid2 > /proc/$child2/audit_containerid
iptables -I INPUT -i lo -p icmp --icmp-type echo-request -j AUDIT --type accept
iptables -I INPUT -t mangle -i lo -p icmp --icmp-type echo-request -j MARK --set-mark 0x12345555
sleep 1;
bash -c "ping -q -c 1 127.0.0.1 >/dev/null 2>&1"
sleep 1;
ausearch -i -m NETFILTER_PKT -ts boot|grep mark=0x12345555
ausearch -i -m NETFILTER_PKT -ts boot|grep contid=|grep $containerid1|grep $containerid2

This should produce an event such as:

type=NETFILTER_PKT msg=audit(03/15/2019 14:16:13.369:244) : mark=0x12345555 saddr=127.0.0.1 daddr=127.0.0.1 proto=icmp
type=CONTAINER_ID msg=audit(03/15/2019 14:16:13.369:244) : contid=123452,123451
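
The grep chain above matches substrings anywhere in the line. A minimal
sketch of an exact check, assuming the comma-separated contid list format
shown in the CONTAINER_ID record above; the helper name contid_list is
hypothetical.

```shell
# contid_list RECORD: print each contid of a CONTAINER_ID record on its own line.
# Hypothetical helper; assumes a single contid= field holding a CSV of decimals.
contid_list() {
    printf '%s\n' "$1" | sed -n 's/.*contid=\([0-9,]*\).*/\1/p' | tr ',' '\n'
}

record='type=CONTAINER_ID msg=audit(03/15/2019 14:16:13.369:244) : contid=123452,123451'

# grep -x matches whole lines only, so 12345 would not match 123451.
contid_list "$record" | grep -qx 123451 && echo "netns event lists 123451"
contid_list "$record" | grep -qx 123452 && echo "netns event lists 123452"
```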


Includes the last patch of https://github.com/linux-audit/audit-kernel/issues/81
Please see the github audit kernel issue for the main feature:
https://github.com/linux-audit/audit-kernel/issues/90
and the kernel filter code:
https://github.com/linux-audit/audit-kernel/issues/91
and the network support:
https://github.com/linux-audit/audit-kernel/issues/92
Please see the github audit userspace issue for supporting record types:
https://github.com/linux-audit/audit-userspace/issues/51
and filter code:
https://github.com/linux-audit/audit-userspace/issues/40
Please see the github audit testsuite issue for the test case:
https://github.com/linux-audit/audit-testsuite/issues/64
Please see the github audit wiki for the feature overview:
https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID


Changelog:
v7
- remove BUG() in audit_comparator64()
- rebase on v5.2-rc1 audit/next
- resolve merge conflict with ghak111 (signal_info regardless syscall)
- resolve merge conflict with ghak73 (audit_field_valid)
- resolve merge conflict with ghak64 (saddr_fam filter)
- resolve merge conflict with ghak10 (ntp audit) change AUDIT_CONTAINER_ID from 1332 to 1334
- rebase on v5.3-rc1 audit/next
- track container owner
- only permit setting contid of descendants for nesting
- track drop of contid and permit reuse
- track and report container nesting
- permit filtering on any nested contid
- set/get contid and loginuid/sessionid via netlink
- implement capcontid to enable orchestrators in non-init user
namespaces
- limit number of containers
- limit depth of container nesting

v6
- change TMPBUFLEN from 11 to 21 to cover the decimal value of contid
u64 (nhorman)
- fix bug overwriting ctx in struct audit_sig_info, move cid above
ctx[0] (nhorman)
- fix bug skipping remaining fields and not advancing bufp when copying
out contid in audit_krule_to_data (omosnacec)
- add acks, tidy commit descriptions, other formatting fixes (checkpatch
wrong on audit_log_lost)
- cast ull for u64 prints
- target_cid tracking was moved from the ptrace/signal patch to
container_op
- target ptrace and signal records were moved from the ptrace/signal
patch to container_id
- auditd signaller tracking was moved to a new AUDIT_SIGNAL_INFO2
request and record
- ditch unnecessary list_empty() checks
- check for null net and aunet in audit_netns_contid_add()
- swap CONTAINER_OP contid/old-contid order to ease parsing

v5
- address loginuid and sessionid syscall scope in ghak104
- address audit_context in CONFIG_AUDIT vs CONFIG_AUDITSYSCALL in ghak105
- remove tty patch, addressed in ghak106
- rebase on audit/next v5.0-rc1
w/ghak59/ghak104/ghak103/ghak100/ghak107/ghak105/ghak106/ghak105sup
- update CONTAINER_ID to CONTAINER_OP in patch description
- move audit_context in audit_task_info to CONFIG_AUDITSYSCALL
- move audit_alloc() and audit_free() out of CONFIG_AUDITSYSCALL and into
CONFIG_AUDIT and create audit_{alloc,free}_syscall
- use plain kmem_cache_alloc() rather than kmem_cache_zalloc() in audit_alloc()
- fix audit_get_contid() declaration type error
- move audit_set_contid() from auditsc.c to audit.c
- audit_log_contid() returns void
- audit_log_contid() handed contid rather than tsk
- switch from AUDIT_CONTAINER to AUDIT_CONTAINER_ID for aux record
- move audit_log_contid(tsk/contid) & audit_contid_set(tsk)/audit_contid_valid(contid)
- switch from tsk to current
- audit_alloc_local() calls audit_log_lost() on failure to allocate a context
- add AUDIT_USER* non-syscall contid record
- cosmetic cleanup double parens, goto out on err
- ditch audit_get_ns_contid_list_lock(), fix aunet lock race
- switch from all-cpu read spinlock to rcu, keep spinlock for write
- update audit_alloc_local() to use ktime_get_coarse_real_ts64()
- add nft_log support
- add call from do_exit() in audit_free() to remove contid from netns
- relegate AUDIT_CONTAINER ref= field (was op=) to debug patch

v4
- preface set with ghak81:"collect audit task parameters"
- add shallyn and sgrubb acks
- rename feature bitmap macro
- rename cid_valid() to audit_contid_valid()
- rename AUDIT_CONTAINER_ID to AUDIT_CONTAINER_OP
- delete audit_get_contid_list() from headers
- move work into inner if, delete "found"
- change netns contid list function names
- move exports for audit_log_contid audit_alloc_local audit_free_context to non-syscall patch
- list contids CSV
- pass in gfp flags to audit_alloc_local() (fix audit_alloc_context callers)
- use "local" in lieu of abusing in_syscall for auditsc_get_stamp()
- read_lock(&tasklist_lock) around children and thread check
- task_lock(tsk) should be taken before first check of tsk->audit
- add spin lock to contid list in aunet
- restrict /proc read to CAP_AUDIT_CONTROL
- remove set again prohibition and inherited flag
- delete contidion spelling fix from patchset, send to netdev/linux-wireless

v3
- switched from containerid in task_struct to audit_task_info (depends on ghak81)
- drop INVALID_CID in favour of only AUDIT_CID_UNSET
- check for !audit_task_info, throw -ENOPROTOOPT on set
- changed -EPERM to -EEXIST for parent check
- return AUDIT_CID_UNSET if !audit_enabled
- squash child/thread check patch into AUDIT_CONTAINER_ID patch
- changed -EPERM to -EBUSY for child check
- separate child and thread checks, use -EALREADY for latter
- move addition of op= from ptrace/signal patch to AUDIT_CONTAINER patch
- fix && to || bashism in ptrace/signal patch
- uninline and export function for audit_free_context()
- drop CONFIG_CHANGE, FEATURE_CHANGE, ANOM_ABEND, ANOM_SECCOMP patches
- move audit_enabled check (xt_AUDIT)
- switched from containerid list in struct net to net_generic's struct audit_net
- move containerid list iteration into audit (xt_AUDIT)
- create function to move namespace switch into audit
- switched /proc/PID/ entry from containerid to audit_containerid
- call kzalloc with GFP_ATOMIC on in_atomic() in audit_alloc_context()
- call kzalloc with GFP_ATOMIC on in_atomic() in audit_log_container_info()
- use xt_net(par) instead of sock_net(skb->sk) to get net
- switched record and field names: initial CONTAINER_ID, aux CONTAINER, field CONTID
- allow to set own contid
- open code audit_set_containerid
- add contid inherited flag
- ccontainerid and pcontainerid eliminated due to inherited flag
- change name of container list functions
- rename containerid to contid
- convert initial container record to syscall aux
- fix spelling mistake of contidion in net/rfkill/core.c to avoid contid name collision

v2
- add check for children and threads
- add network namespace container identifier list
- add NETFILTER_PKT audit container identifier logging
- patch description and documentation clean-up and example
- reap unused ppid

Richard Guy Briggs (21):
audit: collect audit task parameters
audit: add container id
audit: read container ID of a process
audit: convert to contid list to check for orch/engine ownership
audit: log drop of contid on exit of last task
audit: contid limit of 32k imposed to avoid DoS
audit: log container info of syscalls
audit: add contid support for signalling the audit daemon
audit: add support for non-syscall auxiliary records
audit: add containerid support for user records
audit: add containerid filtering
audit: add support for containerid to network namespaces
audit: NETFILTER_PKT: record each container ID associated with a netNS
audit: contid check descendancy and nesting
sched: pull task_is_descendant into kernel/sched/core.c
audit: add support for contid set/get by netlink
audit: add support for loginuid/sessionid set/get by netlink
audit: track container nesting
audit: check cont depth
audit: add capcontid to set contid outside init_user_ns
audit: add proc interface for capcontid

fs/proc/base.c | 112 ++++++-
include/linux/audit.h | 148 ++++++++-
include/linux/sched.h | 10 +-
include/uapi/linux/audit.h | 16 +-
init/init_task.c | 3 +-
init/main.c | 2 +
kernel/audit.c | 728 +++++++++++++++++++++++++++++++++++++++++++-
kernel/audit.h | 38 +++
kernel/auditfilter.c | 64 ++++
kernel/auditsc.c | 91 ++++--
kernel/fork.c | 1 -
kernel/nsproxy.c | 4 +
kernel/sched/core.c | 33 ++
net/netfilter/nft_log.c | 11 +-
net/netfilter/xt_AUDIT.c | 11 +-
security/selinux/nlmsgtab.c | 1 +
security/yama/yama_lsm.c | 33 --
17 files changed, 1210 insertions(+), 96 deletions(-)

--
1.8.3.1


2019-09-19 01:29:46

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 01/21] audit: collect audit task parameters

The audit-related parameters in struct task_struct should ideally be
collected together and accessed through a standard audit API.

Collect the existing loginuid, sessionid and audit_context together in a
new struct audit_task_info called "audit" in struct task_struct.

Use kmem_cache to manage this pool of memory.
Un-inline audit_free() to be able to always recover that memory.

Please see the upstream github issue
https://github.com/linux-audit/audit-kernel/issues/81

Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
include/linux/audit.h | 49 +++++++++++++++++++++++------------
include/linux/sched.h | 7 +----
init/init_task.c | 3 +--
init/main.c | 2 ++
kernel/audit.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++++--
kernel/audit.h | 5 ++++
kernel/auditsc.c | 26 ++++++++++---------
kernel/fork.c | 1 -
8 files changed, 124 insertions(+), 40 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 97d0925454df..4fbda55f3cf2 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -95,6 +95,16 @@ struct audit_ntp_data {
struct audit_ntp_data {};
#endif

+struct audit_task_info {
+ kuid_t loginuid;
+ unsigned int sessionid;
+#ifdef CONFIG_AUDITSYSCALL
+ struct audit_context *ctx;
+#endif
+};
+
+extern struct audit_task_info init_struct_audit;
+
extern int is_audit_feature_set(int which);

extern int __init audit_register_class(int class, unsigned *list);
@@ -131,6 +141,9 @@ struct audit_ntp_data {
#ifdef CONFIG_AUDIT
/* These are defined in audit.c */
/* Public API */
+extern int audit_alloc(struct task_struct *task);
+extern void audit_free(struct task_struct *task);
+extern void __init audit_task_init(void);
extern __printf(4, 5)
void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
const char *fmt, ...);
@@ -173,12 +186,16 @@ extern void audit_log_key(struct audit_buffer *ab,

static inline kuid_t audit_get_loginuid(struct task_struct *tsk)
{
- return tsk->loginuid;
+ if (!tsk->audit)
+ return INVALID_UID;
+ return tsk->audit->loginuid;
}

static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
{
- return tsk->sessionid;
+ if (!tsk->audit)
+ return AUDIT_SID_UNSET;
+ return tsk->audit->sessionid;
}

extern u32 audit_enabled;
@@ -186,6 +203,14 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
extern int audit_signal_info(int sig, struct task_struct *t);

#else /* CONFIG_AUDIT */
+static inline int audit_alloc(struct task_struct *task)
+{
+ return 0;
+}
+static inline void audit_free(struct task_struct *task)
+{ }
+static inline void __init audit_task_init(void)
+{ }
static inline __printf(4, 5)
void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
const char *fmt, ...)
@@ -257,8 +282,6 @@ static inline int audit_signal_info(int sig, struct task_struct *t)

/* These are defined in auditsc.c */
/* Public API */
-extern int audit_alloc(struct task_struct *task);
-extern void __audit_free(struct task_struct *task);
extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
unsigned long a2, unsigned long a3);
extern void __audit_syscall_exit(int ret_success, long ret_value);
@@ -281,12 +304,14 @@ extern void audit_seccomp_actions_logged(const char *names,

static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
{
- task->audit_context = ctx;
+ task->audit->ctx = ctx;
}

static inline struct audit_context *audit_context(void)
{
- return current->audit_context;
+ if (!current->audit)
+ return NULL;
+ return current->audit->ctx;
}

static inline bool audit_dummy_context(void)
@@ -294,11 +319,7 @@ static inline bool audit_dummy_context(void)
void *p = audit_context();
return !p || *(int *)p;
}
-static inline void audit_free(struct task_struct *task)
-{
- if (unlikely(task->audit_context))
- __audit_free(task);
-}
+
static inline void audit_syscall_entry(int major, unsigned long a0,
unsigned long a1, unsigned long a2,
unsigned long a3)
@@ -523,12 +544,6 @@ static inline void audit_ntp_log(const struct audit_ntp_data *ad)
extern int audit_n_rules;
extern int audit_signals;
#else /* CONFIG_AUDITSYSCALL */
-static inline int audit_alloc(struct task_struct *task)
-{
- return 0;
-}
-static inline void audit_free(struct task_struct *task)
-{ }
static inline void audit_syscall_entry(int major, unsigned long a0,
unsigned long a1, unsigned long a2,
unsigned long a3)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8dc1811487f5..a936d162513a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -31,7 +31,6 @@
#include <linux/rseq.h>

/* task_struct member predeclarations (sorted alphabetically): */
-struct audit_context;
struct backing_dev_info;
struct bio_list;
struct blk_plug;
@@ -940,11 +939,7 @@ struct task_struct {
struct callback_head *task_works;

#ifdef CONFIG_AUDIT
-#ifdef CONFIG_AUDITSYSCALL
- struct audit_context *audit_context;
-#endif
- kuid_t loginuid;
- unsigned int sessionid;
+ struct audit_task_info *audit;
#endif
struct seccomp seccomp;

diff --git a/init/init_task.c b/init/init_task.c
index 7ab773b9b3cd..6496bbe5c56e 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -124,8 +124,7 @@ struct task_struct init_task
.thread_group = LIST_HEAD_INIT(init_task.thread_group),
.thread_node = LIST_HEAD_INIT(init_signals.thread_head),
#ifdef CONFIG_AUDIT
- .loginuid = INVALID_UID,
- .sessionid = AUDIT_SID_UNSET,
+ .audit = &init_struct_audit,
#endif
#ifdef CONFIG_PERF_EVENTS
.perf_event_mutex = __MUTEX_INITIALIZER(init_task.perf_event_mutex),
diff --git a/init/main.c b/init/main.c
index 96f8d5af52d6..dbcaa49bbaea 100644
--- a/init/main.c
+++ b/init/main.c
@@ -93,6 +93,7 @@
#include <linux/rodata_test.h>
#include <linux/jump_label.h>
#include <linux/mem_encrypt.h>
+#include <linux/audit.h>

#include <asm/io.h>
#include <asm/bugs.h>
@@ -771,6 +772,7 @@ asmlinkage __visible void __init start_kernel(void)
nsfs_init();
cpuset_init();
cgroup_init();
+ audit_task_init();
taskstats_init_early();
delayacct_init();

diff --git a/kernel/audit.c b/kernel/audit.c
index da8dc0db5bd3..5b1c52bafaeb 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -202,6 +202,73 @@ struct audit_reply {
struct sk_buff *skb;
};

+static struct kmem_cache *audit_task_cache;
+
+void __init audit_task_init(void)
+{
+ audit_task_cache = kmem_cache_create("audit_task",
+ sizeof(struct audit_task_info),
+ 0, SLAB_PANIC, NULL);
+}
+
+/**
+ * audit_alloc - allocate an audit info block for a task
+ * @tsk: task
+ *
+ * Call audit_alloc_syscall to filter on the task information and
+ * allocate a per-task audit context if necessary. This is called from
+ * copy_process, so no lock is needed.
+ */
+int audit_alloc(struct task_struct *tsk)
+{
+ int ret = 0;
+ struct audit_task_info *info;
+
+ info = kmem_cache_alloc(audit_task_cache, GFP_KERNEL);
+ if (!info) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ info->loginuid = audit_get_loginuid(current);
+ info->sessionid = audit_get_sessionid(current);
+ tsk->audit = info;
+
+ ret = audit_alloc_syscall(tsk);
+ if (ret) {
+ tsk->audit = NULL;
+ kmem_cache_free(audit_task_cache, info);
+ }
+out:
+ return ret;
+}
+
+struct audit_task_info init_struct_audit = {
+ .loginuid = INVALID_UID,
+ .sessionid = AUDIT_SID_UNSET,
+#ifdef CONFIG_AUDITSYSCALL
+ .ctx = NULL,
+#endif
+};
+
+/**
+ * audit_free - free per-task audit info
+ * @tsk: task whose audit info block to free
+ *
+ * Called from copy_process and do_exit
+ */
+void audit_free(struct task_struct *tsk)
+{
+ struct audit_task_info *info = tsk->audit;
+
+ audit_free_syscall(tsk);
+ /* Freeing the audit_task_info struct must be performed after
+ * audit_log_exit() due to need for loginuid and sessionid.
+ */
+ info = tsk->audit;
+ tsk->audit = NULL;
+ kmem_cache_free(audit_task_cache, info);
+}
+
/**
* auditd_test_task - Check to see if a given task is an audit daemon
* @task: the task to check
@@ -2253,8 +2320,8 @@ int audit_set_loginuid(kuid_t loginuid)
sessionid = (unsigned int)atomic_inc_return(&session_id);
}

- current->sessionid = sessionid;
- current->loginuid = loginuid;
+ current->audit->sessionid = sessionid;
+ current->audit->loginuid = loginuid;
out:
audit_log_set_loginuid(oldloginuid, loginuid, oldsessionid, sessionid, rc);
return rc;
diff --git a/kernel/audit.h b/kernel/audit.h
index 6fb7160412d4..7f623ef216e6 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -251,6 +251,8 @@ extern void audit_log_d_path_exe(struct audit_buffer *ab,
extern unsigned int audit_serial(void);
extern int auditsc_get_stamp(struct audit_context *ctx,
struct timespec64 *t, unsigned int *serial);
+extern int audit_alloc_syscall(struct task_struct *tsk);
+extern void audit_free_syscall(struct task_struct *tsk);

extern void audit_put_watch(struct audit_watch *watch);
extern void audit_get_watch(struct audit_watch *watch);
@@ -292,6 +294,9 @@ extern void audit_filter_inodes(struct task_struct *tsk,
extern struct list_head *audit_killed_trees(void);
#else /* CONFIG_AUDITSYSCALL */
#define auditsc_get_stamp(c, t, s) 0
+#define audit_alloc_syscall(t) 0
+#define audit_free_syscall(t) {}
+
#define audit_put_watch(w) {}
#define audit_get_watch(w) {}
#define audit_to_watch(k, p, l, o) (-EINVAL)
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 4effe01ebbe2..10679da36bb6 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -903,23 +903,25 @@ static inline struct audit_context *audit_alloc_context(enum audit_state state)
return context;
}

-/**
- * audit_alloc - allocate an audit context block for a task
+/*
+ * audit_alloc_syscall - allocate an audit context block for a task
* @tsk: task
*
* Filter on the task information and allocate a per-task audit context
* if necessary. Doing so turns on system call auditing for the
- * specified task. This is called from copy_process, so no lock is
- * needed.
+ * specified task. This is called from copy_process via audit_alloc, so
+ * no lock is needed.
*/
-int audit_alloc(struct task_struct *tsk)
+int audit_alloc_syscall(struct task_struct *tsk)
{
struct audit_context *context;
enum audit_state state;
char *key = NULL;

- if (likely(!audit_ever_enabled))
+ if (likely(!audit_ever_enabled)) {
+ audit_set_context(tsk, NULL);
return 0; /* Return if not auditing. */
+ }

state = audit_filter_task(tsk, &key);
if (state == AUDIT_DISABLED) {
@@ -929,7 +931,7 @@ int audit_alloc(struct task_struct *tsk)

if (!(context = audit_alloc_context(state))) {
kfree(key);
- audit_log_lost("out of memory in audit_alloc");
+ audit_log_lost("out of memory in audit_alloc_syscall");
return -ENOMEM;
}
context->filterkey = key;
@@ -1574,14 +1576,15 @@ static void audit_log_exit(void)
}

/**
- * __audit_free - free a per-task audit context
+ * audit_free_syscall - free per-task audit context info
* @tsk: task whose audit context block to free
*
- * Called from copy_process and do_exit
+ * Called from audit_free
*/
-void __audit_free(struct task_struct *tsk)
+void audit_free_syscall(struct task_struct *tsk)
{
- struct audit_context *context = tsk->audit_context;
+ struct audit_task_info *info = tsk->audit;
+ struct audit_context *context = info->ctx;

if (!context)
return;
@@ -1604,7 +1607,6 @@ void __audit_free(struct task_struct *tsk)
if (context->current_state == AUDIT_RECORD_CONTEXT)
audit_log_exit();
}
-
audit_set_context(tsk, NULL);
audit_free_context(context);
}
diff --git a/kernel/fork.c b/kernel/fork.c
index d8ae0f1b4148..ef9c123e8ae8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1938,7 +1938,6 @@ static __latent_entropy struct task_struct *copy_process(
posix_cpu_timers_init(p);

p->io_context = NULL;
- audit_set_context(p, NULL);
cgroup_fork(p);
#ifdef CONFIG_NUMA
p->mempolicy = mpol_dup(p->mempolicy);
--
1.8.3.1

2019-09-19 01:30:30

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 03/21] audit: read container ID of a process

Add support for reading the audit container identifier from the proc
filesystem.

This is a read from the proc entry of the form
/proc/PID/audit_containerid where PID is the process ID of the task
whose audit container identifier is sought.

The read returns the identifier as a decimal u64 value (unset:
18446744073709551615).

This read requires CAP_AUDIT_CONTROL.
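
A minimal userspace sketch of how a script might treat the value read
back, assuming only the unset sentinel quoted above. AUDIT_CID_UNSET is
the name used in the changelog; the helper contid_is_set is hypothetical.
The last line illustrates why TMPBUFLEN grows to 21: 20 decimal digits
plus a terminating NUL.

```shell
# Largest u64 in decimal; the kernel reports this when no contid is set.
AUDIT_CID_UNSET=18446744073709551615

# contid_is_set VALUE: succeed when VALUE is a decimal number other than
# the unset sentinel. Hypothetical helper for illustration only.
contid_is_set() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric input
    esac
    [ "$1" != "$AUDIT_CID_UNSET" ]
}

# String length is used because the value overflows signed 64-bit shell arithmetic.
echo "digits=${#AUDIT_CID_UNSET} buffer=$(( ${#AUDIT_CID_UNSET} + 1 ))"
```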

Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
fs/proc/base.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index e2e7c9f4702f..26091800180c 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1224,7 +1224,7 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
};

#ifdef CONFIG_AUDIT
-#define TMPBUFLEN 11
+#define TMPBUFLEN 21
static ssize_t proc_loginuid_read(struct file * file, char __user * buf,
size_t count, loff_t *ppos)
{
@@ -1308,6 +1308,24 @@ static ssize_t proc_sessionid_read(struct file * file, char __user * buf,
.llseek = generic_file_llseek,
};

+static ssize_t proc_contid_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct task_struct *task;
+ ssize_t length;
+ char tmpbuf[TMPBUFLEN];
+
+ /* if we don't have caps, reject before taking a task reference */
+ if (!capable(CAP_AUDIT_CONTROL))
+ return -EPERM;
+ task = get_proc_task(file_inode(file));
+ if (!task)
+ return -ESRCH;
+ length = scnprintf(tmpbuf, TMPBUFLEN, "%llu", audit_get_contid(task));
+ put_task_struct(task);
+ return simple_read_from_buffer(buf, count, ppos, tmpbuf, length);
+}
+
static ssize_t proc_contid_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
@@ -1338,6 +1356,7 @@ static ssize_t proc_contid_write(struct file *file, const char __user *buf,
}

static const struct file_operations proc_contid_operations = {
+ .read = proc_contid_read,
.write = proc_contid_write,
.llseek = generic_file_llseek,
};
@@ -3101,7 +3120,7 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
#ifdef CONFIG_AUDIT
REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
REG("sessionid", S_IRUGO, proc_sessionid_operations),
- REG("audit_containerid", S_IWUSR, proc_contid_operations),
+ REG("audit_containerid", S_IWUSR|S_IRUSR, proc_contid_operations),
#endif
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
@@ -3502,7 +3521,7 @@ static int proc_tid_comm_permission(struct inode *inode, int mask)
#ifdef CONFIG_AUDIT
REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
REG("sessionid", S_IRUGO, proc_sessionid_operations),
- REG("audit_containerid", S_IWUSR, proc_contid_operations),
+ REG("audit_containerid", S_IWUSR|S_IRUSR, proc_contid_operations),
#endif
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
--
1.8.3.1

2019-09-19 01:30:51

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 07/21] audit: log container info of syscalls

Create a new audit record AUDIT_CONTAINER_ID to document the audit
container identifier of a process if it is present.

It is called from audit_log_exit(), so syscalls are covered.

A sample raw event:
type=SYSCALL msg=audit(1519924845.499:257): arch=c000003e syscall=257 success=yes exit=3 a0=ffffff9c a1=56374e1cef30 a2=241 a3=1b6 items=2 ppid=606 pid=635 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="bash" exe="/usr/bin/bash" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="tmpcontainerid"
type=CWD msg=audit(1519924845.499:257): cwd="/root"
type=PATH msg=audit(1519924845.499:257): item=0 name="/tmp/" inode=13863 dev=00:27 mode=041777 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype= PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
type=PATH msg=audit(1519924845.499:257): item=1 name="/tmp/tmpcontainerid" inode=17729 dev=00:27 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
type=PROCTITLE msg=audit(1519924845.499:257): proctitle=62617368002D6300736C65657020313B206563686F2074657374203E202F746D702F746D70636F6E7461696E65726964
type=CONTAINER_ID msg=audit(1519924845.499:257): contid=123458
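
The proctitle field in the raw event above is hex-encoded with
NUL-separated arguments. ausearch -i interprets it, but a minimal sketch
of the decoding may help when reading raw logs. The helper name
decode_proctitle is hypothetical.

```shell
# decode_proctitle HEX: print the command line encoded in a raw PROCTITLE field.
# Hypothetical helper; NUL argument separators are shown as spaces.
decode_proctitle() {
    hex=$1
    out=
    # Each byte is two hex digits, so an odd length is malformed.
    [ $(( ${#hex} % 2 )) -eq 0 ] || return 1
    while [ -n "$hex" ]; do
        rest=${hex#??}          # everything after the first two digits
        byte=${hex%"$rest"}     # the first two digits
        hex=$rest
        if [ "$byte" = 00 ]; then
            out="$out "
        else
            # Turn the hex byte into a \0ooo octal escape for printf %b.
            # (A newline byte would be dropped by the command substitution.)
            out="$out$(printf '%b' "\\0$(printf '%o' "0x$byte")")"
        fi
    done
    printf '%s\n' "$out"
}

decode_proctitle 62617368002D6300736C65657020313B206563686F2074657374203E202F746D702F746D70636F6E7461696E65726964
# prints: bash -c sleep 1; echo test > /tmp/tmpcontainerid
```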

Please see the github audit kernel issue for the main feature:
https://github.com/linux-audit/audit-kernel/issues/90
Please see the github audit userspace issue for supporting additions:
https://github.com/linux-audit/audit-userspace/issues/51
Please see the github audit testsuite issue for the test case:
https://github.com/linux-audit/audit-testsuite/issues/64
Please see the github audit wiki for the feature overview:
https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Acked-by: Steve Grubb <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
include/linux/audit.h | 5 +++++
include/uapi/linux/audit.h | 1 +
kernel/audit.c | 20 ++++++++++++++++++++
kernel/auditsc.c | 20 ++++++++++++++------
4 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index e317807cdd3e..0c18d8e30620 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -220,6 +220,8 @@ static inline u64 audit_get_contid(struct task_struct *tsk)

extern void audit_cont_put(struct audit_cont *cont);

+extern void audit_log_container_id(struct audit_context *context, u64 contid);
+
extern u32 audit_enabled;

extern int audit_signal_info(int sig, struct task_struct *t);
@@ -297,6 +299,9 @@ static inline struct audit_cont *audit_cont(struct task_struct *tsk)
static inline void audit_cont_put(struct audit_cont *cont)
{ }

+static inline void audit_log_container_id(struct audit_context *context, u64 contid)
+{ }
+
#define audit_enabled AUDIT_OFF

static inline int audit_signal_info(int sig, struct task_struct *t)
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 5d0ea2a6783e..4ed080f28b47 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -117,6 +117,7 @@
#define AUDIT_FANOTIFY 1331 /* Fanotify access decision */
#define AUDIT_TIME_INJOFFSET 1332 /* Timekeeping offset injected */
#define AUDIT_TIME_ADJNTPVAL 1333 /* NTP value adjustment */
+#define AUDIT_CONTAINER_ID 1334 /* Container ID */

#define AUDIT_AVC 1400 /* SE Linux avc denial or grant */
#define AUDIT_SELINUX_ERR 1401 /* Internal SE Linux Errors */
diff --git a/kernel/audit.c b/kernel/audit.c
index 329916534dd2..adfb3e6a7f0c 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2127,6 +2127,26 @@ void audit_log_session_info(struct audit_buffer *ab)
audit_log_format(ab, "auid=%u ses=%u", auid, sessionid);
}

+/*
+ * audit_log_container_id - report container info
+ * @context: task or local context for record
+ * @contid: container ID to report
+ */
+void audit_log_container_id(struct audit_context *context, u64 contid)
+{
+ struct audit_buffer *ab;
+
+ if (!audit_contid_valid(contid))
+ return;
+ /* Generate AUDIT_CONTAINER_ID record with container ID */
+ ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER_ID);
+ if (!ab)
+ return;
+ audit_log_format(ab, "contid=%llu", contid);
+ audit_log_end(ab);
+}
+EXPORT_SYMBOL(audit_log_container_id);
+
void audit_log_key(struct audit_buffer *ab, char *key)
{
audit_log_format(ab, " key=");
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index bd855794ad26..ac438fcff807 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1534,7 +1534,7 @@ static void audit_log_exit(void)
for (aux = context->aux_pids; aux; aux = aux->next) {
struct audit_aux_data_pids *axs = (void *)aux;

- for (i = 0; i < axs->pid_count; i++)
+ for (i = 0; i < axs->pid_count; i++) {
if (audit_log_pid_context(context, axs->target_pid[i],
axs->target_auid[i],
axs->target_uid[i],
@@ -1542,14 +1542,20 @@ static void audit_log_exit(void)
axs->target_sid[i],
axs->target_comm[i]))
call_panic = 1;
+ audit_log_container_id(context, axs->target_cid[i]);
+ }
}

- if (context->target_pid &&
- audit_log_pid_context(context, context->target_pid,
- context->target_auid, context->target_uid,
- context->target_sessionid,
- context->target_sid, context->target_comm))
+ if (context->target_pid) {
+ if (audit_log_pid_context(context, context->target_pid,
+ context->target_auid,
+ context->target_uid,
+ context->target_sessionid,
+ context->target_sid,
+ context->target_comm))
call_panic = 1;
+ audit_log_container_id(context, context->target_cid);
+ }

if (context->pwd.dentry && context->pwd.mnt) {
ab = audit_log_start(context, GFP_KERNEL, AUDIT_CWD);
@@ -1568,6 +1574,8 @@ static void audit_log_exit(void)

audit_log_proctitle();

+ audit_log_container_id(context, audit_get_contid(current));
+
audit_log_container_drop();

/* Send end of event record to help user space know we are finished */
--
1.8.3.1

2019-09-19 01:30:54

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 08/21] audit: add contid support for signalling the audit daemon

Add audit container identifier support to the action of signalling the
audit daemon.

Since this would need to add an element to the audit_sig_info struct,
a new record type AUDIT_SIGNAL_INFO2 was created with a new
audit_sig_info2 struct. Corresponding support is required in the
userspace code to reflect the new record request and reply type.
An older userspace won't break since it won't know to request this
record type.
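
As a rough userspace-side illustration of the reply layout described
above, the sketch below mirrors the new struct and derives the length of
the trailing LSM context from the netlink payload length. The name
sig_info2_reply, the helper sig_info2_ctx_len() and the fixed-width field
types are assumptions made for this example (they assume the usual Linux
32-bit uid_t and pid_t); they are not part of the patch or of any
existing audit userspace API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the kernel's struct audit_sig_info2. */
struct sig_info2_reply {
	uint32_t uid;   /* uid of the task that signalled auditd */
	int32_t  pid;   /* pid of that task */
	uint64_t cid;   /* audit container identifier of that task */
	char     ctx[]; /* LSM context, not NUL-terminated, may be empty */
};

/* Length of the trailing context, given the netlink payload length. */
size_t sig_info2_ctx_len(size_t payload_len)
{
	size_t fixed = offsetof(struct sig_info2_reply, ctx);

	return payload_len > fixed ? payload_len - fixed : 0;
}
```

A reply shorter than the fixed part is treated here as carrying no
context rather than as an error; a real client would probably want to
reject it.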

Signed-off-by: Richard Guy Briggs <[email protected]>
---
include/linux/audit.h | 7 +++++++
include/uapi/linux/audit.h | 1 +
kernel/audit.c | 28 ++++++++++++++++++++++++++++
kernel/audit.h | 1 +
security/selinux/nlmsgtab.c | 1 +
5 files changed, 38 insertions(+)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 0c18d8e30620..7b640c4da4ee 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -23,6 +23,13 @@ struct audit_sig_info {
char ctx[0];
};

+struct audit_sig_info2 {
+ uid_t uid;
+ pid_t pid;
+ u64 cid;
+ char ctx[0];
+};
+
struct audit_buffer;
struct audit_context;
struct inode;
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 4ed080f28b47..693ec6e0288b 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -72,6 +72,7 @@
#define AUDIT_SET_FEATURE 1018 /* Turn an audit feature on or off */
#define AUDIT_GET_FEATURE 1019 /* Get which features are enabled */
#define AUDIT_CONTAINER_OP 1020 /* Define the container id and info */
+#define AUDIT_SIGNAL_INFO2 1021 /* Get info auditd signal sender */

#define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */
#define AUDIT_USER_AVC 1107 /* We filter this differently */
diff --git a/kernel/audit.c b/kernel/audit.c
index adfb3e6a7f0c..df3db29f5a8a 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -125,6 +125,7 @@ struct audit_net {
kuid_t audit_sig_uid = INVALID_UID;
pid_t audit_sig_pid = -1;
u32 audit_sig_sid = 0;
+u64 audit_sig_cid = AUDIT_CID_UNSET;

/* Records can be lost in several ways:
0) [suppressed in audit_alloc]
@@ -1094,6 +1095,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
case AUDIT_ADD_RULE:
case AUDIT_DEL_RULE:
case AUDIT_SIGNAL_INFO:
+ case AUDIT_SIGNAL_INFO2:
case AUDIT_TTY_GET:
case AUDIT_TTY_SET:
case AUDIT_TRIM:
@@ -1257,6 +1259,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
struct audit_buffer *ab;
u16 msg_type = nlh->nlmsg_type;
struct audit_sig_info *sig_data;
+ struct audit_sig_info2 *sig_data2;
char *ctx = NULL;
u32 len;

@@ -1516,6 +1519,30 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
sig_data, sizeof(*sig_data) + len);
kfree(sig_data);
break;
+ case AUDIT_SIGNAL_INFO2:
+ len = 0;
+ if (audit_sig_sid) {
+ err = security_secid_to_secctx(audit_sig_sid, &ctx, &len);
+ if (err)
+ return err;
+ }
+ sig_data2 = kmalloc(sizeof(*sig_data2) + len, GFP_KERNEL);
+ if (!sig_data2) {
+ if (audit_sig_sid)
+ security_release_secctx(ctx, len);
+ return -ENOMEM;
+ }
+ sig_data2->uid = from_kuid(&init_user_ns, audit_sig_uid);
+ sig_data2->pid = audit_sig_pid;
+ if (audit_sig_sid) {
+ memcpy(sig_data2->ctx, ctx, len);
+ security_release_secctx(ctx, len);
+ }
+ sig_data2->cid = audit_sig_cid;
+ audit_send_reply(skb, seq, AUDIT_SIGNAL_INFO2, 0, 0,
+ sig_data2, sizeof(*sig_data2) + len);
+ kfree(sig_data2);
+ break;
case AUDIT_TTY_GET: {
struct audit_tty_status s;
unsigned int t;
@@ -2384,6 +2411,7 @@ int audit_signal_info(int sig, struct task_struct *t)
else
audit_sig_uid = uid;
security_task_getsecid(current, &audit_sig_sid);
+ audit_sig_cid = audit_get_contid(current);
}

return audit_signal_info_syscall(t);
diff --git a/kernel/audit.h b/kernel/audit.h
index 543f1334ba47..c9a118716ced 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -350,6 +350,7 @@ static inline int audit_signal_info_syscall(struct task_struct *t)
extern pid_t audit_sig_pid;
extern kuid_t audit_sig_uid;
extern u32 audit_sig_sid;
+extern u64 audit_sig_cid;

extern int audit_filter(int msgtype, unsigned int listtype);

diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index 58345ba0528e..bf21979e7737 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -132,6 +132,7 @@ struct nlmsg_perm {
{ AUDIT_DEL_RULE, NETLINK_AUDIT_SOCKET__NLMSG_WRITE },
{ AUDIT_USER, NETLINK_AUDIT_SOCKET__NLMSG_RELAY },
{ AUDIT_SIGNAL_INFO, NETLINK_AUDIT_SOCKET__NLMSG_READ },
+ { AUDIT_SIGNAL_INFO2, NETLINK_AUDIT_SOCKET__NLMSG_READ },
{ AUDIT_TRIM, NETLINK_AUDIT_SOCKET__NLMSG_WRITE },
{ AUDIT_MAKE_EQUIV, NETLINK_AUDIT_SOCKET__NLMSG_WRITE },
{ AUDIT_TTY_GET, NETLINK_AUDIT_SOCKET__NLMSG_READ },
--
1.8.3.1

2019-09-19 01:31:02

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 09/21] audit: add support for non-syscall auxiliary records

Standalone audit records have the timestamp and serial number generated
on the fly and as such are unique, making them standalone. This new
function audit_alloc_local() generates a local audit context that will
be used only for a standalone record and its auxiliary record(s). The
context is discarded immediately after the local associated records are
produced.
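
The idea can be modelled in a few lines of userspace C: the timestamp
and serial are taken once when the context is allocated, and every
record formatted through that context reuses them, which is what ties an
auxiliary record to its standalone record. This is only a sketch; the
names local_ctx, local_ctx_alloc(), local_ctx_record() and
local_ctx_free() are invented for the example and next_serial stands in
for audit_serial().

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the parts of struct audit_context used here. */
struct local_ctx {
	long long sec;        /* timestamp taken once, at allocation */
	unsigned int serial;  /* serial number taken once, at allocation */
};

static unsigned int next_serial; /* stand-in for audit_serial() */

struct local_ctx *local_ctx_alloc(void)
{
	struct local_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->sec = (long long)time(NULL);
	ctx->serial = ++next_serial;
	return ctx;
}

/* Every record built from the same context carries the same stamp. */
int local_ctx_record(const struct local_ctx *ctx, const char *type,
		     const char *body, char *buf, size_t len)
{
	return snprintf(buf, len, "type=%s msg=audit(%lld.000:%u): %s",
			type, ctx->sec, ctx->serial, body);
}

void local_ctx_free(struct local_ctx *ctx)
{
	free(ctx); /* free(NULL) is a no-op, like the NULL check in the patch */
}
```

A caller would allocate one context, format the standalone record and
its auxiliary record(s) through it, then free it immediately.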

Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
include/linux/audit.h | 8 ++++++++
kernel/audit.h | 1 +
kernel/auditsc.c | 35 ++++++++++++++++++++++++++++++-----
3 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 7b640c4da4ee..e849058cb662 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -329,6 +329,8 @@ static inline int audit_signal_info(int sig, struct task_struct *t)

/* These are defined in auditsc.c */
/* Public API */
+extern struct audit_context *audit_alloc_local(gfp_t gfpflags);
+extern void audit_free_context(struct audit_context *context);
extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
unsigned long a2, unsigned long a3);
extern void __audit_syscall_exit(int ret_success, long ret_value);
@@ -591,6 +593,12 @@ static inline void audit_ntp_log(const struct audit_ntp_data *ad)
extern int audit_n_rules;
extern int audit_signals;
#else /* CONFIG_AUDITSYSCALL */
+static inline struct audit_context *audit_alloc_local(gfp_t gfpflags)
+{
+ return NULL;
+}
+static inline void audit_free_context(struct audit_context *context)
+{ }
static inline void audit_syscall_entry(int major, unsigned long a0,
unsigned long a1, unsigned long a2,
unsigned long a3)
diff --git a/kernel/audit.h b/kernel/audit.h
index c9a118716ced..1bba13bdffd0 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -98,6 +98,7 @@ struct audit_proctitle {
struct audit_context {
int dummy; /* must be the first element */
int in_syscall; /* 1 if task is in a syscall */
+ bool local; /* local context needed */
enum audit_state state, current_state;
unsigned int serial; /* serial number for record */
int major; /* syscall number */
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index ac438fcff807..3138c88887c7 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -890,11 +890,13 @@ static inline void audit_free_aux(struct audit_context *context)
}
}

-static inline struct audit_context *audit_alloc_context(enum audit_state state)
+static inline struct audit_context *audit_alloc_context(enum audit_state state,
+ gfp_t gfpflags)
{
struct audit_context *context;

- context = kzalloc(sizeof(*context), GFP_KERNEL);
+ /* We can be called in atomic context via audit_tg() */
+ context = kzalloc(sizeof(*context), gfpflags);
if (!context)
return NULL;
context->state = state;
@@ -930,7 +932,8 @@ int audit_alloc_syscall(struct task_struct *tsk)
return 0;
}

- if (!(context = audit_alloc_context(state))) {
+ context = audit_alloc_context(state, GFP_KERNEL);
+ if (!context) {
kfree(key);
audit_log_lost("out of memory in audit_alloc_syscall");
return -ENOMEM;
@@ -942,8 +945,29 @@ int audit_alloc_syscall(struct task_struct *tsk)
return 0;
}

-static inline void audit_free_context(struct audit_context *context)
+struct audit_context *audit_alloc_local(gfp_t gfpflags)
{
+ struct audit_context *context = NULL;
+
+ if (!audit_ever_enabled)
+ goto out; /* Return if not auditing. */
+ context = audit_alloc_context(AUDIT_RECORD_CONTEXT, gfpflags);
+ if (!context) {
+ audit_log_lost("out of memory in audit_alloc_local");
+ goto out;
+ }
+ context->serial = audit_serial();
+ ktime_get_coarse_real_ts64(&context->ctime);
+ context->local = true;
+out:
+ return context;
+}
+EXPORT_SYMBOL(audit_alloc_local);
+
+void audit_free_context(struct audit_context *context)
+{
+ if (!context)
+ return;
audit_free_module(context);
audit_free_names(context);
unroll_tree_refs(context, NULL, 0);
@@ -954,6 +978,7 @@ static inline void audit_free_context(struct audit_context *context)
audit_proctitle_free(context);
kfree(context);
}
+EXPORT_SYMBOL(audit_free_context);

static int audit_log_pid_context(struct audit_context *context, pid_t pid,
kuid_t auid, kuid_t uid, unsigned int sessionid,
@@ -2182,7 +2207,7 @@ void __audit_inode_child(struct inode *parent,
int auditsc_get_stamp(struct audit_context *ctx,
struct timespec64 *t, unsigned int *serial)
{
- if (!ctx->in_syscall)
+ if (!ctx->in_syscall && !ctx->local)
return 0;
if (!ctx->serial)
ctx->serial = audit_serial();
--
1.8.3.1

2019-09-19 01:31:20

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 12/21] audit: add support for containerid to network namespaces

Audit events could happen in a network namespace outside of a task
context due to packets received from the net that trigger an auditing
rule prior to being associated with a running task. The network
namespace could be in use by multiple containers by association to the
tasks in that network namespace. We still want a way to attribute
these events to any potential containers. Keep a list per network
namespace to track these audit container identifiers.

Add/increment the audit container identifier on:
- initial setting of the audit container identifier via /proc
- clone/fork call that inherits an audit container identifier
- unshare call that inherits an audit container identifier
- setns call that inherits an audit container identifier
Delete/decrement the audit container identifier on:
- an inherited audit container identifier dropped when child set
- process exit
- unshare call that drops a net namespace
- setns call that drops a net namespace
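
The add/increment and delete/decrement rules above amount to a
reference-counted list keyed by contid. A minimal userspace sketch of
that bookkeeping follows. It uses a plain singly linked list with no
locking or RCU, and the names contid_ref, contid_ref_add(),
contid_ref_del() and contid_ref_count() are invented for illustration;
the patch itself uses list_head, refcount_t and a spinlock.

```c
#include <stdint.h>
#include <stdlib.h>

struct contid_ref {
	struct contid_ref *next;
	uint64_t id;
	unsigned int refcount;
};

/* Take a reference on id, creating the entry on first use. Returns 0 or -1. */
int contid_ref_add(struct contid_ref **head, uint64_t id)
{
	struct contid_ref *c;

	for (c = *head; c; c = c->next) {
		if (c->id == id) {
			c->refcount++;
			return 0;
		}
	}
	c = malloc(sizeof(*c));
	if (!c)
		return -1;
	c->id = id;
	c->refcount = 1;
	c->next = *head;
	*head = c;
	return 0;
}

/* Drop a reference on id, freeing the entry when the last user goes away. */
void contid_ref_del(struct contid_ref **head, uint64_t id)
{
	struct contid_ref **pp, *c;

	for (pp = head; (c = *pp) != NULL; pp = &c->next) {
		if (c->id != id)
			continue;
		if (--c->refcount == 0) {
			*pp = c->next;
			free(c);
		}
		return;
	}
}

/* Current reference count for id, 0 if it is not on the list. */
unsigned int contid_ref_count(const struct contid_ref *head, uint64_t id)
{
	for (; head; head = head->next)
		if (head->id == id)
			return head->refcount;
	return 0;
}
```

Two tasks with the same contid in one network namespace leave a single
entry with a count of two; the entry disappears only when both have
exited or moved to another namespace.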

Please see the github audit kernel issue for contid net support:
https://github.com/linux-audit/audit-kernel/issues/92
Please see the github audit testsuite issue for the test case:
https://github.com/linux-audit/audit-testsuite/issues/64
Please see the github audit wiki for the feature overview:
https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
include/linux/audit.h | 19 +++++++++++
kernel/audit.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++--
kernel/nsproxy.c | 4 +++
3 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 575fff6ea7c9..73e3ab38e3e0 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -13,6 +13,7 @@
#include <linux/ptrace.h>
#include <linux/namei.h> /* LOOKUP_* */
#include <uapi/linux/audit.h>
+#include <linux/refcount.h>

#define AUDIT_INO_UNSET ((unsigned long)-1)
#define AUDIT_DEV_UNSET ((dev_t)-1)
@@ -122,6 +123,13 @@ struct audit_task_info {

extern struct audit_task_info init_struct_audit;

+struct audit_contid {
+ struct list_head list;
+ u64 id;
+ refcount_t refcount;
+ struct rcu_head rcu;
+};
+
extern int is_audit_feature_set(int which);

extern int __init audit_register_class(int class, unsigned *list);
@@ -229,6 +237,10 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
extern void audit_cont_put(struct audit_cont *cont);

extern void audit_log_container_id(struct audit_context *context, u64 contid);
+extern void audit_netns_contid_add(struct net *net, u64 contid);
+extern void audit_netns_contid_del(struct net *net, u64 contid);
+extern void audit_switch_task_namespaces(struct nsproxy *ns,
+ struct task_struct *p);

extern u32 audit_enabled;

@@ -309,6 +321,13 @@ static inline void audit_cont_put(struct audit_cont *cont)

static inline void audit_log_container_id(struct audit_context *context, u64 contid)
{ }
+static inline void audit_netns_contid_add(struct net *net, u64 contid)
+{ }
+static inline void audit_netns_contid_del(struct net *net, u64 contid)
+{ }
+static inline void audit_switch_task_namespaces(struct nsproxy *ns,
+ struct task_struct *p)
+{ }

#define audit_enabled AUDIT_OFF

diff --git a/kernel/audit.c b/kernel/audit.c
index 7cdb76b38966..e0c27bc39925 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -59,6 +59,7 @@
#include <linux/freezer.h>
#include <linux/pid_namespace.h>
#include <net/netns/generic.h>
+#include <net/net_namespace.h>

#include "audit.h"

@@ -86,9 +87,13 @@
/**
* struct audit_net - audit private network namespace data
* @sk: communication socket
+ * @contid_list: audit container identifier list
+ * @contid_list_lock: audit container identifier list lock
*/
struct audit_net {
struct sock *sk;
+ struct list_head contid_list;
+ spinlock_t contid_list_lock;
};

/**
@@ -269,8 +274,11 @@ struct audit_task_info init_struct_audit = {
void audit_free(struct task_struct *tsk)
{
struct audit_task_info *info = tsk->audit;
+ struct nsproxy *ns = tsk->nsproxy;

audit_free_syscall(tsk);
+ if (ns)
+ audit_netns_contid_del(ns->net_ns, audit_get_contid(tsk));
/* Freeing the audit_task_info struct must be performed after
* audit_log_exit() due to need for loginuid and sessionid.
*/
@@ -373,6 +381,75 @@ static struct sock *audit_get_sk(const struct net *net)
return aunet->sk;
}

+void audit_netns_contid_add(struct net *net, u64 contid)
+{
+ struct audit_net *aunet;
+ struct list_head *contid_list;
+ struct audit_contid *cont;
+
+ if (!net)
+ return;
+ if (!audit_contid_valid(contid))
+ return;
+ aunet = net_generic(net, audit_net_id);
+ if (!aunet)
+ return;
+ contid_list = &aunet->contid_list;
+ spin_lock(&aunet->contid_list_lock);
+ list_for_each_entry_rcu(cont, contid_list, list)
+ if (cont->id == contid) {
+ refcount_inc(&cont->refcount);
+ goto out;
+ }
+ cont = kmalloc(sizeof(struct audit_contid), GFP_ATOMIC);
+ if (cont) {
+ INIT_LIST_HEAD(&cont->list);
+ cont->id = contid;
+ refcount_set(&cont->refcount, 1);
+ list_add_rcu(&cont->list, contid_list);
+ }
+out:
+ spin_unlock(&aunet->contid_list_lock);
+}
+
+void audit_netns_contid_del(struct net *net, u64 contid)
+{
+ struct audit_net *aunet;
+ struct list_head *contid_list;
+ struct audit_contid *cont = NULL;
+
+ if (!net)
+ return;
+ if (!audit_contid_valid(contid))
+ return;
+ aunet = net_generic(net, audit_net_id);
+ if (!aunet)
+ return;
+ contid_list = &aunet->contid_list;
+ spin_lock(&aunet->contid_list_lock);
+ list_for_each_entry_rcu(cont, contid_list, list)
+ if (cont->id == contid) {
+ if (refcount_dec_and_test(&cont->refcount)) {
+ list_del_rcu(&cont->list);
+ kfree_rcu(cont, rcu);
+ }
+ break;
+ }
+ spin_unlock(&aunet->contid_list_lock);
+}
+
+void audit_switch_task_namespaces(struct nsproxy *ns, struct task_struct *p)
+{
+ u64 contid = audit_get_contid(p);
+ struct nsproxy *new = p->nsproxy;
+
+ if (!audit_contid_valid(contid))
+ return;
+ audit_netns_contid_del(ns->net_ns, contid);
+ if (new)
+ audit_netns_contid_add(new->net_ns, contid);
+}
+
void audit_panic(const char *message)
{
switch (audit_failure) {
@@ -1641,7 +1718,6 @@ static int __net_init audit_net_init(struct net *net)
.flags = NL_CFG_F_NONROOT_RECV,
.groups = AUDIT_NLGRP_MAX,
};
-
struct audit_net *aunet = net_generic(net, audit_net_id);

aunet->sk = netlink_kernel_create(net, NETLINK_AUDIT, &cfg);
@@ -1650,7 +1726,8 @@ static int __net_init audit_net_init(struct net *net)
return -ENOMEM;
}
aunet->sk->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
-
+ INIT_LIST_HEAD(&aunet->contid_list);
+ spin_lock_init(&aunet->contid_list_lock);
return 0;
}

@@ -2460,6 +2537,7 @@ int audit_set_contid(struct task_struct *task, u64 contid)
uid_t uid;
struct tty_struct *tty;
char comm[sizeof(current->comm)];
+ struct net *net = task->nsproxy->net_ns;

task_lock(task);
/* Can't set if audit disabled */
@@ -2530,6 +2608,11 @@ int audit_set_contid(struct task_struct *task, u64 contid)
conterror:
spin_unlock(&audit_contid_list_lock);
}
+ if (!rc) {
+ if (audit_contid_valid(oldcontid))
+ audit_netns_contid_del(net, oldcontid);
+ audit_netns_contid_add(net, contid);
+ }
task_unlock(task);

if (!audit_enabled)
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index c815f58e6bc0..bbdb5bbf5446 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -23,6 +23,7 @@
#include <linux/syscalls.h>
#include <linux/cgroup.h>
#include <linux/perf_event.h>
+#include <linux/audit.h>

static struct kmem_cache *nsproxy_cachep;

@@ -136,6 +137,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
struct nsproxy *old_ns = tsk->nsproxy;
struct user_namespace *user_ns = task_cred_xxx(tsk, user_ns);
struct nsproxy *new_ns;
+ u64 contid = audit_get_contid(tsk);

if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
CLONE_NEWPID | CLONE_NEWNET |
@@ -163,6 +165,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
return PTR_ERR(new_ns);

tsk->nsproxy = new_ns;
+ audit_netns_contid_add(new_ns->net_ns, contid);
return 0;
}

@@ -220,6 +223,7 @@ void switch_task_namespaces(struct task_struct *p, struct nsproxy *new)
ns = p->nsproxy;
p->nsproxy = new;
task_unlock(p);
+ audit_switch_task_namespaces(ns, p);

if (ns && atomic_dec_and_test(&ns->count))
free_nsproxy(ns);
--
1.8.3.1

2019-09-19 01:31:37

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 13/21] audit: NETFILTER_PKT: record each container ID associated with a netNS

Add audit container identifier auxiliary record(s) to NETFILTER_PKT
event standalone records. Iterate through all potential audit container
identifiers associated with a network namespace.
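
The resulting record body is a single "contid=" field holding a
comma-separated list. The formatting can be sketched in userspace as
below; format_contid_list() is a hypothetical helper written for this
example and writes into a caller-supplied buffer instead of an audit
buffer.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write "contid=<id>[,<id>...]" into buf. Returns the number of characters
 * written, 0 for an empty list, or -1 if buf is too small.
 */
int format_contid_list(const uint64_t *ids, size_t n, char *buf, size_t len)
{
	size_t off = 0;
	size_t i;

	if (len == 0)
		return -1;
	buf[0] = '\0';
	for (i = 0; i < n; i++) {
		/* The first entry gets the field name, later ones a comma. */
		int w = snprintf(buf + off, len - off, "%s%" PRIu64,
				 i == 0 ? "contid=" : ",", ids[i]);

		if (w < 0 || (size_t)w >= len - off)
			return -1;
		off += (size_t)w;
	}
	return (int)off;
}
```

An empty list produces an empty string here, which corresponds to no
CONTAINER_ID record being started for a namespace with no registered
contids.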

Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
include/linux/audit.h | 5 +++++
kernel/audit.c | 39 +++++++++++++++++++++++++++++++++++++++
net/netfilter/nft_log.c | 11 +++++++++--
net/netfilter/xt_AUDIT.c | 11 +++++++++--
4 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 73e3ab38e3e0..dcd92f964120 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -241,6 +241,8 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
extern void audit_netns_contid_del(struct net *net, u64 contid);
extern void audit_switch_task_namespaces(struct nsproxy *ns,
struct task_struct *p);
+extern void audit_log_netns_contid_list(struct net *net,
+ struct audit_context *context);

extern u32 audit_enabled;

@@ -328,6 +330,9 @@ static inline void audit_netns_contid_del(struct net *net, u64 contid)
static inline void audit_switch_task_namespaces(struct nsproxy *ns,
struct task_struct *p)
{ }
+static inline void audit_log_netns_contid_list(struct net *net,
+ struct audit_context *context)
+{ }

#define audit_enabled AUDIT_OFF

diff --git a/kernel/audit.c b/kernel/audit.c
index e0c27bc39925..9ce7a1ec7a92 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -450,6 +450,45 @@ void audit_switch_task_namespaces(struct nsproxy *ns, struct task_struct *p)
audit_netns_contid_add(new->net_ns, contid);
}

+/**
+ * audit_log_netns_contid_list - List contids for the given network namespace
+ * @net: the network namespace of interest
+ * @context: the audit context to use
+ *
+ * Description:
+ * Issues a CONTAINER_ID record with a CSV list of contids associated
+ * with a network namespace to accompany a NETFILTER_PKT record.
+ */
+void audit_log_netns_contid_list(struct net *net, struct audit_context *context)
+{
+ struct audit_buffer *ab = NULL;
+ struct audit_contid *cont;
+ struct audit_net *aunet;
+
+ /* Generate AUDIT_CONTAINER_ID record with container ID CSV list */
+ rcu_read_lock();
+ aunet = net_generic(net, audit_net_id);
+ if (!aunet)
+ goto out;
+ list_for_each_entry_rcu(cont, &aunet->contid_list, list) {
+ if (!ab) {
+ ab = audit_log_start(context, GFP_ATOMIC,
+ AUDIT_CONTAINER_ID);
+ if (!ab) {
+ audit_log_lost("out of memory in audit_log_netns_contid_list");
+ goto out;
+ }
+ audit_log_format(ab, "contid=");
+ } else
+ audit_log_format(ab, ",");
+ audit_log_format(ab, "%llu", cont->id);
+ }
+ audit_log_end(ab);
+out:
+ rcu_read_unlock();
+}
+EXPORT_SYMBOL(audit_log_netns_contid_list);
+
void audit_panic(const char *message)
{
switch (audit_failure) {
diff --git a/net/netfilter/nft_log.c b/net/netfilter/nft_log.c
index fe4831f2258f..98d1e7e1a83c 100644
--- a/net/netfilter/nft_log.c
+++ b/net/netfilter/nft_log.c
@@ -66,13 +66,16 @@ static void nft_log_eval_audit(const struct nft_pktinfo *pkt)
struct sk_buff *skb = pkt->skb;
struct audit_buffer *ab;
int fam = -1;
+ struct audit_context *context;
+ struct net *net;

if (!audit_enabled)
return;

- ab = audit_log_start(NULL, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
+ context = audit_alloc_local(GFP_ATOMIC);
+ ab = audit_log_start(context, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
if (!ab)
- return;
+ goto errout;

audit_log_format(ab, "mark=%#x", skb->mark);

@@ -99,6 +102,10 @@ static void nft_log_eval_audit(const struct nft_pktinfo *pkt)
audit_log_format(ab, " saddr=? daddr=? proto=-1");

audit_log_end(ab);
+ net = xt_net(&pkt->xt);
+ audit_log_netns_contid_list(net, context);
+errout:
+ audit_free_context(context);
}

static void nft_log_eval(const struct nft_expr *expr,
diff --git a/net/netfilter/xt_AUDIT.c b/net/netfilter/xt_AUDIT.c
index 9cdc16b0d0d8..ecf868a1abde 100644
--- a/net/netfilter/xt_AUDIT.c
+++ b/net/netfilter/xt_AUDIT.c
@@ -68,10 +68,13 @@ static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
{
struct audit_buffer *ab;
int fam = -1;
+ struct audit_context *context;
+ struct net *net;

if (audit_enabled == AUDIT_OFF)
- goto errout;
- ab = audit_log_start(NULL, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
+ goto out;
+ context = audit_alloc_local(GFP_ATOMIC);
+ ab = audit_log_start(context, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
if (ab == NULL)
goto errout;

@@ -101,7 +104,11 @@ static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)

audit_log_end(ab);

+ net = xt_net(par);
+ audit_log_netns_contid_list(net, context);
errout:
+ audit_free_context(context);
+out:
return XT_CONTINUE;
}

--
1.8.3.1

2019-09-19 01:31:45

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 14/21] audit: contid check descendancy and nesting

fixup! audit: convert to contid list to check for orch/engine ownership

Require the target task to be a descendant of the container
orchestrator/engine.

You would only change the audit container ID from one set or inherited
value to another if you were nesting containers.

If changing the contid, the container orchestrator/engine must be a
descendant and not the same orchestrator as the one that set it, so that
it is not possible to change the contid of another orchestrator's
container.
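
The descendancy test is a walk up the parent chain from the target
until either the orchestrator or the top of the tree is reached. A
simplified userspace model is sketched below. The proc_node type and
is_descendant() are invented for this example, and the thread-group
leader handling and RCU protection of the kernel version are omitted.

```c
#include <stddef.h>

/* Hypothetical, minimal process node: just a pid and a parent link. */
struct proc_node {
	int pid;
	const struct proc_node *parent;
};

/* Returns 1 if child is parent itself or one of its descendants, else 0. */
int is_descendant(const struct proc_node *parent,
		  const struct proc_node *child)
{
	const struct proc_node *walker;

	if (!parent || !child)
		return 0;
	/* Stop at pid 0, the top of the process tree. */
	for (walker = child; walker && walker->pid > 0; walker = walker->parent)
		if (walker == parent)
			return 1;
	return 0;
}
```

Like the kernel helper, this reports a task as a descendant of itself,
which is why the patch rejects task == current separately before making
the descendancy check.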

Signed-off-by: Richard Guy Briggs <[email protected]>
---
kernel/audit.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 62 insertions(+), 8 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 9ce7a1ec7a92..69fe1e9af7cb 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2560,6 +2560,39 @@ static struct task_struct *audit_cont_owner(struct task_struct *tsk)
}

/*
+ * task_is_descendant - walk up a process family tree looking for a match
+ * @parent: the process to compare against while walking up from child
+ * @child: the process to start from while looking upwards for parent
+ *
+ * Returns 1 if child is a descendant of parent, 0 if not.
+ */
+static int task_is_descendant(struct task_struct *parent,
+ struct task_struct *child)
+{
+ int rc = 0;
+ struct task_struct *walker = child;
+
+ if (!parent || !child)
+ return 0;
+
+ rcu_read_lock();
+ if (!thread_group_leader(parent))
+ parent = rcu_dereference(parent->group_leader);
+ while (walker->pid > 0) {
+ if (!thread_group_leader(walker))
+ walker = rcu_dereference(walker->group_leader);
+ if (walker == parent) {
+ rc = 1;
+ break;
+ }
+ walker = rcu_dereference(walker->real_parent);
+ }
+ rcu_read_unlock();
+
+ return rc;
+}
+
+/*
* audit_set_contid - set current task's audit contid
* @task: target task
* @contid: contid value
@@ -2587,22 +2620,43 @@ int audit_set_contid(struct task_struct *task, u64 contid)
oldcontid = audit_get_contid(task);
read_lock(&tasklist_lock);
/* Don't allow the contid to be unset */
- if (!audit_contid_valid(contid))
+ if (!audit_contid_valid(contid)) {
rc = -EINVAL;
+ goto unlock;
+ }
/* Don't allow the contid to be set to the same value again */
- else if (contid == oldcontid) {
+ if (contid == oldcontid) {
rc = -EADDRINUSE;
+ goto unlock;
+ }
/* if we don't have caps, reject */
- else if (!capable(CAP_AUDIT_CONTROL))
+ if (!capable(CAP_AUDIT_CONTROL)) {
rc = -EPERM;
- /* if task has children or is not single-threaded, deny */
- else if (!list_empty(&task->children))
+ goto unlock;
+ }
+ /* if task has children, deny */
+ if (!list_empty(&task->children)) {
rc = -EBUSY;
- else if (!(thread_group_leader(task) && thread_group_empty(task)))
+ goto unlock;
+ }
+ /* if task is not single-threaded, deny */
+ if (!(thread_group_leader(task) && thread_group_empty(task))) {
rc = -EALREADY;
- /* if contid is already set, deny */
- else if (audit_contid_set(task))
+ goto unlock;
+ }
+ /* if task is not descendant, block */
+ if (task == current) {
+ rc = -EBADSLT;
+ goto unlock;
+ }
+ if (!task_is_descendant(current, task)) {
+ rc = -EXDEV;
+ goto unlock;
+ }
+ /* only allow contid setting again if nesting */
+ if (audit_contid_set(task) && current == audit_cont_owner(task))
rc = -ECHILD;
+unlock:
read_unlock(&tasklist_lock);
if (!rc) {
struct audit_cont *oldcont = audit_cont(task);
--
1.8.3.1

2019-09-19 01:32:04

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 02/21] audit: add container id

Implement the proc fs write to set the audit container identifier of a
process, emitting an AUDIT_CONTAINER_OP record to document the event.

This is a write from the container orchestrator task to a proc entry of
the form /proc/PID/audit_containerid where PID is the process ID of the
newly created task that is to become the first task in a container, or
an additional task added to a container.

The write expects up to a u64 value (unset: 18446744073709551615).

The writer must have capability CAP_AUDIT_CONTROL.

This will produce a record such as this:
type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615 pid=628 auid=root uid=root tty=ttyS0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm=bash exe=/usr/bin/bash res=yes

The "op" field indicates an initial set. The "pid" to "ses" fields
describe the orchestrator, while the "opid" field is the PID of the
object, the process being "contained". New and old audit container
identifier values are given in the "contid" and "old-contid" fields,
while "res" indicates success.
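
When picking numeric fields out of such a record, note that "pid" is a
suffix of "opid" and "contid" is a suffix of "old-contid", so a plain
substring search is not enough. The sketch below shows one way to match
on field boundaries. record_field_u64() is a hypothetical helper for
this example, not an audit library function, and it assumes fields are
separated by single spaces as in the sample above.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find "key=<decimal>" in an audit record line and store the value in *out.
 * The key must start the line or follow a space, so "pid" does not match
 * inside "opid" and "contid" does not match inside "old-contid".
 * Returns 0 on success, -1 if the key is absent or its value is not numeric.
 */
int record_field_u64(const char *line, const char *key, uint64_t *out)
{
	size_t klen = strlen(key);
	const char *p = line;

	if (klen == 0)
		return -1;
	while ((p = strstr(p, key)) != NULL) {
		if ((p == line || p[-1] == ' ') && p[klen] == '=') {
			const char *val = p + klen + 1;
			unsigned long long v;

			if (!isdigit((unsigned char)*val))
				return -1;
			errno = 0;
			v = strtoull(val, NULL, 10);
			if (errno)
				return -1;
			*out = v;
			return 0;
		}
		p += klen;
	}
	return -1;
}
```

Fields rendered as names by the interpreting tools, such as "auid=root"
in the sample, are reported as non-numeric by this helper.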

It is not permitted to unset the audit container identifier.
A child inherits its parent's audit container identifier.
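
From the orchestrator's side, registering a task comes down to writing
a decimal u64 to the target's proc entry. A minimal sketch of that
follows, assuming a kernel with this patch applied. The helper names
contid_proc_path(), parse_contid() and write_contid() are invented for
the example, and the client-side rejection of the all-ones "unset" value
simply anticipates the kernel refusing it.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Build "/proc/<pid>/audit_containerid". Returns 0, or -1 if buf is too small. */
int contid_proc_path(long pid, char *buf, size_t len)
{
	int w = snprintf(buf, len, "/proc/%ld/audit_containerid", pid);

	return (w < 0 || (size_t)w >= len) ? -1 : 0;
}

/*
 * Parse a decimal contid before writing it. Rejects non-numeric input,
 * overflow, and the all-ones "unset" value.
 */
int parse_contid(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long v;

	if (!s || *s < '0' || *s > '9')
		return -1;
	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno || *end != '\0' || v == UINT64_MAX)
		return -1;
	*out = v;
	return 0;
}

/* Write contid to the target's proc entry. Returns 0, or -1 on failure. */
int write_contid(long pid, uint64_t contid)
{
	char path[64];
	FILE *f;
	int rc;

	if (contid_proc_path(pid, path, sizeof(path)) < 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	rc = fprintf(f, "%" PRIu64, contid) < 0 ? -1 : 0;
	if (fclose(f) != 0) /* the kernel's verdict arrives when the write is flushed */
		rc = -1;
	return rc;
}
```

The whole value goes out in one write at offset zero, which matters
because the proc handler refuses partial writes.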

Please see the github audit kernel issue for the main feature:
https://github.com/linux-audit/audit-kernel/issues/90
Please see the github audit userspace issue for supporting additions:
https://github.com/linux-audit/audit-userspace/issues/51
Please see the github audit testsuite issue for the test case:
https://github.com/linux-audit/audit-testsuite/issues/64
Please see the github audit wiki for the feature overview:
https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID

Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Acked-by: Steve Grubb <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
fs/proc/base.c | 36 +++++++++++++++++++++++
include/linux/audit.h | 25 ++++++++++++++++
include/uapi/linux/audit.h | 2 ++
kernel/audit.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++
kernel/audit.h | 1 +
kernel/auditsc.c | 4 +++
6 files changed, 141 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index ebea9501afb8..e2e7c9f4702f 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1307,6 +1307,40 @@ static ssize_t proc_sessionid_read(struct file * file, char __user * buf,
.read = proc_sessionid_read,
.llseek = generic_file_llseek,
};
+
+static ssize_t proc_contid_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct inode *inode = file_inode(file);
+ u64 contid;
+ int rv;
+ struct task_struct *task = get_proc_task(inode);
+
+ if (!task)
+ return -ESRCH;
+ if (*ppos != 0) {
+ /* No partial writes. */
+ put_task_struct(task);
+ return -EINVAL;
+ }
+
+ rv = kstrtou64_from_user(buf, count, 10, &contid);
+ if (rv < 0) {
+ put_task_struct(task);
+ return rv;
+ }
+
+ rv = audit_set_contid(task, contid);
+ put_task_struct(task);
+ if (rv < 0)
+ return rv;
+ return count;
+}
+
+static const struct file_operations proc_contid_operations = {
+ .write = proc_contid_write,
+ .llseek = generic_file_llseek,
+};
#endif

#ifdef CONFIG_FAULT_INJECTION
@@ -3067,6 +3101,7 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
#ifdef CONFIG_AUDIT
REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
REG("sessionid", S_IRUGO, proc_sessionid_operations),
+ REG("audit_containerid", S_IWUSR, proc_contid_operations),
#endif
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
@@ -3467,6 +3502,7 @@ static int proc_tid_comm_permission(struct inode *inode, int mask)
#ifdef CONFIG_AUDIT
REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
REG("sessionid", S_IRUGO, proc_sessionid_operations),
+ REG("audit_containerid", S_IWUSR, proc_contid_operations),
#endif
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
diff --git a/include/linux/audit.h b/include/linux/audit.h
index 4fbda55f3cf2..f2e3b81f2942 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -98,6 +98,7 @@ struct audit_ntp_data {
struct audit_task_info {
kuid_t loginuid;
unsigned int sessionid;
+ u64 contid;
#ifdef CONFIG_AUDITSYSCALL
struct audit_context *ctx;
#endif
@@ -198,6 +199,15 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
return tsk->audit->sessionid;
}

+extern int audit_set_contid(struct task_struct *tsk, u64 contid);
+
+static inline u64 audit_get_contid(struct task_struct *tsk)
+{
+ if (!tsk->audit)
+ return AUDIT_CID_UNSET;
+ return tsk->audit->contid;
+}
+
extern u32 audit_enabled;

extern int audit_signal_info(int sig, struct task_struct *t);
@@ -262,6 +272,11 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
return AUDIT_SID_UNSET;
}

+static inline u64 audit_get_contid(struct task_struct *tsk)
+{
+ return AUDIT_CID_UNSET;
+}
+
#define audit_enabled AUDIT_OFF

static inline int audit_signal_info(int sig, struct task_struct *t)
@@ -676,6 +691,16 @@ static inline bool audit_loginuid_set(struct task_struct *tsk)
return uid_valid(audit_get_loginuid(tsk));
}

+static inline bool audit_contid_valid(u64 contid)
+{
+ return contid != AUDIT_CID_UNSET;
+}
+
+static inline bool audit_contid_set(struct task_struct *tsk)
+{
+ return audit_contid_valid(audit_get_contid(tsk));
+}
+
static inline void audit_log_string(struct audit_buffer *ab, const char *buf)
{
audit_log_n_string(ab, buf, strlen(buf));
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index c89c6495983d..5d0ea2a6783e 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -71,6 +71,7 @@
#define AUDIT_TTY_SET 1017 /* Set TTY auditing status */
#define AUDIT_SET_FEATURE 1018 /* Turn an audit feature on or off */
#define AUDIT_GET_FEATURE 1019 /* Get which features are enabled */
+#define AUDIT_CONTAINER_OP 1020 /* Define the container id and info */

#define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */
#define AUDIT_USER_AVC 1107 /* We filter this differently */
@@ -488,6 +489,7 @@ struct audit_tty_status {

#define AUDIT_UID_UNSET (unsigned int)-1
#define AUDIT_SID_UNSET ((unsigned int)-1)
+#define AUDIT_CID_UNSET ((u64)-1)

/* audit_rule_data supports filter rules with both integer and string
* fields. It corresponds with AUDIT_ADD_RULE, AUDIT_DEL_RULE and
diff --git a/kernel/audit.c b/kernel/audit.c
index 5b1c52bafaeb..a36ea57cbb61 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -231,6 +231,7 @@ int audit_alloc(struct task_struct *tsk)
}
info->loginuid = audit_get_loginuid(current);
info->sessionid = audit_get_sessionid(current);
+ info->contid = audit_get_contid(current);
tsk->audit = info;

ret = audit_alloc_syscall(tsk);
@@ -245,6 +246,7 @@ int audit_alloc(struct task_struct *tsk)
struct audit_task_info init_struct_audit = {
.loginuid = INVALID_UID,
.sessionid = AUDIT_SID_UNSET,
+ .contid = AUDIT_CID_UNSET,
#ifdef CONFIG_AUDITSYSCALL
.ctx = NULL,
#endif
@@ -2354,6 +2356,77 @@ int audit_signal_info(int sig, struct task_struct *t)
return audit_signal_info_syscall(t);
}

+/*
+ * audit_set_contid - set current task's audit contid
+ * @task: target task
+ * @contid: contid value
+ *
+ * Returns 0 on success, -EPERM on permission failure.
+ *
+ * Called (set) from fs/proc/base.c::proc_contid_write().
+ */
+int audit_set_contid(struct task_struct *task, u64 contid)
+{
+ u64 oldcontid;
+ int rc = 0;
+ struct audit_buffer *ab;
+ uid_t uid;
+ struct tty_struct *tty;
+ char comm[sizeof(current->comm)];
+
+ task_lock(task);
+ /* Can't set if audit disabled */
+ if (!task->audit) {
+ task_unlock(task);
+ return -ENOPROTOOPT;
+ }
+ oldcontid = audit_get_contid(task);
+ read_lock(&tasklist_lock);
+ /* Don't allow the audit containerid to be unset */
+ if (!audit_contid_valid(contid))
+ rc = -EINVAL;
+ /* if we don't have caps, reject */
+ else if (!capable(CAP_AUDIT_CONTROL))
+ rc = -EPERM;
+ /* if task has children or is not single-threaded, deny */
+ else if (!list_empty(&task->children))
+ rc = -EBUSY;
+ else if (!(thread_group_leader(task) && thread_group_empty(task)))
+ rc = -EALREADY;
+ /* if contid is already set, deny */
+ else if (audit_contid_set(task))
+ rc = -ECHILD;
+ read_unlock(&tasklist_lock);
+ if (!rc)
+ task->audit->contid = contid;
+ task_unlock(task);
+
+ if (!audit_enabled)
+ return rc;
+
+ ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_CONTAINER_OP);
+ if (!ab)
+ return rc;
+
+ uid = from_kuid(&init_user_ns, task_uid(current));
+ tty = audit_get_tty();
+ audit_log_format(ab,
+ "op=set opid=%d contid=%llu old-contid=%llu pid=%d uid=%u auid=%u tty=%s ses=%u",
+ task_tgid_nr(task), contid, oldcontid,
+ task_tgid_nr(current), uid,
+ from_kuid(&init_user_ns, audit_get_loginuid(current)),
+ tty ? tty_name(tty) : "(none)",
+ audit_get_sessionid(current));
+ audit_put_tty(tty);
+ audit_log_task_context(ab);
+ audit_log_format(ab, " comm=");
+ audit_log_untrustedstring(ab, get_task_comm(comm, current));
+ audit_log_d_path_exe(ab, current->mm);
+ audit_log_format(ab, " res=%d", !rc);
+ audit_log_end(ab);
+ return rc;
+}
+
/**
* audit_log_end - end one audit record
* @ab: the audit_buffer
diff --git a/kernel/audit.h b/kernel/audit.h
index 7f623ef216e6..16bd03b88e0d 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -135,6 +135,7 @@ struct audit_context {
kuid_t target_uid;
unsigned int target_sessionid;
u32 target_sid;
+ u64 target_cid;
char target_comm[TASK_COMM_LEN];

struct audit_tree_refs *trees, *first_trees;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 10679da36bb6..0e2d50533959 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -113,6 +113,7 @@ struct audit_aux_data_pids {
kuid_t target_uid[AUDIT_AUX_PIDS];
unsigned int target_sessionid[AUDIT_AUX_PIDS];
u32 target_sid[AUDIT_AUX_PIDS];
+ u64 target_cid[AUDIT_AUX_PIDS];
char target_comm[AUDIT_AUX_PIDS][TASK_COMM_LEN];
int pid_count;
};
@@ -2375,6 +2376,7 @@ void __audit_ptrace(struct task_struct *t)
context->target_uid = task_uid(t);
context->target_sessionid = audit_get_sessionid(t);
security_task_getsecid(t, &context->target_sid);
+ context->target_cid = audit_get_contid(t);
memcpy(context->target_comm, t->comm, TASK_COMM_LEN);
}

@@ -2402,6 +2404,7 @@ int audit_signal_info_syscall(struct task_struct *t)
ctx->target_uid = t_uid;
ctx->target_sessionid = audit_get_sessionid(t);
security_task_getsecid(t, &ctx->target_sid);
+ ctx->target_cid = audit_get_contid(t);
memcpy(ctx->target_comm, t->comm, TASK_COMM_LEN);
return 0;
}
@@ -2423,6 +2426,7 @@ int audit_signal_info_syscall(struct task_struct *t)
axp->target_uid[axp->pid_count] = t_uid;
axp->target_sessionid[axp->pid_count] = audit_get_sessionid(t);
security_task_getsecid(t, &axp->target_sid[axp->pid_count]);
+ axp->target_cid[axp->pid_count] = audit_get_contid(t);
memcpy(axp->target_comm[axp->pid_count], t->comm, TASK_COMM_LEN);
axp->pid_count++;

--
1.8.3.1

2019-09-19 01:32:19

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 16/21] audit: add support for contid set/get by netlink

Add the ability to get and set the audit container identifier through
audit netlink messages of type AUDIT_GET_CONTID (1022) and
AUDIT_SET_CONTID (1023), in addition to the proc filesystem. The message
payload is the data structure:

struct audit_contid_status {
pid_t pid;
u64 id;
};
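
To make the message layout concrete, here is a hedged userspace sketch that
builds such a request in a buffer. The message type values come from this
patch and are not in released uapi headers, so they are defined locally;
the userspace mirror of the struct and the helper build_contid_msg() are
assumptions for illustration. Opening the NETLINK_AUDIT socket and sending
the buffer are left out.

```c
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Values proposed by this patch; defined locally as an assumption. */
#define AUDIT_GET_CONTID 1022
#define AUDIT_SET_CONTID 1023

/* Userspace mirror of the payload described above. */
struct audit_contid_status {
	pid_t pid;
	uint64_t id;
};

/*
 * Fill buf with a netlink header followed by the payload.
 * Returns the total message length, or 0 if buf is too small.
 */
static size_t build_contid_msg(void *buf, size_t buflen, uint16_t type,
			       uint32_t seq, pid_t pid, uint64_t id)
{
	struct nlmsghdr *nlh = buf;
	struct audit_contid_status cs;

	if (buflen < NLMSG_SPACE(sizeof(cs)))
		return 0;

	memset(buf, 0, NLMSG_SPACE(sizeof(cs)));
	memset(&cs, 0, sizeof(cs));	/* also clears struct padding */
	cs.pid = pid;
	cs.id = id;

	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(cs));
	nlh->nlmsg_type = type;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	nlh->nlmsg_seq = seq;
	memcpy(NLMSG_DATA(nlh), &cs, sizeof(cs));
	return nlh->nlmsg_len;
}
```

Note that the struct carries padding after pid on most 64-bit ABIs, which
is why the sketch zeroes it before copying.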

Signed-off-by: Richard Guy Briggs <[email protected]>
---
include/uapi/linux/audit.h | 2 ++
kernel/audit.c | 40 ++++++++++++++++++++++++++++++++++++++++
kernel/audit.h | 5 +++++
3 files changed, 47 insertions(+)

diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index f34108759e8f..e26729fc9943 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -73,6 +73,8 @@
#define AUDIT_GET_FEATURE 1019 /* Get which features are enabled */
#define AUDIT_CONTAINER_OP 1020 /* Define the container id and info */
#define AUDIT_SIGNAL_INFO2 1021 /* Get info auditd signal sender */
+#define AUDIT_GET_CONTID 1022 /* Get contid of a task */
+#define AUDIT_SET_CONTID 1023 /* Set contid of a task */

#define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */
#define AUDIT_USER_AVC 1107 /* We filter this differently */
diff --git a/kernel/audit.c b/kernel/audit.c
index 4fe7678304dd..df92de20ed73 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1216,6 +1216,8 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
case AUDIT_TTY_SET:
case AUDIT_TRIM:
case AUDIT_MAKE_EQUIV:
+ case AUDIT_GET_CONTID:
+ case AUDIT_SET_CONTID:
/* Only support auditd and auditctl in initial pid namespace
* for now. */
if (task_active_pid_ns(current) != &init_pid_ns)
@@ -1273,6 +1275,23 @@ static int audit_get_feature(struct sk_buff *skb)
return 0;
}

+static int audit_get_contid_status(struct sk_buff *skb)
+{
+ struct nlmsghdr *nlh = nlmsg_hdr(skb);
+ u32 seq = nlh->nlmsg_seq;
+ void *data = nlmsg_data(nlh);
+ struct audit_contid_status cs;
+
+ cs.pid = ((struct audit_contid_status *)data)->pid;
+ if (!cs.pid)
+ cs.pid = task_tgid_nr(current);
+ rcu_read_lock();
+ cs.id = audit_get_contid(find_task_by_vpid(cs.pid));
+ rcu_read_unlock();
+ audit_send_reply(skb, seq, AUDIT_GET_CONTID, 0, 0, &cs, sizeof(cs));
+ return 0;
+}
+
static void audit_log_feature_change(int which, u32 old_feature, u32 new_feature,
u32 old_lock, u32 new_lock, int res)
{
@@ -1700,6 +1719,27 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
audit_log_end(ab);
break;
}
+ case AUDIT_SET_CONTID: {
+ struct audit_contid_status *s = data;
+ struct task_struct *tsk;
+
+ /* check if new data is valid */
+ if (nlmsg_len(nlh) < sizeof(*s))
+ return -EINVAL;
+ tsk = find_get_task_by_vpid(s->pid);
+ if (!tsk)
+ return -EINVAL;
+
+ err = audit_set_contid(tsk, s->id);
+ put_task_struct(tsk);
+ return err;
+ break;
+ }
+ case AUDIT_GET_CONTID:
+ err = audit_get_contid_status(skb);
+ if (err)
+ return err;
+ break;
default:
err = -EINVAL;
break;
diff --git a/kernel/audit.h b/kernel/audit.h
index c9b73abfd6a0..25732fbc47a4 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -224,6 +224,11 @@ static inline int audit_hash_contid(u64 contid)

#define AUDIT_CONTID_COUNT 1 << 16

+struct audit_contid_status {
+ pid_t pid;
+ u64 id;
+};
+
/* Indicates that audit should log the full pathname. */
#define AUDIT_NAME_FULL -1

--
1.8.3.1

2019-09-19 01:32:34

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 05/21] audit: log drop of contid on exit of last task

Since we are tracking the life of each audit container identifier, we
can match the creation event with the destruction event. Log the
destruction of the audit container identifier when the last process in
that container exits.
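
The "last process exits" condition can be pictured with a tiny userspace
model. This is only a sketch under simplifying assumptions: struct
cont_model and cont_task_exit() are invented here, the reference count is
taken to equal the number of tasks in the container, and no locking is
modelled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's per-container object. */
struct cont_model {
	uint64_t id;
	unsigned int refcount;	/* tasks currently in the container */
};

/*
 * Called when a task in the container exits.  Formats a "drop" line only
 * when the exiting task holds the last reference, then releases it.
 * Returns true if a drop line was written into buf.
 */
static bool cont_task_exit(struct cont_model *c, char *buf, size_t len)
{
	bool last = (c->refcount == 1);

	if (last)
		snprintf(buf, len, "op=drop contid=%llu",
			 (unsigned long long)c->id);
	if (c->refcount > 0)
		c->refcount--;
	return last;
}
```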

Signed-off-by: Richard Guy Briggs <[email protected]>
---
kernel/audit.c | 32 ++++++++++++++++++++++++++++++++
kernel/audit.h | 2 ++
kernel/auditsc.c | 2 ++
3 files changed, 36 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index ea0899130cc1..53d13d638c63 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2503,6 +2503,38 @@ int audit_set_contid(struct task_struct *task, u64 contid)
return rc;
}

+void audit_log_container_drop(void)
+{
+ struct audit_buffer *ab;
+ uid_t uid;
+ struct tty_struct *tty;
+ char comm[sizeof(current->comm)];
+
+ if (!current->audit || !current->audit->cont ||
+ refcount_read(&current->audit->cont->refcount) > 1)
+ return;
+ ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_CONTAINER_OP);
+ if (!ab)
+ return;
+
+ uid = from_kuid(&init_user_ns, task_uid(current));
+ tty = audit_get_tty();
+ audit_log_format(ab,
+ "op=drop opid=%d contid=%llu old-contid=%llu pid=%d uid=%u auid=%u tty=%s ses=%u",
+ task_tgid_nr(current), audit_get_contid(current),
+ audit_get_contid(current), task_tgid_nr(current), uid,
+ from_kuid(&init_user_ns, audit_get_loginuid(current)),
+ tty ? tty_name(tty) : "(none)",
+ audit_get_sessionid(current));
+ audit_put_tty(tty);
+ audit_log_task_context(ab);
+ audit_log_format(ab, " comm=");
+ audit_log_untrustedstring(ab, get_task_comm(comm, current));
+ audit_log_d_path_exe(ab, current->mm);
+ audit_log_format(ab, " res=1");
+ audit_log_end(ab);
+}
+
/**
* audit_log_end - end one audit record
* @ab: the audit_buffer
diff --git a/kernel/audit.h b/kernel/audit.h
index e4a31aa92dfe..162de8366b32 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -255,6 +255,8 @@ extern void audit_log_d_path_exe(struct audit_buffer *ab,
extern struct tty_struct *audit_get_tty(void);
extern void audit_put_tty(struct tty_struct *tty);

+extern void audit_log_container_drop(void);
+
/* audit watch/mark/tree functions */
#ifdef CONFIG_AUDITSYSCALL
extern unsigned int audit_serial(void);
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 0e2d50533959..bd855794ad26 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1568,6 +1568,8 @@ static void audit_log_exit(void)

audit_log_proctitle();

+ audit_log_container_drop();
+
/* Send end of event record to help user space know we are finished */
ab = audit_log_start(context, GFP_KERNEL, AUDIT_EOE);
if (ab)
--
1.8.3.1

2019-09-19 01:32:51

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 17/21] audit: add support for loginuid/sessionid set/get by netlink

Add the ability to get and set the login uid and to get the session id
through audit netlink messages of type AUDIT_GET_LOGINUID (1024),
AUDIT_SET_LOGINUID (1025) and AUDIT_GET_SESSIONID (1026), in addition to
the proc filesystem.

Signed-off-by: Richard Guy Briggs <[email protected]>
---
include/uapi/linux/audit.h | 3 +++
kernel/audit.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 65 insertions(+)

diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index e26729fc9943..eef42c8eea77 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -75,6 +75,9 @@
#define AUDIT_SIGNAL_INFO2 1021 /* Get info auditd signal sender */
#define AUDIT_GET_CONTID 1022 /* Get contid of a task */
#define AUDIT_SET_CONTID 1023 /* Set contid of a task */
+#define AUDIT_GET_LOGINUID 1024 /* Get loginuid of a task */
+#define AUDIT_SET_LOGINUID 1025 /* Set loginuid of a task */
+#define AUDIT_GET_SESSIONID 1026 /* Get sessionid of a task */

#define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */
#define AUDIT_USER_AVC 1107 /* We filter this differently */
diff --git a/kernel/audit.c b/kernel/audit.c
index df92de20ed73..9e82de13d2eb 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1184,6 +1184,15 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
{
int err = 0;

+ /* These messages can work outside the initial namespaces */
+ switch (msg_type) {
+ case AUDIT_GET_LOGINUID:
+ case AUDIT_GET_SESSIONID:
+ return 0;
+ break;
+ default: /* do more checks below */
+ break;
+ }
/* Only support initial user namespace for now. */
/*
* We return ECONNREFUSED because it tricks userspace into thinking
@@ -1218,6 +1227,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
case AUDIT_MAKE_EQUIV:
case AUDIT_GET_CONTID:
case AUDIT_SET_CONTID:
+ case AUDIT_SET_LOGINUID:
/* Only support auditd and auditctl in initial pid namespace
* for now. */
if (task_active_pid_ns(current) != &init_pid_ns)
@@ -1292,6 +1302,33 @@ static int audit_get_contid_status(struct sk_buff *skb)
return 0;
}

+struct audit_loginuid_status { uid_t loginuid; };
+
+static int audit_get_loginuid_status(struct sk_buff *skb)
+{
+ u32 seq;
+ uid_t loginuid;
+ struct audit_loginuid_status ls;
+
+ loginuid = from_kuid(current_user_ns(), audit_get_loginuid(current));
+ ls.loginuid = loginuid;
+
+ seq = nlmsg_hdr(skb)->nlmsg_seq;
+ audit_send_reply(skb, seq, AUDIT_GET_LOGINUID, 0, 0, &ls, sizeof(ls));
+ return loginuid;
+}
+
+static int audit_get_sessionid_status(struct sk_buff *skb)
+{
+ u32 seq;
+ struct audit_sessionid_status { u32 sessionid; };
+ struct audit_sessionid_status ss = { audit_get_sessionid(current) };
+
+ seq = nlmsg_hdr(skb)->nlmsg_seq;
+ audit_send_reply(skb, seq, AUDIT_GET_SESSIONID, 0, 0, &ss, sizeof(ss));
+ return audit_get_sessionid(current);
+}
+
static void audit_log_feature_change(int which, u32 old_feature, u32 new_feature,
u32 old_lock, u32 new_lock, int res)
{
@@ -1740,6 +1777,31 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
if (err)
return err;
break;
+ case AUDIT_SET_LOGINUID: {
+ uid_t *loginuid = data;
+ kuid_t kloginuid;
+
+ /* check if new data is valid */
+ if (nlmsg_len(nlh) < sizeof(u32))
+ return -EINVAL;
+
+ kloginuid = make_kuid(current_user_ns(), *loginuid);
+ if (!uid_valid(kloginuid))
+ return -EINVAL;
+
+ return audit_set_loginuid(kloginuid);
+ break;
+ }
+ case AUDIT_GET_LOGINUID:
+ err = audit_get_loginuid_status(skb);
+ if (err)
+ return err;
+ break;
+ case AUDIT_GET_SESSIONID:
+ err = audit_get_sessionid_status(skb);
+ if (err)
+ return err;
+ break;
default:
err = -EINVAL;
break;
--
1.8.3.1

2019-09-19 01:32:58

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 18/21] audit: track container nesting

Track the parent container of a container to be able to filter and
report nesting.
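
With nesting, audit_log_contid() in this patch prints a container's id
followed by its ancestors, separated by "^". A userspace sketch of that
formatting, assuming a simplified node type (struct cont_node and
format_contid_chain() are illustrative names, not kernel symbols):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct audit_cont: just an id and a parent. */
struct cont_node {
	uint64_t id;
	const struct cont_node *parent;
};

/*
 * Write "id^parent^grandparent..." into buf, innermost container first.
 * Returns the number of characters written (excluding the NUL), or -1 if
 * buf is too small.
 */
static int format_contid_chain(const struct cont_node *c, char *buf, size_t len)
{
	size_t off = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';
	for (; c; c = c->parent) {
		/* Append this id, plus a separator if an ancestor follows. */
		int n = snprintf(buf + off, len - off, "%llu%s",
				 (unsigned long long)c->id,
				 c->parent ? "^" : "");

		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}
```

For a container 333 nested in 22 nested in 1, this yields a contid field
value of the form 333^22^1.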

Now that we have a way to track and check the parent container of a
container, fixup other patches, or squash all nesting fixes together.

fixup! audit: add container id
fixup! audit: log drop of contid on exit of last task
fixup! audit: log container info of syscalls
fixup! audit: add containerid filtering
fixup! audit: NETFILTER_PKT: record each container ID associated with a netNS
fixup! audit: convert to contid list to check for orch/engine ownership softirq (for netfilter) audit: protect contid list lock from softirq

Signed-off-by: Richard Guy Briggs <[email protected]>
---
include/linux/audit.h | 1 +
kernel/audit.c | 67 ++++++++++++++++++++++++++++++++++++++++++---------
kernel/audit.h | 3 +++
kernel/auditfilter.c | 20 ++++++++++++++-
kernel/auditsc.c | 2 +-
5 files changed, 79 insertions(+), 14 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index dcd92f964120..1ce27af686ea 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -110,6 +110,7 @@ struct audit_cont {
struct task_struct *owner;
refcount_t refcount;
struct rcu_head rcu;
+ struct audit_cont *parent;
};

struct audit_task_info {
diff --git a/kernel/audit.c b/kernel/audit.c
index 9e82de13d2eb..848fd1c8c579 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -213,7 +213,7 @@ struct audit_reply {

static struct kmem_cache *audit_task_cache;

-static DEFINE_SPINLOCK(audit_contid_list_lock);
+DEFINE_SPINLOCK(audit_contid_list_lock);

void __init audit_task_init(void)
{
@@ -275,6 +275,7 @@ void audit_free(struct task_struct *tsk)
{
struct audit_task_info *info = tsk->audit;
struct nsproxy *ns = tsk->nsproxy;
+ unsigned long flags;

audit_free_syscall(tsk);
if (ns)
@@ -282,9 +283,9 @@ void audit_free(struct task_struct *tsk)
/* Freeing the audit_task_info struct must be performed after
* audit_log_exit() due to need for loginuid and sessionid.
*/
- spin_lock(&audit_contid_list_lock);
+ spin_lock_irqsave(&audit_contid_list_lock, flags);
audit_cont_put(tsk->audit->cont);
- spin_unlock(&audit_contid_list_lock);
+ spin_unlock_irqrestore(&audit_contid_list_lock, flags);
info = tsk->audit;
tsk->audit = NULL;
kmem_cache_free(audit_task_cache, info);
@@ -450,6 +451,7 @@ void audit_switch_task_namespaces(struct nsproxy *ns, struct task_struct *p)
audit_netns_contid_add(new->net_ns, contid);
}

+void audit_log_contid(struct audit_buffer *ab, u64 contid);
/**
* audit_log_netns_contid_list - List contids for the given network namespace
* @net: the network namespace of interest
@@ -481,7 +483,7 @@ void audit_log_netns_contid_list(struct net *net, struct audit_context *context)
audit_log_format(ab, "contid=");
} else
audit_log_format(ab, ",");
- audit_log_format(ab, "%llu", cont->id);
+ audit_log_contid(ab, cont->id);
}
audit_log_end(ab);
out:
@@ -2371,6 +2373,36 @@ void audit_log_session_info(struct audit_buffer *ab)
audit_log_format(ab, "auid=%u ses=%u", auid, sessionid);
}

+void audit_log_contid(struct audit_buffer *ab, u64 contid)
+{
+ struct audit_cont *cont = NULL;
+ struct audit_cont *prcont = NULL;
+ int h;
+ unsigned long flags;
+
+ if (!audit_contid_valid(contid)) {
+ audit_log_format(ab, "%llu", contid);
+ return;
+ }
+ h = audit_hash_contid(contid);
+ spin_lock_irqsave(&audit_contid_list_lock, flags);
+ list_for_each_entry_rcu(cont, &audit_contid_hash[h], list)
+ if (cont->id == contid)
+ prcont = cont;
+ if (!prcont) {
+ audit_log_format(ab, "%llu", contid);
+ goto out;
+ }
+ while (prcont) {
+ audit_log_format(ab, "%llu", prcont->id);
+ prcont = prcont->parent;
+ if (prcont)
+ audit_log_format(ab, "^");
+ }
+out:
+ spin_unlock_irqrestore(&audit_contid_list_lock, flags);
+}
+
/*
* audit_log_container_id - report container info
* @context: task or local context for record
@@ -2386,7 +2418,8 @@ void audit_log_container_id(struct audit_context *context, u64 contid)
ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER_ID);
if (!ab)
return;
- audit_log_format(ab, "contid=%llu", contid);
+ audit_log_format(ab, "contid=");
+ audit_log_contid(ab, contid);
audit_log_end(ab);
}
EXPORT_SYMBOL(audit_log_container_id);
@@ -2648,6 +2681,7 @@ void audit_cont_put(struct audit_cont *cont)
return;
if (refcount_dec_and_test(&cont->refcount)) {
put_task_struct(cont->owner);
+ audit_cont_put(cont->parent);
list_del_rcu(&cont->list);
kfree_rcu(cont, rcu);
audit_contid_count--;
@@ -2732,8 +2766,9 @@ int audit_set_contid(struct task_struct *task, u64 contid)
struct audit_cont *cont = NULL;
struct audit_cont *newcont = NULL;
int h = audit_hash_contid(contid);
+ unsigned long flags;

- spin_lock(&audit_contid_list_lock);
+ spin_lock_irqsave(&audit_contid_list_lock, flags);
list_for_each_entry_rcu(cont, &audit_contid_hash[h], list)
if (cont->id == contid) {
/* task injection to existing container */
@@ -2757,6 +2792,9 @@ int audit_set_contid(struct task_struct *task, u64 contid)
newcont->id = contid;
get_task_struct(current);
newcont->owner = current;
+ newcont->parent = audit_cont(newcont->owner);
+ if (newcont->parent)
+ refcount_inc(&newcont->parent->refcount);
refcount_set(&newcont->refcount, 1);
list_add_rcu(&newcont->list, &audit_contid_hash[h]);
audit_contid_count++;
@@ -2768,7 +2806,7 @@ int audit_set_contid(struct task_struct *task, u64 contid)
task->audit->cont = newcont;
audit_cont_put(oldcont);
conterror:
- spin_unlock(&audit_contid_list_lock);
+ spin_unlock_irqrestore(&audit_contid_list_lock, flags);
}
if (!rc) {
if (audit_contid_valid(oldcontid))
@@ -2786,9 +2824,12 @@ int audit_set_contid(struct task_struct *task, u64 contid)

uid = from_kuid(&init_user_ns, task_uid(current));
tty = audit_get_tty();
+ audit_log_format(ab, "op=set opid=%d contid=", task_tgid_nr(task));
+ audit_log_contid(ab, contid);
+ audit_log_format(ab, " old-contid=");
+ audit_log_contid(ab, oldcontid);
audit_log_format(ab,
- "op=set opid=%d contid=%llu old-contid=%llu pid=%d uid=%u auid=%u tty=%s ses=%u",
- task_tgid_nr(task), contid, oldcontid,
+ " pid=%d uid=%u auid=%u tty=%s ses=%u",
task_tgid_nr(current), uid,
from_kuid(&init_user_ns, audit_get_loginuid(current)),
tty ? tty_name(tty) : "(none)",
@@ -2819,10 +2860,12 @@ void audit_log_container_drop(void)

uid = from_kuid(&init_user_ns, task_uid(current));
tty = audit_get_tty();
+ audit_log_format(ab, "op=drop opid=%d contid=%llu old-contid=",
+ task_tgid_nr(current), AUDIT_CID_UNSET);
+ audit_log_contid(ab, audit_get_contid(current));
audit_log_format(ab,
- "op=drop opid=%d contid=%llu old-contid=%llu pid=%d uid=%u auid=%u tty=%s ses=%u",
- task_tgid_nr(current), audit_get_contid(current),
- audit_get_contid(current), task_tgid_nr(current), uid,
+ " pid=%d uid=%u auid=%u tty=%s ses=%u",
+ task_tgid_nr(current), uid,
from_kuid(&init_user_ns, audit_get_loginuid(current)),
tty ? tty_name(tty) : "(none)",
audit_get_sessionid(current));
diff --git a/kernel/audit.h b/kernel/audit.h
index 25732fbc47a4..89b7de323c13 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -220,6 +220,8 @@ static inline int audit_hash_contid(u64 contid)
return (contid & (AUDIT_CONTID_BUCKETS-1));
}

+extern spinlock_t audit_contid_list_lock;
+
extern int audit_contid_count;

#define AUDIT_CONTID_COUNT 1 << 16
@@ -235,6 +237,7 @@ struct audit_contid_status {
extern int audit_match_class(int class, unsigned syscall);
extern int audit_comparator(const u32 left, const u32 op, const u32 right);
extern int audit_comparator64(const u64 left, const u32 op, const u64 right);
+extern int audit_contid_comparator(const u64 left, const u32 op, const u64 right);
extern int audit_uid_comparator(kuid_t left, u32 op, kuid_t right);
extern int audit_gid_comparator(kgid_t left, u32 op, kgid_t right);
extern int parent_len(const char *path);
diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index 9606f973fe33..513d57d03637 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -1297,6 +1297,24 @@ int audit_gid_comparator(kgid_t left, u32 op, kgid_t right)
}
}

+int audit_contid_comparator(u64 left, u32 op, u64 right)
+{
+ struct audit_cont *cont = NULL;
+ int h;
+ int result = 0;
+ unsigned long flags;
+
+ h = audit_hash_contid(left);
+ spin_lock_irqsave(&audit_contid_list_lock, flags);
+ list_for_each_entry_rcu(cont, &audit_contid_hash[h], list) {
+ result = audit_comparator64(cont->id, op, right);
+ if (result)
+ break;
+ }
+ spin_unlock_irqrestore(&audit_contid_list_lock, flags);
+ return result;
+}
+
/**
* parent_len - find the length of the parent portion of a pathname
* @path: pathname of which to determine length
@@ -1388,7 +1406,7 @@ int audit_filter(int msgtype, unsigned int listtype)
f->op, f->val);
break;
case AUDIT_CONTID:
- result = audit_comparator64(audit_get_contid(current),
+ result = audit_contid_comparator(audit_get_contid(current),
f->op, f->val64);
break;
case AUDIT_MSGTYPE:
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index a658fe775b86..6bf6d8b9dfd1 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -630,7 +630,7 @@ static int audit_filter_rules(struct task_struct *tsk,
f->op, f->val);
break;
case AUDIT_CONTID:
- result = audit_comparator64(audit_get_contid(tsk),
+ result = audit_contid_comparator(audit_get_contid(tsk),
f->op, f->val64);
break;
case AUDIT_SUBJ_USER:
--
1.8.3.1

2019-09-19 01:33:44

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 15/21] sched: pull task_is_descendant into kernel/sched/core.c

Since the task_is_descendant() function is used in both YAMA and audit,
pull the function into kernel/sched/core.c.
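
For readers unfamiliar with the helper, its core is an upward walk of the
process tree. The following userspace model is a sketch only: struct
proc_node and is_descendant() are made-up names, and the thread group
leader normalisation and RCU protection of the kernel version are left
out.

```c
#include <stddef.h>

/* Hypothetical stand-in for task_struct: a pid and a parent link. */
struct proc_node {
	int pid;
	const struct proc_node *parent;
};

/*
 * Walk up from child looking for parent.  As in the kernel helper, a task
 * matches itself, and the walk stops once it reaches pid 0.
 * Returns 1 if child descends from parent, 0 otherwise.
 */
static int is_descendant(const struct proc_node *parent,
			 const struct proc_node *child)
{
	const struct proc_node *walker;

	if (!parent || !child)
		return 0;
	for (walker = child; walker && walker->pid > 0; walker = walker->parent)
		if (walker == parent)
			return 1;
	return 0;
}
```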

Signed-off-by: Richard Guy Briggs <[email protected]>
---
include/linux/sched.h | 3 +++
kernel/audit.c | 33 ---------------------------------
kernel/sched/core.c | 33 +++++++++++++++++++++++++++++++++
security/yama/yama_lsm.c | 33 ---------------------------------
4 files changed, 36 insertions(+), 66 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a936d162513a..b251f018f4db 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1988,4 +1988,7 @@ static inline void rseq_syscall(struct pt_regs *regs)

const struct cpumask *sched_trace_rd_span(struct root_domain *rd);

+extern int task_is_descendant(struct task_struct *parent,
+ struct task_struct *child);
+
#endif
diff --git a/kernel/audit.c b/kernel/audit.c
index 69fe1e9af7cb..4fe7678304dd 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2560,39 +2560,6 @@ static struct task_struct *audit_cont_owner(struct task_struct *tsk)
}

/*
- * task_is_descendant - walk up a process family tree looking for a match
- * @parent: the process to compare against while walking up from child
- * @child: the process to start from while looking upwards for parent
- *
- * Returns 1 if child is a descendant of parent, 0 if not.
- */
-static int task_is_descendant(struct task_struct *parent,
- struct task_struct *child)
-{
- int rc = 0;
- struct task_struct *walker = child;
-
- if (!parent || !child)
- return 0;
-
- rcu_read_lock();
- if (!thread_group_leader(parent))
- parent = rcu_dereference(parent->group_leader);
- while (walker->pid > 0) {
- if (!thread_group_leader(walker))
- walker = rcu_dereference(walker->group_leader);
- if (walker == parent) {
- rc = 1;
- break;
- }
- walker = rcu_dereference(walker->real_parent);
- }
- rcu_read_unlock();
-
- return rc;
-}
-
-/*
* audit_set_contid - set current task's audit contid
* @task: target task
* @contid: contid value
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2b037f195473..7ba9e07381fa 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7509,6 +7509,39 @@ void dump_cpu_task(int cpu)
}

/*
+ * task_is_descendant - walk up a process family tree looking for a match
+ * @parent: the process to compare against while walking up from child
+ * @child: the process to start from while looking upwards for parent
+ *
+ * Returns 1 if child is a descendant of parent, 0 if not.
+ */
+int task_is_descendant(struct task_struct *parent,
+ struct task_struct *child)
+{
+ int rc = 0;
+ struct task_struct *walker = child;
+
+ if (!parent || !child)
+ return 0;
+
+ rcu_read_lock();
+ if (!thread_group_leader(parent))
+ parent = rcu_dereference(parent->group_leader);
+ while (walker->pid > 0) {
+ if (!thread_group_leader(walker))
+ walker = rcu_dereference(walker->group_leader);
+ if (walker == parent) {
+ rc = 1;
+ break;
+ }
+ walker = rcu_dereference(walker->real_parent);
+ }
+ rcu_read_unlock();
+
+ return rc;
+}
+
+/*
* Nice levels are multiplicative, with a gentle 10% change for every
* nice level changed. I.e. when a CPU-bound task goes from nice 0 to
* nice 1, it will get ~10% less CPU time than another CPU-bound task
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index 94dc346370b1..25eae205eae8 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -263,39 +263,6 @@ static int yama_task_prctl(int option, unsigned long arg2, unsigned long arg3,
}

/**
- * task_is_descendant - walk up a process family tree looking for a match
- * @parent: the process to compare against while walking up from child
- * @child: the process to start from while looking upwards for parent
- *
- * Returns 1 if child is a descendant of parent, 0 if not.
- */
-static int task_is_descendant(struct task_struct *parent,
- struct task_struct *child)
-{
- int rc = 0;
- struct task_struct *walker = child;
-
- if (!parent || !child)
- return 0;
-
- rcu_read_lock();
- if (!thread_group_leader(parent))
- parent = rcu_dereference(parent->group_leader);
- while (walker->pid > 0) {
- if (!thread_group_leader(walker))
- walker = rcu_dereference(walker->group_leader);
- if (walker == parent) {
- rc = 1;
- break;
- }
- walker = rcu_dereference(walker->real_parent);
- }
- rcu_read_unlock();
-
- return rc;
-}
-
-/**
* ptracer_exception_found - tracer registered as exception for this tracee
* @tracer: the task_struct of the process attempting ptrace
* @tracee: the task_struct of the process to be ptraced
--
1.8.3.1

2019-09-19 01:33:45

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 21/21] audit: add proc interface for capcontid

Add a /proc interface to capcontid for testing purposes. This isn't
intended to be merged upstream. Container orchestrators/engines are
expected to link to libaudit to use the functions audit_set_capcontid()
and audit_get_capcontid().
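
For testing, the file could be driven from userspace roughly as follows.
This is only a sketch, assuming a kernel that carries this patch; the
helper names capcontid_path() and capcontid_write() are hypothetical and
are not part of libaudit. The value goes out in a single write() because
the handler rejects writes at a non-zero offset.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "/proc/<pid>/audit_capcontainerid"; 0 on success, -1 if buf is too small. */
int capcontid_path(char *buf, size_t len, pid_t pid)
{
	int n;

	assert(buf && len > 0);
	n = snprintf(buf, len, "/proc/%ld/audit_capcontainerid", (long)pid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write the flag as a decimal string in one write(); returns 0 or -errno. */
int capcontid_write(pid_t pid, uint32_t enable)
{
	char path[64], val[16];
	ssize_t w;
	int fd, len, rc = 0;

	if (capcontid_path(path, sizeof(path), pid) < 0)
		return -ENAMETOOLONG;
	len = snprintf(val, sizeof(val), "%u", (unsigned int)enable);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;	/* e.g. -ENOENT on a kernel without the patch */
	w = write(fd, val, (size_t)len);
	if (w < 0)
		rc = -errno;
	else if (w != len)
		rc = -EIO;	/* partial writes are not supported */
	close(fd);
	return rc;
}
```

A caller would use capcontid_write(child_pid, 1) from the parent, since
the kernel side only accepts descendants of the caller as targets.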

Signed-off-by: Richard Guy Briggs <[email protected]>
---
fs/proc/base.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 26091800180c..283ef8e006e7 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1360,6 +1360,59 @@ static ssize_t proc_contid_write(struct file *file, const char __user *buf,
.write = proc_contid_write,
.llseek = generic_file_llseek,
};
+
+static ssize_t proc_capcontid_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct inode *inode = file_inode(file);
+ struct task_struct *task = get_proc_task(inode);
+ ssize_t length;
+ char tmpbuf[TMPBUFLEN];
+
+ if (!task)
+ return -ESRCH;
+ /* if we don't have caps, reject */
+ if (!capable(CAP_AUDIT_CONTROL) && !audit_get_capcontid(current))
+ return -EPERM;
+ length = scnprintf(tmpbuf, TMPBUFLEN, "%u", audit_get_capcontid(task));
+ put_task_struct(task);
+ return simple_read_from_buffer(buf, count, ppos, tmpbuf, length);
+}
+
+static ssize_t proc_capcontid_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct inode *inode = file_inode(file);
+ u32 capcontid;
+ int rv;
+ struct task_struct *task = get_proc_task(inode);
+
+ if (!task)
+ return -ESRCH;
+ if (*ppos != 0) {
+ /* No partial writes. */
+ put_task_struct(task);
+ return -EINVAL;
+ }
+
+ rv = kstrtou32_from_user(buf, count, 10, &capcontid);
+ if (rv < 0) {
+ put_task_struct(task);
+ return rv;
+ }
+
+ rv = audit_set_capcontid(task, capcontid);
+ put_task_struct(task);
+ if (rv < 0)
+ return rv;
+ return count;
+}
+
+static const struct file_operations proc_capcontid_operations = {
+ .read = proc_capcontid_read,
+ .write = proc_capcontid_write,
+ .llseek = generic_file_llseek,
+};
#endif

#ifdef CONFIG_FAULT_INJECTION
@@ -3121,6 +3174,7 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
REG("sessionid", S_IRUGO, proc_sessionid_operations),
REG("audit_containerid", S_IWUSR|S_IRUSR, proc_contid_operations),
+ REG("audit_capcontainerid", S_IWUSR|S_IRUSR, proc_capcontid_operations),
#endif
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
@@ -3522,6 +3576,7 @@ static int proc_tid_comm_permission(struct inode *inode, int mask)
REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
REG("sessionid", S_IRUGO, proc_sessionid_operations),
REG("audit_containerid", S_IWUSR|S_IRUSR, proc_contid_operations),
+ REG("audit_capcontainerid", S_IWUSR|S_IRUSR, proc_capcontid_operations),
#endif
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
--
1.8.3.1

2019-09-19 01:34:59

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 19/21] audit: check cont depth

Set an arbitrary limit on the depth of audit container identifier
nesting to limit abuse.
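
The check can be pictured with a small userspace sketch. It assumes each
container object carries a parent pointer; struct cont_node and the
function names are hypothetical stand-ins, and only the limit of 5 and
the -EMLINK error are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define CONT_DEPTH_MAX 5	/* mirrors AUDIT_CONTID_DEPTH in the patch */

/* Hypothetical stand-in for struct audit_cont: only the parent link. */
struct cont_node {
	struct cont_node *parent;
};

/* Count this container plus all of its ancestors; 0 means "no container". */
int cont_depth(const struct cont_node *cont)
{
	int depth = 0;

	for (; cont; cont = cont->parent)
		depth++;
	return depth;
}

/* Refuse to nest a new container under parent once the limit is reached. */
int cont_nest_check(const struct cont_node *parent)
{
	return cont_depth(parent) >= CONT_DEPTH_MAX ? -EMLINK : 0;
}
```

With a limit of 5, a parent that is already at depth 5 cannot have a
nested container created beneath it.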

Signed-off-by: Richard Guy Briggs <[email protected]>
---
kernel/audit.c | 21 +++++++++++++++++++++
kernel/audit.h | 2 ++
2 files changed, 23 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index 848fd1c8c579..a70c9184e5d9 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2667,6 +2667,22 @@ int audit_signal_info(int sig, struct task_struct *t)
return audit_signal_info_syscall(t);
}

+static int audit_contid_depth(struct audit_cont *cont)
+{
+ struct audit_cont *parent;
+ int depth = 1;
+
+ if (!cont)
+ return 0;
+
+ parent = cont->parent;
+ while (parent) {
+ depth++;
+ parent = parent->parent;
+ }
+ return depth;
+}
+
struct audit_cont *audit_cont(struct task_struct *tsk)
{
if (!tsk->audit || !tsk->audit->cont)
@@ -2785,6 +2801,11 @@ int audit_set_contid(struct task_struct *task, u64 contid)
rc = -ENOSPC;
goto conterror;
}
+ /* Set max contid depth */
+ if (audit_contid_depth(audit_cont(current->real_parent)) >= AUDIT_CONTID_DEPTH) {
+ rc = -EMLINK;
+ goto conterror;
+ }
if (!newcont) {
newcont = kmalloc(sizeof(struct audit_cont), GFP_ATOMIC);
if (newcont) {
diff --git a/kernel/audit.h b/kernel/audit.h
index 89b7de323c13..cb25341c1a0f 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -231,6 +231,8 @@ struct audit_contid_status {
u64 id;
};

+#define AUDIT_CONTID_DEPTH 5
+
/* Indicates that audit should log the full pathname. */
#define AUDIT_NAME_FULL -1

--
1.8.3.1

2019-09-19 01:35:21

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

Provide a mechanism similar to CAP_AUDIT_CONTROL to explicitly give a
process in a non-init user namespace the capability to set audit
container identifiers.

Use audit netlink message types AUDIT_GET_CAPCONTID 1027 and
AUDIT_SET_CAPCONTID 1028. The message format includes the data
structure:
struct audit_capcontid_status {
pid_t pid;
u32 enable;
};
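
A userspace request carrying this structure might be assembled as in
the sketch below. It assumes the message type values and payload layout
given above; struct capcontid_req and capcontid_build_set() are
hypothetical names, and sending the buffer over a NETLINK_AUDIT socket
is left out.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Values and layout copied from this patch; they are not in mainline headers. */
#define AUDIT_GET_CAPCONTID 1027
#define AUDIT_SET_CAPCONTID 1028

struct audit_capcontid_status {
	pid_t pid;
	uint32_t enable;
};

/* Hypothetical request buffer: netlink header followed by the aligned payload. */
struct capcontid_req {
	struct nlmsghdr nlh;
	char payload[NLMSG_ALIGN(sizeof(struct audit_capcontid_status))];
};

/* Fill req with an AUDIT_SET_CAPCONTID request targeting pid. */
void capcontid_build_set(struct capcontid_req *req, pid_t pid,
			 uint32_t enable, uint32_t seq)
{
	struct audit_capcontid_status s = { .pid = pid, .enable = enable };

	assert(req);
	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(s));
	req->nlh.nlmsg_type = AUDIT_SET_CAPCONTID;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req->nlh.nlmsg_seq = seq;
	/* payload sits where NLMSG_DATA(&req->nlh) points */
	memcpy(req->payload, &s, sizeof(s));
}
```

A pid of 0 in an AUDIT_GET_CAPCONTID request is treated by the kernel
side of this patch as "the calling task".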

Signed-off-by: Richard Guy Briggs <[email protected]>
---
include/linux/audit.h | 14 +++++++
include/uapi/linux/audit.h | 2 +
kernel/audit.c | 98 +++++++++++++++++++++++++++++++++++++++++++++-
kernel/audit.h | 5 +++
4 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 1ce27af686ea..dcc53e62e266 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -117,6 +117,7 @@ struct audit_task_info {
kuid_t loginuid;
unsigned int sessionid;
struct audit_cont *cont;
+ u32 capcontid;
#ifdef CONFIG_AUDITSYSCALL
struct audit_context *ctx;
#endif
@@ -224,6 +225,14 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
return tsk->audit->sessionid;
}

+static inline u32 audit_get_capcontid(struct task_struct *tsk)
+{
+ if (!tsk->audit)
+ return 0;
+ return tsk->audit->capcontid;
+}
+
+extern int audit_set_capcontid(struct task_struct *tsk, u32 enable);
extern int audit_set_contid(struct task_struct *tsk, u64 contid);

static inline u64 audit_get_contid(struct task_struct *tsk)
@@ -309,6 +318,11 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
return AUDIT_SID_UNSET;
}

+static inline u32 audit_get_capcontid(struct task_struct *tsk)
+{
+ return 0;
+}
+
static inline u64 audit_get_contid(struct task_struct *tsk)
{
return AUDIT_CID_UNSET;
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index eef42c8eea77..011b0a8ee9b2 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -78,6 +78,8 @@
#define AUDIT_GET_LOGINUID 1024 /* Get loginuid of a task */
#define AUDIT_SET_LOGINUID 1025 /* Set loginuid of a task */
#define AUDIT_GET_SESSIONID 1026 /* Set sessionid of a task */
+#define AUDIT_GET_CAPCONTID 1027 /* Get cap_contid of a task */
+#define AUDIT_SET_CAPCONTID 1028 /* Set cap_contid of a task */

#define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */
#define AUDIT_USER_AVC 1107 /* We filter this differently */
diff --git a/kernel/audit.c b/kernel/audit.c
index a70c9184e5d9..7160da464849 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1192,6 +1192,14 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
case AUDIT_GET_SESSIONID:
return 0;
break;
+ case AUDIT_GET_CAPCONTID:
+ case AUDIT_SET_CAPCONTID:
+ case AUDIT_GET_CONTID:
+ case AUDIT_SET_CONTID:
+ if (!netlink_capable(skb, CAP_AUDIT_CONTROL) && !audit_get_capcontid(current))
+ return -EPERM;
+ return 0;
+ break;
default: /* do more checks below */
break;
}
@@ -1227,8 +1235,6 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
case AUDIT_TTY_SET:
case AUDIT_TRIM:
case AUDIT_MAKE_EQUIV:
- case AUDIT_GET_CONTID:
- case AUDIT_SET_CONTID:
case AUDIT_SET_LOGINUID:
/* Only support auditd and auditctl in initial pid namespace
* for now. */
@@ -1304,6 +1310,23 @@ static int audit_get_contid_status(struct sk_buff *skb)
return 0;
}

+static int audit_get_capcontid_status(struct sk_buff *skb)
+{
+ struct nlmsghdr *nlh = nlmsg_hdr(skb);
+ u32 seq = nlh->nlmsg_seq;
+ void *data = nlmsg_data(nlh);
+ struct audit_capcontid_status cs;
+
+ cs.pid = ((struct audit_capcontid_status *)data)->pid;
+ if (!cs.pid)
+ cs.pid = task_tgid_nr(current);
+ rcu_read_lock();
+ cs.enable = audit_get_capcontid(find_task_by_vpid(cs.pid));
+ rcu_read_unlock();
+ audit_send_reply(skb, seq, AUDIT_GET_CAPCONTID, 0, 0, &cs, sizeof(cs));
+ return 0;
+}
+
struct audit_loginuid_status { uid_t loginuid; };

static int audit_get_loginuid_status(struct sk_buff *skb)
@@ -1779,6 +1802,27 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
if (err)
return err;
break;
+ case AUDIT_SET_CAPCONTID: {
+ struct audit_capcontid_status *s = data;
+ struct task_struct *tsk;
+
+ /* check if new data is valid */
+ if (nlmsg_len(nlh) < sizeof(*s))
+ return -EINVAL;
+ tsk = find_get_task_by_vpid(s->pid);
+ if (!tsk)
+ return -EINVAL;
+
+ err = audit_set_capcontid(tsk, s->enable);
+ put_task_struct(tsk);
+ return err;
+ break;
+ }
+ case AUDIT_GET_CAPCONTID:
+ err = audit_get_capcontid_status(skb);
+ if (err)
+ return err;
+ break;
case AUDIT_SET_LOGINUID: {
uid_t *loginuid = data;
kuid_t kloginuid;
@@ -2711,6 +2755,56 @@ static struct task_struct *audit_cont_owner(struct task_struct *tsk)
return NULL;
}

+int audit_set_capcontid(struct task_struct *task, u32 enable)
+{
+ u32 oldcapcontid;
+ int rc = 0;
+ struct audit_buffer *ab;
+ uid_t uid;
+ struct tty_struct *tty;
+ char comm[sizeof(current->comm)];
+
+ if (!task->audit)
+ return -ENOPROTOOPT;
+ oldcapcontid = audit_get_capcontid(task);
+ /* if task is not descendant, block */
+ if (task == current)
+ rc = -EBADSLT;
+ else if (!task_is_descendant(current, task))
+ rc = -EXDEV;
+ else if (current_user_ns() == &init_user_ns) {
+ if (!capable(CAP_AUDIT_CONTROL) && !audit_get_capcontid(current))
+ rc = -EPERM;
+ }
+ if (!rc)
+ task->audit->capcontid = enable;
+
+ if (!audit_enabled)
+ return rc;
+
+ ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_SET_CAPCONTID);
+ if (!ab)
+ return rc;
+
+ uid = from_kuid(&init_user_ns, task_uid(current));
+ tty = audit_get_tty();
+ audit_log_format(ab,
+ "opid=%d capcontid=%u old-capcontid=%u pid=%d uid=%u auid=%u tty=%s ses=%u",
+ task_tgid_nr(task), enable, oldcapcontid,
+ task_tgid_nr(current), uid,
+ from_kuid(&init_user_ns, audit_get_loginuid(current)),
+ tty ? tty_name(tty) : "(none)",
+ audit_get_sessionid(current));
+ audit_put_tty(tty);
+ audit_log_task_context(ab);
+ audit_log_format(ab, " comm=");
+ audit_log_untrustedstring(ab, get_task_comm(comm, current));
+ audit_log_d_path_exe(ab, current->mm);
+ audit_log_format(ab, " res=%d", !rc);
+ audit_log_end(ab);
+ return rc;
+}
+
/*
* audit_set_contid - set current task's audit contid
* @task: target task
diff --git a/kernel/audit.h b/kernel/audit.h
index cb25341c1a0f..ac4694e88485 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -231,6 +231,11 @@ struct audit_contid_status {
u64 id;
};

+struct audit_capcontid_status {
+ pid_t pid;
+ u32 enable;
+};
+
#define AUDIT_CONTID_DEPTH 5

/* Indicates that audit should log the full pathname. */
--
1.8.3.1

2019-09-19 03:30:07

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 11/21] audit: add containerid filtering

Implement audit container identifier filtering using the AUDIT_CONTID
field name to send an 8-byte string holding a u64, since the value
field is only u32.

Sending it as two u32 was considered, but gathering and comparing two
fields was more complex.
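
The encoding can be sketched in userspace as below. The function names
are hypothetical; the sketch assumes the identifier travels as 8 raw
bytes in host byte order, and it uses memcpy() rather than a pointer
cast so that an unaligned buffer is harmless.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a 64-bit container ID into a rule buffer; returns bytes written or 0. */
size_t contid_pack(unsigned char *buf, size_t len, uint64_t contid)
{
	assert(buf);
	if (len < sizeof(contid))
		return 0;
	memcpy(buf, &contid, sizeof(contid));
	return sizeof(contid);
}

/* Read it back; the field length must be exactly sizeof(u64). */
int contid_unpack(const unsigned char *buf, size_t len, uint64_t *contid)
{
	if (len != sizeof(*contid))
		return -1;
	memcpy(contid, buf, sizeof(*contid));
	return 0;
}

/* Only equality and inequality are accepted for this field. */
enum contid_op { CONTID_EQ, CONTID_NE };

int contid_match(uint64_t left, enum contid_op op, uint64_t right)
{
	return op == CONTID_EQ ? left == right : left != right;
}
```

Comparing the full 64 bits matters: two identifiers that differ only in
their upper 32 bits would look equal to a u32 comparator.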

The feature indicator is AUDIT_FEATURE_BITMAP_CONTAINERID.

Please see the github audit kernel issue for the contid filter feature:
https://github.com/linux-audit/audit-kernel/issues/91
Please see the github audit userspace issue for filter additions:
https://github.com/linux-audit/audit-userspace/issues/40
Please see the github audit testsuite issue for the test case:
https://github.com/linux-audit/audit-testsuite/issues/64
Please see the github audit wiki for the feature overview:
https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
include/linux/audit.h | 1 +
include/uapi/linux/audit.h | 5 ++++-
kernel/audit.h | 1 +
kernel/auditfilter.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
kernel/auditsc.c | 4 ++++
5 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index e849058cb662..575fff6ea7c9 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -69,6 +69,7 @@ struct audit_field {
u32 type;
union {
u32 val;
+ u64 val64;
kuid_t uid;
kgid_t gid;
struct {
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 693ec6e0288b..f34108759e8f 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -268,6 +268,7 @@
#define AUDIT_LOGINUID_SET 24
#define AUDIT_SESSIONID 25 /* Session ID */
#define AUDIT_FSTYPE 26 /* FileSystem Type */
+#define AUDIT_CONTID 27 /* Container ID */

/* These are ONLY useful when checking
* at syscall exit time (AUDIT_AT_EXIT). */
@@ -349,6 +350,7 @@ enum {
#define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER 0x00000010
#define AUDIT_FEATURE_BITMAP_LOST_RESET 0x00000020
#define AUDIT_FEATURE_BITMAP_FILTER_FS 0x00000040
+#define AUDIT_FEATURE_BITMAP_CONTAINERID 0x00000080

#define AUDIT_FEATURE_BITMAP_ALL (AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT | \
AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME | \
@@ -356,7 +358,8 @@ enum {
AUDIT_FEATURE_BITMAP_EXCLUDE_EXTEND | \
AUDIT_FEATURE_BITMAP_SESSIONID_FILTER | \
AUDIT_FEATURE_BITMAP_LOST_RESET | \
- AUDIT_FEATURE_BITMAP_FILTER_FS)
+ AUDIT_FEATURE_BITMAP_FILTER_FS | \
+ AUDIT_FEATURE_BITMAP_CONTAINERID)

/* deprecated: AUDIT_VERSION_* */
#define AUDIT_VERSION_LATEST AUDIT_FEATURE_BITMAP_ALL
diff --git a/kernel/audit.h b/kernel/audit.h
index 1bba13bdffd0..c9b73abfd6a0 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -229,6 +229,7 @@ static inline int audit_hash_contid(u64 contid)

extern int audit_match_class(int class, unsigned syscall);
extern int audit_comparator(const u32 left, const u32 op, const u32 right);
+extern int audit_comparator64(const u64 left, const u32 op, const u64 right);
extern int audit_uid_comparator(kuid_t left, u32 op, kuid_t right);
extern int audit_gid_comparator(kgid_t left, u32 op, kgid_t right);
extern int parent_len(const char *path);
diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index b0126e9c0743..9606f973fe33 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -399,6 +399,7 @@ static int audit_field_valid(struct audit_entry *entry, struct audit_field *f)
case AUDIT_FILETYPE:
case AUDIT_FIELD_COMPARE:
case AUDIT_EXE:
+ case AUDIT_CONTID:
/* only equal and not equal valid ops */
if (f->op != Audit_not_equal && f->op != Audit_equal)
return -EINVAL;
@@ -586,6 +587,14 @@ static struct audit_entry *audit_data_to_entry(struct audit_rule_data *data,
}
entry->rule.exe = audit_mark;
break;
+ case AUDIT_CONTID:
+ if (f->val != sizeof(u64))
+ goto exit_free;
+ str = audit_unpack_string(&bufp, &remain, f->val);
+ if (IS_ERR(str))
+ goto exit_free;
+ f->val64 = ((u64 *)str)[0];
+ break;
}
}

@@ -668,6 +677,11 @@ static struct audit_rule_data *audit_krule_to_data(struct audit_krule *krule)
data->buflen += data->values[i] =
audit_pack_string(&bufp, audit_mark_path(krule->exe));
break;
+ case AUDIT_CONTID:
+ data->buflen += data->values[i] = sizeof(u64);
+ memcpy(bufp, &f->val64, sizeof(u64));
+ bufp += sizeof(u64);
+ break;
case AUDIT_LOGINUID_SET:
if (krule->pflags & AUDIT_LOGINUID_LEGACY && !f->val) {
data->fields[i] = AUDIT_LOGINUID;
@@ -754,6 +768,10 @@ static int audit_compare_rule(struct audit_krule *a, struct audit_krule *b)
if (!gid_eq(a->fields[i].gid, b->fields[i].gid))
return 1;
break;
+ case AUDIT_CONTID:
+ if (a->fields[i].val64 != b->fields[i].val64)
+ return 1;
+ break;
default:
if (a->fields[i].val != b->fields[i].val)
return 1;
@@ -1211,6 +1229,30 @@ int audit_comparator(u32 left, u32 op, u32 right)
}
}

+int audit_comparator64(u64 left, u32 op, u64 right)
+{
+ switch (op) {
+ case Audit_equal:
+ return (left == right);
+ case Audit_not_equal:
+ return (left != right);
+ case Audit_lt:
+ return (left < right);
+ case Audit_le:
+ return (left <= right);
+ case Audit_gt:
+ return (left > right);
+ case Audit_ge:
+ return (left >= right);
+ case Audit_bitmask:
+ return (left & right);
+ case Audit_bittest:
+ return ((left & right) == right);
+ default:
+ return 0;
+ }
+}
+
int audit_uid_comparator(kuid_t left, u32 op, kuid_t right)
{
switch (op) {
@@ -1345,6 +1387,10 @@ int audit_filter(int msgtype, unsigned int listtype)
result = audit_comparator(audit_loginuid_set(current),
f->op, f->val);
break;
+ case AUDIT_CONTID:
+ result = audit_comparator64(audit_get_contid(current),
+ f->op, f->val64);
+ break;
case AUDIT_MSGTYPE:
result = audit_comparator(msgtype, f->op, f->val);
break;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 3138c88887c7..a658fe775b86 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -629,6 +629,10 @@ static int audit_filter_rules(struct task_struct *tsk,
result = audit_comparator(ctx->sockaddr->ss_family,
f->op, f->val);
break;
+ case AUDIT_CONTID:
+ result = audit_comparator64(audit_get_contid(tsk),
+ f->op, f->val64);
+ break;
case AUDIT_SUBJ_USER:
case AUDIT_SUBJ_ROLE:
case AUDIT_SUBJ_TYPE:
--
1.8.3.1

2019-09-19 04:58:56

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 10/21] audit: add containerid support for user records

Add audit container identifier auxiliary record to user event standalone
records.

Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Ondrej Mosnacek <[email protected]>
---
kernel/audit.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index df3db29f5a8a..7cdb76b38966 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1140,12 +1140,6 @@ static void audit_log_common_recv_msg(struct audit_context *context,
audit_log_task_context(*ab);
}

-static inline void audit_log_user_recv_msg(struct audit_buffer **ab,
- u16 msg_type)
-{
- audit_log_common_recv_msg(NULL, ab, msg_type);
-}
-
int is_audit_feature_set(int i)
{
return af.features & AUDIT_FEATURE_TO_MASK(i);
@@ -1408,13 +1402,16 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)

err = audit_filter(msg_type, AUDIT_FILTER_USER);
if (err == 1) { /* match or error */
+ struct audit_context *context;
+
err = 0;
if (msg_type == AUDIT_USER_TTY) {
err = tty_audit_push();
if (err)
break;
}
- audit_log_user_recv_msg(&ab, msg_type);
+ context = audit_alloc_local(GFP_KERNEL);
+ audit_log_common_recv_msg(context, &ab, msg_type);
if (msg_type != AUDIT_USER_TTY)
audit_log_format(ab, " msg='%.*s'",
AUDIT_MESSAGE_TEXT_MAX,
@@ -1430,6 +1427,8 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
audit_log_n_untrustedstring(ab, data, size);
}
audit_log_end(ab);
+ audit_log_container_id(context, audit_get_contid(current));
+ audit_free_context(context);
}
break;
case AUDIT_ADD_RULE:
--
1.8.3.1

2019-09-19 05:10:55

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 06/21] audit: contid limit of 32k imposed to avoid DoS

Set an arbitrary limit on the number of audit container identifiers to
limit abuse.
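
A capped counter of this kind can be sketched in userspace as follows.
The names are hypothetical and the value is illustrative: 1 << 15 is
chosen here only to match the 32k in the subject line. The sketch
parenthesises the macro and tests with >= so that exactly CONTID_MAX
identifiers are admitted, which is an assumption about the intent and
not a copy of the patch's test.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative cap; parenthesised so it is safe inside larger expressions. */
#define CONTID_MAX (1 << 15)

static int contid_count;	/* number of live identifiers */

/* Account for one new identifier, or fail with -ENOSPC at the cap. */
int contid_count_get(void)
{
	if (contid_count >= CONTID_MAX)
		return -ENOSPC;
	contid_count++;
	return 0;
}

/* Drop one identifier when its last user goes away. */
void contid_count_put(void)
{
	assert(contid_count > 0);
	contid_count--;
}
```

In the kernel the counter would need the same locking as the list it
describes; the sketch is single-threaded.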

Signed-off-by: Richard Guy Briggs <[email protected]>
---
kernel/audit.c | 8 ++++++++
kernel/audit.h | 4 ++++
2 files changed, 12 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index 53d13d638c63..329916534dd2 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -139,6 +139,7 @@ struct audit_net {
struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
/* Hash for contid-based rules */
struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
+int audit_contid_count = 0;

static struct kmem_cache *audit_buffer_cache;

@@ -2384,6 +2385,7 @@ void audit_cont_put(struct audit_cont *cont)
put_task_struct(cont->owner);
list_del_rcu(&cont->list);
kfree_rcu(cont, rcu);
+ audit_contid_count--;
}
}

@@ -2456,6 +2458,11 @@ int audit_set_contid(struct task_struct *task, u64 contid)
goto conterror;
}
}
+ /* Set max contids */
+ if (audit_contid_count > AUDIT_CONTID_COUNT) {
+ rc = -ENOSPC;
+ goto conterror;
+ }
if (!newcont) {
newcont = kmalloc(sizeof(struct audit_cont), GFP_ATOMIC);
if (newcont) {
@@ -2465,6 +2472,7 @@ int audit_set_contid(struct task_struct *task, u64 contid)
newcont->owner = current;
refcount_set(&newcont->refcount, 1);
list_add_rcu(&newcont->list, &audit_contid_hash[h]);
+ audit_contid_count++;
} else {
rc = -ENOMEM;
goto conterror;
diff --git a/kernel/audit.h b/kernel/audit.h
index 162de8366b32..543f1334ba47 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -219,6 +219,10 @@ static inline int audit_hash_contid(u64 contid)
return (contid & (AUDIT_CONTID_BUCKETS-1));
}

+extern int audit_contid_count;
+
+#define AUDIT_CONTID_COUNT 1 << 16
+
/* Indicates that audit should log the full pathname. */
#define AUDIT_NAME_FULL -1

--
1.8.3.1

2019-09-19 05:10:55

by Richard Guy Briggs

Subject: [PATCH ghak90 V7 04/21] audit: convert to contid list to check for orch/engine ownership

Store the audit container identifier in a refcounted kernel object that
is added to the master list of audit container identifiers. This will
allow multiple container orchestrators/engines to work on the same
machine without danger of inadvertently re-using an existing identifier.
It will also allow an orchestrator to inject a process into an existing
container by checking if the original container owner is the one
injecting the task. A hash table list is used to optimize searches.
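
The ownership rule can be illustrated with a single-threaded userspace
sketch. Everything here is a stand-in: struct cont, cont_get() and
cont_put() are hypothetical names, the owner is an opaque integer and
not a task_struct, and the spinlock and RCU handling of the real patch
are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define CONTID_BUCKETS 32	/* power of two, as in the patch */

struct cont {
	struct cont *next;
	uint64_t id;
	int owner;		/* opaque token for the creating orchestrator */
	unsigned int refcount;
};

static struct cont *contid_hash[CONTID_BUCKETS];

static unsigned int contid_bucket(uint64_t id)
{
	return (unsigned int)(id & (CONTID_BUCKETS - 1));
}

/*
 * Attach to container id on behalf of owner. An existing id can only be
 * joined by the owner that created it; anyone else gets -ENOTUNIQ.
 */
int cont_get(uint64_t id, int owner, struct cont **out)
{
	unsigned int h = contid_bucket(id);
	struct cont *c;

	for (c = contid_hash[h]; c; c = c->next) {
		if (c->id != id)
			continue;
		if (c->owner != owner)
			return -ENOTUNIQ;
		c->refcount++;		/* task injection into existing container */
		*out = c;
		return 0;
	}
	c = malloc(sizeof(*c));
	if (!c)
		return -ENOMEM;
	c->id = id;
	c->owner = owner;
	c->refcount = 1;
	c->next = contid_hash[h];
	contid_hash[h] = c;
	*out = c;
	return 0;
}

/* Drop one reference; unlink and free the entry when the last user is gone. */
void cont_put(struct cont *c)
{
	struct cont **pp;

	if (!c)
		return;
	assert(c->refcount > 0);
	if (--c->refcount)
		return;
	for (pp = &contid_hash[contid_bucket(c->id)]; *pp; pp = &(*pp)->next) {
		if (*pp == c) {
			*pp = c->next;
			break;
		}
	}
	free(c);
}
```

Once the last reference is dropped the identifier leaves the table, so
a different owner may then create it afresh.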

Signed-off-by: Richard Guy Briggs <[email protected]>
---
include/linux/audit.h | 26 ++++++++++++++--
kernel/audit.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++---
kernel/audit.h | 8 +++++
3 files changed, 112 insertions(+), 8 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index f2e3b81f2942..e317807cdd3e 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -95,10 +95,18 @@ struct audit_ntp_data {
struct audit_ntp_data {};
#endif

+struct audit_cont {
+ struct list_head list;
+ u64 id;
+ struct task_struct *owner;
+ refcount_t refcount;
+ struct rcu_head rcu;
+};
+
struct audit_task_info {
kuid_t loginuid;
unsigned int sessionid;
- u64 contid;
+ struct audit_cont *cont;
#ifdef CONFIG_AUDITSYSCALL
struct audit_context *ctx;
#endif
@@ -203,11 +211,15 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)

static inline u64 audit_get_contid(struct task_struct *tsk)
{
- if (!tsk->audit)
+ if (!tsk->audit || !tsk->audit->cont)
return AUDIT_CID_UNSET;
- return tsk->audit->contid;
+ return tsk->audit->cont->id;
}

+extern struct audit_cont *audit_cont(struct task_struct *tsk);
+
+extern void audit_cont_put(struct audit_cont *cont);
+
extern u32 audit_enabled;

extern int audit_signal_info(int sig, struct task_struct *t);
@@ -277,6 +289,14 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
return AUDIT_CID_UNSET;
}

+static inline struct audit_cont *audit_cont(struct task_struct *tsk)
+{
+ return NULL;
+}
+
+static inline void audit_cont_put(struct audit_cont *cont)
+{ }
+
#define audit_enabled AUDIT_OFF

static inline int audit_signal_info(int sig, struct task_struct *t)
diff --git a/kernel/audit.c b/kernel/audit.c
index a36ea57cbb61..ea0899130cc1 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -137,6 +137,8 @@ struct audit_net {

/* Hash for inode-based rules */
struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
+/* Hash for contid-based rules */
+struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];

static struct kmem_cache *audit_buffer_cache;

@@ -204,6 +206,8 @@ struct audit_reply {

static struct kmem_cache *audit_task_cache;

+static DEFINE_SPINLOCK(audit_contid_list_lock);
+
void __init audit_task_init(void)
{
audit_task_cache = kmem_cache_create("audit_task",
@@ -231,7 +235,9 @@ int audit_alloc(struct task_struct *tsk)
}
info->loginuid = audit_get_loginuid(current);
info->sessionid = audit_get_sessionid(current);
- info->contid = audit_get_contid(current);
+ info->cont = audit_cont(current);
+ if (info->cont)
+ refcount_inc(&info->cont->refcount);
tsk->audit = info;

ret = audit_alloc_syscall(tsk);
@@ -246,7 +252,7 @@ int audit_alloc(struct task_struct *tsk)
struct audit_task_info init_struct_audit = {
.loginuid = INVALID_UID,
.sessionid = AUDIT_SID_UNSET,
- .contid = AUDIT_CID_UNSET,
+ .cont = NULL,
#ifdef CONFIG_AUDITSYSCALL
.ctx = NULL,
#endif
@@ -266,6 +272,9 @@ void audit_free(struct task_struct *tsk)
/* Freeing the audit_task_info struct must be performed after
* audit_log_exit() due to need for loginuid and sessionid.
*/
+ spin_lock(&audit_contid_list_lock);
+ audit_cont_put(tsk->audit->cont);
+ spin_unlock(&audit_contid_list_lock);
info = tsk->audit;
tsk->audit = NULL;
kmem_cache_free(audit_task_cache, info);
@@ -1657,6 +1666,9 @@ static int __init audit_init(void)
for (i = 0; i < AUDIT_INODE_BUCKETS; i++)
INIT_LIST_HEAD(&audit_inode_hash[i]);

+ for (i = 0; i < AUDIT_CONTID_BUCKETS; i++)
+ INIT_LIST_HEAD(&audit_contid_hash[i]);
+
mutex_init(&audit_cmd_mutex.lock);
audit_cmd_mutex.owner = NULL;

@@ -2356,6 +2368,32 @@ int audit_signal_info(int sig, struct task_struct *t)
return audit_signal_info_syscall(t);
}

+struct audit_cont *audit_cont(struct task_struct *tsk)
+{
+ if (!tsk->audit || !tsk->audit->cont)
+ return NULL;
+ return tsk->audit->cont;
+}
+
+/* audit_contid_list_lock must be held by caller */
+void audit_cont_put(struct audit_cont *cont)
+{
+ if (!cont)
+ return;
+ if (refcount_dec_and_test(&cont->refcount)) {
+ put_task_struct(cont->owner);
+ list_del_rcu(&cont->list);
+ kfree_rcu(cont, rcu);
+ }
+}
+
+static struct task_struct *audit_cont_owner(struct task_struct *tsk)
+{
+ if (tsk->audit && tsk->audit->cont)
+ return tsk->audit->cont->owner;
+ return NULL;
+}
+
/*
* audit_set_contid - set current task's audit contid
* @task: target task
@@ -2382,9 +2420,12 @@ int audit_set_contid(struct task_struct *task, u64 contid)
}
oldcontid = audit_get_contid(task);
read_lock(&tasklist_lock);
- /* Don't allow the audit containerid to be unset */
+ /* Don't allow the contid to be unset */
if (!audit_contid_valid(contid))
rc = -EINVAL;
+ /* Don't allow the contid to be set to the same value again */
+ else if (contid == oldcontid)
+ rc = -EADDRINUSE;
/* if we don't have caps, reject */
else if (!capable(CAP_AUDIT_CONTROL))
rc = -EPERM;
@@ -2397,8 +2438,43 @@ int audit_set_contid(struct task_struct *task, u64 contid)
else if (audit_contid_set(task))
rc = -ECHILD;
read_unlock(&tasklist_lock);
- if (!rc)
- task->audit->contid = contid;
+ if (!rc) {
+ struct audit_cont *oldcont = audit_cont(task);
+ struct audit_cont *cont = NULL;
+ struct audit_cont *newcont = NULL;
+ int h = audit_hash_contid(contid);
+
+ spin_lock(&audit_contid_list_lock);
+ list_for_each_entry_rcu(cont, &audit_contid_hash[h], list)
+ if (cont->id == contid) {
+ /* task injection to existing container */
+ if (current == cont->owner) {
+ refcount_inc(&cont->refcount);
+ newcont = cont;
+ } else {
+ rc = -ENOTUNIQ;
+ goto conterror;
+ }
+ }
+ if (!newcont) {
+ newcont = kmalloc(sizeof(struct audit_cont), GFP_ATOMIC);
+ if (newcont) {
+ INIT_LIST_HEAD(&newcont->list);
+ newcont->id = contid;
+ get_task_struct(current);
+ newcont->owner = current;
+ refcount_set(&newcont->refcount, 1);
+ list_add_rcu(&newcont->list, &audit_contid_hash[h]);
+ } else {
+ rc = -ENOMEM;
+ goto conterror;
+ }
+ }
+ task->audit->cont = newcont;
+ audit_cont_put(oldcont);
+conterror:
+ spin_unlock(&audit_contid_list_lock);
+ }
task_unlock(task);

if (!audit_enabled)
diff --git a/kernel/audit.h b/kernel/audit.h
index 16bd03b88e0d..e4a31aa92dfe 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -211,6 +211,14 @@ static inline int audit_hash_ino(u32 ino)
return (ino & (AUDIT_INODE_BUCKETS-1));
}

+#define AUDIT_CONTID_BUCKETS 32
+extern struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
+
+static inline int audit_hash_contid(u64 contid)
+{
+ return (contid & (AUDIT_CONTID_BUCKETS-1));
+}
+
/* Indicates that audit should log the full pathname. */
#define AUDIT_NAME_FULL -1

--
1.8.3.1

2019-09-26 14:49:11

by Neil Horman

Subject: Re: [PATCH ghak90 V7 04/21] audit: convert to contid list to check for orch/engine ownership

On Wed, Sep 18, 2019 at 09:22:21PM -0400, Richard Guy Briggs wrote:
> Store the audit container identifier in a refcounted kernel object that
> is added to the master list of audit container identifiers. This will
> allow multiple container orchestrators/engines to work on the same
> machine without danger of inadvertently re-using an existing identifier.
> It will also allow an orchestrator to inject a process into an existing
> container by checking if the original container owner is the one
> injecting the task. A hash table list is used to optimize searches.
>
> Signed-off-by: Richard Guy Briggs <[email protected]>
> ---
> include/linux/audit.h | 26 ++++++++++++++--
> kernel/audit.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++---
> kernel/audit.h | 8 +++++
> 3 files changed, 112 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index f2e3b81f2942..e317807cdd3e 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -95,10 +95,18 @@ struct audit_ntp_data {
> struct audit_ntp_data {};
> #endif
>
> +struct audit_cont {
> + struct list_head list;
> + u64 id;
> + struct task_struct *owner;
> + refcount_t refcount;
> + struct rcu_head rcu;
> +};
> +
> struct audit_task_info {
> kuid_t loginuid;
> unsigned int sessionid;
> - u64 contid;
> + struct audit_cont *cont;
> #ifdef CONFIG_AUDITSYSCALL
> struct audit_context *ctx;
> #endif
> @@ -203,11 +211,15 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
>
> static inline u64 audit_get_contid(struct task_struct *tsk)
> {
> - if (!tsk->audit)
> + if (!tsk->audit || !tsk->audit->cont)
> return AUDIT_CID_UNSET;
> - return tsk->audit->contid;
> + return tsk->audit->cont->id;
> }
>
> +extern struct audit_cont *audit_cont(struct task_struct *tsk);
> +
> +extern void audit_cont_put(struct audit_cont *cont);
> +
I see that you manually increment this refcount at various call sites;
why is there no corresponding audit_contid_hold function?

Neil
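
A hold/put pair of the kind being asked about might look like the
userspace sketch below. It is only an illustration: struct cont_ref,
cont_hold() and cont_release() are hypothetical names, a plain counter
stands in for refcount_t, and in the kernel the hold side would
presumably wrap refcount_inc().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct audit_cont; only the count matters here. */
struct cont_ref {
	unsigned int refcount;
};

/* Take a reference; NULL is tolerated so callers need not check first. */
struct cont_ref *cont_hold(struct cont_ref *cont)
{
	if (cont) {
		assert(cont->refcount > 0);	/* object must already be live */
		cont->refcount++;
	}
	return cont;
}

/* Drop a reference; returns 1 when the caller released the last one. */
int cont_release(struct cont_ref *cont)
{
	if (!cont)
		return 0;
	assert(cont->refcount > 0);
	return --cont->refcount == 0;
}
```

With such a helper, a call site like the one in audit_alloc() could
reduce to a single assignment from the hold function's return value.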

> extern u32 audit_enabled;
>
> extern int audit_signal_info(int sig, struct task_struct *t);
> @@ -277,6 +289,14 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
> return AUDIT_CID_UNSET;
> }
>
> +static inline struct audit_cont *audit_cont(struct task_struct *tsk)
> +{
> + return NULL;
> +}
> +
> +static inline void audit_cont_put(struct audit_cont *cont)
> +{ }
> +
> #define audit_enabled AUDIT_OFF
>
> static inline int audit_signal_info(int sig, struct task_struct *t)
> diff --git a/kernel/audit.c b/kernel/audit.c
> index a36ea57cbb61..ea0899130cc1 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -137,6 +137,8 @@ struct audit_net {
>
> /* Hash for inode-based rules */
> struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
> +/* Hash for contid-based rules */
> +struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
>
> static struct kmem_cache *audit_buffer_cache;
>
> @@ -204,6 +206,8 @@ struct audit_reply {
>
> static struct kmem_cache *audit_task_cache;
>
> +static DEFINE_SPINLOCK(audit_contid_list_lock);
> +
> void __init audit_task_init(void)
> {
> audit_task_cache = kmem_cache_create("audit_task",
> @@ -231,7 +235,9 @@ int audit_alloc(struct task_struct *tsk)
> }
> info->loginuid = audit_get_loginuid(current);
> info->sessionid = audit_get_sessionid(current);
> - info->contid = audit_get_contid(current);
> + info->cont = audit_cont(current);
> + if (info->cont)
> + refcount_inc(&info->cont->refcount);
> tsk->audit = info;
>
> ret = audit_alloc_syscall(tsk);
> @@ -246,7 +252,7 @@ int audit_alloc(struct task_struct *tsk)
> struct audit_task_info init_struct_audit = {
> .loginuid = INVALID_UID,
> .sessionid = AUDIT_SID_UNSET,
> - .contid = AUDIT_CID_UNSET,
> + .cont = NULL,
> #ifdef CONFIG_AUDITSYSCALL
> .ctx = NULL,
> #endif
> @@ -266,6 +272,9 @@ void audit_free(struct task_struct *tsk)
> /* Freeing the audit_task_info struct must be performed after
> * audit_log_exit() due to need for loginuid and sessionid.
> */
> + spin_lock(&audit_contid_list_lock);
> + audit_cont_put(tsk->audit->cont);
> + spin_unlock(&audit_contid_list_lock);
> info = tsk->audit;
> tsk->audit = NULL;
> kmem_cache_free(audit_task_cache, info);
> @@ -1657,6 +1666,9 @@ static int __init audit_init(void)
> for (i = 0; i < AUDIT_INODE_BUCKETS; i++)
> INIT_LIST_HEAD(&audit_inode_hash[i]);
>
> + for (i = 0; i < AUDIT_CONTID_BUCKETS; i++)
> + INIT_LIST_HEAD(&audit_contid_hash[i]);
> +
> mutex_init(&audit_cmd_mutex.lock);
> audit_cmd_mutex.owner = NULL;
>
> @@ -2356,6 +2368,32 @@ int audit_signal_info(int sig, struct task_struct *t)
> return audit_signal_info_syscall(t);
> }
>
> +struct audit_cont *audit_cont(struct task_struct *tsk)
> +{
> + if (!tsk->audit || !tsk->audit->cont)
> + return NULL;
> + return tsk->audit->cont;
> +}
> +
> +/* audit_contid_list_lock must be held by caller */
> +void audit_cont_put(struct audit_cont *cont)
> +{
> + if (!cont)
> + return;
> + if (refcount_dec_and_test(&cont->refcount)) {
> + put_task_struct(cont->owner);
> + list_del_rcu(&cont->list);
> + kfree_rcu(cont, rcu);
> + }
> +}
> +
> +static struct task_struct *audit_cont_owner(struct task_struct *tsk)
> +{
> + if (tsk->audit && tsk->audit->cont)
> + return tsk->audit->cont->owner;
> + return NULL;
> +}
> +
> /*
> * audit_set_contid - set current task's audit contid
> * @task: target task
> @@ -2382,9 +2420,12 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> }
> oldcontid = audit_get_contid(task);
> read_lock(&tasklist_lock);
> - /* Don't allow the audit containerid to be unset */
> + /* Don't allow the contid to be unset */
> if (!audit_contid_valid(contid))
> rc = -EINVAL;
> + /* Don't allow the contid to be set to the same value again */
> + else if (contid == oldcontid) {
> + rc = -EADDRINUSE;
> /* if we don't have caps, reject */
> else if (!capable(CAP_AUDIT_CONTROL))
> rc = -EPERM;
> @@ -2397,8 +2438,43 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> else if (audit_contid_set(task))
> rc = -ECHILD;
> read_unlock(&tasklist_lock);
> - if (!rc)
> - task->audit->contid = contid;
> + if (!rc) {
> + struct audit_cont *oldcont = audit_cont(task);
> + struct audit_cont *cont = NULL;
> + struct audit_cont *newcont = NULL;
> + int h = audit_hash_contid(contid);
> +
> + spin_lock(&audit_contid_list_lock);
> + list_for_each_entry_rcu(cont, &audit_contid_hash[h], list)
> + if (cont->id == contid) {
> + /* task injection to existing container */
> + if (current == cont->owner) {
> + refcount_inc(&cont->refcount);
> + newcont = cont;
> + } else {
> + rc = -ENOTUNIQ;
> + goto conterror;
> + }
> + }
> + if (!newcont) {
> + newcont = kmalloc(sizeof(struct audit_cont), GFP_ATOMIC);
> + if (newcont) {
> + INIT_LIST_HEAD(&newcont->list);
> + newcont->id = contid;
> + get_task_struct(current);
> + newcont->owner = current;
> + refcount_set(&newcont->refcount, 1);
> + list_add_rcu(&newcont->list, &audit_contid_hash[h]);
> + } else {
> + rc = -ENOMEM;
> + goto conterror;
> + }
> + }
> + task->audit->cont = newcont;
> + audit_cont_put(oldcont);
> +conterror:
> + spin_unlock(&audit_contid_list_lock);
> + }
> task_unlock(task);
>
> if (!audit_enabled)
> diff --git a/kernel/audit.h b/kernel/audit.h
> index 16bd03b88e0d..e4a31aa92dfe 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -211,6 +211,14 @@ static inline int audit_hash_ino(u32 ino)
> return (ino & (AUDIT_INODE_BUCKETS-1));
> }
>
> +#define AUDIT_CONTID_BUCKETS 32
> +extern struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
> +
> +static inline int audit_hash_contid(u64 contid)
> +{
> + return (contid & (AUDIT_CONTID_BUCKETS-1));
> +}
> +
> /* Indicates that audit should log the full pathname. */
> #define AUDIT_NAME_FULL -1
>
> --
> 1.8.3.1
>
>

2019-09-27 12:52:36

by Neil Horman

Subject: Re: [PATCH ghak90 V7 06/21] audit: contid limit of 32k imposed to avoid DoS

On Wed, Sep 18, 2019 at 09:22:23PM -0400, Richard Guy Briggs wrote:
> Set an arbitrary limit on the number of audit container identifiers to
> limit abuse.
>
> Signed-off-by: Richard Guy Briggs <[email protected]>
> ---
> kernel/audit.c | 8 ++++++++
> kernel/audit.h | 4 ++++
> 2 files changed, 12 insertions(+)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 53d13d638c63..329916534dd2 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -139,6 +139,7 @@ struct audit_net {
> struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
> /* Hash for contid-based rules */
> struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
> +int audit_contid_count = 0;
>
> static struct kmem_cache *audit_buffer_cache;
>
> @@ -2384,6 +2385,7 @@ void audit_cont_put(struct audit_cont *cont)
> put_task_struct(cont->owner);
> list_del_rcu(&cont->list);
> kfree_rcu(cont, rcu);
> + audit_contid_count--;
> }
> }
>
> @@ -2456,6 +2458,11 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> goto conterror;
> }
> }
> + /* Set max contids */
> + if (audit_contid_count > AUDIT_CONTID_COUNT) {
> + rc = -ENOSPC;
> + goto conterror;
> + }
You should check for audit_contid_count == AUDIT_CONTID_COUNT here, no?
Or at least >=, since you increment it below. Otherwise it's possible
that you will exceed the limit by one in the full condition.
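
For illustration only, the off-by-one can be reproduced in a small
userspace model. This is a sketch, not kernel code: CONTID_LIMIT,
may_add_posted(), may_add_suggested() and fill() are hypothetical names
that stand in for AUDIT_CONTID_COUNT and the check in audit_set_contid().

```c
#include <stdbool.h>

/* Hypothetical stand-in for AUDIT_CONTID_COUNT (parenthesised here). */
#define CONTID_LIMIT (1 << 16)

/* Check as posted: refuses only once the count is already past the limit. */
static bool may_add_posted(int count)
{
	return !(count > CONTID_LIMIT);
}

/* Check as suggested in review: refuses as soon as the limit is reached. */
static bool may_add_suggested(int count)
{
	return !(count >= CONTID_LIMIT);
}

/* Keep adding entries until the check refuses; return how many got in. */
static int fill(bool (*may_add)(int))
{
	int count = 0;

	while (may_add(count))
		count++;
	return count;
}
```

With the ">" form the model admits one entry more than the limit; with
">=" it stops exactly at the limit.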

> if (!newcont) {
> newcont = kmalloc(sizeof(struct audit_cont), GFP_ATOMIC);
> if (newcont) {
> @@ -2465,6 +2472,7 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> newcont->owner = current;
> refcount_set(&newcont->refcount, 1);
> list_add_rcu(&newcont->list, &audit_contid_hash[h]);
> + audit_contid_count++;
> } else {
> rc = -ENOMEM;
> goto conterror;
> diff --git a/kernel/audit.h b/kernel/audit.h
> index 162de8366b32..543f1334ba47 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -219,6 +219,10 @@ static inline int audit_hash_contid(u64 contid)
> return (contid & (AUDIT_CONTID_BUCKETS-1));
> }
>
> +extern int audit_contid_count;
> +
> +#define AUDIT_CONTID_COUNT 1 << 16
> +
Just to ask the question, since it wasn't clear in the changelog, what
abuse are you avoiding here? Ostensibly you should be able to create as
many container ids as you have space for, and the simple creation of
container ids doesn't seem like the resource strain I would be concerned
about here, given that an orchestrator can still create as many
containers as the system will otherwise allow, which will consume
significantly more ram/disk/etc.

> /* Indicates that audit should log the full pathname. */
> #define AUDIT_NAME_FULL -1
>
> --
> 1.8.3.1
>
>

2019-10-11 00:39:28

by Paul Moore

Subject: Re: [PATCH ghak90 V7 04/21] audit: convert to contid list to check for orch/engine ownership

On Wed, Sep 18, 2019 at 9:24 PM Richard Guy Briggs <[email protected]> wrote:
> Store the audit container identifier in a refcounted kernel object that
> is added to the master list of audit container identifiers. This will
> allow multiple container orchestrators/engines to work on the same
> machine without danger of inadvertently re-using an existing identifier.
> It will also allow an orchestrator to inject a process into an existing
> container by checking if the original container owner is the one
> injecting the task. A hash table list is used to optimize searches.
>
> Signed-off-by: Richard Guy Briggs <[email protected]>
> ---
> include/linux/audit.h | 26 ++++++++++++++--
> kernel/audit.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++---
> kernel/audit.h | 8 +++++
> 3 files changed, 112 insertions(+), 8 deletions(-)

One general comment before we go off into the weeds on this ... I can
understand why you wanted to keep this patch separate from the earlier
patches, but as we get closer to having mergeable code this should get
folded into the previous patches. For example, there shouldn't be a
change in audit_task_info where you change the contid field from a u64
to struct pointer, it should be a struct pointer from the start.

It's also disappointing that idr appears to only be for 32-bit ID
values, if we had a 64-bit idr I think we could simplify this greatly.

> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index f2e3b81f2942..e317807cdd3e 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -95,10 +95,18 @@ struct audit_ntp_data {
> struct audit_ntp_data {};
> #endif
>
> +struct audit_cont {
> + struct list_head list;
> + u64 id;
> + struct task_struct *owner;
> + refcount_t refcount;
> + struct rcu_head rcu;
> +};

It seems as though in most of the code you are using "contid"; any
reason why you didn't stick with that naming scheme here, e.g. "struct
audit_contid"?

> struct audit_task_info {
> kuid_t loginuid;
> unsigned int sessionid;
> - u64 contid;
> + struct audit_cont *cont;

Same, why not stick with "contid"?

> #ifdef CONFIG_AUDITSYSCALL
> struct audit_context *ctx;
> #endif
> @@ -203,11 +211,15 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
>
> static inline u64 audit_get_contid(struct task_struct *tsk)
> {
> - if (!tsk->audit)
> + if (!tsk->audit || !tsk->audit->cont)
> return AUDIT_CID_UNSET;
> - return tsk->audit->contid;
> + return tsk->audit->cont->id;
> }

Assuming for a moment that we implement an audit_contid_get() (see
Neil's comment as well as mine below), we probably need to name this
something different so we don't all lose our minds when we read this
code. On the plus side we can probably preface it with an underscore
since it is a static, in which case _audit_contid_get() might be okay,
but I'm open to suggestions.

> +extern struct audit_cont *audit_cont(struct task_struct *tsk);
> +
> +extern void audit_cont_put(struct audit_cont *cont);

More of the "contid" vs "cont".

> extern u32 audit_enabled;
>
> extern int audit_signal_info(int sig, struct task_struct *t);
> @@ -277,6 +289,14 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
> return AUDIT_CID_UNSET;
> }
>
> +static inline struct audit_cont *audit_cont(struct task_struct *tsk)
> +{
> + return NULL;
> +}
> +
> +static inline void audit_cont_put(struct audit_cont *cont)
> +{ }
> +
> #define audit_enabled AUDIT_OFF
>
> static inline int audit_signal_info(int sig, struct task_struct *t)
> diff --git a/kernel/audit.c b/kernel/audit.c
> index a36ea57cbb61..ea0899130cc1 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -137,6 +137,8 @@ struct audit_net {
>
> /* Hash for inode-based rules */
> struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
> +/* Hash for contid-based rules */
> +struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
>
> static struct kmem_cache *audit_buffer_cache;
>
> @@ -204,6 +206,8 @@ struct audit_reply {
>
> static struct kmem_cache *audit_task_cache;
>
> +static DEFINE_SPINLOCK(audit_contid_list_lock);

Since it looks like this protects audit_contid_hash, I think it
would be better to move it up underneath audit_contid_hash.

> void __init audit_task_init(void)
> {
> audit_task_cache = kmem_cache_create("audit_task",
> @@ -231,7 +235,9 @@ int audit_alloc(struct task_struct *tsk)
> }
> info->loginuid = audit_get_loginuid(current);
> info->sessionid = audit_get_sessionid(current);
> - info->contid = audit_get_contid(current);
> + info->cont = audit_cont(current);
> + if (info->cont)
> + refcount_inc(&info->cont->refcount);

See the other comments about a "get" function, but I think we need an
RCU read lock around the above, no?

> tsk->audit = info;
>
> ret = audit_alloc_syscall(tsk);
> @@ -246,7 +252,7 @@ int audit_alloc(struct task_struct *tsk)
> struct audit_task_info init_struct_audit = {
> .loginuid = INVALID_UID,
> .sessionid = AUDIT_SID_UNSET,
> - .contid = AUDIT_CID_UNSET,
> + .cont = NULL,

More "cont" vs "contid".

> #ifdef CONFIG_AUDITSYSCALL
> .ctx = NULL,
> #endif
> @@ -266,6 +272,9 @@ void audit_free(struct task_struct *tsk)
> /* Freeing the audit_task_info struct must be performed after
> * audit_log_exit() due to need for loginuid and sessionid.
> */
> + spin_lock(&audit_contid_list_lock);
> + audit_cont_put(tsk->audit->cont);
> + spin_unlock(&audit_contid_list_lock);

Perhaps this will make sense as I get further into the patchset, but
why not move the spin lock operations into audit_[cont/contid]_put()?

> info = tsk->audit;
> tsk->audit = NULL;
> kmem_cache_free(audit_task_cache, info);
> @@ -1657,6 +1666,9 @@ static int __init audit_init(void)
> for (i = 0; i < AUDIT_INODE_BUCKETS; i++)
> INIT_LIST_HEAD(&audit_inode_hash[i]);
>
> + for (i = 0; i < AUDIT_CONTID_BUCKETS; i++)
> + INIT_LIST_HEAD(&audit_contid_hash[i]);
> +
> mutex_init(&audit_cmd_mutex.lock);
> audit_cmd_mutex.owner = NULL;
>
> @@ -2356,6 +2368,32 @@ int audit_signal_info(int sig, struct task_struct *t)
> return audit_signal_info_syscall(t);
> }
>
> +struct audit_cont *audit_cont(struct task_struct *tsk)
> +{
> + if (!tsk->audit || !tsk->audit->cont)
> + return NULL;
> + return tsk->audit->cont;
> +}
> +
> +/* audit_contid_list_lock must be held by caller */
> +void audit_cont_put(struct audit_cont *cont)
> +{
> + if (!cont)
> + return;
> + if (refcount_dec_and_test(&cont->refcount)) {
> + put_task_struct(cont->owner);
> + list_del_rcu(&cont->list);
> + kfree_rcu(cont, rcu);
> + }
> +}

I tend to agree with Neil's previous comment; if we've got an
audit_[cont/contid]_put(), why not an audit_[cont/contid]_get()?
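
As a rough illustration of the paired helpers being asked for, here is
a userspace model using C11 atomics. It is only a sketch under stated
assumptions: struct cont_model, cont_get() and cont_put() are
hypothetical names, the "released" flag stands in for the unlink and
kfree_rcu() step, and the kernel version would use refcount_t and take
whatever locking the list requires.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct audit_cont; the kernel uses refcount_t. */
struct cont_model {
	unsigned long long id;
	atomic_int refcount;
	bool released;	/* set where the kernel would unlink and free */
};

/* Hypothetical "get" helper: NULL-safe, returns its argument. */
static struct cont_model *cont_get(struct cont_model *cont)
{
	if (cont)
		atomic_fetch_add(&cont->refcount, 1);
	return cont;
}

/* Matching "put": drop one reference, release on the last one. */
static void cont_put(struct cont_model *cont)
{
	if (!cont)
		return;
	/* atomic_fetch_sub() returns the value held before the decrement. */
	if (atomic_fetch_sub(&cont->refcount, 1) == 1)
		cont->released = true;
}
```

With a helper of this shape, a call site that currently assigns the
pointer and then increments the refcount by hand could collapse into a
single assignment from the "get" call.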

> +static struct task_struct *audit_cont_owner(struct task_struct *tsk)
> +{
> + if (tsk->audit && tsk->audit->cont)
> + return tsk->audit->cont->owner;
> + return NULL;
> +}

I'm not sure if this is possible (I haven't made my way through the
entire patchset) and the function above isn't used in this patch (why
is it here?), but it seems like it would be safer to convert this into
an audit_contid_isowner() function that simply returns 1/0 depending
on whether the passed task_struct is the owner of a passed audit
container ID value?
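
A minimal sketch of what such a predicate could look like, modelled in
userspace. The types and the name cont_isowner() are hypothetical; the
sketch takes the container object rather than a raw ID value, so a
real version keyed on the ID would also need a lookup under the
appropriate lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct task_struct. */
struct task_model {
	int pid;
};

/* Userspace stand-in for struct audit_cont. */
struct cont_model {
	unsigned long long id;
	const struct task_model *owner;
};

/*
 * Hypothetical predicate: answers "is tsk the owner?" without ever
 * handing the owner pointer back to the caller.
 */
static bool cont_isowner(const struct cont_model *cont,
			 const struct task_model *tsk)
{
	return cont && tsk && cont->owner == tsk;
}
```

The point of the yes/no shape is that callers never hold a task
pointer whose lifetime they would then have to reason about.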

> /*
> * audit_set_contid - set current task's audit contid
> * @task: target task
> @@ -2382,9 +2420,12 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> }
> oldcontid = audit_get_contid(task);
> read_lock(&tasklist_lock);
> - /* Don't allow the audit containerid to be unset */
> + /* Don't allow the contid to be unset */
> if (!audit_contid_valid(contid))
> rc = -EINVAL;
> + /* Don't allow the contid to be set to the same value again */
> + else if (contid == oldcontid) {
> + rc = -EADDRINUSE;
> /* if we don't have caps, reject */
> else if (!capable(CAP_AUDIT_CONTROL))
> rc = -EPERM;

RCU read lock? It's a bit dicey since I believe the tasklist_lock is
going to provide us the safety we need, but if we are going to claim
that the audit container ID list is protected by RCU we should
probably use it.

> @@ -2397,8 +2438,43 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> else if (audit_contid_set(task))
> rc = -ECHILD;
> read_unlock(&tasklist_lock);
> - if (!rc)
> - task->audit->contid = contid;
> + if (!rc) {
> + struct audit_cont *oldcont = audit_cont(task);

Previously we held the tasklist_lock to protect the audit container ID
associated with the struct, should we still be holding it here?

Regardless, I worry that the lock dependencies between the
tasklist_lock and the audit_contid_list_lock are going to be tricky.
It might be nice to document the relationship in a comment up near
where you declare audit_contid_list_lock.

> + struct audit_cont *cont = NULL;
> + struct audit_cont *newcont = NULL;
> + int h = audit_hash_contid(contid);
> +
> + spin_lock(&audit_contid_list_lock);
> + list_for_each_entry_rcu(cont, &audit_contid_hash[h], list)
> + if (cont->id == contid) {
> + /* task injection to existing container */
> + if (current == cont->owner) {

I understand the desire to limit a given audit container ID to the
orchestrator that created it, but are we certain that we can track
audit container ID "ownership" via a single instance of a task_struct?
What happens when the orchestrator stops/restarts/crashes? Do we
even care?

> + refcount_inc(&cont->refcount);
> + newcont = cont;

We can bail out of the loop here, yes?

> + } else {
> + rc = -ENOTUNIQ;
> + goto conterror;
> + }
> + }
> + if (!newcont) {
> + newcont = kmalloc(sizeof(struct audit_cont), GFP_ATOMIC);
> + if (newcont) {
> + INIT_LIST_HEAD(&newcont->list);
> + newcont->id = contid;
> + get_task_struct(current);
> + newcont->owner = current;
> + refcount_set(&newcont->refcount, 1);
> + list_add_rcu(&newcont->list, &audit_contid_hash[h]);
> + } else {
> + rc = -ENOMEM;
> + goto conterror;
> + }
> + }
> + task->audit->cont = newcont;
> + audit_cont_put(oldcont);
> +conterror:
> + spin_unlock(&audit_contid_list_lock);
> + }
> task_unlock(task);
>
> if (!audit_enabled)
> diff --git a/kernel/audit.h b/kernel/audit.h
> index 16bd03b88e0d..e4a31aa92dfe 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -211,6 +211,14 @@ static inline int audit_hash_ino(u32 ino)
> return (ino & (AUDIT_INODE_BUCKETS-1));
> }
>
> +#define AUDIT_CONTID_BUCKETS 32
> +extern struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
> +
> +static inline int audit_hash_contid(u64 contid)
> +{
> + return (contid & (AUDIT_CONTID_BUCKETS-1));
> +}
> +
> /* Indicates that audit should log the full pathname. */
> #define AUDIT_NAME_FULL -1
>

--
paul moore
http://www.paul-moore.com

2019-10-11 00:40:18

by Paul Moore

Subject: Re: [PATCH ghak90 V7 05/21] audit: log drop of contid on exit of last task

On Wed, Sep 18, 2019 at 9:24 PM Richard Guy Briggs <[email protected]> wrote:
> Since we are tracking the life of each audit container identifier, we
> can match the creation event with the destruction event. Log the
> destruction of the audit container identifier when the last process in
> that container exits.
>
> Signed-off-by: Richard Guy Briggs <[email protected]>
> ---
> kernel/audit.c | 32 ++++++++++++++++++++++++++++++++
> kernel/audit.h | 2 ++
> kernel/auditsc.c | 2 ++
> 3 files changed, 36 insertions(+)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index ea0899130cc1..53d13d638c63 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -2503,6 +2503,38 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> return rc;
> }
>
> +void audit_log_container_drop(void)
> +{
> + struct audit_buffer *ab;
> + uid_t uid;
> + struct tty_struct *tty;
> + char comm[sizeof(current->comm)];
> +
> + if (!current->audit || !current->audit->cont ||
> + refcount_read(&current->audit->cont->refcount) > 1)
> + return;
> + ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_CONTAINER_OP);
> + if (!ab)
> + return;
> +
> + uid = from_kuid(&init_user_ns, task_uid(current));
> + tty = audit_get_tty();
> + audit_log_format(ab,
> + "op=drop opid=%d contid=%llu old-contid=%llu pid=%d uid=%u auid=%u tty=%s ses=%u",
> + task_tgid_nr(current), audit_get_contid(current),
> + audit_get_contid(current), task_tgid_nr(current), uid,
> + from_kuid(&init_user_ns, audit_get_loginuid(current)),
> + tty ? tty_name(tty) : "(none)",
> + audit_get_sessionid(current));
> + audit_put_tty(tty);
> + audit_log_task_context(ab);
> + audit_log_format(ab, " comm=");
> + audit_log_untrustedstring(ab, get_task_comm(comm, current));
> + audit_log_d_path_exe(ab, current->mm);
> + audit_log_format(ab, " res=1");
> + audit_log_end(ab);
> +}

Why can't we just do this in audit_cont_put()? Is it because we call
audit_cont_put() in the new audit_free() function? What if we were to
do it in __audit_free()/audit_free_syscall()?

--
paul moore
http://www.paul-moore.com

2019-10-11 00:40:23

by Paul Moore

Subject: Re: [PATCH ghak90 V7 06/21] audit: contid limit of 32k imposed to avoid DoS

On Fri, Sep 27, 2019 at 8:52 AM Neil Horman <[email protected]> wrote:
> On Wed, Sep 18, 2019 at 09:22:23PM -0400, Richard Guy Briggs wrote:
> > Set an arbitrary limit on the number of audit container identifiers to
> > limit abuse.
> >
> > Signed-off-by: Richard Guy Briggs <[email protected]>
> > ---
> > kernel/audit.c | 8 ++++++++
> > kernel/audit.h | 4 ++++
> > 2 files changed, 12 insertions(+)
> >
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 53d13d638c63..329916534dd2 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c

...

> > @@ -2465,6 +2472,7 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> > newcont->owner = current;
> > refcount_set(&newcont->refcount, 1);
> > list_add_rcu(&newcont->list, &audit_contid_hash[h]);
> > + audit_contid_count++;
> > } else {
> > rc = -ENOMEM;
> > goto conterror;
> > diff --git a/kernel/audit.h b/kernel/audit.h
> > index 162de8366b32..543f1334ba47 100644
> > --- a/kernel/audit.h
> > +++ b/kernel/audit.h
> > @@ -219,6 +219,10 @@ static inline int audit_hash_contid(u64 contid)
> > return (contid & (AUDIT_CONTID_BUCKETS-1));
> > }
> >
> > +extern int audit_contid_count;
> > +
> > +#define AUDIT_CONTID_COUNT 1 << 16
> > +
>
> Just to ask the question, since it wasn't clear in the changelog, what
> abuse are you avoiding here? Ostensibly you should be able to create as
> many container ids as you have space for, and the simple creation of
> container ids doesn't seem like the resource strain I would be concerned
> about here, given that an orchestrator can still create as many
> containers as the system will otherwise allow, which will consume
> significantly more ram/disk/etc.

I've got a similar question. Up to this point in the patchset, there
is a potential issue of hash bucket chain lengths and traversing them
with a spinlock held, but it seems like we shouldn't be putting an
arbitrary limit on audit container IDs unless we have a good reason
for it. If for some reason we do want to enforce a limit, it should
probably be a tunable value like a sysctl, or similar.
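
To put rough numbers on the chain-length concern, here is a userspace
sketch that reuses the mask-based hash from the quoted patch.
hash_contid() and longest_chain() are hypothetical names, and the ID
patterns (consecutive IDs, or IDs that are all multiples of 32) are
assumptions chosen purely for illustration.

```c
#include <stdint.h>

#define CONTID_BUCKETS 32	/* same value as AUDIT_CONTID_BUCKETS */

/* Same mask-based hash as audit_hash_contid() in the quoted patch. */
static int hash_contid(uint64_t contid)
{
	return (int)(contid & (CONTID_BUCKETS - 1));
}

/* Longest chain after registering the ids 0, stride, 2*stride, ... */
static unsigned int longest_chain(uint64_t n, uint64_t stride)
{
	unsigned int len[CONTID_BUCKETS] = { 0 };
	unsigned int max = 0;

	for (uint64_t i = 0; i < n; i++)
		len[hash_contid(i * stride)]++;
	for (int b = 0; b < CONTID_BUCKETS; b++)
		if (len[b] > max)
			max = len[b];
	return max;
}
```

With 1 << 16 consecutive IDs every bucket holds 2048 entries; if the
IDs all share their low five bits, every entry lands in one bucket and
each lookup walks the whole list with the spinlock held.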

--
paul moore
http://www.paul-moore.com

2019-10-11 00:40:48

by Paul Moore

Subject: Re: [PATCH ghak90 V7 08/21] audit: add contid support for signalling the audit daemon

On Wed, Sep 18, 2019 at 9:25 PM Richard Guy Briggs <[email protected]> wrote:
> Add audit container identifier support to the action of signalling the
> audit daemon.
>
> Since this would need to add an element to the audit_sig_info struct,
> a new record type AUDIT_SIGNAL_INFO2 was created with a new
> audit_sig_info2 struct. Corresponding support is required in the
> userspace code to reflect the new record request and reply type.
> An older userspace won't break since it won't know to request this
> record type.
>
> Signed-off-by: Richard Guy Briggs <[email protected]>
> ---
> include/linux/audit.h | 7 +++++++
> include/uapi/linux/audit.h | 1 +
> kernel/audit.c | 28 ++++++++++++++++++++++++++++
> kernel/audit.h | 1 +
> security/selinux/nlmsgtab.c | 1 +
> 5 files changed, 38 insertions(+)
>
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 0c18d8e30620..7b640c4da4ee 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -23,6 +23,13 @@ struct audit_sig_info {
> char ctx[0];
> };
>
> +struct audit_sig_info2 {
> + uid_t uid;
> + pid_t pid;
> + u64 cid;
> + char ctx[0];
> +};
> +
> struct audit_buffer;
> struct audit_context;
> struct inode;
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index 4ed080f28b47..693ec6e0288b 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -72,6 +72,7 @@
> #define AUDIT_SET_FEATURE 1018 /* Turn an audit feature on or off */
> #define AUDIT_GET_FEATURE 1019 /* Get which features are enabled */
> #define AUDIT_CONTAINER_OP 1020 /* Define the container id and info */
> +#define AUDIT_SIGNAL_INFO2 1021 /* Get info auditd signal sender */
>
> #define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */
> #define AUDIT_USER_AVC 1107 /* We filter this differently */
> diff --git a/kernel/audit.c b/kernel/audit.c
> index adfb3e6a7f0c..df3db29f5a8a 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -125,6 +125,7 @@ struct audit_net {
> kuid_t audit_sig_uid = INVALID_UID;
> pid_t audit_sig_pid = -1;
> u32 audit_sig_sid = 0;
> +u64 audit_sig_cid = AUDIT_CID_UNSET;
>
> /* Records can be lost in several ways:
> 0) [suppressed in audit_alloc]
> @@ -1094,6 +1095,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
> case AUDIT_ADD_RULE:
> case AUDIT_DEL_RULE:
> case AUDIT_SIGNAL_INFO:
> + case AUDIT_SIGNAL_INFO2:
> case AUDIT_TTY_GET:
> case AUDIT_TTY_SET:
> case AUDIT_TRIM:
> @@ -1257,6 +1259,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> struct audit_buffer *ab;
> u16 msg_type = nlh->nlmsg_type;
> struct audit_sig_info *sig_data;
> + struct audit_sig_info2 *sig_data2;
> char *ctx = NULL;
> u32 len;
>
> @@ -1516,6 +1519,30 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> sig_data, sizeof(*sig_data) + len);
> kfree(sig_data);
> break;
> + case AUDIT_SIGNAL_INFO2:
> + len = 0;
> + if (audit_sig_sid) {
> + err = security_secid_to_secctx(audit_sig_sid, &ctx, &len);
> + if (err)
> + return err;
> + }
> + sig_data2 = kmalloc(sizeof(*sig_data2) + len, GFP_KERNEL);
> + if (!sig_data2) {
> + if (audit_sig_sid)
> + security_release_secctx(ctx, len);
> + return -ENOMEM;
> + }
> + sig_data2->uid = from_kuid(&init_user_ns, audit_sig_uid);
> + sig_data2->pid = audit_sig_pid;
> + if (audit_sig_sid) {
> + memcpy(sig_data2->ctx, ctx, len);
> + security_release_secctx(ctx, len);
> + }
> + sig_data2->cid = audit_sig_cid;
> + audit_send_reply(skb, seq, AUDIT_SIGNAL_INFO2, 0, 0,
> + sig_data2, sizeof(*sig_data2) + len);
> + kfree(sig_data2);
> + break;
> case AUDIT_TTY_GET: {
> struct audit_tty_status s;
> unsigned int t;
> @@ -2384,6 +2411,7 @@ int audit_signal_info(int sig, struct task_struct *t)
> else
> audit_sig_uid = uid;
> security_task_getsecid(current, &audit_sig_sid);
> + audit_sig_cid = audit_get_contid(current);
> }

I've been wondering something as I've been working my way through
these patches and this patch seems like a good spot to discuss this
... Now that we have the concept of an audit container ID "lifetime"
in the kernel, when do we consider the ID gone? Is it when the last
process in the container exits, or is it when we generate the last
audit record which could possibly contain the audit container ID?
This patch would appear to support the former, but if we wanted the
latter we would need to grab a reference to the audit container ID
struct so it wouldn't "die" on us before we could emit the signal info
record.

--
paul moore
http://www.paul-moore.com

2019-10-11 00:41:35

by Paul Moore

Subject: Re: [PATCH ghak90 V7 14/21] audit: contid check descendancy and nesting

On Wed, Sep 18, 2019 at 9:26 PM Richard Guy Briggs <[email protected]> wrote:
> ?fixup! audit: convert to contid list to check for orch/engine ownership

?

> Require the target task to be a descendant of the container
> orchestrator/engine.
>
> You would only change the audit container ID from one set or inherited
> value to another if you were nesting containers.
>
> If changing the contid, the container orchestrator/engine must be a
> descendant and not same orchestrator as the one that set it so it is not
> possible to change the contid of another orchestrator's container.

Did you mean to say that the container orchestrator must be an
ancestor of the target, and the same orchestrator as the one that set
the target process' audit container ID?

Or maybe I'm missing something about what you are trying to do?
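
To make the ancestor/descendant direction concrete, here is a small
userspace model of the parent-chain walk. It is a sketch only: struct
task_model and is_descendant() are hypothetical, there is no RCU or
thread-group handling, and unlike the quoted task_is_descendant() this
version is strict, so a task is not treated as its own descendant.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a task: only the parent link matters here. */
struct task_model {
	int pid;
	struct task_model *real_parent;	/* NULL at the root of the tree */
};

/* True if ancestor appears strictly above task on the parent chain. */
static bool is_descendant(const struct task_model *ancestor,
			  const struct task_model *task)
{
	const struct task_model *walker;

	if (!ancestor || !task)
		return false;
	for (walker = task->real_parent; walker; walker = walker->real_parent)
		if (walker == ancestor)
			return true;
	return false;
}
```

In this model the orchestrator is the "ancestor" argument and the
target task is the one whose parent chain is walked, which is the
reading proposed in the question above.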

> Signed-off-by: Richard Guy Briggs <[email protected]>
> ---
> kernel/audit.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 62 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 9ce7a1ec7a92..69fe1e9af7cb 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -2560,6 +2560,39 @@ static struct task_struct *audit_cont_owner(struct task_struct *tsk)
> }
>
> /*
> + * task_is_descendant - walk up a process family tree looking for a match
> + * @parent: the process to compare against while walking up from child
> + * @child: the process to start from while looking upwards for parent
> + *
> + * Returns 1 if child is a descendant of parent, 0 if not.
> + */
> +static int task_is_descendant(struct task_struct *parent,
> + struct task_struct *child)
> +{
> + int rc = 0;
> + struct task_struct *walker = child;
> +
> + if (!parent || !child)
> + return 0;
> +
> + rcu_read_lock();
> + if (!thread_group_leader(parent))
> + parent = rcu_dereference(parent->group_leader);
> + while (walker->pid > 0) {
> + if (!thread_group_leader(walker))
> + walker = rcu_dereference(walker->group_leader);
> + if (walker == parent) {
> + rc = 1;
> + break;
> + }
> + walker = rcu_dereference(walker->real_parent);
> + }
> + rcu_read_unlock();
> +
> + return rc;
> +}
> +
> +/*
> * audit_set_contid - set current task's audit contid
> * @task: target task
> * @contid: contid value
> @@ -2587,22 +2620,43 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> oldcontid = audit_get_contid(task);
> read_lock(&tasklist_lock);
> /* Don't allow the contid to be unset */
> - if (!audit_contid_valid(contid))
> + if (!audit_contid_valid(contid)) {
> rc = -EINVAL;
> + goto unlock;
> + }
> /* Don't allow the contid to be set to the same value again */
> - else if (contid == oldcontid) {
> + if (contid == oldcontid) {
> rc = -EADDRINUSE;
> + goto unlock;
> + }
> /* if we don't have caps, reject */
> - else if (!capable(CAP_AUDIT_CONTROL))
> + if (!capable(CAP_AUDIT_CONTROL)) {
> rc = -EPERM;
> - /* if task has children or is not single-threaded, deny */
> - else if (!list_empty(&task->children))
> + goto unlock;
> + }
> + /* if task has children, deny */
> + if (!list_empty(&task->children)) {
> rc = -EBUSY;
> - else if (!(thread_group_leader(task) && thread_group_empty(task)))
> + goto unlock;
> + }
> + /* if task is not single-threaded, deny */
> + if (!(thread_group_leader(task) && thread_group_empty(task))) {
> rc = -EALREADY;
> - /* if contid is already set, deny */
> - else if (audit_contid_set(task))
> + goto unlock;
> + }
> + /* if task is not descendant, block */
> + if (task == current) {
> + rc = -EBADSLT;
> + goto unlock;
> + }
> + if (!task_is_descendant(current, task)) {
> + rc = -EXDEV;
> + goto unlock;
> + }
> + /* only allow contid setting again if nesting */
> + if (audit_contid_set(task) && current == audit_cont_owner(task))
> rc = -ECHILD;
> +unlock:
> read_unlock(&tasklist_lock);
> if (!rc) {
> struct audit_cont *oldcont = audit_cont(task);

--
paul moore
http://www.paul-moore.com

2019-10-11 00:42:01

by Paul Moore

Subject: Re: [PATCH ghak90 V7 13/21] audit: NETFILTER_PKT: record each container ID associated with a netNS

On Wed, Sep 18, 2019 at 9:26 PM Richard Guy Briggs <[email protected]> wrote:
> Add audit container identifier auxiliary record(s) to NETFILTER_PKT
> event standalone records. Iterate through all potential audit container
> identifiers associated with a network namespace.
>
> Signed-off-by: Richard Guy Briggs <[email protected]>
> Acked-by: Neil Horman <[email protected]>
> Reviewed-by: Ondrej Mosnacek <[email protected]>
> ---
> include/linux/audit.h | 5 +++++
> kernel/audit.c | 39 +++++++++++++++++++++++++++++++++++++++
> net/netfilter/nft_log.c | 11 +++++++++--
> net/netfilter/xt_AUDIT.c | 11 +++++++++--
> 4 files changed, 62 insertions(+), 4 deletions(-)

This should be squashed together with patch 12/21; neither patch makes
sense by itself.

> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 73e3ab38e3e0..dcd92f964120 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -241,6 +241,8 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
> extern void audit_netns_contid_del(struct net *net, u64 contid);
> extern void audit_switch_task_namespaces(struct nsproxy *ns,
> struct task_struct *p);
> +extern void audit_log_netns_contid_list(struct net *net,
> + struct audit_context *context);
>
> extern u32 audit_enabled;
>
> @@ -328,6 +330,9 @@ static inline void audit_netns_contid_del(struct net *net, u64 contid)
> static inline void audit_switch_task_namespaces(struct nsproxy *ns,
> struct task_struct *p)
> { }
> +static inline void audit_log_netns_contid_list(struct net *net,
> + struct audit_context *context)
> +{ }
>
> #define audit_enabled AUDIT_OFF
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index e0c27bc39925..9ce7a1ec7a92 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -450,6 +450,45 @@ void audit_switch_task_namespaces(struct nsproxy *ns, struct task_struct *p)
> audit_netns_contid_add(new->net_ns, contid);
> }
>
> +/**
> + * audit_log_netns_contid_list - List contids for the given network namespace
> + * @net: the network namespace of interest
> + * @context: the audit context to use
> + *
> + * Description:
> + * Issues a CONTAINER_ID record with a CSV list of contids associated
> + * with a network namespace to accompany a NETFILTER_PKT record.
> + */
> +void audit_log_netns_contid_list(struct net *net, struct audit_context *context)
> +{
> + struct audit_buffer *ab = NULL;
> + struct audit_contid *cont;
> + struct audit_net *aunet;
> +
> + /* Generate AUDIT_CONTAINER_ID record with container ID CSV list */
> + rcu_read_lock();
> + aunet = net_generic(net, audit_net_id);
> + if (!aunet)
> + goto out;
> + list_for_each_entry_rcu(cont, &aunet->contid_list, list) {
> + if (!ab) {
> + ab = audit_log_start(context, GFP_ATOMIC,
> + AUDIT_CONTAINER_ID);
> + if (!ab) {
> + audit_log_lost("out of memory in audit_log_netns_contid_list");
> + goto out;
> + }
> + audit_log_format(ab, "contid=");
> + } else
> + audit_log_format(ab, ",");
> + audit_log_format(ab, "%llu", cont->id);
> + }
> + audit_log_end(ab);
> +out:
> + rcu_read_unlock();
> +}
> +EXPORT_SYMBOL(audit_log_netns_contid_list);
> +
> void audit_panic(const char *message)
> {
> switch (audit_failure) {
> diff --git a/net/netfilter/nft_log.c b/net/netfilter/nft_log.c
> index fe4831f2258f..98d1e7e1a83c 100644
> --- a/net/netfilter/nft_log.c
> +++ b/net/netfilter/nft_log.c
> @@ -66,13 +66,16 @@ static void nft_log_eval_audit(const struct nft_pktinfo *pkt)
> struct sk_buff *skb = pkt->skb;
> struct audit_buffer *ab;
> int fam = -1;
> + struct audit_context *context;
> + struct net *net;
>
> if (!audit_enabled)
> return;
>
> - ab = audit_log_start(NULL, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
> + context = audit_alloc_local(GFP_ATOMIC);
> + ab = audit_log_start(context, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
> if (!ab)
> - return;
> + goto errout;
>
> audit_log_format(ab, "mark=%#x", skb->mark);
>
> @@ -99,6 +102,10 @@ static void nft_log_eval_audit(const struct nft_pktinfo *pkt)
> audit_log_format(ab, " saddr=? daddr=? proto=-1");
>
> audit_log_end(ab);
> + net = xt_net(&pkt->xt);
> + audit_log_netns_contid_list(net, context);
> +errout:
> + audit_free_context(context);
> }
>
> static void nft_log_eval(const struct nft_expr *expr,
> diff --git a/net/netfilter/xt_AUDIT.c b/net/netfilter/xt_AUDIT.c
> index 9cdc16b0d0d8..ecf868a1abde 100644
> --- a/net/netfilter/xt_AUDIT.c
> +++ b/net/netfilter/xt_AUDIT.c
> @@ -68,10 +68,13 @@ static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
> {
> struct audit_buffer *ab;
> int fam = -1;
> + struct audit_context *context;
> + struct net *net;
>
> if (audit_enabled == AUDIT_OFF)
> - goto errout;
> - ab = audit_log_start(NULL, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
> + goto out;
> + context = audit_alloc_local(GFP_ATOMIC);
> + ab = audit_log_start(context, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
> if (ab == NULL)
> goto errout;
>
> @@ -101,7 +104,11 @@ static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
>
> audit_log_end(ab);
>
> + net = xt_net(par);
> + audit_log_netns_contid_list(net, context);
> errout:
> + audit_free_context(context);
> +out:
> return XT_CONTINUE;
> }
>
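
To make the record format easier to picture, here is a minimal userspace sketch of the CSV construction done by audit_log_netns_contid_list() above. It is only an illustration: format_contid_list() and its buffer handling are hypothetical, and a plain array stands in for the RCU-protected list. As in the quoted function, the "contid=" prefix is written once and an empty list produces nothing.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format ids[0..n) as "contid=1,2,3" into buf.
 * Returns the number of characters written (excluding the NUL),
 * 0 if the list is empty, or -1 if buf is too small.
 */
int format_contid_list(char *buf, size_t len, const uint64_t *ids, size_t n)
{
	size_t off = 0;
	size_t i;

	assert(buf != NULL);
	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (i = 0; i < n; i++) {
		/* First entry gets the field name, later entries a comma. */
		int w = snprintf(buf + off, len - off, "%s%" PRIu64,
				 i == 0 ? "contid=" : ",", ids[i]);

		if (w < 0 || (size_t)w >= len - off)
			return -1; /* output would have been truncated */
		off += (size_t)w;
	}
	return (int)off;
}
```

With ids {1, 2, 3} the buffer ends up holding contid=1,2,3, which is the payload shape the AUDIT_CONTAINER_ID record in this patch carries alongside the NETFILTER_PKT record.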

--
paul moore
http://www.paul-moore.com

2019-10-11 00:42:02

by Paul Moore

Subject: Re: [PATCH ghak90 V7 16/21] audit: add support for contid set/get by netlink

On Wed, Sep 18, 2019 at 9:26 PM Richard Guy Briggs <[email protected]> wrote:
> Add the ability to get and set the audit container identifier using an
> audit netlink message using message types AUDIT_SET_CONTID 1023 and
> AUDIT_GET_CONTID 1022 in addition to using the proc filesystem. The
> message format includes the data structure:
>
> struct audit_contid_status {
> pid_t pid;
> u64 id;
> };
>
> Signed-off-by: Richard Guy Briggs <[email protected]>
> ---
> include/uapi/linux/audit.h | 2 ++
> kernel/audit.c | 40 ++++++++++++++++++++++++++++++++++++++++
> kernel/audit.h | 5 +++++
> 3 files changed, 47 insertions(+)

I'm not a fan of having multiple interfaces to do one thing if it can
be avoided. Presumably the argument for the netlink API is the
container folks don't want to have to mount /proc inside containers
which are going to host nested orchestrators? Can you reasonably run
a fully fledged orchestrator without a valid /proc?
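
For reference while comparing the two interfaces, this is roughly how the payload from the commit message would sit behind a netlink-style header. It is a sketch under stated assumptions, not the proposed UAPI: struct sketch_hdr is a simplified stand-in for the netlink message header, build_set_contid() is a hypothetical helper, and nothing here opens a socket. Only the message type value 1023 and the two payload fields come from the patch description.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

#define SKETCH_AUDIT_SET_CONTID 1023 /* value quoted in the commit message */

/* Simplified stand-in for a netlink message header; not the kernel definition. */
struct sketch_hdr {
	uint32_t len;   /* total message length, header included */
	uint16_t type;  /* message type */
	uint16_t flags;
	uint32_t seq;
	uint32_t pid;
};

/* Payload fields as listed in the commit message (u64 spelled uint64_t). */
struct sketch_contid_status {
	pid_t pid;
	uint64_t id;
};

/* Fill buf with header plus payload; returns bytes used, or 0 if buf is too small. */
size_t build_set_contid(void *buf, size_t buflen, pid_t pid, uint64_t id,
			uint32_t seq)
{
	struct sketch_hdr hdr;
	struct sketch_contid_status st;
	size_t total = sizeof(hdr) + sizeof(st);

	assert(buf != NULL);
	if (buflen < total)
		return 0;

	memset(&hdr, 0, sizeof(hdr));
	memset(&st, 0, sizeof(st)); /* also zeroes any padding bytes */
	hdr.len = (uint32_t)total;
	hdr.type = SKETCH_AUDIT_SET_CONTID;
	hdr.seq = seq;
	st.pid = pid;
	st.id = id;

	memcpy(buf, &hdr, sizeof(hdr));
	memcpy((char *)buf + sizeof(hdr), &st, sizeof(st));
	return total;
}
```

The /proc route needs none of this framing, which is part of why having both paths do the same job deserves a justification.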

--
paul moore
http://www.paul-moore.com

2019-10-11 00:42:21

by Paul Moore

Subject: Re: [PATCH ghak90 V7 17/21] audit: add support for loginuid/sessionid set/get by netlink

On Wed, Sep 18, 2019 at 9:27 PM Richard Guy Briggs <[email protected]> wrote:
> Add the ability to get and set the login uid and to get the session id
> using an audit netlink message using message types AUDIT_GET_LOGINUID
> 1024, AUDIT_SET_LOGINUID 1025 and AUDIT_GET_SESSIONID 1026 in addition
> to using the proc filesystem.
>
> Signed-off-by: Richard Guy Briggs <[email protected]>
> ---
> include/uapi/linux/audit.h | 3 +++
> kernel/audit.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 65 insertions(+)

This is completely independent of the audit container ID work, yes?
If so, it shouldn't be part of this patchset.

--
paul moore
http://www.paul-moore.com

2019-10-11 00:43:03

by Paul Moore

Subject: Re: [PATCH ghak90 V7 12/21] audit: add support for containerid to network namespaces

On Wed, Sep 18, 2019 at 9:26 PM Richard Guy Briggs <[email protected]> wrote:
> Audit events could happen in a network namespace outside of a task
> context due to packets received from the net that trigger an auditing
> rule prior to being associated with a running task. The network
> namespace could be in use by multiple containers by association to the
> tasks in that network namespace. We still want a way to attribute
> these events to any potential containers. Keep a list per network
> namespace to track these audit container identifiers.
>
> Add/increment the audit container identifier on:
> - initial setting of the audit container identifier via /proc
> - clone/fork call that inherits an audit container identifier
> - unshare call that inherits an audit container identifier
> - setns call that inherits an audit container identifier
> Delete/decrement the audit container identifier on:
> - an inherited audit container identifier dropped when child set
> - process exit
> - unshare call that drops a net namespace
> - setns call that drops a net namespace
>
> Please see the github audit kernel issue for contid net support:
> https://github.com/linux-audit/audit-kernel/issues/92
> Please see the github audit testsuite issue for the test case:
> https://github.com/linux-audit/audit-testsuite/issues/64
> Please see the github audit wiki for the feature overview:
> https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> Signed-off-by: Richard Guy Briggs <[email protected]>
> Acked-by: Neil Horman <[email protected]>
> Reviewed-by: Ondrej Mosnacek <[email protected]>
> ---
> include/linux/audit.h | 19 +++++++++++
> kernel/audit.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++--
> kernel/nsproxy.c | 4 +++
> 3 files changed, 108 insertions(+), 2 deletions(-)

...

> diff --git a/kernel/audit.c b/kernel/audit.c
> index 7cdb76b38966..e0c27bc39925 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -373,6 +381,75 @@ static struct sock *audit_get_sk(const struct net *net)
> return aunet->sk;
> }
>
> +void audit_netns_contid_add(struct net *net, u64 contid)
> +{
> + struct audit_net *aunet;
> + struct list_head *contid_list;
> + struct audit_contid *cont;
> +
> + if (!net)
> + return;
> + if (!audit_contid_valid(contid))
> + return;
> + aunet = net_generic(net, audit_net_id);
> + if (!aunet)
> + return;
> + contid_list = &aunet->contid_list;
> + spin_lock(&aunet->contid_list_lock);
> + list_for_each_entry_rcu(cont, contid_list, list)
> + if (cont->id == contid) {
> + refcount_inc(&cont->refcount);
> + goto out;
> + }
> + cont = kmalloc(sizeof(struct audit_contid), GFP_ATOMIC);

kmalloc(sizeof(*cont), GFP_ATOMIC)


> + if (cont) {
> + INIT_LIST_HEAD(&cont->list);
> + cont->id = contid;
> + refcount_set(&cont->refcount, 1);
> + list_add_rcu(&cont->list, contid_list);
> + }
> +out:
> + spin_unlock(&aunet->contid_list_lock);
> +}

--
paul moore
http://www.paul-moore.com

2019-10-11 00:43:39

by Paul Moore

Subject: Re: [PATCH ghak90 V7 15/21] sched: pull task_is_descendant into kernel/sched/core.c

On Wed, Sep 18, 2019 at 9:26 PM Richard Guy Briggs <[email protected]> wrote:
> Since the task_is_descendant() function is used in YAMA and in audit,
> pull the function into kernel/sched/core.c
>
> Signed-off-by: Richard Guy Briggs <[email protected]>
> ---
> include/linux/sched.h | 3 +++
> kernel/audit.c | 33 ---------------------------------
> kernel/sched/core.c | 33 +++++++++++++++++++++++++++++++++
> security/yama/yama_lsm.c | 33 ---------------------------------
> 4 files changed, 36 insertions(+), 66 deletions(-)

I'm not really reviewing this as I'm still a little confused from
patch 14/21, but if 14/21 works out as correct this patch should be
squashed into that one.

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index a936d162513a..b251f018f4db 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1988,4 +1988,7 @@ static inline void rseq_syscall(struct pt_regs *regs)
>
> const struct cpumask *sched_trace_rd_span(struct root_domain *rd);
>
> +extern int task_is_descendant(struct task_struct *parent,
> + struct task_struct *child);
> +
> #endif
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 69fe1e9af7cb..4fe7678304dd 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -2560,39 +2560,6 @@ static struct task_struct *audit_cont_owner(struct task_struct *tsk)
> }
>
> /*
> - * task_is_descendant - walk up a process family tree looking for a match
> - * @parent: the process to compare against while walking up from child
> - * @child: the process to start from while looking upwards for parent
> - *
> - * Returns 1 if child is a descendant of parent, 0 if not.
> - */
> -static int task_is_descendant(struct task_struct *parent,
> - struct task_struct *child)
> -{
> - int rc = 0;
> - struct task_struct *walker = child;
> -
> - if (!parent || !child)
> - return 0;
> -
> - rcu_read_lock();
> - if (!thread_group_leader(parent))
> - parent = rcu_dereference(parent->group_leader);
> - while (walker->pid > 0) {
> - if (!thread_group_leader(walker))
> - walker = rcu_dereference(walker->group_leader);
> - if (walker == parent) {
> - rc = 1;
> - break;
> - }
> - walker = rcu_dereference(walker->real_parent);
> - }
> - rcu_read_unlock();
> -
> - return rc;
> -}
> -
> -/*
> * audit_set_contid - set current task's audit contid
> * @task: target task
> * @contid: contid value
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2b037f195473..7ba9e07381fa 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7509,6 +7509,39 @@ void dump_cpu_task(int cpu)
> }
>
> /*
> + * task_is_descendant - walk up a process family tree looking for a match
> + * @parent: the process to compare against while walking up from child
> + * @child: the process to start from while looking upwards for parent
> + *
> + * Returns 1 if child is a descendant of parent, 0 if not.
> + */
> +int task_is_descendant(struct task_struct *parent,
> + struct task_struct *child)
> +{
> + int rc = 0;
> + struct task_struct *walker = child;
> +
> + if (!parent || !child)
> + return 0;
> +
> + rcu_read_lock();
> + if (!thread_group_leader(parent))
> + parent = rcu_dereference(parent->group_leader);
> + while (walker->pid > 0) {
> + if (!thread_group_leader(walker))
> + walker = rcu_dereference(walker->group_leader);
> + if (walker == parent) {
> + rc = 1;
> + break;
> + }
> + walker = rcu_dereference(walker->real_parent);
> + }
> + rcu_read_unlock();
> +
> + return rc;
> +}
> +
> +/*
> * Nice levels are multiplicative, with a gentle 10% change for every
> * nice level changed. I.e. when a CPU-bound task goes from nice 0 to
> * nice 1, it will get ~10% less CPU time than another CPU-bound task
> diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
> index 94dc346370b1..25eae205eae8 100644
> --- a/security/yama/yama_lsm.c
> +++ b/security/yama/yama_lsm.c
> @@ -263,39 +263,6 @@ static int yama_task_prctl(int option, unsigned long arg2, unsigned long arg3,
> }
>
> /**
> - * task_is_descendant - walk up a process family tree looking for a match
> - * @parent: the process to compare against while walking up from child
> - * @child: the process to start from while looking upwards for parent
> - *
> - * Returns 1 if child is a descendant of parent, 0 if not.
> - */
> -static int task_is_descendant(struct task_struct *parent,
> - struct task_struct *child)
> -{
> - int rc = 0;
> - struct task_struct *walker = child;
> -
> - if (!parent || !child)
> - return 0;
> -
> - rcu_read_lock();
> - if (!thread_group_leader(parent))
> - parent = rcu_dereference(parent->group_leader);
> - while (walker->pid > 0) {
> - if (!thread_group_leader(walker))
> - walker = rcu_dereference(walker->group_leader);
> - if (walker == parent) {
> - rc = 1;
> - break;
> - }
> - walker = rcu_dereference(walker->real_parent);
> - }
> - rcu_read_unlock();
> -
> - return rc;
> -}
> -
> -/**
> * ptracer_exception_found - tracer registered as exception for this tracee
> * @tracer: the task_struct of the process attempting ptrace
> * @tracee: the task_struct of the process to be ptraced
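
Since the function being moved is the heart of the descendancy check, here is a minimal userspace model of the walk it performs. The struct sketch_task type and sketch_is_descendant() are hypothetical; there is no RCU here, and the model assumes every task's group_leader is valid (a leader points to itself) and that the chain of real_parent pointers ends at a task with pid 0.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced model of a task. */
struct sketch_task {
	int pid;
	struct sketch_task *group_leader; /* a leader points to itself */
	struct sketch_task *real_parent;
};

/* Returns 1 if child is a descendant of parent, 0 if not. */
int sketch_is_descendant(const struct sketch_task *parent,
			 const struct sketch_task *child)
{
	const struct sketch_task *walker = child;

	if (!parent || !child)
		return 0;

	/* Compare thread group leaders, as the kernel function does. */
	parent = parent->group_leader;
	while (walker->pid > 0) {
		walker = walker->group_leader;
		if (walker == parent)
			return 1;
		walker = walker->real_parent;
	}
	return 0;
}
```

Note that under this definition a task counts as its own descendant, since the very first comparison matches. That is presumably why the callers in this series test task == current separately before calling it.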

--
paul moore
http://www.paul-moore.com

2019-10-11 00:44:33

by Paul Moore

Subject: Re: [PATCH ghak90 V7 18/21] audit: track container nesting

On Wed, Sep 18, 2019 at 9:27 PM Richard Guy Briggs <[email protected]> wrote:
> Track the parent container of a container to be able to filter and
> report nesting.
>
> Now that we have a way to track and check the parent container of a
> container, fixup other patches, or squash all nesting fixes together.
>
> fixup! audit: add container id
> fixup! audit: log drop of contid on exit of last task
> fixup! audit: log container info of syscalls
> fixup! audit: add containerid filtering
> fixup! audit: NETFILTER_PKT: record each container ID associated with a netNS
> fixup! audit: convert to contid list to check for orch/engine ownership softirq (for netfilter) audit: protect contid list lock from softirq
>
> Signed-off-by: Richard Guy Briggs <[email protected]>
> ---
> include/linux/audit.h | 1 +
> kernel/audit.c | 67 ++++++++++++++++++++++++++++++++++++++++++---------
> kernel/audit.h | 3 +++
> kernel/auditfilter.c | 20 ++++++++++++++-
> kernel/auditsc.c | 2 +-
> 5 files changed, 79 insertions(+), 14 deletions(-)

This is my last comment of the patchset because this is where it
starts to get a little weird. I know we've talked about fixup!
patches some in the past, but perhaps I didn't do a very good job
communicating my point; let me try again.

Submitting a fixup patch is okay if you've already posted a (lengthy)
patchset and someone uncovered a small nit that needs to be fixed
prior to merging. Assuming everyone (this includes the reviewer, the
patch author, and the maintainer) is okay with the author posting the
fix as a fixup! patch, then go for it. Done this way, fixup patches
can save a lot of development, testing, and review time.
However, in my opinion it is wrong to submit a patchset that has fixup
patches as part of the original posting. In this case fixup patches
have the opposite effect: the patchset becomes more complicated,
reviews take longer, and the likelihood of missing important details
increases.

When in doubt, don't submit separate fixup patches; fold them into the
original patches instead.

--
paul moore
http://www.paul-moore.com

2019-10-19 09:51:38

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On 2019-09-18 21:22, Richard Guy Briggs wrote:
> Provide a mechanism similar to CAP_AUDIT_CONTROL to explicitly give a
> process in a non-init user namespace the capability to set audit
> container identifiers.
>
> Use audit netlink message types AUDIT_GET_CAPCONTID 1027 and
> AUDIT_SET_CAPCONTID 1028. The message format includes the data
> structure:
> struct audit_capcontid_status {
> pid_t pid;
> u32 enable;
> };

Paul, can I get a review of the general idea here to see if you're ok
with this way of effectively extending CAP_AUDIT_CONTROL for the sake of
setting contid from beyond the init user namespace where capable() can't
reach and ns_capable() is meaningless for these purposes?

Last weekend was Canadian Thanksgiving where I took an extra day for an
annual bike trip and I'm buried to my neck in a complete kitchen gut
(down to 1920 structural double brick and knob/tube wiring), but I've
got fixes or responses to almost everything else you've raised which
I'll post shortly.

Thanks!

> Signed-off-by: Richard Guy Briggs <[email protected]>
> ---
> include/linux/audit.h | 14 +++++++
> include/uapi/linux/audit.h | 2 +
> kernel/audit.c | 98 +++++++++++++++++++++++++++++++++++++++++++++-
> kernel/audit.h | 5 +++
> 4 files changed, 117 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 1ce27af686ea..dcc53e62e266 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -117,6 +117,7 @@ struct audit_task_info {
> kuid_t loginuid;
> unsigned int sessionid;
> struct audit_cont *cont;
> + u32 capcontid;
> #ifdef CONFIG_AUDITSYSCALL
> struct audit_context *ctx;
> #endif
> @@ -224,6 +225,14 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
> return tsk->audit->sessionid;
> }
>
> +static inline u32 audit_get_capcontid(struct task_struct *tsk)
> +{
> + if (!tsk->audit)
> + return 0;
> + return tsk->audit->capcontid;
> +}
> +
> +extern int audit_set_capcontid(struct task_struct *tsk, u32 enable);
> extern int audit_set_contid(struct task_struct *tsk, u64 contid);
>
> static inline u64 audit_get_contid(struct task_struct *tsk)
> @@ -309,6 +318,11 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
> return AUDIT_SID_UNSET;
> }
>
> +static inline u32 audit_get_capcontid(struct task_struct *tsk)
> +{
> + return 0;
> +}
> +
> static inline u64 audit_get_contid(struct task_struct *tsk)
> {
> return AUDIT_CID_UNSET;
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index eef42c8eea77..011b0a8ee9b2 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -78,6 +78,8 @@
> #define AUDIT_GET_LOGINUID 1024 /* Get loginuid of a task */
> #define AUDIT_SET_LOGINUID 1025 /* Set loginuid of a task */
> #define AUDIT_GET_SESSIONID 1026 /* Set sessionid of a task */
> +#define AUDIT_GET_CAPCONTID 1027 /* Get cap_contid of a task */
> +#define AUDIT_SET_CAPCONTID 1028 /* Set cap_contid of a task */
>
> #define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */
> #define AUDIT_USER_AVC 1107 /* We filter this differently */
> diff --git a/kernel/audit.c b/kernel/audit.c
> index a70c9184e5d9..7160da464849 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -1192,6 +1192,14 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
> case AUDIT_GET_SESSIONID:
> return 0;
> break;
> + case AUDIT_GET_CAPCONTID:
> + case AUDIT_SET_CAPCONTID:
> + case AUDIT_GET_CONTID:
> + case AUDIT_SET_CONTID:
> + if (!netlink_capable(skb, CAP_AUDIT_CONTROL) && !audit_get_capcontid(current))
> + return -EPERM;
> + return 0;
> + break;
> default: /* do more checks below */
> break;
> }
> @@ -1227,8 +1235,6 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
> case AUDIT_TTY_SET:
> case AUDIT_TRIM:
> case AUDIT_MAKE_EQUIV:
> - case AUDIT_GET_CONTID:
> - case AUDIT_SET_CONTID:
> case AUDIT_SET_LOGINUID:
> /* Only support auditd and auditctl in initial pid namespace
> * for now. */
> @@ -1304,6 +1310,23 @@ static int audit_get_contid_status(struct sk_buff *skb)
> return 0;
> }
>
> +static int audit_get_capcontid_status(struct sk_buff *skb)
> +{
> + struct nlmsghdr *nlh = nlmsg_hdr(skb);
> + u32 seq = nlh->nlmsg_seq;
> + void *data = nlmsg_data(nlh);
> + struct audit_capcontid_status cs;
> +
> + cs.pid = ((struct audit_capcontid_status *)data)->pid;
> + if (!cs.pid)
> + cs.pid = task_tgid_nr(current);
> + rcu_read_lock();
> + cs.enable = audit_get_capcontid(find_task_by_vpid(cs.pid));
> + rcu_read_unlock();
> + audit_send_reply(skb, seq, AUDIT_GET_CAPCONTID, 0, 0, &cs, sizeof(cs));
> + return 0;
> +}
> +
> struct audit_loginuid_status { uid_t loginuid; };
>
> static int audit_get_loginuid_status(struct sk_buff *skb)
> @@ -1779,6 +1802,27 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> if (err)
> return err;
> break;
> + case AUDIT_SET_CAPCONTID: {
> + struct audit_capcontid_status *s = data;
> + struct task_struct *tsk;
> +
> + /* check if new data is valid */
> + if (nlmsg_len(nlh) < sizeof(*s))
> + return -EINVAL;
> + tsk = find_get_task_by_vpid(s->pid);
> + if (!tsk)
> + return -EINVAL;
> +
> + err = audit_set_capcontid(tsk, s->enable);
> + put_task_struct(tsk);
> + return err;
> + break;
> + }
> + case AUDIT_GET_CAPCONTID:
> + err = audit_get_capcontid_status(skb);
> + if (err)
> + return err;
> + break;
> case AUDIT_SET_LOGINUID: {
> uid_t *loginuid = data;
> kuid_t kloginuid;
> @@ -2711,6 +2755,56 @@ static struct task_struct *audit_cont_owner(struct task_struct *tsk)
> return NULL;
> }
>
> +int audit_set_capcontid(struct task_struct *task, u32 enable)
> +{
> + u32 oldcapcontid;
> + int rc = 0;
> + struct audit_buffer *ab;
> + uid_t uid;
> + struct tty_struct *tty;
> + char comm[sizeof(current->comm)];
> +
> + if (!task->audit)
> + return -ENOPROTOOPT;
> + oldcapcontid = audit_get_capcontid(task);
> + /* if task is not descendant, block */
> + if (task == current)
> + rc = -EBADSLT;
> + else if (!task_is_descendant(current, task))
> + rc = -EXDEV;
> + else if (current_user_ns() == &init_user_ns) {
> + if (!capable(CAP_AUDIT_CONTROL) && !audit_get_capcontid(current))
> + rc = -EPERM;
> + }
> + if (!rc)
> + task->audit->capcontid = enable;
> +
> + if (!audit_enabled)
> + return rc;
> +
> + ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_SET_CAPCONTID);
> + if (!ab)
> + return rc;
> +
> + uid = from_kuid(&init_user_ns, task_uid(current));
> + tty = audit_get_tty();
> + audit_log_format(ab,
> + "opid=%d capcontid=%u old-capcontid=%u pid=%d uid=%u auid=%u tty=%s ses=%u",
> + task_tgid_nr(task), enable, oldcapcontid,
> + task_tgid_nr(current), uid,
> + from_kuid(&init_user_ns, audit_get_loginuid(current)),
> + tty ? tty_name(tty) : "(none)",
> + audit_get_sessionid(current));
> + audit_put_tty(tty);
> + audit_log_task_context(ab);
> + audit_log_format(ab, " comm=");
> + audit_log_untrustedstring(ab, get_task_comm(comm, current));
> + audit_log_d_path_exe(ab, current->mm);
> + audit_log_format(ab, " res=%d", !rc);
> + audit_log_end(ab);
> + return rc;
> +}
> +
> /*
> * audit_set_contid - set current task's audit contid
> * @task: target task
> diff --git a/kernel/audit.h b/kernel/audit.h
> index cb25341c1a0f..ac4694e88485 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -231,6 +231,11 @@ struct audit_contid_status {
> u64 id;
> };
>
> +struct audit_capcontid_status {
> + pid_t pid;
> + u32 enable;
> +};
> +
> #define AUDIT_CONTID_DEPTH 5
>
> /* Indicates that audit should log the full pathname. */
> --
> 1.8.3.1
>
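
To make the question above concrete, here is a minimal sketch of the permission decision as written in audit_set_capcontid() in this patch. It is a userspace model only: struct capcontid_request and check_set_capcontid() are hypothetical names, and the booleans stand in for the kernel state the function reads. The separate check in audit_netlink_ok() is not modelled.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EBADSLT
#define EBADSLT 57 /* Linux value; fallback for platforms lacking it */
#endif

/* Hypothetical snapshot of what the quoted function inspects. */
struct capcontid_request {
	bool is_self;                /* target is the caller itself */
	bool is_descendant;          /* target descends from the caller */
	bool caller_in_init_userns;  /* caller is in the initial user namespace */
	bool caller_has_cap;         /* caller holds CAP_AUDIT_CONTROL */
	bool caller_capcontid;       /* caller was itself granted capcontid */
};

/* Returns 0 if the flag may be changed, or a negative errno as in the patch. */
int check_set_capcontid(const struct capcontid_request *r)
{
	if (r->is_self)
		return -EBADSLT;
	if (!r->is_descendant)
		return -EXDEV;
	/* As written, the capability test only applies in the initial user namespace. */
	if (r->caller_in_init_userns &&
	    !r->caller_has_cap && !r->caller_capcontid)
		return -EPERM;
	return 0;
}
```

In this model a caller outside the initial user namespace is limited only by the self and descendancy tests inside the function, so the capcontid flag on the caller effectively matters at the netlink entry point.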

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-10-21 19:56:24

by Paul Moore

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On Fri, Oct 18, 2019 at 9:39 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-09-18 21:22, Richard Guy Briggs wrote:
> > Provide a mechanism similar to CAP_AUDIT_CONTROL to explicitly give a
> > process in a non-init user namespace the capability to set audit
> > container identifiers.
> >
> > Use audit netlink message types AUDIT_GET_CAPCONTID 1027 and
> > AUDIT_SET_CAPCONTID 1028. The message format includes the data
> > structure:
> > struct audit_capcontid_status {
> > pid_t pid;
> > u32 enable;
> > };
>
> Paul, can I get a review of the general idea here to see if you're ok
> with this way of effectively extending CAP_AUDIT_CONTROL for the sake of
> setting contid from beyond the init user namespace where capable() can't
> reach and ns_capable() is meaningless for these purposes?

I think my previous comment about having both the procfs and netlink
interfaces applies here. I don't see why we need two different APIs at
the start; explain to me why procfs isn't sufficient. If the argument
is simply the desire to avoid mounting procfs in the container, how
many container orchestrators can function today without a valid /proc?

--
paul moore
http://www.paul-moore.com

2019-10-21 21:40:00

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On 2019-10-21 15:53, Paul Moore wrote:
> On Fri, Oct 18, 2019 at 9:39 PM Richard Guy Briggs <[email protected]> wrote:
> > On 2019-09-18 21:22, Richard Guy Briggs wrote:
> > > Provide a mechanism similar to CAP_AUDIT_CONTROL to explicitly give a
> > > process in a non-init user namespace the capability to set audit
> > > container identifiers.
> > >
> > > Use audit netlink message types AUDIT_GET_CAPCONTID 1027 and
> > > AUDIT_SET_CAPCONTID 1028. The message format includes the data
> > > structure:
> > > struct audit_capcontid_status {
> > > pid_t pid;
> > > u32 enable;
> > > };
> >
> > Paul, can I get a review of the general idea here to see if you're ok
> > with this way of effectively extending CAP_AUDIT_CONTROL for the sake of
> > setting contid from beyond the init user namespace where capable() can't
> > reach and ns_capable() is meaningless for these purposes?
>
> I think my previous comment about having both the procfs and netlink
> interfaces applies here. I don't see why we need two different APIs at
> the start; explain to me why procfs isn't sufficient. If the argument
> is simply the desire to avoid mounting procfs in the container, how
> many container orchestrators can function today without a valid /proc?

Ok, sorry, I meant to address that question from a previous patch
comment at the same time.

It was raised by Eric Biederman that the proc filesystem interface for
audit had its limitations and he had suggested an audit netlink
interface made more sense.

The intent was to switch to the audit netlink interface for contid,
capcontid and to add the audit netlink interface for loginuid and
sessionid while deprecating the proc interface for loginuid and
sessionid. This was alluded to in the cover letter, but not very clear,
I'm afraid. I have patches to remove the contid and loginuid/sessionid
interfaces in another tree which is why I had forgotten to outline that
plan more explicitly in the cover letter.

> paul moore

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-10-21 21:44:18

by Paul Moore

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On Mon, Oct 21, 2019 at 5:38 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-10-21 15:53, Paul Moore wrote:
> > On Fri, Oct 18, 2019 at 9:39 PM Richard Guy Briggs <[email protected]> wrote:
> > > On 2019-09-18 21:22, Richard Guy Briggs wrote:
> > > > Provide a mechanism similar to CAP_AUDIT_CONTROL to explicitly give a
> > > > process in a non-init user namespace the capability to set audit
> > > > container identifiers.
> > > >
> > > > Use audit netlink message types AUDIT_GET_CAPCONTID 1027 and
> > > > AUDIT_SET_CAPCONTID 1028. The message format includes the data
> > > > structure:
> > > > struct audit_capcontid_status {
> > > > pid_t pid;
> > > > u32 enable;
> > > > };
> > >
> > > Paul, can I get a review of the general idea here to see if you're ok
> > > with this way of effectively extending CAP_AUDIT_CONTROL for the sake of
> > > setting contid from beyond the init user namespace where capable() can't
> > > reach and ns_capable() is meaningless for these purposes?
> >
> > I think my previous comment about having both the procfs and netlink
> > interfaces applies here. I don't see why we need two different APIs at
> > the start; explain to me why procfs isn't sufficient. If the argument
> > is simply the desire to avoid mounting procfs in the container, how
> > many container orchestrators can function today without a valid /proc?
>
> Ok, sorry, I meant to address that question from a previous patch
> comment at the same time.
>
> It was raised by Eric Biederman that the proc filesystem interface for
> audit had its limitations and he had suggested an audit netlink
> interface made more sense.

I'm sure you've got it handy, so I'm going to be lazy and ask: archive
pointer to Eric's comments? Just a heads-up, I'm really *not* a fan
of using the netlink interface for this, so unless Eric presents a
super compelling reason for why we shouldn't use procfs I'm inclined
to stick with /proc.

> The intent was to switch to the audit netlink interface for contid,
> capcontid and to add the audit netlink interface for loginuid and
> sessionid while deprecating the proc interface for loginuid and
> sessionid. This was alluded to in the cover letter, but not very clear,
> I'm afraid. I have patches to remove the contid and loginuid/sessionid
> interfaces in another tree which is why I had forgotten to outline that
> plan more explicitly in the cover letter.

--
paul moore
http://www.paul-moore.com

2019-10-21 23:59:04

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On 2019-10-21 17:43, Paul Moore wrote:
> On Mon, Oct 21, 2019 at 5:38 PM Richard Guy Briggs <[email protected]> wrote:
> > On 2019-10-21 15:53, Paul Moore wrote:
> > > On Fri, Oct 18, 2019 at 9:39 PM Richard Guy Briggs <[email protected]> wrote:
> > > > On 2019-09-18 21:22, Richard Guy Briggs wrote:
> > > > > Provide a mechanism similar to CAP_AUDIT_CONTROL to explicitly give a
> > > > > process in a non-init user namespace the capability to set audit
> > > > > container identifiers.
> > > > >
> > > > > Use audit netlink message types AUDIT_GET_CAPCONTID 1027 and
> > > > > AUDIT_SET_CAPCONTID 1028. The message format includes the data
> > > > > structure:
> > > > > struct audit_capcontid_status {
> > > > > pid_t pid;
> > > > > u32 enable;
> > > > > };
> > > >
> > > > Paul, can I get a review of the general idea here to see if you're ok
> > > > with this way of effectively extending CAP_AUDIT_CONTROL for the sake of
> > > > setting contid from beyond the init user namespace where capable() can't
> > > > reach and ns_capable() is meaningless for these purposes?
> > >
> > > I think my previous comment about having both the procfs and netlink
> > > interfaces apply here. I don't see why we need two different APIs at
> > > the start; explain to me why procfs isn't sufficient. If the argument
> > > is simply the desire to avoid mounting procfs in the container, how
> > > many container orchestrators can function today without a valid /proc?
> >
> > Ok, sorry, I meant to address that question from a previous patch
> > comment at the same time.
> >
> > It was raised by Eric Biederman that the proc filesystem interface for
> > audit had its limitations and he had suggested an audit netlink
> > interface made more sense.
>
> I'm sure you've got it handy, so I'm going to be lazy and ask: archive
> pointer to Eric's comments? Just a heads-up, I'm really *not* a fan
> of using the netlink interface for this, so unless Eric presents a
> super compelling reason for why we shouldn't use procfs I'm inclined
> to stick with /proc.

It was actually a video call with Eric and Steve where that was
recommended, so I can't provide you with any first-hand communication
about it. I'll get more details...

So, with that out of the way, could you please comment on the general
idea of what was intended to be the central idea of this mechanism to be
able to nest containers beyond the initial user namespace (knowing that
a /proc interface is available and the audit netlink interface isn't
necessary for it to work and the latter can be easily removed)?

> > The intent was to switch to the audit netlink interface for contid,
> > capcontid and to add the audit netlink interface for loginuid and
> > sessionid while deprecating the proc interface for loginuid and
> > sessionid. This was alluded to in the cover letter, but not very clear,
> > I'm afraid. I have patches to remove the contid and loginuid/sessionid
> > interfaces in another tree which is why I had forgotten to outline that
> > plan more explicitly in the cover letter.
>
> paul moore

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-10-22 00:32:25

by Paul Moore

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On Mon, Oct 21, 2019 at 7:58 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-10-21 17:43, Paul Moore wrote:
> > On Mon, Oct 21, 2019 at 5:38 PM Richard Guy Briggs <[email protected]> wrote:
> > > On 2019-10-21 15:53, Paul Moore wrote:
> > > > On Fri, Oct 18, 2019 at 9:39 PM Richard Guy Briggs <[email protected]> wrote:
> > > > > On 2019-09-18 21:22, Richard Guy Briggs wrote:
> > > > > > Provide a mechanism similar to CAP_AUDIT_CONTROL to explicitly give a
> > > > > > process in a non-init user namespace the capability to set audit
> > > > > > container identifiers.
> > > > > >
> > > > > > Use audit netlink message types AUDIT_GET_CAPCONTID 1027 and
> > > > > > AUDIT_SET_CAPCONTID 1028. The message format includes the data
> > > > > > structure:
> > > > > > struct audit_capcontid_status {
> > > > > > pid_t pid;
> > > > > > u32 enable;
> > > > > > };
> > > > >
> > > > > Paul, can I get a review of the general idea here to see if you're ok
> > > > > with this way of effectively extending CAP_AUDIT_CONTROL for the sake of
> > > > > setting contid from beyond the init user namespace where capable() can't
> > > > > reach and ns_capable() is meaningless for these purposes?
> > > >
> > > > I think my previous comment about having both the procfs and netlink
> > > > interfaces apply here. I don't see why we need two different APIs at
> > > > the start; explain to me why procfs isn't sufficient. If the argument
> > > > is simply the desire to avoid mounting procfs in the container, how
> > > > many container orchestrators can function today without a valid /proc?
> > >
> > > Ok, sorry, I meant to address that question from a previous patch
> > > comment at the same time.
> > >
> > > It was raised by Eric Biederman that the proc filesystem interface for
> > > audit had its limitations and he had suggested an audit netlink
> > > interface made more sense.
> >
> > I'm sure you've got it handy, so I'm going to be lazy and ask: archive
> > pointer to Eric's comments? Just a heads-up, I'm really *not* a fan
> > of using the netlink interface for this, so unless Eric presents a
> > super compelling reason for why we shouldn't use procfs I'm inclined
> > to stick with /proc.
>
> It was actually a video call with Eric and Steve where that was
> recommended, so I can't provide you with any first-hand communication
> about it. I'll get more details...

Yeah, that sort of information really needs to be on the list.

> So, with that out of the way, could you please comment on the general
> idea of what was intended to be the central idea of this mechanism to be
> able to nest containers beyond the initial user namespace (knowing that
> a /proc interface is available and the audit netlink interface isn't
> necessary for it to work and the latter can be easily removed)?

I'm not entirely clear what you are asking about. Are you asking why I
care about nesting container orchestrators? Simply put, it is not
uncommon for the LXC/LXD folks to see nested container orchestrators,
so I felt it was important to support that use case. When we
originally started this effort we probably should have done a better
job reaching out to the LXC/LXD folks; we might have caught this
earlier. Regardless, we caught it, and it looks like we are on our
way to supporting it (that's good).

Are you asking why I prefer the procfs approach to setting/getting the
audit container ID? For one, it makes it easier for an LSM to enforce
the audit container ID operations independent of the other audit
control APIs. It also provides a simpler interface for container
orchestrators. Both seem like desirable traits as far as I'm
concerned.
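
To illustrate the "simpler interface" point: with a procfs file, an
orchestrator needs little more than open() and write(). A minimal
sketch follows, assuming the kernel accepts the identifier as decimal
text. The helper write_contid() is hypothetical, and the per-task file
name is not given in this thread, so the path is left to the caller.

```c
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a 64-bit audit container ID, as decimal
 * text, to the file at 'path'. On a kernel carrying this patch set the
 * path would be a per-task procfs file (exact name assumed, not shown
 * in this thread).
 * Returns 0 on success, -1 on failure.
 */
static int write_contid(const char *path, uint64_t contid)
{
	char buf[32];
	int len = snprintf(buf, sizeof(buf), "%" PRIu64, contid);
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	/* A single write() carries the whole value. */
	n = write(fd, buf, (size_t)len);
	close(fd);
	return n == len ? 0 : -1;
}
```

An orchestrator would pass the target task's procfs path. Because the
operation is an ordinary write to one file, an LSM could mediate it
separately from the other audit controls.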

> > > The intent was to switch to the audit netlink interface for contid,
> > > capcontid and to add the audit netlink interface for loginuid and
> > > sessionid while deprecating the proc interface for loginuid and
> > > sessionid. This was alluded to in the cover letter, but not very clear,
> > > I'm afraid. I have patches to remove the contid and loginuid/sessionid
> > > interfaces in another tree which is why I had forgotten to outline that
> > > plan more explicitly in the cover letter.

--
paul moore
http://www.paul-moore.com

2019-10-22 13:33:29

by Neil Horman

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On Mon, Oct 21, 2019 at 08:31:37PM -0400, Paul Moore wrote:
> On Mon, Oct 21, 2019 at 7:58 PM Richard Guy Briggs <[email protected]> wrote:
> > On 2019-10-21 17:43, Paul Moore wrote:
> > > On Mon, Oct 21, 2019 at 5:38 PM Richard Guy Briggs <[email protected]> wrote:
> > > > On 2019-10-21 15:53, Paul Moore wrote:
> > > > > On Fri, Oct 18, 2019 at 9:39 PM Richard Guy Briggs <[email protected]> wrote:
> > > > > > On 2019-09-18 21:22, Richard Guy Briggs wrote:
> > > > > > > Provide a mechanism similar to CAP_AUDIT_CONTROL to explicitly give a
> > > > > > > process in a non-init user namespace the capability to set audit
> > > > > > > container identifiers.
> > > > > > >
> > > > > > > Use audit netlink message types AUDIT_GET_CAPCONTID 1027 and
> > > > > > > AUDIT_SET_CAPCONTID 1028. The message format includes the data
> > > > > > > structure:
> > > > > > > struct audit_capcontid_status {
> > > > > > > pid_t pid;
> > > > > > > u32 enable;
> > > > > > > };
> > > > > >
> > > > > > Paul, can I get a review of the general idea here to see if you're ok
> > > > > > with this way of effectively extending CAP_AUDIT_CONTROL for the sake of
> > > > > > setting contid from beyond the init user namespace where capable() can't
> > > > > > reach and ns_capable() is meaningless for these purposes?
> > > > >
> > > > > I think my previous comment about having both the procfs and netlink
> > > > > interfaces apply here. I don't see why we need two different APIs at
> > > > > the start; explain to me why procfs isn't sufficient. If the argument
> > > > > is simply the desire to avoid mounting procfs in the container, how
> > > > > many container orchestrators can function today without a valid /proc?
> > > >
> > > > Ok, sorry, I meant to address that question from a previous patch
> > > > comment at the same time.
> > > >
> > > > It was raised by Eric Biederman that the proc filesystem interface for
> > > > audit had its limitations and he had suggested an audit netlink
> > > > interface made more sense.
> > >
> > > I'm sure you've got it handy, so I'm going to be lazy and ask: archive
> > > pointer to Eric's comments? Just a heads-up, I'm really *not* a fan
> > > of using the netlink interface for this, so unless Eric presents a
> > > super compelling reason for why we shouldn't use procfs I'm inclined
> > > to stick with /proc.
> >
> > It was actually a video call with Eric and Steve where that was
> > recommended, so I can't provide you with any first-hand communication
> > about it. I'll get more details...
>
> Yeah, that sort of information really needs to be on the list.
>
> > So, with that out of the way, could you please comment on the general
> > idea of what was intended to be the central idea of this mechanism to be
> > able to nest containers beyond the initial user namespace (knowing that
> > a /proc interface is available and the audit netlink interface isn't
> > necessary for it to work and the latter can be easily removed)?
>
> I'm not entirely clear what you are asking about, are you asking why I
> care about nesting container orchestrators? Simply put, it is not
> uncommon for the LXC/LXD folks to see nested container orchestrators,
> so I felt it was important to support that use case. When we
> originally started this effort we probably should have done a better
> job reaching out to the LXC/LXD folks, we may have caught this
> earlier. Regardless, we caught it, and it looks like we are on our
> way to supporting it (that's good).
>
> Are you asking why I prefer the procfs approach to setting/getting the
> audit container ID? For one, it makes it easier for a LSM to enforce
> the audit container ID operations independent of the other audit
> control APIs. It also provides a simpler interface for container
> orchestrators. Both seem like desirable traits as far as I'm
> concerned.
>
I agree that one api is probably the best approach here, but I actually
think that the netlink interface is the more flexible approach. It's a
little more work for userspace (you have to marshal your data into a
netlink message before sending it, and wait for an async response), but
that's a well-known pattern, and it provides significantly more
flexibility for the kernel. LSM already has a hook to audit netlink
messages in sock_sendmsg, so that's not a problem, and if you use
netlink, you get the advantage of being able to broadcast messages
within your network namespaces, facilitating any needed orchestrator
co-ordination. To do the same thing with a filesystem api, you need to
use the fanotify api, which IIRC doesn't work on proc.
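
For illustration, the marshalling step mentioned above could look
roughly like the sketch below. It only builds the request in a buffer
and does not open a socket. The message type value and the struct
layout are taken from the patch description quoted earlier in this
thread; they are proposed by this patch set and are not assumed to
exist in any released kernel header. The helper name
build_capcontid_set() and its parameters are hypothetical.

```c
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Message types as quoted in the patch description (proposed values). */
#define AUDIT_GET_CAPCONTID 1027
#define AUDIT_SET_CAPCONTID 1028

/* Payload layout from the patch description (u32 spelled uint32_t). */
struct audit_capcontid_status {
	pid_t    pid;
	uint32_t enable;
};

/*
 * Hypothetical helper: marshal a SET request into 'buf', which must be
 * suitably aligned for struct nlmsghdr.
 * Returns the aligned message size, or 0 if 'buf' is too small.
 */
static size_t build_capcontid_set(void *buf, size_t buflen, pid_t pid,
				  uint32_t enable, uint32_t seq)
{
	struct audit_capcontid_status s = { .pid = pid, .enable = enable };
	size_t space = NLMSG_SPACE(sizeof(s));
	struct nlmsghdr *nlh = buf;

	if (buflen < space)
		return 0;
	memset(buf, 0, space);
	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(s));
	nlh->nlmsg_type = AUDIT_SET_CAPCONTID;
	/* Ask the kernel for an acknowledgement (the async response). */
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	nlh->nlmsg_seq = seq;
	memcpy(NLMSG_DATA(nlh), &s, sizeof(s));
	return space;
}
```

The caller would then send the buffer on a NETLINK_AUDIT socket and
wait for the acknowledgement, which is the extra userspace work
compared with a single write to a file.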

Neil

> > > > The intent was to switch to the audit netlink interface for contid,
> > > > capcontid and to add the audit netlink interface for loginuid and
> > > > sessionid while deprecating the proc interface for loginuid and
> > > > sessionid. This was alluded to in the cover letter, but not very clear,
> > > > I'm afraid. I have patches to remove the contid and loginuid/sessionid
> > > > interfaces in another tree which is why I had forgotten to outline that
> > > > plan more explicitly in the cover letter.
>
> --
> paul moore
> http://www.paul-moore.com
>

2019-10-22 14:15:37

by Paul Moore

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On Tue, Oct 22, 2019 at 8:13 AM Neil Horman <[email protected]> wrote:
> On Mon, Oct 21, 2019 at 08:31:37PM -0400, Paul Moore wrote:
> > On Mon, Oct 21, 2019 at 7:58 PM Richard Guy Briggs <[email protected]> wrote:
> > > On 2019-10-21 17:43, Paul Moore wrote:
> > > > On Mon, Oct 21, 2019 at 5:38 PM Richard Guy Briggs <[email protected]> wrote:
> > > > > On 2019-10-21 15:53, Paul Moore wrote:
> > > > > > On Fri, Oct 18, 2019 at 9:39 PM Richard Guy Briggs <[email protected]> wrote:
> > > > > > > On 2019-09-18 21:22, Richard Guy Briggs wrote:
> > > > > > > > Provide a mechanism similar to CAP_AUDIT_CONTROL to explicitly give a
> > > > > > > > process in a non-init user namespace the capability to set audit
> > > > > > > > container identifiers.
> > > > > > > >
> > > > > > > > Use audit netlink message types AUDIT_GET_CAPCONTID 1027 and
> > > > > > > > AUDIT_SET_CAPCONTID 1028. The message format includes the data
> > > > > > > > structure:
> > > > > > > > struct audit_capcontid_status {
> > > > > > > > pid_t pid;
> > > > > > > > u32 enable;
> > > > > > > > };
> > > > > > >
> > > > > > > Paul, can I get a review of the general idea here to see if you're ok
> > > > > > > with this way of effectively extending CAP_AUDIT_CONTROL for the sake of
> > > > > > > setting contid from beyond the init user namespace where capable() can't
> > > > > > > reach and ns_capable() is meaningless for these purposes?
> > > > > >
> > > > > > I think my previous comment about having both the procfs and netlink
> > > > > > interfaces apply here. I don't see why we need two different APIs at
> > > > > > the start; explain to me why procfs isn't sufficient. If the argument
> > > > > > is simply the desire to avoid mounting procfs in the container, how
> > > > > > many container orchestrators can function today without a valid /proc?
> > > > >
> > > > > Ok, sorry, I meant to address that question from a previous patch
> > > > > comment at the same time.
> > > > >
> > > > > It was raised by Eric Biederman that the proc filesystem interface for
> > > > > audit had its limitations and he had suggested an audit netlink
> > > > > interface made more sense.
> > > >
> > > > I'm sure you've got it handy, so I'm going to be lazy and ask: archive
> > > > pointer to Eric's comments? Just a heads-up, I'm really *not* a fan
> > > > of using the netlink interface for this, so unless Eric presents a
> > > > super compelling reason for why we shouldn't use procfs I'm inclined
> > > > to stick with /proc.
> > >
> > > It was actually a video call with Eric and Steve where that was
> > > recommended, so I can't provide you with any first-hand communication
> > > about it. I'll get more details...
> >
> > Yeah, that sort of information really needs to be on the list.
> >
> > > So, with that out of the way, could you please comment on the general
> > > idea of what was intended to be the central idea of this mechanism to be
> > > able to nest containers beyond the initial user namespace (knowing that
> > > a /proc interface is available and the audit netlink interface isn't
> > > necessary for it to work and the latter can be easily removed)?
> >
> > I'm not entirely clear what you are asking about, are you asking why I
> > care about nesting container orchestrators? Simply put, it is not
> > uncommon for the LXC/LXD folks to see nested container orchestrators,
> > so I felt it was important to support that use case. When we
> > originally started this effort we probably should have done a better
> > job reaching out to the LXC/LXD folks, we may have caught this
> > earlier. Regardless, we caught it, and it looks like we are on our
> > way to supporting it (that's good).
> >
> > Are you asking why I prefer the procfs approach to setting/getting the
> > audit container ID? For one, it makes it easier for a LSM to enforce
> > the audit container ID operations independent of the other audit
> > control APIs. It also provides a simpler interface for container
> > orchestrators. Both seem like desirable traits as far as I'm
> > concerned.
> >
> I agree that one api is probably the best approach here, but I actually
> think that the netlink interface is the more flexible approach. It's a
> little more work for userspace (you have to marshal your data into a
> netlink message before sending it, and wait for an async response), but
> that's a well-known pattern, and it provides significantly more
> flexibility for the kernel. LSM already has a hook to audit netlink
> messages in sock_sendmsg, so that's not a problem ...

Look closely at how the LSM controls for netlink work and you'll see a
number of problems; basically, command-level granularity is hard. On
the other hand, per-file granularity is easy.

> ... and if you use
> netlink, you get the advantage of being able to broadcast messages
> within your network namespaces, facilitating any needed orchestrator
> co-ordination.

Please don't read this comment as support of the netlink approach, but
I don't think we want to use multicast netlink; we would want it
to be more of a client/server model so that we could enforce access
controls a bit more easily. Besides, is this even a use case?

> To do the same thing with a filesystem api, you need to
> use the fanotify api, which IIRC doesn't work on proc.

Once again, I'm not sure this is a problem we are trying to solve
(broadcasting audit container ID across multiple tasks), is it?
Access to the audit container ID in userspace is something I've always
thought needs to be tightly controlled to prevent abuse.

--
paul moore
http://www.paul-moore.com

2019-10-22 14:29:11

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On 2019-10-21 20:31, Paul Moore wrote:
> On Mon, Oct 21, 2019 at 7:58 PM Richard Guy Briggs <[email protected]> wrote:
> > On 2019-10-21 17:43, Paul Moore wrote:
> > > On Mon, Oct 21, 2019 at 5:38 PM Richard Guy Briggs <[email protected]> wrote:
> > > > On 2019-10-21 15:53, Paul Moore wrote:
> > > > > On Fri, Oct 18, 2019 at 9:39 PM Richard Guy Briggs <[email protected]> wrote:
> > > > > > On 2019-09-18 21:22, Richard Guy Briggs wrote:
> > > > > > > Provide a mechanism similar to CAP_AUDIT_CONTROL to explicitly give a
> > > > > > > process in a non-init user namespace the capability to set audit
> > > > > > > container identifiers.
> > > > > > >
> > > > > > > Use audit netlink message types AUDIT_GET_CAPCONTID 1027 and
> > > > > > > AUDIT_SET_CAPCONTID 1028. The message format includes the data
> > > > > > > structure:
> > > > > > > struct audit_capcontid_status {
> > > > > > > pid_t pid;
> > > > > > > u32 enable;
> > > > > > > };
> > > > > >
> > > > > > Paul, can I get a review of the general idea here to see if you're ok
> > > > > > with this way of effectively extending CAP_AUDIT_CONTROL for the sake of
> > > > > > setting contid from beyond the init user namespace where capable() can't
> > > > > > reach and ns_capable() is meaningless for these purposes?
> > > > >
> > > > > I think my previous comment about having both the procfs and netlink
> > > > > interfaces apply here. I don't see why we need two different APIs at
> > > > > the start; explain to me why procfs isn't sufficient. If the argument
> > > > > is simply the desire to avoid mounting procfs in the container, how
> > > > > many container orchestrators can function today without a valid /proc?
> > > >
> > > > Ok, sorry, I meant to address that question from a previous patch
> > > > comment at the same time.
> > > >
> > > > It was raised by Eric Biederman that the proc filesystem interface for
> > > > audit had its limitations and he had suggested an audit netlink
> > > > interface made more sense.
> > >
> > > I'm sure you've got it handy, so I'm going to be lazy and ask: archive
> > > pointer to Eric's comments? Just a heads-up, I'm really *not* a fan
> > > of using the netlink interface for this, so unless Eric presents a
> > > super compelling reason for why we shouldn't use procfs I'm inclined
> > > to stick with /proc.
> >
> > It was actually a video call with Eric and Steve where that was
> > recommended, so I can't provide you with any first-hand communication
> > about it. I'll get more details...
>
> Yeah, that sort of information really needs to be on the list.
>
> > So, with that out of the way, could you please comment on the general
> > idea of what was intended to be the central idea of this mechanism to be
> > able to nest containers beyond the initial user namespace (knowing that
> > a /proc interface is available and the audit netlink interface isn't
> > necessary for it to work and the latter can be easily removed)?
>
> I'm not entirely clear what you are asking about, are you asking why I
> care about nesting container orchestrators? Simply put, it is not
> uncommon for the LXC/LXD folks to see nested container orchestrators,
> so I felt it was important to support that use case. When we
> originally started this effort we probably should have done a better
> job reaching out to the LXC/LXD folks, we may have caught this
> earlier. Regardless, we caught it, and it looks like we are on our
> way to supporting it (that's good).

I'm not asking why you care about container orchestrators.

> Are you asking why I prefer the procfs approach to setting/getting the
> audit container ID? For one, it makes it easier for a LSM to enforce
> the audit container ID operations independent of the other audit
> control APIs. It also provides a simpler interface for container
> orchestrators. Both seem like desirable traits as far as I'm
> concerned.

I'd like to leave the proc/netlink decision/debate out of this
discussion, though it does need to happen and I was hoping that would
happen on the loginuid/sessionid proc/netlink patch thread.

I'd like your perspective on how the capcontid feature was implemented
(aside from the proc/netlink api issue which was intended to be
consistent across loginuid/sessionid/contid/capcontid). Do you see this
feature as potentially solving the nested container issue in child user
namespaces?

> > > > The intent was to switch to the audit netlink interface for contid,
> > > > capcontid and to add the audit netlink interface for loginuid and
> > > > sessionid while deprecating the proc interface for loginuid and
> > > > sessionid. This was alluded to in the cover letter, but not very clear,
> > > > I'm afraid. I have patches to remove the contid and loginuid/sessionid
> > > > interfaces in another tree which is why I had forgotten to outline that
> > > > plan more explicitly in the cover letter.
>
> paul moore

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-10-22 14:38:53

by Paul Moore

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On Tue, Oct 22, 2019 at 10:27 AM Richard Guy Briggs <[email protected]> wrote:
> I'd like your perspective on how the capcontid feature was implemented
> (aside from the proc/netlink api issue which was intended to be
> consistent across loginuid/sessionid/contid/capcontid). Do you see this
> feature as potentially solving the nested container issue in child user
> namespaces?

The patchset is a bit messy at this point in the stack due to the
"fixup!" confusion and a few other things which I already mentioned, so
I don't really want to comment too much on that until I can see
everything in a reasonable patch stack. Let's leave that for the next
draft.

--
paul moore
http://www.paul-moore.com

2019-10-22 23:50:54

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On 2019-10-22 08:13, Neil Horman wrote:
> On Mon, Oct 21, 2019 at 08:31:37PM -0400, Paul Moore wrote:
> > On Mon, Oct 21, 2019 at 7:58 PM Richard Guy Briggs <[email protected]> wrote:
> > > On 2019-10-21 17:43, Paul Moore wrote:
> > > > On Mon, Oct 21, 2019 at 5:38 PM Richard Guy Briggs <[email protected]> wrote:
> > > > > On 2019-10-21 15:53, Paul Moore wrote:
> > > > > > On Fri, Oct 18, 2019 at 9:39 PM Richard Guy Briggs <[email protected]> wrote:
> > > > > > > On 2019-09-18 21:22, Richard Guy Briggs wrote:
> > > > > > > > Provide a mechanism similar to CAP_AUDIT_CONTROL to explicitly give a
> > > > > > > > process in a non-init user namespace the capability to set audit
> > > > > > > > container identifiers.
> > > > > > > >
> > > > > > > > Use audit netlink message types AUDIT_GET_CAPCONTID 1027 and
> > > > > > > > AUDIT_SET_CAPCONTID 1028. The message format includes the data
> > > > > > > > structure:
> > > > > > > > struct audit_capcontid_status {
> > > > > > > > pid_t pid;
> > > > > > > > u32 enable;
> > > > > > > > };
> > > > > > >
> > > > > > > Paul, can I get a review of the general idea here to see if you're ok
> > > > > > > with this way of effectively extending CAP_AUDIT_CONTROL for the sake of
> > > > > > > setting contid from beyond the init user namespace where capable() can't
> > > > > > > reach and ns_capable() is meaningless for these purposes?
> > > > > >
> > > > > > I think my previous comment about having both the procfs and netlink
> > > > > > interfaces apply here. I don't see why we need two different APIs at
> > > > > > the start; explain to me why procfs isn't sufficient. If the argument
> > > > > > is simply the desire to avoid mounting procfs in the container, how
> > > > > > many container orchestrators can function today without a valid /proc?
> > > > >
> > > > > Ok, sorry, I meant to address that question from a previous patch
> > > > > comment at the same time.
> > > > >
> > > > > It was raised by Eric Biederman that the proc filesystem interface for
> > > > > audit had its limitations and he had suggested an audit netlink
> > > > > interface made more sense.
> > > >
> > > > I'm sure you've got it handy, so I'm going to be lazy and ask: archive
> > > > pointer to Eric's comments? Just a heads-up, I'm really *not* a fan
> > > > of using the netlink interface for this, so unless Eric presents a
> > > > super compelling reason for why we shouldn't use procfs I'm inclined
> > > > to stick with /proc.
> > >
> > > It was actually a video call with Eric and Steve where that was
> > > recommended, so I can't provide you with any first-hand communication
> > > about it. I'll get more details...
> >
> > Yeah, that sort of information really needs to be on the list.
> >
> > > So, with that out of the way, could you please comment on the general
> > > idea of what was intended to be the central idea of this mechanism to be
> > > able to nest containers beyond the initial user namespace (knowing that
> > > a /proc interface is available and the audit netlink interface isn't
> > > necessary for it to work and the latter can be easily removed)?
> >
> > I'm not entirely clear what you are asking about, are you asking why I
> > care about nesting container orchestrators? Simply put, it is not
> > uncommon for the LXC/LXD folks to see nested container orchestrators,
> > so I felt it was important to support that use case. When we
> > originally started this effort we probably should have done a better
> > job reaching out to the LXC/LXD folks, we may have caught this
> > earlier. Regardless, we caught it, and it looks like we are on our
> > way to supporting it (that's good).
> >
> > Are you asking why I prefer the procfs approach to setting/getting the
> > audit container ID? For one, it makes it easier for a LSM to enforce
> > the audit container ID operations independent of the other audit
> > control APIs. It also provides a simpler interface for container
> > orchestrators. Both seem like desirable traits as far as I'm
> > concerned.
>
> I agree that one api is probably the best approach here, but I actually
> think that the netlink interface is the more flexible approach. It's a
> little more work for userspace (you have to marshal your data into a
> netlink message before sending it, and wait for an async response), but
> that's a well-known pattern, and it provides significantly more
> flexibility for the kernel. LSM already has a hook to audit netlink
> messages in sock_sendmsg, so that's not a problem, and if you use
> netlink, you get the advantage of being able to broadcast messages
> within your network namespaces, facilitating any needed orchestrator
> co-ordination. To do the same thing with a filesystem api, you need to
> use the fanotify api, which IIRC doesn't work on proc.

One api was the intent, deprecating proc for loginuid and sessionid if
netlink was the chosen way to go.

I don't think we had discussed the possibility or need to use netlink
multicast for this purpose, and I see it as a liability for limiting
access to only those processes that need it.

> Neil
>
> > > > > The intent was to switch to the audit netlink interface for contid,
> > > > > capcontid and to add the audit netlink interface for loginuid and
> > > > > sessionid while deprecating the proc interface for loginuid and
> > > > > sessionid. This was alluded to in the cover letter, but not very clear,
> > > > > I'm afraid. I have patches to remove the contid and loginuid/sessionid
> > > > > interfaces in another tree which is why I had forgotten to outline that
> > > > > plan more explicitly in the cover letter.
> >
> > paul moore

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-10-25 19:07:34

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On 2019-10-21 20:31, Paul Moore wrote:
> On Mon, Oct 21, 2019 at 7:58 PM Richard Guy Briggs <[email protected]> wrote:
> > On 2019-10-21 17:43, Paul Moore wrote:
> > > On Mon, Oct 21, 2019 at 5:38 PM Richard Guy Briggs <[email protected]> wrote:
> > > > On 2019-10-21 15:53, Paul Moore wrote:
> > > > > On Fri, Oct 18, 2019 at 9:39 PM Richard Guy Briggs <[email protected]> wrote:
> > > > > > On 2019-09-18 21:22, Richard Guy Briggs wrote:
> > > > > > > Provide a mechanism similar to CAP_AUDIT_CONTROL to explicitly give a
> > > > > > > process in a non-init user namespace the capability to set audit
> > > > > > > container identifiers.
> > > > > > >
> > > > > > > Use audit netlink message types AUDIT_GET_CAPCONTID 1027 and
> > > > > > > AUDIT_SET_CAPCONTID 1028. The message format includes the data
> > > > > > > structure:
> > > > > > > struct audit_capcontid_status {
> > > > > > > pid_t pid;
> > > > > > > u32 enable;
> > > > > > > };
> > > > > >
> > > > > > Paul, can I get a review of the general idea here to see if you're ok
> > > > > > with this way of effectively extending CAP_AUDIT_CONTROL for the sake of
> > > > > > setting contid from beyond the init user namespace where capable() can't
> > > > > > reach and ns_capable() is meaningless for these purposes?
> > > > >
> > > > > I think my previous comment about having both the procfs and netlink
> > > > > interfaces apply here. I don't see why we need two different APIs at
> > > > > the start; explain to me why procfs isn't sufficient. If the argument
> > > > > is simply the desire to avoid mounting procfs in the container, how
> > > > > many container orchestrators can function today without a valid /proc?
> > > >
> > > > Ok, sorry, I meant to address that question from a previous patch
> > > > comment at the same time.
> > > >
> > > > It was raised by Eric Biederman that the proc filesystem interface for
> > > > audit had its limitations and he had suggested an audit netlink
> > > > interface made more sense.
> > >
> > > I'm sure you've got it handy, so I'm going to be lazy and ask: archive
> > > pointer to Eric's comments? Just a heads-up, I'm really *not* a fan
> > > of using the netlink interface for this, so unless Eric presents a
> > > super compelling reason for why we shouldn't use procfs I'm inclined
> > > to stick with /proc.
> >
> > It was actually a video call with Eric and Steve where that was
> > recommended, so I can't provide you with any first-hand communication
> > about it. I'll get more details...
>
> Yeah, that sort of information really needs to be on the list.

Here's the note I had from that meeting:

- Eric raised the issue that using /proc is likely to get more and more
hoary due to mount namespaces and suggested that we use a netlink
audit message (or a new syscall) to set the audit container identifier
and since the loginuid is a similar type of operation, that it should be
migrated over to a similar mechanism to get it away from /proc. Get
could be done with a netlink audit message that triggers an audit log
message to deliver the information. I'm reluctant to further pollute
the syscall space if we can find another method. The netlink audit
message makes sense since any audit-enabled service is likely to already
have an audit socket open.

I don't have more detailed notes about what Eric said specifically.

> > So, with that out of the way, could you please comment on the general
> > idea of what was intended to be the central idea of this mechanism to be
> > able to nest containers beyond the initial user namespace (knowing that
> > a /proc interface is available and the audit netlink interface isn't
> > necessary for it to work and the latter can be easily removed)?
>
> I'm not entirely clear what you are asking about, are you asking why I
> care about nesting container orchestrators? Simply put, it is not
> uncommon for the LXC/LXD folks to see nested container orchestrators,
> so I felt it was important to support that use case. When we
> originally started this effort we probably should have done a better
> job reaching out to the LXC/LXD folks, we may have caught this
> earlier. Regardless, we caught it, and it looks like we are on our
> way to supporting it (that's good).
>
> Are you asking why I prefer the procfs approach to setting/getting the
> audit container ID? For one, it makes it easier for a LSM to enforce
> the audit container ID operations independent of the other audit
> control APIs. It also provides a simpler interface for container
> orchestrators. Both seem like desirable traits as far as I'm
> concerned.
>
> > > > The intent was to switch to the audit netlink interface for contid,
> > > > capcontid and to add the audit netlink interface for loginuid and
> > > > sessionid while deprecating the proc interface for loginuid and
> > > > sessionid. This was alluded to in the cover letter, but not very clear,
> > > > I'm afraid. I have patches to remove the contid and loginuid/sessionid
> > > > interfaces in another tree which is why I had forgotten to outline that
> > > > plan more explicitly in the cover letter.
>
> --
> paul moore
> http://www.paul-moore.com

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-10-25 19:08:14

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 06/21] audit: contid limit of 32k imposed to avoid DoS

On 2019-10-10 20:38, Paul Moore wrote:
> On Fri, Sep 27, 2019 at 8:52 AM Neil Horman <[email protected]> wrote:
> > On Wed, Sep 18, 2019 at 09:22:23PM -0400, Richard Guy Briggs wrote:
> > > Set an arbitrary limit on the number of audit container identifiers to
> > > limit abuse.
> > >
> > > Signed-off-by: Richard Guy Briggs <[email protected]>
> > > ---
> > > kernel/audit.c | 8 ++++++++
> > > kernel/audit.h | 4 ++++
> > > 2 files changed, 12 insertions(+)
> > >
> > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > index 53d13d638c63..329916534dd2 100644
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
>
> ...
>
> > > @@ -2465,6 +2472,7 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> > > newcont->owner = current;
> > > refcount_set(&newcont->refcount, 1);
> > > list_add_rcu(&newcont->list, &audit_contid_hash[h]);
> > > + audit_contid_count++;
> > > } else {
> > > rc = -ENOMEM;
> > > goto conterror;
> > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > index 162de8366b32..543f1334ba47 100644
> > > --- a/kernel/audit.h
> > > +++ b/kernel/audit.h
> > > @@ -219,6 +219,10 @@ static inline int audit_hash_contid(u64 contid)
> > > return (contid & (AUDIT_CONTID_BUCKETS-1));
> > > }
> > >
> > > +extern int audit_contid_count;
> > > +
> > > +#define AUDIT_CONTID_COUNT 1 << 16
> > > +
> >
> > Just to ask the question, since it wasn't clear in the changelog, what
> > abuse are you avoiding here? Ostensibly you should be able to create as
> > many container ids as you have space for, and the simple creation of
> > container ids doesn't seem like the resource strain I would be concerned
> > about here, given that an orchestrator can still create as many
> > containers as the system will otherwise allow, which will consume
> > significantly more ram/disk/etc.
>
> I've got a similar question. Up to this point in the patchset, there
> is a potential issue of hash bucket chain lengths and traversing them
> with a spinlock held, but it seems like we shouldn't be putting an
> arbitrary limit on audit container IDs unless we have a good reason
> for it. If for some reason we do want to enforce a limit, it should
> probably be a tunable value like a sysctl, or similar.

Can you separate and clarify the concerns here?

I plan to move this patch to the end of the patchset and make it
optional, possibly adding a tuning mechanism. Like the migration from
/proc to netlink for loginuid/sessionid/contid/capcontid, this was Eric
Biederman's concern and suggested mitigation.

As for the first issue of the bucket chain length traversal while
holding the list spin-lock, would you prefer to use the rcu lock to
traverse the list and then only hold the spin-lock when modifying the
list, and possibly even make the spin-lock more fine-grained per list?
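
As an illustration of the finer-grained option raised above, here is a
minimal userspace sketch of one lock per hash chain. It does not model
RCU at all; readers and writers both take the per-bucket lock here. Every
name in it (model_hash, model_insert, MODEL_BUCKETS, ...) is hypothetical
and is not taken from the patchset.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_BUCKETS 32	/* assumed power of two, for the mask-based hash */

struct model_cont {
	struct model_cont *next;
	uint64_t id;
};

struct model_bucket {
	atomic_flag lock;	/* one small spin-lock per chain */
	struct model_cont *head;
};

static struct model_bucket model_hash[MODEL_BUCKETS];

static void model_hash_init(void)
{
	/* the mask in model_bucket_for() only works for powers of two */
	assert((MODEL_BUCKETS & (MODEL_BUCKETS - 1)) == 0);
	for (size_t i = 0; i < MODEL_BUCKETS; i++) {
		atomic_flag_clear(&model_hash[i].lock);
		model_hash[i].head = NULL;
	}
}

static struct model_bucket *model_bucket_for(uint64_t id)
{
	return &model_hash[id & (MODEL_BUCKETS - 1)];
}

static void model_lock(struct model_bucket *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void model_unlock(struct model_bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Insert id if it is not already present; returns true if inserted. */
static bool model_insert(uint64_t id)
{
	struct model_bucket *b = model_bucket_for(id);
	struct model_cont *c, *n;
	bool inserted = false;

	n = malloc(sizeof(*n));	/* allocate before taking the lock */
	if (!n)
		return false;
	n->id = id;

	model_lock(b);		/* only this chain is locked */
	for (c = b->head; c; c = c->next)
		if (c->id == id)
			break;
	if (!c) {
		n->next = b->head;
		b->head = n;
		inserted = true;
	}
	model_unlock(b);

	if (!inserted)
		free(n);
	return inserted;
}
```

With this shape, two setters working on contids that hash to different
chains never contend, and the length of any other chain no longer affects
how long a given lock is held.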

> paul moore

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-10-25 19:10:24

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 14/21] audit: contid check descendancy and nesting

On 2019-10-10 20:40, Paul Moore wrote:
> On Wed, Sep 18, 2019 at 9:26 PM Richard Guy Briggs <[email protected]> wrote:
> > ?fixup! audit: convert to contid list to check for orch/engine ownership
>
> ?
>
> > Require the target task to be a descendant of the container
> > orchestrator/engine.
> >
> > You would only change the audit container ID from one set or inherited
> > value to another if you were nesting containers.
> >
> > If changing the contid, the container orchestrator/engine must be a
> > descendant and not same orchestrator as the one that set it so it is not
> > possible to change the contid of another orchestrator's container.
>
> Did you mean to say that the container orchestrator must be an
> ancestor of the target, and the same orchestrator as the one that set
> the target process' audit container ID?

Not quite. The first half, yes. As for the second half: if it was
already set by that orchestrator, it can't be set again. If it is a
different orchestrator that is a descendant of the orchestrator that set
it, then allow the action.

> Or maybe I'm missing something about what you are trying to do?

Does that help clarify it?
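
The rule as worded in this reply can be modelled in a few lines of
userspace C. This is only a sketch of the prose above, under stated
assumptions: the model_* names and the struct are hypothetical, and the
final descendant-of-owner check follows the wording of the reply, whereas
the quoted hunk below only shows the comparison against the owner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct model_task {
	struct model_task *parent;	/* NULL at the root of the tree */
	struct model_task *cont_owner;	/* who set the contid, or NULL */
};

/* Walk up from child; true if ancestor is strictly above it. */
static bool model_is_descendant(const struct model_task *ancestor,
				const struct model_task *child)
{
	if (!ancestor || !child)
		return false;
	for (const struct model_task *w = child->parent; w; w = w->parent)
		if (w == ancestor)
			return true;
	return false;
}

/* Returns true if setter may set the contid of target. */
static bool model_may_set_contid(const struct model_task *setter,
				 const struct model_task *target)
{
	assert(setter && target);
	/* first half: the setter must be an ancestor of the target */
	if (!model_is_descendant(setter, target))
		return false;
	/* never set before: the plain, non-nested case */
	if (!target->cont_owner)
		return true;
	/* already set by this same orchestrator: cannot be set again */
	if (target->cont_owner == setter)
		return false;
	/* nesting: only a descendant of the original owner may change it */
	return model_is_descendant(target->cont_owner, setter);
}
```

In this model a nested orchestrator sitting between the original owner
and the target is allowed to change the contid, while the original owner,
an unrelated orchestrator, or an ancestor of the original owner is not.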

> > Signed-off-by: Richard Guy Briggs <[email protected]>
> > ---
> > kernel/audit.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
> > 1 file changed, 62 insertions(+), 8 deletions(-)
> >
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 9ce7a1ec7a92..69fe1e9af7cb 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -2560,6 +2560,39 @@ static struct task_struct *audit_cont_owner(struct task_struct *tsk)
> > }
> >
> > /*
> > + * task_is_descendant - walk up a process family tree looking for a match
> > + * @parent: the process to compare against while walking up from child
> > + * @child: the process to start from while looking upwards for parent
> > + *
> > + * Returns 1 if child is a descendant of parent, 0 if not.
> > + */
> > +static int task_is_descendant(struct task_struct *parent,
> > + struct task_struct *child)
> > +{
> > + int rc = 0;
> > + struct task_struct *walker = child;
> > +
> > + if (!parent || !child)
> > + return 0;
> > +
> > + rcu_read_lock();
> > + if (!thread_group_leader(parent))
> > + parent = rcu_dereference(parent->group_leader);
> > + while (walker->pid > 0) {
> > + if (!thread_group_leader(walker))
> > + walker = rcu_dereference(walker->group_leader);
> > + if (walker == parent) {
> > + rc = 1;
> > + break;
> > + }
> > + walker = rcu_dereference(walker->real_parent);
> > + }
> > + rcu_read_unlock();
> > +
> > + return rc;
> > +}
> > +
> > +/*
> > * audit_set_contid - set current task's audit contid
> > * @task: target task
> > * @contid: contid value
> > @@ -2587,22 +2620,43 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> > oldcontid = audit_get_contid(task);
> > read_lock(&tasklist_lock);
> > /* Don't allow the contid to be unset */
> > - if (!audit_contid_valid(contid))
> > + if (!audit_contid_valid(contid)) {
> > rc = -EINVAL;
> > + goto unlock;
> > + }
> > /* Don't allow the contid to be set to the same value again */
> > - else if (contid == oldcontid) {
> > + if (contid == oldcontid) {
> > rc = -EADDRINUSE;
> > + goto unlock;
> > + }
> > /* if we don't have caps, reject */
> > - else if (!capable(CAP_AUDIT_CONTROL))
> > + if (!capable(CAP_AUDIT_CONTROL)) {
> > rc = -EPERM;
> > - /* if task has children or is not single-threaded, deny */
> > - else if (!list_empty(&task->children))
> > + goto unlock;
> > + }
> > + /* if task has children, deny */
> > + if (!list_empty(&task->children)) {
> > rc = -EBUSY;
> > - else if (!(thread_group_leader(task) && thread_group_empty(task)))
> > + goto unlock;
> > + }
> > + /* if task is not single-threaded, deny */
> > + if (!(thread_group_leader(task) && thread_group_empty(task))) {
> > rc = -EALREADY;
> > - /* if contid is already set, deny */
> > - else if (audit_contid_set(task))
> > + goto unlock;
> > + }
> > + /* if task is not descendant, block */
> > + if (task == current) {
> > + rc = -EBADSLT;
> > + goto unlock;
> > + }
> > + if (!task_is_descendant(current, task)) {
> > + rc = -EXDEV;
> > + goto unlock;
> > + }
> > + /* only allow contid setting again if nesting */
> > + if (audit_contid_set(task) && current == audit_cont_owner(task))
> > rc = -ECHILD;
> > +unlock:
> > read_unlock(&tasklist_lock);
> > if (!rc) {
> > struct audit_cont *oldcont = audit_cont(task);
>
> --
> paul moore
> http://www.paul-moore.com

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-10-25 20:55:33

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 08/21] audit: add contid support for signalling the audit daemon

On 2019-10-10 20:39, Paul Moore wrote:
> On Wed, Sep 18, 2019 at 9:25 PM Richard Guy Briggs <[email protected]> wrote:
> > Add audit container identifier support to the action of signalling the
> > audit daemon.
> >
> > Since this would need to add an element to the audit_sig_info struct,
> > a new record type AUDIT_SIGNAL_INFO2 was created with a new
> > audit_sig_info2 struct. Corresponding support is required in the
> > userspace code to reflect the new record request and reply type.
> > An older userspace won't break since it won't know to request this
> > record type.
> >
> > Signed-off-by: Richard Guy Briggs <[email protected]>
> > ---
> > include/linux/audit.h | 7 +++++++
> > include/uapi/linux/audit.h | 1 +
> > kernel/audit.c | 28 ++++++++++++++++++++++++++++
> > kernel/audit.h | 1 +
> > security/selinux/nlmsgtab.c | 1 +
> > 5 files changed, 38 insertions(+)
> >
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index 0c18d8e30620..7b640c4da4ee 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -23,6 +23,13 @@ struct audit_sig_info {
> > char ctx[0];
> > };
> >
> > +struct audit_sig_info2 {
> > + uid_t uid;
> > + pid_t pid;
> > + u64 cid;
> > + char ctx[0];
> > +};
> > +
> > struct audit_buffer;
> > struct audit_context;
> > struct inode;
> > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> > index 4ed080f28b47..693ec6e0288b 100644
> > --- a/include/uapi/linux/audit.h
> > +++ b/include/uapi/linux/audit.h
> > @@ -72,6 +72,7 @@
> > #define AUDIT_SET_FEATURE 1018 /* Turn an audit feature on or off */
> > #define AUDIT_GET_FEATURE 1019 /* Get which features are enabled */
> > #define AUDIT_CONTAINER_OP 1020 /* Define the container id and info */
> > +#define AUDIT_SIGNAL_INFO2 1021 /* Get info auditd signal sender */
> >
> > #define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */
> > #define AUDIT_USER_AVC 1107 /* We filter this differently */
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index adfb3e6a7f0c..df3db29f5a8a 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -125,6 +125,7 @@ struct audit_net {
> > kuid_t audit_sig_uid = INVALID_UID;
> > pid_t audit_sig_pid = -1;
> > u32 audit_sig_sid = 0;
> > +u64 audit_sig_cid = AUDIT_CID_UNSET;
> >
> > /* Records can be lost in several ways:
> > 0) [suppressed in audit_alloc]
> > @@ -1094,6 +1095,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
> > case AUDIT_ADD_RULE:
> > case AUDIT_DEL_RULE:
> > case AUDIT_SIGNAL_INFO:
> > + case AUDIT_SIGNAL_INFO2:
> > case AUDIT_TTY_GET:
> > case AUDIT_TTY_SET:
> > case AUDIT_TRIM:
> > @@ -1257,6 +1259,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> > struct audit_buffer *ab;
> > u16 msg_type = nlh->nlmsg_type;
> > struct audit_sig_info *sig_data;
> > + struct audit_sig_info2 *sig_data2;
> > char *ctx = NULL;
> > u32 len;
> >
> > @@ -1516,6 +1519,30 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> > sig_data, sizeof(*sig_data) + len);
> > kfree(sig_data);
> > break;
> > + case AUDIT_SIGNAL_INFO2:
> > + len = 0;
> > + if (audit_sig_sid) {
> > + err = security_secid_to_secctx(audit_sig_sid, &ctx, &len);
> > + if (err)
> > + return err;
> > + }
> > + sig_data2 = kmalloc(sizeof(*sig_data2) + len, GFP_KERNEL);
> > + if (!sig_data2) {
> > + if (audit_sig_sid)
> > + security_release_secctx(ctx, len);
> > + return -ENOMEM;
> > + }
> > + sig_data2->uid = from_kuid(&init_user_ns, audit_sig_uid);
> > + sig_data2->pid = audit_sig_pid;
> > + if (audit_sig_sid) {
> > + memcpy(sig_data2->ctx, ctx, len);
> > + security_release_secctx(ctx, len);
> > + }
> > + sig_data2->cid = audit_sig_cid;
> > + audit_send_reply(skb, seq, AUDIT_SIGNAL_INFO2, 0, 0,
> > + sig_data2, sizeof(*sig_data2) + len);
> > + kfree(sig_data2);
> > + break;
> > case AUDIT_TTY_GET: {
> > struct audit_tty_status s;
> > unsigned int t;
> > @@ -2384,6 +2411,7 @@ int audit_signal_info(int sig, struct task_struct *t)
> > else
> > audit_sig_uid = uid;
> > security_task_getsecid(current, &audit_sig_sid);
> > + audit_sig_cid = audit_get_contid(current);
> > }
>
> I've been wondering something as I've been working my way through
> these patches and this patch seems like a good spot to discuss this
> ... Now that we have the concept of an audit container ID "lifetime"
> in the kernel, when do we consider the ID gone? Is it when the last
> process in the container exits, or is it when we generate the last
> audit record which could possibly contain the audit container ID?
> This patch would appear to support the former, but if we wanted the
> latter we would need to grab a reference to the audit container ID
> struct so it wouldn't "die" on us before we could emit the signal info
> record.

Are you concerned with the availability of the data when the audit
signal info record is generated, when the kernel last deals with a
particular contid or when userspace thinks there will be no more
references to it?

I've got a bit of a dilemma with this one...

In fact, the latter situation you describe isn't currently a problem for
delivering the information, since the value is copied into the audit
signal global internal variables before the signalling task dies, and
the audit signal info record is created from those copied (cached)
values when requested from userspace.

So I don't think the issue raised above is a problem. However, patch 18
(which wasn't reviewed because it was a patch to a number of preceding
patches) changes the reporting approach to give a chain of nested
contids which isn't reflected in the same level of reporting for the
audit signal patch/mechanism. Solving this is a bit more complex. We
could have the audit signal internal caching store a pointer to the
relevant container object and bump its refcount to ensure it doesn't
vanish until we are done with it, but the audit signal info binary
record format already has a variable length due to the selinux context
at the end of that struct and adding a second variable length element to
it would make it more complicated (but not impossible) to handle.
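
To make the "more complicated but not impossible" part concrete, one
common way to carry two variable-length elements is to put both lengths
in the fixed header. The sketch below is purely hypothetical: the struct,
its field names and the model_* helpers are illustrative assumptions and
are not a proposed uapi layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout; not a kernel or uapi structure. */
struct model_sig_info {
	uint32_t uid;
	int32_t  pid;
	uint32_t cid_count;	/* number of contids in the nesting chain */
	uint32_t ctx_len;	/* bytes of security context after the chain */
	uint64_t cids[];	/* cid_count entries, then ctx_len bytes */
};

static size_t model_sig_info_size(uint32_t cid_count, uint32_t ctx_len)
{
	return sizeof(struct model_sig_info) +
	       (size_t)cid_count * sizeof(uint64_t) + ctx_len;
}

/* The context starts right after the last contid in the chain. */
static char *model_sig_info_ctx(struct model_sig_info *s)
{
	assert(s != NULL);
	return (char *)&s->cids[s->cid_count];
}

/* Build one record; the caller frees the result with free(). */
static struct model_sig_info *model_sig_info_pack(uint32_t uid, int32_t pid,
						  const uint64_t *cids,
						  uint32_t cid_count,
						  const char *ctx,
						  uint32_t ctx_len)
{
	struct model_sig_info *s;

	s = malloc(model_sig_info_size(cid_count, ctx_len));
	if (!s)
		return NULL;
	s->uid = uid;
	s->pid = pid;
	s->cid_count = cid_count;
	s->ctx_len = ctx_len;
	if (cid_count)
		memcpy(s->cids, cids, (size_t)cid_count * sizeof(uint64_t));
	if (ctx_len)
		memcpy(model_sig_info_ctx(s), ctx, ctx_len);
	return s;
}
```

A reader of such a record needs both counts before it can locate the
context, which is the extra parsing burden referred to above.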

> paul moore

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-10-25 20:56:04

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 05/21] audit: log drop of contid on exit of last task

On 2019-10-10 20:38, Paul Moore wrote:
> On Wed, Sep 18, 2019 at 9:24 PM Richard Guy Briggs <[email protected]> wrote:
> > Since we are tracking the life of each audit container identifier, we
> > can match the creation event with the destruction event. Log the
> > destruction of the audit container identifier when the last process in
> > that container exits.
> >
> > Signed-off-by: Richard Guy Briggs <[email protected]>
> > ---
> > kernel/audit.c | 32 ++++++++++++++++++++++++++++++++
> > kernel/audit.h | 2 ++
> > kernel/auditsc.c | 2 ++
> > 3 files changed, 36 insertions(+)
> >
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index ea0899130cc1..53d13d638c63 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -2503,6 +2503,38 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> > return rc;
> > }
> >
> > +void audit_log_container_drop(void)
> > +{
> > + struct audit_buffer *ab;
> > + uid_t uid;
> > + struct tty_struct *tty;
> > + char comm[sizeof(current->comm)];
> > +
> > + if (!current->audit || !current->audit->cont ||
> > + refcount_read(&current->audit->cont->refcount) > 1)
> > + return;
> > + ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_CONTAINER_OP);
> > + if (!ab)
> > + return;
> > +
> > + uid = from_kuid(&init_user_ns, task_uid(current));
> > + tty = audit_get_tty();
> > + audit_log_format(ab,
> > + "op=drop opid=%d contid=%llu old-contid=%llu pid=%d uid=%u auid=%u tty=%s ses=%u",
> > + task_tgid_nr(current), audit_get_contid(current),
> > + audit_get_contid(current), task_tgid_nr(current), uid,
> > + from_kuid(&init_user_ns, audit_get_loginuid(current)),
> > + tty ? tty_name(tty) : "(none)",
> > + audit_get_sessionid(current));
> > + audit_put_tty(tty);
> > + audit_log_task_context(ab);
> > + audit_log_format(ab, " comm=");
> > + audit_log_untrustedstring(ab, get_task_comm(comm, current));
> > + audit_log_d_path_exe(ab, current->mm);
> > + audit_log_format(ab, " res=1");
> > + audit_log_end(ab);
> > +}
>
> Why can't we just do this in audit_cont_put()? Is it because we call
> audit_cont_put() in the new audit_free() function? What if we were to
> do it in __audit_free()/audit_free_syscall()?

The intent was to put this before the EOE record of a syscall so we
could fill out all the fields similarly to op=set, but this could stand
alone dropping or nulling a bunch of fields.

It would also never get printed if we left it before the EOE and had the
audit signal info record keep a reference to it.

Hmmm...

> paul moore

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-10-25 20:57:09

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 06/21] audit: contid limit of 32k imposed to avoid DoS

On 2019-09-27 08:51, Neil Horman wrote:
> On Wed, Sep 18, 2019 at 09:22:23PM -0400, Richard Guy Briggs wrote:
> > Set an arbitrary limit on the number of audit container identifiers to
> > limit abuse.
> >
> > Signed-off-by: Richard Guy Briggs <[email protected]>
> > ---
> > kernel/audit.c | 8 ++++++++
> > kernel/audit.h | 4 ++++
> > 2 files changed, 12 insertions(+)
> >
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 53d13d638c63..329916534dd2 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -139,6 +139,7 @@ struct audit_net {
> > struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
> > /* Hash for contid-based rules */
> > struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
> > +int audit_contid_count = 0;
> >
> > static struct kmem_cache *audit_buffer_cache;
> >
> > @@ -2384,6 +2385,7 @@ void audit_cont_put(struct audit_cont *cont)
> > put_task_struct(cont->owner);
> > list_del_rcu(&cont->list);
> > kfree_rcu(cont, rcu);
> > + audit_contid_count--;
> > }
> > }
> >
> > @@ -2456,6 +2458,11 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> > goto conterror;
> > }
> > }
> > + /* Set max contids */
> > + if (audit_contid_count > AUDIT_CONTID_COUNT) {
> > + rc = -ENOSPC;
> > + goto conterror;
> > + }
> You should check for audit_contid_count == AUDIT_CONTID_COUNT here, no?
> or at least >=, since you increment it below. Otherwise it's possible
> that you will exceed it by one in the full condition.

Yes, agreed.
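
For reference, the boundary condition being agreed to can be shown with a
small userspace model. The names (model_contid_alloc, MODEL_CONTID_LIMIT)
are hypothetical, and parenthesising the macro is an extra defensive
choice assumed here rather than something requested in the review.

```c
#include <assert.h>
#include <errno.h>

/* Parenthesised so the macro expands safely inside any expression. */
#define MODEL_CONTID_LIMIT (1 << 16)

static int model_contid_count;

/* Returns 0 and bumps the count, or -ENOSPC once the limit is reached. */
static int model_contid_alloc(void)
{
	/* ">=" rather than ">": the count is incremented below, so a
	 * strict ">" test would let it reach MODEL_CONTID_LIMIT + 1 */
	if (model_contid_count >= MODEL_CONTID_LIMIT)
		return -ENOSPC;
	model_contid_count++;
	return 0;
}

/* Mirrors the decrement done when the last reference is dropped. */
static void model_contid_release(void)
{
	assert(model_contid_count > 0);
	model_contid_count--;
}
```

With the ">=" test the counter stops at exactly MODEL_CONTID_LIMIT, and a
single release makes room for exactly one more allocation.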

> > if (!newcont) {
> > newcont = kmalloc(sizeof(struct audit_cont), GFP_ATOMIC);
> > if (newcont) {
> > @@ -2465,6 +2472,7 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> > newcont->owner = current;
> > refcount_set(&newcont->refcount, 1);
> > list_add_rcu(&newcont->list, &audit_contid_hash[h]);
> > + audit_contid_count++;
> > } else {
> > rc = -ENOMEM;
> > goto conterror;
> > diff --git a/kernel/audit.h b/kernel/audit.h
> > index 162de8366b32..543f1334ba47 100644
> > --- a/kernel/audit.h
> > +++ b/kernel/audit.h
> > @@ -219,6 +219,10 @@ static inline int audit_hash_contid(u64 contid)
> > return (contid & (AUDIT_CONTID_BUCKETS-1));
> > }
> >
> > +extern int audit_contid_count;
> > +
> > +#define AUDIT_CONTID_COUNT 1 << 16
> > +
> Just to ask the question, since it wasn't clear in the changelog, what
> abuse are you avoiding here? Ostensibly you should be able to create as
> many container ids as you have space for, and the simple creation of
> container ids doesn't seem like the resource strain I would be concerned
> about here, given that an orchestrator can still create as many
> containers as the system will otherwise allow, which will consume
> significantly more ram/disk/etc.

Agreed. I'm not a huge fan of this, but included it as an optional
patch to address resource abuse concerns of Eric Biederman. I'll push
it to the end of the patchset and make it clear it is optional unless I
hear a compelling reason to keep it.

A similar argument was used to make the audit queue length tunable
parameter unlimited.

> > /* Indicates that audit should log the full pathname. */
> > #define AUDIT_NAME_FULL -1
> >
> > --
> > 1.8.3.1

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-10-25 20:58:37

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 04/21] audit: convert to contid list to check for orch/engine ownership

On 2019-09-26 10:46, Neil Horman wrote:
> On Wed, Sep 18, 2019 at 09:22:21PM -0400, Richard Guy Briggs wrote:
> > Store the audit container identifier in a refcounted kernel object that
> > is added to the master list of audit container identifiers. This will
> > allow multiple container orchestrators/engines to work on the same
> > machine without danger of inadvertently re-using an existing identifier.
> > It will also allow an orchestrator to inject a process into an existing
> > container by checking if the original container owner is the one
> > injecting the task. A hash table list is used to optimize searches.
> >
> > Signed-off-by: Richard Guy Briggs <[email protected]>
> > ---
> > include/linux/audit.h | 26 ++++++++++++++--
> > kernel/audit.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++---
> > kernel/audit.h | 8 +++++
> > 3 files changed, 112 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index f2e3b81f2942..e317807cdd3e 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -95,10 +95,18 @@ struct audit_ntp_data {
> > struct audit_ntp_data {};
> > #endif
> >
> > +struct audit_cont {
> > + struct list_head list;
> > + u64 id;
> > + struct task_struct *owner;
> > + refcount_t refcount;
> > + struct rcu_head rcu;
> > +};
> > +
> > struct audit_task_info {
> > kuid_t loginuid;
> > unsigned int sessionid;
> > - u64 contid;
> > + struct audit_cont *cont;
> > #ifdef CONFIG_AUDITSYSCALL
> > struct audit_context *ctx;
> > #endif
> > @@ -203,11 +211,15 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
> >
> > static inline u64 audit_get_contid(struct task_struct *tsk)
> > {
> > - if (!tsk->audit)
> > + if (!tsk->audit || !tsk->audit->cont)
> > return AUDIT_CID_UNSET;
> > - return tsk->audit->contid;
> > + return tsk->audit->cont->id;
> > }
> >
> > +extern struct audit_cont *audit_cont(struct task_struct *tsk);
> > +
> > +extern void audit_cont_put(struct audit_cont *cont);
> > +
> I see that you manually increment this refcount at various call sites, why
> no corresponding audit_contid_hold function?

I was trying to avoid the get function due to having one site where I
needed the pointer for later but didn't need a refcount to it so that I
could release the refcount if it was replaced by another cont object.
A hold function would just contain one line that would call the
refcount_inc(). If I did convert things over to a get function, it
would hide some of this extra conditional code in the main calling
function, but in one place I could just call put immediately to
neutralize that unneeded refcount.

Would you see any issue with that extra get/put refcount that would only
happen in the case of changing a contid in a nesting situation?
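
To make the question concrete, here is a minimal userspace sketch of what
a get/put pair with that one extra get/put might look like. It is an
assumption-laden model only: model_cont, model_cont_get, model_cont_put
and model_replace_cont are hypothetical names, a plain counter stands in
for refcount_t, and a flag stands in for kfree_rcu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_cont {
	unsigned long long id;
	unsigned int refcount;
	bool freed;		/* stands in for kfree_rcu() */
};

/* The one-line "hold" helper: take a reference and return the object. */
static struct model_cont *model_cont_get(struct model_cont *cont)
{
	if (cont)
		cont->refcount++;
	return cont;
}

static void model_cont_put(struct model_cont *cont)
{
	if (!cont)
		return;
	assert(cont->refcount > 0);
	if (--cont->refcount == 0)
		cont->freed = true;
}

/* Replace the object in *slot, as when changing a contid while nesting. */
static void model_replace_cont(struct model_cont **slot,
			       struct model_cont *newcont)
{
	/* the extra reference that is only needed to keep the pointer */
	struct model_cont *oldcont = model_cont_get(*slot);

	*slot = model_cont_get(newcont);
	model_cont_put(oldcont);	/* neutralise the extra get above */
	model_cont_put(oldcont);	/* drop the reference the slot held */
}
```

In this model the extra get/put pair costs one increment and one
decrement on the old object and never changes which object ends up freed.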

> Neil
>
> > extern u32 audit_enabled;
> >
> > extern int audit_signal_info(int sig, struct task_struct *t);
> > @@ -277,6 +289,14 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
> > return AUDIT_CID_UNSET;
> > }
> >
> > +static inline struct audit_cont *audit_cont(struct task_struct *tsk)
> > +{
> > + return NULL;
> > +}
> > +
> > +static inline void audit_cont_put(struct audit_cont *cont)
> > +{ }
> > +
> > #define audit_enabled AUDIT_OFF
> >
> > static inline int audit_signal_info(int sig, struct task_struct *t)
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index a36ea57cbb61..ea0899130cc1 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -137,6 +137,8 @@ struct audit_net {
> >
> > /* Hash for inode-based rules */
> > struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
> > +/* Hash for contid-based rules */
> > +struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
> >
> > static struct kmem_cache *audit_buffer_cache;
> >
> > @@ -204,6 +206,8 @@ struct audit_reply {
> >
> > static struct kmem_cache *audit_task_cache;
> >
> > +static DEFINE_SPINLOCK(audit_contid_list_lock);
> > +
> > void __init audit_task_init(void)
> > {
> > audit_task_cache = kmem_cache_create("audit_task",
> > @@ -231,7 +235,9 @@ int audit_alloc(struct task_struct *tsk)
> > }
> > info->loginuid = audit_get_loginuid(current);
> > info->sessionid = audit_get_sessionid(current);
> > - info->contid = audit_get_contid(current);
> > + info->cont = audit_cont(current);
> > + if (info->cont)
> > + refcount_inc(&info->cont->refcount);
> > tsk->audit = info;
> >
> > ret = audit_alloc_syscall(tsk);
> > @@ -246,7 +252,7 @@ int audit_alloc(struct task_struct *tsk)
> > struct audit_task_info init_struct_audit = {
> > .loginuid = INVALID_UID,
> > .sessionid = AUDIT_SID_UNSET,
> > - .contid = AUDIT_CID_UNSET,
> > + .cont = NULL,
> > #ifdef CONFIG_AUDITSYSCALL
> > .ctx = NULL,
> > #endif
> > @@ -266,6 +272,9 @@ void audit_free(struct task_struct *tsk)
> > /* Freeing the audit_task_info struct must be performed after
> > * audit_log_exit() due to need for loginuid and sessionid.
> > */
> > + spin_lock(&audit_contid_list_lock);
> > + audit_cont_put(tsk->audit->cont);
> > + spin_unlock(&audit_contid_list_lock);
> > info = tsk->audit;
> > tsk->audit = NULL;
> > kmem_cache_free(audit_task_cache, info);
> > @@ -1657,6 +1666,9 @@ static int __init audit_init(void)
> > for (i = 0; i < AUDIT_INODE_BUCKETS; i++)
> > INIT_LIST_HEAD(&audit_inode_hash[i]);
> >
> > + for (i = 0; i < AUDIT_CONTID_BUCKETS; i++)
> > + INIT_LIST_HEAD(&audit_contid_hash[i]);
> > +
> > mutex_init(&audit_cmd_mutex.lock);
> > audit_cmd_mutex.owner = NULL;
> >
> > @@ -2356,6 +2368,32 @@ int audit_signal_info(int sig, struct task_struct *t)
> > return audit_signal_info_syscall(t);
> > }
> >
> > +struct audit_cont *audit_cont(struct task_struct *tsk)
> > +{
> > + if (!tsk->audit || !tsk->audit->cont)
> > + return NULL;
> > + return tsk->audit->cont;
> > +}
> > +
> > +/* audit_contid_list_lock must be held by caller */
> > +void audit_cont_put(struct audit_cont *cont)
> > +{
> > + if (!cont)
> > + return;
> > + if (refcount_dec_and_test(&cont->refcount)) {
> > + put_task_struct(cont->owner);
> > + list_del_rcu(&cont->list);
> > + kfree_rcu(cont, rcu);
> > + }
> > +}
> > +
> > +static struct task_struct *audit_cont_owner(struct task_struct *tsk)
> > +{
> > + if (tsk->audit && tsk->audit->cont)
> > + return tsk->audit->cont->owner;
> > + return NULL;
> > +}
> > +
> > /*
> > * audit_set_contid - set current task's audit contid
> > * @task: target task
> > @@ -2382,9 +2420,12 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> > }
> > oldcontid = audit_get_contid(task);
> > read_lock(&tasklist_lock);
> > - /* Don't allow the audit containerid to be unset */
> > + /* Don't allow the contid to be unset */
> > if (!audit_contid_valid(contid))
> > rc = -EINVAL;
> > + /* Don't allow the contid to be set to the same value again */
> > + else if (contid == oldcontid)
> > + rc = -EADDRINUSE;
> > /* if we don't have caps, reject */
> > else if (!capable(CAP_AUDIT_CONTROL))
> > rc = -EPERM;
> > @@ -2397,8 +2438,43 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> > else if (audit_contid_set(task))
> > rc = -ECHILD;
> > read_unlock(&tasklist_lock);
> > - if (!rc)
> > - task->audit->contid = contid;
> > + if (!rc) {
> > + struct audit_cont *oldcont = audit_cont(task);
> > + struct audit_cont *cont = NULL;
> > + struct audit_cont *newcont = NULL;
> > + int h = audit_hash_contid(contid);
> > +
> > + spin_lock(&audit_contid_list_lock);
> > + list_for_each_entry_rcu(cont, &audit_contid_hash[h], list)
> > + if (cont->id == contid) {
> > + /* task injection to existing container */
> > + if (current == cont->owner) {
> > + refcount_inc(&cont->refcount);
> > + newcont = cont;
> > + } else {
> > + rc = -ENOTUNIQ;
> > + goto conterror;
> > + }
> > + }
> > + if (!newcont) {
> > + newcont = kmalloc(sizeof(struct audit_cont), GFP_ATOMIC);
> > + if (newcont) {
> > + INIT_LIST_HEAD(&newcont->list);
> > + newcont->id = contid;
> > + get_task_struct(current);
> > + newcont->owner = current;
> > + refcount_set(&newcont->refcount, 1);
> > + list_add_rcu(&newcont->list, &audit_contid_hash[h]);
> > + } else {
> > + rc = -ENOMEM;
> > + goto conterror;
> > + }
> > + }
> > + task->audit->cont = newcont;
> > + audit_cont_put(oldcont);
> > +conterror:
> > + spin_unlock(&audit_contid_list_lock);
> > + }
> > task_unlock(task);
> >
> > if (!audit_enabled)
> > diff --git a/kernel/audit.h b/kernel/audit.h
> > index 16bd03b88e0d..e4a31aa92dfe 100644
> > --- a/kernel/audit.h
> > +++ b/kernel/audit.h
> > @@ -211,6 +211,14 @@ static inline int audit_hash_ino(u32 ino)
> > return (ino & (AUDIT_INODE_BUCKETS-1));
> > }
> >
> > +#define AUDIT_CONTID_BUCKETS 32
> > +extern struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
> > +
> > +static inline int audit_hash_contid(u64 contid)
> > +{
> > + return (contid & (AUDIT_CONTID_BUCKETS-1));
> > +}
> > +
> > /* Indicates that audit should log the full pathname. */
> > #define AUDIT_NAME_FULL -1
> >
> > --
> > 1.8.3.1
> >
> >

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-10-25 21:03:29

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 04/21] audit: convert to contid list to check for orch/engine ownership

On 2019-10-10 20:38, Paul Moore wrote:
> On Wed, Sep 18, 2019 at 9:24 PM Richard Guy Briggs <[email protected]> wrote:
> > Store the audit container identifier in a refcounted kernel object that
> > is added to the master list of audit container identifiers. This will
> > allow multiple container orchestrators/engines to work on the same
> > machine without danger of inadvertently re-using an existing identifier.
> > It will also allow an orchestrator to inject a process into an existing
> > container by checking if the original container owner is the one
> > injecting the task. A hash table list is used to optimize searches.
> >
> > Signed-off-by: Richard Guy Briggs <[email protected]>
> > ---
> > include/linux/audit.h | 26 ++++++++++++++--
> > kernel/audit.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++---
> > kernel/audit.h | 8 +++++
> > 3 files changed, 112 insertions(+), 8 deletions(-)
>
> One general comment before we go off into the weeds on this ... I can
> understand why you wanted to keep this patch separate from the earlier
> patches, but as we get closer to having mergeable code this should get
> folded into the previous patches. For example, there shouldn't be a
> change in audit_task_info where you change the contid field from a u64
> to struct pointer, it should be a struct pointer from the start.

I should have marked this patchset as RFC even though it was v7 due to a
lot of new ideas/code that was added with uncertainties needing comment
and direction.

> It's also disappointing that idr appears to only be for 32-bit ID
> values, if we had a 64-bit idr I think we could simplify this greatly.

Perhaps. I do still see value in letting the orchestrator choose the
value.

> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index f2e3b81f2942..e317807cdd3e 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -95,10 +95,18 @@ struct audit_ntp_data {
> > struct audit_ntp_data {};
> > #endif
> >
> > +struct audit_cont {
> > + struct list_head list;
> > + u64 id;
> > + struct task_struct *owner;
> > + refcount_t refcount;
> > + struct rcu_head rcu;
> > +};
>
> It seems as though in most of the code you are using "contid", any
> reason why you didn't stick with that naming scheme here, e.g. "struct
> audit_contid"?

I was using contid to refer to the value itself and cont to refer to the
refcounted object. I find cont a bit too terse, so I'm still thinking
of changing it. Perhaps contobj?

> > struct audit_task_info {
> > kuid_t loginuid;
> > unsigned int sessionid;
> > - u64 contid;
> > + struct audit_cont *cont;
>
> Same, why not stick with "contid"?

^^^

> > #ifdef CONFIG_AUDITSYSCALL
> > struct audit_context *ctx;
> > #endif
> > @@ -203,11 +211,15 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
> >
> > static inline u64 audit_get_contid(struct task_struct *tsk)
> > {
> > - if (!tsk->audit)
> > + if (!tsk->audit || !tsk->audit->cont)
> > return AUDIT_CID_UNSET;
> > - return tsk->audit->contid;
> > + return tsk->audit->cont->id;
> > }
>
> Assuming for a moment that we implement an audit_contid_get() (see
> Neil's comment as well as mine below), we probably need to name this
> something different so we don't all lose our minds when we read this
> code. On the plus side we can probably preface it with an underscore
> since it is a static, in which case _audit_contid_get() might be okay,
> but I'm open to suggestions.

I'm fine with the "_" prefix, can you point to precedent or convention?

> > +extern struct audit_cont *audit_cont(struct task_struct *tsk);
> > +
> > +extern void audit_cont_put(struct audit_cont *cont);
>
> More of the "contid" vs "cont".

^^^

> > extern u32 audit_enabled;
> >
> > extern int audit_signal_info(int sig, struct task_struct *t);
> > @@ -277,6 +289,14 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
> > return AUDIT_CID_UNSET;
> > }
> >
> > +static inline struct audit_cont *audit_cont(struct task_struct *tsk)
> > +{
> > + return NULL;
> > +}
> > +
> > +static inline void audit_cont_put(struct audit_cont *cont)
> > +{ }
> > +
> > #define audit_enabled AUDIT_OFF
> >
> > static inline int audit_signal_info(int sig, struct task_struct *t)
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index a36ea57cbb61..ea0899130cc1 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -137,6 +137,8 @@ struct audit_net {
> >
> > /* Hash for inode-based rules */
> > struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
> > +/* Hash for contid-based rules */
> > +struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
> >
> > static struct kmem_cache *audit_buffer_cache;
> >
> > @@ -204,6 +206,8 @@ struct audit_reply {
> >
> > static struct kmem_cache *audit_task_cache;
> >
> > +static DEFINE_SPINLOCK(audit_contid_list_lock);
>
> Since it looks like this protects audit_contid_hash, I think it
> would be better to move it up underneath audit_contid_hash.

Agreed.

> > void __init audit_task_init(void)
> > {
> > audit_task_cache = kmem_cache_create("audit_task",
> > @@ -231,7 +235,9 @@ int audit_alloc(struct task_struct *tsk)
> > }
> > info->loginuid = audit_get_loginuid(current);
> > info->sessionid = audit_get_sessionid(current);
> > - info->contid = audit_get_contid(current);
> > + info->cont = audit_cont(current);
> > + if (info->cont)
> > + refcount_inc(&info->cont->refcount);
>
> See the other comments about a "get" function, but I think we need a
> RCU read lock around the above, no?

The rcu read lock is to protect the list rather than the cont object
itself, the latter of which is protected by its refcount.

> > tsk->audit = info;
> >
> > ret = audit_alloc_syscall(tsk);
> > @@ -246,7 +252,7 @@ int audit_alloc(struct task_struct *tsk)
> > struct audit_task_info init_struct_audit = {
> > .loginuid = INVALID_UID,
> > .sessionid = AUDIT_SID_UNSET,
> > - .contid = AUDIT_CID_UNSET,
> > + .cont = NULL,
>
> More "cont" vs "contid".

^^^

> > #ifdef CONFIG_AUDITSYSCALL
> > .ctx = NULL,
> > #endif
> > @@ -266,6 +272,9 @@ void audit_free(struct task_struct *tsk)
> > /* Freeing the audit_task_info struct must be performed after
> > * audit_log_exit() due to need for loginuid and sessionid.
> > */
> > + spin_lock(&audit_contid_list_lock);
> > + audit_cont_put(tsk->audit->cont);
> > + spin_unlock(&audit_contid_list_lock);
>
> Perhaps this will make sense as I get further into the patchset, but
> why not move the spin lock operations into audit_[cont/contid]_put()?

audit_cont_put() is recursive in patch 18/21, which would have been
evident if 18/21 was squashed into this one as you pointed out there...
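
To illustrate why the lock is taken by the caller rather than inside the put
function, here is a minimal userspace sketch. It is not the code from 18/21:
the "parent" link is an assumption about what the recursion walks, the lock is
modelled as a flag so the rule can be asserted, and "released" stands in for
the unlink and free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for audit_contid_list_lock: a flag that lets us assert the rule. */
static int contid_list_locked;

static void contid_list_lock(void)
{
	assert(!contid_list_locked);	/* a non-recursive lock must not be taken twice */
	contid_list_locked = 1;
}

static void contid_list_unlock(void)
{
	assert(contid_list_locked);
	contid_list_locked = 0;
}

struct audit_cont {
	uint64_t id;
	int refcount;			/* models refcount_t */
	int released;			/* set where the kernel would unlink and free */
	struct audit_cont *parent;	/* ASSUMPTION: link to an enclosing container */
};

/* Caller must hold the list lock; the function may call itself for the parent. */
static void audit_cont_put(struct audit_cont *cont)
{
	assert(contid_list_locked);
	if (!cont)
		return;
	assert(cont->refcount > 0);
	if (--cont->refcount == 0) {
		audit_cont_put(cont->parent);	/* recursion runs under the same lock */
		cont->released = 1;
	}
}
```

If the lock were taken inside audit_cont_put() itself, the nested call would
try to take it a second time, which is what the assert in contid_list_lock()
models.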

> > info = tsk->audit;
> > tsk->audit = NULL;
> > kmem_cache_free(audit_task_cache, info);
> > @@ -1657,6 +1666,9 @@ static int __init audit_init(void)
> > for (i = 0; i < AUDIT_INODE_BUCKETS; i++)
> > INIT_LIST_HEAD(&audit_inode_hash[i]);
> >
> > + for (i = 0; i < AUDIT_CONTID_BUCKETS; i++)
> > + INIT_LIST_HEAD(&audit_contid_hash[i]);
> > +
> > mutex_init(&audit_cmd_mutex.lock);
> > audit_cmd_mutex.owner = NULL;
> >
> > @@ -2356,6 +2368,32 @@ int audit_signal_info(int sig, struct task_struct *t)
> > return audit_signal_info_syscall(t);
> > }
> >
> > +struct audit_cont *audit_cont(struct task_struct *tsk)
> > +{
> > + if (!tsk->audit || !tsk->audit->cont)
> > + return NULL;
> > + return tsk->audit->cont;
> > +}
> > +
> > +/* audit_contid_list_lock must be held by caller */
> > +void audit_cont_put(struct audit_cont *cont)
> > +{
> > + if (!cont)
> > + return;
> > + if (refcount_dec_and_test(&cont->refcount)) {
> > + put_task_struct(cont->owner);
> > + list_del_rcu(&cont->list);
> > + kfree_rcu(cont, rcu);
> > + }
> > +}
>
> I tend to agree with Neil's previous comment; if we've got a
> audit_[cont/contid]_put(), why not an audit_[cont/contid]_get()?

^^^

> > +static struct task_struct *audit_cont_owner(struct task_struct *tsk)
> > +{
> > + if (tsk->audit && tsk->audit->cont)
> > + return tsk->audit->cont->owner;
> > + return NULL;
> > +}
>
> I'm not sure if this is possible (I haven't made my way through the
> entire patchset) and the function above isn't used in this patch (why
> is it here?), but it seems like it would be safer to convert this into
> an audit_contid_isowner() function that simply returns 1/0 depending
> on if the passed task_struct is the owner or not of a passed audit
> container ID value?

Agreed since it is only ever compared with current. It can be moved to
14/21.
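
A rough userspace sketch of what such a predicate could look like, for
discussion only. The name audit_contid_isowner() comes from the suggestion
above; the signature (task plus cont object rather than a raw contid value)
and the cut-down structs are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for struct task_struct. */
struct task {
	int pid;
};

/* Cut-down stand-in for the refcounted object from this patch. */
struct audit_cont {
	uint64_t id;
	const struct task *owner;
};

/*
 * Hypothetical helper: report whether tsk is the owner of cont without
 * handing the owner pointer back to the caller.
 */
static bool audit_contid_isowner(const struct task *tsk,
				 const struct audit_cont *cont)
{
	return cont != NULL && cont->owner == tsk;
}
```

Since the owner is only ever compared with current, callers would pass current
as the first argument and never see the task pointer stored in the object.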

> > /*
> > * audit_set_contid - set current task's audit contid
> > * @task: target task
> > @@ -2382,9 +2420,12 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> > }
> > oldcontid = audit_get_contid(task);
> > read_lock(&tasklist_lock);
> > - /* Don't allow the audit containerid to be unset */
> > + /* Don't allow the contid to be unset */
> > if (!audit_contid_valid(contid))
> > rc = -EINVAL;
> > + /* Don't allow the contid to be set to the same value again */
> > + else if (contid == oldcontid)
> > + rc = -EADDRINUSE;
> > /* if we don't have caps, reject */
> > else if (!capable(CAP_AUDIT_CONTROL))
> > rc = -EPERM;
>
> RCU read lock? It's a bit dicey since I believe the tasklist_lock is
> going to provide us the safety we need, but if we are going to claim
> that the audit container ID list is protected by RCU we should
> probably use it.

Yes, perhaps, but to protect the task read, not the list, until it is
accessed. Getting the contid value or cont pointer via the task does
not involve the list. The cont pointer is protected by its refcount.

> > @@ -2397,8 +2438,43 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> > else if (audit_contid_set(task))
> > rc = -ECHILD;
> > read_unlock(&tasklist_lock);
> > - if (!rc)
> > - task->audit->contid = contid;
> > + if (!rc) {
> > + struct audit_cont *oldcont = audit_cont(task);
>
> Previously we held the tasklist_lock to protect the audit container ID
> associated with the struct, should we still be holding it here?

We held the tasklist_lock to protect access to the target task's
child/parent/thread relationships.

> Regardless, I worry that the lock dependencies between the
> tasklist_lock and the audit_contid_list_lock are going to be tricky.
> It might be nice to document the relationship in a comment up near
> where you declare audit_contid_list_lock.

I don't think there should be a conflict between the two.

The contid_list_lock doesn't care if the cont object is associated to a
particular task.

> > + struct audit_cont *cont = NULL;
> > + struct audit_cont *newcont = NULL;
> > + int h = audit_hash_contid(contid);
> > +
> > + spin_lock(&audit_contid_list_lock);
> > + list_for_each_entry_rcu(cont, &audit_contid_hash[h], list)
> > + if (cont->id == contid) {
> > + /* task injection to existing container */
> > + if (current == cont->owner) {
>
> I understand the desire to limit a given audit container ID to the
> orchestrator that created it, but are we certain that we can track
> audit container ID "ownership" via a single instance of a task_struct?

Are you suggesting that a task_struct representing a task may be
replaced for a specific task? I don't believe that will ever happen.

> What happens when the orchestrator stops/restarts/crashes? Do we
> even care?

Reap all of its containers?

> > + refcount_inc(&cont->refcount);
> > + newcont = cont;
>
> We can bail out of the loop here, yes?

Yes, that would be a performance improvement, but not a functional bug,
thanks. :-)
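
For reference, a minimal userspace sketch of the bucket lookup with the early
exit. It uses a singly linked chain instead of list_head and RCU, and the
helper names contid_insert() and contid_find() are made up for illustration.
Only AUDIT_CONTID_BUCKETS and audit_hash_contid() mirror the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AUDIT_CONTID_BUCKETS 32	/* value taken from the patch */

/* Simplified stand-in: a singly linked bucket chain, no RCU. */
struct audit_cont {
	struct audit_cont *next;
	uint64_t id;
	int refcount;
};

static struct audit_cont *audit_contid_hash[AUDIT_CONTID_BUCKETS];

static inline int audit_hash_contid(uint64_t contid)
{
	return (int)(contid & (AUDIT_CONTID_BUCKETS - 1));
}

/* Hypothetical helper: add cont at the head of its bucket. */
static void contid_insert(struct audit_cont *cont)
{
	int h = audit_hash_contid(cont->id);

	cont->next = audit_contid_hash[h];
	audit_contid_hash[h] = cont;
}

/* Hypothetical helper: return the entry for contid, or NULL if absent. */
static struct audit_cont *contid_find(uint64_t contid)
{
	struct audit_cont *cont;

	for (cont = audit_contid_hash[audit_hash_contid(contid)]; cont;
	     cont = cont->next)
		if (cont->id == contid)
			break;	/* an id appears at most once, so stop here */
	return cont;
}
```

The early exit relies on the table never holding the same id twice, which the
-ENOTUNIQ check in audit_set_contid() is meant to guarantee.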

> > + } else {
> > + rc = -ENOTUNIQ;
> > + goto conterror;
> > + }
> > + }
> > + if (!newcont) {
> > + newcont = kmalloc(sizeof(struct audit_cont), GFP_ATOMIC);
> > + if (newcont) {
> > + INIT_LIST_HEAD(&newcont->list);
> > + newcont->id = contid;
> > + get_task_struct(current);
> > + newcont->owner = current;
> > + refcount_set(&newcont->refcount, 1);
> > + list_add_rcu(&newcont->list, &audit_contid_hash[h]);
> > + } else {
> > + rc = -ENOMEM;
> > + goto conterror;
> > + }
> > + }
> > + task->audit->cont = newcont;
> > + audit_cont_put(oldcont);
> > +conterror:
> > + spin_unlock(&audit_contid_list_lock);
> > + }
> > task_unlock(task);
> >
> > if (!audit_enabled)
> > diff --git a/kernel/audit.h b/kernel/audit.h
> > index 16bd03b88e0d..e4a31aa92dfe 100644
> > --- a/kernel/audit.h
> > +++ b/kernel/audit.h
> > @@ -211,6 +211,14 @@ static inline int audit_hash_ino(u32 ino)
> > return (ino & (AUDIT_INODE_BUCKETS-1));
> > }
> >
> > +#define AUDIT_CONTID_BUCKETS 32
> > +extern struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
> > +
> > +static inline int audit_hash_contid(u64 contid)
> > +{
> > + return (contid & (AUDIT_CONTID_BUCKETS-1));
> > +}
> > +
> > /* Indicates that audit should log the full pathname. */
> > #define AUDIT_NAME_FULL -1
> >
>
> --
> paul moore
> http://www.paul-moore.com

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-10-28 20:10:50

by Neil Horman

Subject: Re: [PATCH ghak90 V7 04/21] audit: convert to contid list to check for orch/engine ownership

On Fri, Oct 25, 2019 at 04:00:19PM -0400, Richard Guy Briggs wrote:
> On 2019-09-26 10:46, Neil Horman wrote:
> > On Wed, Sep 18, 2019 at 09:22:21PM -0400, Richard Guy Briggs wrote:
> > > Store the audit container identifier in a refcounted kernel object that
> > > is added to the master list of audit container identifiers. This will
> > > allow multiple container orchestrators/engines to work on the same
> > > machine without danger of inadvertently re-using an existing identifier.
> > > It will also allow an orchestrator to inject a process into an existing
> > > container by checking if the original container owner is the one
> > > injecting the task. A hash table list is used to optimize searches.
> > >
> > > Signed-off-by: Richard Guy Briggs <[email protected]>
> > > ---
> > > include/linux/audit.h | 26 ++++++++++++++--
> > > kernel/audit.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++---
> > > kernel/audit.h | 8 +++++
> > > 3 files changed, 112 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > > index f2e3b81f2942..e317807cdd3e 100644
> > > --- a/include/linux/audit.h
> > > +++ b/include/linux/audit.h
> > > @@ -95,10 +95,18 @@ struct audit_ntp_data {
> > > struct audit_ntp_data {};
> > > #endif
> > >
> > > +struct audit_cont {
> > > + struct list_head list;
> > > + u64 id;
> > > + struct task_struct *owner;
> > > + refcount_t refcount;
> > > + struct rcu_head rcu;
> > > +};
> > > +
> > > struct audit_task_info {
> > > kuid_t loginuid;
> > > unsigned int sessionid;
> > > - u64 contid;
> > > + struct audit_cont *cont;
> > > #ifdef CONFIG_AUDITSYSCALL
> > > struct audit_context *ctx;
> > > #endif
> > > @@ -203,11 +211,15 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
> > >
> > > static inline u64 audit_get_contid(struct task_struct *tsk)
> > > {
> > > - if (!tsk->audit)
> > > + if (!tsk->audit || !tsk->audit->cont)
> > > return AUDIT_CID_UNSET;
> > > - return tsk->audit->contid;
> > > + return tsk->audit->cont->id;
> > > }
> > >
> > > +extern struct audit_cont *audit_cont(struct task_struct *tsk);
> > > +
> > > +extern void audit_cont_put(struct audit_cont *cont);
> > > +
> > I see that you manually increment this refcount at various call sites, why
> > no corresponding audit_contid_hold function?
>
> I was trying to avoid the get function due to having one site where I
> needed the pointer for later but didn't need a refcount to it so that I
> could release the refcount if it was replaced by another cont object.
> A hold function would just contain one line that would call the
> refcount_inc(). If I did convert things over to a get function, it
> would hide some of this extra conditional code in the main calling
> function, but in one place I could just call put immediately to
> neutralize that unneeded refcount.
>
Ok, but this pattern:

static inline u64 __audit_contid_get(struct audit_cont *c)
{
	return c->id;
}

static inline u64 audit_contid_get(struct audit_cont *c)
{
	refcount_inc(&c->refcount);	/* take a reference before handing out the id */
	return __audit_contid_get(c);
}

Squares that up, doesn't it? It gives you an internal, non-refcount-holding
version to use.

> Would you see any issue with that extra get/put refcount that would only
> happen in the case of changing a contid in a nesting situation?
>
No, I personally wouldn't have an issue with it, but the above would
make it pretty readable I think
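
To make the extra get/put concrete, here is a small userspace sketch, assuming
a plain int in place of refcount_t. The audit_contid_put() helper and the
cut-down struct are illustrative only. The double-underscore name mirrors the
pattern above.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel object; refcount_t modelled as an int. */
struct audit_cont {
	uint64_t id;
	int refcount;
};

/* Internal accessor: reads the id without taking a reference. */
static inline uint64_t __audit_contid_get(const struct audit_cont *c)
{
	return c->id;
}

/* Public accessor: takes a reference, then reads the id. */
static inline uint64_t audit_contid_get(struct audit_cont *c)
{
	assert(c->refcount > 0);	/* caller must be looking at a live object */
	c->refcount++;
	return __audit_contid_get(c);
}

/* Illustrative put: drops one reference, returns 1 when the last one is gone. */
static inline int audit_contid_put(struct audit_cont *c)
{
	assert(c->refcount > 0);
	return --c->refcount == 0;
}
```

A call site that only needs to peek at the id uses __audit_contid_get(). A
call site that took a reference it turns out not to need calls
audit_contid_put() straight away, which leaves the count where it started.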

> > Neil
> >
> > > extern u32 audit_enabled;
> > >
> > > extern int audit_signal_info(int sig, struct task_struct *t);
> > > @@ -277,6 +289,14 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
> > > return AUDIT_CID_UNSET;
> > > }
> > >
> > > +static inline struct audit_cont *audit_cont(struct task_struct *tsk)
> > > +{
> > > + return NULL;
> > > +}
> > > +
> > > +static inline void audit_cont_put(struct audit_cont *cont)
> > > +{ }
> > > +
> > > #define audit_enabled AUDIT_OFF
> > >
> > > static inline int audit_signal_info(int sig, struct task_struct *t)
> > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > index a36ea57cbb61..ea0899130cc1 100644
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
> > > @@ -137,6 +137,8 @@ struct audit_net {
> > >
> > > /* Hash for inode-based rules */
> > > struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
> > > +/* Hash for contid-based rules */
> > > +struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
> > >
> > > static struct kmem_cache *audit_buffer_cache;
> > >
> > > @@ -204,6 +206,8 @@ struct audit_reply {
> > >
> > > static struct kmem_cache *audit_task_cache;
> > >
> > > +static DEFINE_SPINLOCK(audit_contid_list_lock);
> > > +
> > > void __init audit_task_init(void)
> > > {
> > > audit_task_cache = kmem_cache_create("audit_task",
> > > @@ -231,7 +235,9 @@ int audit_alloc(struct task_struct *tsk)
> > > }
> > > info->loginuid = audit_get_loginuid(current);
> > > info->sessionid = audit_get_sessionid(current);
> > > - info->contid = audit_get_contid(current);
> > > + info->cont = audit_cont(current);
> > > + if (info->cont)
> > > + refcount_inc(&info->cont->refcount);
> > > tsk->audit = info;
> > >
> > > ret = audit_alloc_syscall(tsk);
> > > @@ -246,7 +252,7 @@ int audit_alloc(struct task_struct *tsk)
> > > struct audit_task_info init_struct_audit = {
> > > .loginuid = INVALID_UID,
> > > .sessionid = AUDIT_SID_UNSET,
> > > - .contid = AUDIT_CID_UNSET,
> > > + .cont = NULL,
> > > #ifdef CONFIG_AUDITSYSCALL
> > > .ctx = NULL,
> > > #endif
> > > @@ -266,6 +272,9 @@ void audit_free(struct task_struct *tsk)
> > > /* Freeing the audit_task_info struct must be performed after
> > > * audit_log_exit() due to need for loginuid and sessionid.
> > > */
> > > + spin_lock(&audit_contid_list_lock);
> > > + audit_cont_put(tsk->audit->cont);
> > > + spin_unlock(&audit_contid_list_lock);
> > > info = tsk->audit;
> > > tsk->audit = NULL;
> > > kmem_cache_free(audit_task_cache, info);
> > > @@ -1657,6 +1666,9 @@ static int __init audit_init(void)
> > > for (i = 0; i < AUDIT_INODE_BUCKETS; i++)
> > > INIT_LIST_HEAD(&audit_inode_hash[i]);
> > >
> > > + for (i = 0; i < AUDIT_CONTID_BUCKETS; i++)
> > > + INIT_LIST_HEAD(&audit_contid_hash[i]);
> > > +
> > > mutex_init(&audit_cmd_mutex.lock);
> > > audit_cmd_mutex.owner = NULL;
> > >
> > > @@ -2356,6 +2368,32 @@ int audit_signal_info(int sig, struct task_struct *t)
> > > return audit_signal_info_syscall(t);
> > > }
> > >
> > > +struct audit_cont *audit_cont(struct task_struct *tsk)
> > > +{
> > > + if (!tsk->audit || !tsk->audit->cont)
> > > + return NULL;
> > > + return tsk->audit->cont;
> > > +}
> > > +
> > > +/* audit_contid_list_lock must be held by caller */
> > > +void audit_cont_put(struct audit_cont *cont)
> > > +{
> > > + if (!cont)
> > > + return;
> > > + if (refcount_dec_and_test(&cont->refcount)) {
> > > + put_task_struct(cont->owner);
> > > + list_del_rcu(&cont->list);
> > > + kfree_rcu(cont, rcu);
> > > + }
> > > +}
> > > +
> > > +static struct task_struct *audit_cont_owner(struct task_struct *tsk)
> > > +{
> > > + if (tsk->audit && tsk->audit->cont)
> > > + return tsk->audit->cont->owner;
> > > + return NULL;
> > > +}
> > > +
> > > /*
> > > * audit_set_contid - set current task's audit contid
> > > * @task: target task
> > > @@ -2382,9 +2420,12 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> > > }
> > > oldcontid = audit_get_contid(task);
> > > read_lock(&tasklist_lock);
> > > - /* Don't allow the audit containerid to be unset */
> > > + /* Don't allow the contid to be unset */
> > > if (!audit_contid_valid(contid))
> > > rc = -EINVAL;
> > > + /* Don't allow the contid to be set to the same value again */
> > > > + else if (contid == oldcontid)
> > > + rc = -EADDRINUSE;
> > > /* if we don't have caps, reject */
> > > else if (!capable(CAP_AUDIT_CONTROL))
> > > rc = -EPERM;
> > > @@ -2397,8 +2438,43 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> > > else if (audit_contid_set(task))
> > > rc = -ECHILD;
> > > read_unlock(&tasklist_lock);
> > > - if (!rc)
> > > - task->audit->contid = contid;
> > > + if (!rc) {
> > > + struct audit_cont *oldcont = audit_cont(task);
> > > + struct audit_cont *cont = NULL;
> > > + struct audit_cont *newcont = NULL;
> > > + int h = audit_hash_contid(contid);
> > > +
> > > + spin_lock(&audit_contid_list_lock);
> > > + list_for_each_entry_rcu(cont, &audit_contid_hash[h], list)
> > > + if (cont->id == contid) {
> > > + /* task injection to existing container */
> > > + if (current == cont->owner) {
> > > + refcount_inc(&cont->refcount);
> > > + newcont = cont;
> > > + } else {
> > > + rc = -ENOTUNIQ;
> > > + goto conterror;
> > > + }
> > > + }
> > > + if (!newcont) {
> > > + newcont = kmalloc(sizeof(struct audit_cont), GFP_ATOMIC);
> > > + if (newcont) {
> > > + INIT_LIST_HEAD(&newcont->list);
> > > + newcont->id = contid;
> > > + get_task_struct(current);
> > > + newcont->owner = current;
> > > + refcount_set(&newcont->refcount, 1);
> > > + list_add_rcu(&newcont->list, &audit_contid_hash[h]);
> > > + } else {
> > > + rc = -ENOMEM;
> > > + goto conterror;
> > > + }
> > > + }
> > > + task->audit->cont = newcont;
> > > + audit_cont_put(oldcont);
> > > +conterror:
> > > + spin_unlock(&audit_contid_list_lock);
> > > + }
> > > task_unlock(task);
> > >
> > > if (!audit_enabled)
> > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > index 16bd03b88e0d..e4a31aa92dfe 100644
> > > --- a/kernel/audit.h
> > > +++ b/kernel/audit.h
> > > @@ -211,6 +211,14 @@ static inline int audit_hash_ino(u32 ino)
> > > return (ino & (AUDIT_INODE_BUCKETS-1));
> > > }
> > >
> > > +#define AUDIT_CONTID_BUCKETS 32
> > > +extern struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
> > > +
> > > +static inline int audit_hash_contid(u64 contid)
> > > +{
> > > + return (contid & (AUDIT_CONTID_BUCKETS-1));
> > > +}
> > > +
> > > /* Indicates that audit should log the full pathname. */
> > > #define AUDIT_NAME_FULL -1
> > >
> > > --
> > > 1.8.3.1
> > >
> > >
>
> - RGB
>
> --
> Richard Guy Briggs <[email protected]>
> Sr. S/W Engineer, Kernel Security, Base Operating Systems
> Remote, Ottawa, Red Hat Canada
> IRC: rgb, SunRaycer
> Voice: +1.647.777.2635, Internal: (81) 32635
>
>

2019-10-30 20:31:16

by Paul Moore

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On Thu, Oct 24, 2019 at 5:00 PM Richard Guy Briggs <[email protected]> wrote:
> Here's the note I had from that meeting:
>
> - Eric raised the issue that using /proc is likely to get more and more
> hoary due to mount namespaces and suggested that we use a netlink
> audit message (or a new syscall) to set the audit container identifier
> and since the loginuid is a similar type of operation, that it should be
> migrated over to a similar mechanism to get it away from /proc. Get
> could be done with a netlink audit message that triggers an audit log
> message to deliver the information. I'm reluctant to further pollute
> the syscall space if we can find another method. The netlink audit
> message makes sense since any audit-enabled service is likely to already
> have an audit socket open.

Thanks for the background info on the off-list meeting. I would
encourage you to have discussions like this on-list in the future; if
that isn't possible, hosting a public call would be okay-ish, but a
distant second.

At this point in time I'm not overly concerned about /proc completely
going away in namespaces/containers that are full featured enough to
host a container orchestrator. If/when reliance on procfs becomes an
issue, we can look at alternate APIs, but given the importance of
/proc to userspace (including to audit) I suspect we are going to see
it persist for some time. I would prefer to see you drop the audit
container ID netlink API portions of this patchset and focus on the
procfs API.

Also, for the record, removing the audit loginuid from procfs is not
something to take lightly, if at all; like it or not, it's part of the
kernel API.

--
paul moore
http://www.paul-moore.com

2019-10-30 23:16:34

by Paul Moore

Subject: Re: [PATCH ghak90 V7 14/21] audit: contid check descendancy and nesting

On Thu, Oct 24, 2019 at 6:08 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-10-10 20:40, Paul Moore wrote:
> > On Wed, Sep 18, 2019 at 9:26 PM Richard Guy Briggs <[email protected]> wrote:
> > > ?fixup! audit: convert to contid list to check for orch/engine ownership
> >
> > ?
> >
> > > Require the target task to be a descendant of the container
> > > orchestrator/engine.
> > >
> > > You would only change the audit container ID from one set or inherited
> > > value to another if you were nesting containers.
> > >
> > > If changing the contid, the container orchestrator/engine must be a
> > > descendant and not same orchestrator as the one that set it so it is not
> > > possible to change the contid of another orchestrator's container.
> >
> > Did you mean to say that the container orchestrator must be an
> > ancestor of the target, and the same orchestrator as the one that set
> > the target process' audit container ID?
>
> Not quite, the first half yes, but the second half: if it was already
> set by that orchestrator, it can't be set again. If it is a different
> orchestrator that is a descendant of the orchestrator that set it, then
> allow the action.
>
> > Or maybe I'm missing something about what you are trying to do?
>
> Does that help clarify it?

I think so, it's pretty much as you stated originally: "Require the
target task to be a descendant of the container orchestrator/engine".
It's possible I misread something in the patch, or got lost in all the
?fixup! patching. I'll take a closer look at the next revision of the
patchset to make sure the code makes sense to me, but the logic seems
reasonable.
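
As a sanity check on that logic, here is a userspace sketch of the rule as
described in this exchange, not the code from the patch. The struct, the
contid_setter field and both function names are assumptions made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct task_struct with only what the rule needs. */
struct task {
	const struct task *parent;		/* NULL for the root task */
	const struct task *contid_setter;	/* who set the contid, NULL if unset */
};

/* True if ancestor appears somewhere above tsk in the parent chain. */
static bool is_descendant(const struct task *tsk, const struct task *ancestor)
{
	assert(tsk != NULL);
	for (tsk = tsk->parent; tsk; tsk = tsk->parent)
		if (tsk == ancestor)
			return true;
	return false;
}

/* Hypothetical check: may orch set (or change) the contid of target? */
static bool may_set_contid(const struct task *orch, const struct task *target)
{
	/* The target must be a descendant of the orchestrator. */
	if (!is_descendant(target, orch))
		return false;
	/* First assignment: nothing more to check. */
	if (!target->contid_setter)
		return true;
	/* The orchestrator that already set it may not set it again. */
	if (target->contid_setter == orch)
		return false;
	/* Nesting: only a descendant of the original setter may change it. */
	return is_descendant(orch, target->contid_setter);
}
```

With a chain init -> A -> B -> child, A may set the child's contid once, B may
then change it because B descends from A, and an unrelated orchestrator that
is not an ancestor of the child is refused.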

--
paul moore
http://www.paul-moore.com

2019-10-31 00:24:57

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On 2019-10-30 16:27, Paul Moore wrote:
> On Thu, Oct 24, 2019 at 5:00 PM Richard Guy Briggs <[email protected]> wrote:
> > Here's the note I had from that meeting:
> >
> > - Eric raised the issue that using /proc is likely to get more and more
> > hoary due to mount namespaces and suggested that we use a netlink
> > audit message (or a new syscall) to set the audit container identifier
> > and since the loginuid is a similar type of operation, that it should be
> > migrated over to a similar mechanism to get it away from /proc. Get
> > could be done with a netlink audit message that triggers an audit log
> > message to deliver the information. I'm reluctant to further pollute
> > the syscall space if we can find another method. The netlink audit
> > message makes sense since any audit-enabled service is likely to already
> > have an audit socket open.
>
> Thanks for the background info on the off-list meeting. I would
> encourage you to have discussions like this on-list in the future; if
> that isn't possible, hosting a public call would be okay-ish, but a
> distant second.

I'm still trying to get Eric's attention to get him to weigh in here and
provide a more eloquent representation of his ideas and concerns. Some
of it was related to CRIU issues, of which we've already seen similar
concerns in namespace identifiers, including the device identity to
qualify it.

> At this point in time I'm not overly concerned about /proc completely
> going away in namespaces/containers that are full featured enough to
> host a container orchestrator. If/when reliance on procfs becomes an
> issue, we can look at alternate APIs, but given the importance of
> /proc to userspace (including to audit) I suspect we are going to see
> it persist for some time. I would prefer to see you drop the audit
> container ID netlink API portions of this patchset and focus on the
> procfs API.

I've already refactored the code to put the netlink bits at the end as
completely optional pieces for completeness so they won't get in the way
of the real substance of this patchset. The nesting depth and total
number of containers checks have also been punted to the end of the
patchset to get them out of the way of discussion.

> Also, for the record, removing the audit loginuid from procfs is not
> something to take lightly, if at all; like it or not, it's part of the
> kernel API.

Oh, I'm quite aware of how important this change is and it was discussed
with Steve Grubb who saw the concern and value of considering such a
disruptive change. Removing proc support for auid/ses would be a
long-term deprecation if accepted.

Really, I should have labelled the v7 patchset as RFC since there were
so many new and disruptive ideas presented in it.

> paul moore
> http://www.paul-moore.com

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-10-31 14:01:30

by Paul Moore

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On Wed, Oct 30, 2019 at 6:04 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-10-30 16:27, Paul Moore wrote:
> > On Thu, Oct 24, 2019 at 5:00 PM Richard Guy Briggs <[email protected]> wrote:
> > > Here's the note I had from that meeting:
> > >
> > > - Eric raised the issue that using /proc is likely to get more and more
> > > hoary due to mount namespaces and suggested that we use a netlink
> > > audit message (or a new syscall) to set the audit container identifier
> > > and since the loginuid is a similar type of operation, that it should be
> > > migrated over to a similar mechanism to get it away from /proc. Get
> > > could be done with a netlink audit message that triggers an audit log
> > > message to deliver the information. I'm reluctant to further pollute
> > > the syscall space if we can find another method. The netlink audit
> > > message makes sense since any audit-enabled service is likely to already
> > > have an audit socket open.
> >
> > Thanks for the background info on the off-list meeting. I would
> > encourage you to have discussions like this on-list in the future; if
> > that isn't possible, hosting a public call would be okay-ish, but a
> > distant second.
>
> I'm still trying to get Eric's attention to get him to weigh in here and
> provide a more eloquent representation of his ideas and concerns. Some
> of it was related to CRIU(sp?) issues, of which we've already seen
> similar concerns in namespace identifiers, including the device
> identity to qualify it.

Okay, let's leave this open until we hear from Eric to see if he has
any additional information, but it's going to need to be pretty
compelling.

> > At this point in time I'm not overly concerned about /proc completely
> > going away in namespaces/containers that are full featured enough to
> > host a container orchestrator. If/when reliance on procfs becomes an
> > issue, we can look at alternate APIs, but given the importance of
> > /proc to userspace (including to audit) I suspect we are going to see
> > it persist for some time. I would prefer to see you drop the audit
> > container ID netlink API portions of this patchset and focus on the
> > procfs API.
>
> I've already refactored the code to put the netlink bits at the end as
> completely optional pieces for completeness so they won't get in the way
> of the real substance of this patchset. The nesting depth and total
> number of containers checks have also been punted to the end of the
> patchset to get them out of the way of discussion.

That's fine, but if we do decide to drop the netlink API after hearing
from Eric, please drop those from the patchset. Keeping the patchset
small and focused should be a goal, and including rejected/dead
patches (even at the end) doesn't help move towards that goal.

> > Also, for the record, removing the audit loginuid from procfs is not
> > something to take lightly, if at all; like it or not, it's part of the
> > kernel API.
>
> Oh, I'm quite aware of how important this change is and it was discussed
> with Steve Grubb who saw the concern and value of considering such a
> disruptive change. Removing proc support for auid/ses would be a
> long-term deprecation if accepted.

As I mentioned, that comment was more "for the record" than for you in
particular; I know we've talked a lot over the years about kernel API
stability and I'm confident you are aware of the pitfalls there. :)

--
paul moore
http://www.paul-moore.com

2019-10-31 14:53:04

by Steve Grubb

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

Hello,

TLDR; I see a lot of benefit to switching away from procfs for setting auid &
sessionid.

On Wednesday, October 30, 2019 6:03:20 PM EDT Richard Guy Briggs wrote:
> > Also, for the record, removing the audit loginuid from procfs is not
> > something to take lightly, if at all; like it or not, it's part of the
> > kernel API.

It can also be used by tools to iterate processes related to one user or
session. I use this in my Intrusion Prevention System which will land in
audit user space at some point in the future.
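
As a rough illustration of that kind of tool, the sketch below walks a
procfs tree and counts the tasks whose loginuid matches a given value.
It is only a minimal sketch under stated assumptions, not the IPS code:
the function name count_tasks_with_auid() and the proc_root parameter
are made up for the example; only the per-process loginuid file is the
existing kernel interface.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: count the processes under proc_root (normally
 * "/proc") whose loginuid file holds the value 'auid'.
 * Returns -1 if proc_root cannot be opened. */
static int count_tasks_with_auid(const char *proc_root, unsigned long auid)
{
	DIR *d = opendir(proc_root);
	struct dirent *de;
	int count = 0;

	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL) {
		char path[512];
		unsigned long val;
		FILE *f;

		/* only entries starting with a digit are treated as processes */
		if (!isdigit((unsigned char)de->d_name[0]))
			continue;
		snprintf(path, sizeof(path), "%s/%s/loginuid",
			 proc_root, de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;	/* task exited or access denied */
		if (fscanf(f, "%lu", &val) == 1 && val == auid)
			count++;
		fclose(f);
	}
	closedir(d);
	return count;
}
```

A caller would typically pass "/proc" and the auid of interest; an
unset loginuid reads back as 4294967295.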


> Oh, I'm quite aware of how important this change is and it was discussed
> with Steve Grubb who saw the concern and value of considering such a
> disruptive change.

Actually, I advocated for syscall. I think the gist of Eric's idea was that
/proc is the intersection of many nasty problems. By relying on it, you can't
simplify the API to reduce the complexity. Almost no program actually needs
access to /proc. ps does. But almost everything else is happy without it. For
example, when you set up chroot jails, you may have to add /dev/random or
/dev/null, but almost never /proc. What does force you to add /proc is any
entry point daemon like sshd because it needs to set the loginuid. If we
switch away from /proc, then sshd or crond will no longer /require/ procfs to
be available which again simplifies the system design.


> Removing proc support for auid/ses would be a
> long-term deprecation if accepted.

It might need to just be turned into readonly for a while. But then again,
perhaps auid and session should be part of /proc/<pid>/status? Maybe this can
be done independently and ahead of the container work so there is a migration
path for things that read auid or session. TBH, maybe this should have been
done from the beginning.

-Steve



2019-10-31 23:44:48

by Paul Moore

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On Thu, Oct 31, 2019 at 10:51 AM Steve Grubb <[email protected]> wrote:
> On Wednesday, October 30, 2019 6:03:20 PM EDT Richard Guy Briggs wrote:
> > > Also, for the record, removing the audit loginuid from procfs is not
> > > something to take lightly, if at all; like it or not, it's part of the
> > > kernel API.
>
> It can also be used by tools to iterate processes related to one user or
> session. I use this in my Intrusion Prevention System which will land in
> audit user space at some point in the future.

Let's try to stay focused on the audit container ID functionality; I
fear if we start bringing in other unrelated issues we are never going
to land these patches.

> > Oh, I'm quite aware of how important this change is and it was discussed
> > with Steve Grubb who saw the concern and value of considering such a
> > disruptive change.
>
> Actually, I advocated for syscall. I think the gist of Eric's idea was that
> /proc is the intersection of many nasty problems. By relying on it, you can't
> simplify the API to reduce the complexity.

I guess complexity is relative in a sense, but reading and writing a
number from a file in procfs seems awfully simple to me.
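
For reference, the procfs pattern in question looks roughly like the
sketch below. It assumes the file holds a single unsigned decimal
value, as /proc/self/loginuid does; the helper names read_id_file()
and write_id_file() are hypothetical and do not come from any patch in
this series.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one unsigned decimal value from a
 * procfs-style file such as /proc/self/loginuid.
 * Returns 0 on success, -1 on failure. */
static int read_id_file(const char *path, uint64_t *val)
{
	char buf[32];
	char *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	errno = 0;
	*val = strtoull(buf, &end, 10);
	if (errno || end == buf)
		return -1;
	return 0;
}

/* Hypothetical helper: write one unsigned decimal value to such a
 * file. Returns 0 on success, -1 on failure. */
static int write_id_file(const char *path, uint64_t val)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%" PRIu64, val) > 0;
	/* stdio buffers the data, so a rejected write shows up at close */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```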

> Almost no program actually needs
> access to /proc. ps does. But almost everything else is happy without it. For
> example, when you set up chroot jails, you may have to add /dev/random or
> /dev/null, but almost never /proc. What does force you to add /proc is any
> entry point daemon like sshd because it needs to set the loginuid. If we
> switch away from /proc, then sshd or crond will no longer /require/ procfs to
> be available which again simplifies the system design.

It's not that simple; there are plenty of container use cases beyond
ps which require procfs:

Most LSM aware applications require procfs to view and manage some LSM
state (e.g. /proc/self/attr).

System containers, containers that run their own init/systemd/etc.,
require a working procfs.

Nested container orchestrators often run in system containers, which
require a working procfs (see above).

I'm sure there are plenty of others, but these are the ones that came
immediately to mind.

--
paul moore
http://www.paul-moore.com

2019-11-01 01:37:39

by Duncan Roe

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On Thu, Oct 31, 2019 at 10:50:57AM -0400, Steve Grubb wrote:
> Hello,
>
> TLDR; I see a lot of benefit to switching away from procfs for setting auid &
> sessionid.
>
> On Wednesday, October 30, 2019 6:03:20 PM EDT Richard Guy Briggs wrote:
> > > Also, for the record, removing the audit loginuid from procfs is not
> > > something to take lightly, if at all; like it or not, it's part of the
> > > kernel API.
>
> It can also be used by tools to iterate processes related to one user or
> session. I use this in my Intrusion Prevention System which will land in
> audit user space at some point in the future.
>
>
> > Oh, I'm quite aware of how important this change is and it was discussed
> > with Steve Grubb who saw the concern and value of considering such a
> > disruptive change.
>
> Actually, I advocated for syscall. I think the gist of Eric's idea was that
> /proc is the intersection of many nasty problems. By relying on it, you can't
> simplify the API to reduce the complexity. Almost no program actually needs
                                             ^^^^^^ ^^ ^^^^^^^ ^^^^^^^^ ^^^^^
> access to /proc. ps does. But almost everything else is happy without it. For
> ^^^^^^ ^^ ^^^^^^ ^^ ^^^^^

Eh?? *top* needs /proc/ps, as do most of the programs in package procps-ng.
Then there's lsof, pgrep (which doesn't fail but can't find anything) and even
lilo (for Slackware ;)

> example, when you set up chroot jails, you may have to add /dev/random or
> /dev/null, but almost never /proc. What does force you to add /proc is any
> entry point daemon like sshd because it needs to set the loginuid. If we
> switch away from /proc, then sshd or crond will no longer /require/ procfs to
> be available which again simplifies the system design.
>
>
> > Removing proc support for auid/ses would be a
> > long-term deprecation if accepted.
>
> It might need to just be turned into readonly for a while. But then again,
> perhaps auid and session should be part of /proc/<pid>/status? Maybe this can
> be done independently and ahead of the container work so there is a migration
> path for things that read auid or session. TBH, maybe this should have been
> done from the beginning.
>
> -Steve
>
Cheers ... Duncan.

2019-11-01 15:11:20

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On 2019-10-31 10:50, Steve Grubb wrote:
> Hello,
>
> TLDR; I see a lot of benefit to switching away from procfs for setting auid &
> sessionid.
>
> On Wednesday, October 30, 2019 6:03:20 PM EDT Richard Guy Briggs wrote:
> > > Also, for the record, removing the audit loginuid from procfs is not
> > > something to take lightly, if at all; like it or not, it's part of the
> > > kernel API.
>
> It can also be used by tools to iterate processes related to one user or
> session. I use this in my Intrusion Prevention System which will land in
> audit user space at some point in the future.
>
> > Oh, I'm quite aware of how important this change is and it was discussed
> > with Steve Grubb who saw the concern and value of considering such a
> > disruptive change.
>
> Actually, I advocated for syscall. I think the gist of Eric's idea was that
> /proc is the intersection of many nasty problems. By relying on it, you can't
> simplify the API to reduce the complexity. Almost no program actually needs
> access to /proc. ps does. But almost everything else is happy without it. For
> example, when you set up chroot jails, you may have to add /dev/random or
> /dev/null, but almost never /proc. What does force you to add /proc is any
> entry point daemon like sshd because it needs to set the loginuid. If we
> switch away from /proc, then sshd or crond will no longer /require/ procfs to
> be available which again simplifies the system design.
>
> > Removing proc support for auid/ses would be a
> > long-term deprecation if accepted.
>
> It might need to just be turned into readonly for a while. But then again,
> perhaps auid and session should be part of /proc/<pid>/status? Maybe this can
> be done independently and ahead of the container work so there is a migration
> path for things that read auid or session. TBH, maybe this should have been
> done from the beginning.

How about making loginuid/contid/capcontid writable only via netlink but
still provide the /proc interface for reading? Deprecation of proc can
be left as a decision for later. This way sshd/crond/getty don't need
/proc, but the info is still there for tools that want to read it.

> -Steve

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-11-01 15:18:24

by Steve Grubb

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On Friday, November 1, 2019 11:09:27 AM EDT Richard Guy Briggs wrote:
> On 2019-10-31 10:50, Steve Grubb wrote:
> > Hello,
> >
> > TLDR; I see a lot of benefit to switching away from procfs for setting
> > auid & sessionid.
> >
> > On Wednesday, October 30, 2019 6:03:20 PM EDT Richard Guy Briggs wrote:
> > > > Also, for the record, removing the audit loginuid from procfs is not
> > > > something to take lightly, if at all; like it or not, it's part of
> > > > the
> > > > kernel API.
> >
> > It can also be used by tools to iterate processes related to one user or
> > session. I use this in my Intrusion Prevention System which will land in
> > audit user space at some point in the future.
> >
> > > Oh, I'm quite aware of how important this change is and it was
> > > discussed
> > > with Steve Grubb who saw the concern and value of considering such a
> > > disruptive change.
> >
> > Actually, I advocated for syscall. I think the gist of Eric's idea was
> > that /proc is the intersection of many nasty problems. By relying on
> > it, you can't simplify the API to reduce the complexity. Almost no
> > program actually needs access to /proc. ps does. But almost everything
> > else is happy without it. For example, when you set up chroot jails, you
> > may have to add /dev/random or /dev/null, but almost never /proc. What
> > does force you to add /proc is any entry point daemon like sshd because
> > it needs to set the loginuid. If we switch away from /proc, then sshd or
> > crond will no longer /require/ procfs to be available which again
> > simplifies the system design.
> >
> > > Removing proc support for auid/ses would be a
> > > long-term deprecation if accepted.
> >
> > It might need to just be turned into readonly for a while. But then
> > again,
> > perhaps auid and session should be part of /proc/<pid>/status? Maybe this
> > can be done independently and ahead of the container work so there is a
> > migration path for things that read auid or session. TBH, maybe this
> > should have been done from the beginning.
>
> How about making loginuid/contid/capcontid writable only via netlink but
> still provide the /proc interface for reading? Deprecation of proc can
> be left as a decision for later. This way sshd/crond/getty don't need
> /proc, but the info is still there for tools that want to read it.

This also sounds good to me. But I still think loginuid and audit sessionid
should get written in /proc/<pid>/status so that all process information is
consolidated in one place.

-Steve


2019-11-01 15:25:24

by Richard Guy Briggs

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On 2019-11-01 11:13, Steve Grubb wrote:
> On Friday, November 1, 2019 11:09:27 AM EDT Richard Guy Briggs wrote:
> > On 2019-10-31 10:50, Steve Grubb wrote:
> > > Hello,
> > >
> > > TLDR; I see a lot of benefit to switching away from procfs for setting
> > > auid & sessionid.
> > >
> > > On Wednesday, October 30, 2019 6:03:20 PM EDT Richard Guy Briggs wrote:
> > > > > Also, for the record, removing the audit loginuid from procfs is not
> > > > > something to take lightly, if at all; like it or not, it's part of
> > > > > the
> > > > > kernel API.
> > >
> > > It can also be used by tools to iterate processes related to one user or
> > > session. I use this in my Intrusion Prevention System which will land in
> > > audit user space at some point in the future.
> > >
> > > > Oh, I'm quite aware of how important this change is and it was
> > > > discussed
> > > > with Steve Grubb who saw the concern and value of considering such a
> > > > disruptive change.
> > >
> > > Actually, I advocated for syscall. I think the gist of Eric's idea was
> > > that /proc is the intersection of many nasty problems. By relying on
> > > it, you can't simplify the API to reduce the complexity. Almost no
> > > program actually needs access to /proc. ps does. But almost everything
> > > else is happy without it. For example, when you set up chroot jails, you
> > > may have to add /dev/random or /dev/null, but almost never /proc. What
> > > does force you to add /proc is any entry point daemon like sshd because
> > > it needs to set the loginuid. If we switch away from /proc, then sshd or
> > > crond will no longer /require/ procfs to be available which again
> > > simplifies the system design.
> > >
> > > > Removing proc support for auid/ses would be a
> > > > long-term deprecation if accepted.
> > >
> > > It might need to just be turned into readonly for a while. But then
> > > again,
> > > perhaps auid and session should be part of /proc/<pid>/status? Maybe this
> > > can be done independently and ahead of the container work so there is a
> > > migration path for things that read auid or session. TBH, maybe this
> > > should have been done from the beginning.
> >
> > How about making loginuid/contid/capcontid writable only via netlink but
> > still provide the /proc interface for reading? Deprecation of proc can
> > be left as a decision for later. This way sshd/crond/getty don't need
> > /proc, but the info is still there for tools that want to read it.
>
> This also sounds good to me. But I still think loginuid and audit sessionid
> should get written in /proc/<pid>/status so that all process information is
> consolidated in one place.

I don't have a problem adding auid/sessionid to /proc/<pid>/status with
other related information, but it is disruptive to deprecate the
existing interface, which could be a separate step.

> -Steve

- RGB

--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2019-11-01 17:17:14

by Paul Moore

Subject: Re: [PATCH ghak90 V7 20/21] audit: add capcontid to set contid outside init_user_ns

On Fri, Nov 1, 2019 at 11:10 AM Richard Guy Briggs <[email protected]> wrote:
> On 2019-10-31 10:50, Steve Grubb wrote:
> > Hello,
> >
> > TLDR; I see a lot of benefit to switching away from procfs for setting auid &
> > sessionid.
> >
> > On Wednesday, October 30, 2019 6:03:20 PM EDT Richard Guy Briggs wrote:
> > > > Also, for the record, removing the audit loginuid from procfs is not
> > > > something to take lightly, if at all; like it or not, it's part of the
> > > > kernel API.
> >
> > It can also be used by tools to iterate processes related to one user or
> > session. I use this in my Intrusion Prevention System which will land in
> > audit user space at some point in the future.
> >
> > > Oh, I'm quite aware of how important this change is and it was discussed
> > > with Steve Grubb who saw the concern and value of considering such a
> > > disruptive change.
> >
> > Actually, I advocated for syscall. I think the gist of Eric's idea was that
> > /proc is the intersection of many nasty problems. By relying on it, you can't
> > simplify the API to reduce the complexity. Almost no program actually needs
> > access to /proc. ps does. But almost everything else is happy without it. For
> > example, when you set up chroot jails, you may have to add /dev/random or
> > /dev/null, but almost never /proc. What does force you to add /proc is any
> > entry point daemon like sshd because it needs to set the loginuid. If we
> > switch away from /proc, then sshd or crond will no longer /require/ procfs to
> > be available which again simplifies the system design.
> >
> > > Removing proc support for auid/ses would be a
> > > long-term deprecation if accepted.
> >
> > It might need to just be turned into readonly for a while. But then again,
> > perhaps auid and session should be part of /proc/<pid>/status? Maybe this can
> > be done independently and ahead of the container work so there is a migration
> > path for things that read auid or session. TBH, maybe this should have been
> > done from the beginning.
>
> How about making loginuid/contid/capcontid writable only via netlink but
> still provide the /proc interface for reading? Deprecation of proc can
> be left as a decision for later. This way sshd/crond/getty don't need
> /proc, but the info is still there for tools that want to read it.

Just so there is no confusion for the next patchset: I think it would
be a mistake to include any changes to loginuid in your next patchset,
even as a "RFC" at the end. Also, barring some shocking comments from
Eric relating to the imminent death of /proc in containers, I think it
would also be a mistake to include the netlink API.

Let's keep it small and focused :)

--
paul moore
http://www.paul-moore.com

2019-11-08 17:42:33

by Paul Moore

Subject: Re: [PATCH ghak90 V7 08/21] audit: add contid support for signalling the audit daemon

On Fri, Oct 25, 2019 at 3:20 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-10-10 20:39, Paul Moore wrote:
> > On Wed, Sep 18, 2019 at 9:25 PM Richard Guy Briggs <[email protected]> wrote:
> > > Add audit container identifier support to the action of signalling the
> > > audit daemon.
> > >
> > > Since this would need to add an element to the audit_sig_info struct,
> > > a new record type AUDIT_SIGNAL_INFO2 was created with a new
> > > audit_sig_info2 struct. Corresponding support is required in the
> > > userspace code to reflect the new record request and reply type.
> > > An older userspace won't break since it won't know to request this
> > > record type.
> > >
> > > Signed-off-by: Richard Guy Briggs <[email protected]>
> > > ---
> > > include/linux/audit.h | 7 +++++++
> > > include/uapi/linux/audit.h | 1 +
> > > kernel/audit.c | 28 ++++++++++++++++++++++++++++
> > > kernel/audit.h | 1 +
> > > security/selinux/nlmsgtab.c | 1 +
> > > 5 files changed, 38 insertions(+)
> > >
> > > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > > index 0c18d8e30620..7b640c4da4ee 100644
> > > --- a/include/linux/audit.h
> > > +++ b/include/linux/audit.h
> > > @@ -23,6 +23,13 @@ struct audit_sig_info {
> > > char ctx[0];
> > > };
> > >
> > > +struct audit_sig_info2 {
> > > + uid_t uid;
> > > + pid_t pid;
> > > + u64 cid;
> > > + char ctx[0];
> > > +};
> > > +
> > > struct audit_buffer;
> > > struct audit_context;
> > > struct inode;
> > > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> > > index 4ed080f28b47..693ec6e0288b 100644
> > > --- a/include/uapi/linux/audit.h
> > > +++ b/include/uapi/linux/audit.h
> > > @@ -72,6 +72,7 @@
> > > #define AUDIT_SET_FEATURE 1018 /* Turn an audit feature on or off */
> > > #define AUDIT_GET_FEATURE 1019 /* Get which features are enabled */
> > > #define AUDIT_CONTAINER_OP 1020 /* Define the container id and info */
> > > +#define AUDIT_SIGNAL_INFO2 1021 /* Get info auditd signal sender */
> > >
> > > #define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */
> > > #define AUDIT_USER_AVC 1107 /* We filter this differently */
> > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > index adfb3e6a7f0c..df3db29f5a8a 100644
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
> > > @@ -125,6 +125,7 @@ struct audit_net {
> > > kuid_t audit_sig_uid = INVALID_UID;
> > > pid_t audit_sig_pid = -1;
> > > u32 audit_sig_sid = 0;
> > > +u64 audit_sig_cid = AUDIT_CID_UNSET;
> > >
> > > /* Records can be lost in several ways:
> > > 0) [suppressed in audit_alloc]
> > > @@ -1094,6 +1095,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
> > > case AUDIT_ADD_RULE:
> > > case AUDIT_DEL_RULE:
> > > case AUDIT_SIGNAL_INFO:
> > > + case AUDIT_SIGNAL_INFO2:
> > > case AUDIT_TTY_GET:
> > > case AUDIT_TTY_SET:
> > > case AUDIT_TRIM:
> > > @@ -1257,6 +1259,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> > > struct audit_buffer *ab;
> > > u16 msg_type = nlh->nlmsg_type;
> > > struct audit_sig_info *sig_data;
> > > + struct audit_sig_info2 *sig_data2;
> > > char *ctx = NULL;
> > > u32 len;
> > >
> > > @@ -1516,6 +1519,30 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> > > sig_data, sizeof(*sig_data) + len);
> > > kfree(sig_data);
> > > break;
> > > + case AUDIT_SIGNAL_INFO2:
> > > + len = 0;
> > > + if (audit_sig_sid) {
> > > + err = security_secid_to_secctx(audit_sig_sid, &ctx, &len);
> > > + if (err)
> > > + return err;
> > > + }
> > > + sig_data2 = kmalloc(sizeof(*sig_data2) + len, GFP_KERNEL);
> > > + if (!sig_data2) {
> > > + if (audit_sig_sid)
> > > + security_release_secctx(ctx, len);
> > > + return -ENOMEM;
> > > + }
> > > + sig_data2->uid = from_kuid(&init_user_ns, audit_sig_uid);
> > > + sig_data2->pid = audit_sig_pid;
> > > + if (audit_sig_sid) {
> > > + memcpy(sig_data2->ctx, ctx, len);
> > > + security_release_secctx(ctx, len);
> > > + }
> > > + sig_data2->cid = audit_sig_cid;
> > > + audit_send_reply(skb, seq, AUDIT_SIGNAL_INFO2, 0, 0,
> > > + sig_data2, sizeof(*sig_data2) + len);
> > > + kfree(sig_data2);
> > > + break;
> > > case AUDIT_TTY_GET: {
> > > struct audit_tty_status s;
> > > unsigned int t;
> > > @@ -2384,6 +2411,7 @@ int audit_signal_info(int sig, struct task_struct *t)
> > > else
> > > audit_sig_uid = uid;
> > > security_task_getsecid(current, &audit_sig_sid);
> > > + audit_sig_cid = audit_get_contid(current);
> > > }
> >
> > I've been wondering something as I've been working my way through
> > these patches and this patch seems like a good spot to discuss this
> > ... Now that we have the concept of an audit container ID "lifetime"
> > in the kernel, when do we consider the ID gone? Is it when the last
> > process in the container exits, or is it when we generate the last
> > audit record which could possibly contain the audit container ID?
> > This patch would appear to support the former, but if we wanted the
> > latter we would need to grab a reference to the audit container ID
> > struct so it wouldn't "die" on us before we could emit the signal info
> > record.
>
> Are you concerned with the availability of the data when the audit
> signal info record is generated, when the kernel last deals with a
> particular contid or when userspace thinks there will be no more
> references to it?
>
> I've got a bit of a dilemma with this one...
>
> In fact, the latter situation you describe isn't a concern at present to
> be able to deliver the information since the value is copied into the
> audit signal global internal variables before the signalling task dies
> and the audit signal info record is created from those copied (cached)
> values when requested from userspace.
>
> So the issue raised above I don't think is a problem. However, patch 18
> (which wasn't reviewed because it was a patch to a number of preceding
> patches) changes the reporting approach to give a chain of nested
> contids which isn't reflected in the same level of reporting for the
> audit signal patch/mechanism. Solving this is a bit more complex. We
> could have the audit signal internal caching store a pointer to the
> relevant container object and bump its refcount to ensure it doesn't
> vanish until we are done with it, but the audit signal info binary
> record format already has a variable length due to the selinux context
> at the end of that struct and adding a second variable length element to
> it would make it more complicated (but not impossible) to handle.

[side note #1: Sorry for the delay, travel/conferences have limited my
time and I felt we needed to focus on the larger issue of
netlink/procfs first. Back to the other topics ...]

[side note #2: I just realized that one can shorten "audit container
ID" to ACID, I think that's going to be my favorite realization of the
day :)]

My concern wasn't really about the availability of the data, since as
you said, it is copied into the record buffer, but rather about a delay
between when the audit container ID (ACID) disappears from the
tracking/list db in the kernel and when it is emitted in an audit
record from the kernel. During this time it seems like it could be
possible for the orchestrator to reintroduce the same ACID value and
if someone is not taking into account the full audit history they
could get confused (the full audit history should show the proper
creation/destruction events in the correct order). Ultimately I'm not
sure it is a major issue, and fixing it is likely to be really ugly,
but I think it would be good to add some comments in the code
regarding what we guarantee as far as ACID lifetimes are concerned.
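
To make the refcount option mentioned above concrete, here is a small
userspace sketch of the lifetime rule it would imply: the cached
signal-sender info holds its own reference, so the object (and with it
the ID) cannot go away and be reused until the record has been
reported. This is only an illustration under stated assumptions; the
names struct contobj, contobj_get(), contobj_put() and
cache_signal_sender() are hypothetical, and real kernel code would use
refcount_t and proper locking rather than a plain int.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's audit container object. */
struct contobj {
	uint64_t id;
	int refcount;	/* the kernel would use refcount_t */
};

static struct contobj *contobj_new(uint64_t id)
{
	struct contobj *c = malloc(sizeof(*c));

	if (c) {
		c->id = id;
		c->refcount = 1;	/* reference held by the first task */
	}
	return c;
}

static struct contobj *contobj_get(struct contobj *c)
{
	if (c)
		c->refcount++;
	return c;
}

/* Drop a reference. Returns 1 if the object was freed, which is the
 * point at which the ID value could legitimately be reused. */
static int contobj_put(struct contobj *c)
{
	if (!c || --c->refcount > 0)
		return 0;
	free(c);
	return 1;
}

/* The cached signal sender pins the object until it is replaced. */
static struct contobj *sig_cont;

static void cache_signal_sender(struct contobj *c)
{
	contobj_put(sig_cont);		/* release any previous sender */
	sig_cont = contobj_get(c);
}
```

With this rule the "drop" event for an ID would only be emitted once
the cache has let go of it, which is the guarantee the code comments
would need to spell out.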

--
paul moore
http://www.paul-moore.com

2019-11-08 17:51:52

by Paul Moore

Subject: Re: [PATCH ghak90 V7 06/21] audit: contid limit of 32k imposed to avoid DoS

On Thu, Oct 24, 2019 at 5:23 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-10-10 20:38, Paul Moore wrote:
> > On Fri, Sep 27, 2019 at 8:52 AM Neil Horman <[email protected]> wrote:
> > > On Wed, Sep 18, 2019 at 09:22:23PM -0400, Richard Guy Briggs wrote:
> > > > Set an arbitrary limit on the number of audit container identifiers to
> > > > limit abuse.
> > > >
> > > > Signed-off-by: Richard Guy Briggs <[email protected]>
> > > > ---
> > > > kernel/audit.c | 8 ++++++++
> > > > kernel/audit.h | 4 ++++
> > > > 2 files changed, 12 insertions(+)
> > > >
> > > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > > index 53d13d638c63..329916534dd2 100644
> > > > --- a/kernel/audit.c
> > > > +++ b/kernel/audit.c
> >
> > ...
> >
> > > > @@ -2465,6 +2472,7 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> > > > newcont->owner = current;
> > > > refcount_set(&newcont->refcount, 1);
> > > > list_add_rcu(&newcont->list, &audit_contid_hash[h]);
> > > > + audit_contid_count++;
> > > > } else {
> > > > rc = -ENOMEM;
> > > > goto conterror;
> > > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > > index 162de8366b32..543f1334ba47 100644
> > > > --- a/kernel/audit.h
> > > > +++ b/kernel/audit.h
> > > > @@ -219,6 +219,10 @@ static inline int audit_hash_contid(u64 contid)
> > > > return (contid & (AUDIT_CONTID_BUCKETS-1));
> > > > }
> > > >
> > > > +extern int audit_contid_count;
> > > > +
> > > > +#define AUDIT_CONTID_COUNT 1 << 16
> > > > +
> > >
> > > Just to ask the question, since it wasn't clear in the changelog, what
> > > abuse are you avoiding here? Ostensibly you should be able to create as
> > > many container ids as you have space for, and the simple creation of
> > > container ids doesn't seem like the resource strain I would be concerned
> > > about here, given that an orchestrator can still create as many
> > > containers as the system will otherwise allow, which will consume
> > > significantly more ram/disk/etc.
> >
> > I've got a similar question. Up to this point in the patchset, there
> > is a potential issue of hash bucket chain lengths and traversing them
> > with a spinlock held, but it seems like we shouldn't be putting an
> > arbitrary limit on audit container IDs unless we have a good reason
> > for it. If for some reason we do want to enforce a limit, it should
> > probably be a tunable value like a sysctl, or similar.
>
> Can you separate and clarify the concerns here?

"Why are you doing this?" is about as simple as I can pose the question.

> I plan to move this patch to the end of the patchset and make it
> optional, possibly adding a tuning mechanism. Like the migration from
> /proc to netlink for loginuid/sessionid/contid/capcontid, this was Eric
> Biederman's concern and suggested mitigation.

Okay, let's just drop it. I *really* don't like this approach of
tossing questionable stuff at the end of the patchset; I get why you
are doing it, but I think we really need to focus on keeping this
changeset small. If the number of ACIDs (heh) becomes unwieldy the
right solution is to improve the algorithms/structures; if we can't do
that for some reason, *then* we can fall back to a limiting knob in a
later release.

> As for the first issue of the bucket chain length traversal while
> holding the list spin-lock, would you prefer to use the rcu lock to
> traverse the list and then only hold the spin-lock when modifying the
> list, and possibly even make the spin-lock more fine-grained per list?

Until we have a better idea of how this is going to be used, I think
it's okay for now. It's also internal to the kernel so we can change
it at any time. My comments about the locking/structs were only to try
and think of some reason why one might want to limit the number of
ACIDs since neither you nor Eric provided any reasoning that I could
see.
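
For reference, the finer-grained variant raised in the quoted question (one
lock per hash chain instead of one lock for the whole table) could look
roughly like the user-space model below. This is only a sketch under
assumptions: every `sketch_*` name and the bucket count are hypothetical, a
C11 `atomic_flag` stands in for a kernel spinlock, and the RCU read side is
not modelled at all, so lookups take the bucket lock as well.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_BUCKETS 32	/* hypothetical size; must be a power of two */

struct sketch_cont {
	struct sketch_cont *next;
	uint64_t id;
};

struct sketch_bucket {
	atomic_flag lock;		/* stand-in for a per-chain spinlock */
	struct sketch_cont *head;
};

static struct sketch_bucket sketch_table[SKETCH_BUCKETS];

/* Put every bucket into a known state: unlocked, empty chain. */
void sketch_table_init(void)
{
	for (int i = 0; i < SKETCH_BUCKETS; i++) {
		atomic_flag_clear(&sketch_table[i].lock);
		sketch_table[i].head = NULL;
	}
}

/* Same masking scheme as the quoted audit_hash_contid(). */
unsigned int sketch_hash(uint64_t id)
{
	return (unsigned int)(id & (SKETCH_BUCKETS - 1));
}

void sketch_lock(struct sketch_bucket *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

void sketch_unlock(struct sketch_bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Caller must hold the bucket lock. */
struct sketch_cont *sketch_find_locked(struct sketch_bucket *b, uint64_t id)
{
	for (struct sketch_cont *c = b->head; c; c = c->next)
		if (c->id == id)
			return c;
	return NULL;
}

/* Returns 0 on success, -1 if the id already exists, -2 on allocation failure. */
int sketch_add(uint64_t id)
{
	struct sketch_bucket *b = &sketch_table[sketch_hash(id)];
	struct sketch_cont *c = malloc(sizeof(*c));	/* allocate before locking */

	if (!c)
		return -2;
	c->id = id;

	sketch_lock(b);
	if (sketch_find_locked(b, id)) {
		sketch_unlock(b);
		free(c);
		return -1;
	}
	c->next = b->head;
	b->head = c;
	sketch_unlock(b);
	return 0;
}

/* Returns 1 if the id is present, 0 otherwise. */
int sketch_contains(uint64_t id)
{
	struct sketch_bucket *b = &sketch_table[sketch_hash(id)];
	int found;

	sketch_lock(b);
	found = sketch_find_locked(b, id) != NULL;
	sketch_unlock(b);
	return found;
}
```

With per-chain locks, two callers only contend when their IDs hash to the
same bucket; the worst-case chain length is unchanged, so this does not by
itself answer the question of whether a limit is needed.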

--
paul moore
http://www.paul-moore.com

2019-11-08 18:27:36

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH ghak90 V7 04/21] audit: convert to contid list to check for orch/engine ownership

On Fri, Oct 25, 2019 at 5:00 PM Richard Guy Briggs <[email protected]> wrote:
> On 2019-10-10 20:38, Paul Moore wrote:
> > On Wed, Sep 18, 2019 at 9:24 PM Richard Guy Briggs <[email protected]> wrote:
> > > Store the audit container identifier in a refcounted kernel object that
> > > is added to the master list of audit container identifiers. This will
> > > allow multiple container orchestrators/engines to work on the same
> > > machine without danger of inadvertently re-using an existing identifier.
> > > It will also allow an orchestrator to inject a process into an existing
> > > container by checking if the original container owner is the one
> > > injecting the task. A hash table list is used to optimize searches.
> > >
> > > Signed-off-by: Richard Guy Briggs <[email protected]>
> > > ---
> > > include/linux/audit.h | 26 ++++++++++++++--
> > > kernel/audit.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++---
> > > kernel/audit.h | 8 +++++
> > > 3 files changed, 112 insertions(+), 8 deletions(-)
> >
> > One general comment before we go off into the weeds on this ... I can
> > understand why you wanted to keep this patch separate from the earlier
> > patches, but as we get closer to having mergeable code this should get
> > folded into the previous patches. For example, there shouldn't be a
> > change in audit_task_info where you change the contid field from a u64
> > to struct pointer, it should be a struct pointer from the start.
>
> I should have marked this patchset as RFC even though it was v7 due to a
> lot of new ideas/code that was added with uncertainties needing comment
> and direction.
>
> > It's also disappointing that idr appears to only be for 32-bit ID
> > values, if we had a 64-bit idr I think we could simplify this greatly.
>
> Perhaps. I do still see value in letting the orchestrator choose the
> value.

Agreed. I was just thinking out loud that it seems like much of what
we need could be a generic library mechanism similar to, but not quite
like, the existing idr code.

> > > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > > index f2e3b81f2942..e317807cdd3e 100644
> > > --- a/include/linux/audit.h
> > > +++ b/include/linux/audit.h
> > > @@ -95,10 +95,18 @@ struct audit_ntp_data {
> > > struct audit_ntp_data {};
> > > #endif
> > >
> > > +struct audit_cont {
> > > + struct list_head list;
> > > + u64 id;
> > > + struct task_struct *owner;
> > > + refcount_t refcount;
> > > + struct rcu_head rcu;
> > > +};
> >
> > It seems as though in most of the code you are using "contid", any
> > reason why you didn't stick with that naming scheme here, e.g. "struct
> > audit_contid"?
>
> I was using contid to refer to the value itself and cont to refer to the
> refcounted object. I find cont a bit too terse, so I'm still thinking
> of changing it. Perhaps contobj?

Yes, just "cont" is a bit too ambiguous considering we have both
integer values and structures being passed around. Whatever you
decide on, a common base with separate suffixes seems like a good
idea.

FWIW, I still think the "audit container ID" : "ACID" thing is kinda funny ;)

> > > @@ -203,11 +211,15 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
> > >
> > > static inline u64 audit_get_contid(struct task_struct *tsk)
> > > {
> > > - if (!tsk->audit)
> > > + if (!tsk->audit || !tsk->audit->cont)
> > > return AUDIT_CID_UNSET;
> > > - return tsk->audit->contid;
> > > + return tsk->audit->cont->id;
> > > }
> >
> > Assuming for a moment that we implement an audit_contid_get() (see
> > Neil's comment as well as mine below), we probably need to name this
> > something different so we don't all lose our minds when we read this
> > code. On the plus side we can probably preface it with an underscore
> > since it is a static, in which case _audit_contid_get() might be okay,
> > but I'm open to suggestions.
>
> I'm fine with the "_" prefix, can you point to precedent or convention?

Generally kernel functions which are "special"/private/unsafe/etc.
have a one- or two-underscore prefix. If you don't want to add the
prefix, that's fine, but please change the name as mentioned
previously.

> > > @@ -231,7 +235,9 @@ int audit_alloc(struct task_struct *tsk)
> > > }
> > > info->loginuid = audit_get_loginuid(current);
> > > info->sessionid = audit_get_sessionid(current);
> > > - info->contid = audit_get_contid(current);
> > > + info->cont = audit_cont(current);
> > > + if (info->cont)
> > > + refcount_inc(&info->cont->refcount);
> >
> > See the other comments about a "get" function, but I think we need a
> > RCU read lock around the above, no?
>
> The rcu read lock is to protect the list rather than the cont object
> itself, the latter of which is protected by its refcount.

What protects you from info->cont going away between when you fetch
the pointer via audit_cont() and when you dereference it in
refcount_inc()?
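
To make the concern concrete, here is a minimal user-space model of a "get"
helper that only succeeds while the object is still live, in the spirit of
the kernel's refcount_inc_not_zero(). It is a sketch, not the patchset's
code: the `sketch_*` names are hypothetical and C11 atomics stand in for
refcount_t. In the kernel the pointer fetch and the increment would also
need to sit inside something (an RCU read-side section or a lock) that keeps
the memory from being freed in between; that part is assumed here and not
modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_contobj {
	uint64_t id;
	atomic_uint refcount;	/* 0 means the object is already being torn down */
};

/*
 * Take a reference only if at least one reference still exists.
 * Returns false for NULL or for an object whose count has reached zero.
 */
bool sketch_contobj_get(struct sketch_contobj *c)
{
	unsigned int old;

	if (!c)
		return false;

	old = atomic_load(&c->refcount);
	while (old != 0) {
		/* On failure 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&c->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
bool sketch_contobj_put(struct sketch_contobj *c)
{
	return atomic_fetch_sub(&c->refcount, 1) == 1;
}
```

The difference from a plain increment is that a racing final "put" cannot be
undone: once the count reaches zero, later "get" attempts fail instead of
resurrecting an object that is on its way to being freed.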

> > > @@ -2397,8 +2438,43 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> > > else if (audit_contid_set(task))
> > > rc = -ECHILD;
> > > read_unlock(&tasklist_lock);
> > > - if (!rc)
> > > - task->audit->contid = contid;
> > > + if (!rc) {
> > > + struct audit_cont *oldcont = audit_cont(task);
> >
> > Previously we held the tasklist_lock to protect the audit container ID
> > associated with the struct, should we still be holding it here?
>
> We held the tasklist_lock to protect access to the target task's
> child/parent/thread relationships.

What protects us in the case of simultaneous calls to audit_set_contid()?
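
One way to picture the race: if two callers both see the ID as unset and
then both store, the second store silently wins. A set-once update closes
that window. The user-space sketch below shows the idea with a C11
compare-and-swap; the `sketch_*` names, the sentinel value, and the error
codes are all assumptions for illustration, and holding a lock across the
check and the store would be an equally valid fix.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_CID_UNSET UINT64_MAX	/* hypothetical "no ID" sentinel */

struct sketch_task {
	_Atomic uint64_t contid;
};

/*
 * Set the container ID exactly once.
 * Returns 0 on success, -EINVAL if asked to store the sentinel itself,
 * and -EBUSY if some caller (possibly a concurrent one) already set an ID.
 */
int sketch_set_contid_once(struct sketch_task *t, uint64_t contid)
{
	uint64_t expected = SKETCH_CID_UNSET;

	if (contid == SKETCH_CID_UNSET)
		return -EINVAL;

	/* Check and store happen as one atomic step, so only one caller wins. */
	if (atomic_compare_exchange_strong(&t->contid, &expected, contid))
		return 0;

	return -EBUSY;
}
```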

> > Regardless, I worry that the lock dependencies between the
> > tasklist_lock and the audit_contid_list_lock are going to be tricky.
> > It might be nice to document the relationship in a comment up near
> > where you declare audit_contid_list_lock.
>
> I don't think there should be a conflict between the two.
>
> The contid_list_lock doesn't care if the cont object is associated to a
> particular task.

Please document the relationship between the two, I worry we could
easily run into lockdep problems without a clearly defined ordering.

> > > + struct audit_cont *cont = NULL;
> > > + struct audit_cont *newcont = NULL;
> > > + int h = audit_hash_contid(contid);
> > > +
> > > + spin_lock(&audit_contid_list_lock);
> > > + list_for_each_entry_rcu(cont, &audit_contid_hash[h], list)
> > > + if (cont->id == contid) {
> > > + /* task injection to existing container */
> > > + if (current == cont->owner) {
> >
> > I understand the desire to limit a given audit container ID to the
> > orchestrator that created it, but are we certain that we can track
> > audit container ID "ownership" via a single instance of a task_struct?
>
> Are you suggesting that a task_struct representing a task may be
> replaced for a specific task? I don't believe that will ever happen.
>
> > What happens when the orchestrator stops/restarts/crashes? Do we
> > even care?
>
> Reap all of its containers?

These were genuine questions, I'm not suggesting anything in
particular, I'm just curious about how we handle an orchestrator that
isn't continuously running ... is this possible? Do we care?

--
paul moore
http://www.paul-moore.com