2013-06-19 01:53:53

by Gao feng

[permalink] [raw]
Subject: [Part1 PATCH 00/22] Add namespace support for audit

This patchset is first part of namespace support for audit.
in this patchset, the mainly resources of audit system have
been isolated. the audit filter, rules havn't been isolated
now. It will be implemented in Part2. We finished the isolation
of user audit message in this patchset.

I choose to assign audit to the user namespace.
Right now,there are six kinds of namespaces, such as
net, mount, ipc, pid, uts and user. the first five
namespaces have special usage. the audit isn't suitable to
belong to these five namespaces, And since the flag of system
call clone is in short supply, we can't provide a new flag such
as CLONE_NEWAUDIT to enable audit namespace separately. so the
user namespace may be the best choice.

[Patch 4/21] add a compare function pointer for netlink table,
so audit subsystem can use it's self-defined compare function
to make sure audit netlink sockets can communicate with each
other when they in the same user namespace. this patch has been
merged into David's net-next tree.

There is one point that some people may dislike,in [PATCH 3/21],
the kernel side audit netlink socket is created only when we create
the first netns for the userns, and this userns will hold the netns
until we destroy this userns. It also means if we only unshare the
user namespace, the audit is unavailable since we don't have audit
netlink socket. if we should unshare user and net namespace both.

change from RFC:
1, Move the cleanup patches to the head of this patchset.
2, Fix a scheduling while atomic BUG. This bug is caused by
kthread_stop in audit_free_user_ns.
3, Only allow init user namespace to change backlog_limit.
4, Audit subsystem is available only when kernel side audit
netlink socket has been created.
5, Only isolate the basic resources of audit, and only make
user audit message namespace aware.


This patchset is based on linus' linux tree.

You can pull this patchset from:
git://github.com/gao-feng/auditns.git

The following changes since commit 8177a9d79c0e942dcac3312f15585d0344d505a5

"lseek(fd, n, SEEK_END) does *not* go to eof - n"

are available in the git repository at:

git://github.com/gao-feng/auditns.git

for you to fetch changes up to 85c36b981ac692ec18e362ba484629a457d50cb2

"Audit: Allow GET,SET,USER MSG operations in uninit user namespace"

Gao feng (22):
Audit: change type of audit_ever_enabled to bool
Audit: remove duplicate comments
Audit: make audit kernel side netlink sock per userns
netlink: Add compare function for netlink_table
Audit: implement audit self-defined compare function
Audit: make audit_skb_queue per user namespace
Audit: make audit_skb_hold_queue per user namespace
Audit: make kauditd_task per user namespace
Audit: make audit_nlk_portid per user namesapce
Audit: make audit_enabled per user namespace
Audit: make audit_ever_enabled per user namespace
Audit: make audit_initialized per user namespace
Audit: only allow init user namespace to change rate limit
Audit: only allow init user namespace to change audit_failure
Audit: only allow init user namespace to change backlog_limit
Audit: make kauditd_wait per user namespace
Audit: make audit_backlog_wait per user namespace
Audit: introduce new audit logging interface for user namespace
Audit: pass proper user namespace to audit_log_common_recv_msg
Audit: Log audit config change in uninit user namespace
Audit: send reply message to the auditd in proper user namespace
Audit: Allow GET,SET,USER MSG operations in uninit user namespace

include/linux/audit.h | 39 +++-
include/linux/netlink.h | 1 +
include/linux/user_namespace.h | 33 ++-
kernel/audit.c | 452 +++++++++++++++++++++++++----------------
kernel/audit.h | 7 +-
kernel/auditsc.c | 11 +-
kernel/user_namespace.c | 3 +
net/netlink/af_netlink.c | 32 ++-
net/netlink/af_netlink.h | 1 +
9 files changed, 387 insertions(+), 192 deletions(-)

--
1.8.1.4


2013-06-19 01:53:08

by Gao feng

[permalink] [raw]
Subject: [PATCH 10/22] Audit: make audit_enabled per user namespace

This patch makes audit_enabled per user namespace,
Right now,use audit_enabled of init user namespace to
decide if audit is enabled no matter which user namespace
we belong to.

Signed-off-by: Gao feng <[email protected]>
---
include/linux/audit.h | 4 +++-
include/linux/user_namespace.h | 1 +
kernel/audit.c | 32 ++++++++++++++++----------------
3 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 179351d..cc30db9 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -450,7 +450,8 @@ extern int audit_filter_user(int type);
extern int audit_filter_type(int type);
extern int audit_receive_filter(int type, int pid, int seq,
void *data, size_t datasz);
-extern int audit_enabled;
+#define audit_enabled (init_user_ns.audit.enabled)
+#define audit_enabled_ns(ns) (ns->audit.enabled)
#else /* CONFIG_AUDIT */
static inline __printf(4, 5)
void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
@@ -503,6 +504,7 @@ static inline void audit_set_user_ns(struct user_namespace *ns)
static inline void audit_free_user_ns(struct user_namespace *ns)
{ }
#define audit_enabled 0
+#define audit_enabled_ns(ns) 0
#endif /* CONFIG_AUDIT */
static inline void audit_log_string(struct audit_buffer *ab, const char *buf)
{
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 60dd6da..9972f0f 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -21,6 +21,7 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
#ifdef CONFIG_AUDIT
struct audit_ctrl {
struct sock *sock;
+ int enabled;
int pid;
int portid;
struct sk_buff_head queue;
diff --git a/kernel/audit.c b/kernel/audit.c
index ca61cf0..758b1e8 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -78,11 +78,8 @@ static int audit_initialized;
#define AUDIT_OFF 0
#define AUDIT_ON 1
#define AUDIT_LOCKED 2
-int audit_enabled;
bool audit_ever_enabled;

-EXPORT_SYMBOL_GPL(audit_enabled);
-
/* Default state when kernel boots without any parameters. */
static int audit_default;

@@ -274,14 +271,15 @@ static int audit_log_config_change(char *function_name, int new, int old,
static int audit_do_config_change(char *function_name, int *to_change, int new)
{
int allow_changes, rc = 0, old = *to_change;
+ struct user_namespace *ns = current_user_ns();

/* check if we are locked */
- if (audit_enabled == AUDIT_LOCKED)
+ if (ns->audit.enabled == AUDIT_LOCKED)
allow_changes = 0;
else
allow_changes = 1;

- if (audit_enabled != AUDIT_OFF) {
+ if (ns->audit.enabled != AUDIT_OFF) {
rc = audit_log_config_change(function_name, new, old, allow_changes);
if (rc)
allow_changes = 0;
@@ -306,13 +304,14 @@ static int audit_set_backlog_limit(int limit)
return audit_do_config_change("audit_backlog_limit", &audit_backlog_limit, limit);
}

-static int audit_set_enabled(int state)
+static int audit_set_enabled(struct user_namespace *ns, int state)
{
int rc;
if (state < AUDIT_OFF || state > AUDIT_LOCKED)
return -EINVAL;

- rc = audit_do_config_change("audit_enabled", &audit_enabled, state);
+ rc = audit_do_config_change("audit_enabled", &ns->audit.enabled,
+ state);
if (!rc)
audit_ever_enabled |= !!state;

@@ -625,7 +624,7 @@ static int audit_log_common_recv_msg(struct audit_buffer **ab, u16 msg_type)
int rc = 0;
uid_t uid = from_kuid(&init_user_ns, current_uid());

- if (!audit_enabled) {
+ if (!audit_enabled_ns(&init_user_ns)) {
*ab = NULL;
return rc;
}
@@ -677,7 +676,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)

switch (msg_type) {
case AUDIT_GET:
- status_set.enabled = audit_enabled;
+ status_set.enabled = ns->audit.enabled;
status_set.failure = audit_failure;
status_set.pid = ns->audit.pid;
status_set.rate_limit = audit_rate_limit;
@@ -693,7 +692,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
return -EINVAL;
status_get = (struct audit_status *)data;
if (status_get->mask & AUDIT_STATUS_ENABLED) {
- err = audit_set_enabled(status_get->enabled);
+ err = audit_set_enabled(ns, status_get->enabled);
if (err < 0)
return err;
}
@@ -705,7 +704,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
if (status_get->mask & AUDIT_STATUS_PID) {
int new_pid = status_get->pid;

- if (audit_enabled != AUDIT_OFF)
+ if (ns->audit.enabled != AUDIT_OFF)
audit_log_config_change("audit_pid", new_pid,
ns->audit.pid, 1);
ns->audit.pid = new_pid;
@@ -722,7 +721,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
case AUDIT_USER:
case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
- if (!audit_enabled && msg_type != AUDIT_USER_AVC)
+ if (!audit_enabled_ns(ns) && msg_type != AUDIT_USER_AVC)
return 0;

err = audit_filter_user(msg_type);
@@ -755,9 +754,10 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
case AUDIT_DEL_RULE:
if (nlmsg_len(nlh) < sizeof(struct audit_rule_data))
return -EINVAL;
- if (audit_enabled == AUDIT_LOCKED) {
+ if (ns->audit.enabled == AUDIT_LOCKED) {
audit_log_common_recv_msg(&ab, AUDIT_CONFIG_CHANGE);
- audit_log_format(ab, " audit_enabled=%d res=0", audit_enabled);
+ audit_log_format(ab, " audit_enabled=%d res=0",
+ ns->audit.enabled);
audit_log_end(ab);
return -EPERM;
}
@@ -965,7 +965,6 @@ static int __init audit_init(void)

audit_set_user_ns(&init_user_ns);
audit_initialized = AUDIT_INITIALIZED;
- audit_enabled = audit_default;
audit_ever_enabled |= !!audit_default;

audit_log(NULL, GFP_KERNEL, AUDIT_KERNEL, "initialized");
@@ -987,7 +986,7 @@ static int __init audit_enable(char *str)
printk(KERN_INFO "audit: %s", audit_default ? "enabled" : "disabled");

if (audit_initialized == AUDIT_INITIALIZED) {
- audit_enabled = audit_default;
+ init_user_ns.audit.enabled = audit_default;
audit_ever_enabled |= !!audit_default;
} else if (audit_initialized == AUDIT_UNINITIALIZED) {
printk(" (after initialization)");
@@ -1792,6 +1791,7 @@ void audit_set_user_ns(struct user_namespace *ns)

skb_queue_head_init(&ns->audit.queue);
skb_queue_head_init(&ns->audit.hold_queue);
+ ns->audit.enabled = audit_default;
}

void audit_free_user_ns(struct user_namespace *ns)
--
1.8.1.4

2013-06-19 01:53:39

by Gao feng

[permalink] [raw]
Subject: [PATCH 08/22] Audit: make kauditd_task per user namespace

This patch makes kauditd_task per user namespace,
Since right now we only allow user in init user
namesapce to send audit netlink message to kernel,
so actually the kauditd_task belongs to other user
namespace will still not run.

Signed-off-by: Gao feng <[email protected]>
---
include/linux/audit.h | 1 +
include/linux/user_namespace.h | 15 +++++++++--
kernel/audit.c | 58 ++++++++++++++++++++++++++----------------
kernel/audit.h | 5 ++--
kernel/auditsc.c | 6 ++---
5 files changed, 55 insertions(+), 30 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 6720901..179351d 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -26,6 +26,7 @@
#include <linux/sched.h>
#include <linux/ptrace.h>
#include <uapi/linux/audit.h>
+#include <linux/user_namespace.h>

struct audit_sig_info {
uid_t uid;
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 53420a4..ae69f20 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -21,8 +21,10 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
#ifdef CONFIG_AUDIT
struct audit_ctrl {
struct sock *sock;
+ int pid;
struct sk_buff_head queue;
struct sk_buff_head hold_queue;
+ struct task_struct *kauditd_task;
};
#endif

@@ -59,8 +61,17 @@ extern void free_user_ns(struct user_namespace *ns);

static inline void put_user_ns(struct user_namespace *ns)
{
- if (ns && atomic_dec_and_test(&ns->count))
- free_user_ns(ns);
+ if (ns) {
+ if (atomic_dec_and_test(&ns->count)) {
+ free_user_ns(ns);
+ } else if (atomic_read(&ns->count) == 1) {
+ /* If the last user of this userns is kauditd,
+ * we should wake up the kauditd and let it kill
+ * itself, Then this userns will be destroyed.*/
+ if (ns->audit.kauditd_task)
+ wake_up_process(ns->audit.kauditd_task);
+ }
+ }
}

struct seq_operations;
diff --git a/kernel/audit.c b/kernel/audit.c
index 75325f0..7b696cd5 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -94,7 +94,6 @@ static int audit_failure = AUDIT_FAIL_PRINTK;
* contains the pid of the auditd process and audit_nlk_portid contains
* the portid to use to send netlink messages to that process.
*/
-int audit_pid;
static int audit_nlk_portid;

/* If audit_rate_limit is non-zero, limit the rate of sending audit records
@@ -131,7 +130,6 @@ static DEFINE_SPINLOCK(audit_freelist_lock);
static int audit_freelist_count;
static LIST_HEAD(audit_freelist);

-static struct task_struct *kauditd_task;
static DECLARE_WAIT_QUEUE_HEAD(kauditd_wait);
static DECLARE_WAIT_QUEUE_HEAD(audit_backlog_wait);

@@ -184,7 +182,7 @@ void audit_panic(const char *message)
break;
case AUDIT_FAIL_PANIC:
/* test audit_pid since printk is always losey, why bother? */
- if (audit_pid)
+ if (&init_user_ns.audit.pid)
panic("audit: %s\n", message);
break;
}
@@ -386,9 +384,10 @@ static void kauditd_send_skb(struct sk_buff *skb)
audit_nlk_portid, 0);
if (err < 0) {
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
- printk(KERN_ERR "audit: *NO* daemon at audit_pid=%d\n", audit_pid);
+ printk(KERN_ERR "audit: *NO* daemon at audit_pid=%d\n",
+ init_user_ns.audit.pid);
audit_log_lost("auditd disappeared\n");
- audit_pid = 0;
+ init_user_ns.audit.pid = 0;
/* we might get lucky and get this in the next auditd */
audit_hold_skb(skb);
} else
@@ -411,19 +410,19 @@ static void kauditd_send_skb(struct sk_buff *skb)
* in 5 years when I want to play with this again I'll see this
* note and still have no friggin idea what i'm thinking today.
*/
-static void flush_hold_queue(void)
+static void flush_hold_queue(struct user_namespace *ns)
{
struct sk_buff *skb;
- struct sk_buff_head *hold_queue = &init_user_ns.audit.hold_queue;
+ struct sk_buff_head *hold_queue = &ns->audit.hold_queue;

- if (!audit_default || !audit_pid || !init_user_ns.audit.sock)
+ if (!audit_default || !ns->audit.pid || !ns->audit.sock)
return;

skb = skb_dequeue(hold_queue);
if (likely(!skb))
return;

- while (skb && audit_pid) {
+ while (skb && ns->audit.pid) {
kauditd_send_skb(skb);
skb = skb_dequeue(hold_queue);
}
@@ -438,18 +437,26 @@ static void flush_hold_queue(void)

static int kauditd_thread(void *dummy)
{
+ struct user_namespace *ns = dummy;
+
set_freezable();
while (!kthread_should_stop()) {
struct sk_buff *skb;
- struct sk_buff_head *queue = &init_user_ns.audit.queue;
+ struct sk_buff_head *queue = &ns->audit.queue;
DECLARE_WAITQUEUE(wait, current);

- flush_hold_queue();
+ /* Ok, We are the last user of this userns,
+ * It's time to go. Kill kauditd thread and
+ * release the userns. */
+ if (atomic_read(&ns->count) == 1)
+ break;
+
+ flush_hold_queue(ns);

skb = skb_dequeue(queue);
wake_up(&audit_backlog_wait);
if (skb) {
- if (audit_pid && init_user_ns.audit.sock)
+ if (ns->audit.pid && ns->audit.sock)
kauditd_send_skb(skb);
else
audit_printk_skb(skb);
@@ -466,6 +473,8 @@ static int kauditd_thread(void *dummy)
__set_current_state(TASK_RUNNING);
remove_wait_queue(&kauditd_wait, &wait);
}
+
+ put_user_ns(ns);
return 0;
}

@@ -658,13 +667,17 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
ns = current_user_ns();
/* As soon as there's any sign of userspace auditd,
* start kauditd to talk to it */
- if (!kauditd_task) {
- kauditd_task = kthread_run(kauditd_thread, NULL, "kauditd");
- if (IS_ERR(kauditd_task)) {
- err = PTR_ERR(kauditd_task);
- kauditd_task = NULL;
- return err;
+ if (!ns->audit.kauditd_task) {
+ struct task_struct *tsk;
+
+ tsk = kthread_run(kauditd_thread,
+ get_user_ns(ns), "kauditd");
+ if (IS_ERR(tsk)) {
+ put_user_ns(ns);
+ return PTR_ERR(tsk);
}
+
+ ns->audit.kauditd_task = tsk;
}
seq = nlh->nlmsg_seq;
data = nlmsg_data(nlh);
@@ -673,7 +686,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
case AUDIT_GET:
status_set.enabled = audit_enabled;
status_set.failure = audit_failure;
- status_set.pid = audit_pid;
+ status_set.pid = ns->audit.pid;
status_set.rate_limit = audit_rate_limit;
status_set.backlog_limit = audit_backlog_limit;
status_set.lost = atomic_read(&audit_lost);
@@ -700,8 +713,9 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
int new_pid = status_get->pid;

if (audit_enabled != AUDIT_OFF)
- audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
- audit_pid = new_pid;
+ audit_log_config_change("audit_pid", new_pid,
+ ns->audit.pid, 1);
+ ns->audit.pid = new_pid;
audit_nlk_portid = NETLINK_CB(skb).portid;
}
if (status_get->mask & AUDIT_STATUS_RATE_LIMIT) {
@@ -1714,7 +1728,7 @@ void audit_log_end(struct audit_buffer *ab)
struct nlmsghdr *nlh = nlmsg_hdr(ab->skb);
nlh->nlmsg_len = ab->skb->len - NLMSG_HDRLEN;

- if (audit_pid && init_user_ns.audit.sock) {
+ if (init_user_ns.audit.pid && init_user_ns.audit.sock) {
skb_queue_tail(&init_user_ns.audit.queue, ab->skb);
wake_up_interruptible(&kauditd_wait);
} else {
diff --git a/kernel/audit.h b/kernel/audit.h
index 2258827..d746214 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -217,8 +217,6 @@ extern void audit_log_name(struct audit_context *context,
struct audit_names *n, struct path *path,
int record_num, int *call_panic);

-extern int audit_pid;
-
#define AUDIT_INODE_BUCKETS 32
extern struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];

@@ -309,7 +307,8 @@ extern u32 audit_sig_sid;
extern int __audit_signal_info(int sig, struct task_struct *t);
static inline int audit_signal_info(int sig, struct task_struct *t)
{
- if (unlikely((audit_pid && t->tgid == audit_pid) ||
+ if (unlikely((init_user_ns.audit.pid &&
+ t->tgid == init_user_ns.audit.pid) ||
(audit_signals && !audit_dummy_context())))
return __audit_signal_info(sig, t);
return 0;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 3c8a601..8ba8684 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -745,7 +745,7 @@ static enum audit_state audit_filter_syscall(struct task_struct *tsk,
struct audit_entry *e;
enum audit_state state;

- if (audit_pid && tsk->tgid == audit_pid)
+ if (init_user_ns.audit.pid && tsk->tgid == init_user_ns.audit.pid)
return AUDIT_DISABLED;

rcu_read_lock();
@@ -806,7 +806,7 @@ void audit_filter_inodes(struct task_struct *tsk, struct audit_context *ctx)
{
struct audit_names *n;

- if (audit_pid && tsk->tgid == audit_pid)
+ if (init_user_ns.audit.pid && tsk->tgid == init_user_ns.audit.pid)
return;

rcu_read_lock();
@@ -2220,7 +2220,7 @@ int __audit_signal_info(int sig, struct task_struct *t)
struct audit_context *ctx = tsk->audit_context;
kuid_t uid = current_uid(), t_uid = task_uid(t);

- if (audit_pid && t->tgid == audit_pid) {
+ if (init_user_ns.audit.pid && t->tgid == init_user_ns.audit.pid) {
if (sig == SIGTERM || sig == SIGHUP || sig == SIGUSR1 || sig == SIGUSR2) {
audit_sig_pid = tsk->pid;
if (uid_valid(tsk->loginuid))
--
1.8.1.4

2013-06-19 01:53:10

by Gao feng

[permalink] [raw]
Subject: [PATCH 09/22] Audit: make audit_nlk_portid per user namesapce

After this patch, audit_nlk_port is per user namespace.
Just like prev patch does,use audit_nlk_portid of init
user namespace in kauditd_send_skb.

Signed-off-by: Gao feng <[email protected]>
---
include/linux/user_namespace.h | 1 +
kernel/audit.c | 11 ++---------
2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index ae69f20..60dd6da 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -22,6 +22,7 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
struct audit_ctrl {
struct sock *sock;
int pid;
+ int portid;
struct sk_buff_head queue;
struct sk_buff_head hold_queue;
struct task_struct *kauditd_task;
diff --git a/kernel/audit.c b/kernel/audit.c
index 7b696cd5..ca61cf0 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -89,13 +89,6 @@ static int audit_default;
/* If auditing cannot proceed, audit_failure selects what happens. */
static int audit_failure = AUDIT_FAIL_PRINTK;

-/*
- * If audit records are to be written to the netlink socket, audit_pid
- * contains the pid of the auditd process and audit_nlk_portid contains
- * the portid to use to send netlink messages to that process.
- */
-static int audit_nlk_portid;
-
/* If audit_rate_limit is non-zero, limit the rate of sending audit records
* to that number per second. This prevents DoS attacks, but results in
* audit records being dropped. */
@@ -381,7 +374,7 @@ static void kauditd_send_skb(struct sk_buff *skb)
/* take a reference in case we can't send it and we want to hold it */
skb_get(skb);
err = netlink_unicast(init_user_ns.audit.sock, skb,
- audit_nlk_portid, 0);
+ init_user_ns.audit.portid, 0);
if (err < 0) {
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
printk(KERN_ERR "audit: *NO* daemon at audit_pid=%d\n",
@@ -716,7 +709,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
audit_log_config_change("audit_pid", new_pid,
ns->audit.pid, 1);
ns->audit.pid = new_pid;
- audit_nlk_portid = NETLINK_CB(skb).portid;
+ ns->audit.portid = NETLINK_CB(skb).portid;
}
if (status_get->mask & AUDIT_STATUS_RATE_LIMIT) {
err = audit_set_rate_limit(status_get->rate_limit);
--
1.8.1.4

2013-06-19 01:54:12

by Gao feng

[permalink] [raw]
Subject: [PATCH 07/22] Audit: make audit_skb_hold_queue per user namespace

After this patch, ervery user namespace has one
audit_skb_hold_queue. Since we havn't finish the
preparations, only allow user to operate the skb
hold queue of init user namespace.

Signed-off-by: Gao feng <[email protected]>
---
include/linux/user_namespace.h | 1 +
kernel/audit.c | 16 +++++++++-------
2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index e322f20..53420a4 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -22,6 +22,7 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
struct audit_ctrl {
struct sock *sock;
struct sk_buff_head queue;
+ struct sk_buff_head hold_queue;
};
#endif

diff --git a/kernel/audit.c b/kernel/audit.c
index e2f6366..75325f0 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -131,8 +131,6 @@ static DEFINE_SPINLOCK(audit_freelist_lock);
static int audit_freelist_count;
static LIST_HEAD(audit_freelist);

-/* queue of skbs to send to auditd when/if it comes back */
-static struct sk_buff_head audit_skb_hold_queue;
static struct task_struct *kauditd_task;
static DECLARE_WAIT_QUEUE_HEAD(kauditd_wait);
static DECLARE_WAIT_QUEUE_HEAD(audit_backlog_wait);
@@ -351,9 +349,11 @@ static int audit_set_failure(int state)
*/
static void audit_hold_skb(struct sk_buff *skb)
{
+ struct sk_buff_head *hold_queue = &init_user_ns.audit.hold_queue;
+
if (audit_default &&
- skb_queue_len(&audit_skb_hold_queue) < audit_backlog_limit)
- skb_queue_tail(&audit_skb_hold_queue, skb);
+ skb_queue_len(hold_queue) < audit_backlog_limit)
+ skb_queue_tail(hold_queue, skb);
else
kfree_skb(skb);
}
@@ -414,17 +414,18 @@ static void kauditd_send_skb(struct sk_buff *skb)
static void flush_hold_queue(void)
{
struct sk_buff *skb;
+ struct sk_buff_head *hold_queue = &init_user_ns.audit.hold_queue;

if (!audit_default || !audit_pid || !init_user_ns.audit.sock)
return;

- skb = skb_dequeue(&audit_skb_hold_queue);
+ skb = skb_dequeue(hold_queue);
if (likely(!skb))
return;

while (skb && audit_pid) {
kauditd_send_skb(skb);
- skb = skb_dequeue(&audit_skb_hold_queue);
+ skb = skb_dequeue(hold_queue);
}

/*
@@ -956,7 +957,6 @@ static int __init audit_init(void)
return -1;

audit_set_user_ns(&init_user_ns);
- skb_queue_head_init(&audit_skb_hold_queue);
audit_initialized = AUDIT_INITIALIZED;
audit_enabled = audit_default;
audit_ever_enabled |= !!audit_default;
@@ -1784,6 +1784,7 @@ void audit_set_user_ns(struct user_namespace *ns)
return;

skb_queue_head_init(&ns->audit.queue);
+ skb_queue_head_init(&ns->audit.hold_queue);
}

void audit_free_user_ns(struct user_namespace *ns)
@@ -1798,6 +1799,7 @@ void audit_free_user_ns(struct user_namespace *ns)
}

skb_queue_purge(&ns->audit.queue);
+ skb_queue_purge(&ns->audit.hold_queue);
}

EXPORT_SYMBOL(audit_log_start);
--
1.8.1.4

2013-06-19 01:54:09

by Gao feng

[permalink] [raw]
Subject: [PATCH 01/22] Audit: change type of audit_ever_enabled to bool

It's better to define audit_ever_enabled as bool.

Signed-off-by: Gao feng <[email protected]>
---
kernel/audit.c | 2 +-
kernel/audit.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 91e53d0..ad3084c 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -78,7 +78,7 @@ static int audit_initialized;
#define AUDIT_ON 1
#define AUDIT_LOCKED 2
int audit_enabled;
-int audit_ever_enabled;
+bool audit_ever_enabled;

EXPORT_SYMBOL_GPL(audit_enabled);

diff --git a/kernel/audit.h b/kernel/audit.h
index 1c95131..2258827 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -205,7 +205,7 @@ struct audit_context {
#endif
};

-extern int audit_ever_enabled;
+extern bool audit_ever_enabled;

extern void audit_copy_inode(struct audit_names *name,
const struct dentry *dentry,
--
1.8.1.4

2013-06-19 01:54:07

by Gao feng

[permalink] [raw]
Subject: [PATCH 22/22] Audit: Allow GET,SET,USER MSG operations in uninit user namespace

After this patch, user can set/get audit informations
in container, and they can also send user msg to the
audit subsystem.

Signed-off-by: Gao feng <[email protected]>
---
kernel/audit.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 0b3fd8b..1b60a5a 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -594,11 +594,6 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
{
int err = 0;

- /* Only support the initial namespaces for now. */
- if ((current_user_ns() != &init_user_ns) ||
- (task_active_pid_ns(current) != &init_pid_ns))
- return -EPERM;
-
switch (msg_type) {
case AUDIT_LIST:
case AUDIT_ADD:
@@ -606,6 +601,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
return -EOPNOTSUPP;
case AUDIT_GET:
case AUDIT_SET:
+ break;
case AUDIT_LIST_RULES:
case AUDIT_ADD_RULE:
case AUDIT_DEL_RULE:
@@ -614,13 +610,17 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
case AUDIT_TTY_SET:
case AUDIT_TRIM:
case AUDIT_MAKE_EQUIV:
- if (!capable(CAP_AUDIT_CONTROL))
+ /* These operations only support the initial
+ * namespaces for now. */
+ if ((current_user_ns() != &init_user_ns) ||
+ (task_active_pid_ns(current) != &init_pid_ns) ||
+ !capable(CAP_AUDIT_CONTROL))
err = -EPERM;
break;
case AUDIT_USER:
case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
- if (!capable(CAP_AUDIT_WRITE))
+ if (!ns_capable(current_user_ns(), CAP_AUDIT_WRITE))
err = -EPERM;
break;
default: /* bad msg */
--
1.8.1.4

2013-06-19 01:54:05

by Gao feng

[permalink] [raw]
Subject: [PATCH 19/22] Audit: pass proper user namespace to audit_log_common_recv_msg

The audit log that generated in user namespace should be
received by the auditd running in this user namespace.

Signed-off-by: Gao feng <[email protected]>
---
kernel/audit.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 5d3764c..2d81aac 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -624,17 +624,18 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
return err;
}

-static int audit_log_common_recv_msg(struct audit_buffer **ab, u16 msg_type)
+static int audit_log_common_recv_msg(struct user_namespace *ns,
+ struct audit_buffer **ab, u16 msg_type)
{
int rc = 0;
- uid_t uid = from_kuid(&init_user_ns, current_uid());
+ uid_t uid = from_kuid(ns, current_uid());

- if (!audit_enabled_ns(&init_user_ns)) {
+ if (!audit_enabled_ns(ns)) {
*ab = NULL;
return rc;
}

- *ab = audit_log_start(NULL, GFP_KERNEL, msg_type);
+ *ab = audit_log_start_ns(ns, NULL, GFP_KERNEL, msg_type);
if (unlikely(!*ab))
return rc;
audit_log_format(*ab, "pid=%d uid=%u", task_tgid_vnr(current), uid);
@@ -737,7 +738,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
if (err)
break;
}
- audit_log_common_recv_msg(&ab, msg_type);
+ audit_log_common_recv_msg(ns, &ab, msg_type);
if (msg_type != AUDIT_USER_TTY)
audit_log_format(ab, " msg='%.1024s'",
(char *)data);
@@ -752,7 +753,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
audit_log_n_untrustedstring(ab, data, size);
}
audit_set_pid(ab, NETLINK_CB(skb).portid);
- audit_log_end(ab);
+ audit_log_end_ns(ns, ab);
}
break;
case AUDIT_ADD_RULE:
@@ -760,10 +761,11 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
if (nlmsg_len(nlh) < sizeof(struct audit_rule_data))
return -EINVAL;
if (ns->audit.enabled == AUDIT_LOCKED) {
- audit_log_common_recv_msg(&ab, AUDIT_CONFIG_CHANGE);
+ audit_log_common_recv_msg(ns, &ab,
+ AUDIT_CONFIG_CHANGE);
audit_log_format(ab, " audit_enabled=%d res=0",
ns->audit.enabled);
- audit_log_end(ab);
+ audit_log_end_ns(ns, ab);
return -EPERM;
}
/* fallthrough */
@@ -773,9 +775,9 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
break;
case AUDIT_TRIM:
audit_trim_trees();
- audit_log_common_recv_msg(&ab, AUDIT_CONFIG_CHANGE);
+ audit_log_common_recv_msg(ns, &ab, AUDIT_CONFIG_CHANGE);
audit_log_format(ab, " op=trim res=1");
- audit_log_end(ab);
+ audit_log_end_ns(ns, ab);
break;
case AUDIT_MAKE_EQUIV: {
void *bufp = data;
@@ -803,14 +805,14 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
/* OK, here comes... */
err = audit_tag_tree(old, new);

- audit_log_common_recv_msg(&ab, AUDIT_CONFIG_CHANGE);
+ audit_log_common_recv_msg(ns, &ab, AUDIT_CONFIG_CHANGE);

audit_log_format(ab, " op=make_equiv old=");
audit_log_untrustedstring(ab, old);
audit_log_format(ab, " new=");
audit_log_untrustedstring(ab, new);
audit_log_format(ab, " res=%d", !err);
- audit_log_end(ab);
+ audit_log_end_ns(ns, ab);
kfree(old);
kfree(new);
break;
--
1.8.1.4

2013-06-19 01:54:03

by Gao feng

[permalink] [raw]
Subject: [PATCH 20/22] Audit: Log audit config change in uninit user namespace

This patch allow to log audit config change in
uninit user namespace.

Signed-off-by: Gao feng <[email protected]>
---
kernel/audit.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 2d81aac..84a882c 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -245,13 +245,14 @@ void audit_log_lost(const char *message)
}
}

-static int audit_log_config_change(char *function_name, int new, int old,
+static int audit_log_config_change(struct user_namespace *ns,
+ char *function_name, int new, int old,
int allow_changes)
{
struct audit_buffer *ab;
int rc = 0;

- ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
+ ab = audit_log_start_ns(ns, NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
if (unlikely(!ab))
return rc;
audit_log_format(ab, "%s=%d old=%d", function_name, new, old);
@@ -260,7 +261,7 @@ static int audit_log_config_change(char *function_name, int new, int old,
if (rc)
allow_changes = 0; /* Something weird, deny request */
audit_log_format(ab, " res=%d", allow_changes);
- audit_log_end(ab);
+ audit_log_end_ns(ns, ab);
return rc;
}

@@ -276,7 +277,8 @@ static int audit_do_config_change(char *function_name, int *to_change, int new)
allow_changes = 1;

if (ns->audit.enabled != AUDIT_OFF) {
- rc = audit_log_config_change(function_name, new, old, allow_changes);
+ rc = audit_log_config_change(ns, function_name, new,
+ old, allow_changes);
if (rc)
allow_changes = 0;
}
@@ -711,7 +713,8 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
int new_pid = status_get->pid;

if (ns->audit.enabled != AUDIT_OFF)
- audit_log_config_change("audit_pid", new_pid,
+ audit_log_config_change(ns, "audit_pid",
+ new_pid,
ns->audit.pid, 1);
ns->audit.pid = new_pid;
ns->audit.portid = NETLINK_CB(skb).portid;
--
1.8.1.4

2013-06-19 01:54:00

by Gao feng

[permalink] [raw]
Subject: [PATCH 21/22] Audit: send reply message to the auditd in proper user namespace

We can send the audit reply message to userspace auditd
process which running in the same user namespace with the
process which send the audit request message to kernel.

Signed-off-by: Gao feng <[email protected]>
---
kernel/audit.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 84a882c..0b3fd8b 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -146,6 +146,7 @@ struct audit_buffer {
struct audit_reply {
int pid;
struct sk_buff *skb;
+ struct user_namespace *ns;
};

static void audit_set_pid(struct audit_buffer *ab, pid_t pid)
@@ -532,8 +533,9 @@ static int audit_send_reply_thread(void *arg)

/* Ignore failure. It'll only happen if the sender goes away,
because our timeout is set to infinite. */
- netlink_unicast(init_user_ns.audit.sock, reply->skb,
+ netlink_unicast(reply->ns->audit.sock, reply->skb,
reply->pid, 0);
+ put_user_ns(reply->ns);
kfree(reply);
return 0;
}
@@ -572,11 +574,13 @@ static int audit_send_reply(int pid, int seq, int type, int done, int multi,

reply->pid = pid;
reply->skb = skb;
+ reply->ns = get_user_ns(current_user_ns());

tsk = kthread_run(audit_send_reply_thread, reply, "audit_send_reply");
if (!IS_ERR(tsk))
return 0;
kfree_skb(skb);
+ put_user_ns(reply->ns);
out:
kfree(reply);
return ret;
@@ -833,7 +837,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
security_release_secctx(ctx, len);
return -ENOMEM;
}
- sig_data->uid = from_kuid(&init_user_ns, audit_sig_uid);
+ sig_data->uid = from_kuid(ns, audit_sig_uid);
sig_data->pid = audit_sig_pid;
if (audit_sig_sid) {
memcpy(sig_data->ctx, ctx, len);
--
1.8.1.4

2013-06-19 01:53:58

by Gao feng

[permalink] [raw]
Subject: [PATCH 18/22] Audit: introduce new audit logging interface for user namespace

This interface audit_log_start_ns and audit_log_end_ns
will be used for logging audit logs in user namespace.

Signed-off-by: Gao feng <[email protected]>
---
include/linux/audit.h | 25 ++++++++++++--
kernel/audit.c | 95 ++++++++++++++++++++++++++++++---------------------
2 files changed, 78 insertions(+), 42 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index cc30db9..b64f268 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -404,10 +404,18 @@ extern __printf(4, 5)
void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
const char *fmt, ...);

-extern struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, int type);
+extern struct audit_buffer *
+audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, int type);
+
+extern struct audit_buffer *
+audit_log_start_ns(struct user_namespace *ns, struct audit_context *ctx,
+ gfp_t gfp_mask, int type);
+
extern __printf(2, 3)
void audit_log_format(struct audit_buffer *ab, const char *fmt, ...);
extern void audit_log_end(struct audit_buffer *ab);
+extern void audit_log_end_ns(struct user_namespace *ns,
+ struct audit_buffer *ab);
extern int audit_string_contains_control(const char *string,
size_t len);
extern void audit_log_n_hex(struct audit_buffer *ab,
@@ -457,8 +465,16 @@ static inline __printf(4, 5)
void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
const char *fmt, ...)
{ }
-static inline struct audit_buffer *audit_log_start(struct audit_context *ctx,
- gfp_t gfp_mask, int type)
+static inline
+struct audit_buffer *audit_log_start(struct audit_context *ctx,
+ gfp_t gfp_mask, int type)
+{
+ return NULL;
+}
+static inline
+struct audit_buffer *audit_log_start_ns(struct user_namespace *ns,
+ struct audit_context *ctx,
+ gfp_t gfp_mask, int type)
{
return NULL;
}
@@ -467,6 +483,9 @@ void audit_log_format(struct audit_buffer *ab, const char *fmt, ...)
{ }
static inline void audit_log_end(struct audit_buffer *ab)
{ }
+static inline void audit_log_end_ns(struct user_namespace *ns,
+ struct audit_buffer *ab)
+{ }
static inline void audit_log_n_hex(struct audit_buffer *ab,
const unsigned char *buf, size_t len)
{ }
diff --git a/kernel/audit.c b/kernel/audit.c
index 3dcaa97..5d3764c 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1112,47 +1112,35 @@ static inline void audit_get_stamp(struct audit_context *ctx,
/*
* Wait for auditd to drain the queue a little
*/
-static void wait_for_auditd(unsigned long sleep_time)
+static void wait_for_auditd(struct user_namespace *ns,
+ unsigned long sleep_time)
{
- const struct sk_buff_head *queue = &init_user_ns.audit.queue;
+ const struct sk_buff_head *queue = &ns->audit.queue;
DECLARE_WAITQUEUE(wait, current);
set_current_state(TASK_UNINTERRUPTIBLE);
- add_wait_queue(&init_user_ns.audit.backlog_wait, &wait);
+ add_wait_queue(&ns->audit.backlog_wait, &wait);

if (audit_backlog_limit &&
skb_queue_len(queue) > audit_backlog_limit)
schedule_timeout(sleep_time);

__set_current_state(TASK_RUNNING);
- remove_wait_queue(&init_user_ns.audit.backlog_wait, &wait);
+ remove_wait_queue(&ns->audit.backlog_wait, &wait);
}

-/**
- * audit_log_start - obtain an audit buffer
- * @ctx: audit_context (may be NULL)
- * @gfp_mask: type of allocation
- * @type: audit message type
- *
- * Returns audit_buffer pointer on success or NULL on error.
- *
- * Obtain an audit buffer. This routine does locking to obtain the
- * audit buffer, but then no locking is required for calls to
- * audit_log_*format. If the task (ctx) is a task that is currently in a
- * syscall, then the syscall is marked as auditable and an audit record
- * will be written at syscall exit. If there is no associated task, then
- * task context (ctx) should be NULL.
- */
-struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
- int type)
+struct audit_buffer *audit_log_start_ns(struct user_namespace *ns,
+ struct audit_context *ctx,
+ gfp_t gfp_mask,
+ int type)
{
struct audit_buffer *ab = NULL;
struct timespec t;
unsigned int uninitialized_var(serial);
int reserve;
unsigned long timeout_start = jiffies;
- struct sk_buff_head *queue = &init_user_ns.audit.queue;
+ struct sk_buff_head *queue = &ns->audit.queue;

- if (init_user_ns.audit.initialized != AUDIT_INITIALIZED)
+ if (ns->audit.initialized != AUDIT_INITIALIZED)
return NULL;

if (unlikely(audit_filter_type(type)))
@@ -1172,7 +1160,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
sleep_time = timeout_start + audit_backlog_wait_time -
jiffies;
if ((long)sleep_time > 0)
- wait_for_auditd(sleep_time);
+ wait_for_auditd(ns, sleep_time);
continue;
}
if (audit_rate_check() && printk_ratelimit())
@@ -1183,7 +1171,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
audit_backlog_limit);
audit_log_lost("backlog limit exceeded");
audit_backlog_wait_time = audit_backlog_wait_overflow;
- wake_up(&init_user_ns.audit.backlog_wait);
+ wake_up(&ns->audit.backlog_wait);
return NULL;
}

@@ -1200,6 +1188,28 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
return ab;
}

+
+/**
+ * audit_log_start - obtain an audit buffer
+ * @ctx: audit_context (may be NULL)
+ * @gfp_mask: type of allocation
+ * @type: audit message type
+ *
+ * Returns audit_buffer pointer on success or NULL on error.
+ *
+ * Obtain an audit buffer. This routine does locking to obtain the
+ * audit buffer, but then no locking is required for calls to
+ * audit_log_*format. If the task (ctx) is a task that is currently in a
+ * syscall, then the syscall is marked as auditable and an audit record
+ * will be written at syscall exit. If there is no associated task, then
+ * task context (ctx) should be NULL.
+ */
+struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
+ int type)
+{
+ return audit_log_start_ns(&init_user_ns, ctx, gfp_mask, type);
+}
+
/**
* audit_expand - expand skb in the audit buffer
* @ab: audit_buffer
@@ -1704,16 +1714,7 @@ out:
kfree(name);
}

-/**
- * audit_log_end - end one audit record
- * @ab: the audit_buffer
- *
- * The netlink_* functions cannot be called inside an irq context, so
- * the audit buffer is placed on a queue and a tasklet is scheduled to
- * remove them from the queue outside the irq context. May be called in
- * any context.
- */
-void audit_log_end(struct audit_buffer *ab)
+void audit_log_end_ns(struct user_namespace *ns, struct audit_buffer *ab)
{
if (!ab)
return;
@@ -1723,11 +1724,11 @@ void audit_log_end(struct audit_buffer *ab)
struct nlmsghdr *nlh = nlmsg_hdr(ab->skb);
nlh->nlmsg_len = ab->skb->len - NLMSG_HDRLEN;

- if (init_user_ns.audit.pid && init_user_ns.audit.sock) {
- skb_queue_tail(&init_user_ns.audit.queue, ab->skb);
- wake_up_interruptible(&init_user_ns.audit.kauditd_wait);
+ if (ns->audit.pid && ns->audit.sock) {
+ skb_queue_tail(&ns->audit.queue, ab->skb);
+ wake_up_interruptible(&ns->audit.kauditd_wait);
} else {
- audit_printk_skb(&init_user_ns, ab->skb);
+ audit_printk_skb(ns, ab->skb);
}
ab->skb = NULL;
}
@@ -1735,6 +1736,20 @@ void audit_log_end(struct audit_buffer *ab)
}

/**
+ * audit_log_end - end one audit record
+ * @ab: the audit_buffer
+ *
+ * The netlink_* functions cannot be called inside an irq context, so
+ * the audit buffer is placed on a queue and a tasklet is scheduled to
+ * remove them from the queue outside the irq context. May be called in
+ * any context.
+ */
+void audit_log_end(struct audit_buffer *ab)
+{
+ audit_log_end_ns(&init_user_ns, ab);
+}
+
+/**
* audit_log - Log an audit record
* @ctx: audit context
* @gfp_mask: type of allocation
@@ -1818,7 +1833,9 @@ void audit_free_user_ns(struct user_namespace *ns)
}

EXPORT_SYMBOL(audit_log_start);
+EXPORT_SYMBOL(audit_log_start_ns);
EXPORT_SYMBOL(audit_log_end);
+EXPORT_SYMBOL(audit_log_end_ns);
EXPORT_SYMBOL(audit_log_format);
EXPORT_SYMBOL(audit_log);
EXPORT_SYMBOL(audit_set_user_ns);
--
1.8.1.4

2013-06-19 01:53:54

by Gao feng

[permalink] [raw]
Subject: [PATCH 03/22] Audit: make audit kernel side netlink sock per userns

This patch try to make the audit_sock per user namespace,
not global.

Since sock is assigned to net namespace, when creating
a netns, we will allocate a audit_sock for the userns
which create this netns, and this netns will keep alive
until the creator userns being destroyed.

If userns creates many netns, the audit_sock is only
allocated once.

Signed-off-by: Gao feng <[email protected]>
---
include/linux/audit.h | 5 +++
include/linux/user_namespace.h | 9 ++++
kernel/audit.c | 100 +++++++++++++++++++++++++++++++----------
kernel/user_namespace.c | 2 +
4 files changed, 93 insertions(+), 23 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index b20b038..85f9d7f 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -439,6 +439,8 @@ extern int audit_log_task_context(struct audit_buffer *ab);
extern void audit_log_task_info(struct audit_buffer *ab,
struct task_struct *tsk);

+extern void audit_free_user_ns(struct user_namespace *ns);
+
extern int audit_update_lsm_rules(void);

/* Private API (for audit.c only) */
@@ -492,6 +494,9 @@ static inline int audit_log_task_context(struct audit_buffer *ab)
static inline void audit_log_task_info(struct audit_buffer *ab,
struct task_struct *tsk)
{ }
+
+static inline void audit_free_user_ns(struct user_namespace *ns)
+{ }
#define audit_enabled 0
#endif /* CONFIG_AUDIT */
static inline void audit_log_string(struct audit_buffer *ab, const char *buf)
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index b6b215f..8797421 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -17,6 +17,12 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
} extent[UID_GID_MAP_MAX_EXTENTS];
};

+#ifdef CONFIG_AUDIT
+struct audit_ctrl {
+ struct sock *sock;
+};
+#endif
+
struct user_namespace {
struct uid_gid_map uid_map;
struct uid_gid_map gid_map;
@@ -26,6 +32,9 @@ struct user_namespace {
kuid_t owner;
kgid_t group;
unsigned int proc_inum;
+#ifdef CONFIG_AUDIT
+ struct audit_ctrl audit;
+#endif
bool may_mount_sysfs;
bool may_mount_proc;
};
diff --git a/kernel/audit.c b/kernel/audit.c
index 843e7a2..11b56b7 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -64,6 +64,7 @@
#include <linux/freezer.h>
#include <linux/tty.h>
#include <linux/pid_namespace.h>
+#include <linux/user_namespace.h>

#include "audit.h"

@@ -120,9 +121,6 @@ u32 audit_sig_sid = 0;
*/
static atomic_t audit_lost = ATOMIC_INIT(0);

-/* The netlink socket. */
-static struct sock *audit_sock;
-
/* Hash for inode-based rules */
struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];

@@ -385,7 +383,8 @@ static void kauditd_send_skb(struct sk_buff *skb)
int err;
/* take a reference in case we can't send it and we want to hold it */
skb_get(skb);
- err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
+ err = netlink_unicast(init_user_ns.audit.sock, skb,
+ audit_nlk_portid, 0);
if (err < 0) {
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
printk(KERN_ERR "audit: *NO* daemon at audit_pid=%d\n", audit_pid);
@@ -417,7 +416,7 @@ static void flush_hold_queue(void)
{
struct sk_buff *skb;

- if (!audit_default || !audit_pid)
+ if (!audit_default || !audit_pid || !init_user_ns.audit.sock)
return;

skb = skb_dequeue(&audit_skb_hold_queue);
@@ -449,7 +448,7 @@ static int kauditd_thread(void *dummy)
skb = skb_dequeue(&audit_skb_queue);
wake_up(&audit_backlog_wait);
if (skb) {
- if (audit_pid)
+ if (audit_pid && init_user_ns.audit.sock)
kauditd_send_skb(skb);
else
audit_printk_skb(skb);
@@ -479,8 +478,11 @@ int audit_send_list(void *_dest)
mutex_lock(&audit_cmd_mutex);
mutex_unlock(&audit_cmd_mutex);

+ if (!init_user_ns.audit.sock)
+ return 0;
+
while ((skb = __skb_dequeue(&dest->q)) != NULL)
- netlink_unicast(audit_sock, skb, pid, 0);
+ netlink_unicast(init_user_ns.audit.sock, skb, pid, 0);

kfree(dest);

@@ -521,7 +523,8 @@ static int audit_send_reply_thread(void *arg)

/* Ignore failure. It'll only happen if the sender goes away,
because our timeout is set to infinite. */
- netlink_unicast(audit_sock, reply->skb, reply->pid, 0);
+ netlink_unicast(init_user_ns.audit.sock, reply->skb,
+ reply->pid, 0);
kfree(reply);
return 0;
}
@@ -538,16 +541,21 @@ static int audit_send_reply_thread(void *arg)
* Allocates an skb, builds the netlink message, and sends it to the pid.
* No failure notifications.
*/
-static void audit_send_reply(int pid, int seq, int type, int done, int multi,
- const void *payload, int size)
+static int audit_send_reply(int pid, int seq, int type, int done, int multi,
+ const void *payload, int size)
{
+ int ret = -ECONNREFUSED;
struct sk_buff *skb;
struct task_struct *tsk;
- struct audit_reply *reply = kmalloc(sizeof(struct audit_reply),
- GFP_KERNEL);
+ struct audit_reply *reply;
+
+ if (!current_user_ns()->audit.sock)
+ return ret;

+ ret = -ENOMEM;
+ reply = kmalloc(sizeof(struct audit_reply), GFP_KERNEL);
if (!reply)
- return;
+ return ret;

skb = audit_make_reply(pid, seq, type, done, multi, payload, size);
if (!skb)
@@ -558,10 +566,11 @@ static void audit_send_reply(int pid, int seq, int type, int done, int multi,

tsk = kthread_run(audit_send_reply_thread, reply, "audit_send_reply");
if (!IS_ERR(tsk))
- return;
+ return 0;
kfree_skb(skb);
out:
kfree(reply);
+ return ret;
}

/*
@@ -886,24 +895,56 @@ static void audit_receive(struct sk_buff *skb)
mutex_unlock(&audit_cmd_mutex);
}

+static int __net_init audit_net_init(struct net *net)
+{
+ struct user_namespace *ns = net->user_ns;
+
+ if (!ns->audit.sock) {
+ struct sock *sk = NULL;
+ /*
+ * create kernel side netlink socket for audit,
+ * make audit sock per user namespace.
+ */
+ struct netlink_kernel_cfg cfg = {
+ .input = audit_receive,
+ };
+
+ sk = netlink_kernel_create(net, NETLINK_AUDIT, &cfg);
+ if (!sk) {
+ if (net_eq(net, &init_net))
+ audit_panic("cannot initialize netlink socket");
+ return -1;
+ }
+ sk->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
+ ns->audit.sock = sk;
+ /*
+ * Get reference of net->passive, force this netns
+ * to be alive until we destroy the userns which
+ * creates this netns.
+ */
+ atomic_inc(&net->passive);
+ }
+
+ return 0;
+}
+
+static struct pernet_operations audit_net_ops = {
+ .init = audit_net_init,
+};
+
/* Initialize audit support at boot time. */
static int __init audit_init(void)
{
int i;
- struct netlink_kernel_cfg cfg = {
- .input = audit_receive,
- };

if (audit_initialized == AUDIT_DISABLED)
return 0;

printk(KERN_INFO "audit: initializing netlink socket (%s)\n",
audit_default ? "enabled" : "disabled");
- audit_sock = netlink_kernel_create(&init_net, NETLINK_AUDIT, &cfg);
- if (!audit_sock)
- audit_panic("cannot initialize netlink socket");
- else
- audit_sock->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
+
+ if (register_pernet_subsys(&audit_net_ops) < 0)
+ return -1;

skb_queue_head_init(&audit_skb_queue);
skb_queue_head_init(&audit_skb_hold_queue);
@@ -1662,7 +1703,7 @@ void audit_log_end(struct audit_buffer *ab)
struct nlmsghdr *nlh = nlmsg_hdr(ab->skb);
nlh->nlmsg_len = ab->skb->len - NLMSG_HDRLEN;

- if (audit_pid) {
+ if (audit_pid && init_user_ns.audit.sock) {
skb_queue_tail(&audit_skb_queue, ab->skb);
wake_up_interruptible(&kauditd_wait);
} else {
@@ -1726,7 +1767,20 @@ void audit_log_secctx(struct audit_buffer *ab, u32 secid)
EXPORT_SYMBOL(audit_log_secctx);
#endif

+void audit_free_user_ns(struct user_namespace *ns)
+{
+ if (audit_initialized == AUDIT_DISABLED)
+ return;
+
+ if (ns->audit.sock) {
+ struct net *net = sock_net(ns->audit.sock);
+ netlink_kernel_release(ns->audit.sock);
+ net_drop_ns(net);
+ }
+}
+
EXPORT_SYMBOL(audit_log_start);
EXPORT_SYMBOL(audit_log_end);
EXPORT_SYMBOL(audit_log_format);
EXPORT_SYMBOL(audit_log);
+EXPORT_SYMBOL(audit_free_user_ns);
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index d8c30db..99de920 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -22,6 +22,7 @@
#include <linux/ctype.h>
#include <linux/projid.h>
#include <linux/fs_struct.h>
+#include <linux/audit.h>

static struct kmem_cache *user_ns_cachep __read_mostly;

@@ -124,6 +125,7 @@ void free_user_ns(struct user_namespace *ns)
do {
parent = ns->parent;
proc_free_inum(ns->proc_inum);
+ audit_free_user_ns(ns);
kmem_cache_free(user_ns_cachep, ns);
ns = parent;
} while (atomic_dec_and_test(&parent->count));
--
1.8.1.4

2013-06-19 01:53:56

by Gao feng

[permalink] [raw]
Subject: [PATCH 17/22] Audit: make audit_backlog_wait per user namespace

Tasks are added to audit_backlog_wait when the
audit_skb_queue of user namespace is full, so
audit_backlog_wait should be per user namespace too.

Signed-off-by: Gao feng <[email protected]>
---
include/linux/user_namespace.h | 1 +
kernel/audit.c | 11 +++++------
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 28938f3..c186a84 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -29,6 +29,7 @@ struct audit_ctrl {
struct sk_buff_head hold_queue;
struct task_struct *kauditd_task;
wait_queue_head_t kauditd_wait;
+ wait_queue_head_t backlog_wait;
bool ever_enabled;
};
#endif
diff --git a/kernel/audit.c b/kernel/audit.c
index e3d7da7..3dcaa97 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -119,8 +119,6 @@ static DEFINE_SPINLOCK(audit_freelist_lock);
static int audit_freelist_count;
static LIST_HEAD(audit_freelist);

-static DECLARE_WAIT_QUEUE_HEAD(audit_backlog_wait);
-
/* Serialize requests from userspace. */
DEFINE_MUTEX(audit_cmd_mutex);

@@ -453,7 +451,7 @@ static int kauditd_thread(void *dummy)
flush_hold_queue(ns);

skb = skb_dequeue(queue);
- wake_up(&audit_backlog_wait);
+ wake_up(&ns->audit.backlog_wait);
if (skb) {
if (ns->audit.pid && ns->audit.sock)
kauditd_send_skb(ns, skb);
@@ -1119,14 +1117,14 @@ static void wait_for_auditd(unsigned long sleep_time)
const struct sk_buff_head *queue = &init_user_ns.audit.queue;
DECLARE_WAITQUEUE(wait, current);
set_current_state(TASK_UNINTERRUPTIBLE);
- add_wait_queue(&audit_backlog_wait, &wait);
+ add_wait_queue(&init_user_ns.audit.backlog_wait, &wait);

if (audit_backlog_limit &&
skb_queue_len(queue) > audit_backlog_limit)
schedule_timeout(sleep_time);

__set_current_state(TASK_RUNNING);
- remove_wait_queue(&audit_backlog_wait, &wait);
+ remove_wait_queue(&init_user_ns.audit.backlog_wait, &wait);
}

/**
@@ -1185,7 +1183,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
audit_backlog_limit);
audit_log_lost("backlog limit exceeded");
audit_backlog_wait_time = audit_backlog_wait_overflow;
- wake_up(&audit_backlog_wait);
+ wake_up(&init_user_ns.audit.backlog_wait);
return NULL;
}

@@ -1799,6 +1797,7 @@ void audit_set_user_ns(struct user_namespace *ns)
ns->audit.enabled = audit_default;
ns->audit.ever_enabled |= !!audit_default;
init_waitqueue_head(&ns->audit.kauditd_wait);
+ init_waitqueue_head(&ns->audit.backlog_wait);

ns->audit.initialized = AUDIT_INITIALIZED;
}
--
1.8.1.4

2013-06-19 01:53:51

by Gao feng

[permalink] [raw]
Subject: [PATCH 12/22] Audit: make audit_initialized per user namespace

audit_initialized is used to identify if the audit
related resources have been initialized. it should
be per user namespace too.

Signed-off-by: Gao feng <[email protected]>
---
include/linux/user_namespace.h | 1 +
kernel/audit.c | 21 +++++++++++----------
2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index a2c0a79..c665569 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -21,6 +21,7 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
#ifdef CONFIG_AUDIT
struct audit_ctrl {
struct sock *sock;
+ int initialized;
int enabled;
int pid;
int portid;
diff --git a/kernel/audit.c b/kernel/audit.c
index 923fe27..0b9cef2 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -68,12 +68,12 @@

#include "audit.h"

-/* No auditing will take place until audit_initialized == AUDIT_INITIALIZED.
+/* No auditing will take place until user namespace's
+ * audit.initialized == AUDIT_INITIALIZED.
* (Initialization happens after skb_init is called.) */
#define AUDIT_DISABLED -1
#define AUDIT_UNINITIALIZED 0
#define AUDIT_INITIALIZED 1
-static int audit_initialized;

#define AUDIT_OFF 0
#define AUDIT_ON 1
@@ -953,7 +953,7 @@ static int __init audit_init(void)
{
int i;

- if (audit_initialized == AUDIT_DISABLED)
+ if (init_user_ns.audit.initialized == AUDIT_DISABLED)
return 0;

printk(KERN_INFO "audit: initializing netlink socket (%s)\n",
@@ -963,7 +963,6 @@ static int __init audit_init(void)
return -1;

audit_set_user_ns(&init_user_ns);
- audit_initialized = AUDIT_INITIALIZED;

audit_log(NULL, GFP_KERNEL, AUDIT_KERNEL, "initialized");

@@ -979,14 +978,14 @@ static int __init audit_enable(char *str)
{
audit_default = !!simple_strtol(str, NULL, 0);
if (!audit_default)
- audit_initialized = AUDIT_DISABLED;
+ init_user_ns.audit.initialized = AUDIT_DISABLED;

printk(KERN_INFO "audit: %s", audit_default ? "enabled" : "disabled");

- if (audit_initialized == AUDIT_INITIALIZED) {
+ if (init_user_ns.audit.initialized == AUDIT_INITIALIZED) {
init_user_ns.audit.enabled = audit_default;
init_user_ns.audit.ever_enabled |= !!audit_default;
- } else if (audit_initialized == AUDIT_UNINITIALIZED) {
+ } else if (init_user_ns.audit.initialized == AUDIT_UNINITIALIZED) {
printk(" (after initialization)");
} else {
printk(" (until reboot)");
@@ -1147,7 +1146,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
unsigned long timeout_start = jiffies;
struct sk_buff_head *queue = &init_user_ns.audit.queue;

- if (audit_initialized != AUDIT_INITIALIZED)
+ if (init_user_ns.audit.initialized != AUDIT_INITIALIZED)
return NULL;

if (unlikely(audit_filter_type(type)))
@@ -1784,18 +1783,20 @@ EXPORT_SYMBOL(audit_log_secctx);

void audit_set_user_ns(struct user_namespace *ns)
{
- if (audit_initialized == AUDIT_DISABLED)
+ if (init_user_ns.audit.initialized == AUDIT_DISABLED)
return;

skb_queue_head_init(&ns->audit.queue);
skb_queue_head_init(&ns->audit.hold_queue);
ns->audit.enabled = audit_default;
ns->audit.ever_enabled |= !!audit_default;
+
+ ns->audit.initialized = AUDIT_INITIALIZED;
}

void audit_free_user_ns(struct user_namespace *ns)
{
- if (audit_initialized == AUDIT_DISABLED)
+ if (init_user_ns.audit.initialized == AUDIT_DISABLED)
return;

if (ns->audit.sock) {
--
1.8.1.4

2013-06-19 01:53:50

by Gao feng

[permalink] [raw]
Subject: [PATCH 16/22] Audit: make kauditd_wait per user namespace

kauditd_task is added to the wait queue kaudit_wait when
there is no audit message being generated in user namespace,
so the kaudit_wait should be per user namespace too.

Signed-off-by: Gao feng <[email protected]>
---
include/linux/user_namespace.h | 1 +
kernel/audit.c | 36 ++++++++++++++++++------------------
2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index c665569..28938f3 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -28,6 +28,7 @@ struct audit_ctrl {
struct sk_buff_head queue;
struct sk_buff_head hold_queue;
struct task_struct *kauditd_task;
+ wait_queue_head_t kauditd_wait;
bool ever_enabled;
};
#endif
diff --git a/kernel/audit.c b/kernel/audit.c
index 297ac6e..e3d7da7 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -119,7 +119,6 @@ static DEFINE_SPINLOCK(audit_freelist_lock);
static int audit_freelist_count;
static LIST_HEAD(audit_freelist);

-static DECLARE_WAIT_QUEUE_HEAD(kauditd_wait);
static DECLARE_WAIT_QUEUE_HEAD(audit_backlog_wait);

/* Serialize requests from userspace. */
@@ -345,9 +344,9 @@ static int audit_set_failure(int state)
* This only holds messages is audit_default is set, aka booting with audit=1
* or building your kernel that way.
*/
-static void audit_hold_skb(struct sk_buff *skb)
+static void audit_hold_skb(struct user_namespace *ns, struct sk_buff *skb)
{
- struct sk_buff_head *hold_queue = &init_user_ns.audit.hold_queue;
+ struct sk_buff_head *hold_queue = &ns->audit.hold_queue;

if (audit_default &&
skb_queue_len(hold_queue) < audit_backlog_limit)
@@ -360,7 +359,7 @@ static void audit_hold_skb(struct sk_buff *skb)
* For one reason or another this nlh isn't getting delivered to the userspace
* audit daemon, just send it to printk.
*/
-static void audit_printk_skb(struct sk_buff *skb)
+static void audit_printk_skb(struct user_namespace *ns, struct sk_buff *skb)
{
struct nlmsghdr *nlh = nlmsg_hdr(skb);
char *data = nlmsg_data(nlh);
@@ -372,24 +371,24 @@ static void audit_printk_skb(struct sk_buff *skb)
audit_log_lost("printk limit exceeded\n");
}

- audit_hold_skb(skb);
+ audit_hold_skb(ns, skb);
}

-static void kauditd_send_skb(struct sk_buff *skb)
+static void kauditd_send_skb(struct user_namespace *ns, struct sk_buff *skb)
{
int err;
/* take a reference in case we can't send it and we want to hold it */
skb_get(skb);
- err = netlink_unicast(init_user_ns.audit.sock, skb,
- init_user_ns.audit.portid, 0);
+ err = netlink_unicast(ns->audit.sock, skb,
+ ns->audit.portid, 0);
if (err < 0) {
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
printk(KERN_ERR "audit: *NO* daemon at audit_pid=%d\n",
- init_user_ns.audit.pid);
+ ns->audit.pid);
audit_log_lost("auditd disappeared\n");
- init_user_ns.audit.pid = 0;
+ ns->audit.pid = 0;
/* we might get lucky and get this in the next auditd */
- audit_hold_skb(skb);
+ audit_hold_skb(ns, skb);
} else
/* drop the extra reference if sent ok */
consume_skb(skb);
@@ -423,7 +422,7 @@ static void flush_hold_queue(struct user_namespace *ns)
return;

while (skb && ns->audit.pid) {
- kauditd_send_skb(skb);
+ kauditd_send_skb(ns, skb);
skb = skb_dequeue(hold_queue);
}

@@ -457,13 +456,13 @@ static int kauditd_thread(void *dummy)
wake_up(&audit_backlog_wait);
if (skb) {
if (ns->audit.pid && ns->audit.sock)
- kauditd_send_skb(skb);
+ kauditd_send_skb(ns, skb);
else
- audit_printk_skb(skb);
+ audit_printk_skb(ns, skb);
continue;
}
set_current_state(TASK_INTERRUPTIBLE);
- add_wait_queue(&kauditd_wait, &wait);
+ add_wait_queue(&ns->audit.kauditd_wait, &wait);

if (!skb_queue_len(queue)) {
try_to_freeze();
@@ -471,7 +470,7 @@ static int kauditd_thread(void *dummy)
}

__set_current_state(TASK_RUNNING);
- remove_wait_queue(&kauditd_wait, &wait);
+ remove_wait_queue(&ns->audit.kauditd_wait, &wait);
}

put_user_ns(ns);
@@ -1728,9 +1727,9 @@ void audit_log_end(struct audit_buffer *ab)

if (init_user_ns.audit.pid && init_user_ns.audit.sock) {
skb_queue_tail(&init_user_ns.audit.queue, ab->skb);
- wake_up_interruptible(&kauditd_wait);
+ wake_up_interruptible(&init_user_ns.audit.kauditd_wait);
} else {
- audit_printk_skb(ab->skb);
+ audit_printk_skb(&init_user_ns, ab->skb);
}
ab->skb = NULL;
}
@@ -1799,6 +1798,7 @@ void audit_set_user_ns(struct user_namespace *ns)
skb_queue_head_init(&ns->audit.hold_queue);
ns->audit.enabled = audit_default;
ns->audit.ever_enabled |= !!audit_default;
+ init_waitqueue_head(&ns->audit.kauditd_wait);

ns->audit.initialized = AUDIT_INITIALIZED;
}
--
1.8.1.4

2013-06-19 01:53:48

by Gao feng

[permalink] [raw]
Subject: [PATCH 13/22] Audit: only allow init user namespace to change rate limit

Because We want to avoid the DoS attack caused by other user
namespace,so don't make audit_rate_limit per user namespace.
And only init user namespace has rights to change it.

Signed-off-by: Gao feng <[email protected]>
---
kernel/audit.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index 0b9cef2..306231d 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -295,6 +295,9 @@ static int audit_do_config_change(char *function_name, int *to_change, int new)

static int audit_set_rate_limit(int limit)
{
+ if (current_user_ns() != &init_user_ns)
+ return -EPERM;
+
return audit_do_config_change("audit_rate_limit", &audit_rate_limit, limit);
}

--
1.8.1.4

2013-06-19 01:53:46

by Gao feng

[permalink] [raw]
Subject: [PATCH 15/22] Audit: only allow init user namespace to change backlog_limit

Prevent un-init user namespace from generating lots of skb.

Signed-off-by: Gao feng <[email protected]>
---
kernel/audit.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index 79a8b8e..297ac6e 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -303,6 +303,9 @@ static int audit_set_rate_limit(int limit)

static int audit_set_backlog_limit(int limit)
{
+ if (current_user_ns() != &init_user_ns)
+ return -EPERM;
+
return audit_do_config_change("audit_backlog_limit", &audit_backlog_limit, limit);
}

--
1.8.1.4

2013-06-19 01:53:44

by Gao feng

[permalink] [raw]
Subject: [PATCH 14/22] Audit: only allow init user namespace to change audit_failure

Setting audit_failure to AUDIT_FAIL_PANIC may
cause system panic.

We should disallow uninit user namesapce to change it.

Signed-off-by: Gao feng <[email protected]>
---
kernel/audit.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index 306231d..79a8b8e 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -327,6 +327,9 @@ static int audit_set_failure(int state)
&& state != AUDIT_FAIL_PANIC)
return -EINVAL;

+ if (current_user_ns() != &init_user_ns)
+ return -EPERM;
+
return audit_do_config_change("audit_failure", &audit_failure, state);
}

--
1.8.1.4

2013-06-19 01:53:43

by Gao feng

[permalink] [raw]
Subject: [PATCH 04/22] netlink: Add compare function for netlink_table

As we know, netlink sockets are private resource of
net namespace, they can communicate with each other
only when they in the same net namespace. this works
well until we try to add namespace support for other
subsystems which use netlink.

Don't like ipv4 and route table.., it is not suited to
make these subsytems belong to net namespace, Such as
audit and crypto subsystems,they are more suitable to
user namespace.

So we must have the ability to make the netlink sockets
in same user namespace can communicate with each other.

This patch adds a new function pointer "compare" for
netlink_table, we can decide if the netlink sockets can
communicate with each other through this netlink_table
self-defined compare function.

The behavior isn't changed if we don't provide the compare
function for netlink_table.

Signed-off-by: Gao feng <[email protected]>
---
include/linux/netlink.h | 1 +
net/netlink/af_netlink.c | 32 ++++++++++++++++++++++++--------
net/netlink/af_netlink.h | 1 +
3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 6358da5..f78b430 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -46,6 +46,7 @@ struct netlink_kernel_cfg {
void (*input)(struct sk_buff *skb);
struct mutex *cb_mutex;
void (*bind)(int group);
+ bool (*compare)(struct net *net, struct sock *sk);
};

extern struct sock *__netlink_kernel_create(struct net *net, int unit,
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 57ee84d..942d429 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -854,16 +854,23 @@ netlink_unlock_table(void)
wake_up(&nl_table_wait);
}

+static bool netlink_compare(struct net *net, struct sock *sk)
+{
+ return net_eq(sock_net(sk), net);
+}
+
static struct sock *netlink_lookup(struct net *net, int protocol, u32 portid)
{
- struct nl_portid_hash *hash = &nl_table[protocol].hash;
+ struct netlink_table *table = &nl_table[protocol];
+ struct nl_portid_hash *hash = &table->hash;
struct hlist_head *head;
struct sock *sk;

read_lock(&nl_table_lock);
head = nl_portid_hashfn(hash, portid);
sk_for_each(sk, head) {
- if (net_eq(sock_net(sk), net) && (nlk_sk(sk)->portid == portid)) {
+ if (table->compare(net, sk) &&
+ (nlk_sk(sk)->portid == portid)) {
sock_hold(sk);
goto found;
}
@@ -976,7 +983,8 @@ netlink_update_listeners(struct sock *sk)

static int netlink_insert(struct sock *sk, struct net *net, u32 portid)
{
- struct nl_portid_hash *hash = &nl_table[sk->sk_protocol].hash;
+ struct netlink_table *table = &nl_table[sk->sk_protocol];
+ struct nl_portid_hash *hash = &table->hash;
struct hlist_head *head;
int err = -EADDRINUSE;
struct sock *osk;
@@ -986,7 +994,8 @@ static int netlink_insert(struct sock *sk, struct net *net, u32 portid)
head = nl_portid_hashfn(hash, portid);
len = 0;
sk_for_each(osk, head) {
- if (net_eq(sock_net(osk), net) && (nlk_sk(osk)->portid == portid))
+ if (table->compare(net, osk) &&
+ (nlk_sk(osk)->portid == portid))
break;
len++;
}
@@ -1183,7 +1192,8 @@ static int netlink_autobind(struct socket *sock)
{
struct sock *sk = sock->sk;
struct net *net = sock_net(sk);
- struct nl_portid_hash *hash = &nl_table[sk->sk_protocol].hash;
+ struct netlink_table *table = &nl_table[sk->sk_protocol];
+ struct nl_portid_hash *hash = &table->hash;
struct hlist_head *head;
struct sock *osk;
s32 portid = task_tgid_vnr(current);
@@ -1195,7 +1205,7 @@ retry:
netlink_table_grab();
head = nl_portid_hashfn(hash, portid);
sk_for_each(osk, head) {
- if (!net_eq(sock_net(osk), net))
+ if (!table->compare(net, osk))
continue;
if (nlk_sk(osk)->portid == portid) {
/* Bind collision, search negative portid values. */
@@ -2285,6 +2295,8 @@ __netlink_kernel_create(struct net *net, int unit, struct module *module,
if (cfg) {
nl_table[unit].bind = cfg->bind;
nl_table[unit].flags = cfg->flags;
+ if (cfg->compare)
+ nl_table[unit].compare = cfg->compare;
}
nl_table[unit].registered = 1;
} else {
@@ -2707,6 +2719,7 @@ static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
struct sock *s;
struct nl_seq_iter *iter;
+ struct net *net;
int i, j;

++*pos;
@@ -2714,11 +2727,12 @@ static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)
if (v == SEQ_START_TOKEN)
return netlink_seq_socket_idx(seq, 0);

+ net = seq_file_net(seq);
iter = seq->private;
s = v;
do {
s = sk_next(s);
- } while (s && sock_net(s) != seq_file_net(seq));
+ } while (s && !nl_table[s->sk_protocol].compare(net, s));
if (s)
return s;

@@ -2730,7 +2744,7 @@ static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)

for (; j <= hash->mask; j++) {
s = sk_head(&hash->table[j]);
- while (s && sock_net(s) != seq_file_net(seq))
+ while (s && !nl_table[s->sk_protocol].compare(net, s))
s = sk_next(s);
if (s) {
iter->link = i;
@@ -2923,6 +2937,8 @@ static int __init netlink_proto_init(void)
hash->shift = 0;
hash->mask = 0;
hash->rehash_time = jiffies;
+
+ nl_table[i].compare = netlink_compare;
}

netlink_add_usersock_entry();
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index ed85222..eaa88d1 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -73,6 +73,7 @@ struct netlink_table {
struct mutex *cb_mutex;
struct module *module;
void (*bind)(int group);
+ bool (*compare)(struct net *net, struct sock *sock);
int registered;
};

--
1.8.1.4

2013-06-19 01:53:41

by Gao feng

[permalink] [raw]
Subject: [PATCH 11/22] Audit: make audit_ever_enabled per user namespace

We set audit_ever_enabled true after we enabled audit once.
and if audit_ever_enabled is true, we will allocate audit
context for task.

We should decide if to allocate audit context for tasks based on
if the audit is enabled once in the user namespace which the
task belongs to.

Signed-off-by: Gao feng <[email protected]>
---
include/linux/user_namespace.h | 1 +
kernel/audit.c | 7 +++----
kernel/auditsc.c | 5 ++++-
3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 9972f0f..a2c0a79 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -27,6 +27,7 @@ struct audit_ctrl {
struct sk_buff_head queue;
struct sk_buff_head hold_queue;
struct task_struct *kauditd_task;
+ bool ever_enabled;
};
#endif

diff --git a/kernel/audit.c b/kernel/audit.c
index 758b1e8..923fe27 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -78,7 +78,6 @@ static int audit_initialized;
#define AUDIT_OFF 0
#define AUDIT_ON 1
#define AUDIT_LOCKED 2
-bool audit_ever_enabled;

/* Default state when kernel boots without any parameters. */
static int audit_default;
@@ -313,7 +312,7 @@ static int audit_set_enabled(struct user_namespace *ns, int state)
rc = audit_do_config_change("audit_enabled", &ns->audit.enabled,
state);
if (!rc)
- audit_ever_enabled |= !!state;
+ ns->audit.ever_enabled |= !!state;

return rc;
}
@@ -965,7 +964,6 @@ static int __init audit_init(void)

audit_set_user_ns(&init_user_ns);
audit_initialized = AUDIT_INITIALIZED;
- audit_ever_enabled |= !!audit_default;

audit_log(NULL, GFP_KERNEL, AUDIT_KERNEL, "initialized");

@@ -987,7 +985,7 @@ static int __init audit_enable(char *str)

if (audit_initialized == AUDIT_INITIALIZED) {
init_user_ns.audit.enabled = audit_default;
- audit_ever_enabled |= !!audit_default;
+ init_user_ns.audit.ever_enabled |= !!audit_default;
} else if (audit_initialized == AUDIT_UNINITIALIZED) {
printk(" (after initialization)");
} else {
@@ -1792,6 +1790,7 @@ void audit_set_user_ns(struct user_namespace *ns)
skb_queue_head_init(&ns->audit.queue);
skb_queue_head_init(&ns->audit.hold_queue);
ns->audit.enabled = audit_default;
+ ns->audit.ever_enabled |= !!audit_default;
}

void audit_free_user_ns(struct user_namespace *ns)
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 8ba8684..3fa69cb 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -938,8 +938,11 @@ int audit_alloc(struct task_struct *tsk)
struct audit_context *context;
enum audit_state state;
char *key = NULL;
+ struct user_namespace *ns = current_user_ns();
+ /* Use current_user_ns, since this new task may run
+ * in new user namespace */

- if (likely(!audit_ever_enabled))
+ if (likely(!ns->audit.ever_enabled))
return 0; /* Return if not auditing. */

state = audit_filter_task(tsk, &key);
--
1.8.1.4

2013-06-19 01:59:40

by Gao feng

[permalink] [raw]
Subject: [PATCH 06/22] Audit: make audit_skb_queue per user namespace

After this patch, ervery user namespace has one
audit_skb_queue. Since we havn't finish the preparations,
only allow user to operate the skb queue of init user
namespace.

Signed-off-by: Gao feng <[email protected]>
---
include/linux/audit.h | 4 ++++
include/linux/user_namespace.h | 2 ++
kernel/audit.c | 34 +++++++++++++++++++++++++---------
kernel/user_namespace.c | 1 +
4 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 85f9d7f..6720901 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -439,6 +439,7 @@ extern int audit_log_task_context(struct audit_buffer *ab);
extern void audit_log_task_info(struct audit_buffer *ab,
struct task_struct *tsk);

+extern void audit_set_user_ns(struct user_namespace *ns);
extern void audit_free_user_ns(struct user_namespace *ns);

extern int audit_update_lsm_rules(void);
@@ -495,6 +496,9 @@ static inline void audit_log_task_info(struct audit_buffer *ab,
struct task_struct *tsk)
{ }

+static inline void audit_set_user_ns(struct user_namespace *ns)
+{ }
+
static inline void audit_free_user_ns(struct user_namespace *ns)
{ }
#define audit_enabled 0
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 8797421..e322f20 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -5,6 +5,7 @@
#include <linux/nsproxy.h>
#include <linux/sched.h>
#include <linux/err.h>
+#include <linux/skbuff.h>

#define UID_GID_MAP_MAX_EXTENTS 5

@@ -20,6 +21,7 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
#ifdef CONFIG_AUDIT
struct audit_ctrl {
struct sock *sock;
+ struct sk_buff_head queue;
};
#endif

diff --git a/kernel/audit.c b/kernel/audit.c
index a411b02..e2f6366 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -131,7 +131,6 @@ static DEFINE_SPINLOCK(audit_freelist_lock);
static int audit_freelist_count;
static LIST_HEAD(audit_freelist);

-static struct sk_buff_head audit_skb_queue;
/* queue of skbs to send to auditd when/if it comes back */
static struct sk_buff_head audit_skb_hold_queue;
static struct task_struct *kauditd_task;
@@ -441,11 +440,12 @@ static int kauditd_thread(void *dummy)
set_freezable();
while (!kthread_should_stop()) {
struct sk_buff *skb;
+ struct sk_buff_head *queue = &init_user_ns.audit.queue;
DECLARE_WAITQUEUE(wait, current);

flush_hold_queue();

- skb = skb_dequeue(&audit_skb_queue);
+ skb = skb_dequeue(queue);
wake_up(&audit_backlog_wait);
if (skb) {
if (audit_pid && init_user_ns.audit.sock)
@@ -457,7 +457,7 @@ static int kauditd_thread(void *dummy)
set_current_state(TASK_INTERRUPTIBLE);
add_wait_queue(&kauditd_wait, &wait);

- if (!skb_queue_len(&audit_skb_queue)) {
+ if (!skb_queue_len(queue)) {
try_to_freeze();
schedule();
}
@@ -648,11 +648,13 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
struct audit_sig_info *sig_data;
char *ctx = NULL;
u32 len;
+ struct user_namespace *ns;

err = audit_netlink_ok(skb, msg_type);
if (err)
return err;

+ ns = current_user_ns();
/* As soon as there's any sign of userspace auditd,
* start kauditd to talk to it */
if (!kauditd_task) {
@@ -674,7 +676,8 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
status_set.rate_limit = audit_rate_limit;
status_set.backlog_limit = audit_backlog_limit;
status_set.lost = atomic_read(&audit_lost);
- status_set.backlog = skb_queue_len(&audit_skb_queue);
+ status_set.backlog =
+ skb_queue_len(&ns->audit.queue);
audit_send_reply(NETLINK_CB(skb).portid, seq, AUDIT_GET, 0, 0,
&status_set, sizeof(status_set));
break;
@@ -952,7 +955,7 @@ static int __init audit_init(void)
if (register_pernet_subsys(&audit_net_ops) < 0)
return -1;

- skb_queue_head_init(&audit_skb_queue);
+ audit_set_user_ns(&init_user_ns);
skb_queue_head_init(&audit_skb_hold_queue);
audit_initialized = AUDIT_INITIALIZED;
audit_enabled = audit_default;
@@ -1102,12 +1105,13 @@ static inline void audit_get_stamp(struct audit_context *ctx,
*/
static void wait_for_auditd(unsigned long sleep_time)
{
+ const struct sk_buff_head *queue = &init_user_ns.audit.queue;
DECLARE_WAITQUEUE(wait, current);
set_current_state(TASK_UNINTERRUPTIBLE);
add_wait_queue(&audit_backlog_wait, &wait);

if (audit_backlog_limit &&
- skb_queue_len(&audit_skb_queue) > audit_backlog_limit)
+ skb_queue_len(queue) > audit_backlog_limit)
schedule_timeout(sleep_time);

__set_current_state(TASK_RUNNING);
@@ -1137,6 +1141,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
unsigned int uninitialized_var(serial);
int reserve;
unsigned long timeout_start = jiffies;
+ struct sk_buff_head *queue = &init_user_ns.audit.queue;

if (audit_initialized != AUDIT_INITIALIZED)
return NULL;
@@ -1151,7 +1156,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
entries over the normal backlog limit */

while (audit_backlog_limit
- && skb_queue_len(&audit_skb_queue) > audit_backlog_limit + reserve) {
+ && skb_queue_len(queue) > audit_backlog_limit + reserve) {
if (gfp_mask & __GFP_WAIT && audit_backlog_wait_time) {
unsigned long sleep_time;

@@ -1165,7 +1170,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
printk(KERN_WARNING
"audit: audit_backlog=%d > "
"audit_backlog_limit=%d\n",
- skb_queue_len(&audit_skb_queue),
+ skb_queue_len(queue),
audit_backlog_limit);
audit_log_lost("backlog limit exceeded");
audit_backlog_wait_time = audit_backlog_wait_overflow;
@@ -1710,7 +1715,7 @@ void audit_log_end(struct audit_buffer *ab)
nlh->nlmsg_len = ab->skb->len - NLMSG_HDRLEN;

if (audit_pid && init_user_ns.audit.sock) {
- skb_queue_tail(&audit_skb_queue, ab->skb);
+ skb_queue_tail(&init_user_ns.audit.queue, ab->skb);
wake_up_interruptible(&kauditd_wait);
} else {
audit_printk_skb(ab->skb);
@@ -1773,6 +1778,14 @@ void audit_log_secctx(struct audit_buffer *ab, u32 secid)
EXPORT_SYMBOL(audit_log_secctx);
#endif

+void audit_set_user_ns(struct user_namespace *ns)
+{
+ if (audit_initialized == AUDIT_DISABLED)
+ return;
+
+ skb_queue_head_init(&ns->audit.queue);
+}
+
void audit_free_user_ns(struct user_namespace *ns)
{
if (audit_initialized == AUDIT_DISABLED)
@@ -1783,10 +1796,13 @@ void audit_free_user_ns(struct user_namespace *ns)
netlink_kernel_release(ns->audit.sock);
net_drop_ns(net);
}
+
+ skb_queue_purge(&ns->audit.queue);
}

EXPORT_SYMBOL(audit_log_start);
EXPORT_SYMBOL(audit_log_end);
EXPORT_SYMBOL(audit_log_format);
EXPORT_SYMBOL(audit_log);
+EXPORT_SYMBOL(audit_set_user_ns);
EXPORT_SYMBOL(audit_free_user_ns);
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 99de920..047921a 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -100,6 +100,7 @@ int create_user_ns(struct cred *new)

update_mnt_policy(ns);

+ audit_set_user_ns(ns);
return 0;
}

--
1.8.1.4

2013-06-19 01:59:39

by Gao feng

[permalink] [raw]
Subject: [PATCH 02/22] Audit: remove duplicate comments

Remove it.

Signed-off-by: Gao feng <[email protected]>
---
kernel/audit.c | 7 -------
1 file changed, 7 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index ad3084c..843e7a2 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1067,13 +1067,6 @@ static void wait_for_auditd(unsigned long sleep_time)
remove_wait_queue(&audit_backlog_wait, &wait);
}

-/* Obtain an audit buffer. This routine does locking to obtain the
- * audit buffer, but then no locking is required for calls to
- * audit_log_*format. If the tsk is a task that is currently in a
- * syscall, then the syscall is marked as auditable and an audit record
- * will be written at syscall exit. If there is no associated task, tsk
- * should be NULL. */
-
/**
* audit_log_start - obtain an audit buffer
* @ctx: audit_context (may be NULL)
--
1.8.1.4

2013-06-19 02:00:08

by Gao feng

[permalink] [raw]
Subject: [PATCH 05/22] Audit: implement audit self-defined compare function

After this patch, audit netlink sockets can
communicate with each other when they belong
to the same user namespace.

Signed-off-by: Gao feng <[email protected]>
---
kernel/audit.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index 11b56b7..a411b02 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -895,6 +895,11 @@ static void audit_receive(struct sk_buff *skb)
mutex_unlock(&audit_cmd_mutex);
}

+static bool audit_compare(struct net *net, struct sock *sk)
+{
+ return (sock_net(sk)->user_ns == net->user_ns);
+}
+
static int __net_init audit_net_init(struct net *net)
{
struct user_namespace *ns = net->user_ns;
@@ -907,6 +912,7 @@ static int __net_init audit_net_init(struct net *net)
*/
struct netlink_kernel_cfg cfg = {
.input = audit_receive,
+ .compare = audit_compare,
};

sk = netlink_kernel_create(net, NETLINK_AUDIT, &cfg);
--
1.8.1.4

2013-06-19 20:49:57

by Aristeu Rozanski

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

On Wed, Jun 19, 2013 at 09:53:32AM +0800, Gao feng wrote:
> This patchset is first part of namespace support for audit.
> in this patchset, the mainly resources of audit system have
> been isolated. the audit filter, rules havn't been isolated
> now. It will be implemented in Part2. We finished the isolation
> of user audit message in this patchset.
>
> I choose to assign audit to the user namespace.
> Right now,there are six kinds of namespaces, such as
> net, mount, ipc, pid, uts and user. the first five
> namespaces have special usage. the audit isn't suitable to
> belong to these five namespaces, And since the flag of system
> call clone is in short supply, we can't provide a new flag such
> as CLONE_NEWAUDIT to enable audit namespace separately. so the
> user namespace may be the best choice.

I thought it was said on the last submission that to tie userns and
audit namespace would be a bad idea?

--
Aristeu

2013-06-19 20:51:43

by Eric Paris

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

On Wed, 2013-06-19 at 16:49 -0400, Aristeu Rozanski wrote:
> On Wed, Jun 19, 2013 at 09:53:32AM +0800, Gao feng wrote:
> > This patchset is first part of namespace support for audit.
> > in this patchset, the mainly resources of audit system have
> > been isolated. the audit filter, rules havn't been isolated
> > now. It will be implemented in Part2. We finished the isolation
> > of user audit message in this patchset.
> >
> > I choose to assign audit to the user namespace.
> > Right now,there are six kinds of namespaces, such as
> > net, mount, ipc, pid, uts and user. the first five
> > namespaces have special usage. the audit isn't suitable to
> > belong to these five namespaces, And since the flag of system
> > call clone is in short supply, we can't provide a new flag such
> > as CLONE_NEWAUDIT to enable audit namespace separately. so the
> > user namespace may be the best choice.
>
> I thought it was said on the last submission that to tie userns and
> audit namespace would be a bad idea?

I consider it a non-starter. unpriv users are allowed to launch their
own user namespace. The whole point of audit is to have only a priv
user be allowed to make changes. If you tied audit namespace to user
namespace you grant an unpriv user the ability to modify audit.

NAK.

If there are not clone flags you will either need to only do this from
unshare and not from clone, or get more flags to clone

-Eric

2013-06-19 21:03:52

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

Eric Paris <[email protected]> writes:

> On Wed, 2013-06-19 at 16:49 -0400, Aristeu Rozanski wrote:
>> On Wed, Jun 19, 2013 at 09:53:32AM +0800, Gao feng wrote:
>> > This patchset is first part of namespace support for audit.
>> > in this patchset, the mainly resources of audit system have
>> > been isolated. the audit filter, rules havn't been isolated
>> > now. It will be implemented in Part2. We finished the isolation
>> > of user audit message in this patchset.
>> >
>> > I choose to assign audit to the user namespace.
>> > Right now,there are six kinds of namespaces, such as
>> > net, mount, ipc, pid, uts and user. the first five
>> > namespaces have special usage. the audit isn't suitable to
>> > belong to these five namespaces, And since the flag of system
>> > call clone is in short supply, we can't provide a new flag such
>> > as CLONE_NEWAUDIT to enable audit namespace separately. so the
>> > user namespace may be the best choice.
>>
>> I thought it was said on the last submission that to tie userns and
>> audit namespace would be a bad idea?
>
> I consider it a non-starter. unpriv users are allowed to launch their
> own user namespace. The whole point of audit is to have only a priv
> user be allowed to make changes. If you tied audit namespace to user
> namespace you grant an unpriv user the ability to modify audit.
>
> NAK.
>
> If there are not clone flags you will either need to only do this from
> unshare and not from clone, or get more flags to clone

I completely agree that only priveleged user should be able to make
changes.

On the flip side, I don't know if this is at all interesting unless we
have a solution that works for users in unprivileged user namespaces.
Something like having the possibility of two or more instances of audit
working on every action. One for each layer of privilege.

Gao feng, how do you want to use the audit infrastructure?

Eric

2013-06-20 03:00:41

by Gao feng

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

On 06/20/2013 04:51 AM, Eric Paris wrote:
> On Wed, 2013-06-19 at 16:49 -0400, Aristeu Rozanski wrote:
>> On Wed, Jun 19, 2013 at 09:53:32AM +0800, Gao feng wrote:
>>> This patchset is first part of namespace support for audit.
>>> in this patchset, the mainly resources of audit system have
>>> been isolated. the audit filter, rules havn't been isolated
>>> now. It will be implemented in Part2. We finished the isolation
>>> of user audit message in this patchset.
>>>
>>> I choose to assign audit to the user namespace.
>>> Right now,there are six kinds of namespaces, such as
>>> net, mount, ipc, pid, uts and user. the first five
>>> namespaces have special usage. the audit isn't suitable to
>>> belong to these five namespaces, And since the flag of system
>>> call clone is in short supply, we can't provide a new flag such
>>> as CLONE_NEWAUDIT to enable audit namespace separately. so the
>>> user namespace may be the best choice.
>>
>> I thought it was said on the last submission that to tie userns and
>> audit namespace would be a bad idea?
>
> I consider it a non-starter. unpriv users are allowed to launch their
> own user namespace. The whole point of audit is to have only a priv
> user be allowed to make changes. If you tied audit namespace to user
> namespace you grant an unpriv user the ability to modify audit.
>

I understand your views.

But ven the unpriv user are allowed to make changes, they can do no harm.
they can only make changes on the audit namespace they created.they can
only communicate with the audit namespace they created.

Before we have mount namespace, only priv user can change the mount
information, And with mount namespace & user namespace, the unpriv
user can create their own user namespace,and then create they own
mount namespace. so the unpriv user have the ability to change their
own mount namespace, since they become the priv user in their user
namespace.

The only difference to the mount namespace is that,audit namespace
is created automatically when creating user namespace,So the unpriv
users even have no chance to change the init audit namespace. the
unpriv users can't make changes on the init audit namespace.

In this patchset, I already do some check to make sure only the priv
user in init user namespace can change the rate_limit,audit_failure,
backlog_limit. I think we can absolutely make the unpriv users
can't do some bad influence on the whole system.

If we don't tie audit to user namespace, there is still one problem.
how audit netlink works? In this patchset, I use sock_net(sk)->user_ns
to decide if two audit netlink sockets can communicate with each other.
if we don't tie audit to user namespace, we have to add one field such
as 'auditns' in socket and use sk1->auditns == sk2->auditns to decide
if two audit netlink sockets belong to same audit namespace. I think
this is a little of overhead and ugly.

> NAK.
>
> If there are not clone flags you will either need to only do this from
> unshare and not from clone, or get more flags to clone
>

If you still think it's unsafe or unreasonable for unpriv users to create
audit namespace, I think we can only allow priv user to create audit ns
when they creating user namespace. If unpriv users create new user namespace,
we can simply don't create new audit namespace for them.

I can't find out a better solution...

Thanks,
Gao

2013-06-20 03:12:58

by Gao feng

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

On 06/20/2013 11:02 AM, Gao feng wrote:
> If we don't tie audit to user namespace, there is still one problem.

One more problem. some audit messages are generated by some net subsystem
such as netfilter. If we don't tie audit to user namespace, we have no
idea where these audit messages should go. there is no relationship between
net namespace and audit namespace while we can get user namespace through
net user namespace.

2013-06-20 05:20:20

by Gao feng

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

On 06/20/2013 05:03 AM, Eric W. Biederman wrote:
> Eric Paris <[email protected]> writes:
>
>> On Wed, 2013-06-19 at 16:49 -0400, Aristeu Rozanski wrote:
>>> On Wed, Jun 19, 2013 at 09:53:32AM +0800, Gao feng wrote:
>>>> This patchset is first part of namespace support for audit.
>>>> in this patchset, the mainly resources of audit system have
>>>> been isolated. the audit filter, rules havn't been isolated
>>>> now. It will be implemented in Part2. We finished the isolation
>>>> of user audit message in this patchset.
>>>>
>>>> I choose to assign audit to the user namespace.
>>>> Right now,there are six kinds of namespaces, such as
>>>> net, mount, ipc, pid, uts and user. the first five
>>>> namespaces have special usage. the audit isn't suitable to
>>>> belong to these five namespaces, And since the flag of system
>>>> call clone is in short supply, we can't provide a new flag such
>>>> as CLONE_NEWAUDIT to enable audit namespace separately. so the
>>>> user namespace may be the best choice.
>>>
>>> I thought it was said on the last submission that to tie userns and
>>> audit namespace would be a bad idea?
>>
>> I consider it a non-starter. unpriv users are allowed to launch their
>> own user namespace. The whole point of audit is to have only a priv
>> user be allowed to make changes. If you tied audit namespace to user
>> namespace you grant an unpriv user the ability to modify audit.
>>
>> NAK.
>>
>> If there are not clone flags you will either need to only do this from
>> unshare and not from clone, or get more flags to clone
>
> I completely agree that only priveleged user should be able to make
> changes.
>
> On the flip side, I don't know if this is at all interesting unless we
> have a solution that works for users in unprivileged user namespaces.
> Something like having the possibility of two or more instances of audit
> working on every action. One for each layer of privilege.
>
> Gao feng, how do you want to use the audit infrastructure?
>

I want the root user in container can use the audit related api
(audit_open,audit_log_user_message..) and some network related
audit messages generated by container shouldn't be logged to host.

Thanks,
Gao

2013-06-20 13:03:53

by Eric Paris

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

On Thu, 2013-06-20 at 11:02 +0800, Gao feng wrote:
> On 06/20/2013 04:51 AM, Eric Paris wrote:
> > On Wed, 2013-06-19 at 16:49 -0400, Aristeu Rozanski wrote:
> >> On Wed, Jun 19, 2013 at 09:53:32AM +0800, Gao feng wrote:
> >>> This patchset is first part of namespace support for audit.
> >>> in this patchset, the mainly resources of audit system have
> >>> been isolated. the audit filter, rules havn't been isolated
> >>> now. It will be implemented in Part2. We finished the isolation
> >>> of user audit message in this patchset.
> >>>
> >>> I choose to assign audit to the user namespace.
> >>> Right now,there are six kinds of namespaces, such as
> >>> net, mount, ipc, pid, uts and user. the first five
> >>> namespaces have special usage. the audit isn't suitable to
> >>> belong to these five namespaces, And since the flag of system
> >>> call clone is in short supply, we can't provide a new flag such
> >>> as CLONE_NEWAUDIT to enable audit namespace separately. so the
> >>> user namespace may be the best choice.
> >>
> >> I thought it was said on the last submission that to tie userns and
> >> audit namespace would be a bad idea?
> >
> > I consider it a non-starter. unpriv users are allowed to launch their
> > own user namespace. The whole point of audit is to have only a priv
> > user be allowed to make changes. If you tied audit namespace to user
> > namespace you grant an unpriv user the ability to modify audit.
> >
>
> I understand your views.
>
> But ven the unpriv user are allowed to make changes, they can do no harm.
> they can only make changes on the audit namespace they created.they can
> only communicate with the audit namespace they created.

Imagine I set up my machine to audit all user access to super secret
information. With your patch set all an malicious user has to do is
create a new user namespace. Now when he accesses the super secret
information it will be logged inside the user namespace he created. So
he can just dump those logs in the trash.

I believe that each audit namespace should require priv
(CAP_AUDIT_CONTROL) in the user namespace that created the current audit
namespace. So lets say the machine boots and we are in the init_user
and init_audit namespace. Creating a new audit namespace should require
CAP_AUDIT_CONTROL in the init_user namespace. If instead we spawned a
new user namespace userns1 and try to create a new audit namespace, we
should STILL check for CAP_AUDIT_CONTROL in the init_user namespace.

Assuming we've spawned auditns1 in userns1 and want to spawn auditns2 it
should require CAP_AUDIT_CONTROL in userns1. So now you only have
permission to change your audit config (create a new audit namespace) if
you already had permission to change your current audit config.

Now how to handle coding this...

When the kernel receives an audit message on the netlink socket it can
always check the current->[whatever] to figure out which audit namespace
it came from. Then it can be processed accordingly...

Sending messages to the userspace auditd is a little more tricky. We
need to somehow map the audit namespace to a socket connected to auditd
in userspace. I'd imagine we'd have to either have per auditns backlog
queues and run one kauditd per audit namespace, or we'd have to tag the
skb's with the intended namespace somehow and then find the right socket
in kauditd. Either way it doesn't seem too onerous (although I admit, I
don't know how to code the per namespace kauditd right offhand)

2013-06-20 20:56:04

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

Quoting Eric Paris ([email protected]):
> On Thu, 2013-06-20 at 11:02 +0800, Gao feng wrote:
> > On 06/20/2013 04:51 AM, Eric Paris wrote:
> > > On Wed, 2013-06-19 at 16:49 -0400, Aristeu Rozanski wrote:
> > >> On Wed, Jun 19, 2013 at 09:53:32AM +0800, Gao feng wrote:
> > >>> This patchset is first part of namespace support for audit.
> > >>> in this patchset, the mainly resources of audit system have
> > >>> been isolated. the audit filter, rules havn't been isolated
> > >>> now. It will be implemented in Part2. We finished the isolation
> > >>> of user audit message in this patchset.
> > >>>
> > >>> I choose to assign audit to the user namespace.
> > >>> Right now,there are six kinds of namespaces, such as
> > >>> net, mount, ipc, pid, uts and user. the first five
> > >>> namespaces have special usage. the audit isn't suitable to
> > >>> belong to these five namespaces, And since the flag of system
> > >>> call clone is in short supply, we can't provide a new flag such
> > >>> as CLONE_NEWAUDIT to enable audit namespace separately. so the
> > >>> user namespace may be the best choice.
> > >>
> > >> I thought it was said on the last submission that to tie userns and
> > >> audit namespace would be a bad idea?
> > >
> > > I consider it a non-starter. unpriv users are allowed to launch their
> > > own user namespace. The whole point of audit is to have only a priv
> > > user be allowed to make changes. If you tied audit namespace to user
> > > namespace you grant an unpriv user the ability to modify audit.
> > >
> >
> > I understand your views.
> >
> > But ven the unpriv user are allowed to make changes, they can do no harm.
> > they can only make changes on the audit namespace they created.they can
> > only communicate with the audit namespace they created.
>
> Imagine I set up my machine to audit all user access to super secret
> information. With your patch set all an malicious user has to do is
> create a new user namespace. Now when he accesses the super secret
> information it will be logged inside the user namespace he created. So
> he can just dump those logs in the trash.

Right, I thought I'd pointed this out last time - it makes LSPP
certification impossible.

> I believe that each audit namespace should require priv
> (CAP_AUDIT_CONTROL) in the user namespace that created the current audit
> namespace. So lets say the machine boots and we are in the init_user

The problem with this is that ... people will then hand out
CAP_AUDIT_CONTROL :)

I'd be happier with Eric Biederman's suggestion: You can create a new
audit namespace, but all of the initial audit namespace's filters still
(separately) apply to you.

-serge

2013-06-20 22:01:43

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

Gao feng <[email protected]> writes:

> On 06/20/2013 11:02 AM, Gao feng wrote:
>> If we don't tie audit to user namespace, there is still one problem.
>
> One more problem. some audit messages are generated by some net subsystem
> such as netfilter. If we don't tie audit to user namespace, we have no
> idea where these audit messages should go. there is no relationship between
> net namespace and audit namespace while we can get user namespace through
> net user namespace.

I am in favor of the user namespace tie in.

I am in favor of running a per user namespace audit filter once per user
namespace walking up the user namespace hierarchy. Each filter would
deliver messages to a different userspace audit daemon.

Until we agreement to go that far I am not certain the kernel generated
audit messages should go anywhere except to the global audit daemon.

I think on an individual basis we can look at kernel audit messages and
see if they should go to just the global user namespace. Just the user
namspace of the relevant network stack. Or if the message should go to
the audit daemon of every user namespace that is an ancestor of some
starting user namespace.

But please let's error on the side of caution here.

Eric

2013-06-21 03:46:37

by Gao feng

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

On 06/20/2013 09:02 PM, Eric Paris wrote:
> On Thu, 2013-06-20 at 11:02 +0800, Gao feng wrote:
>> On 06/20/2013 04:51 AM, Eric Paris wrote:
>>> On Wed, 2013-06-19 at 16:49 -0400, Aristeu Rozanski wrote:
>>>> On Wed, Jun 19, 2013 at 09:53:32AM +0800, Gao feng wrote:
>>>>> This patchset is first part of namespace support for audit.
>>>>> in this patchset, the mainly resources of audit system have
>>>>> been isolated. the audit filter, rules havn't been isolated
>>>>> now. It will be implemented in Part2. We finished the isolation
>>>>> of user audit message in this patchset.
>>>>>
>>>>> I choose to assign audit to the user namespace.
>>>>> Right now,there are six kinds of namespaces, such as
>>>>> net, mount, ipc, pid, uts and user. the first five
>>>>> namespaces have special usage. the audit isn't suitable to
>>>>> belong to these five namespaces, And since the flag of system
>>>>> call clone is in short supply, we can't provide a new flag such
>>>>> as CLONE_NEWAUDIT to enable audit namespace separately. so the
>>>>> user namespace may be the best choice.
>>>>
>>>> I thought it was said on the last submission that to tie userns and
>>>> audit namespace would be a bad idea?
>>>
>>> I consider it a non-starter. unpriv users are allowed to launch their
>>> own user namespace. The whole point of audit is to have only a priv
>>> user be allowed to make changes. If you tied audit namespace to user
>>> namespace you grant an unpriv user the ability to modify audit.
>>>
>>
>> I understand your views.
>>
>> But ven the unpriv user are allowed to make changes, they can do no harm.
>> they can only make changes on the audit namespace they created.they can
>> only communicate with the audit namespace they created.
>
> Imagine I set up my machine to audit all user access to super secret
> information. With your patch set all an malicious user has to do is
> create a new user namespace. Now when he accesses the super secret
> information it will be logged inside the user namespace he created. So
> he can just dump those logs in the trash.
>

No, my v1 patchset only log the user audit message(which generated through
auditctl -m "xxx") inside user namespace.

I agree with that we should not simply log audit message in the child
audit namespace. for some global resource related audit messages, they
should be logged in init audit namespace too.

> I believe that each audit namespace should require priv
> (CAP_AUDIT_CONTROL) in the user namespace that created the current audit
> namespace. So lets say the machine boots and we are in the init_user
> and init_audit namespace. Creating a new audit namespace should require
> CAP_AUDIT_CONTROL in the init_user namespace. If instead we spawned a
> new user namespace userns1 and try to create a new audit namespace, we
> should STILL check for CAP_AUDIT_CONTROL in the init_user namespace.
>

Ok, I can add this permission check in next version, though this seems a
litter strictness when we make sure child audit namespace won't fool the
init audit namespace,

> Assuming we've spawned auditns1 in userns1 and want to spawn auditns2 it
> should require CAP_AUDIT_CONTROL in userns1. So now you only have
> permission to change your audit config (create a new audit namespace) if
> you already had permission to change your current audit config.
>
> Now how to handle coding this...
>
> When the kernel receives an audit message on the netlink socket it can
> always check the current->[whatever] to figure out which audit namespace
> it came from. Then it can be processed accordingly...

Yes, this situation is easy to handle, since we are in process context...
but in some situations, we are not running in process context... as I
mentioned, audit messages generated by netfilter rules. current process
is untrustable. we can only get meaningful net namespace in this situation.
Actually, it's meaningful to send net related audit messages to the user
namespace which creates this net namespace.


>
> Sending messages to the userspace auditd is a little more tricky. We
> need to somehow map the audit namespace to a socket connected to auditd
> in userspace. I'd imagine we'd have to either have per auditns backlog
> queues and run one kauditd per audit namespace, or we'd have to tag the
> skb's with the intended namespace somehow and then find the right socket
> in kauditd. Either way it doesn't seem too onerous (although I admit, I
> don't know how to code the per namespace kauditd right offhand)
>

As I said in "[PATCH 04/22] netlink: Add compare function for netlink_table",
netlink and socket are private resources of net namespace. socket has no
idea which audit namespace it belongs to,unless we add a field to mark this.
Through I think we can archive audit namespace in your way,but maybe we should
hack into the net namespace. I don't think the network guys will like it.

There is one more thing we have to do if we don't tie audit namespace to user
namespace. we have to implement the audit proc file(/proc/<pid>/ns/audit) and
the clone/unshare/setns parts.

I still this my way is the simplest and can satisfy your requirement.

2013-06-21 05:28:00

by Gao feng

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

On 06/21/2013 06:01 AM, Eric W. Biederman wrote:
> Gao feng <[email protected]> writes:
>
>> On 06/20/2013 11:02 AM, Gao feng wrote:
>>> If we don't tie audit to user namespace, there is still one problem.
>>
>> One more problem. some audit messages are generated by some net subsystem
>> such as netfilter. If we don't tie audit to user namespace, we have no
>> idea where these audit messages should go. there is no relationship between
>> net namespace and audit namespace while we can get user namespace through
>> net user namespace.
>
> I am in favor of the user namespace tie in.
>
> I am in favor of running a per user namespace audit filter once per user
> namespace walking up the user namespace hierarchy. Each filter would
> deliver messages to a different userspace audit daemon.
>

Agree, this sounds reasonable.

> Until we agreement to go that far I am not certain the kernel generated
> audit messages should go anywhere except to the global audit daemon.

There are some audit messages that we sure where they should go, we can start
from them firstly.

>
> I think on an individual basis we can look at kernel audit messages and
> see if they should go to just the global user namespace. Just the user
> namspace of the relevant network stack. Or if the message should go to
> the audit daemon of every user namespace that is an ancestor of some
> starting user namespace.
>
> But please let's error on the side of caution here.
>
> Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2013-06-21 09:51:45

by Daniel Walsh

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06/20/2013 11:48 PM, Gao feng wrote:
> On 06/20/2013 09:02 PM, Eric Paris wrote:
>> On Thu, 2013-06-20 at 11:02 +0800, Gao feng wrote:
>>> On 06/20/2013 04:51 AM, Eric Paris wrote:
>>>> On Wed, 2013-06-19 at 16:49 -0400, Aristeu Rozanski wrote:
>>>>> On Wed, Jun 19, 2013 at 09:53:32AM +0800, Gao feng wrote:
>>>>>> This patchset is first part of namespace support for audit. in
>>>>>> this patchset, the mainly resources of audit system have been
>>>>>> isolated. the audit filter, rules havn't been isolated now. It
>>>>>> will be implemented in Part2. We finished the isolation of user
>>>>>> audit message in this patchset.
>>>>>>
>>>>>> I choose to assign audit to the user namespace. Right now,there
>>>>>> are six kinds of namespaces, such as net, mount, ipc, pid, uts
>>>>>> and user. the first five namespaces have special usage. the audit
>>>>>> isn't suitable to belong to these five namespaces, And since the
>>>>>> flag of system call clone is in short supply, we can't provide a
>>>>>> new flag such as CLONE_NEWAUDIT to enable audit namespace
>>>>>> separately. so the user namespace may be the best choice.
>>>>>
>>>>> I thought it was said on the last submission that to tie userns
>>>>> and audit namespace would be a bad idea?
>>>>
>>>> I consider it a non-starter. unpriv users are allowed to launch
>>>> their own user namespace. The whole point of audit is to have only a
>>>> priv user be allowed to make changes. If you tied audit namespace to
>>>> user namespace you grant an unpriv user the ability to modify audit.
>>>>
>>>
>>> I understand your views.
>>>
>>> But ven the unpriv user are allowed to make changes, they can do no
>>> harm. they can only make changes on the audit namespace they
>>> created.they can only communicate with the audit namespace they
>>> created.
>>
>> Imagine I set up my machine to audit all user access to super secret
>> information. With your patch set all an malicious user has to do is
>> create a new user namespace. Now when he accesses the super secret
>> information it will be logged inside the user namespace he created. So
>> he can just dump those logs in the trash.
>>
>
> No, my v1 patchset only log the user audit message(which generated through
> auditctl -m "xxx") inside user namespace.
>
> I agree with that we should not simply log audit message in the child audit
> namespace. for some global resource related audit messages, they should be
> logged in init audit namespace too.
>
>> I believe that each audit namespace should require priv
>> (CAP_AUDIT_CONTROL) in the user namespace that created the current audit
>> namespace. So lets say the machine boots and we are in the init_user and
>> init_audit namespace. Creating a new audit namespace should require
>> CAP_AUDIT_CONTROL in the init_user namespace. If instead we spawned a
>> new user namespace userns1 and try to create a new audit namespace, we
>> should STILL check for CAP_AUDIT_CONTROL in the init_user namespace.
>>
>
> Ok, I can add this permission check in next version, though this seems a
> litter strictness when we make sure child audit namespace won't fool the
> init audit namespace,
>
>> Assuming we've spawned auditns1 in userns1 and want to spawn auditns2 it
>> should require CAP_AUDIT_CONTROL in userns1. So now you only have
>> permission to change your audit config (create a new audit namespace) if
>> you already had permission to change your current audit config.
>>
>> Now how to handle coding this...
>>
>> When the kernel receives an audit message on the netlink socket it can
>> always check the current->[whatever] to figure out which audit namespace
>> it came from. Then it can be processed accordingly...
>
> Yes, this situation is easy to handle, since we are in process context...
> but in some situations, we are not running in process context... as I
> mentioned, audit messages generated by netfilter rules. current process is
> untrustable. we can only get meaningful net namespace in this situation.
> Actually, it's meaningful to send net related audit messages to the user
> namespace which creates this net namespace.
>
>
>>
>> Sending messages to the userspace auditd is a little more tricky. We
>> need to somehow map the audit namespace to a socket connected to auditd
>> in userspace. I'd imagine we'd have to either have per auditns backlog
>> queues and run one kauditd per audit namespace, or we'd have to tag the
>> skb's with the intended namespace somehow and then find the right socket
>> in kauditd. Either way it doesn't seem too onerous (although I admit, I
>> don't know how to code the per namespace kauditd right offhand)
>>
>
> As I said in "[PATCH 04/22] netlink: Add compare function for
> netlink_table", netlink and socket are private resources of net namespace.
> socket has no idea which audit namespace it belongs to,unless we add a
> field to mark this. Through I think we can archive audit namespace in your
> way,but maybe we should hack into the net namespace. I don't think the
> network guys will like it.
>
> There is one more thing we have to do if we don't tie audit namespace to
> user namespace. we have to implement the audit proc
> file(/proc/<pid>/ns/audit) and the clone/unshare/setns parts.
>
> I still this my way is the simplest and can satisfy your requirement.
>
> -- Linux-audit mailing list [email protected]
> https://www.redhat.com/mailman/listinfo/linux-audit
>
Will I be able to use the audit namespace without the user namespace. I would
prefer to be able to use the audit namespace long before I am willing to take
a chance with the User Namespace for things like light weight virtualization
and securing processes with MAC.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlHEIiEACgkQrlYvE4MpobP8awCcD4xO34I4IDD+qeHQLPFtWM0q
vnMAn1PTatEUfPkeQG1+dG4ULKpOLCoB
=nrRz
-----END PGP SIGNATURE-----

2013-06-21 10:50:12

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

Daniel J Walsh <[email protected]> writes:

> Will I be able to use the audit namespace without the user namespace. I would
> prefer to be able to use the audit namespace long before I am willing to take
> a chance with the User Namespace for things like light weight virtualization
> and securing processes with MAC.

I will be very surprised if we settle on a design that allows it.

I still think even the existence of multiple audit contexts is a little
iffy but the desire seems strong enough Gao feng will probably work
through all of the issues.

Without restricting processes to a user namespace at the same time as
restricting them to an audit context it becomes very easy to violate the
important audit policies and to bury user space generated messages from
privileged userspace applications. At least in a user namespace we know
you are not privileged with respect to the rest of the system, so we
would only be dealing with userspace messages you would not be able to
generate otherwise.

As for taking a chance. You will probably safer with a simultaneous use
of user namespaces and having processes secured with a MAC. To the best
of my knowledge previous solutions have only been really safe when you
trusted the processes inside not to be malicious. A user namespace at
least means you can stop using uid 0 inside of your light weight
virtualization.

Eric

2013-06-24 15:02:55

by Aristeu Rozanski

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

On Thu, Jun 20, 2013 at 03:01:09PM -0700, Eric W. Biederman wrote:
> Gao feng <[email protected]> writes:
>
> > On 06/20/2013 11:02 AM, Gao feng wrote:
> >> If we don't tie audit to user namespace, there is still one problem.
> >
> > One more problem. some audit messages are generated by some net subsystem
> > such as netfilter. If we don't tie audit to user namespace, we have no
> > idea where these audit messages should go. there is no relationship between
> > net namespace and audit namespace while we can get user namespace through
> > net user namespace.
>
> I am in favor of the user namespace tie in.
>
> I am in favor of running a per user namespace audit filter once per user
> namespace walking up the user namespace hierarchy. Each filter would
> deliver messages to a different userspace audit daemon.
>
> Until we agreement to go that far I am not certain the kernel generated
> audit messages should go anywhere except to the global audit daemon.
>
> I think on an individual basis we can look at kernel audit messages and
> see if they should go to just the global user namespace. Just the user
> namspace of the relevant network stack. Or if the message should go to
> the audit daemon of every user namespace that is an ancestor of some
> starting user namespace.
>
> But please let's error on the side of caution here.

Let's say I'd like to use userns but still have a single and global
auditd. Dropping the capability will be required for that? That sounds
bad. The fact new user namespaces will want to control the separated
audit namespace doesn't require tie in.

--
Aristeu

2013-06-24 19:04:26

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

Aristeu Rozanski <[email protected]> writes:

> On Thu, Jun 20, 2013 at 03:01:09PM -0700, Eric W. Biederman wrote:
>> Gao feng <[email protected]> writes:
>>
>> > On 06/20/2013 11:02 AM, Gao feng wrote:
>> >> If we don't tie audit to user namespace, there is still one problem.
>> >
>> > One more problem. some audit messages are generated by some net subsystem
>> > such as netfilter. If we don't tie audit to user namespace, we have no
>> > idea where these audit messages should go. there is no relationship between
>> > net namespace and audit namespace while we can get user namespace through
>> > net user namespace.
>>
>> I am in favor of the user namespace tie in.
>>
>> I am in favor of running a per user namespace audit filter once per user
>> namespace walking up the user namespace hierarchy. Each filter would
>> deliver messages to a different userspace audit daemon.
>>
>> Until we agreement to go that far I am not certain the kernel generated
>> audit messages should go anywhere except to the global audit daemon.
>>
>> I think on an individual basis we can look at kernel audit messages and
>> see if they should go to just the global user namespace. Just the user
>> namspace of the relevant network stack. Or if the message should go to
>> the audit daemon of every user namespace that is an ancestor of some
>> starting user namespace.
>>
>> But please let's error on the side of caution here.
>
> Let's say I'd like to use userns but still have a single and global
> auditd.

By definition the kernel auditd functionality is broken if we don't
allow a global auditd. So that will be allowed.

> Dropping the capability will be required for that?

No. Dropping capabilities and being in a state where you can never
interact with the primary audit context is required to safely have
access to a subordinate audit context. Never being able to intereact
with the primary audit context removes all possibility of confusing
a privileged application and break the security model.

Entering a user namespace by definition drops all capabilities in a
non-recoverable way. Which makes being in a user namespace a sufficient
condition to use a subordinate audit context.

> That sounds
> bad. The fact new user namespaces will want to control the separated
> audit namespace doesn't require tie in.

No. The fact that you can gain root privileges by executing a suid root
application when not in a user namespace requires a tie in.


In summary. A subordinate audit context is not safe if you can still
execute a suid root application and become root. The interesting use
case is inside of a user namespace, which never allows gaining
capabilities outside of the user namespace. Which makes a tie in with
user namespaces sensible, as it never allows subverting the current
mechanisms and it is where people want the functionality.

Eric

2013-07-04 03:28:52

by Gao feng

[permalink] [raw]
Subject: Re: [Part1 PATCH 00/22] Add namespace support for audit

On 06/21/2013 11:48 AM, Gao feng wrote:
> On 06/20/2013 09:02 PM, Eric Paris wrote:
>> On Thu, 2013-06-20 at 11:02 +0800, Gao feng wrote:
>>> On 06/20/2013 04:51 AM, Eric Paris wrote:
>>>> On Wed, 2013-06-19 at 16:49 -0400, Aristeu Rozanski wrote:
>>>>> On Wed, Jun 19, 2013 at 09:53:32AM +0800, Gao feng wrote:
>>>>>> This patchset is first part of namespace support for audit.
>>>>>> in this patchset, the mainly resources of audit system have
>>>>>> been isolated. the audit filter, rules havn't been isolated
>>>>>> now. It will be implemented in Part2. We finished the isolation
>>>>>> of user audit message in this patchset.
>>>>>>
>>>>>> I choose to assign audit to the user namespace.
>>>>>> Right now,there are six kinds of namespaces, such as
>>>>>> net, mount, ipc, pid, uts and user. the first five
>>>>>> namespaces have special usage. the audit isn't suitable to
>>>>>> belong to these five namespaces, And since the flag of system
>>>>>> call clone is in short supply, we can't provide a new flag such
>>>>>> as CLONE_NEWAUDIT to enable audit namespace separately. so the
>>>>>> user namespace may be the best choice.
>>>>>
>>>>> I thought it was said on the last submission that to tie userns and
>>>>> audit namespace would be a bad idea?
>>>>
>>>> I consider it a non-starter. unpriv users are allowed to launch their
>>>> own user namespace. The whole point of audit is to have only a priv
>>>> user be allowed to make changes. If you tied audit namespace to user
>>>> namespace you grant an unpriv user the ability to modify audit.
>>>>
>>>
>>> I understand your views.
>>>
>>> But ven the unpriv user are allowed to make changes, they can do no harm.
>>> they can only make changes on the audit namespace they created.they can
>>> only communicate with the audit namespace they created.
>>
>> Imagine I set up my machine to audit all user access to super secret
>> information. With your patch set all an malicious user has to do is
>> create a new user namespace. Now when he accesses the super secret
>> information it will be logged inside the user namespace he created. So
>> he can just dump those logs in the trash.
>>
>
> No, my v1 patchset only log the user audit message(which generated through
> auditctl -m "xxx") inside user namespace.
>
> I agree with that we should not simply log audit message in the child
> audit namespace. for some global resource related audit messages, they
> should be logged in init audit namespace too.
>
>> I believe that each audit namespace should require priv
>> (CAP_AUDIT_CONTROL) in the user namespace that created the current audit
>> namespace. So lets say the machine boots and we are in the init_user
>> and init_audit namespace. Creating a new audit namespace should require
>> CAP_AUDIT_CONTROL in the init_user namespace. If instead we spawned a
>> new user namespace userns1 and try to create a new audit namespace, we
>> should STILL check for CAP_AUDIT_CONTROL in the init_user namespace.
>>
>
> Ok, I can add this permission check in next version, though this seems a
> litter strictness when we make sure child audit namespace won't fool the
> init audit namespace,
>
>> Assuming we've spawned auditns1 in userns1 and want to spawn auditns2 it
>> should require CAP_AUDIT_CONTROL in userns1. So now you only have
>> permission to change your audit config (create a new audit namespace) if
>> you already had permission to change your current audit config.
>>
>> Now how to handle coding this...
>>
>> When the kernel receives an audit message on the netlink socket it can
>> always check the current->[whatever] to figure out which audit namespace
>> it came from. Then it can be processed accordingly...
>
> Yes, this situation is easy to handle, since we are in process context...
> but in some situations, we are not running in process context... as I
> mentioned, audit messages generated by netfilter rules. current process
> is untrustable. we can only get meaningful net namespace in this situation.
> Actually, it's meaningful to send net related audit messages to the user
> namespace which creates this net namespace.
>
>
>>
>> Sending messages to the userspace auditd is a little more tricky. We
>> need to somehow map the audit namespace to a socket connected to auditd
>> in userspace. I'd imagine we'd have to either have per auditns backlog
>> queues and run one kauditd per audit namespace, or we'd have to tag the
>> skb's with the intended namespace somehow and then find the right socket
>> in kauditd. Either way it doesn't seem too onerous (although I admit, I
>> don't know how to code the per namespace kauditd right offhand)
>>
>
> As I said in "[PATCH 04/22] netlink: Add compare function for netlink_table",
> netlink and socket are private resources of net namespace. socket has no
> idea which audit namespace it belongs to,unless we add a field to mark this.
> Through I think we can archive audit namespace in your way,but maybe we should
> hack into the net namespace. I don't think the network guys will like it.
>
> There is one more thing we have to do if we don't tie audit namespace to user
> namespace. we have to implement the audit proc file(/proc/<pid>/ns/audit) and
> the clone/unshare/setns parts.
>
> I still this my way is the simplest and can satisfy your requirement.
>

Ping...
Eric, I need to know what's your comment..

Thanks