2014-02-03 17:53:30

by Andy Lutomirski

Subject: [PATCH] audit: Only use the syscall slowpath when syscall audit rules exist

This toggles TIF_SYSCALL_AUDIT as needed when rules change instead of
leaving it set whenever rules might be set in the future. This reduces
syscall latency from >60ns to closer to 40ns on my laptop.

Cc: Oleg Nesterov <[email protected]>
Cc: Steve Grubb <[email protected]>
Cc: Eric Paris <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
---

This is not the best-tested patch in the world. Someone who actually
knows how to use syscall auditing should probably give it a spin. It
fixes an IMO huge performance issue, though.

include/linux/audit.h | 9 +++++--
kernel/auditfilter.c | 4 ++--
kernel/auditsc.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++---
3 files changed, 71 insertions(+), 7 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index a406419..534ba85 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -298,7 +298,9 @@ static inline void audit_mmap_fd(int fd, int flags)
__audit_mmap_fd(fd, flags);
}

-extern int audit_n_rules;
+extern void audit_inc_n_rules(void);
+extern void audit_dec_n_rules(void);
+
extern int audit_signals;
#else /* CONFIG_AUDITSYSCALL */
static inline int audit_alloc(struct task_struct *task)
@@ -404,7 +406,10 @@ static inline void audit_mmap_fd(int fd, int flags)
{ }
static inline void audit_ptrace(struct task_struct *t)
{ }
-#define audit_n_rules 0
+static inline void audit_inc_n_rules(void)
+{ }
+static inline void audit_dec_n_rules(void)
+{ }
#define audit_signals 0
#endif /* CONFIG_AUDITSYSCALL */

diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index 51f3fd4..0ce531d 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -903,7 +903,7 @@ static inline int audit_add_rule(struct audit_entry *entry)
}
#ifdef CONFIG_AUDITSYSCALL
if (!dont_count)
- audit_n_rules++;
+ audit_inc_n_rules();

if (!audit_match_signal(entry))
audit_signals++;
@@ -955,7 +955,7 @@ static inline int audit_del_rule(struct audit_entry *entry)

#ifdef CONFIG_AUDITSYSCALL
if (!dont_count)
- audit_n_rules--;
+ audit_dec_n_rules();

if (!audit_match_signal(entry))
audit_signals--;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 90594c9..363d0bb 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -79,8 +79,15 @@
/* no execve audit message should be longer than this (userspace limits) */
#define MAX_EXECVE_AUDIT_LEN 7500

-/* number of audit rules */
-int audit_n_rules;
+/*
+ * number of audit rules
+ *
+ * n_rules_lock protects audit_n_rules. It also prevents audit_alloc from
+ * racing against changes to audit_n_rules: to change someone else's
+ * audit_context or TIF_SYSCALL_AUDIT, you need write_lock(&n_rules_lock).
+ */
+static DEFINE_RWLOCK(n_rules_lock);
+static int audit_n_rules;

/* determines whether we collect data for signals sent */
int audit_signals;
@@ -911,6 +918,47 @@ static inline struct audit_context *audit_alloc_context(enum audit_state state)
return context;
}

+void audit_inc_n_rules()
+{
+ struct task_struct *p, *g;
+
+ write_lock(&n_rules_lock);
+
+ if (audit_n_rules++ != 0)
+ goto out; /* The overall state isn't changing. */
+
+ read_lock(&tasklist_lock);
+ do_each_thread(g, p) {
+ if (p->audit_context)
+ set_tsk_thread_flag(p, TIF_SYSCALL_AUDIT);
+ } while_each_thread(g, p);
+ read_unlock(&tasklist_lock);
+
+out:
+ write_unlock(&n_rules_lock);
+}
+
+void audit_dec_n_rules()
+{
+ struct task_struct *p, *g;
+
+ write_lock(&n_rules_lock);
+ --audit_n_rules;
+ BUG_ON(audit_n_rules < 0);
+
+ if (audit_n_rules != 0)
+ goto out; /* The overall state isn't changing. */
+
+ read_lock(&tasklist_lock);
+ do_each_thread(g, p) {
+ clear_tsk_thread_flag(p, TIF_SYSCALL_AUDIT);
+ } while_each_thread(g, p);
+ read_unlock(&tasklist_lock);
+
+out:
+ write_unlock(&n_rules_lock);
+}
+
/**
* audit_alloc - allocate an audit context block for a task
* @tsk: task
@@ -931,6 +979,11 @@ int audit_alloc(struct task_struct *tsk)

state = audit_filter_task(tsk, &key);
if (state == AUDIT_DISABLED) {
+ /*
+ * There is no relevant race against inc/dec_n_rules
+ * here -- inc won't set TIF_SYSCALL_AUDIT, and, if dec
+ * clears it redundantly, there's no harm done.
+ */
clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
return 0;
}
@@ -942,8 +995,14 @@ int audit_alloc(struct task_struct *tsk)
}
context->filterkey = key;

+ read_lock(&n_rules_lock);
tsk->audit_context = context;
- set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
+ if (audit_n_rules)
+ set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
+ else
+ clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
+ read_unlock(&n_rules_lock);
+
return 0;
}

--
1.8.5.3


2014-02-03 18:11:57

by Oleg Nesterov

Subject: Re: [PATCH] audit: Only use the syscall slowpath when syscall audit rules exist

On 02/03, Andy Lutomirski wrote:
>
> @@ -911,6 +918,47 @@ static inline struct audit_context *audit_alloc_context(enum audit_state state)
> return context;
> }
>
> +void audit_inc_n_rules()
> +{
> + struct task_struct *p, *g;
> +
> + write_lock(&n_rules_lock);
> +
> + if (audit_n_rules++ != 0)
> + goto out; /* The overall state isn't changing. */
> +
> + read_lock(&tasklist_lock);
> + do_each_thread(g, p) {
> + if (p->audit_context)
> + set_tsk_thread_flag(p, TIF_SYSCALL_AUDIT);
> + } while_each_thread(g, p);
> + read_unlock(&tasklist_lock);

Cosmetic, but I'd suggest using for_each_process_thread() instead
of do_each_thread/while_each_thread.

And I am not sure why n_rules_lock is rwlock_t... OK, to make
audit_alloc() more scalable, I guess. Please see below.

> @@ -942,8 +995,14 @@ int audit_alloc(struct task_struct *tsk)
> }
> context->filterkey = key;
>
> + read_lock(&n_rules_lock);
> tsk->audit_context = context;
> - set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
> + if (audit_n_rules)
> + set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
> + else
> + clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
> + read_unlock(&n_rules_lock);

Perhaps this is fine, but n_rules_lock can't prevent the race with
audit_inc/dec_n_rules(). The problem is, this is called before the
new task is visible to for_each_process_thread().

If we want to fix this race, we need something like audit_sync_flags()
called after copy_process() drops tasklist, or from tasklist_lock
protected section (in this case it doesn't need n_rules_lock).

Or perhaps audit_alloc() should not try to clear TIF_SYSCALL_AUDIT at all.
In both cases n_rules_lock can be spinlock_t.

Oleg.

2014-02-03 18:33:54

by Andy Lutomirski

Subject: Re: [PATCH] audit: Only use the syscall slowpath when syscall audit rules exist

On Mon, Feb 3, 2014 at 10:11 AM, Oleg Nesterov <[email protected]> wrote:
> On 02/03, Andy Lutomirski wrote:
>>
>> @@ -911,6 +918,47 @@ static inline struct audit_context *audit_alloc_context(enum audit_state state)
>> return context;
>> }
>>
>> +void audit_inc_n_rules()
>> +{
>> + struct task_struct *p, *g;
>> +
>> + write_lock(&n_rules_lock);
>> +
>> + if (audit_n_rules++ != 0)
>> + goto out; /* The overall state isn't changing. */
>> +
>> + read_lock(&tasklist_lock);
>> + do_each_thread(g, p) {
>> + if (p->audit_context)
>> + set_tsk_thread_flag(p, TIF_SYSCALL_AUDIT);
>> + } while_each_thread(g, p);
>> + read_unlock(&tasklist_lock);
>
> Cosmetic, but I'd suggest using for_each_process_thread() instead
> of do_each_thread/while_each_thread.

I didn't notice that option :)

>
> And I am not sure why n_rules_lock is rwlock_t... OK, to make
> audit_alloc() more scalable, I guess. Please see below.
>

Yes. I should probably also use the irqsave variant.

>> @@ -942,8 +995,14 @@ int audit_alloc(struct task_struct *tsk)
>> }
>> context->filterkey = key;
>>
>> + read_lock(&n_rules_lock);
>> tsk->audit_context = context;
>> - set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
>> + if (audit_n_rules)
>> + set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
>> + else
>> + clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
>> + read_unlock(&n_rules_lock);
>
> Perhaps this is fine, but n_rules_lock can't prevent the race with
> audit_inc/dec_n_rules(). The problem is, this is called before the
> new task is visible to for_each_process_thread().

Hmm. I missed that.

>
> If we want to fix this race, we need something like audit_sync_flags()
> called after copy_process() drops tasklist, or from tasklist_lock
> protected section (in this case it doesn't need n_rules_lock).
>
> Or perhaps audit_alloc() should not try to clear TIF_SYSCALL_AUDIT at all.
> In both cases n_rules_lock can be spinlock_t.

Here are two options:

1. Sync the flags inside tasklist_lock, as I think you're suggesting.
This seems simple.

2. Make this whole thing lazy -- always set TIF_SYSCALL_AUDIT for new
tasks, but clear it on the first syscall.

I'm leaning toward number 1.

Thanks for the instant review!

--Andy

2014-02-03 20:24:18

by Steve Grubb

Subject: Re: [PATCH] audit: Only use the syscall slowpath when syscall audit rules exist

On Monday, February 03, 2014 09:53:23 AM Andy Lutomirski wrote:
> This toggles TIF_SYSCALL_AUDIT as needed when rules change instead of
> leaving it set whenever rules might be set in the future. This reduces
> syscall latency from >60ns to closer to 40ns on my laptop.

Does this mean that we have processes that don't have the TIF_SYSCALL_AUDIT
flag set? When rules get loaded, how do we get the flag put back into all
processes?

The theory of ops is supposed to be that for anyone not needing audit, there
is only the cost of "if (tif & TIF_SYSCALL_AUDIT)". That should be it. If you
have audit enabled or had it enabled (which means it might be loaded with new
rules), we want to inspect the syscall. There should be a short circuit based
on checking that any rules has ever been loaded or are currently loaded before
doing any real collection.

-Steve

2014-02-03 22:09:06

by Andy Lutomirski

Subject: Re: [PATCH] audit: Only use the syscall slowpath when syscall audit rules exist

On Mon, Feb 3, 2014 at 12:23 PM, Steve Grubb <[email protected]> wrote:
> On Monday, February 03, 2014 09:53:23 AM Andy Lutomirski wrote:
>> This toggles TIF_SYSCALL_AUDIT as needed when rules change instead of
>> leaving it set whenever rules might be set in the future. This reduces
>> syscall latency from >60ns to closer to 40ns on my laptop.
>
> Does this mean that we have processes that don't have the TIF_SYSCALL_AUDIT
> flag set? When rules get loaded, how do we get the flag put back into all
> processes?

By looping over all processes and setting the flag, which is what my patch does.

>
> The theory of ops is supposed to be that for anyone not needing audit, there
> is only the cost of "if (tif & TIF_SYSCALL_AUDIT)".

On current kernels *all* processes have TIF_SYSCALL_AUDIT, even if
they don't need auditing because there's nothing to audit. So
everything pays the full cost.

> That should be it. If you
> have audit enabled or had it enabled (which means it might be loaded with new
> rules), we want to inspect the syscall.
>

My point is that there's nothing to inspect -- there are no rules.
Unless the audit code needs to do something just in case a non-syscall
audit event gets written, in which case the audit code should IMO be
fixed. (This is what Eric is talking about, I think.)

--Andy