2014-02-03 19:11:35

by Andy Lutomirski

[permalink] [raw]
Subject: [PATCH v2.1] audit: Only use the syscall slowpath when syscall audit rules exist

This toggles TIF_SYSCALL_AUDIT as needed when rules change instead of
leaving it set whenever rules might be set in the future. This reduces
syscall latency from >60ns to closer to 40ns on my laptop.

Cc: Oleg Nesterov <[email protected]>
Cc: Steve Grubb <[email protected]>
Cc: Eric Paris <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
---

This brown paper bag release is brought to you by git commit's -a flag.

Changes from v2: Contains the correct patch

Changes from v1:
- For new tasks, set flags in a new audit_sync_flags callback instead of
in audit_alloc (thanks, Oleg).
- Rework locking.
- Use irqsave/irqrestore to avoid having to think about who else might have
taken spinlocks.

include/linux/audit.h | 11 ++++++++--
kernel/auditfilter.c | 4 ++--
kernel/auditsc.c | 57 +++++++++++++++++++++++++++++++++++++++++++++------
kernel/fork.c | 3 +++
4 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index a406419..a81f498 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -298,7 +298,9 @@ static inline void audit_mmap_fd(int fd, int flags)
__audit_mmap_fd(fd, flags);
}

-extern int audit_n_rules;
+extern void audit_inc_n_rules(void);
+extern void audit_dec_n_rules(void);
+extern void audit_sync_flags(struct task_struct *tsk);
extern int audit_signals;
#else /* CONFIG_AUDITSYSCALL */
static inline int audit_alloc(struct task_struct *task)
@@ -404,7 +406,12 @@ static inline void audit_mmap_fd(int fd, int flags)
{ }
static inline void audit_ptrace(struct task_struct *t)
{ }
-#define audit_n_rules 0
+static inline void audit_inc_n_rules(void)
+{ }
+static inline void audit_dec_n_rules(void)
+{ }
+static inline void audit_sync_flags(struct task_struct *tsk)
+{ }
#define audit_signals 0
#endif /* CONFIG_AUDITSYSCALL */

diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index 51f3fd4..0ce531d 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -903,7 +903,7 @@ static inline int audit_add_rule(struct audit_entry *entry)
}
#ifdef CONFIG_AUDITSYSCALL
if (!dont_count)
- audit_n_rules++;
+ audit_inc_n_rules();

if (!audit_match_signal(entry))
audit_signals++;
@@ -955,7 +955,7 @@ static inline int audit_del_rule(struct audit_entry *entry)

#ifdef CONFIG_AUDITSYSCALL
if (!dont_count)
- audit_n_rules--;
+ audit_dec_n_rules();

if (!audit_match_signal(entry))
audit_signals--;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 90594c9..cd44c88 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -79,8 +79,13 @@
/* no execve audit message should be longer than this (userspace limits) */
#define MAX_EXECVE_AUDIT_LEN 7500

-/* number of audit rules */
-int audit_n_rules;
+/*
+ * number of audit rules
+ *
+ * To change this, you must hold audit_filter_mutex *and* have a read lock
+ * on tasklist_lock.
+ */
+static int audit_n_rules;

/* determines whether we collect data for signals sent */
int audit_signals;
@@ -911,6 +916,40 @@ static inline struct audit_context *audit_alloc_context(enum audit_state state)
return context;
}

+void audit_inc_n_rules()
+{
+ struct task_struct *p, *g;
+ unsigned long flags;
+
+ read_lock_irqsave(&tasklist_lock, flags);
+ if (audit_n_rules++ == 0) {
+ do_each_thread(g, p) {
+ if (p->audit_context)
+ set_tsk_thread_flag(p, TIF_SYSCALL_AUDIT);
+ } while_each_thread(g, p);
+ }
+ read_unlock_irqrestore(&tasklist_lock, flags);
+}
+
+void audit_dec_n_rules()
+{
+ struct task_struct *p, *g;
+ unsigned long flags;
+
+ read_lock_irqsave(&tasklist_lock, flags);
+
+ --audit_n_rules;
+ BUG_ON(audit_n_rules < 0);
+
+ if (audit_n_rules == 0) {
+ do_each_thread(g, p) {
+ clear_tsk_thread_flag(p, TIF_SYSCALL_AUDIT);
+ } while_each_thread(g, p);
+ }
+
+ read_unlock_irqrestore(&tasklist_lock, flags);
+}
+
/**
* audit_alloc - allocate an audit context block for a task
* @tsk: task
@@ -930,10 +969,8 @@ int audit_alloc(struct task_struct *tsk)
return 0; /* Return if not auditing. */

state = audit_filter_task(tsk, &key);
- if (state == AUDIT_DISABLED) {
- clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
+ if (state == AUDIT_DISABLED)
return 0;
- }

if (!(context = audit_alloc_context(state))) {
kfree(key);
@@ -943,10 +980,18 @@ int audit_alloc(struct task_struct *tsk)
context->filterkey = key;

tsk->audit_context = context;
- set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
return 0;
}

+void audit_sync_flags(struct task_struct *tsk)
+{
+ /* The caller has a write lock on tasklist_lock. */
+ if (audit_n_rules && tsk->audit_context)
+ set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
+ else
+ clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
+}
+
static inline void audit_free_context(struct audit_context *context)
{
audit_free_names(context);
diff --git a/kernel/fork.c b/kernel/fork.c
index dfa736c..3f28173 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1479,6 +1479,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,

total_forks++;
spin_unlock(&current->sighand->siglock);
+
+ audit_sync_flags(p);
+
write_unlock_irq(&tasklist_lock);
proc_fork_connector(p);
cgroup_post_fork(p);
--
1.8.5.3


2014-02-03 20:36:08

by Eric Paris

[permalink] [raw]
Subject: Re: [PATCH v2.1] audit: Only use the syscall slowpath when syscall audit rules exist

Hmmmm,

My problem with doing this has always actually been because of SELinux.
Knowing syscall information with AVCs can be a huge help running down
problems. We already make people load rules if they want to get
pathname type records, so maybe this is fine. Or we could make SELinux
take a reference on the number of audit rules, but that was your
particular use case. Not sure how I feel about losing syscall
information by default on AVCs...

Do we always have audit_context allocated? I need to look how the TIF
and audit_context are correlated.

For a completely seperate non-audit patch idea I've toyed with making
the arch/syscall_nr a0,a1,a2,a3 stored in task struct rather than audit
context. Would mean that recording that information on syscall entry
could be fast/easy and done quickly in syscall entry assembly code.
Then on entry we could track only if there are rules on the 'entry' list
and skip if none. On exit we could do the same only with exit rules.
Right now all 3 of those different things are tracked in
TIF_SYSCALL_AUDIT (As I recall the slow path is usually a lot of things
other than audit, but audit is what forces us onto the slow patch)

On Mon, 2014-02-03 at 11:11 -0800, Andy Lutomirski wrote:
> This toggles TIF_SYSCALL_AUDIT as needed when rules change instead of
> leaving it set whenever rules might be set in the future. This reduces
> syscall latency from >60ns to closer to 40ns on my laptop.
>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Steve Grubb <[email protected]>
> Cc: Eric Paris <[email protected]>
> Signed-off-by: Andy Lutomirski <[email protected]>
> ---
>
> This brown paper bag release is brought to you by git commit's -a flag.
>
> Changes from v2: Contains the correct patch
>
> Changes from v1:
> - For new tasks, set flags in a new audit_sync_flags callback instead of
> in audit_alloc (thanks, Oleg).
> - Rework locking.
> - Use irqsave/irqrestore to avoid having to think about who else might have
> taken spinlocks.
>
> include/linux/audit.h | 11 ++++++++--
> kernel/auditfilter.c | 4 ++--
> kernel/auditsc.c | 57 +++++++++++++++++++++++++++++++++++++++++++++------
> kernel/fork.c | 3 +++
> 4 files changed, 65 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index a406419..a81f498 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -298,7 +298,9 @@ static inline void audit_mmap_fd(int fd, int flags)
> __audit_mmap_fd(fd, flags);
> }
>
> -extern int audit_n_rules;
> +extern void audit_inc_n_rules(void);
> +extern void audit_dec_n_rules(void);
> +extern void audit_sync_flags(struct task_struct *tsk);
> extern int audit_signals;
> #else /* CONFIG_AUDITSYSCALL */
> static inline int audit_alloc(struct task_struct *task)
> @@ -404,7 +406,12 @@ static inline void audit_mmap_fd(int fd, int flags)
> { }
> static inline void audit_ptrace(struct task_struct *t)
> { }
> -#define audit_n_rules 0
> +static inline void audit_inc_n_rules(void)
> +{ }
> +static inline void audit_dec_n_rules(void)
> +{ }
> +static inline void audit_sync_flags(struct task_struct *tsk)
> +{ }
> #define audit_signals 0
> #endif /* CONFIG_AUDITSYSCALL */
>
> diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> index 51f3fd4..0ce531d 100644
> --- a/kernel/auditfilter.c
> +++ b/kernel/auditfilter.c
> @@ -903,7 +903,7 @@ static inline int audit_add_rule(struct audit_entry *entry)
> }
> #ifdef CONFIG_AUDITSYSCALL
> if (!dont_count)
> - audit_n_rules++;
> + audit_inc_n_rules();
>
> if (!audit_match_signal(entry))
> audit_signals++;
> @@ -955,7 +955,7 @@ static inline int audit_del_rule(struct audit_entry *entry)
>
> #ifdef CONFIG_AUDITSYSCALL
> if (!dont_count)
> - audit_n_rules--;
> + audit_dec_n_rules();
>
> if (!audit_match_signal(entry))
> audit_signals--;
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 90594c9..cd44c88 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -79,8 +79,13 @@
> /* no execve audit message should be longer than this (userspace limits) */
> #define MAX_EXECVE_AUDIT_LEN 7500
>
> -/* number of audit rules */
> -int audit_n_rules;
> +/*
> + * number of audit rules
> + *
> + * To change this, you must hold audit_filter_mutex *and* have a read lock
> + * on tasklist_lock.
> + */
> +static int audit_n_rules;
>
> /* determines whether we collect data for signals sent */
> int audit_signals;
> @@ -911,6 +916,40 @@ static inline struct audit_context *audit_alloc_context(enum audit_state state)
> return context;
> }
>
> +void audit_inc_n_rules()
> +{
> + struct task_struct *p, *g;
> + unsigned long flags;
> +
> + read_lock_irqsave(&tasklist_lock, flags);
> + if (audit_n_rules++ == 0) {
> + do_each_thread(g, p) {
> + if (p->audit_context)
> + set_tsk_thread_flag(p, TIF_SYSCALL_AUDIT);
> + } while_each_thread(g, p);
> + }
> + read_unlock_irqrestore(&tasklist_lock, flags);
> +}
> +
> +void audit_dec_n_rules()
> +{
> + struct task_struct *p, *g;
> + unsigned long flags;
> +
> + read_lock_irqsave(&tasklist_lock, flags);
> +
> + --audit_n_rules;
> + BUG_ON(audit_n_rules < 0);
> +
> + if (audit_n_rules == 0) {
> + do_each_thread(g, p) {
> + clear_tsk_thread_flag(p, TIF_SYSCALL_AUDIT);
> + } while_each_thread(g, p);
> + }
> +
> + read_unlock_irqrestore(&tasklist_lock, flags);
> +}
> +
> /**
> * audit_alloc - allocate an audit context block for a task
> * @tsk: task
> @@ -930,10 +969,8 @@ int audit_alloc(struct task_struct *tsk)
> return 0; /* Return if not auditing. */
>
> state = audit_filter_task(tsk, &key);
> - if (state == AUDIT_DISABLED) {
> - clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
> + if (state == AUDIT_DISABLED)
> return 0;
> - }
>
> if (!(context = audit_alloc_context(state))) {
> kfree(key);
> @@ -943,10 +980,18 @@ int audit_alloc(struct task_struct *tsk)
> context->filterkey = key;
>
> tsk->audit_context = context;
> - set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
> return 0;
> }
>
> +void audit_sync_flags(struct task_struct *tsk)
> +{
> + /* The caller has a write lock on tasklist_lock. */
> + if (audit_n_rules && tsk->audit_context)
> + set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
> + else
> + clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
> +}
> +
> static inline void audit_free_context(struct audit_context *context)
> {
> audit_free_names(context);
> diff --git a/kernel/fork.c b/kernel/fork.c
> index dfa736c..3f28173 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1479,6 +1479,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>
> total_forks++;
> spin_unlock(&current->sighand->siglock);
> +
> + audit_sync_flags(p);
> +
> write_unlock_irq(&tasklist_lock);
> proc_fork_connector(p);
> cgroup_post_fork(p);

2014-02-03 21:10:20

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v2.1] audit: Only use the syscall slowpath when syscall audit rules exist

On Mon, Feb 3, 2014 at 12:35 PM, Eric Paris <[email protected]> wrote:
> Hmmmm,
>
> My problem with doing this has always actually been because of SELinux.
> Knowing syscall information with AVCs can be a huge help running down
> problems. We already make people load rules if they want to get
> pathname type records, so maybe this is fine. Or we could make SELinux
> take a reference on the number of audit rules, but that was your
> particular use case. Not sure how I feel about losing syscall
> information by default on AVCs...
>

Hmm. I never noticed that feature. That'll show me :)

I would personally happily give that up for a 30-50% reduction in
systemwide syscall latency.

Looking at the 64-bit syscall entry code, it looks like the syscall nr
is always saved to pt_regs (as orig_rax) and the arguments are shoved
into the usual places in pt_regs. Have you ever tried using
syscall_get_nr and syscall_get_arguments from the audit code without
setting TIF_SYSCALL_AUDIT? I may be missing something here, but it
looks like it'll work.

> Do we always have audit_context allocated? I need to look how the TIF
> and audit_context are correlated.

In 3.13, TIF_SYSCALL_AUDIT is set iff audit_context is allocated. In
3.12 and below that was not the case due to a bug.

>
> For a completely seperate non-audit patch idea I've toyed with making
> the arch/syscall_nr a0,a1,a2,a3 stored in task struct rather than audit
> context. Would mean that recording that information on syscall entry
> could be fast/easy and done quickly in syscall entry assembly code.
> Then on entry we could track only if there are rules on the 'entry' list
> and skip if none. On exit we could do the same only with exit rules.
> Right now all 3 of those different things are tracked in
> TIF_SYSCALL_AUDIT (As I recall the slow path is usually a lot of things
> other than audit, but audit is what forces us onto the slow patch)

I think that you can already fish out the syscall args on x86_64 at
least. The attached awful patch appears to work, for example.

Test code:

#include <stdio.h>
#include <linux/prctl.h>

int main(int argc, char **argv)
{
if (argc != 5) {
printf("Usage: test_pr500 arg2 arg3 arg4 arg5\n");
return 1;
}

prctl(500, atoi(argv[1]), atoi(argv[2]), atoi(argv[3]), atoi(argv[4]));
return 0;
}

FWIW, I don't know *why* the syscall fast path does this, but it's
convenient for this use :)

Out of curiosity, why does the audit code ignore a4 and a5?

--Andy


Attachments:
awful_syscall_hack.patch (810.00 B)

2014-02-04 16:50:47

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2.1] audit: Only use the syscall slowpath when syscall audit rules exist

On 02/03, Andy Lutomirski wrote:
>
> +void audit_inc_n_rules()
> +{
> + struct task_struct *p, *g;
> + unsigned long flags;
> +
> + read_lock_irqsave(&tasklist_lock, flags);

Confused... read_lock(tasklist) doesn't need to disable irqs.

(ftrace does this for no reason too, perhaps I should resend the patch)

> + if (audit_n_rules++ == 0) {

probably this can be done outside of read_lock?

> + do_each_thread(g, p) {

for_each_process_thread ;) do_each_thread will die, I hope.

> +void audit_dec_n_rules()
> +{
> + struct task_struct *p, *g;
> + unsigned long flags;
> +
> + read_lock_irqsave(&tasklist_lock, flags);
> +
> + --audit_n_rules;
> + BUG_ON(audit_n_rules < 0);
> +
> + if (audit_n_rules == 0) {
> + do_each_thread(g, p) {
> + clear_tsk_thread_flag(p, TIF_SYSCALL_AUDIT);
> + } while_each_thread(g, p);
> + }

The same, and...

On a second thought it seems that audit_dec_n_rules() has a problem.
Note the BUG_ON(context->in_syscall) in __audit_syscall_entry().

Suppose that audit_dec_n_rules() clears TIF_SYSCALL_AUDIT when a task
runs a syscall. In this case (afaics) __audit_syscall_exit() won't be
called. The next audit_inc_n_rules() can set TIF_SYSCALL_AUDIT and
trigger another __audit_syscall_entry() which will hit this BUG_ON().

And in general it doesn't look safe although I know almost nothing
about audit. I mean, currently __audit_syscall_entry() or
__audit_log_bprm_fcaps() assume that __audit_syscall_exit() or
__audit_free() will "cleanup" ->audit_context, perhaps we should not
break the rules?

Once again, I do not pretend I understand this code, this is the
question, not the comment.

But if I am right, then TIF_SYSCALL_AUDIT should be cleared in
__audit_syscall_exit() as you suggested before.

Oleg.

2014-02-04 18:11:48

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v2.1] audit: Only use the syscall slowpath when syscall audit rules exist

On Tue, Feb 4, 2014 at 8:50 AM, Oleg Nesterov <[email protected]> wrote:
> On 02/03, Andy Lutomirski wrote:
>>
>> +void audit_inc_n_rules()
>> +{
>> + struct task_struct *p, *g;
>> + unsigned long flags;
>> +
>> + read_lock_irqsave(&tasklist_lock, flags);
>
> Confused... read_lock(tasklist) doesn't need to disable irqs.
>
> (ftrace does this for no reason too, perhaps I should resend the patch)

Is this because there are no interrupt handlers that write_lock(tasklist)?

>
>> + if (audit_n_rules++ == 0) {
>
> probably this can be done outside of read_lock?

I don't think so. I'm cheating and using the tasklist_lock to prevent
audit_sync_flags from racing against this increment, so this needs to
be inside the lock. I'll add a comment.

>
>> + do_each_thread(g, p) {
>
> for_each_process_thread ;) do_each_thread will die, I hope.
>

Sorry, forgot to mention: where is this mythical
for_each_process_thread? Either it only exists in a kernel version
I'm not looking at, my grepping skills aren't up to the task, or you
just hate do_each_thread so much that you imagined up an alternative
:)

>> +void audit_dec_n_rules()
>> +{
>> + struct task_struct *p, *g;
>> + unsigned long flags;
>> +
>> + read_lock_irqsave(&tasklist_lock, flags);
>> +
>> + --audit_n_rules;
>> + BUG_ON(audit_n_rules < 0);
>> +
>> + if (audit_n_rules == 0) {
>> + do_each_thread(g, p) {
>> + clear_tsk_thread_flag(p, TIF_SYSCALL_AUDIT);
>> + } while_each_thread(g, p);
>> + }
>
> The same, and...
>
> On a second thought it seems that audit_dec_n_rules() has a problem.
> Note the BUG_ON(context->in_syscall) in __audit_syscall_entry().
>
> Suppose that audit_dec_n_rules() clears TIF_SYSCALL_AUDIT when a task
> runs a syscall. In this case (afaics) __audit_syscall_exit() won't be
> called. The next audit_inc_n_rules() can set TIF_SYSCALL_AUDIT and
> trigger another __audit_syscall_entry() which will hit this BUG_ON().
>

Egads!

> And in general it doesn't look safe although I know almost nothing
> about audit. I mean, currently __audit_syscall_entry() or
> __audit_log_bprm_fcaps() assume that __audit_syscall_exit() or
> __audit_free() will "cleanup" ->audit_context, perhaps we should not
> break the rules?
>
> Once again, I do not pretend I understand this code, this is the
> question, not the comment.
>
> But if I am right, then TIF_SYSCALL_AUDIT should be cleared in
> __audit_syscall_exit() as you suggested before.

I think I'll wait for Eric to chime in. I suspect that the real
solution is to simplify all this stuff by relying on the fact that the
syscall nr and args are saved by the (fast path and slow path) entry
code, so the audit entry hook may be entirely unnecessary. Or maybe a
new TIF flag would be needed to make it work.

Anyway, *grumble*. My only real interest in this stuff is to get rid
of the overhead. I don't actually want syscall auditing on any of my
boxes (in contrast to avc auditing, which is useful but (mostly)
independent).

--Andy

2014-02-04 19:11:41

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2.1] audit: Only use the syscall slowpath when syscall audit rules exist

On 02/04, Andy Lutomirski wrote:
>
> On Tue, Feb 4, 2014 at 8:50 AM, Oleg Nesterov <[email protected]> wrote:
> > On 02/03, Andy Lutomirski wrote:
> >>
> >> +void audit_inc_n_rules()
> >> +{
> >> + struct task_struct *p, *g;
> >> + unsigned long flags;
> >> +
> >> + read_lock_irqsave(&tasklist_lock, flags);
> >
> > Confused... read_lock(tasklist) doesn't need to disable irqs.
> >
> > (ftrace does this for no reason too, perhaps I should resend the patch)
>
> Is this because there are no interrupt handlers that write_lock(tasklist)?

Yes.

> >
> >> + if (audit_n_rules++ == 0) {
> >
> > probably this can be done outside of read_lock?
>
> I don't think so. I'm cheating and using the tasklist_lock to prevent
> audit_sync_flags

Ah, yes, you are right.

> >> + do_each_thread(g, p) {
> >
> > for_each_process_thread ;) do_each_thread will die, I hope.
> >
>
> Sorry, forgot to mention: where is this mythical
> for_each_process_thread?

In Linus's tree, please see 0c740d0afc3bff.

> or you
> just hate do_each_thread so much that you imagined up an alternative
> :)

sort of ;)

> I think I'll wait for Eric to chime in. I suspect that the real
> solution is to simplify all this stuff by relying on the fact that the
> syscall nr and args are saved by the (fast path and slow path) entry
> code, so the audit entry hook may be entirely unnecessary.

Perhaps... but even in this case we need to do something with, say,
__audit_log_bprm_fcaps().

At least this list should not grow indefinitely if the task skips
__audit_syscall_exit(). Although at first glance this can be probably
cleanuped too.

Oleg.

2014-02-04 19:32:49

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v2.1] audit: Only use the syscall slowpath when syscall audit rules exist

On Tue, Feb 4, 2014 at 11:11 AM, Oleg Nesterov <[email protected]> wrote:
> On 02/04, Andy Lutomirski wrote:
>> On Tue, Feb 4, 2014 at 8:50 AM, Oleg Nesterov <[email protected]> wrote:
>> > On 02/03, Andy Lutomirski wrote:
>> Sorry, forgot to mention: where is this mythical
>> for_each_process_thread?
>
> In Linus's tree, please see 0c740d0afc3bff.
>
>> or you
>> just hate do_each_thread so much that you imagined up an alternative
>> :)
>
> sort of ;)

Aha -- it probably got merged just after I pulled Linus' tree and
looked for it. Or something.

>
>> I think I'll wait for Eric to chime in. I suspect that the real
>> solution is to simplify all this stuff by relying on the fact that the
>> syscall nr and args are saved by the (fast path and slow path) entry
>> code, so the audit entry hook may be entirely unnecessary.
>
> Perhaps... but even in this case we need to do something with, say,
> __audit_log_bprm_fcaps().
>
> At least this list should not grow indefinitely if the task skips
> __audit_syscall_exit(). Although at first glance this can be probably
> cleanuped too.

OK, here's a thought: let's change the semantics of TIF_SYSCALL_AUDIT.
New semantics:

TIF_SYSCALL_AUDIT is set if (the task is eligible for syscall auditing
and n_rules != 0 *or* something has started writing an audit record).
TIF_SYSCALL_AUDIT is *never* cleared by anything other than
copy_process or __audit_syscall_exit.

This means that, if there's a pending audit record, then
TIF_SYSCALL_AUDIT will be set. That, in turn, means that
__audit_syscall_exit will be called, which avoids the BUGs you pointed
out.

Now we get rid of __audit_syscall_entry. (This speeds up even the
auditing-is-on case.) Instead we have __audit_start_record, which
does more or less the same thing, except that (a) it doesn't BUG if
in_syscall and (b) it *sets* TIF_SYSCALL_AUDIT. This relies on the
fact that syscall_get_nr and syscall_get_arguments are reliable on
x86_64. I suspect that they're reliable everywhere else, too. The
idea is that there's nothing wrong with calling __audit_start_record
more than once. (Maybe it should be called
__audit_record_this_syscall.)

To finish the job, we change __audit_syscall_exit to clear
TIF_SYSCALL_AUDIT if n_rules==0. inc_n_rules can set
TIF_SYSCALL_AUDIT (without even needing to worry about races, I
think). dec_n_rules doesn't need to do anything special.

Benefits:
- Removing the syscall entry hook speeds everyone up. Even silly
people who use exit rules :)
- There's no need to think about mismatched entry/exit hook calls,
since there is no entry hook.
- The same mechanism could be reused for non-audit purposes. Any
code could say, at any point "hey, this syscall is interesting. let's
record it."

Disadvantages:
- Need to check other architectures to make sure that syscall_get_xyz
works reliably for fast path syscalls.
- For the full performance boost, all architectures need to avoid
checking TIF_SYSCALL_AUDIT in the entry path. I prefer not to mess
with non-x86 assembly, so I won't do that part, since it's not
required for correctness.

Eric, any thoughts?

--Andy

2014-02-05 13:46:53

by Eric Paris

[permalink] [raw]
Subject: Re: [PATCH v2.1] audit: Only use the syscall slowpath when syscall audit rules exist

On Mon, 2014-02-03 at 11:11 -0800, Andy Lutomirski wrote:

> +void audit_inc_n_rules()
> +{
> + struct task_struct *p, *g;
> + unsigned long flags;
> +
> + read_lock_irqsave(&tasklist_lock, flags);
> + if (audit_n_rules++ == 0) {

I know it's right, but it's too clever for me :) If we do end up
adding something like this Can we just do:
if (!audit_n_rules) {}
audit_n_rules++

I like dumb code :)

> + do_each_thread(g, p) {
> + if (p->audit_context)
> + set_tsk_thread_flag(p, TIF_SYSCALL_AUDIT);
> + } while_each_thread(g, p);
> + }
> + read_unlock_irqrestore(&tasklist_lock, flags);
> +}