2011-03-07 21:07:07

by Tony Jones

[permalink] [raw]
Subject: PATCH [1/1]: audit: acquire creds selectively to reduce atomic op overhead

Commit c69e8d9c01db added calls to get_task_cred and put_cred in
audit_filter_rules. Profiling with a large number of audit rules active on the
exit chain shows that we are spending upto 48% in this routine for syscall
intensive tests, most of which is in the atomic ops.

The following patch acquires the cred if a rule requires it. In our particular
case above, most rules had no cred requirement and this dropped the time spent
in audit_filter_rules down to ~12%. An alternative would be for the caller to
acquire the cred just once for the whole chain and pass into audit_filter_rules.
I can create an alternate patch doing this if required.

Signed-off-by: Tony Jones <[email protected]>
---

kernel/auditsc.c | 24 +++++++++++++++++++++---
1 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index f49a031..4a930a1 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -450,7 +450,7 @@ static int audit_filter_rules(struct task_struct *tsk,
struct audit_names *name,
enum audit_state *state)
{
- const struct cred *cred = get_task_cred(tsk);
+ const struct cred *cred=NULL;
int i, j, need_sid = 1;
u32 sid;

@@ -470,27 +470,43 @@ static int audit_filter_rules(struct task_struct *tsk,
}
break;
case AUDIT_UID:
+ if (!cred)
+ cred=get_task_cred(tsk);
result = audit_comparator(cred->uid, f->op, f->val);
break;
case AUDIT_EUID:
+ if (!cred)
+ cred=get_task_cred(tsk);
result = audit_comparator(cred->euid, f->op, f->val);
break;
case AUDIT_SUID:
+ if (!cred)
+ cred=get_task_cred(tsk);
result = audit_comparator(cred->suid, f->op, f->val);
break;
case AUDIT_FSUID:
+ if (!cred)
+ cred=get_task_cred(tsk);
result = audit_comparator(cred->fsuid, f->op, f->val);
break;
case AUDIT_GID:
+ if (!cred)
+ cred=get_task_cred(tsk);
result = audit_comparator(cred->gid, f->op, f->val);
break;
case AUDIT_EGID:
+ if (!cred)
+ cred=get_task_cred(tsk);
result = audit_comparator(cred->egid, f->op, f->val);
break;
case AUDIT_SGID:
+ if (!cred)
+ cred=get_task_cred(tsk);
result = audit_comparator(cred->sgid, f->op, f->val);
break;
case AUDIT_FSGID:
+ if (!cred)
+ cred=get_task_cred(tsk);
result = audit_comparator(cred->fsgid, f->op, f->val);
break;
case AUDIT_PERS:
@@ -638,7 +654,8 @@ static int audit_filter_rules(struct task_struct *tsk,
}

if (!result) {
- put_cred(cred);
+ if (cred)
+ put_cred(cred);
return 0;
}
}
@@ -656,7 +673,8 @@ static int audit_filter_rules(struct task_struct *tsk,
case AUDIT_NEVER: *state = AUDIT_DISABLED; break;
case AUDIT_ALWAYS: *state = AUDIT_RECORD_CONTEXT; break;
}
- put_cred(cred);
+ if (cred)
+ put_cred(cred);
return 1;
}


2011-03-08 18:03:07

by David Howells

[permalink] [raw]
Subject: Re: PATCH [1/1]: audit: acquire creds selectively to reduce atomic op overhead

Tony Jones <[email protected]> wrote:

> Commit c69e8d9c01db added calls to get_task_cred and put_cred in
> audit_filter_rules. Profiling with a large number of audit rules active on
> the exit chain shows that we are spending upto 48% in this routine for
> syscall intensive tests, most of which is in the atomic ops.
>
> The following patch acquires the cred if a rule requires it. In our
> particular case above, most rules had no cred requirement and this dropped
> the time spent in audit_filter_rules down to ~12%. An alternative would be
> for the caller to acquire the cred just once for the whole chain and pass
> into audit_filter_rules. I can create an alternate patch doing this if
> required.

There's no actual need to get a ref on the named task's creds.

If tsk == current, no locking is needed at all.

If tsk != current, the RCU read lock is sufficient. See task_cred_xxx() in
include/linux/cred.h.

Hmmm... I wonder... The audit filter uses tsk->real_cred, but is that
correct? Should it be using tsk->cred? And is tsk always going to be
current?

> + const struct cred *cred=NULL;

Binary operators like '=' should have a space on each side.

David

2011-03-10 20:25:38

by Tony Jones

[permalink] [raw]
Subject: Re: PATCH [1/1]: audit: acquire creds selectively to reduce atomic op overhead

On Tue, Mar 08, 2011 at 06:02:53PM +0000, David Howells wrote:
> Tony Jones <[email protected]> wrote:
>
> > Commit c69e8d9c01db added calls to get_task_cred and put_cred in
> > audit_filter_rules. Profiling with a large number of audit rules active on
> > the exit chain shows that we are spending upto 48% in this routine for
> > syscall intensive tests, most of which is in the atomic ops.
> >
> > The following patch acquires the cred if a rule requires it. In our
> > particular case above, most rules had no cred requirement and this dropped
> > the time spent in audit_filter_rules down to ~12%. An alternative would be
> > for the caller to acquire the cred just once for the whole chain and pass
> > into audit_filter_rules. I can create an alternate patch doing this if
> > required.
>
> There's no actual need to get a ref on the named task's creds.
>
> If tsk == current, no locking is needed at all.
>
> If tsk != current, the RCU read lock is sufficient. See task_cred_xxx() in
> include/linux/cred.h.
>
> Hmmm... I wonder... The audit filter uses tsk->real_cred, but is that
> correct? Should it be using tsk->cred? And is tsk always going to be
> current?

Hi David.

I'm not seeing the 'tsk->real_cred' usage, can you clarify?

I went through the call tree. Assuming my analysis is correct the only case
where it's not current is the calls from copy_process. I believe there is no
need for rcu in this case either.

Tony

------

audit_filter_rules
<- audit_filter_task
<- audit_alloc
<- copy_process

<- audit_filter_syscall
<- audit_get_context
<- audit_free
<- copy_process (error path)
<- do_exit (tsk == current)
<- audit_syscall_exit (tsk == current)

<- audit_syscall_entry (tsk == current)

<- audit_filter_inodes
<- audit_update_watch (tsk == current)
<- audit_get_context (see above)

2011-03-11 16:34:06

by David Howells

[permalink] [raw]
Subject: Re: PATCH [1/1]: audit: acquire creds selectively to reduce atomic op overhead

Tony Jones <[email protected]> wrote:

> I'm not seeing the 'tsk->real_cred' usage, can you clarify?

get_task_cred() and task_cred_xxxx() call __task_cred() which uses
tsk->real_cred. These are the real credentials of the process, and the ones
that are used when the process is being acted upon and the ones that are
visible through /proc.

However, if a task is acting upon something, task->cred is used instead. These
are not visible from the outside and may be overridden. current_cred_xxx()
uses these.

It's possible that the credentials being used in audit_filter_rules() are
incorrect under most circumstances and should be task->cred, not
task->real_cred.

David

2011-03-15 17:38:17

by Tony Jones

[permalink] [raw]
Subject: Re: PATCH [1/1]: audit: acquire creds selectively to reduce atomic op overhead

On Fri, Mar 11, 2011 at 04:33:52PM +0000, David Howells wrote:
> get_task_cred() and task_cred_xxxx() call __task_cred() which uses

Sorry, I thought you were referring to the remainder of auditsc.c

> It's possible that the credentials being used in audit_filter_rules() are
> incorrect under most circumstances and should be task->cred, not
> task->real_cred.

Agree. Also I believe it is safe to use tsk->cred directly as tsk == current
or tsk is being created by copy_process.

Eric, can you ACK/NACK?

The following patch corresponds to the above.

Tony
---


Commit c69e8d9c01db added calls to get_task_cred and put_cred in
audit_filter_rules. Profiling with a large number of audit rules active on the
exit chain shows that we are spending upto 48% in this routine for syscall
intensive tests, most of which is in the atomic ops.

The code should be accessing tsk->cred rather than tsk->real_cred. Also, since
tsk is current (or tsk is being created by copy_process) direct access to
tsk->cred is possible.

Signed-off-by: Tony Jones <[email protected]>
---

kernel/auditsc.c | 18 ++++++++++--------
1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index f49a031..750c08ef 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -441,16 +441,21 @@ static int match_tree_refs(struct audit_context *ctx, struct audit_tree *tree)
return 0;
}

-/* Determine if any context name data matches a rule's watch data */
-/* Compare a task_struct with an audit_rule. Return 1 on match, 0
- * otherwise. */
+/*
+ * Determine if any context name data matches a rule's watch data
+ * Compare a task_struct with an audit_rule. Return 1 on match, 0
+ * otherwise.
+ *
+ * Note: tsk==current or we are being indirectly called from copy_process()
+ * so direct access to tsk->cred is allowed.
+ */
static int audit_filter_rules(struct task_struct *tsk,
struct audit_krule *rule,
struct audit_context *ctx,
struct audit_names *name,
enum audit_state *state)
{
- const struct cred *cred = get_task_cred(tsk);
+ const struct cred *cred = tsk->cred;
int i, j, need_sid = 1;
u32 sid;

@@ -637,10 +642,8 @@ static int audit_filter_rules(struct task_struct *tsk,
break;
}

- if (!result) {
- put_cred(cred);
+ if (!result)
return 0;
- }
}

if (ctx) {
@@ -656,7 +659,6 @@ static int audit_filter_rules(struct task_struct *tsk,
case AUDIT_NEVER: *state = AUDIT_DISABLED; break;
case AUDIT_ALWAYS: *state = AUDIT_RECORD_CONTEXT; break;
}
- put_cred(cred);
return 1;
}

2011-03-15 17:44:48

by Eric Paris

[permalink] [raw]
Subject: Re: PATCH [1/1]: audit: acquire creds selectively to reduce atomic op overhead

On Tue, 2011-03-15 at 10:38 -0700, Tony Jones wrote:
> On Fri, Mar 11, 2011 at 04:33:52PM +0000, David Howells wrote:
> > get_task_cred() and task_cred_xxxx() call __task_cred() which uses
>
> Sorry, I thought you were referring to the remainder of auditsc.c
>
> > It's possible that the credentials being used in audit_filter_rules() are
> > incorrect under most circumstances and should be task->cred, not
> > task->real_cred.
>
> Agree. Also I believe it is safe to use tsk->cred directly as tsk == current
> or tsk is being created by copy_process.

Is there any way we can enforce this condition in the code?
WARN_ON(cred != current->cred && cred->refcnt != 1) or something like
that (I just made that conditional up off the top of my head, but it
explains what I'm talking about)? Who knows what somebody else might do
with this code someday.....

> Eric, can you ACK/NACK?
>
> The following patch corresponds to the above.
>
> Tony
> ---
>
>
> Commit c69e8d9c01db added calls to get_task_cred and put_cred in
> audit_filter_rules. Profiling with a large number of audit rules active on the
> exit chain shows that we are spending upto 48% in this routine for syscall
> intensive tests, most of which is in the atomic ops.
>
> The code should be accessing tsk->cred rather than tsk->real_cred. Also, since
> tsk is current (or tsk is being created by copy_process) direct access to
> tsk->cred is possible.
>
> Signed-off-by: Tony Jones <[email protected]>

Acked-by: Eric Paris <[email protected]>

> ---
>
> kernel/auditsc.c | 18 ++++++++++--------
> 1 files changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index f49a031..750c08ef 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -441,16 +441,21 @@ static int match_tree_refs(struct audit_context *ctx, struct audit_tree *tree)
> return 0;
> }
>
> -/* Determine if any context name data matches a rule's watch data */
> -/* Compare a task_struct with an audit_rule. Return 1 on match, 0
> - * otherwise. */
> +/*
> + * Determine if any context name data matches a rule's watch data
> + * Compare a task_struct with an audit_rule. Return 1 on match, 0
> + * otherwise.
> + *
> + * Note: tsk==current or we are being indirectly called from copy_process()
> + * so direct access to tsk->cred is allowed.
> + */
> static int audit_filter_rules(struct task_struct *tsk,
> struct audit_krule *rule,
> struct audit_context *ctx,
> struct audit_names *name,
> enum audit_state *state)
> {
> - const struct cred *cred = get_task_cred(tsk);
> + const struct cred *cred = tsk->cred;
> int i, j, need_sid = 1;
> u32 sid;
>
> @@ -637,10 +642,8 @@ static int audit_filter_rules(struct task_struct *tsk,
> break;
> }
>
> - if (!result) {
> - put_cred(cred);
> + if (!result)
> return 0;
> - }
> }
>
> if (ctx) {
> @@ -656,7 +659,6 @@ static int audit_filter_rules(struct task_struct *tsk,
> case AUDIT_NEVER: *state = AUDIT_DISABLED; break;
> case AUDIT_ALWAYS: *state = AUDIT_RECORD_CONTEXT; break;
> }
> - put_cred(cred);
> return 1;
> }
>

2011-03-15 20:05:04

by David Howells

[permalink] [raw]
Subject: Re: PATCH [1/1]: audit: acquire creds selectively to reduce atomic op overhead

Tony Jones <[email protected]> wrote:

> Agree. Also I believe it is safe to use tsk->cred directly as tsk == current
> or tsk is being created by copy_process.

You can't quite access it like that without sparse throwing a warning. The
pointer is marked with an __rcu attribute, so you need to use something like
this:

cred = rcu_dereference_check(tsk->cred, (tsk == current ||
called_from_copy_process());

David

2011-03-15 20:11:28

by David Howells

[permalink] [raw]
Subject: Re: PATCH [1/1]: audit: acquire creds selectively to reduce atomic op overhead

Eric Paris <[email protected]> wrote:

> WARN_ON(cred != current->cred && cred->refcnt != 1)

'tsk->parent == current' perhaps? Or audit_alloc() could pass a flag
indicating the state, or just look to see if tsk->audit_context is still NULL.

David

2011-03-17 18:12:08

by Tony Jones

[permalink] [raw]
Subject: Re: PATCH [1/1]: audit: acquire creds selectively to reduce atomic op overhead

On Tue, Mar 15, 2011 at 08:11:17PM +0000, David Howells wrote:
> Eric Paris <[email protected]> wrote:
>
> > WARN_ON(cred != current->cred && cred->refcnt != 1)
>
> 'tsk->parent == current' perhaps? Or audit_alloc() could pass a flag
> indicating the state, or just look to see if tsk->audit_context is still NULL.
>
> David

Round 3. tsk->parent == current isn't an option as it's not set by
copy_process until after audit_alloc. I used a flag to provide an explicit
indication. I didn't have audit_alloc pass the flag into audit_filter_task
as there is already a explicit "process creation time" comment for this static
function. If you want it pushed all the way upto audit_alloc, let me know and
I'll revise.

Oddly sparse didn't throw any warnings about the direct use of tsk->cred.

tony
---

Commit c69e8d9c01db added calls to get_task_cred and put_cred in
audit_filter_rules. Profiling with a large number of audit rules active on the
exit chain shows that we are spending upto 48% in this routine for syscall
intensive tests, most of which is in the atomic ops.

1. The code should be accessing tsk->cred rather than tsk->real_cred.
2. Since tsk is current (or tsk is being created by copy_process) access to
tsk->cred without rcu read lock is possible. At the request of the audit
maintainer, a new flag has been added to audit_filter_rules in order to make
this explicit and guide future code.

Signed-off-by: Tony Jones <[email protected]>
---
kernel/auditsc.c | 27 +++++++++++++++++----------
1 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index f49a031..281dcf1 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -443,17 +443,25 @@ static int match_tree_refs(struct audit_context *ctx, struct audit_tree *tree)

/* Determine if any context name data matches a rule's watch data */
/* Compare a task_struct with an audit_rule. Return 1 on match, 0
- * otherwise. */
+ * otherwise.
+ *
+ * If task_creation is true, this is an explicit indication that we are
+ * filtering a task rule at task creation time. This and tsk == current are
+ * the only situations where tsk->cred may be accessed without an rcu read lock.
+ */
static int audit_filter_rules(struct task_struct *tsk,
struct audit_krule *rule,
struct audit_context *ctx,
struct audit_names *name,
- enum audit_state *state)
+ enum audit_state *state,
+ bool task_creation)
{
- const struct cred *cred = get_task_cred(tsk);
+ const struct cred *cred;
int i, j, need_sid = 1;
u32 sid;

+ cred = rcu_dereference_check(tsk->cred, tsk == current || task_creation);
+
for (i = 0; i < rule->field_count; i++) {
struct audit_field *f = &rule->fields[i];
int result = 0;
@@ -637,10 +645,8 @@ static int audit_filter_rules(struct task_struct *tsk,
break;
}

- if (!result) {
- put_cred(cred);
+ if (!result)
return 0;
- }
}

if (ctx) {
@@ -656,7 +662,6 @@ static int audit_filter_rules(struct task_struct *tsk,
case AUDIT_NEVER: *state = AUDIT_DISABLED; break;
case AUDIT_ALWAYS: *state = AUDIT_RECORD_CONTEXT; break;
}
- put_cred(cred);
return 1;
}

@@ -671,7 +676,8 @@ static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)

rcu_read_lock();
list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_TASK], list) {
- if (audit_filter_rules(tsk, &e->rule, NULL, NULL, &state)) {
+ if (audit_filter_rules(tsk, &e->rule, NULL, NULL,
+ &state, true)) {
if (state == AUDIT_RECORD_CONTEXT)
*key = kstrdup(e->rule.filterkey, GFP_ATOMIC);
rcu_read_unlock();
@@ -705,7 +711,7 @@ static enum audit_state audit_filter_syscall(struct task_struct *tsk,
list_for_each_entry_rcu(e, list, list) {
if ((e->rule.mask[word] & bit) == bit &&
audit_filter_rules(tsk, &e->rule, ctx, NULL,
- &state)) {
+ &state, false)) {
rcu_read_unlock();
ctx->current_state = state;
return state;
@@ -743,7 +749,8 @@ void audit_filter_inodes(struct task_struct *tsk, struct audit_context *ctx)

list_for_each_entry_rcu(e, list, list) {
if ((e->rule.mask[word] & bit) == bit &&
- audit_filter_rules(tsk, &e->rule, ctx, n, &state)) {
+ audit_filter_rules(tsk, &e->rule, ctx, n,
+ &state, false)) {
rcu_read_unlock();
ctx->current_state = state;
return;

2011-03-21 13:57:48

by Eric Paris

[permalink] [raw]
Subject: Re: PATCH [1/1]: audit: acquire creds selectively to reduce atomic op overhead

On Thu, 2011-03-17 at 11:11 -0700, Tony Jones wrote:
> On Tue, Mar 15, 2011 at 08:11:17PM +0000, David Howells wrote:
> > Eric Paris <[email protected]> wrote:
> >
> > > WARN_ON(cred != current->cred && cred->refcnt != 1)
> >
> > 'tsk->parent == current' perhaps? Or audit_alloc() could pass a flag
> > indicating the state, or just look to see if tsk->audit_context is still NULL.
> >
> > David
>
> Round 3. tsk->parent == current isn't an option as it's not set by
> copy_process until after audit_alloc. I used a flag to provide an explicit
> indication. I didn't have audit_alloc pass the flag into audit_filter_task
> as there is already a explicit "process creation time" comment for this static
> function. If you want it pushed all the way upto audit_alloc, let me know and
> I'll revise.
>
> Oddly sparse didn't throw any warnings about the direct use of tsk->cred.
>
> tony
> ---
>
> Commit c69e8d9c01db added calls to get_task_cred and put_cred in
> audit_filter_rules. Profiling with a large number of audit rules active on the
> exit chain shows that we are spending upto 48% in this routine for syscall
> intensive tests, most of which is in the atomic ops.
>
> 1. The code should be accessing tsk->cred rather than tsk->real_cred.
> 2. Since tsk is current (or tsk is being created by copy_process) access to
> tsk->cred without rcu read lock is possible. At the request of the audit
> maintainer, a new flag has been added to audit_filter_rules in order to make
> this explicit and guide future code.
>
> Signed-off-by: Tony Jones <[email protected]>

Acked-by: Eric Paris <[email protected]>

> ---
> kernel/auditsc.c | 27 +++++++++++++++++----------
> 1 files changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index f49a031..281dcf1 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -443,17 +443,25 @@ static int match_tree_refs(struct audit_context *ctx, struct audit_tree *tree)
>
> /* Determine if any context name data matches a rule's watch data */
> /* Compare a task_struct with an audit_rule. Return 1 on match, 0
> - * otherwise. */
> + * otherwise.
> + *
> + * If task_creation is true, this is an explicit indication that we are
> + * filtering a task rule at task creation time. This and tsk == current are
> + * the only situations where tsk->cred may be accessed without an rcu read lock.
> + */
> static int audit_filter_rules(struct task_struct *tsk,
> struct audit_krule *rule,
> struct audit_context *ctx,
> struct audit_names *name,
> - enum audit_state *state)
> + enum audit_state *state,
> + bool task_creation)
> {
> - const struct cred *cred = get_task_cred(tsk);
> + const struct cred *cred;
> int i, j, need_sid = 1;
> u32 sid;
>
> + cred = rcu_dereference_check(tsk->cred, tsk == current || task_creation);
> +
> for (i = 0; i < rule->field_count; i++) {
> struct audit_field *f = &rule->fields[i];
> int result = 0;
> @@ -637,10 +645,8 @@ static int audit_filter_rules(struct task_struct *tsk,
> break;
> }
>
> - if (!result) {
> - put_cred(cred);
> + if (!result)
> return 0;
> - }
> }
>
> if (ctx) {
> @@ -656,7 +662,6 @@ static int audit_filter_rules(struct task_struct *tsk,
> case AUDIT_NEVER: *state = AUDIT_DISABLED; break;
> case AUDIT_ALWAYS: *state = AUDIT_RECORD_CONTEXT; break;
> }
> - put_cred(cred);
> return 1;
> }
>
> @@ -671,7 +676,8 @@ static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
>
> rcu_read_lock();
> list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_TASK], list) {
> - if (audit_filter_rules(tsk, &e->rule, NULL, NULL, &state)) {
> + if (audit_filter_rules(tsk, &e->rule, NULL, NULL,
> + &state, true)) {
> if (state == AUDIT_RECORD_CONTEXT)
> *key = kstrdup(e->rule.filterkey, GFP_ATOMIC);
> rcu_read_unlock();
> @@ -705,7 +711,7 @@ static enum audit_state audit_filter_syscall(struct task_struct *tsk,
> list_for_each_entry_rcu(e, list, list) {
> if ((e->rule.mask[word] & bit) == bit &&
> audit_filter_rules(tsk, &e->rule, ctx, NULL,
> - &state)) {
> + &state, false)) {
> rcu_read_unlock();
> ctx->current_state = state;
> return state;
> @@ -743,7 +749,8 @@ void audit_filter_inodes(struct task_struct *tsk, struct audit_context *ctx)
>
> list_for_each_entry_rcu(e, list, list) {
> if ((e->rule.mask[word] & bit) == bit &&
> - audit_filter_rules(tsk, &e->rule, ctx, n, &state)) {
> + audit_filter_rules(tsk, &e->rule, ctx, n,
> + &state, false)) {
> rcu_read_unlock();
> ctx->current_state = state;
> return;

2011-04-27 13:12:29

by Jiri Kosina

[permalink] [raw]
Subject: Re: PATCH [1/1]: audit: acquire creds selectively to reduce atomic op overhead

On Mon, 21 Mar 2011, Eric Paris wrote:

> > Commit c69e8d9c01db added calls to get_task_cred and put_cred in
> > audit_filter_rules. Profiling with a large number of audit rules active on the
> > exit chain shows that we are spending upto 48% in this routine for syscall
> > intensive tests, most of which is in the atomic ops.
> >
> > 1. The code should be accessing tsk->cred rather than tsk->real_cred.
> > 2. Since tsk is current (or tsk is being created by copy_process) access to
> > tsk->cred without rcu read lock is possible. At the request of the audit
> > maintainer, a new flag has been added to audit_filter_rules in order to make
> > this explicit and guide future code.
> >
> > Signed-off-by: Tony Jones <[email protected]>
>
> Acked-by: Eric Paris <[email protected]>

I don't see the patch in linux-next as of today. As it has been acked by
subsystem maintainer, I have picked it up into my tree ("retransmission
mode").

If anyone has any objections, please let me know. Thanks.

>
> > ---
> > kernel/auditsc.c | 27 +++++++++++++++++----------
> > 1 files changed, 17 insertions(+), 10 deletions(-)
> >
> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index f49a031..281dcf1 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -443,17 +443,25 @@ static int match_tree_refs(struct audit_context *ctx, struct audit_tree *tree)
> >
> > /* Determine if any context name data matches a rule's watch data */
> > /* Compare a task_struct with an audit_rule. Return 1 on match, 0
> > - * otherwise. */
> > + * otherwise.
> > + *
> > + * If task_creation is true, this is an explicit indication that we are
> > + * filtering a task rule at task creation time. This and tsk == current are
> > + * the only situations where tsk->cred may be accessed without an rcu read lock.
> > + */
> > static int audit_filter_rules(struct task_struct *tsk,
> > struct audit_krule *rule,
> > struct audit_context *ctx,
> > struct audit_names *name,
> > - enum audit_state *state)
> > + enum audit_state *state,
> > + bool task_creation)
> > {
> > - const struct cred *cred = get_task_cred(tsk);
> > + const struct cred *cred;
> > int i, j, need_sid = 1;
> > u32 sid;
> >
> > + cred = rcu_dereference_check(tsk->cred, tsk == current || task_creation);
> > +
> > for (i = 0; i < rule->field_count; i++) {
> > struct audit_field *f = &rule->fields[i];
> > int result = 0;
> > @@ -637,10 +645,8 @@ static int audit_filter_rules(struct task_struct *tsk,
> > break;
> > }
> >
> > - if (!result) {
> > - put_cred(cred);
> > + if (!result)
> > return 0;
> > - }
> > }
> >
> > if (ctx) {
> > @@ -656,7 +662,6 @@ static int audit_filter_rules(struct task_struct *tsk,
> > case AUDIT_NEVER: *state = AUDIT_DISABLED; break;
> > case AUDIT_ALWAYS: *state = AUDIT_RECORD_CONTEXT; break;
> > }
> > - put_cred(cred);
> > return 1;
> > }
> >
> > @@ -671,7 +676,8 @@ static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
> >
> > rcu_read_lock();
> > list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_TASK], list) {
> > - if (audit_filter_rules(tsk, &e->rule, NULL, NULL, &state)) {
> > + if (audit_filter_rules(tsk, &e->rule, NULL, NULL,
> > + &state, true)) {
> > if (state == AUDIT_RECORD_CONTEXT)
> > *key = kstrdup(e->rule.filterkey, GFP_ATOMIC);
> > rcu_read_unlock();
> > @@ -705,7 +711,7 @@ static enum audit_state audit_filter_syscall(struct task_struct *tsk,
> > list_for_each_entry_rcu(e, list, list) {
> > if ((e->rule.mask[word] & bit) == bit &&
> > audit_filter_rules(tsk, &e->rule, ctx, NULL,
> > - &state)) {
> > + &state, false)) {
> > rcu_read_unlock();
> > ctx->current_state = state;
> > return state;
> > @@ -743,7 +749,8 @@ void audit_filter_inodes(struct task_struct *tsk, struct audit_context *ctx)
> >
> > list_for_each_entry_rcu(e, list, list) {
> > if ((e->rule.mask[word] & bit) == bit &&
> > - audit_filter_rules(tsk, &e->rule, ctx, n, &state)) {
> > + audit_filter_rules(tsk, &e->rule, ctx, n,
> > + &state, false)) {
> > rcu_read_unlock();
> > ctx->current_state = state;
> > return;

--
Jiri Kosina
SUSE Labs, Novell Inc.

2011-04-27 16:27:05

by Tony Jones

[permalink] [raw]
Subject: Re: PATCH [1/1]: audit: acquire creds selectively to reduce atomic op overhead

On Wed, Apr 27, 2011 at 03:12:23PM +0200, Jiri Kosina wrote:
> I don't see the patch in linux-next as of today. As it has been acked by
> subsystem maintainer, I have picked it up into my tree ("retransmission
> mode").
>
> If anyone has any objections, please let me know. Thanks.

I spoke to Eric about this 10 days ago and he was going to create an audit
tree for the next window.

Tony