2022-09-27 23:10:32

by Ankur Arora

[permalink] [raw]
Subject: [PATCH 2/3] audit: annotate branch direction for audit_in_mask()

With sane audit rules, audit logging would only be triggered
infrequently. Keeping this in mind, annotate audit_in_mask() as
unlikely() to allow the compiler to pessimize the call to
audit_filter_rules().

This allows GCC to invert the branch direction for the audit_filter_rules()
basic block in this loop:

list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_EXIT], list) {
if (audit_in_mask(&e->rule, major) &&
audit_filter_rules(tsk, &e->rule, ctx, NULL,
&state, false)) {
...

such that it executes the common case in a straight line fashion.

On a Skylakex system change in getpid() latency (all results
aggregated across 12 boot cycles):

Min Mean Median Max pstdev
(ns) (ns) (ns) (ns)

- 196.63 207.86 206.60 230.98 (+- 3.92%)
+ 173.11 182.51 179.65 202.09 (+- 4.34%)

Performance counter stats for 'bin/getpid' (3 runs) go from:
cycles 805.58 ( +- 4.11% )
instructions 1654.11 ( +- .05% )
IPC 2.06 ( +- 3.39% )
branches 430.02 ( +- .05% )
branch-misses 1.55 ( +- 7.09% )
L1-dcache-loads 440.01 ( +- .09% )
L1-dcache-load-misses 9.05 ( +- 74.03% )

to:
cycles 706.13 ( +- 4.13% )
instructions 1654.70 ( +- .06% )
IPC 2.35 ( +- 4.25% )
branches 430.99 ( +- .06% )
branch-misses 0.50 ( +- 2.00% )
L1-dcache-loads 440.02 ( +- .07% )
L1-dcache-load-misses 5.22 ( +- 82.75% )

(Both aggregated over 12 boot cycles.)

cycles: performance improves on average by ~100 cycles/call. IPC
improves commensurately. Two reasons for this improvement:

* one fewer branch mispred: no obvious reason for this
branch-miss reduction. There is no significant change in
basic-block structure (apart from the branch inversion.)

* the direction of the branch for the call is now inverted, so it
chooses the not-taken direction more often. The issue-latency
for not-taken branches is often cheaper.

Signed-off-by: Ankur Arora <[email protected]>
---
kernel/auditsc.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 533b087c3c02..bf26f47b5226 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -789,7 +789,7 @@ static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
return AUDIT_STATE_BUILD;
}

-static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
+static bool audit_in_mask(const struct audit_krule *rule, unsigned long val)
{
int word, bit;

@@ -850,12 +850,13 @@ static void audit_filter_syscall(struct task_struct *tsk,

rcu_read_lock();
list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_EXIT], list) {
- if (audit_in_mask(&e->rule, major) &&
- audit_filter_rules(tsk, &e->rule, ctx, NULL,
- &state, false)) {
- rcu_read_unlock();
- ctx->current_state = state;
- return;
+ if (unlikely(audit_in_mask(&e->rule, major))) {
+ if (audit_filter_rules(tsk, &e->rule, ctx, NULL,
+ &state, false)) {
+ rcu_read_unlock();
+ ctx->current_state = state;
+ return;
+ }
}
}
rcu_read_unlock();
--
2.31.1


2022-09-28 22:56:45

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH 2/3] audit: annotate branch direction for audit_in_mask()

On Tue, Sep 27, 2022 at 7:00 PM Ankur Arora <[email protected]> wrote:
>
> With sane audit rules, audit logging would only be triggered
> infrequently. Keeping this in mind, annotate audit_in_mask() as
> unlikely() to allow the compiler to pessimize the call to
> audit_filter_rules().
>
> This allows GCC to invert the branch direction for the audit_filter_rules()
> basic block in this loop:
>
> list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_EXIT], list) {
> if (audit_in_mask(&e->rule, major) &&
> audit_filter_rules(tsk, &e->rule, ctx, NULL,
> &state, false)) {
> ...
>
> such that it executes the common case in a straight line fashion.
>
> On a Skylakex system change in getpid() latency (all results
> aggregated across 12 boot cycles):
>
> Min Mean Median Max pstdev
> (ns) (ns) (ns) (ns)
>
> - 196.63 207.86 206.60 230.98 (+- 3.92%)
> + 173.11 182.51 179.65 202.09 (+- 4.34%)
>
> Performance counter stats for 'bin/getpid' (3 runs) go from:
> cycles 805.58 ( +- 4.11% )
> instructions 1654.11 ( +- .05% )
> IPC 2.06 ( +- 3.39% )
> branches 430.02 ( +- .05% )
> branch-misses 1.55 ( +- 7.09% )
> L1-dcache-loads 440.01 ( +- .09% )
> L1-dcache-load-misses 9.05 ( +- 74.03% )
>
> to:
> cycles 706.13 ( +- 4.13% )
> instructions 1654.70 ( +- .06% )
> IPC 2.35 ( +- 4.25% )
> branches 430.99 ( +- .06% )
> branch-misses 0.50 ( +- 2.00% )
> L1-dcache-loads 440.02 ( +- .07% )
> L1-dcache-load-misses 5.22 ( +- 82.75% )
>
> (Both aggregated over 12 boot cycles.)
>
> cycles: performance improves on average by ~100 cycles/call. IPC
> improves commensurately. Two reasons for this improvement:
>
> * one fewer branch mispred: no obvious reason for this
> branch-miss reduction. There is no significant change in
> basic-block structure (apart from the branch inversion.)
>
> * the direction of the branch for the call is now inverted, so it
> chooses the not-taken direction more often. The issue-latency
> for not-taken branches is often cheaper.
>
> Signed-off-by: Ankur Arora <[email protected]>
> ---
> kernel/auditsc.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)

I generally dislike merging likely()/unlikely() additions to code
paths that can have varying levels of performance depending on runtime
configuration. While I appreciate the work you are doing to improve
audit performance, I don't think this is something I want to merge,
I'm sorry.

> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 533b087c3c02..bf26f47b5226 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -789,7 +789,7 @@ static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
> return AUDIT_STATE_BUILD;
> }
>
> -static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
> +static bool audit_in_mask(const struct audit_krule *rule, unsigned long val)
> {
> int word, bit;
>
> @@ -850,12 +850,13 @@ static void audit_filter_syscall(struct task_struct *tsk,
>
> rcu_read_lock();
> list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_EXIT], list) {
> - if (audit_in_mask(&e->rule, major) &&
> - audit_filter_rules(tsk, &e->rule, ctx, NULL,
> - &state, false)) {
> - rcu_read_unlock();
> - ctx->current_state = state;
> - return;
> + if (unlikely(audit_in_mask(&e->rule, major))) {
> + if (audit_filter_rules(tsk, &e->rule, ctx, NULL,
> + &state, false)) {
> + rcu_read_unlock();
> + ctx->current_state = state;
> + return;
> + }
> }
> }
> rcu_read_unlock();
> --
> 2.31.1

--
paul-moore.com

2022-09-29 20:22:16

by Ankur Arora

[permalink] [raw]
Subject: Re: [PATCH 2/3] audit: annotate branch direction for audit_in_mask()


Paul Moore <[email protected]> writes:

> On Tue, Sep 27, 2022 at 7:00 PM Ankur Arora <[email protected]> wrote:
>>
>> With sane audit rules, audit logging would only be triggered
>> infrequently. Keeping this in mind, annotate audit_in_mask() as
>> unlikely() to allow the compiler to pessimize the call to
>> audit_filter_rules().
>>
>> This allows GCC to invert the branch direction for the audit_filter_rules()
>> basic block in this loop:
>>
>> list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_EXIT], list) {
>> if (audit_in_mask(&e->rule, major) &&
>> audit_filter_rules(tsk, &e->rule, ctx, NULL,
>> &state, false)) {
>> ...
>>
>> such that it executes the common case in a straight line fashion.
>>
>> On a Skylakex system change in getpid() latency (all results
>> aggregated across 12 boot cycles):
>>
>> Min Mean Median Max pstdev
>> (ns) (ns) (ns) (ns)
>>
>> - 196.63 207.86 206.60 230.98 (+- 3.92%)
>> + 173.11 182.51 179.65 202.09 (+- 4.34%)
>>
>> Performance counter stats for 'bin/getpid' (3 runs) go from:
>> cycles 805.58 ( +- 4.11% )
>> instructions 1654.11 ( +- .05% )
>> IPC 2.06 ( +- 3.39% )
>> branches 430.02 ( +- .05% )
>> branch-misses 1.55 ( +- 7.09% )
>> L1-dcache-loads 440.01 ( +- .09% )
>> L1-dcache-load-misses 9.05 ( +- 74.03% )
>>
>> to:
>> cycles 706.13 ( +- 4.13% )
>> instructions 1654.70 ( +- .06% )
>> IPC 2.35 ( +- 4.25% )
>> branches 430.99 ( +- .06% )
>> branch-misses 0.50 ( +- 2.00% )
>> L1-dcache-loads 440.02 ( +- .07% )
>> L1-dcache-load-misses 5.22 ( +- 82.75% )
>>
>> (Both aggregated over 12 boot cycles.)
>>
>> cycles: performance improves on average by ~100 cycles/call. IPC
>> improves commensurately. Two reasons for this improvement:
>>
>> * one fewer branch mispred: no obvious reason for this
>> branch-miss reduction. There is no significant change in
>> basic-block structure (apart from the branch inversion.)
>>
>> * the direction of the branch for the call is now inverted, so it
>> chooses the not-taken direction more often. The issue-latency
>> for not-taken branches is often cheaper.
>>
>> Signed-off-by: Ankur Arora <[email protected]>
>> ---
>> kernel/auditsc.c | 15 ++++++++-------
>> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> I generally dislike merging likely()/unlikely() additions to code
> paths that can have varying levels of performance depending on runtime
> configuration.

I think that's fair, and in this particular case the benchmark is quite
contrived.

But, just to elaborate a bit more on why that unlikely() clause made
sense to me: it seems to me that audit typically would be triggered for
control syscalls and the ratio between control and non-control ones
would be fairly lopsided.

Let me see if I can rewrite the conditional in a different way to get a
similar effect but I suspect that might be even more compiler dependent.

Also, let me run the audit-testsuite this time. Is there a good test
there that you would recommend that might serve as a more representative
workload?


Thanks
Ankur

> While I appreciate the work you are doing to improve
> audit performance, I don't think this is something I want to merge,
> I'm sorry.



>
>> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
>> index 533b087c3c02..bf26f47b5226 100644
>> --- a/kernel/auditsc.c
>> +++ b/kernel/auditsc.c
>> @@ -789,7 +789,7 @@ static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
>> return AUDIT_STATE_BUILD;
>> }
>>
>> -static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
>> +static bool audit_in_mask(const struct audit_krule *rule, unsigned long val)
>> {
>> int word, bit;
>>
>> @@ -850,12 +850,13 @@ static void audit_filter_syscall(struct task_struct *tsk,
>>
>> rcu_read_lock();
>> list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_EXIT], list) {
>> - if (audit_in_mask(&e->rule, major) &&
>> - audit_filter_rules(tsk, &e->rule, ctx, NULL,
>> - &state, false)) {
>> - rcu_read_unlock();
>> - ctx->current_state = state;
>> - return;
>> + if (unlikely(audit_in_mask(&e->rule, major))) {
>> + if (audit_filter_rules(tsk, &e->rule, ctx, NULL,
>> + &state, false)) {
>> + rcu_read_unlock();
>> + ctx->current_state = state;
>> + return;
>> + }
>> }
>> }
>> rcu_read_unlock();
>> --
>> 2.31.1


--
ankur

2022-09-30 18:56:15

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH 2/3] audit: annotate branch direction for audit_in_mask()

On Thu, Sep 29, 2022 at 4:20 PM Ankur Arora <[email protected]> wrote:
> Paul Moore <[email protected]> writes:
> > I generally dislike merging likely()/unlikely() additions to code
> > paths that can have varying levels of performance depending on runtime
> > configuration.
>
> I think that's fair, and in this particular case the benchmark is quite
> contrived.
>
> But, just to elaborate a bit more on why that unlikely() clause made
> sense to me: it seems to me that audit typically would be triggered for
> control syscalls and the ratio between control and non-control ones
> would be fairly lopsided.

I understand, and there is definitely some precedence in the audit
code for using likely()/unlikely() in a manner similar as you
described, but I'll refer to my previous comments - it's not something
I like. As a general rule, aside from the unlikely() calls in the
audit wrappers present in include/linux/audit.h I would suggest not
adding any new likely()/unlikely() calls.

> Let me see if I can rewrite the conditional in a different way to get a
> similar effect but I suspect that might be even more compiler dependent.

I am okay with ordering conditionals to make the common case the
short-circuit case.

> Also, let me run the audit-testsuite this time. Is there a good test
> there that you would recommend that might serve as a more representative
> workload?

Probably not. The audit-testsuite is intended to be a quick, easy to
run regression test that can be used by developers to help reduce
audit breakage. It is not representative of any particular workload,
and is definitely not comprehensive (it is woefully lacking in several
areas unfortunately).

--
paul-moore.com

2022-10-07 01:45:04

by Ankur Arora

[permalink] [raw]
Subject: Re: [PATCH 2/3] audit: annotate branch direction for audit_in_mask()


Paul Moore <[email protected]> writes:

> On Thu, Sep 29, 2022 at 4:20 PM Ankur Arora <[email protected]> wrote:
>> Paul Moore <[email protected]> writes:
>> > I generally dislike merging likely()/unlikely() additions to code
>> > paths that can have varying levels of performance depending on runtime
>> > configuration.
>>
>> I think that's fair, and in this particular case the benchmark is quite
>> contrived.
>>
>> But, just to elaborate a bit more on why that unlikely() clause made
>> sense to me: it seems to me that audit typically would be triggered for
>> control syscalls and the ratio between control and non-control ones
>> would be fairly lopsided.
>
> I understand, and there is definitely some precedence in the audit
> code for using likely()/unlikely() in a manner similar as you
> described, but I'll refer to my previous comments - it's not something
> I like. As a general rule, aside from the unlikely() calls in the
> audit wrappers present in include/linux/audit.h I would suggest not
> adding any new likely()/unlikely() calls.
>
>> Let me see if I can rewrite the conditional in a different way to get a
>> similar effect but I suspect that might be even more compiler dependent.
>
> I am okay with ordering conditionals to make the common case the
> short-circuit case.

So I played around with a bunch of different combinations of the
conditionals but nothing really improved the code all that much.

Just sent out v2 dropping the unlikely() clause.


Thanks
Ankur

>
>> Also, let me run the audit-testsuite this time. Is there a good test
>> there that you would recommend that might serve as a more representative
>> workload?
>
> Probably not. The audit-testsuite is intended to be a quick, easy to
> run regression test that can be used by developers to help reduce
> audit breakage. It is not representative of any particular workload,
> and is definitely not comprehensive (it is woefully lacking in several
> areas unfortunately).


--
ankur