2023-05-11 05:26:57

by Eiichi Tsukata

[permalink] [raw]
Subject: [PATCH v2 5/5] audit: do not use exclusive wait in audit_receive()

kauditd thread issues wake_up() before it goes to sleep. The wake_up()
call wakes up only one process as waiter side uses exclusive wait.
This can be problematic when there are multiple processes (one is in
audit_receive() and others are in audit_log_start()) waiting on
audit_backlog_wait queue.

For example, if there are two processes waiting:

Process (A): in audit_receive()
Process (B): in audit_log_start()

And (A) is at the head of the wait queue. Then kauditd's wake_up() only
wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
result, (B) can be blocked for up to backlog_wait_time.

To prevent the issue, use non-exclusive wait in audit_receive() so that
kauditd can wake up all waiters in audit_receive().

Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
Signed-off-by: Eiichi Tsukata <[email protected]>
---
kernel/audit.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 6b0cc0459984..ef48210343ae 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -631,22 +631,27 @@ static void kauditd_retry_skb(struct sk_buff *skb, __always_unused int error)
/**
* wait_for_kauditd - Wait for kauditd to drain the queue
* @stime: time to sleep
+ * @exclusive: use exclusive wait if true
*
* Description:
* Waits for kauditd to drain the queue then adds duration of sleep time to
* audit_backlog_wait_time_actual as cumulative sum.
* Returns remaining time to sleep.
*/
-static long wait_for_kauditd(long stime)
+static long wait_for_kauditd(long stime, bool exclusive)
{
long rtime;
DEFINE_WAIT(wait);

- prepare_to_wait_exclusive(&audit_backlog_wait, &wait,
- TASK_UNINTERRUPTIBLE);
+ if (exclusive)
+ prepare_to_wait_exclusive(&audit_backlog_wait, &wait,
+ TASK_UNINTERRUPTIBLE);
+ else
+ prepare_to_wait(&audit_backlog_wait, &wait,
+ TASK_UNINTERRUPTIBLE);

/* need to check if the queue is full again because kauditd might have
- * flushed the queue and went to sleep after prepare_to_wait_exclusive()
+ * flushed the queue and went to sleep after prepare_to_wait_*()
*/
if (audit_queue_full(&audit_queue)) {
rtime = schedule_timeout(stime);
@@ -1601,7 +1606,7 @@ static void audit_receive(struct sk_buff *skb)
if (audit_queue_full(&audit_queue)) {
/* wake kauditd to try and flush the queue */
wake_up_interruptible(&kauditd_wait);
- wait_for_kauditd(audit_backlog_wait_time);
+ wait_for_kauditd(audit_backlog_wait_time, false);
}
}

@@ -1900,7 +1905,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
/* sleep if we are allowed and we haven't exhausted our
* backlog wait limit */
if (gfpflags_allow_blocking(gfp_mask) && stime > 0) {
- stime = wait_for_kauditd(stime);
+ stime = wait_for_kauditd(stime, true);
} else {
if (audit_rate_check() && printk_ratelimit())
pr_warn("audit_backlog=%d > audit_backlog_limit=%d\n",
--
2.40.0



2023-05-19 20:57:54

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] audit: do not use exclusive wait in audit_receive()

On May 11, 2023 Eiichi Tsukata <[email protected]> wrote:
>
> kauditd thread issues wake_up() before it goes to sleep. The wake_up()
> call wakes up only one process as waiter side uses exclusive wait.
> This can be problematic when there are multiple processes (one is in
> audit_receive() and others are in audit_log_start()) waiting on
> audit_backlog_wait queue.
>
> For example, if there are two processes waiting:
>
> Process (A): in audit_receive()
> Process (B): in audit_log_start()
>
> And (A) is at the head of the wait queue. Then kauditd's wake_up() only
> wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
> result, (B) can be blocked for up to backlog_wait_time.
>
> To prevent the issue, use non-exclusive wait in audit_receive() so that
> kauditd can wake up all waiters in audit_receive().
>
> Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
> Signed-off-by: Eiichi Tsukata <[email protected]>
> ---
> kernel/audit.c | 17 +++++++++++------
> 1 file changed, 11 insertions(+), 6 deletions(-)

This was also discussed in the last patchset.

--
paul-moore.com

2023-05-22 04:54:50

by Eiichi Tsukata

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] audit: do not use exclusive wait in audit_receive()



> On May 20, 2023, at 5:54, Paul Moore <[email protected]> wrote:
>
> On May 11, 2023 Eiichi Tsukata <[email protected]> wrote:
>>
>> kauditd thread issues wake_up() before it goes to sleep. The wake_up()
>> call wakes up only one process as waiter side uses exclusive wait.
>> This can be problematic when there are multiple processes (one is in
>> audit_receive() and others are in audit_log_start()) waiting on
>> audit_backlog_wait queue.
>>
>> For example, if there are two processes waiting:
>>
>> Process (A): in audit_receive()
>> Process (B): in audit_log_start()
>>
>> And (A) is at the head of the wait queue. Then kauditd's wake_up() only
>> wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
>> result, (B) can be blocked for up to backlog_wait_time.
>>
>> To prevent the issue, use non-exclusive wait in audit_receive() so that
>> kauditd can wake up all waiters in audit_receive().
>>
>> Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
>> Signed-off-by: Eiichi Tsukata <[email protected]>
>> ---
>> kernel/audit.c | 17 +++++++++++------
>> 1 file changed, 11 insertions(+), 6 deletions(-)
>
> This was also discussed in the last patchset.
>
>

This bug is much easily reproducible on real environments and can cause problematic
user space failure like SSH connection timeout.
Let’s not keep the bug unfixed.
(Of course we’ve already carefully tuned audit related params and user space auditd config so that our product won’t hit backlog full.)

Other ideas in my minds are:

(1) Use different wait queues in audit_receive() and audit_log_start() to guarantee kautid
wake_up() tries to wake up a waiter in audit_log_start().

(2) Periodically (say in every 1 sec) check if @audit_queue is full in audit_receive() to prevent
audit_receive() from unnecessarily waiting for so long time.

BTW, the default backlog_wait_time is 60 * HZ which seems pretty large.
I’d appreciate if you could tell me the reason behind that value.

Eiichi

2023-05-23 05:09:48

by Eiichi Tsukata

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] audit: do not use exclusive wait in audit_receive()



> On May 22, 2023, at 13:44, Eiichi Tsukata <[email protected]> wrote:
>
>
>
>> On May 20, 2023, at 5:54, Paul Moore <[email protected]> wrote:
>>
>> On May 11, 2023 Eiichi Tsukata <[email protected]> wrote:
>>>
>>> kauditd thread issues wake_up() before it goes to sleep. The wake_up()
>>> call wakes up only one process as waiter side uses exclusive wait.
>>> This can be problematic when there are multiple processes (one is in
>>> audit_receive() and others are in audit_log_start()) waiting on
>>> audit_backlog_wait queue.
>>>
>>> For example, if there are two processes waiting:
>>>
>>> Process (A): in audit_receive()
>>> Process (B): in audit_log_start()
>>>
>>> And (A) is at the head of the wait queue. Then kauditd's wake_up() only
>>> wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
>>> result, (B) can be blocked for up to backlog_wait_time.
>>>
>>> To prevent the issue, use non-exclusive wait in audit_receive() so that
>>> kauditd can wake up all waiters in audit_receive().
>>>
>>> Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
>>> Signed-off-by: Eiichi Tsukata <[email protected]>
>>> ---
>>> kernel/audit.c | 17 +++++++++++------
>>> 1 file changed, 11 insertions(+), 6 deletions(-)
>>
>> This was also discussed in the last patchset.
>>
>>
>
> This bug is much easily reproducible on real environments and can cause problematic
> user space failure like SSH connection timeout.
> Let’s not keep the bug unfixed.
> (Of course we’ve already carefully tuned audit related params and user space auditd config so that our product won’t hit backlog full.)
>
> Other ideas in my minds are:
>
> (1) Use different wait queues in audit_receive() and audit_log_start() to guarantee kautid
> wake_up() tries to wake up a waiter in audit_log_start().
>
> (2) Periodically (say in every 1 sec) check if @audit_queue is full in audit_receive() to prevent
> audit_receive() from unnecessarily waiting for so long time.
>
> BTW, the default backlog_wait_time is 60 * HZ which seems pretty large.
> I’d appreciate if you could tell me the reason behind that value.
>
> Eiichi

I came up with a better idea:

(3) Move wait_for_kauditd() in audit_receive() *before* audit_ctl_lock()
and restrict penalty only for msg_type which can queue a new audit record. (AUDIT_USER, AUDIT_TRIM, AUDIT_MAKE_EQUIV, ..)

Originally, it’s not reasonable to give penalty for innocent operation
like AUDIT_GET.
This approach makes successive audit_log_end() wake up kauditd.
Also it prevents audit_log_end() from queueing skb ignoring backlog_limit.

Eiichi

2023-05-23 21:24:49

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] audit: do not use exclusive wait in audit_receive()

On Mon, May 22, 2023 at 11:58 PM Eiichi Tsukata
<[email protected]> wrote:
> > On May 22, 2023, at 13:44, Eiichi Tsukata <[email protected]> wrote:
> >> On May 20, 2023, at 5:54, Paul Moore <[email protected]> wrote:
> >> On May 11, 2023 Eiichi Tsukata <[email protected]> wrote:
> >>>
> >>> kauditd thread issues wake_up() before it goes to sleep. The wake_up()
> >>> call wakes up only one process as waiter side uses exclusive wait.
> >>> This can be problematic when there are multiple processes (one is in
> >>> audit_receive() and others are in audit_log_start()) waiting on
> >>> audit_backlog_wait queue.
> >>>
> >>> For example, if there are two processes waiting:
> >>>
> >>> Process (A): in audit_receive()
> >>> Process (B): in audit_log_start()
> >>>
> >>> And (A) is at the head of the wait queue. Then kauditd's wake_up() only
> >>> wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
> >>> result, (B) can be blocked for up to backlog_wait_time.
> >>>
> >>> To prevent the issue, use non-exclusive wait in audit_receive() so that
> >>> kauditd can wake up all waiters in audit_receive().
> >>>
> >>> Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
> >>> Signed-off-by: Eiichi Tsukata <[email protected]>
> >>> ---
> >>> kernel/audit.c | 17 +++++++++++------
> >>> 1 file changed, 11 insertions(+), 6 deletions(-)
> >>
> >> This was also discussed in the last patchset.
> >
> > This bug is much easily reproducible on real environments and can cause problematic
> > user space failure like SSH connection timeout.
> > Let’s not keep the bug unfixed.
> > (Of course we’ve already carefully tuned audit related params and user space auditd config so that our product won’t hit backlog full.)

Good. Resolving your issues through audit runtime configuration is
the proper solution to this.

> > BTW, the default backlog_wait_time is 60 * HZ which seems pretty large.
> > I’d appreciate if you could tell me the reason behind that value.

I do not recall the original logic behind that value. It is likely
that the original value predated my maintenance of the audit
subsystem.

--
paul-moore.com