2023-05-08 08:26:48

by Eiichi Tsukata

[permalink] [raw]
Subject: [PATCH 0/4] audit: refactor and fix for potential deadlock

Commit 7ffb8e317bae ("audit: we don't need to
__set_current_state(TASK_RUNNING)") accidentally moved queue full check
before add_wait_queue_exclusive() which introduced the following race:

CPU1 CPU2
======== ========
(in audit_log_start()) (in kauditd_thread())

queue is full
wake_up(&audit_backlog_wait)
wait_event_freezable()
add_wait_queue_exclusive()
...
schedule_timeout()

Once this happens, both audit_log_start() and kauditd_thread() can cause
deadlock for up to backlog_wait_time waiting for each other. To prevent
the race, this patch adds queue full check after
prepare_to_wait_exclusive().

This patchset consists of the following parts:

Patch 1-3: Refactorings and a small fix preferable for deadlock fix
Patch 4: Actual deadlock fix

Eiichi Tsukata (4):
audit: refactor queue full checks
audit: account backlog waiting time in audit_receive()
audit: convert DECLARE_WAITQUEUE to DEFINE_WAIT
audit: check if queue is full after prepare_to_wait_exclusive()

kernel/audit.c | 71 ++++++++++++++++++++++++++++++--------------------
1 file changed, 43 insertions(+), 28 deletions(-)

--
2.40.0


2023-05-08 08:27:15

by Eiichi Tsukata

[permalink] [raw]
Subject: [PATCH 1/4] audit: refactor queue full checks

Currently audit queue full checks are done in multiple places.
Consolidate them into one audit_queue_full().

Signed-off-by: Eiichi Tsukata <[email protected]>
---
kernel/audit.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 9bc0b0301198..c15694e1a76b 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -341,6 +341,12 @@ static inline int audit_rate_check(void)
return retval;
}

+static inline int audit_queue_full(const struct sk_buff_head *queue)
+{
+ return audit_backlog_limit &&
+ (skb_queue_len(queue) > audit_backlog_limit);
+}
+
/**
* audit_log_lost - conditionally log lost audit message event
* @message: the message stating reason for lost audit message
@@ -579,8 +585,7 @@ static void kauditd_hold_skb(struct sk_buff *skb, int error)
* record on the retry queue unless it's full, in which case drop it
*/
if (error == -EAGAIN) {
- if (!audit_backlog_limit ||
- skb_queue_len(&audit_retry_queue) < audit_backlog_limit) {
+ if (!audit_queue_full(&audit_retry_queue)) {
skb_queue_tail(&audit_retry_queue, skb);
return;
}
@@ -589,8 +594,7 @@ static void kauditd_hold_skb(struct sk_buff *skb, int error)
}

/* if we have room in the hold queue, queue the message */
- if (!audit_backlog_limit ||
- skb_queue_len(&audit_hold_queue) < audit_backlog_limit) {
+ if (!audit_queue_full(&audit_hold_queue)) {
skb_queue_tail(&audit_hold_queue, skb);
return;
}
@@ -613,8 +617,7 @@ static void kauditd_hold_skb(struct sk_buff *skb, int error)
*/
static void kauditd_retry_skb(struct sk_buff *skb, __always_unused int error)
{
- if (!audit_backlog_limit ||
- skb_queue_len(&audit_retry_queue) < audit_backlog_limit) {
+ if (!audit_queue_full(&audit_retry_queue)) {
skb_queue_tail(&audit_retry_queue, skb);
return;
}
@@ -1564,8 +1567,7 @@ static void audit_receive(struct sk_buff *skb)
audit_ctl_unlock();

/* can't block with the ctrl lock, so penalize the sender now */
- if (audit_backlog_limit &&
- (skb_queue_len(&audit_queue) > audit_backlog_limit)) {
+ if (audit_queue_full(&audit_queue)) {
DECLARE_WAITQUEUE(wait, current);

/* wake kauditd to try and flush the queue */
@@ -1866,8 +1868,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
if (!(auditd_test_task(current) || audit_ctl_owner_current())) {
long stime = audit_backlog_wait_time;

- while (audit_backlog_limit &&
- (skb_queue_len(&audit_queue) > audit_backlog_limit)) {
+ while (audit_queue_full(&audit_queue)) {
/* wake kauditd to try and flush the queue */
wake_up_interruptible(&kauditd_wait);

--
2.40.0

2023-05-08 14:23:55

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH 0/4] audit: refactor and fix for potential deadlock

On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
<[email protected]> wrote:
> Commit 7ffb8e317bae ("audit: we don't need to
> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> before add_wait_queue_exclusive() which introduced the following race:
>
> CPU1 CPU2
> ======== ========
> (in audit_log_start()) (in kauditd_thread())
>
> queue is full
> wake_up(&audit_backlog_wait)
> wait_event_freezable()
> add_wait_queue_exclusive()
> ...
> schedule_timeout()
>
> Once this happens, both audit_log_start() and kauditd_thread() can cause
> deadlock for up to backlog_wait_time waiting for each other. To prevent
> the race, this patch adds queue full check after
> prepare_to_wait_exclusive().

Have you seen this occur in practice? The first thing that
audit_log_start() does when detecting that @audit_queue is full is to
wake kauditd_thread, which should start immediately draining
@audit_queue (even if a daemon is not listening, the @audit_queue will
be drained into one of the other queues). While it is hard to predict
scheduling behavior, it seems unlikely that kauditd_thread would be
able to process the entire queue before audit_log_start() progresses
from waking kauditd_thread() to sleeping.

In the absolute worst case, the task attempting to emit an audit
record sleeps for either the configured wait time or until it is
awakened by kauditd_thread() which could be triggered by another task
attempting to emit an audit record.

--
paul-moore.com

2023-05-09 01:52:01

by Eiichi Tsukata

[permalink] [raw]
Subject: Re: [PATCH 0/4] audit: refactor and fix for potential deadlock



> On May 8, 2023, at 23:07, Paul Moore <[email protected]> wrote:
>
> On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
> <[email protected]> wrote:
>> Commit 7ffb8e317bae ("audit: we don't need to
>> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
>> before add_wait_queue_exclusive() which introduced the following race:
>>
>> CPU1 CPU2
>> ======== ========
>> (in audit_log_start()) (in kauditd_thread())
>>
>> queue is full
>> wake_up(&audit_backlog_wait)
>> wait_event_freezable()
>> add_wait_queue_exclusive()
>> ...
>> schedule_timeout()
>>
>> Once this happens, both audit_log_start() and kauditd_thread() can cause
>> deadlock for up to backlog_wait_time waiting for each other. To prevent
>> the race, this patch adds queue full check after
>> prepare_to_wait_exclusive().
>
> Have you seen this occur in practice?

Yes, we hit this issue multiple times, though it’s pretty rare as you are mentioning.
In our case, sshd got stuck in audit_log_user_message(), which caused SSH connection
timeout.

Eiichi


2023-05-10 07:16:57

by Rinat Gadelshin

[permalink] [raw]
Subject: Re: [PATCH 1/4] audit: refactor queue full checks

Hi Eiichi!

Just one one for your patch.

On 08.05.2023 10:58, Eiichi Tsukata wrote:
> Currently audit queue full checks are done in multiple places.
> Consolidate them into one audit_queue_full().
>
> Signed-off-by: Eiichi Tsukata <[email protected]>
> ---
> kernel/audit.c | 21 +++++++++++----------
> 1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 9bc0b0301198..c15694e1a76b 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -341,6 +341,12 @@ static inline int audit_rate_check(void)
> return retval;
> }
>
> +static inline int audit_queue_full(const struct sk_buff_head *queue)
> +{
> + return audit_backlog_limit &&
> + (skb_queue_len(queue) > audit_backlog_limit);
It seems that we should use `>=` here.
> +}
> +
> /**
> * audit_log_lost - conditionally log lost audit message event
> * @message: the message stating reason for lost audit message
> @@ -579,8 +585,7 @@ static void kauditd_hold_skb(struct sk_buff *skb, int error)
> * record on the retry queue unless it's full, in which case drop it
> */
> if (error == -EAGAIN) {
> - if (!audit_backlog_limit ||
> - skb_queue_len(&audit_retry_queue) < audit_backlog_limit) {
> + if (!audit_queue_full(&audit_retry_queue)) {
> skb_queue_tail(&audit_retry_queue, skb);
> return;
> }
> @@ -589,8 +594,7 @@ static void kauditd_hold_skb(struct sk_buff *skb, int error)
> }
>
> /* if we have room in the hold queue, queue the message */
> - if (!audit_backlog_limit ||
> - skb_queue_len(&audit_hold_queue) < audit_backlog_limit) {
> + if (!audit_queue_full(&audit_hold_queue)) {
> skb_queue_tail(&audit_hold_queue, skb);
> return;
> }
> @@ -613,8 +617,7 @@ static void kauditd_hold_skb(struct sk_buff *skb, int error)
> */
> static void kauditd_retry_skb(struct sk_buff *skb, __always_unused int error)
> {
> - if (!audit_backlog_limit ||
> - skb_queue_len(&audit_retry_queue) < audit_backlog_limit) {
> + if (!audit_queue_full(&audit_retry_queue)) {
> skb_queue_tail(&audit_retry_queue, skb);
> return;
> }
> @@ -1564,8 +1567,7 @@ static void audit_receive(struct sk_buff *skb)
> audit_ctl_unlock();
>
> /* can't block with the ctrl lock, so penalize the sender now */
> - if (audit_backlog_limit &&
> - (skb_queue_len(&audit_queue) > audit_backlog_limit)) {
> + if (audit_queue_full(&audit_queue)) {
> DECLARE_WAITQUEUE(wait, current);
>
> /* wake kauditd to try and flush the queue */
> @@ -1866,8 +1868,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> if (!(auditd_test_task(current) || audit_ctl_owner_current())) {
> long stime = audit_backlog_wait_time;
>
> - while (audit_backlog_limit &&
> - (skb_queue_len(&audit_queue) > audit_backlog_limit)) {
> + while (audit_queue_full(&audit_queue)) {
> /* wake kauditd to try and flush the queue */
> wake_up_interruptible(&kauditd_wait);
>

2023-05-10 07:29:51

by Eiichi Tsukata

[permalink] [raw]
Subject: Re: [PATCH 1/4] audit: refactor queue full checks



> On May 10, 2023, at 15:54, Rinat Gadelshin <[email protected]> wrote:
>
> Hi Eiichi!
>
> Just one one for your patch.
>
> On 08.05.2023 10:58, Eiichi Tsukata wrote:
>> Currently audit queue full checks are done in multiple places.
>> Consolidate them into one audit_queue_full().
>>
>> Signed-off-by: Eiichi Tsukata <[email protected]>
>> ---
>> kernel/audit.c | 21 +++++++++++----------
>> 1 file changed, 11 insertions(+), 10 deletions(-)
>>
>> diff --git a/kernel/audit.c b/kernel/audit.c
>> index 9bc0b0301198..c15694e1a76b 100644
>> --- a/kernel/audit.c
>> +++ b/kernel/audit.c
>> @@ -341,6 +341,12 @@ static inline int audit_rate_check(void)
>> return retval;
>> }
>> +static inline int audit_queue_full(const struct sk_buff_head *queue)
>> +{
>> + return audit_backlog_limit &&
>> + (skb_queue_len(queue) > audit_backlog_limit);
> It seems that we should use `>=` here.

Hi Rinat

Could you provide the detailed reason?

Currently queue full checks are done with ‘>’,
on the other hand queue NOT full checks are done with ‘<‘.

Looking into other similar checks in the kernel, unix_recvq_full() is using ‘>’.


Paul, how do you think about it?

Eiichi


2023-05-10 07:46:47

by Rinat Gadelshin

[permalink] [raw]
Subject: Re: [PATCH 1/4] audit: refactor queue full checks


On 10.05.2023 10:17, Eiichi Tsukata wrote:
>
>> On May 10, 2023, at 15:54, Rinat Gadelshin <[email protected]> wrote:
>>
>> Hi Eiichi!
>>
>> Just one one for your patch.
>>
>> On 08.05.2023 10:58, Eiichi Tsukata wrote:
>>> Currently audit queue full checks are done in multiple places.
>>> Consolidate them into one audit_queue_full().
>>>
>>> Signed-off-by: Eiichi Tsukata <[email protected]>
>>> ---
>>> kernel/audit.c | 21 +++++++++++----------
>>> 1 file changed, 11 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/kernel/audit.c b/kernel/audit.c
>>> index 9bc0b0301198..c15694e1a76b 100644
>>> --- a/kernel/audit.c
>>> +++ b/kernel/audit.c
>>> @@ -341,6 +341,12 @@ static inline int audit_rate_check(void)
>>> return retval;
>>> }
>>> +static inline int audit_queue_full(const struct sk_buff_head *queue)
>>> +{
>>> + return audit_backlog_limit &&
>>> + (skb_queue_len(queue) > audit_backlog_limit);
>> It seems that we should use `>=` here.
> Hi Rinat
>
> Could you provide the detailed reason?
>
> Currently queue full checks are done with ‘>’,
> on the other hand queue NOT full checks are done with ‘<‘.
>
> Looking into other similar checks in the kernel, unix_recvq_full() is using ‘>’.
Was (OR statement): `if (!audit_backlog_limit ||
skb_queue_len(&audit_retry_queue) < audit_backlog_limit)
For AND-statement it should be `if (audit_backlog_limit &&
(skb_queue_len(&audit_retry_queue) >= audit_backlog_limit))
Otherwise we get false for case `(skb_queue_len(&audit_retry_queue) ==
audit_backlog_limit)` which was true for the old implementation.
>
> Paul, how do you think about it?
>
> Eiichi
>
>

2023-05-10 08:14:33

by Eiichi Tsukata

[permalink] [raw]
Subject: Re: [PATCH 0/4] audit: refactor and fix for potential deadlock



> On May 9, 2023, at 10:34, Eiichi Tsukata <[email protected]> wrote:
>
>
>
>> On May 8, 2023, at 23:07, Paul Moore <[email protected]> wrote:
>>
>> On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
>> <[email protected]> wrote:
>>> Commit 7ffb8e317bae ("audit: we don't need to
>>> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
>>> before add_wait_queue_exclusive() which introduced the following race:
>>>
>>> CPU1 CPU2
>>> ======== ========
>>> (in audit_log_start()) (in kauditd_thread())
>>>
>>> queue is full
>>> wake_up(&audit_backlog_wait)
>>> wait_event_freezable()
>>> add_wait_queue_exclusive()
>>> ...
>>> schedule_timeout()
>>>
>>> Once this happens, both audit_log_start() and kauditd_thread() can cause
>>> deadlock for up to backlog_wait_time waiting for each other. To prevent
>>> the race, this patch adds queue full check after
>>> prepare_to_wait_exclusive().
>>
>> Have you seen this occur in practice?
>
> Yes, we hit this issue multiple times, though it’s pretty rare as you are mentioning.
> In our case, sshd got stuck in audit_log_user_message(), which caused SSH connection
> timeout.
>

I found another case.

kauditd_thread issues wake_up(&audit_backlog_wait) once after wake up.
As waiter side is using add_wait_queue_exclusive() it only wakes up one process at once.

If kauditd wakes up a process which is sleeping in audit_log_start(), then the process will
queue skb and issue wake_up_interruptible(&kauditd_wait). No problem.
But if kauditd wakes up a process which is sleeping in audit_receive(), then the process
won’t try to wake up kauditd. In this case other processes sleeping in audit_log_start()
keep sleeping even if kauditd have flushed the queue.

At this point I’m planning to use non-exclusive wait in audit_receive() in v2.
Let me know if we should use wake_up_all() in kauditd or you have better solution.

Eiichi

2023-05-17 16:11:39

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH 0/4] audit: refactor and fix for potential deadlock

On Mon, May 8, 2023 at 9:49 PM Eiichi Tsukata
<[email protected]> wrote:
> > On May 8, 2023, at 23:07, Paul Moore <[email protected]> wrote:
> > On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
> > <[email protected]> wrote:
> >> Commit 7ffb8e317bae ("audit: we don't need to
> >> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> >> before add_wait_queue_exclusive() which introduced the following race:
> >>
> >> CPU1 CPU2
> >> ======== ========
> >> (in audit_log_start()) (in kauditd_thread())
> >>
> >> queue is full
> >> wake_up(&audit_backlog_wait)
> >> wait_event_freezable()
> >> add_wait_queue_exclusive()
> >> ...
> >> schedule_timeout()
> >>
> >> Once this happens, both audit_log_start() and kauditd_thread() can cause
> >> deadlock for up to backlog_wait_time waiting for each other. To prevent
> >> the race, this patch adds queue full check after
> >> prepare_to_wait_exclusive().
> >
> > Have you seen this occur in practice?
>
> Yes, we hit this issue multiple times, though it’s pretty rare as you are mentioning.
> In our case, sshd got stuck in audit_log_user_message(), which caused SSH connection
> timeout.

Sorry for the delay, I was away at a conference last week.

Regardless of how we tweak the wait, there is always going to be a
chance that rescheduling the task with a timeout is going to impact
the application. If you have sensitive applications where this is
proving to be a problem I would suggest tuning your backlog size, the
wait time, or removing the backlog limit entirely.

--
paul-moore.com

2023-05-17 16:18:55

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH 0/4] audit: refactor and fix for potential deadlock

On Wed, May 10, 2023 at 4:09 AM Eiichi Tsukata
<[email protected]> wrote:
> > On May 9, 2023, at 10:34, Eiichi Tsukata <[email protected]> wrote:
> >> On May 8, 2023, at 23:07, Paul Moore <[email protected]> wrote:
> >> On Mon, May 8, 2023 at 3:58 AM Eiichi Tsukata
> >> <[email protected]> wrote:
> >>> Commit 7ffb8e317bae ("audit: we don't need to
> >>> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> >>> before add_wait_queue_exclusive() which introduced the following race:
> >>>
> >>> CPU1 CPU2
> >>> ======== ========
> >>> (in audit_log_start()) (in kauditd_thread())
> >>>
> >>> queue is full
> >>> wake_up(&audit_backlog_wait)
> >>> wait_event_freezable()
> >>> add_wait_queue_exclusive()
> >>> ...
> >>> schedule_timeout()
> >>>
> >>> Once this happens, both audit_log_start() and kauditd_thread() can cause
> >>> deadlock for up to backlog_wait_time waiting for each other. To prevent
> >>> the race, this patch adds queue full check after
> >>> prepare_to_wait_exclusive().
> >>
> >> Have you seen this occur in practice?
> >
> > Yes, we hit this issue multiple times, though it’s pretty rare as you are mentioning.
> > In our case, sshd got stuck in audit_log_user_message(), which caused SSH connection
> > timeout.
> >
>
> I found another case.
>
> kauditd_thread issues wake_up(&audit_backlog_wait) once after wake up.
> As waiter side is using add_wait_queue_exclusive() it only wakes up one process at once.
>
> If kauditd wakes up a process which is sleeping in audit_log_start(), then the process will
> queue skb and issue wake_up_interruptible(&kauditd_wait). No problem.
> But if kauditd wakes up a process which is sleeping in audit_receive(), then the process
> won’t try to wake up kauditd. In this case other processes sleeping in audit_log_start()
> keep sleeping even if kauditd have flushed the queue.
>
> At this point I’m planning to use non-exclusive wait in audit_receive() in v2.
> Let me know if we should use wake_up_all() in kauditd or you have better solution.

The nice part about marking the waiters as WQ_FLAG_EXCLUSIVE is that
we avoid the thundering herd problem. The waiter was held in the
first place because the system was under high levels of audit
pressure, so it stands to reason that a slow wake-up of the waiters is
a wise choice to avoid re-entering another audit pressure state.

I'll take a look at your v2 patch, but as things stand I still believe
that this is better addressed via runtime configuration (see my
previous email). However, some of the refactoring you've done might
be nice, I'll reply back on that in your v2 patch.

Thanks.

--
paul-moore.com