LinuxLists.cc - Re: Flush the hold queue fall into an infinite loop.

2022-01-13 15:22:54

Subject: Re: Flush the hold queue fall into an infinite loop.

On Thu, Jan 13, 2022 at 6:57 AM cuigaosheng <[email protected]> wrote:
>
> When we add "audit=1" to the cmdline, kauditd will take up 100%
> cpu resource.As follows:
>
> configurations:
> auditctl -b 64
> auditctl --backlog_wait_time 60000
> auditctl -r 0
> auditctl -w /root/aaa -p wrx
> shell scripts：
> #!/bin/bash
> i=0
> while [ $i -le 66 ]
> do
> touch /root/aaa
> let i++
> done
> mandatory conditions:
>
> add "audit=1" to the cmdline, and kill -19 pid_number(for /sbin/auditd).
>
> As long as we keep the audit_hold_queue non-empty, flush the hold queue will fall into
> an infinite loop.
>
> 713 static int kauditd_send_queue(struct sock *sk, u32 portid,
> 714 struct sk_buff_head *queue,
> 715 unsigned int retry_limit,
> 716 void (*skb_hook)(struct sk_buff *skb),
> 717 void (*err_hook)(struct sk_buff *skb))
> 718 {
> 719 int rc = 0;
> 720 struct sk_buff *skb;
> 721 unsigned int failed = 0;
> 722
> 723 /* NOTE: kauditd_thread takes care of all our locking, we just use
> 724 * the netlink info passed to us (e.g. sk and portid) */
> 725
> 726 while ((skb = skb_dequeue(queue))) {
> 727 /* call the skb_hook for each skb we touch */
> 728 if (skb_hook)
> 729 (*skb_hook)(skb);
> 730
> 731 /* can we send to anyone via unicast? */
> 732 if (!sk) {
> 733 if (err_hook)
> 734 (*err_hook)(skb);
> 735 continue;
> 736 }
> 737
> 738 retry:
> 739 /* grab an extra skb reference in case of error */
> 740 skb_get(skb);
> 741 rc = netlink_unicast(sk, skb, portid, 0);
> 742 if (rc < 0) {
> 743 /* send failed - try a few times unless fatal error */
> 744 if (++failed >= retry_limit ||
> 745 rc == -ECONNREFUSED || rc == -EPERM) {
> 746 sk = NULL;
> 747 if (err_hook)
> 748 (*err_hook)(skb);
> 749 if (rc == -EAGAIN)
> 750 rc = 0;
> 751 /* continue to drain the queue */
> 752 continue;
> 753 } else
> 754 goto retry;
> 755 } else {
> 756 /* skb sent - drop the extra reference and continue */
> 757 consume_skb(skb);
> 758 failed = 0;
> 759 }
> 760 }
> 761
> 762 return (rc >= 0 ? 0 : rc);
> 763 }
>
> When kauditd attempt to flush the hold queue, the queue parameter is &audit_hold_queue,
> and if netlink_unicast(line 741 ) return -EAGAIN, sk will be NULL(line 746), so err_hook(kauditd_rehold_skb)
> will be call. Then continue, skb_dequeue(line 726) and err_hook(kauditd_rehold_skb,line 733) will
> fall into an infinite loop.
> I don't really understand the value of audit_hold_queue, can we remove it, or stop droping the logs
> into kauditd_rehold_skb when the auditd is abnormal?

Thanks Gaosheng for the bug report, I'm able to reproduce this and I'm
looking into it now. I'll report back when I have a better idea of
the problem and a potential fix.

--
paul moore
http://www.paul-moore.com

2022-01-14 01:22:21

by Gaosheng Cui

[permalink] [raw]

Subject: Re: Flush the hold queue fall into an infinite loop.

I want to stop droping the logs into audit_hold_queue when the auditd is abnormal.it
seems that this modification goes against the design intent of audit_hold_queue. its
effect is similar to removing the audit_hold_queue.

diff --git a/kernel/audit.c b/kernel/audit.c
index 2a38cbaf3ddb..a8091b1a6587 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -748,6 +748,7 @@ static int kauditd_send_queue(struct sock *sk, u32
portid,
                                        (*err_hook)(skb);
                                if (rc == -EAGAIN)
                                        rc = 0;
+                               audit_default = AUDIT_OFF;
                                /* continue to drain the queue */
                                continue;
                        } else
@@ -755,6 +756,7 @@ static int kauditd_send_queue(struct sock *sk, u32
portid,
                } else {
                        /* skb sent - drop the extra reference and
continue */
                        consume_skb(skb);
+                       audit_default = audit_enabled;
                        failed = 0;
                }
        }

在 2022/1/13 23:22, Paul Moore 写道:
> On Thu, Jan 13, 2022 at 6:57 AM cuigaosheng <[email protected]> wrote:
>> When we add "audit=1" to the cmdline, kauditd will take up 100%
>> cpu resource.As follows:
>>
>> configurations:
>> auditctl -b 64
>> auditctl --backlog_wait_time 60000
>> auditctl -r 0
>> auditctl -w /root/aaa -p wrx
>> shell scripts：
>> #!/bin/bash
>> i=0
>> while [ $i -le 66 ]
>> do
>> touch /root/aaa
>> let i++
>> done
>> mandatory conditions:
>>
>> add "audit=1" to the cmdline, and kill -19 pid_number(for /sbin/auditd).
>>
>> As long as we keep the audit_hold_queue non-empty, flush the hold queue will fall into
>> an infinite loop.
>>
>> 713 static int kauditd_send_queue(struct sock *sk, u32 portid,
>> 714 struct sk_buff_head *queue,
>> 715 unsigned int retry_limit,
>> 716 void (*skb_hook)(struct sk_buff *skb),
>> 717 void (*err_hook)(struct sk_buff *skb))
>> 718 {
>> 719 int rc = 0;
>> 720 struct sk_buff *skb;
>> 721 unsigned int failed = 0;
>> 722
>> 723 /* NOTE: kauditd_thread takes care of all our locking, we just use
>> 724 * the netlink info passed to us (e.g. sk and portid) */
>> 725
>> 726 while ((skb = skb_dequeue(queue))) {
>> 727 /* call the skb_hook for each skb we touch */
>> 728 if (skb_hook)
>> 729 (*skb_hook)(skb);
>> 730
>> 731 /* can we send to anyone via unicast? */
>> 732 if (!sk) {
>> 733 if (err_hook)
>> 734 (*err_hook)(skb);
>> 735 continue;
>> 736 }
>> 737
>> 738 retry:
>> 739 /* grab an extra skb reference in case of error */
>> 740 skb_get(skb);
>> 741 rc = netlink_unicast(sk, skb, portid, 0);
>> 742 if (rc < 0) {
>> 743 /* send failed - try a few times unless fatal error */
>> 744 if (++failed >= retry_limit ||
>> 745 rc == -ECONNREFUSED || rc == -EPERM) {
>> 746 sk = NULL;
>> 747 if (err_hook)
>> 748 (*err_hook)(skb);
>> 749 if (rc == -EAGAIN)
>> 750 rc = 0;
>> 751 /* continue to drain the queue */
>> 752 continue;
>> 753 } else
>> 754 goto retry;
>> 755 } else {
>> 756 /* skb sent - drop the extra reference and continue */
>> 757 consume_skb(skb);
>> 758 failed = 0;
>> 759 }
>> 760 }
>> 761
>> 762 return (rc >= 0 ? 0 : rc);
>> 763 }
>>
>> When kauditd attempt to flush the hold queue, the queue parameter is &audit_hold_queue,
>> and if netlink_unicast(line 741 ) return -EAGAIN, sk will be NULL(line 746), so err_hook(kauditd_rehold_skb)
>> will be call. Then continue, skb_dequeue(line 726) and err_hook(kauditd_rehold_skb,line 733) will
>> fall into an infinite loop.
>> I don't really understand the value of audit_hold_queue, can we remove it, or stop droping the logs
>> into kauditd_rehold_skb when the auditd is abnormal?
> Thanks Gaosheng for the bug report, I'm able to reproduce this and I'm
> looking into it now. I'll report back when I have a better idea of
> the problem and a potential fix.
>

2022-01-15 00:18:14

by Paul Moore

[permalink] [raw]

Subject: Re: Flush the hold queue fall into an infinite loop.

On Thu, Jan 13, 2022 at 8:22 PM cuigaosheng <[email protected]> wrote:
>
> I want to stop droping the logs into audit_hold_queue when the auditd is abnormal.it
> seems that this modification goes against the design intent of audit_hold_queue. its
> effect is similar to removing the audit_hold_queue.
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 2a38cbaf3ddb..a8091b1a6587 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -748,6 +748,7 @@ static int kauditd_send_queue(struct sock *sk, u32
> portid,
> (*err_hook)(skb);
> if (rc == -EAGAIN)
> rc = 0;
> + audit_default = AUDIT_OFF;
> /* continue to drain the queue */
> continue;
> } else
> @@ -755,6 +756,7 @@ static int kauditd_send_queue(struct sock *sk, u32
> portid,
> } else {
> /* skb sent - drop the extra reference and
> continue */
> consume_skb(skb);
> + audit_default = audit_enabled;
> failed = 0;
> }
> }

We can't toggle the audit_default setting like this, that isn't
acceptable upstream. I believe I have a fix, but I need to finish the
testing before I can post it for further review.

> 在 2022/1/13 23:22, Paul Moore 写道:
> > On Thu, Jan 13, 2022 at 6:57 AM cuigaosheng <[email protected]> wrote:
> >> When we add "audit=1" to the cmdline, kauditd will take up 100%
> >> cpu resource.As follows:
> >>
> >> configurations:
> >> auditctl -b 64
> >> auditctl --backlog_wait_time 60000
> >> auditctl -r 0
> >> auditctl -w /root/aaa -p wrx
> >> shell scripts：
> >> #!/bin/bash
> >> i=0
> >> while [ $i -le 66 ]
> >> do
> >> touch /root/aaa
> >> let i++
> >> done
> >> mandatory conditions:
> >>
> >> add "audit=1" to the cmdline, and kill -19 pid_number(for /sbin/auditd).
> >>
> >> As long as we keep the audit_hold_queue non-empty, flush the hold queue will fall into
> >> an infinite loop.
> >>
> >> 713 static int kauditd_send_queue(struct sock *sk, u32 portid,
> >> 714 struct sk_buff_head *queue,
> >> 715 unsigned int retry_limit,
> >> 716 void (*skb_hook)(struct sk_buff *skb),
> >> 717 void (*err_hook)(struct sk_buff *skb))
> >> 718 {
> >> 719 int rc = 0;
> >> 720 struct sk_buff *skb;
> >> 721 unsigned int failed = 0;
> >> 722
> >> 723 /* NOTE: kauditd_thread takes care of all our locking, we just use
> >> 724 * the netlink info passed to us (e.g. sk and portid) */
> >> 725
> >> 726 while ((skb = skb_dequeue(queue))) {
> >> 727 /* call the skb_hook for each skb we touch */
> >> 728 if (skb_hook)
> >> 729 (*skb_hook)(skb);
> >> 730
> >> 731 /* can we send to anyone via unicast? */
> >> 732 if (!sk) {
> >> 733 if (err_hook)
> >> 734 (*err_hook)(skb);
> >> 735 continue;
> >> 736 }
> >> 737
> >> 738 retry:
> >> 739 /* grab an extra skb reference in case of error */
> >> 740 skb_get(skb);
> >> 741 rc = netlink_unicast(sk, skb, portid, 0);
> >> 742 if (rc < 0) {
> >> 743 /* send failed - try a few times unless fatal error */
> >> 744 if (++failed >= retry_limit ||
> >> 745 rc == -ECONNREFUSED || rc == -EPERM) {
> >> 746 sk = NULL;
> >> 747 if (err_hook)
> >> 748 (*err_hook)(skb);
> >> 749 if (rc == -EAGAIN)
> >> 750 rc = 0;
> >> 751 /* continue to drain the queue */
> >> 752 continue;
> >> 753 } else
> >> 754 goto retry;
> >> 755 } else {
> >> 756 /* skb sent - drop the extra reference and continue */
> >> 757 consume_skb(skb);
> >> 758 failed = 0;
> >> 759 }
> >> 760 }
> >> 761
> >> 762 return (rc >= 0 ? 0 : rc);
> >> 763 }
> >>
> >> When kauditd attempt to flush the hold queue, the queue parameter is &audit_hold_queue,
> >> and if netlink_unicast(line 741 ) return -EAGAIN, sk will be NULL(line 746), so err_hook(kauditd_rehold_skb)
> >> will be call. Then continue, skb_dequeue(line 726) and err_hook(kauditd_rehold_skb,line 733) will
> >> fall into an infinite loop.
> >> I don't really understand the value of audit_hold_queue, can we remove it, or stop droping the logs
> >> into kauditd_rehold_skb when the auditd is abnormal?
> > Thanks Gaosheng for the bug report, I'm able to reproduce this and I'm
> > looking into it now. I'll report back when I have a better idea of
> > the problem and a potential fix.
> >

--
paul moore
http://www.paul-moore.com