2013-08-28 22:21:18

by Luiz Capitulino

[permalink] [raw]
Subject: [RFC] audit: avoid soft lockup in audit_log_start()

I'm getting the following soft lockup:

CPU: 6 PID: 2278 Comm: killall5 Tainted: GF 3.11.0-rc7+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
0000000000000099 ffff88011fd83de8 ffffffff815324df 0000000000002800
ffffffff817d48f9 ffff88011fd83e68 ffffffff8152e669 ffff88011fd83e68
ffffffff00000008 ffff88011fd83e78 ffff88011fd83e18 0000004081dac040
Call Trace:
<IRQ> [<ffffffff815324df>] dump_stack+0x46/0x58
[<ffffffff8152e669>] panic+0xbb/0x1c4
[<ffffffff810d03c3>] watchdog_timer_fn+0x163/0x170
[<ffffffff8106c691>] __run_hrtimer+0x81/0x1c0
[<ffffffff810d0260>] ? watchdog+0x30/0x30
[<ffffffff8106cea7>] hrtimer_interrupt+0x107/0x240
[<ffffffff8102f61b>] local_apic_timer_interrupt+0x3b/0x60
[<ffffffff81542465>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff8154124a>] apic_timer_interrupt+0x6a/0x70
<EOI> [<ffffffff810c2f5f>] ? audit_log_start+0xbf/0x430
[<ffffffff810c2fe7>] ? audit_log_start+0x147/0x430
[<ffffffff81079030>] ? try_to_wake_up+0x2a0/0x2a0
[<ffffffff810c86be>] audit_log_exit+0x6ae/0xc30
[<ffffffff81188662>] ? __alloc_fd+0x42/0x100
[<ffffffff810c98e7>] __audit_syscall_exit+0x257/0x2b0
[<ffffffff81540794>] sysret_audit+0x17/0x21

The reproducer is somewhat unusual:

1. Install RHEL6.5 (maybe a similar older user-space will do)
2. Boot the just installed system
3. In this first boot you'll meet the firstboot script, which
will do some setup and (depending on your answers) it will
reboot the machine
4. During that first reboot the system hangs while terminating
all processes:

Sending all processes the TERM signal...

It's when the soft lockup above happens. And yes, I managed
to get this with latest upstream kernel (HEAD fa8218def1b1)

I'm reproducing it on a VM, but the first report was on bare-metal.

This is what is happening:

1. audit_log_start() is called
2. As we have SKBs waiting in audit_skb_queue and all conditions
evaluate to true, we sleep in wait_for_auditd()
3. Go to 2, until sleep_time gets negative and audit_log_start()
just busy-waits

Now, *why* this is happening is a mistery to me. I tried debugging
it, but all I could find is that at some point the kauditd thread
never wakes up after having called schedule(). I even tried waking
it up before calling wait_for_auditd(), but it didn't.

This patch is a RFC because it doesn't actually fix anything. It
just bails out before we busy-wait. Which is better than nothing,
but the system will still give a long pause when terminating all
processes. Besides, we risk papering over the real bug...

Signed-off-by: Luiz Capitulino <[email protected]>
---
kernel/audit.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 91e53d0..33e9c4c 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1097,6 +1097,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
unsigned int uninitialized_var(serial);
int reserve;
unsigned long timeout_start = jiffies;
+ unsigned long sleep_time;

if (audit_initialized != AUDIT_INITIALIZED)
return NULL;
@@ -1112,13 +1113,11 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,

while (audit_backlog_limit
&& skb_queue_len(&audit_skb_queue) > audit_backlog_limit + reserve) {
- if (gfp_mask & __GFP_WAIT && audit_backlog_wait_time) {
- unsigned long sleep_time;
-
sleep_time = timeout_start + audit_backlog_wait_time -
jiffies;
- if ((long)sleep_time > 0)
- wait_for_auditd(sleep_time);
+ if (gfp_mask & __GFP_WAIT && audit_backlog_wait_time
+ && sleep_time > 0) {
+ wait_for_auditd(sleep_time);
continue;
}
if (audit_rate_check() && printk_ratelimit())
--
1.8.1.4


2013-08-28 22:33:49

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()

On Wed, 28 Aug 2013 18:21:14 -0400 Luiz Capitulino <[email protected]> wrote:

> I'm getting the following soft lockup:
>
> CPU: 6 PID: 2278 Comm: killall5 Tainted: GF 3.11.0-rc7+ #1
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> 0000000000000099 ffff88011fd83de8 ffffffff815324df 0000000000002800
> ffffffff817d48f9 ffff88011fd83e68 ffffffff8152e669 ffff88011fd83e68
> ffffffff00000008 ffff88011fd83e78 ffff88011fd83e18 0000004081dac040
> Call Trace:
> <IRQ> [<ffffffff815324df>] dump_stack+0x46/0x58
> [<ffffffff8152e669>] panic+0xbb/0x1c4
> [<ffffffff810d03c3>] watchdog_timer_fn+0x163/0x170
> [<ffffffff8106c691>] __run_hrtimer+0x81/0x1c0
> [<ffffffff810d0260>] ? watchdog+0x30/0x30
> [<ffffffff8106cea7>] hrtimer_interrupt+0x107/0x240
> [<ffffffff8102f61b>] local_apic_timer_interrupt+0x3b/0x60
> [<ffffffff81542465>] smp_apic_timer_interrupt+0x45/0x60
> [<ffffffff8154124a>] apic_timer_interrupt+0x6a/0x70
> <EOI> [<ffffffff810c2f5f>] ? audit_log_start+0xbf/0x430
> [<ffffffff810c2fe7>] ? audit_log_start+0x147/0x430
> [<ffffffff81079030>] ? try_to_wake_up+0x2a0/0x2a0
> [<ffffffff810c86be>] audit_log_exit+0x6ae/0xc30
> [<ffffffff81188662>] ? __alloc_fd+0x42/0x100
> [<ffffffff810c98e7>] __audit_syscall_exit+0x257/0x2b0
> [<ffffffff81540794>] sysret_audit+0x17/0x21
>
> The reproducer is somewhat unusual:
>
> 1. Install RHEL6.5 (maybe a similar older user-space will do)
> 2. Boot the just installed system
> 3. In this first boot you'll meet the firstboot script, which
> will do some setup and (depending on your answers) it will
> reboot the machine
> 4. During that first reboot the system hangs while terminating
> all processes:
>
> Sending all processes the TERM signal...
>
> It's when the soft lockup above happens. And yes, I managed
> to get this with latest upstream kernel (HEAD fa8218def1b1)
>
> I'm reproducing it on a VM, but the first report was on bare-metal.
>
> This is what is happening:
>
> 1. audit_log_start() is called
> 2. As we have SKBs waiting in audit_skb_queue and all conditions
> evaluate to true, we sleep in wait_for_auditd()
> 3. Go to 2, until sleep_time gets negative and audit_log_start()
> just busy-waits
>
> Now, *why* this is happening is a mistery to me. I tried debugging
> it, but all I could find is that at some point the kauditd thread
> never wakes up after having called schedule(). I even tried waking
> it up before calling wait_for_auditd(), but it didn't.

Odd. If kauditd_thread() is stuck in schedule() in state
TASK_INTERRUPTIBLE then a wake_up(kauditd_task) should get it running
again, unless scheduler-related data structures are screwed up.

Are you really sure that kauditd is stuck in schedule() and doesn't
come out?

2013-08-28 22:54:40

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()

On Wed, 28 Aug 2013 15:33:45 -0700
Andrew Morton <[email protected]> wrote:

> On Wed, 28 Aug 2013 18:21:14 -0400 Luiz Capitulino <[email protected]> wrote:
>
> > I'm getting the following soft lockup:
> >
> > CPU: 6 PID: 2278 Comm: killall5 Tainted: GF 3.11.0-rc7+ #1
> > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > 0000000000000099 ffff88011fd83de8 ffffffff815324df 0000000000002800
> > ffffffff817d48f9 ffff88011fd83e68 ffffffff8152e669 ffff88011fd83e68
> > ffffffff00000008 ffff88011fd83e78 ffff88011fd83e18 0000004081dac040
> > Call Trace:
> > <IRQ> [<ffffffff815324df>] dump_stack+0x46/0x58
> > [<ffffffff8152e669>] panic+0xbb/0x1c4
> > [<ffffffff810d03c3>] watchdog_timer_fn+0x163/0x170
> > [<ffffffff8106c691>] __run_hrtimer+0x81/0x1c0
> > [<ffffffff810d0260>] ? watchdog+0x30/0x30
> > [<ffffffff8106cea7>] hrtimer_interrupt+0x107/0x240
> > [<ffffffff8102f61b>] local_apic_timer_interrupt+0x3b/0x60
> > [<ffffffff81542465>] smp_apic_timer_interrupt+0x45/0x60
> > [<ffffffff8154124a>] apic_timer_interrupt+0x6a/0x70
> > <EOI> [<ffffffff810c2f5f>] ? audit_log_start+0xbf/0x430
> > [<ffffffff810c2fe7>] ? audit_log_start+0x147/0x430
> > [<ffffffff81079030>] ? try_to_wake_up+0x2a0/0x2a0
> > [<ffffffff810c86be>] audit_log_exit+0x6ae/0xc30
> > [<ffffffff81188662>] ? __alloc_fd+0x42/0x100
> > [<ffffffff810c98e7>] __audit_syscall_exit+0x257/0x2b0
> > [<ffffffff81540794>] sysret_audit+0x17/0x21
> >
> > The reproducer is somewhat unusual:
> >
> > 1. Install RHEL6.5 (maybe a similar older user-space will do)
> > 2. Boot the just installed system
> > 3. In this first boot you'll meet the firstboot script, which
> > will do some setup and (depending on your answers) it will
> > reboot the machine
> > 4. During that first reboot the system hangs while terminating
> > all processes:
> >
> > Sending all processes the TERM signal...
> >
> > It's when the soft lockup above happens. And yes, I managed
> > to get this with latest upstream kernel (HEAD fa8218def1b1)
> >
> > I'm reproducing it on a VM, but the first report was on bare-metal.
> >
> > This is what is happening:
> >
> > 1. audit_log_start() is called
> > 2. As we have SKBs waiting in audit_skb_queue and all conditions
> > evaluate to true, we sleep in wait_for_auditd()
> > 3. Go to 2, until sleep_time gets negative and audit_log_start()
> > just busy-waits
> >
> > Now, *why* this is happening is a mistery to me. I tried debugging
> > it, but all I could find is that at some point the kauditd thread
> > never wakes up after having called schedule(). I even tried waking
> > it up before calling wait_for_auditd(), but it didn't.
>
> Odd. If kauditd_thread() is stuck in schedule() in state
> TASK_INTERRUPTIBLE then a wake_up(kauditd_task) should get it running
> again, unless scheduler-related data structures are screwed up.
>
> Are you really sure that kauditd is stuck in schedule() and doesn't
> come out?

No, that's a guess. Inferred from:

1. I tried calling wake_up_interruptible(&kauditd_wait); right
before wait_for_auditd(). Nothing changes

2. I added this debug printks:

diff --git a/kernel/audit.c b/kernel/audit.c
index 91e53d0..27448ad 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -458,11 +458,14 @@ static int kauditd_thread(void *dummy)
set_current_state(TASK_INTERRUPTIBLE);
add_wait_queue(&kauditd_wait, &wait);

+ pr_emerg_ratelimited("*** sleeping\n");
+
if (!skb_queue_len(&audit_skb_queue)) {
try_to_freeze();
schedule();
}

+ pr_emerg_ratelimited("*** waking up\n");
__set_current_state(TASK_RUNNING);
remove_wait_queue(&kauditd_wait, &wait);
}

I get several pairs of sleeping/waking up strings right before the
system begins to shut down. Then it stops (even though we do
have SKBs queued)

Now, the kauditd thread does exists when the soft lockup happens,
I could verify that with crash.

What's interesting is that I can only get it in that first reboot
done by the firstboot script in RHEL6. I didn't manage to get it
in any other way.

I'm open to any debugging tips, but be gentle because I'm not
very familiar with that stuff.

2013-08-28 23:08:19

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()

On Wed, 28 Aug 2013 18:54:36 -0400 Luiz Capitulino <[email protected]> wrote:

> > Are you really sure that kauditd is stuck in schedule() and doesn't
> > come out?
>
> No, that's a guess. Inferred from:
>
> 1. I tried calling wake_up_interruptible(&kauditd_wait); right
> before wait_for_auditd(). Nothing changes
>
> 2. I added this debug printks:
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 91e53d0..27448ad 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -458,11 +458,14 @@ static int kauditd_thread(void *dummy)
> set_current_state(TASK_INTERRUPTIBLE);
> add_wait_queue(&kauditd_wait, &wait);
>
> + pr_emerg_ratelimited("*** sleeping\n");
> +
> if (!skb_queue_len(&audit_skb_queue)) {
> try_to_freeze();
> schedule();
> }
>
> + pr_emerg_ratelimited("*** waking up\n");
> __set_current_state(TASK_RUNNING);
> remove_wait_queue(&kauditd_wait, &wait);
> }
>
> I get several pairs of sleeping/waking up strings right before the
> system begins to shut down. Then it stops (even though we do
> have SKBs queued)

Well. I assume "*** sleeping" the last thing kauditd prints? If the
last print is "waking up" then obviously kauditd is stuck somewhere
else, which makes more sense. A sysrq-T (or echo t >
/proc/sysrq-trigger) should provide clues.

I'd remove the "_ratelimited" - you don't want your debug messages to be
ratelimited (ie: omitted).

> Now, the kauditd thread does exists when the soft lockup happens,
> I could verify that with crash.
>
> What's interesting is that I can only get it in that first reboot
> done by the firstboot script in RHEL6. I didn't manage to get it
> in any other way.
>
> I'm open to any debugging tips, but be gentle because I'm not
> very familiar with that stuff.

2013-08-29 00:49:06

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()

On Wed, 28 Aug 2013 16:08:13 -0700
Andrew Morton <[email protected]> wrote:

> On Wed, 28 Aug 2013 18:54:36 -0400 Luiz Capitulino <[email protected]> wrote:
>
> > > Are you really sure that kauditd is stuck in schedule() and doesn't
> > > come out?
> >
> > No, that's a guess. Inferred from:
> >
> > 1. I tried calling wake_up_interruptible(&kauditd_wait); right
> > before wait_for_auditd(). Nothing changes
> >
> > 2. I added this debug printks:
> >
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 91e53d0..27448ad 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -458,11 +458,14 @@ static int kauditd_thread(void *dummy)
> > set_current_state(TASK_INTERRUPTIBLE);
> > add_wait_queue(&kauditd_wait, &wait);
> >
> > + pr_emerg_ratelimited("*** sleeping\n");
> > +
> > if (!skb_queue_len(&audit_skb_queue)) {
> > try_to_freeze();
> > schedule();
> > }
> >
> > + pr_emerg_ratelimited("*** waking up\n");
> > __set_current_state(TASK_RUNNING);
> > remove_wait_queue(&kauditd_wait, &wait);
> > }
> >
> > I get several pairs of sleeping/waking up strings right before the
> > system begins to shut down. Then it stops (even though we do
> > have SKBs queued)
>
> Well. I assume "*** sleeping" the last thing kauditd prints? If the
> last print is "waking up" then obviously kauditd is stuck somewhere
> else, which makes more sense. A sysrq-T (or echo t >
> /proc/sysrq-trigger) should provide clues.

Yes, "*** sleeping" is the last one.

> I'd remove the "_ratelimited" - you don't want your debug messages to be
> ratelimited (ie: omitted).

I'll try that.

Another interesting info is that the process that triggers the hang
is killall5 and by the time the hang happens auditd is not running
anymore.

2013-08-30 18:23:31

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()

On Wed, 28 Aug 2013 16:08:13 -0700
Andrew Morton <[email protected]> wrote:

> On Wed, 28 Aug 2013 18:54:36 -0400 Luiz Capitulino <[email protected]> wrote:
>
> > > Are you really sure that kauditd is stuck in schedule() and doesn't
> > > come out?
> >
> > No, that's a guess. Inferred from:
> >
> > 1. I tried calling wake_up_interruptible(&kauditd_wait); right
> > before wait_for_auditd(). Nothing changes
> >
> > 2. I added this debug printks:
> >
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 91e53d0..27448ad 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -458,11 +458,14 @@ static int kauditd_thread(void *dummy)
> > set_current_state(TASK_INTERRUPTIBLE);
> > add_wait_queue(&kauditd_wait, &wait);
> >
> > + pr_emerg_ratelimited("*** sleeping\n");
> > +
> > if (!skb_queue_len(&audit_skb_queue)) {
> > try_to_freeze();
> > schedule();
> > }
> >
> > + pr_emerg_ratelimited("*** waking up\n");
> > __set_current_state(TASK_RUNNING);
> > remove_wait_queue(&kauditd_wait, &wait);
> > }
> >
> > I get several pairs of sleeping/waking up strings right before the
> > system begins to shut down. Then it stops (even though we do
> > have SKBs queued)
>
> Well. I assume "*** sleeping" the last thing kauditd prints? If the
> last print is "waking up" then obviously kauditd is stuck somewhere
> else, which makes more sense.

That's correct. According to crash, here's where it's stuck:

#0 [ffff880115511c58] __schedule at ffffffff815361de
#1 [ffff880115511cd0] schedule at ffffffff81537039
#2 [ffff880115511ce0] schedule_timeout at ffffffff81534524
#3 [ffff880115511d90] netlink_attachskb at ffffffff81498c69
#4 [ffff880115511df0] netlink_unicast at ffffffff81498d7f
#5 [ffff880115511e40] kauditd_send_skb at ffffffff810c254f
#6 [ffff880115511e60] kauditd_thread at ffffffff810c26e9
#7 [ffff880115511ec0] kthread at ffffffff810693b0
#8 [ffff880115511f50] ret_from_fork at ffffffff8154071c

I see two possible reasons for this:

1. User-space triggered a NETLINK_CONGESTED condition. If this is
the problem, then the fix is to make audit's netlink socket
non-blocking and queue SKBs if netlink_unicast() returns -EINVAL

2. It's a race condition in the audit code. This bug is only triggered
on SMP machines _and_ adding printk()s to kauditd_thread() makes
it go away. Also, if I'm not mistaken, there's a number of global
variables that are shared between kaudit_thread() and threads
calling audit_log(). For example: kaudit_sock, audit_skb_hold_queue,
kauditd_wait and others

I can try protecting the global data structures with a mutex but, will
such a patch be accepted if the bug goes away? :)

2013-09-09 14:32:20

by Konstantin Khlebnikov

[permalink] [raw]
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()

Luiz Capitulino wrote:
> I'm getting the following soft lockup:
>
> CPU: 6 PID: 2278 Comm: killall5 Tainted: GF 3.11.0-rc7+ #1
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> 0000000000000099 ffff88011fd83de8 ffffffff815324df 0000000000002800
> ffffffff817d48f9 ffff88011fd83e68 ffffffff8152e669 ffff88011fd83e68
> ffffffff00000008 ffff88011fd83e78 ffff88011fd83e18 0000004081dac040
> Call Trace:
> <IRQ> [<ffffffff815324df>] dump_stack+0x46/0x58
> [<ffffffff8152e669>] panic+0xbb/0x1c4
> [<ffffffff810d03c3>] watchdog_timer_fn+0x163/0x170
> [<ffffffff8106c691>] __run_hrtimer+0x81/0x1c0
> [<ffffffff810d0260>] ? watchdog+0x30/0x30
> [<ffffffff8106cea7>] hrtimer_interrupt+0x107/0x240
> [<ffffffff8102f61b>] local_apic_timer_interrupt+0x3b/0x60
> [<ffffffff81542465>] smp_apic_timer_interrupt+0x45/0x60
> [<ffffffff8154124a>] apic_timer_interrupt+0x6a/0x70
> <EOI> [<ffffffff810c2f5f>] ? audit_log_start+0xbf/0x430
> [<ffffffff810c2fe7>] ? audit_log_start+0x147/0x430
> [<ffffffff81079030>] ? try_to_wake_up+0x2a0/0x2a0
> [<ffffffff810c86be>] audit_log_exit+0x6ae/0xc30
> [<ffffffff81188662>] ? __alloc_fd+0x42/0x100
> [<ffffffff810c98e7>] __audit_syscall_exit+0x257/0x2b0
> [<ffffffff81540794>] sysret_audit+0x17/0x21
>
> The reproducer is somewhat unusual:
>
> 1. Install RHEL6.5 (maybe a similar older user-space will do)
> 2. Boot the just installed system
> 3. In this first boot you'll meet the firstboot script, which
> will do some setup and (depending on your answers) it will
> reboot the machine
> 4. During that first reboot the system hangs while terminating
> all processes:
>
> Sending all processes the TERM signal...
>
> It's when the soft lockup above happens. And yes, I managed
> to get this with latest upstream kernel (HEAD fa8218def1b1)
>
> I'm reproducing it on a VM, but the first report was on bare-metal.
>
> This is what is happening:
>
> 1. audit_log_start() is called
> 2. As we have SKBs waiting in audit_skb_queue and all conditions
> evaluate to true, we sleep in wait_for_auditd()
> 3. Go to 2, until sleep_time gets negative and audit_log_start()
> just busy-waits
>
> Now, *why* this is happening is a mistery to me. I tried debugging
> it, but all I could find is that at some point the kauditd thread
> never wakes up after having called schedule(). I even tried waking
> it up before calling wait_for_auditd(), but it didn't.

We run into the same problem in rhel6 kernel.

"readahead-collector" uses audit interface and sometimes stuck in 'stopped' state.

After commit 829199197a430dade2519d54f5545c4a094393b8
(which was backported by RH into their kernel)
audit emiters will block forever if userspace daemon cannot handle backlog.
That commit just breaks timeout condition, after timeout waiting loop turns
into busy loop until deamon dies or returns back to work.

this trivial patch should fix that problem

--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1215,9 +1215,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,

sleep_time = timeout_start + audit_backlog_wait_time -
jiffies;
- if ((long)sleep_time > 0)
+ if ((long)sleep_time > 0) {
wait_for_auditd(sleep_time);
- continue;
+ continue;
+ }
}
if (audit_rate_check() && printk_ratelimit())
printk(KERN_WARNING

>
> This patch is a RFC because it doesn't actually fix anything. It
> just bails out before we busy-wait. Which is better than nothing,
> but the system will still give a long pause when terminating all
> processes. Besides, we risk papering over the real bug...
>
> Signed-off-by: Luiz Capitulino<[email protected]>
> ---
> kernel/audit.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 91e53d0..33e9c4c 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -1097,6 +1097,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> unsigned int uninitialized_var(serial);
> int reserve;
> unsigned long timeout_start = jiffies;
> + unsigned long sleep_time;
>
> if (audit_initialized != AUDIT_INITIALIZED)
> return NULL;
> @@ -1112,13 +1113,11 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
>
> while (audit_backlog_limit
> && skb_queue_len(&audit_skb_queue)> audit_backlog_limit + reserve) {
> - if (gfp_mask& __GFP_WAIT&& audit_backlog_wait_time) {
> - unsigned long sleep_time;
> -
> sleep_time = timeout_start + audit_backlog_wait_time -
> jiffies;
> - if ((long)sleep_time> 0)
> - wait_for_auditd(sleep_time);
> + if (gfp_mask& __GFP_WAIT&& audit_backlog_wait_time
> + && sleep_time> 0) {
> + wait_for_auditd(sleep_time);
> continue;
> }
> if (audit_rate_check()&& printk_ratelimit())

2013-09-09 14:54:50

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()

On Mon, 09 Sep 2013 18:32:13 +0400
Konstantin Khlebnikov <[email protected]> wrote:

> Luiz Capitulino wrote:
> > I'm getting the following soft lockup:
> >
> > CPU: 6 PID: 2278 Comm: killall5 Tainted: GF 3.11.0-rc7+ #1
> > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > 0000000000000099 ffff88011fd83de8 ffffffff815324df 0000000000002800
> > ffffffff817d48f9 ffff88011fd83e68 ffffffff8152e669 ffff88011fd83e68
> > ffffffff00000008 ffff88011fd83e78 ffff88011fd83e18 0000004081dac040
> > Call Trace:
> > <IRQ> [<ffffffff815324df>] dump_stack+0x46/0x58
> > [<ffffffff8152e669>] panic+0xbb/0x1c4
> > [<ffffffff810d03c3>] watchdog_timer_fn+0x163/0x170
> > [<ffffffff8106c691>] __run_hrtimer+0x81/0x1c0
> > [<ffffffff810d0260>] ? watchdog+0x30/0x30
> > [<ffffffff8106cea7>] hrtimer_interrupt+0x107/0x240
> > [<ffffffff8102f61b>] local_apic_timer_interrupt+0x3b/0x60
> > [<ffffffff81542465>] smp_apic_timer_interrupt+0x45/0x60
> > [<ffffffff8154124a>] apic_timer_interrupt+0x6a/0x70
> > <EOI> [<ffffffff810c2f5f>] ? audit_log_start+0xbf/0x430
> > [<ffffffff810c2fe7>] ? audit_log_start+0x147/0x430
> > [<ffffffff81079030>] ? try_to_wake_up+0x2a0/0x2a0
> > [<ffffffff810c86be>] audit_log_exit+0x6ae/0xc30
> > [<ffffffff81188662>] ? __alloc_fd+0x42/0x100
> > [<ffffffff810c98e7>] __audit_syscall_exit+0x257/0x2b0
> > [<ffffffff81540794>] sysret_audit+0x17/0x21
> >
> > The reproducer is somewhat unusual:
> >
> > 1. Install RHEL6.5 (maybe a similar older user-space will do)
> > 2. Boot the just installed system
> > 3. In this first boot you'll meet the firstboot script, which
> > will do some setup and (depending on your answers) it will
> > reboot the machine
> > 4. During that first reboot the system hangs while terminating
> > all processes:
> >
> > Sending all processes the TERM signal...
> >
> > It's when the soft lockup above happens. And yes, I managed
> > to get this with latest upstream kernel (HEAD fa8218def1b1)
> >
> > I'm reproducing it on a VM, but the first report was on bare-metal.
> >
> > This is what is happening:
> >
> > 1. audit_log_start() is called
> > 2. As we have SKBs waiting in audit_skb_queue and all conditions
> > evaluate to true, we sleep in wait_for_auditd()
> > 3. Go to 2, until sleep_time gets negative and audit_log_start()
> > just busy-waits
> >
> > Now, *why* this is happening is a mistery to me. I tried debugging
> > it, but all I could find is that at some point the kauditd thread
> > never wakes up after having called schedule(). I even tried waking
> > it up before calling wait_for_auditd(), but it didn't.
>
> We run into the same problem in rhel6 kernel.
>
> "readahead-collector" uses audit interface and sometimes stuck in 'stopped' state.

Yes, please also see:

http://marc.info/?l=linux-kernel&m=137818375024600&w=2

> After commit 829199197a430dade2519d54f5545c4a094393b8
> (which was backported by RH into their kernel)
> audit emiters will block forever if userspace daemon cannot handle backlog.
> That commit just breaks timeout condition, after timeout waiting loop turns
> into busy loop until deamon dies or returns back to work.
>
> this trivial patch should fix that problem
>
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -1215,9 +1215,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
>
> sleep_time = timeout_start + audit_backlog_wait_time -
> jiffies;
> - if ((long)sleep_time > 0)
> + if ((long)sleep_time > 0) {
> wait_for_auditd(sleep_time);
> - continue;
> + continue;
> + }
> }


Chuck Anderson posted a similar fix:

http://marc.info/?l=linux-kernel&m=137817994623832&w=2

I still get a hang for around a minute with Chuck's fix, I believe I'll
get the same thing with yours.

2013-09-09 15:19:19

by Konstantin Khlebnikov

[permalink] [raw]
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()

Luiz Capitulino wrote:
> On Mon, 09 Sep 2013 18:32:13 +0400
> Konstantin Khlebnikov<[email protected]> wrote:
>
>> Luiz Capitulino wrote:
>>> I'm getting the following soft lockup:
>>>
>>> CPU: 6 PID: 2278 Comm: killall5 Tainted: GF 3.11.0-rc7+ #1
>>> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>>> 0000000000000099 ffff88011fd83de8 ffffffff815324df 0000000000002800
>>> ffffffff817d48f9 ffff88011fd83e68 ffffffff8152e669 ffff88011fd83e68
>>> ffffffff00000008 ffff88011fd83e78 ffff88011fd83e18 0000004081dac040
>>> Call Trace:
>>> <IRQ> [<ffffffff815324df>] dump_stack+0x46/0x58
>>> [<ffffffff8152e669>] panic+0xbb/0x1c4
>>> [<ffffffff810d03c3>] watchdog_timer_fn+0x163/0x170
>>> [<ffffffff8106c691>] __run_hrtimer+0x81/0x1c0
>>> [<ffffffff810d0260>] ? watchdog+0x30/0x30
>>> [<ffffffff8106cea7>] hrtimer_interrupt+0x107/0x240
>>> [<ffffffff8102f61b>] local_apic_timer_interrupt+0x3b/0x60
>>> [<ffffffff81542465>] smp_apic_timer_interrupt+0x45/0x60
>>> [<ffffffff8154124a>] apic_timer_interrupt+0x6a/0x70
>>> <EOI> [<ffffffff810c2f5f>] ? audit_log_start+0xbf/0x430
>>> [<ffffffff810c2fe7>] ? audit_log_start+0x147/0x430
>>> [<ffffffff81079030>] ? try_to_wake_up+0x2a0/0x2a0
>>> [<ffffffff810c86be>] audit_log_exit+0x6ae/0xc30
>>> [<ffffffff81188662>] ? __alloc_fd+0x42/0x100
>>> [<ffffffff810c98e7>] __audit_syscall_exit+0x257/0x2b0
>>> [<ffffffff81540794>] sysret_audit+0x17/0x21
>>>
>>> The reproducer is somewhat unusual:
>>>
>>> 1. Install RHEL6.5 (maybe a similar older user-space will do)
>>> 2. Boot the just installed system
>>> 3. In this first boot you'll meet the firstboot script, which
>>> will do some setup and (depending on your answers) it will
>>> reboot the machine
>>> 4. During that first reboot the system hangs while terminating
>>> all processes:
>>>
>>> Sending all processes the TERM signal...
>>>
>>> It's when the soft lockup above happens. And yes, I managed
>>> to get this with latest upstream kernel (HEAD fa8218def1b1)
>>>
>>> I'm reproducing it on a VM, but the first report was on bare-metal.
>>>
>>> This is what is happening:
>>>
>>> 1. audit_log_start() is called
>>> 2. As we have SKBs waiting in audit_skb_queue and all conditions
>>> evaluate to true, we sleep in wait_for_auditd()
>>> 3. Go to 2, until sleep_time gets negative and audit_log_start()
>>> just busy-waits
>>>
>>> Now, *why* this is happening is a mistery to me. I tried debugging
>>> it, but all I could find is that at some point the kauditd thread
>>> never wakes up after having called schedule(). I even tried waking
>>> it up before calling wait_for_auditd(), but it didn't.
>>
>> We run into the same problem in rhel6 kernel.
>>
>> "readahead-collector" uses audit interface and sometimes stuck in 'stopped' state.
>
> Yes, please also see:
>
> http://marc.info/?l=linux-kernel&m=137818375024600&w=2
>
>> After commit 829199197a430dade2519d54f5545c4a094393b8
>> (which was backported by RH into their kernel)
>> audit emiters will block forever if userspace daemon cannot handle backlog.
>> That commit just breaks timeout condition, after timeout waiting loop turns
>> into busy loop until deamon dies or returns back to work.
>>
>> this trivial patch should fix that problem
>>
>> --- a/kernel/audit.c
>> +++ b/kernel/audit.c
>> @@ -1215,9 +1215,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
>>
>> sleep_time = timeout_start + audit_backlog_wait_time -
>> jiffies;
>> - if ((long)sleep_time> 0)
>> + if ((long)sleep_time> 0) {
>> wait_for_auditd(sleep_time);
>> - continue;
>> + continue;
>> + }
>> }
>
>
> Chuck Anderson posted a similar fix:
>
> http://marc.info/?l=linux-kernel&m=137817994623832&w=2
>
> I still get a hang for around a minute with Chuck's fix, I believe I'll
> get the same thing with yours.

Yep, this is normal behaviour -- default audit_backlog_wait_time is 60 seconds.

But this logic is really strange, I don't see any interface for tuning that timeout
and seems like kernel set it to zero after first disaster and never recovers it back.

2013-09-09 15:30:03

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()

On Mon, 09 Sep 2013 19:19:14 +0400
Konstantin Khlebnikov <[email protected]> wrote:

> Luiz Capitulino wrote:
> > On Mon, 09 Sep 2013 18:32:13 +0400
> > Konstantin Khlebnikov<[email protected]> wrote:
> >
> >> Luiz Capitulino wrote:
> >>> I'm getting the following soft lockup:
> >>>
> >>> CPU: 6 PID: 2278 Comm: killall5 Tainted: GF 3.11.0-rc7+ #1
> >>> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> >>> 0000000000000099 ffff88011fd83de8 ffffffff815324df 0000000000002800
> >>> ffffffff817d48f9 ffff88011fd83e68 ffffffff8152e669 ffff88011fd83e68
> >>> ffffffff00000008 ffff88011fd83e78 ffff88011fd83e18 0000004081dac040
> >>> Call Trace:
> >>> <IRQ> [<ffffffff815324df>] dump_stack+0x46/0x58
> >>> [<ffffffff8152e669>] panic+0xbb/0x1c4
> >>> [<ffffffff810d03c3>] watchdog_timer_fn+0x163/0x170
> >>> [<ffffffff8106c691>] __run_hrtimer+0x81/0x1c0
> >>> [<ffffffff810d0260>] ? watchdog+0x30/0x30
> >>> [<ffffffff8106cea7>] hrtimer_interrupt+0x107/0x240
> >>> [<ffffffff8102f61b>] local_apic_timer_interrupt+0x3b/0x60
> >>> [<ffffffff81542465>] smp_apic_timer_interrupt+0x45/0x60
> >>> [<ffffffff8154124a>] apic_timer_interrupt+0x6a/0x70
> >>> <EOI> [<ffffffff810c2f5f>] ? audit_log_start+0xbf/0x430
> >>> [<ffffffff810c2fe7>] ? audit_log_start+0x147/0x430
> >>> [<ffffffff81079030>] ? try_to_wake_up+0x2a0/0x2a0
> >>> [<ffffffff810c86be>] audit_log_exit+0x6ae/0xc30
> >>> [<ffffffff81188662>] ? __alloc_fd+0x42/0x100
> >>> [<ffffffff810c98e7>] __audit_syscall_exit+0x257/0x2b0
> >>> [<ffffffff81540794>] sysret_audit+0x17/0x21
> >>>
> >>> The reproducer is somewhat unusual:
> >>>
> >>> 1. Install RHEL6.5 (maybe a similar older user-space will do)
> >>> 2. Boot the just installed system
> >>> 3. In this first boot you'll meet the firstboot script, which
> >>> will do some setup and (depending on your answers) it will
> >>> reboot the machine
> >>> 4. During that first reboot the system hangs while terminating
> >>> all processes:
> >>>
> >>> Sending all processes the TERM signal...
> >>>
> >>> It's when the soft lockup above happens. And yes, I managed
> >>> to get this with latest upstream kernel (HEAD fa8218def1b1)
> >>>
> >>> I'm reproducing it on a VM, but the first report was on bare-metal.
> >>>
> >>> This is what is happening:
> >>>
> >>> 1. audit_log_start() is called
> >>> 2. As we have SKBs waiting in audit_skb_queue and all conditions
> >>> evaluate to true, we sleep in wait_for_auditd()
> >>> 3. Go to 2, until sleep_time gets negative and audit_log_start()
> >>> just busy-waits
> >>>
> >>> Now, *why* this is happening is a mistery to me. I tried debugging
> >>> it, but all I could find is that at some point the kauditd thread
> >>> never wakes up after having called schedule(). I even tried waking
> >>> it up before calling wait_for_auditd(), but it didn't.
> >>
> >> We run into the same problem in rhel6 kernel.
> >>
> >> "readahead-collector" uses audit interface and sometimes stuck in 'stopped' state.
> >
> > Yes, please also see:
> >
> > http://marc.info/?l=linux-kernel&m=137818375024600&w=2
> >
> >> After commit 829199197a430dade2519d54f5545c4a094393b8
> >> (which was backported by RH into their kernel)
> >> audit emiters will block forever if userspace daemon cannot handle backlog.
> >> That commit just breaks timeout condition, after timeout waiting loop turns
> >> into busy loop until deamon dies or returns back to work.
> >>
> >> this trivial patch should fix that problem
> >>
> >> --- a/kernel/audit.c
> >> +++ b/kernel/audit.c
> >> @@ -1215,9 +1215,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> >>
> >> sleep_time = timeout_start + audit_backlog_wait_time -
> >> jiffies;
> >> - if ((long)sleep_time> 0)
> >> + if ((long)sleep_time> 0) {
> >> wait_for_auditd(sleep_time);
> >> - continue;
> >> + continue;
> >> + }
> >> }
> >
> >
> > Chuck Anderson posted a similar fix:
> >
> > http://marc.info/?l=linux-kernel&m=137817994623832&w=2
> >
> > I still get a hang for around a minute with Chuck's fix, I believe I'll
> > get the same thing with yours.
>
> Yep, this is normal behaviour -- default audit_backlog_wait_time is 60 seconds.

Yes, I know that's the cause but I don't call it normal behavior to
be unable to use my system during 60 seconds.

> But this logic is really strange, I don't see any interface for tuning that timeout
> and seems like kernel set it to zero after first disaster and never recovers it back.

2013-09-09 15:42:16

by Konstantin Khlebnikov

[permalink] [raw]
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()

Luiz Capitulino wrote:
> On Mon, 09 Sep 2013 19:19:14 +0400
> Konstantin Khlebnikov<[email protected]> wrote:
>
>> Luiz Capitulino wrote:
>>> On Mon, 09 Sep 2013 18:32:13 +0400
>>> Konstantin Khlebnikov<[email protected]> wrote:
>>>
>>>> Luiz Capitulino wrote:
>>>>> I'm getting the following soft lockup:
>>>>>
>>>>> CPU: 6 PID: 2278 Comm: killall5 Tainted: GF 3.11.0-rc7+ #1
>>>>> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>>>>> 0000000000000099 ffff88011fd83de8 ffffffff815324df 0000000000002800
>>>>> ffffffff817d48f9 ffff88011fd83e68 ffffffff8152e669 ffff88011fd83e68
>>>>> ffffffff00000008 ffff88011fd83e78 ffff88011fd83e18 0000004081dac040
>>>>> Call Trace:
>>>>> <IRQ> [<ffffffff815324df>] dump_stack+0x46/0x58
>>>>> [<ffffffff8152e669>] panic+0xbb/0x1c4
>>>>> [<ffffffff810d03c3>] watchdog_timer_fn+0x163/0x170
>>>>> [<ffffffff8106c691>] __run_hrtimer+0x81/0x1c0
>>>>> [<ffffffff810d0260>] ? watchdog+0x30/0x30
>>>>> [<ffffffff8106cea7>] hrtimer_interrupt+0x107/0x240
>>>>> [<ffffffff8102f61b>] local_apic_timer_interrupt+0x3b/0x60
>>>>> [<ffffffff81542465>] smp_apic_timer_interrupt+0x45/0x60
>>>>> [<ffffffff8154124a>] apic_timer_interrupt+0x6a/0x70
>>>>> <EOI> [<ffffffff810c2f5f>] ? audit_log_start+0xbf/0x430
>>>>> [<ffffffff810c2fe7>] ? audit_log_start+0x147/0x430
>>>>> [<ffffffff81079030>] ? try_to_wake_up+0x2a0/0x2a0
>>>>> [<ffffffff810c86be>] audit_log_exit+0x6ae/0xc30
>>>>> [<ffffffff81188662>] ? __alloc_fd+0x42/0x100
>>>>> [<ffffffff810c98e7>] __audit_syscall_exit+0x257/0x2b0
>>>>> [<ffffffff81540794>] sysret_audit+0x17/0x21
>>>>>
>>>>> The reproducer is somewhat unusual:
>>>>>
>>>>> 1. Install RHEL6.5 (maybe a similar older user-space will do)
>>>>> 2. Boot the just installed system
>>>>> 3. In this first boot you'll meet the firstboot script, which
>>>>> will do some setup and (depending on your answers) it will
>>>>> reboot the machine
>>>>> 4. During that first reboot the system hangs while terminating
>>>>> all processes:
>>>>>
>>>>> Sending all processes the TERM signal...
>>>>>
>>>>> It's when the soft lockup above happens. And yes, I managed
>>>>> to get this with latest upstream kernel (HEAD fa8218def1b1)
>>>>>
>>>>> I'm reproducing it on a VM, but the first report was on bare-metal.
>>>>>
>>>>> This is what is happening:
>>>>>
>>>>> 1. audit_log_start() is called
>>>>> 2. As we have SKBs waiting in audit_skb_queue and all conditions
>>>>> evaluate to true, we sleep in wait_for_auditd()
>>>>> 3. Go to 2, until sleep_time gets negative and audit_log_start()
>>>>> just busy-waits
>>>>>
>>>>> Now, *why* this is happening is a mistery to me. I tried debugging
>>>>> it, but all I could find is that at some point the kauditd thread
>>>>> never wakes up after having called schedule(). I even tried waking
>>>>> it up before calling wait_for_auditd(), but it didn't.
>>>>
>>>> We run into the same problem in rhel6 kernel.
>>>>
>>>> "readahead-collector" uses audit interface and sometimes stuck in 'stopped' state.
>>>
>>> Yes, please also see:
>>>
>>> http://marc.info/?l=linux-kernel&m=137818375024600&w=2
>>>
>>>> After commit 829199197a430dade2519d54f5545c4a094393b8
>>>> (which was backported by RH into their kernel)
>>>> audit emiters will block forever if userspace daemon cannot handle backlog.
>>>> That commit just breaks timeout condition, after timeout waiting loop turns
>>>> into busy loop until deamon dies or returns back to work.
>>>>
>>>> this trivial patch should fix that problem
>>>>
>>>> --- a/kernel/audit.c
>>>> +++ b/kernel/audit.c
>>>> @@ -1215,9 +1215,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
>>>>
>>>> sleep_time = timeout_start + audit_backlog_wait_time -
>>>> jiffies;
>>>> - if ((long)sleep_time> 0)
>>>> + if ((long)sleep_time> 0) {
>>>> wait_for_auditd(sleep_time);
>>>> - continue;
>>>> + continue;
>>>> + }
>>>> }
>>>
>>>
>>> Chuck Anderson posted a similar fix:
>>>
>>> http://marc.info/?l=linux-kernel&m=137817994623832&w=2
>>>
>>> I still get a hang for around a minute with Chuck's fix, I believe I'll
>>> get the same thing with yours.
>>
>> Yep, this is normal behaviour -- default audit_backlog_wait_time is 60 seconds.
>
> Yes, I know that's the cause but I don't call it normal behavior to
> be unable to use my system during 60 seconds.

This stuff was designed for auditing, so it shouldn't loose messages too easily.
Meanwhile 'readahead' reuses it for a different purpose without proper tuning.
I think kernel should provide interface for tuning that timeout and let 'readahead'
set it to zero or some small value.

>
>> But this logic is really strange, I don't see any interface for tuning that timeout
>> and seems like kernel set it to zero after first disaster and never recovers it back.

2013-09-10 16:03:29

by Eric Paris

[permalink] [raw]
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()

On Mon, 2013-09-09 at 18:32 +0400, Konstantin Khlebnikov wrote:
> Luiz Capitulino wrote:
> > I'm getting the following soft lockup:
> >
> > CPU: 6 PID: 2278 Comm: killall5 Tainted: GF 3.11.0-rc7+ #1
> > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > 0000000000000099 ffff88011fd83de8 ffffffff815324df 0000000000002800
> > ffffffff817d48f9 ffff88011fd83e68 ffffffff8152e669 ffff88011fd83e68
> > ffffffff00000008 ffff88011fd83e78 ffff88011fd83e18 0000004081dac040
> > Call Trace:
> > <IRQ> [<ffffffff815324df>] dump_stack+0x46/0x58
> > [<ffffffff8152e669>] panic+0xbb/0x1c4
> > [<ffffffff810d03c3>] watchdog_timer_fn+0x163/0x170
> > [<ffffffff8106c691>] __run_hrtimer+0x81/0x1c0
> > [<ffffffff810d0260>] ? watchdog+0x30/0x30
> > [<ffffffff8106cea7>] hrtimer_interrupt+0x107/0x240
> > [<ffffffff8102f61b>] local_apic_timer_interrupt+0x3b/0x60
> > [<ffffffff81542465>] smp_apic_timer_interrupt+0x45/0x60
> > [<ffffffff8154124a>] apic_timer_interrupt+0x6a/0x70
> > <EOI> [<ffffffff810c2f5f>] ? audit_log_start+0xbf/0x430
> > [<ffffffff810c2fe7>] ? audit_log_start+0x147/0x430
> > [<ffffffff81079030>] ? try_to_wake_up+0x2a0/0x2a0
> > [<ffffffff810c86be>] audit_log_exit+0x6ae/0xc30
> > [<ffffffff81188662>] ? __alloc_fd+0x42/0x100
> > [<ffffffff810c98e7>] __audit_syscall_exit+0x257/0x2b0
> > [<ffffffff81540794>] sysret_audit+0x17/0x21
> >
> > The reproducer is somewhat unusual:
> >
> > 1. Install RHEL6.5 (maybe a similar older user-space will do)
> > 2. Boot the just installed system
> > 3. In this first boot you'll meet the firstboot script, which
> > will do some setup and (depending on your answers) it will
> > reboot the machine
> > 4. During that first reboot the system hangs while terminating
> > all processes:
> >
> > Sending all processes the TERM signal...
> >
> > It's when the soft lockup above happens. And yes, I managed
> > to get this with latest upstream kernel (HEAD fa8218def1b1)
> >
> > I'm reproducing it on a VM, but the first report was on bare-metal.
> >
> > This is what is happening:
> >
> > 1. audit_log_start() is called
> > 2. As we have SKBs waiting in audit_skb_queue and all conditions
> > evaluate to true, we sleep in wait_for_auditd()
> > 3. Go to 2, until sleep_time gets negative and audit_log_start()
> > just busy-waits
> >
> > Now, *why* this is happening is a mistery to me. I tried debugging
> > it, but all I could find is that at some point the kauditd thread
> > never wakes up after having called schedule(). I even tried waking
> > it up before calling wait_for_auditd(), but it didn't.
>
> We run into the same problem in rhel6 kernel.
>
> "readahead-collector" uses audit interface and sometimes stuck in 'stopped' state.
>
> After commit 829199197a430dade2519d54f5545c4a094393b8
> (which was backported by RH into their kernel)
> audit emiters will block forever if userspace daemon cannot handle backlog.
> That commit just breaks timeout condition, after timeout waiting loop turns
> into busy loop until deamon dies or returns back to work.
>
> this trivial patch should fix that problem
>
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -1215,9 +1215,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
>
> sleep_time = timeout_start + audit_backlog_wait_time -
> jiffies;
> - if ((long)sleep_time > 0)
> + if ((long)sleep_time > 0) {
> wait_for_auditd(sleep_time);
> - continue;
> + continue;
> + }
> }
> if (audit_rate_check() && printk_ratelimit())
> printk(KERN_WARNING

I think this is the right(ish) fix, at least it gets at the real bug.
829199197a430dade2519d54f5545c4a094393b8 definitely is the problem. The
60 second wait is NOT causing the soft lockup. Instead after the 60
second wait, sleep_time, will go negative on subsequent trips through
the loop. Since sleep_time is negative we will hit the continue and
loop FOREVER with no opportunity for progress. Aka, softlockup...

I think it's a really ?interesting? (poor) coding style to leave
sleep_time going negative on the second trip through the loop to
indicate a timeout in wait_for_auditd(). Maybe the function should be
rewritten a bit. Have a return code from wait_for_auditd, if it waited
60 seconds, call the failure function, if it succeeded in less the 60
seconds continue along... At least something that makes the logic here
obvious.

I think Richard is also going to set audit_backlog_wait_time to 0 on
failure and will reset it to default on success. To address the other
issue as well. That is just dumb.

RGB should be taking care of it in RHEL as soon as he gets this worked
out.

-Eric

2013-09-10 17:45:59

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()

On Tue, 10 Sep 2013 12:03:25 -0400
Eric Paris <[email protected]> wrote:

> On Mon, 2013-09-09 at 18:32 +0400, Konstantin Khlebnikov wrote:
> > Luiz Capitulino wrote:
> > > I'm getting the following soft lockup:
> > >
> > > CPU: 6 PID: 2278 Comm: killall5 Tainted: GF 3.11.0-rc7+ #1
> > > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > > 0000000000000099 ffff88011fd83de8 ffffffff815324df 0000000000002800
> > > ffffffff817d48f9 ffff88011fd83e68 ffffffff8152e669 ffff88011fd83e68
> > > ffffffff00000008 ffff88011fd83e78 ffff88011fd83e18 0000004081dac040
> > > Call Trace:
> > > <IRQ> [<ffffffff815324df>] dump_stack+0x46/0x58
> > > [<ffffffff8152e669>] panic+0xbb/0x1c4
> > > [<ffffffff810d03c3>] watchdog_timer_fn+0x163/0x170
> > > [<ffffffff8106c691>] __run_hrtimer+0x81/0x1c0
> > > [<ffffffff810d0260>] ? watchdog+0x30/0x30
> > > [<ffffffff8106cea7>] hrtimer_interrupt+0x107/0x240
> > > [<ffffffff8102f61b>] local_apic_timer_interrupt+0x3b/0x60
> > > [<ffffffff81542465>] smp_apic_timer_interrupt+0x45/0x60
> > > [<ffffffff8154124a>] apic_timer_interrupt+0x6a/0x70
> > > <EOI> [<ffffffff810c2f5f>] ? audit_log_start+0xbf/0x430
> > > [<ffffffff810c2fe7>] ? audit_log_start+0x147/0x430
> > > [<ffffffff81079030>] ? try_to_wake_up+0x2a0/0x2a0
> > > [<ffffffff810c86be>] audit_log_exit+0x6ae/0xc30
> > > [<ffffffff81188662>] ? __alloc_fd+0x42/0x100
> > > [<ffffffff810c98e7>] __audit_syscall_exit+0x257/0x2b0
> > > [<ffffffff81540794>] sysret_audit+0x17/0x21
> > >
> > > The reproducer is somewhat unusual:
> > >
> > > 1. Install RHEL6.5 (maybe a similar older user-space will do)
> > > 2. Boot the just installed system
> > > 3. In this first boot you'll meet the firstboot script, which
> > > will do some setup and (depending on your answers) it will
> > > reboot the machine
> > > 4. During that first reboot the system hangs while terminating
> > > all processes:
> > >
> > > Sending all processes the TERM signal...
> > >
> > > It's when the soft lockup above happens. And yes, I managed
> > > to get this with latest upstream kernel (HEAD fa8218def1b1)
> > >
> > > I'm reproducing it on a VM, but the first report was on bare-metal.
> > >
> > > This is what is happening:
> > >
> > > 1. audit_log_start() is called
> > > 2. As we have SKBs waiting in audit_skb_queue and all conditions
> > > evaluate to true, we sleep in wait_for_auditd()
> > > 3. Go to 2, until sleep_time gets negative and audit_log_start()
> > > just busy-waits
> > >
> > > Now, *why* this is happening is a mistery to me. I tried debugging
> > > it, but all I could find is that at some point the kauditd thread
> > > never wakes up after having called schedule(). I even tried waking
> > > it up before calling wait_for_auditd(), but it didn't.
> >
> > We run into the same problem in rhel6 kernel.
> >
> > "readahead-collector" uses audit interface and sometimes stuck in 'stopped' state.
> >
> > After commit 829199197a430dade2519d54f5545c4a094393b8
> > (which was backported by RH into their kernel)
> > audit emiters will block forever if userspace daemon cannot handle backlog.
> > That commit just breaks timeout condition, after timeout waiting loop turns
> > into busy loop until deamon dies or returns back to work.
> >
> > this trivial patch should fix that problem
> >
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -1215,9 +1215,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> >
> > sleep_time = timeout_start + audit_backlog_wait_time -
> > jiffies;
> > - if ((long)sleep_time > 0)
> > + if ((long)sleep_time > 0) {
> > wait_for_auditd(sleep_time);
> > - continue;
> > + continue;
> > + }
> > }
> > if (audit_rate_check() && printk_ratelimit())
> > printk(KERN_WARNING
>
> I think this is the right(ish) fix, at least it gets at the real bug.
> 829199197a430dade2519d54f5545c4a094393b8 definitely is the problem. The
> 60 second wait is NOT causing the soft lockup.

It's not. What it does cause is a 60 seconds hang, during which my system
is unusable. Let me stress that. It's not auditd that gets a 60 second hang.
My _impression_ is that any process doing a system call will hang there,
which causes the system to become unusable. IMHO, that's not acceptable
behavior.

Now, I don't know if the other changes you suggest Richard doing are going
to help with that (if they are then I'm all for them). Also, I wonder why
this long hang is not a problem for Chuck's and Konstantin's test-cases,
given they submitted similar fixes (which are also similar to my RFC).

2013-09-17 22:28:44

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()

On Tue, 10 Sep 2013 12:03:25 -0400 Eric Paris <[email protected]> wrote:

> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -1215,9 +1215,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> >
> > sleep_time = timeout_start + audit_backlog_wait_time -
> > jiffies;
> > - if ((long)sleep_time > 0)
> > + if ((long)sleep_time > 0) {
> > wait_for_auditd(sleep_time);
> > - continue;
> > + continue;
> > + }
> > }
> > if (audit_rate_check() && printk_ratelimit())
> > printk(KERN_WARNING
>
> I think this is the right(ish) fix, at least it gets at the real bug.
> 829199197a430dade2519d54f5545c4a094393b8 definitely is the problem.

um, which idiot wrote that?


Thngs are somewhat foggy at present. I have two patches from
Dan/Chuck:

Subject: audit: fix soft lockups due to loop in audit_log_start() wh,en audit_backlog_limit exceeded
Subject: audit: two efficiency fixes for audit mechanism

and two from Luiz:

Subject: audit: flush_hold_queue(): don't drop queued SKBs
Subject: audit: kaudit_send_skb(): make non-blocking call to netlink_unicast()

and now a protopatch from Konstantin which eparis likes.

So, umm, guys, can you please devote a bit of time to working out what
we should do here?

2013-09-17 22:54:11

by Luiz Capitulino

[permalink] [raw]
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()

On Tue, 17 Sep 2013 15:28:42 -0700
Andrew Morton <[email protected]> wrote:

> On Tue, 10 Sep 2013 12:03:25 -0400 Eric Paris <[email protected]> wrote:
>
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
> > > @@ -1215,9 +1215,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> > >
> > > sleep_time = timeout_start + audit_backlog_wait_time -
> > > jiffies;
> > > - if ((long)sleep_time > 0)
> > > + if ((long)sleep_time > 0) {
> > > wait_for_auditd(sleep_time);
> > > - continue;
> > > + continue;
> > > + }
> > > }
> > > if (audit_rate_check() && printk_ratelimit())
> > > printk(KERN_WARNING
> >
> > I think this is the right(ish) fix, at least it gets at the real bug.
> > 829199197a430dade2519d54f5545c4a094393b8 definitely is the problem.
>
> um, which idiot wrote that?

LOL!

> Thngs are somewhat foggy at present. I have two patches from
> Dan/Chuck:
>
> Subject: audit: fix soft lockups due to loop in audit_log_start() wh,en audit_backlog_limit exceeded
> Subject: audit: two efficiency fixes for audit mechanism
>
> and two from Luiz:
>
> Subject: audit: flush_hold_queue(): don't drop queued SKBs
> Subject: audit: kaudit_send_skb(): make non-blocking call to netlink_unicast()
>
> and now a protopatch from Konstantin which eparis likes.
>
> So, umm, guys, can you please devote a bit of time to working out what
> we should do here?

You can drop my patches. Konstantin's patch is a better version of my
first RFC. My second series is kind of a new concept which the audit
team seems to disagree with, and I won't push hard on it.

2013-09-18 01:57:42

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()

On Tue, Sep 17, 2013 at 03:28:42PM -0700, Andrew Morton wrote:
> On Tue, 10 Sep 2013 12:03:25 -0400 Eric Paris <[email protected]> wrote:
>
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
> > > @@ -1215,9 +1215,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> > >
> > > sleep_time = timeout_start + audit_backlog_wait_time -
> > > jiffies;
> > > - if ((long)sleep_time > 0)
> > > + if ((long)sleep_time > 0) {
> > > wait_for_auditd(sleep_time);
> > > - continue;
> > > + continue;
> > > + }
> > > }
> > > if (audit_rate_check() && printk_ratelimit())
> > > printk(KERN_WARNING
> >
> > I think this is the right(ish) fix, at least it gets at the real bug.
> > 829199197a430dade2519d54f5545c4a094393b8 definitely is the problem.
>
> um, which idiot wrote that?

Heh...

> Thngs are somewhat foggy at present. I have two patches from
> Dan/Chuck:
>
> Subject: audit: fix soft lockups due to loop in audit_log_start() wh,en audit_backlog_limit exceeded
> Subject: audit: two efficiency fixes for audit mechanism

I have a patchset very similar to the first and breaking up the second,
with some added bits to address other related issues I'll post RSN...

> and two from Luiz:
>
> Subject: audit: flush_hold_queue(): don't drop queued SKBs
> Subject: audit: kaudit_send_skb(): make non-blocking call to netlink_unicast()
>
> and now a protopatch from Konstantin which eparis likes.
>
> So, umm, guys, can you please devote a bit of time to working out what
> we should do here?

It is coming... I'm hearing reports from others of the same bug, so it
is getting around...


- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer
Kernel Security
AMER ENG Base Operating Systems
Remote, Ottawa, Canada
Voice: +1.647.777.2635
Internal: (81) 32635
Alt: +1.613.693.0684x3545

2013-09-18 09:48:22

by Konstantin Khlebnikov

[permalink] [raw]
Subject: [PATCH] audit: fix endless wait in audit_log_start()

After commit 829199197a430dade2519d54f5545c4a094393b8 audit emiters will block
forever if userspace daemon cannot handle backlog. After the timeout waiting
loop turns into busy loop and runs until daemon dies or returns back to work.
This is minimal patch for that bug.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Richard Guy Briggs <[email protected]>
Cc: Eric Paris <[email protected]>
Cc: Chuck Anderson <[email protected]>
Cc: Dan Duval <[email protected]>
Cc: Dave Kleikamp <[email protected]>
---
kernel/audit.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 91e53d0..7b0e23a 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1117,9 +1117,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,

sleep_time = timeout_start + audit_backlog_wait_time -
jiffies;
- if ((long)sleep_time > 0)
+ if ((long)sleep_time > 0) {
wait_for_auditd(sleep_time);
- continue;
+ continue;
+ }
}
if (audit_rate_check() && printk_ratelimit())
printk(KERN_WARNING

2013-09-18 13:31:59

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH] audit: fix endless wait in audit_log_start()

On Wed, Sep 18, 2013 at 01:48:17PM +0400, Konstantin Khlebnikov wrote:
> After commit 829199197a430dade2519d54f5545c4a094393b8 audit emiters will block
> forever if userspace daemon cannot handle backlog. After the timeout waiting
> loop turns into busy loop and runs until daemon dies or returns back to work.
> This is minimal patch for that bug.

Thanks Konstantin,

This patch is in my patchset...

> Signed-off-by: Konstantin Khlebnikov <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Luiz Capitulino <[email protected]>
> Cc: Richard Guy Briggs <[email protected]>
> Cc: Eric Paris <[email protected]>
> Cc: Chuck Anderson <[email protected]>
> Cc: Dan Duval <[email protected]>
> Cc: Dave Kleikamp <[email protected]>
> ---
> kernel/audit.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 91e53d0..7b0e23a 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -1117,9 +1117,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
>
> sleep_time = timeout_start + audit_backlog_wait_time -
> jiffies;
> - if ((long)sleep_time > 0)
> + if ((long)sleep_time > 0) {
> wait_for_auditd(sleep_time);
> - continue;
> + continue;
> + }
> }
> if (audit_rate_check() && printk_ratelimit())
> printk(KERN_WARNING
>

- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer
Kernel Security
AMER ENG Base Operating Systems
Remote, Ottawa, Canada
Voice: +1.647.777.2635
Internal: (81) 32635
Alt: +1.613.693.0684x3545

2013-09-18 19:07:55

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH 4/8] audit: efficiency fix 1: only wake up if queue shorter than backlog limit

author: Dan Duval <[email protected]>

These and similar errors were seen on a patched 3.8 kernel when the
audit subsystem was overrun during boot:

udevd[876]: worker [887] unexpectedly returned with status 0x0100
udevd[876]: worker [887] failed while handling
'/devices/pci0000:00/0000:00:03.0/0000:40:00.0'
udevd[876]: worker [880] unexpectedly returned with status 0x0100
udevd[876]: worker [880] failed while handling
'/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1/event1'

udevadm settle - timeout of 180 seconds reached, the event queue
contains:
/sys/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1/event1 (3995)
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/INT3F0D:00 (4034)

audit: audit_backlog=258 > audit_backlog_limit=256
audit: audit_lost=1 audit_rate_limit=0 audit_backlog_limit=256

The change below increases the efficiency of the audit code and prevents it
from being overrun:

Only issue a wake_up in kauditd if the length of the skb queue is less than the
backlog limit. Otherwise, threads waiting in wait_for_auditd() will simply
wake up, discover that the queue is still too long for them to proceed, and go
back to sleep. This results in wasted context switches and machine cycles.
kauditd_thread() is the only function that removes buffers from audit_skb_queue
so we can't race. If we did, the timeout in wait_for_auditd() would expire and
the waiting thread would continue.

See: https://lkml.org/lkml/2013/9/2/479

Signed-off-by: Dan Duval <[email protected]>
Signed-off-by: Chuck Anderson <[email protected]>
Signed-off-by: Richard Guy Briggs <[email protected]>
---
kernel/audit.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 42c68db..25fab2d 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -448,8 +448,10 @@ static int kauditd_thread(void *dummy)
flush_hold_queue();

skb = skb_dequeue(&audit_skb_queue);
- wake_up(&audit_backlog_wait);
+
if (skb) {
+ if(skb_queue_len(&audit_skb_queue) <= audit_backlog_limit)
+ wake_up(&audit_backlog_wait);
if (audit_pid)
kauditd_send_skb(skb);
else
--
1.7.1

2013-09-18 19:07:58

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH 8/8] audit: add audit_backlog_wait_time configuration option

reaahead-collector abuses the audit logging facility to discover which files
are accessed at boot time to make a pre-load list

Add a tuning option to audit_backlog_wait_time so that if auditd can't keep up,
or gets blocked, the callers won't be blocked.

Signed-off-by: Richard Guy Briggs <[email protected]>
---
include/uapi/linux/audit.h | 2 ++
kernel/audit.c | 22 +++++++++++++++++++++-
2 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 75cef3f..493a66e 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -316,6 +316,7 @@ enum {
#define AUDIT_STATUS_PID 0x0004
#define AUDIT_STATUS_RATE_LIMIT 0x0008
#define AUDIT_STATUS_BACKLOG_LIMIT 0x0010
+#define AUDIT_STATUS_BACKLOG_WAIT_TIME 0x0020
/* Failure-to-log actions */
#define AUDIT_FAIL_SILENT 0
#define AUDIT_FAIL_PRINTK 1
@@ -367,6 +368,7 @@ struct audit_status {
__u32 backlog_limit; /* waiting messages limit */
__u32 lost; /* messages lost */
__u32 backlog; /* messages waiting in queue */
+ __u32 backlog_wait_time;/* message queue wait timeout */
};

struct audit_tty_status {
diff --git a/kernel/audit.c b/kernel/audit.c
index 3d17670..fc535b6 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -321,6 +321,12 @@ static int audit_set_backlog_limit(int limit)
return audit_do_config_change("audit_backlog_limit", &audit_backlog_limit, limit);
}

+static int audit_set_backlog_wait_time(int timeout)
+{
+ return audit_do_config_change("audit_backlog_wait_time",
+ &audit_backlog_wait_time, timeout);
+}
+
static int audit_set_enabled(int state)
{
int rc;
@@ -669,6 +675,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
s.backlog_limit = audit_backlog_limit;
s.lost = atomic_read(&audit_lost);
s.backlog = skb_queue_len(&audit_skb_queue);
+ s.backlog_wait_time = audit_backlog_wait_time;
audit_send_reply(NETLINK_CB(skb).portid, seq, AUDIT_GET, 0, 0,
&s, sizeof(s));
break;
@@ -701,8 +708,21 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
if (err < 0)
return err;
}
- if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT)
+ if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT) {
err = audit_set_backlog_limit(s.backlog_limit);
+ if (err < 0)
+ return err;
+ }
+ if (s.mask & AUDIT_STATUS_BACKLOG_WAIT_TIME) {
+ if (sizeof(s) > (size_t)nlh->nlmsg_len)
+ break;
+ if (s.backlog_wait_time < 0 ||
+ s.backlog_wait_time > 10*AUDIT_BACKLOG_WAIT_TIME)
+ return -EINVAL;
+ err = audit_set_backlog_wait_time(s.backlog_wait_time);
+ if (err < 0)
+ return err;
+ }
break;
}
case AUDIT_USER:
--
1.7.1

2013-09-18 19:08:11

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH 5/8] audit: efficiency fix 2: request exclusive wait since all need same resource

author: Dan Duval <[email protected]>

These and similar errors were seen on a patched 3.8 kernel when the
audit subsystem was overrun during boot:

udevd[876]: worker [887] unexpectedly returned with status 0x0100
udevd[876]: worker [887] failed while handling
'/devices/pci0000:00/0000:00:03.0/0000:40:00.0'
udevd[876]: worker [880] unexpectedly returned with status 0x0100
udevd[876]: worker [880] failed while handling
'/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1/event1'

udevadm settle - timeout of 180 seconds reached, the event queue
contains:
/sys/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1/event1 (3995)
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/INT3F0D:00 (4034)

audit: audit_backlog=258 > audit_backlog_limit=256
audit: audit_lost=1 audit_rate_limit=0 audit_backlog_limit=256

The change below increases the efficiency of the audit code and prevents it
from being overrun:

Use add_wait_queue_exclusive() in wait_for_auditd() to put the
thread on the wait queue. When kauditd dequeues an skb, all
of the waiting threads are waiting for the same resource, but
only one is going to get it, so there's no need to wake up
more than one waiter.

See: https://lkml.org/lkml/2013/9/2/479

Signed-off-by: Dan Duval <[email protected]>
Signed-off-by: Chuck Anderson <[email protected]>
Signed-off-by: Richard Guy Briggs <[email protected]>
---
kernel/audit.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 25fab2d..990d02f 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1061,7 +1061,7 @@ static unsigned long wait_for_auditd(unsigned long sleep_time)
unsigned long timeout = sleep_time;
DECLARE_WAITQUEUE(wait, current);
set_current_state(TASK_UNINTERRUPTIBLE);
- add_wait_queue(&audit_backlog_wait, &wait);
+ add_wait_queue_exclusive(&audit_backlog_wait, &wait);

if (audit_backlog_limit &&
skb_queue_len(&audit_skb_queue) > audit_backlog_limit)
--
1.7.1

2013-09-18 19:08:04

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH 1/8] audit: avoid soft lockup due to audit_log_start() incorrect loop termination

Commit 82919919 caused the wait for auditd timeout condition to loop endlessly
rather than fall through to the error recovery code.

BUG: soft lockup - CPU#0 stuck for 22s! [preload:785]
RIP: 0010:[<ffffffff810fb240>] [<ffffffff810fb240>] audit_log_start+0xf0/0x460
Call Trace:
[<ffffffff810aca40>] ? try_to_wake_up+0x310/0x310
[<ffffffff81100fd1>] audit_log_exit+0x51/0xcb0
[<ffffffff811020b5>] __audit_syscall_exit+0x275/0x2d0
[<ffffffff816ec540>] sysret_audit+0x17/0x21

If the timeout condition goes negative, the loop continues endlessly instead of
breaking out and going to the failure code and allow other processes to run
when auditd is unable to drain the queue fast enough.

This can easily be triggered by readahead-collector, in particular during
installations. The readahead-collector (ab)uses the audit interface and
sometimes gets stuck in a 'stopped' state.

To trigger:
readahead-collector -f &
pkill -SIGSTOP readahead-collector
top

See:
https://lkml.org/lkml/2013/8/28/626
https://lkml.org/lkml/2013/9/2/471
https://lkml.org/lkml/2013/9/3/4

Signed-off-by: Luiz Capitulino <[email protected]>
Signed-off-by: Konstantin Khlebnikov <[email protected]>
Signed-off-by: Dan Duval <[email protected]>
Signed-off-by: Chuck Anderson <[email protected]>
Signed-off-by: Richard Guy Briggs <[email protected]>
---
kernel/audit.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 91e53d0..7b0e23a 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1117,9 +1117,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,

sleep_time = timeout_start + audit_backlog_wait_time -
jiffies;
- if ((long)sleep_time > 0)
+ if ((long)sleep_time > 0) {
wait_for_auditd(sleep_time);
- continue;
+ continue;
+ }
}
if (audit_rate_check() && printk_ratelimit())
printk(KERN_WARNING
--
1.7.1

2013-09-18 19:08:34

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH 3/8] audit: make use of remaining sleep time from wait_for_auditd

If wait_for_auditd() times out, go immediately to the error function rather
than retesting the loop conditions.

Signed-off-by: Richard Guy Briggs <[email protected]>
---
kernel/audit.c | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 772725e..42c68db 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1054,18 +1054,21 @@ static inline void audit_get_stamp(struct audit_context *ctx,
/*
* Wait for auditd to drain the queue a little
*/
-static void wait_for_auditd(unsigned long sleep_time)
+static unsigned long wait_for_auditd(unsigned long sleep_time)
{
+ unsigned long timeout = sleep_time;
DECLARE_WAITQUEUE(wait, current);
set_current_state(TASK_UNINTERRUPTIBLE);
add_wait_queue(&audit_backlog_wait, &wait);

if (audit_backlog_limit &&
skb_queue_len(&audit_skb_queue) > audit_backlog_limit)
- schedule_timeout(sleep_time);
+ timeout = schedule_timeout(sleep_time);

__set_current_state(TASK_RUNNING);
remove_wait_queue(&audit_backlog_wait, &wait);
+
+ return timeout;
}

/* Obtain an audit buffer. This routine does locking to obtain the
@@ -1119,8 +1122,9 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
sleep_time = timeout_start + audit_backlog_wait_time -
jiffies;
if ((long)sleep_time > 0) {
- wait_for_auditd(sleep_time);
- continue;
+ sleep_time = wait_for_auditd(sleep_time);
+ if ((long)sleep_time > 0)
+ continue;
}
}
if (audit_rate_check() && printk_ratelimit())
--
1.7.1

2013-09-18 19:08:33

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH 0/8] Audit backlog queue fixes related to soft lockup

This patchset covers a number of issues surrounding the audit backlog queue.
The original trigger was a bug in a patch to fix code that produced negative
sleep times (commit 82919919). This bug caused soft lockups. There were a
number of other issues raised surrounding this bug which have been attempted
to be addressed in this patchset.

The first patch fixes this obvious bug.

The 2nd fixes the lack of reset to the original wait time that was set to
zero under error conditions should the system recover from the conditions that
tickled the first bug.

The 3rd makes use of the remaining timeout to avoid re-checking the loop
conditions on error.

The 4th and 5th are efficiency fixes for waking up waiting processes only when
necessary.

The 6th adds a boot parameter to temporarily override the backlog queue length
in case more buffers are needed before auditd is available.

The 7th and 8th are to add a config option to make the backlog wait time
configurable from the hard-coded default.


Richard Guy Briggs (8):
audit: avoid soft lockup due to audit_log_start() incorrect loop
termination
audit: reset audit backlog wait time after error recovery
audit: make use of remaining sleep time from wait_for_auditd
audit: efficiency fix 1: only wake up if queue shorter than backlog
limit
audit: efficiency fix 2: request exclusive wait since all need same
resource
audit: add boot option to override default backlog limit
audit: clean up AUDIT_GET/SET local variables and future-proof API
audit: add audit_backlog_wait_time configuration option

include/uapi/linux/audit.h | 2 +
kernel/audit.c | 111 +++++++++++++++++++++++++++++++-------------
2 files changed, 80 insertions(+), 33 deletions(-)

2013-09-18 19:09:00

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH 2/8] audit: reset audit backlog wait time after error recovery

When the audit queue overflows and times out (audit_backlog_wait_time), the
audit queue overflow timeout is set to zero. Once the audit queue overflow
timeout condition recovers, the timeout should be reset to the original value.

See also:
https://lkml.org/lkml/2013/9/2/473

Signed-off-by: Luiz Capitulino <[email protected]>
Signed-off-by: Dan Duval <[email protected]>
Signed-off-by: Chuck Anderson <[email protected]>
Signed-off-by: Richard Guy Briggs <[email protected]>
---
kernel/audit.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 7b0e23a..772725e 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -103,7 +103,8 @@ static int audit_rate_limit;

/* Number of outstanding audit_buffers allowed. */
static int audit_backlog_limit = 64;
-static int audit_backlog_wait_time = 60 * HZ;
+#define AUDIT_BACKLOG_WAIT_TIME (60 * HZ)
+static int audit_backlog_wait_time = AUDIT_BACKLOG_WAIT_TIME;
static int audit_backlog_wait_overflow = 0;

/* The identity of the user shutting down the audit system. */
@@ -1134,6 +1135,8 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
return NULL;
}

+ audit_backlog_wait_time = AUDIT_BACKLOG_WAIT_TIME;
+
ab = audit_buffer_alloc(ctx, gfp_mask, type);
if (!ab) {
audit_log_lost("out of memory in audit_log_start");
--
1.7.1

2013-09-18 19:09:26

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH 7/8] audit: clean up AUDIT_GET/SET local variables and future-proof API

Re-named confusing local variable names (status_set and status_get didn't agree
with their command type name) and reduced their scope.

Future-proof API changes by not depending on the exact size of the audit_status
struct.

Signed-off-by: Richard Guy Briggs <[email protected]>
---
kernel/audit.c | 51 +++++++++++++++++++++++++++------------------------
1 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index acfa7a9..3d17670 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -635,7 +635,6 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
{
u32 seq;
void *data;
- struct audit_status *status_get, status_set;
int err;
struct audit_buffer *ab;
u16 msg_type = nlh->nlmsg_type;
@@ -661,47 +660,51 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
data = nlmsg_data(nlh);

switch (msg_type) {
- case AUDIT_GET:
- status_set.enabled = audit_enabled;
- status_set.failure = audit_failure;
- status_set.pid = audit_pid;
- status_set.rate_limit = audit_rate_limit;
- status_set.backlog_limit = audit_backlog_limit;
- status_set.lost = atomic_read(&audit_lost);
- status_set.backlog = skb_queue_len(&audit_skb_queue);
+ case AUDIT_GET: {
+ struct audit_status s;
+ s.enabled = audit_enabled;
+ s.failure = audit_failure;
+ s.pid = audit_pid;
+ s.rate_limit = audit_rate_limit;
+ s.backlog_limit = audit_backlog_limit;
+ s.lost = atomic_read(&audit_lost);
+ s.backlog = skb_queue_len(&audit_skb_queue);
audit_send_reply(NETLINK_CB(skb).portid, seq, AUDIT_GET, 0, 0,
- &status_set, sizeof(status_set));
+ &s, sizeof(s));
break;
- case AUDIT_SET:
- if (nlh->nlmsg_len < sizeof(struct audit_status))
- return -EINVAL;
- status_get = (struct audit_status *)data;
- if (status_get->mask & AUDIT_STATUS_ENABLED) {
- err = audit_set_enabled(status_get->enabled);
+ }
+ case AUDIT_SET: {
+ struct audit_status s;
+ memset(&s, 0, sizeof(s));
+ /* guard against past and future API changes */
+ memcpy(&s, data, min(sizeof(s), (size_t)nlh->nlmsg_len));
+ if (s.mask & AUDIT_STATUS_ENABLED) {
+ err = audit_set_enabled(s.enabled);
if (err < 0)
return err;
}
- if (status_get->mask & AUDIT_STATUS_FAILURE) {
- err = audit_set_failure(status_get->failure);
+ if (s.mask & AUDIT_STATUS_FAILURE) {
+ err = audit_set_failure(s.failure);
if (err < 0)
return err;
}
- if (status_get->mask & AUDIT_STATUS_PID) {
- int new_pid = status_get->pid;
+ if (s.mask & AUDIT_STATUS_PID) {
+ int new_pid = s.pid;

if (audit_enabled != AUDIT_OFF)
audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
audit_pid = new_pid;
audit_nlk_portid = NETLINK_CB(skb).portid;
}
- if (status_get->mask & AUDIT_STATUS_RATE_LIMIT) {
- err = audit_set_rate_limit(status_get->rate_limit);
+ if (s.mask & AUDIT_STATUS_RATE_LIMIT) {
+ err = audit_set_rate_limit(s.rate_limit);
if (err < 0)
return err;
}
- if (status_get->mask & AUDIT_STATUS_BACKLOG_LIMIT)
- err = audit_set_backlog_limit(status_get->backlog_limit);
+ if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT)
+ err = audit_set_backlog_limit(s.backlog_limit);
break;
+ }
case AUDIT_USER:
case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
--
1.7.1

2013-09-18 19:09:58

by Richard Guy Briggs

[permalink] [raw]
Subject: [PATCH 6/8] audit: add boot option to override default backlog limit

The default audit_backlog_limit is 64. This was a reasonable limit at one time.

systemd causes so much audit queue activity on startup that auditd doesn't
start before the backlog queue has already overflowed by more than a factor of
2. On a system with audit= not set on the kernel command line, this isn't an
issue since that history isn't kept for auditd when it is available. On a
system with audit=1 set on the kernel command line, kaudit tries to keep that
history until auditd is able to drain the queue.

This default can be changed by the "-b" option in audit.rules once the system
has booted, but won't help with lost messages on boot.

One way to solve this would be to increase the default backlog queue size to
avoid losing any messages before auditd is able to consume them. This would
be overkill to the embedded community and insufficient for some servers.

Another way to solve it might be to add a kconfig option to set the default
based on the system type. An embedded system would get the current (or
smaller) default, while Workstations might get more than now and servers might
get more.

None of these solutions helps if a system's compiled default is too small to
see the lost messages without compiling a new kernel.

This patch adds a boot option (audit already has one to enable/disable it)
"audit_backlog_limit=<n>" that overrides the default to allow the system
administrator to set the backlog limit.

Signed-off-by: Richard Guy Briggs <[email protected]>
---
kernel/audit.c | 14 +++++++++++++-
1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 990d02f..acfa7a9 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -944,9 +944,21 @@ static int __init audit_enable(char *str)

return 1;
}
-
__setup("audit=", audit_enable);

+/* Process kernel command-line parameter at boot time. audit_backlog_limit=<n> */
+static int __init audit_backlog_limit_set(char *str)
+{
+ int audit_backlog_limit_arg = simple_strtol(str, NULL, 0);
+ if ((audit_backlog_limit_arg >= 0) && (audit_backlog_limit_arg < 8192))
+ audit_backlog_limit = audit_backlog_limit_arg;
+
+ printk(KERN_INFO "audit_backlog_limit: %d\n", audit_backlog_limit);
+
+ return 1;
+}
+__setup("audit_backlog_limit=", audit_backlog_limit_set);
+
static void audit_buffer_free(struct audit_buffer *ab)
{
unsigned long flags;
--
1.7.1

2013-09-18 20:34:19

by Eric Paris

[permalink] [raw]
Subject: Re: [PATCH 8/8] audit: add audit_backlog_wait_time configuration option

On Wed, 2013-09-18 at 15:06 -0400, Richard Guy Briggs wrote:
> reaahead-collector abuses the audit logging facility to discover which files
> are accessed at boot time to make a pre-load list
>
> Add a tuning option to audit_backlog_wait_time so that if auditd can't keep up,
> or gets blocked, the callers won't be blocked.
>
> Signed-off-by: Richard Guy Briggs <[email protected]>
> ---
> include/uapi/linux/audit.h | 2 ++
> kernel/audit.c | 22 +++++++++++++++++++++-
> 2 files changed, 23 insertions(+), 1 deletions(-)
>
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index 75cef3f..493a66e 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -316,6 +316,7 @@ enum {
> #define AUDIT_STATUS_PID 0x0004
> #define AUDIT_STATUS_RATE_LIMIT 0x0008
> #define AUDIT_STATUS_BACKLOG_LIMIT 0x0010
> +#define AUDIT_STATUS_BACKLOG_WAIT_TIME 0x0020
> /* Failure-to-log actions */
> #define AUDIT_FAIL_SILENT 0
> #define AUDIT_FAIL_PRINTK 1
> @@ -367,6 +368,7 @@ struct audit_status {
> __u32 backlog_limit; /* waiting messages limit */
> __u32 lost; /* messages lost */
> __u32 backlog; /* messages waiting in queue */
> + __u32 backlog_wait_time;/* message queue wait timeout */
> };
>
> struct audit_tty_status {
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 3d17670..fc535b6 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -321,6 +321,12 @@ static int audit_set_backlog_limit(int limit)
> return audit_do_config_change("audit_backlog_limit", &audit_backlog_limit, limit);
> }
>
> +static int audit_set_backlog_wait_time(int timeout)
> +{
> + return audit_do_config_change("audit_backlog_wait_time",
> + &audit_backlog_wait_time, timeout);
> +}
> +
> static int audit_set_enabled(int state)
> {
> int rc;
> @@ -669,6 +675,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> s.backlog_limit = audit_backlog_limit;
> s.lost = atomic_read(&audit_lost);
> s.backlog = skb_queue_len(&audit_skb_queue);
> + s.backlog_wait_time = audit_backlog_wait_time;
> audit_send_reply(NETLINK_CB(skb).portid, seq, AUDIT_GET, 0, 0,
> &s, sizeof(s));
> break;
> @@ -701,8 +708,21 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> if (err < 0)
> return err;
> }
> - if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT)
> + if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT) {
> err = audit_set_backlog_limit(s.backlog_limit);
> + if (err < 0)
> + return err;
> + }
> + if (s.mask & AUDIT_STATUS_BACKLOG_WAIT_TIME) {
> + if (sizeof(s) > (size_t)nlh->nlmsg_len)
> + break;

What gets returned here? I think err has a value of 0, but it doesn't
seem to have been clearly intentional. If they know about the
AUDIT_STATUS_BACKLOG_WAIT_TIME flag, but they didn't send a long enough
skb? That seems like an error condition....

> + if (s.backlog_wait_time < 0 ||
> + s.backlog_wait_time > 10*AUDIT_BACKLOG_WAIT_TIME)
> + return -EINVAL;
> + err = audit_set_backlog_wait_time(s.backlog_wait_time);
> + if (err < 0)
> + return err;
> + }
> break;
> }
> case AUDIT_USER:

2013-09-18 20:49:58

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH 8/8] audit: add audit_backlog_wait_time configuration option

On Wed, Sep 18, 2013 at 04:33:25PM -0400, Eric Paris wrote:
> On Wed, 2013-09-18 at 15:06 -0400, Richard Guy Briggs wrote:
> > reaahead-collector abuses the audit logging facility to discover which files
> > are accessed at boot time to make a pre-load list
> >
> > Add a tuning option to audit_backlog_wait_time so that if auditd can't keep up,
> > or gets blocked, the callers won't be blocked.

> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 3d17670..fc535b6 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -701,8 +708,21 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> > if (err < 0)
> > return err;
> > }
> > - if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT)
> > + if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT) {
> > err = audit_set_backlog_limit(s.backlog_limit);
> > + if (err < 0)
> > + return err;
> > + }
> > + if (s.mask & AUDIT_STATUS_BACKLOG_WAIT_TIME) {
> > + if (sizeof(s) > (size_t)nlh->nlmsg_len)
> > + break;
>
> What gets returned here? I think err has a value of 0, but it doesn't
> seem to have been clearly intentional. If they know about the
> AUDIT_STATUS_BACKLOG_WAIT_TIME flag, but they didn't send a long enough
> skb? That seems like an error condition....

The intent was that it is a NOP, since err would have a value of zero,
but I see your point, that if that flag is present, the struct member
should be too. My original intent was that if the structure member
wasn't present, it would default to zero, unintentionally setting the
wait time to zero. It was part of my paranoia in the absence of an API
version indicator. No harm done, but I agree it should return an error.

Thanks for the catch.

> > + if (s.backlog_wait_time < 0 ||
> > + s.backlog_wait_time > 10*AUDIT_BACKLOG_WAIT_TIME)
> > + return -EINVAL;

I assume values less than zero or larger than 10 times the current
default of one minute are errors or unreasonable.

Any argument for more than 10 minutes?


- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer
Kernel Security
AMER ENG Base Operating Systems
Remote, Ottawa, Canada
Voice: +1.647.777.2635
Internal: (81) 32635
Alt: +1.613.693.0684x3545

2013-09-18 20:54:09

by Eric Paris

[permalink] [raw]
Subject: Re: [PATCH 8/8] audit: add audit_backlog_wait_time configuration option

On Wed, 2013-09-18 at 16:49 -0400, Richard Guy Briggs wrote:
> On Wed, Sep 18, 2013 at 04:33:25PM -0400, Eric Paris wrote:
> > On Wed, 2013-09-18 at 15:06 -0400, Richard Guy Briggs wrote:
> > > reaahead-collector abuses the audit logging facility to discover which files
> > > are accessed at boot time to make a pre-load list
> > >
> > > Add a tuning option to audit_backlog_wait_time so that if auditd can't keep up,
> > > or gets blocked, the callers won't be blocked.
>
> > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > index 3d17670..fc535b6 100644
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
> > > @@ -701,8 +708,21 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> > > if (err < 0)
> > > return err;
> > > }
> > > - if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT)
> > > + if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT) {
> > > err = audit_set_backlog_limit(s.backlog_limit);
> > > + if (err < 0)
> > > + return err;
> > > + }
> > > + if (s.mask & AUDIT_STATUS_BACKLOG_WAIT_TIME) {
> > > + if (sizeof(s) > (size_t)nlh->nlmsg_len)
> > > + break;
> >
> > What gets returned here? I think err has a value of 0, but it doesn't
> > seem to have been clearly intentional. If they know about the
> > AUDIT_STATUS_BACKLOG_WAIT_TIME flag, but they didn't send a long enough
> > skb? That seems like an error condition....
>
> The intent was that it is a NOP, since err would have a value of zero,
> but I see your point, that if that flag is present, the struct member
> should be too. My original intent was that if the structure member
> wasn't present, it would default to zero, unintentionally setting the
> wait time to zero. It was part of my paranoia in the absence of an API
> version indicator. No harm done, but I agree it should return an error.
>
> Thanks for the catch.
>
> > > + if (s.backlog_wait_time < 0 ||
> > > + s.backlog_wait_time > 10*AUDIT_BACKLOG_WAIT_TIME)
> > > + return -EINVAL;
>
> I assume values less than zero or larger than 10 times the current
> default of one minute are errors or unreasonable.
>
> Any argument for more than 10 minutes?

10 Minutes already seems nuts. If someone has a reason to go higher, we
can change this sanity check then.

-Eric

2013-09-19 21:19:17

by Steve Grubb

[permalink] [raw]
Subject: Re: [PATCH 7/8] audit: clean up AUDIT_GET/SET local variables and future-proof API

On Wednesday, September 18, 2013 03:06:52 PM Richard Guy Briggs wrote:
> Re-named confusing local variable names (status_set and status_get didn't
> agree with their command type name) and reduced their scope.
>
> Future-proof API changes by not depending on the exact size of the
> audit_status struct.

I wished things like this were coordinated before being written. We had some
discussion of this back in July under a topic, "audit: implement generic
feature setting and retrieving". Maybe that API can be fixed so its not just
set/unset but can take a number as well.

Also, because there is no way to query the kernel to see what kind of things
it supports, we've typically defined a new message type and put into it exactly
what we need. In other words, if you want something expandable, the define a
new message type like AUDIT_GET_EXT and AUDIT_SET_EXT and build it to be
expandable.

Then in a future version of auditctl it will try to use the new command and
fall back to the old one if it gets EINVAL. Then some years later the old GET
and SET can be deprecated. But the audit code base has to support a wide
variety of kernels and suddenly making on resizable might break old code on
new kernel. A new message type is a safer migration path.

-Steve


> Signed-off-by: Richard Guy Briggs <[email protected]>
> ---
> kernel/audit.c | 51 +++++++++++++++++++++++++++------------------------
> 1 files changed, 27 insertions(+), 24 deletions(-)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index acfa7a9..3d17670 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -635,7 +635,6 @@ static int audit_receive_msg(struct sk_buff *skb, struct
> nlmsghdr *nlh) {
> u32 seq;
> void *data;
> - struct audit_status *status_get, status_set;
> int err;
> struct audit_buffer *ab;
> u16 msg_type = nlh->nlmsg_type;
> @@ -661,47 +660,51 @@ static int audit_receive_msg(struct sk_buff *skb,
> struct nlmsghdr *nlh) data = nlmsg_data(nlh);
>
> switch (msg_type) {
> - case AUDIT_GET:
> - status_set.enabled = audit_enabled;
> - status_set.failure = audit_failure;
> - status_set.pid = audit_pid;
> - status_set.rate_limit = audit_rate_limit;
> - status_set.backlog_limit = audit_backlog_limit;
> - status_set.lost = atomic_read(&audit_lost);
> - status_set.backlog = skb_queue_len(&audit_skb_queue);
> + case AUDIT_GET: {
> + struct audit_status s;
> + s.enabled = audit_enabled;
> + s.failure = audit_failure;
> + s.pid = audit_pid;
> + s.rate_limit = audit_rate_limit;
> + s.backlog_limit = audit_backlog_limit;
> + s.lost = atomic_read(&audit_lost);
> + s.backlog = skb_queue_len(&audit_skb_queue);
> audit_send_reply(NETLINK_CB(skb).portid, seq, AUDIT_GET, 0, 0,
> - &status_set, sizeof(status_set));
> + &s, sizeof(s));
> break;
> - case AUDIT_SET:
> - if (nlh->nlmsg_len < sizeof(struct audit_status))
> - return -EINVAL;
> - status_get = (struct audit_status *)data;
> - if (status_get->mask & AUDIT_STATUS_ENABLED) {
> - err = audit_set_enabled(status_get->enabled);
> + }
> + case AUDIT_SET: {
> + struct audit_status s;
> + memset(&s, 0, sizeof(s));
> + /* guard against past and future API changes */
> + memcpy(&s, data, min(sizeof(s), (size_t)nlh->nlmsg_len));
> + if (s.mask & AUDIT_STATUS_ENABLED) {
> + err = audit_set_enabled(s.enabled);
> if (err < 0)
> return err;
> }
> - if (status_get->mask & AUDIT_STATUS_FAILURE) {
> - err = audit_set_failure(status_get->failure);
> + if (s.mask & AUDIT_STATUS_FAILURE) {
> + err = audit_set_failure(s.failure);
> if (err < 0)
> return err;
> }
> - if (status_get->mask & AUDIT_STATUS_PID) {
> - int new_pid = status_get->pid;
> + if (s.mask & AUDIT_STATUS_PID) {
> + int new_pid = s.pid;
>
> if (audit_enabled != AUDIT_OFF)
> audit_log_config_change("audit_pid", new_pid, audit_pid,
1);
> audit_pid = new_pid;
> audit_nlk_portid = NETLINK_CB(skb).portid;
> }
> - if (status_get->mask & AUDIT_STATUS_RATE_LIMIT) {
> - err = audit_set_rate_limit(status_get->rate_limit);
> + if (s.mask & AUDIT_STATUS_RATE_LIMIT) {
> + err = audit_set_rate_limit(s.rate_limit);
> if (err < 0)
> return err;
> }
> - if (status_get->mask & AUDIT_STATUS_BACKLOG_LIMIT)
> - err = audit_set_backlog_limit(status_get->backlog_limit);
> + if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT)
> + err = audit_set_backlog_limit(s.backlog_limit);
> break;
> + }
> case AUDIT_USER:
> case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
> case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:

2013-09-20 14:48:05

by Eric Paris

[permalink] [raw]
Subject: Re: [PATCH 7/8] audit: clean up AUDIT_GET/SET local variables and future-proof API

On Thu, 2013-09-19 at 17:18 -0400, Steve Grubb wrote:
> On Wednesday, September 18, 2013 03:06:52 PM Richard Guy Briggs wrote:
> > Re-named confusing local variable names (status_set and status_get didn't
> > agree with their command type name) and reduced their scope.
> >
> > Future-proof API changes by not depending on the exact size of the
> > audit_status struct.
>
> I wished things like this were coordinated before being written. We had some
> discussion of this back in July under a topic, "audit: implement generic
> feature setting and retrieving". Maybe that API can be fixed so its not just
> set/unset but can take a number as well.

Your right, we did talk about it. Though it seems we talked past each
other. What was implemented was an on off extensible interface. The
status interface already fits for setting numbers. And because of how
it is used has been extended is is extensible for setting numbers.

>
> Also, because there is no way to query the kernel to see what kind of things
> it supports

I'll agree. Richard, can you please add a version field to the status?
Start at version 1. Any time we add a new audit feature we'll bump it.

> , we've typically defined a new message type and put into it exactly
> what we need. In other words, if you want something expandable, the define a
> new message type like AUDIT_GET_EXT and AUDIT_SET_EXT and build it to be
> expandable.

There is no reason we can't use what we have. (As we've done it before)

2013-09-23 16:43:09

by Richard Guy Briggs

[permalink] [raw]
Subject: Re: [PATCH 7/8] audit: clean up AUDIT_GET/SET local variables and future-proof API

On Fri, Sep 20, 2013 at 10:47:50AM -0400, Eric Paris wrote:
> On Thu, 2013-09-19 at 17:18 -0400, Steve Grubb wrote:
> > On Wednesday, September 18, 2013 03:06:52 PM Richard Guy Briggs wrote:
> > > Re-named confusing local variable names (status_set and status_get didn't
> > > agree with their command type name) and reduced their scope.
> > >
> > > Future-proof API changes by not depending on the exact size of the
> > > audit_status struct.

> > Also, because there is no way to query the kernel to see what kind of things
> > it supports
>
> I'll agree. Richard, can you please add a version field to the status?
> Start at version 1. Any time we add a new audit feature we'll bump it.

Done. Adding the interface and version field is version 1, so without,
it is zero. Adding the backlog timeout is version 2.

- RGB

--
Richard Guy Briggs <[email protected]>
Senior Software Engineer
Kernel Security
AMER ENG Base Operating Systems
Remote, Ottawa, Canada
Voice: +1.647.777.2635
Internal: (81) 32635
Alt: +1.613.693.0684x3545