2013-09-03 03:45:23

by Chuck Anderson

Subject: [PATCH 0/2] audit: fix soft lockups and udevd errors when audit is overrun

The two patches that follow in separate emails resolve soft lockups and
udevd reported errors that prevented a large memory 3.8 system from booting.

The patches are based on 3.11-rc7.

I believe it is the same issue recently posted as:

[RFC] audit: avoid soft lockup in audit_log_start()
https://lkml.org/lkml/2013/8/28/626

The first patch:

audit: fix soft lockups due to loop in audit_log_start() when
audit_backlog_limit exceeded

fixes a bug in kernel/audit that caused many soft lockups during boot:

BUG: soft lockup - CPU#66 stuck for 22s! [udevd:9559]
RIP: 0010:[<ffffffff810d1d06>] [<ffffffff810d1d06>] audit_log_start+0xe6/0x350
Call Trace:
[<ffffffff8108ea30>] ? try_to_wake_up+0x2d0/0x2d0
[<ffffffff810d8d6f>] audit_log_exit+0x3f/0x590
[<ffffffff810d975d>] __audit_syscall_exit+0x28d/0x2c0
[<ffffffff815e0440>] sysret_audit+0x17/0x21

The second patch:

audit: Two efficiency fixes for audit mechanism

prevents these and similar error messages, which were repeated often during boot:

udevd[876]: worker [887] unexpectedly returned with status 0x0100
udevd[876]: worker [887] failed while handling '/devices/pci0000:00/0000:00:03.0/0000:40:00.0'
udevd[876]: worker [880] unexpectedly returned with status 0x0100
udevd[876]: worker [880] failed while handling '/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1/event1'

udevadm settle - timeout of 180 seconds reached, the event queue contains:
/sys/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1/event1 (3995)
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/INT3F0D:00 (4034)

audit: audit_backlog=258 > audit_backlog_limit=256
audit: audit_lost=1 audit_rate_limit=0 audit_backlog_limit=256


2013-09-03 04:46:21

by Luiz Capitulino

Subject: Re: [PATCH 0/2] audit: fix soft lockups and udevd errors when audit is overrun

On Mon, 02 Sep 2013 20:45:14 -0700
Chuck Anderson wrote:

> The two patches that follow in separate emails resolve soft lockups and
> udevd reported errors that prevented a large memory 3.8 system from booting.
>
> The patches are based on 3.11-rc7.
>
> I believe it is the same issue recently posted as:
>
> [RFC] audit: avoid soft lockup in audit_log_start()
> https://lkml.org/lkml/2013/8/28/626

Nice to see someone else looking into this! And thanks for CC'ing me.

I have a couple of pieces of news for you.

First, I've tried to apply your series but got this:

[lcapitulino@volcano linux-2.6]$ git am ~/audit-fix.mbox
Applying: audit: fix soft lockups due to loop in audit_log_start() wh,en audit_backlog_limit exceeded
fatal: corrupt patch at line 23
Patch failed at 0001 audit: fix soft lockups due to loop in audit_log_start() wh,en audit_backlog_limit exceeded
The copy of the patch that failed is found in:
/home/lcapitulino/work/src/upstream/linux-2.6/.git/rebase-apply/patch
When you have resolved this problem, run "git am --resolved".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
[lcapitulino@volcano linux-2.6]$

Now, I was a few minutes away from sending a different fix I cooked up
this evening when I got your series in my inbox. So I really wanted to give
this a try and applied the first patch manually (the resulting version is
attached). The soft lockup is gone, but I still get a hang for several
seconds, just like I did with my first RFC.

I found a very easy way to reproduce the problem, and our analysis is
similar, but our solutions differ.

I'm going to send my solution right now. Sorry for any mistakes; it's
almost 1 AM here, but I really wanted to give your version a try before
sending mine (and before going to bed). If you send a v2, I'll try
it again and we can discuss our approaches.


Attachments:
(No filename) (1.90 kB)
chuck.patch (2.74 kB)