Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754604Ab3IIPmQ (ORCPT ); Mon, 9 Sep 2013 11:42:16 -0400 Received: from mail-la0-f45.google.com ([209.85.215.45]:58731 "EHLO mail-la0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753200Ab3IIPmN (ORCPT ); Mon, 9 Sep 2013 11:42:13 -0400 Message-ID: <522DEC51.2090608@openvz.org> Date: Mon, 09 Sep 2013 19:42:09 +0400 From: Konstantin Khlebnikov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130823 Firefox/10.0.11esrpre Iceape/2.7.12 MIME-Version: 1.0 To: Luiz Capitulino CC: linux-kernel@vger.kernel.org, oleg@redhat.com, eparis@redhat.com, akpm@linux-foundation.org, rgb@redhat.com Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start() References: <20130828182114.6b67ea90@redhat.com> <522DDBED.2020505@openvz.org> <20130909105443.72e525cb@redhat.com> <522DE6F2.50607@openvz.org> <20130909112953.10ac2e28@redhat.com> In-Reply-To: <20130909112953.10ac2e28@redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5438 Lines: 121 Luiz Capitulino wrote: > On Mon, 09 Sep 2013 19:19:14 +0400 > Konstantin Khlebnikov wrote: > >> Luiz Capitulino wrote: >>> On Mon, 09 Sep 2013 18:32:13 +0400 >>> Konstantin Khlebnikov wrote: >>> >>>> Luiz Capitulino wrote: >>>>> I'm getting the following soft lockup: >>>>> >>>>> CPU: 6 PID: 2278 Comm: killall5 Tainted: GF 3.11.0-rc7+ #1 >>>>> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 >>>>> 0000000000000099 ffff88011fd83de8 ffffffff815324df 0000000000002800 >>>>> ffffffff817d48f9 ffff88011fd83e68 ffffffff8152e669 ffff88011fd83e68 >>>>> ffffffff00000008 ffff88011fd83e78 ffff88011fd83e18 0000004081dac040 >>>>> Call Trace: >>>>> [] dump_stack+0x46/0x58 >>>>> [] panic+0xbb/0x1c4 >>>>> [] watchdog_timer_fn+0x163/0x170 >>>>> [] __run_hrtimer+0x81/0x1c0 >>>>> [] ? watchdog+0x30/0x30 >>>>> [] hrtimer_interrupt+0x107/0x240 >>>>> [] local_apic_timer_interrupt+0x3b/0x60 >>>>> [] smp_apic_timer_interrupt+0x45/0x60 >>>>> [] apic_timer_interrupt+0x6a/0x70 >>>>> [] ? audit_log_start+0xbf/0x430 >>>>> [] ? audit_log_start+0x147/0x430 >>>>> [] ? try_to_wake_up+0x2a0/0x2a0 >>>>> [] audit_log_exit+0x6ae/0xc30 >>>>> [] ? __alloc_fd+0x42/0x100 >>>>> [] __audit_syscall_exit+0x257/0x2b0 >>>>> [] sysret_audit+0x17/0x21 >>>>> >>>>> The reproducer is somewhat unusual: >>>>> >>>>> 1. Install RHEL6.5 (maybe a similar older user-space will do) >>>>> 2. Boot the just installed system >>>>> 3. In this first boot you'll meet the firstboot script, which >>>>> will do some setup and (depending on your answers) it will >>>>> reboot the machine >>>>> 4. During that first reboot the system hangs while terminating >>>>> all processes: >>>>> >>>>> Sending all processes the TERM signal... >>>>> >>>>> It's when the soft lockup above happens. And yes, I managed >>>>> to get this with latest upstream kernel (HEAD fa8218def1b1) >>>>> >>>>> I'm reproducing it on a VM, but the first report was on bare-metal. >>>>> >>>>> This is what is happening: >>>>> >>>>> 1. audit_log_start() is called >>>>> 2. As we have SKBs waiting in audit_skb_queue and all conditions >>>>> evaluate to true, we sleep in wait_for_auditd() >>>>> 3. Go to 2, until sleep_time gets negative and audit_log_start() >>>>> just busy-waits >>>>> >>>>> Now, *why* this is happening is a mistery to me. I tried debugging >>>>> it, but all I could find is that at some point the kauditd thread >>>>> never wakes up after having called schedule(). I even tried waking >>>>> it up before calling wait_for_auditd(), but it didn't. >>>> >>>> We run into the same problem in rhel6 kernel. >>>> >>>> "readahead-collector" uses audit interface and sometimes stuck in 'stopped' state. >>> >>> Yes, please also see: >>> >>> http://marc.info/?l=linux-kernel&m=137818375024600&w=2 >>> >>>> After commit 829199197a430dade2519d54f5545c4a094393b8 >>>> (which was backported by RH into their kernel) >>>> audit emiters will block forever if userspace daemon cannot handle backlog. >>>> That commit just breaks timeout condition, after timeout waiting loop turns >>>> into busy loop until deamon dies or returns back to work. >>>> >>>> this trivial patch should fix that problem >>>> >>>> --- a/kernel/audit.c >>>> +++ b/kernel/audit.c >>>> @@ -1215,9 +1215,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, >>>> >>>> sleep_time = timeout_start + audit_backlog_wait_time - >>>> jiffies; >>>> - if ((long)sleep_time> 0) >>>> + if ((long)sleep_time> 0) { >>>> wait_for_auditd(sleep_time); >>>> - continue; >>>> + continue; >>>> + } >>>> } >>> >>> >>> Chuck Anderson posted a similar fix: >>> >>> http://marc.info/?l=linux-kernel&m=137817994623832&w=2 >>> >>> I still get a hang for around a minute with Chuck's fix, I believe I'll >>> get the same thing with yours. >> >> Yep, this is normal behaviour -- default audit_backlog_wait_time is 60 seconds. > > Yes, I know that's the cause but I don't call it normal behavior to > be unable to use my system during 60 seconds. This stuff was designed for auditing, so it shouldn't loose messages too easily. Meanwhile 'readahead' reuses it for a different purpose without proper tuning. I think kernel should provide interface for tuning that timeout and let 'readahead' set it to zero or some small value. > >> But this logic is really strange, I don't see any interface for tuning that timeout >> and seems like kernel set it to zero after first disaster and never recovers it back. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/