Date: Thu, 20 Nov 2014 11:10:55 -0500
From: Dave Jones
To: Vivek Goyal
Cc: Don Zickus, Thomas Gleixner, Linus Torvalds, Linux Kernel, the arch/x86 maintainers, WANG Chao, Baoquan He, Dave Young
Subject: Re: frequent lockups in 3.18rc4
Message-ID: <20141120161055.GA8309@redhat.com>

On Wed, Nov 19, 2014 at 11:28:06AM -0500, Vivek Goyal wrote:
> I am wondering may be in some cases we panic in second kernel and sit
> there. Probably we should append a kernel command line automatically
> say "panic=1" so that it reboots itself if second kernel panics.
>
> By any chance, have you enabled "CONFIG_RANDOMIZE_BASE"? If yes, please
> disable that as currently kexec/kdump stuff does not work with it. And
> it hangs very early in the boot process and I had to hook serial console
> to get following message on console.

I did have that enabled. (Perhaps the kconfig should conflict?)
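The two suggestions above (refuse KASLR when kdump is wanted, and make the capture kernel reboot itself on panic) can be sketched as a small preflight check. This is a hypothetical illustration only, not code from kdumpctl, kexec-tools or the kernel: the function names are made up, and treating `CONFIG_KEXEC=y` or `CONFIG_CRASH_DUMP=y` as "kdump support" is an assumption. The `panic=N` boot parameter itself is real; it reboots the kernel N seconds after a panic.

```python
import re

# Hypothetical preflight helpers; not taken from kdumpctl or the kernel tree.


def parse_kconfig(text: str) -> dict[str, str]:
    """Parse .config text into {symbol: value}; '# CONFIG_X is not set' maps to 'n'."""
    options: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            options[unset.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            options[key] = value
    return options


def kaslr_kdump_conflict(text: str) -> bool:
    """True if KASLR is enabled together with (assumed) kexec/crash-dump symbols."""
    opts = parse_kconfig(text)
    kaslr = opts.get("CONFIG_RANDOMIZE_BASE") == "y"
    kdump = opts.get("CONFIG_KEXEC") == "y" or opts.get("CONFIG_CRASH_DUMP") == "y"
    return kaslr and kdump


def append_panic_timeout(cmdline: str, seconds: int = 1) -> str:
    """Append panic=<seconds> to a capture-kernel command line unless panic= is already set."""
    if any(arg.startswith("panic=") for arg in cmdline.split()):
        return cmdline
    return f"{cmdline} panic={seconds}".strip()


print(kaslr_kdump_conflict("CONFIG_KEXEC=y\nCONFIG_RANDOMIZE_BASE=y\n"))
print(append_panic_timeout("irqpoll nr_cpus=1"))
```

A real fix would more likely live in Kconfig itself (a dependency between the two options), which is what the question above is getting at; the script only shows the condition being tested.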
After rebuilding without it, this..

> > dracut: *** Stripping files done ***
> > dracut: *** Store current command line parameters ***
> > dracut: *** Creating image file ***
> > dracut: *** Creating image file done ***
> > kdumpctl: cat: write error: Broken pipe
> > kdumpctl: kexec: failed to load kdump kernel
> > kdumpctl: Starting kdump: [FAILED]

went away. It generated the image, and things looked good.
I did echo c > /proc/sysrq-trigger and got this..

SysRq : Trigger a crash
BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1192
in_atomic(): 0, irqs_disabled(): 0, pid: 8860, name: bash
3 locks held by bash/8860:
 #0:  (sb_writers#5){......}, at: [] vfs_write+0x1b3/0x1f0
 #1:  (rcu_read_lock){......}, at: [] __handle_sysrq+0x5/0x1b0
 #2:  (&mm->mmap_sem){......}, at: [] __do_page_fault+0x140/0x600
Preemption disabled at:[] printk+0x5c/0x72

CPU: 1 PID: 8860 Comm: bash Not tainted 3.18.0-rc5+ #95 [loadavg: 0.54 0.24 0.09 2/143 8909]
 00000000000004a8 00000000e1f75c1b ffff880236473c28 ffffffff817ce5c7
 0000000000000000 0000000000000000 ffff880236473c58 ffffffff8109af8a
 ffff880236473c58 0000000000000029 0000000000000000 ffff880236473d88
Call Trace:
 [] dump_stack+0x4f/0x7c
 [] __might_sleep+0x12a/0x190
 [] __do_page_fault+0x15b/0x600
 [] ? irq_work_queue+0x62/0xd0
 [] ? trace_hardirqs_off_thunk+0x3a/0x3f
 [] do_page_fault+0xc/0x10
 [] page_fault+0x22/0x30
 [] ? printk+0x5c/0x72
 [] ? sysrq_handle_crash+0x16/0x20
 [] __handle_sysrq+0x137/0x1b0
 [] ? __handle_sysrq+0x5/0x1b0
 [] write_sysrq_trigger+0x4a/0x50
 [] proc_reg_write+0x3d/0x80
 [] vfs_write+0xba/0x1f0
 [] SyS_write+0x58/0xd0
 [] system_call_fastpath+0x12/0x17
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 8860 Comm: bash Not tainted 3.18.0-rc5+ #95 [loadavg: 0.54 0.24 0.09 1/143 8909]
task: ffff8800a1a60000 ti: ffff880236470000 task.ti: ffff880236470000
RIP: 0010:[]  [] sysrq_handle_crash+0x16/0x20
RSP: 0018:ffff880236473e38  EFLAGS: 00010246
RAX: 000000000000000f RBX: ffffffff81cb4a00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff817ca332 RDI: 0000000000000063
RBP: ffff880236473e38 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000358 R11: 0000000000000357 R12: 0000000000000063
R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000000
FS:  00007fc652f4e740(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000023a3b2000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 ffff880236473e78 ffffffff8144a567 ffffffff8144a435 0000000000000002
 0000000000000002 00007fc652f51000 0000000000000002 ffff880236473f48
 ffff880236473ea8 ffffffff8144aa4a 0000000000000002 00007fc652f51000
Call Trace:
 [] __handle_sysrq+0x137/0x1b0
 [] ? __handle_sysrq+0x5/0x1b0
 [] write_sysrq_trigger+0x4a/0x50
 [] proc_reg_write+0x3d/0x80
 [] vfs_write+0xba/0x1f0
 [] SyS_write+0x58/0xd0
 [] system_call_fastpath+0x12/0x17
Code: 01 f4 45 39 a5 b4 00 00 00 75 e2 4c 89 ef e8 d2 f7 ff ff eb d8 0f 1f 44 00 00 55 c7 05 08 b7 7e 00 01 00 00 00 48 89 e5 0f ae f8 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 48 89 e5
RIP  [] sysrq_handle_crash+0x16/0x20
 RSP
CR2: 0000000000000000

Which, aside from the sleeping-while-atomic warning (which isn't important),
does what I expected. Shortly afterwards, it rebooted.
And then /var/crash was empty.
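Why the oops "does what I expected": for an oops raised from a page fault (the first trace goes through page_fault and __do_page_fault), the hex number after "Oops:" is the x86 page-fault error code, and CR2 holds the faulting address. The sketch below decodes the architectural error-code bits (bit 0: protection violation vs. not-present page, bit 1: write, bit 2: user mode, bit 3: reserved bit, bit 4: instruction fetch). The helper names are hypothetical, and the decoding is only meaningful for page-fault oopses. Read this way, 0002 with CR2 = 0 is a kernel-mode write to a not-present page at address 0, i.e. the NULL-pointer store that the sysrq crash trigger deliberately performs.

```python
# Hypothetical decoder for the x86 page-fault error code printed as "Oops: XXXX".
# Bits 0-2 always describe something (set vs. clear); bits 3-4 only matter when set.
PF_BITS = [
    (1 << 0, "protection violation", "page not present"),
    (1 << 1, "write access", "read access"),
    (1 << 2, "user mode", "kernel mode"),
]
PF_FLAGS = [
    (1 << 3, "reserved bit set in page table"),
    (1 << 4, "instruction fetch"),
]


def decode_pf_error(code: int) -> list[str]:
    """Return human-readable meanings of the page-fault error code bits."""
    desc = [if_set if code & bit else if_clear for bit, if_set, if_clear in PF_BITS]
    desc += [msg for bit, msg in PF_FLAGS if code & bit]
    return desc


def decode_oops_line(line: str) -> list[str]:
    """Decode a line such as 'Oops: 0002 [#1] ...' (error code printed in hex)."""
    field = line.split()[1]
    return decode_pf_error(int(field, 16))


print(decode_oops_line("Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC"))
# ['page not present', 'write access', 'kernel mode']
```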
Dave