Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754846Ab3JFWOc (ORCPT ); Sun, 6 Oct 2013 18:14:32 -0400
Received: from userp1040.oracle.com ([156.151.31.81]:48018
        "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1754614Ab3JFWOb
        convert rfc822-to-8bit (ORCPT ); Sun, 6 Oct 2013 18:14:31 -0400
MIME-Version: 1.0
Message-ID: <1df5b870-d4bb-46c2-8e6e-af7b63ba21cc@default>
Date: Sun, 6 Oct 2013 15:14:23 -0700 (PDT)
From: Boris Ostrovsky
To:
Cc: , ,
Subject: Re: [xen] double fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
X-Mailer: Zimbra on Oracle Beehive
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8BIT
Content-Disposition: inline
X-Source-IP: ucsinet22.oracle.com [156.151.31.94]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3051
Lines: 73

----- torvalds@linux-foundation.org wrote:

> On Sun, Oct 6, 2013 at 1:23 AM, Fengguang Wu wrote:
> >
> > I got the below dmesg and the first bad commit is commit cf39c8e5352b:
> > Merge tag 'stable/for-linus-3.12-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
>
> Ugh. How reliable is the double fault? Because bisecting it to the
> merge that didn't even have any conflicts in it as far as I can
> remember means that there's something really subtle going on wrt some
> semantic conflict or other. Or, alternatively, it means that the
> bisect failed because the double fault isn't 100% reliable..
>
> Anyway, the stack is crap when the original fault happens at
> "boot_tvec_bases+0x1fe", and that causes the double fault debug code
> to take *another* fault, which means that it doesn't even show the
> right code sequence. Too bad.
> So ignore the latter part of the oops, but the top part looks valid:
>
> > [ 4.136137] double fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > [ 4.137521] CPU: 0 PID: 132 Comm: bootlogd Not tainted 3.12.0-rc2-00153-g14951f2 #129
> > [ 4.139156] task: ffff88000c9a6580 ti: ffff88000c9ba000 task.ti: ffff88000c9ba000
> > [ 4.140042] RIP: 0010:[] [] boot_tvec_bases+0x1fe/0x2080
> > [ 4.140042] RSP: 0018:0000000088000cd8 EFLAGS: 00010212
> > [ 4.140042] RAX: 000000000000004f RBX: 0000000000000100 RCX: 0000000000000000
> > [ 4.140042] RDX: 0000000000000f1e RSI: ffffffff81f746a8 RDI: ffffffff81f31c48
> > [ 4.140042] RBP: ffff88000f003ee0 R08: 0000000000000000 R09: 0000000000000000
> > [ 4.140042] R10: 0000000000000001 R11: ffff88000f00a000 R12: ffff88000c9bbfd8
> > [ 4.140042] R13: ffffffff81f31c48 R14: ffffffff81f31c48 R15: ffffffff81f31c48
> > [ 4.140042] FS: 00007fb1f9662700(0000) GS:ffff88000f000000(0000) knlGS:0000000000000000
> > [ 4.140042] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 4.140042] CR2: 0000000088000cc8 CR3: 000000000c9cd000 CR4: 00000000000006b0
> > [ 4.140042] Stack:
>
> but it has jumped into a data section and is executing random data as
> code, and there is no sign of where it jumped *from*, since the random
> code clearly corrupted the stack - resulting in the double fault in
> the first place.
>
> So the oops is almost entirely useless as a debug aid in this
> situation. I'm almost hoping that your bisect was wrong, and you could
> try to see if you could do that again..

For what it's worth, the commit in question touches almost exclusively
Xen files, the only exception being lib/swiotlb.c (with what appear to
be fairly trivial changes). And CONFIG_XEN in the config file for this
report is not set.
-boris