Message-ID: <48882687.7020508@gmx.de>
Date: Thu, 24 Jul 2008 08:51:51 +0200
From: Dieter Ries
To: Vegard Nossum
CC: linux-kernel@vger.kernel.org, jgarzik@pobox.com, netdev@vger.kernel.org, Pekka Enberg, jeffrey.t.kirsher@intel.com, e1000-devel@lists.sourceforge.net
Subject: Re: Current Git: BUG: unable to handle kernel paging request at 0000000001a40ca0
In-Reply-To: <19f34abd0807231500m3d780d90i39626023e0685369@mail.gmail.com>

Vegard Nossum wrote:
> On Wed, Jul 23, 2008 at 11:53 PM, Dieter Ries wrote:
>>>> Dieter: If this is reproducible, it would probably help quite a bit to
>>>> configure the kernel with CONFIG_SLUB_DEBUG and boot with
>>>> slub_debug=FZPUT (unless you already have CONFIG_SLUB_DEBUG_ON set, in
>>>> which case you are already running with the SLUB debugging at boot).
>>>> It might catch the corruption before it becomes fatal, or give us some
>>>> more clues anyway.
>> I tried to bisect the bug, which failed because there were too many kernels
>> not booting with other problems, I guess bisecting just fails in the merge
>> window.
>>
>> With CONFIG_SLUB_DEBUG_ON the output looks different, unfortunately
>> netconsole stops before those are transmitted.

I think I managed to catch one of those:

general protection fault: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.26-06373-gcaf076e #49
RIP: 0010:[] [] nf_nat_move_storage+0x21/0x7a
RSP: 0018:ffffffff8091ab80 EFLAGS: 00010206
RAX: ffffffff805e08d8 RBX: ffff88007d1fb948 RCX: 000000000000006b
RDX: ffff88007d175e10 RSI: ffff88007d175e7b RDI: ffff88007d1fb948
RBP: ffffffff8091aba0 R08: 0000000000000000 R09: ffff88007d175e90
R10: ffffe20000000008 R11: ffff88007d175e10 R12: 59d2c3ffff88007d
R13: ffff88007d175e7b R14: 00000000000000a0 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffffffff8089ee80(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff808b0000, task ffffffff80842340)
Stack: 0000000000000002 ffff88007d3d2000 ffff88007d1fb948 0000000000000070
 ffffffff8091abf0 ffffffff8059d3c4 ffffffff8091ac40 0000000100000001
 ffffffff809e3658 ffff88007d3d2000 0000000000000002 ffff88007f9f6500
Call Trace:
 [] __nf_ct_ext_add+0x15f/0x1f7
 [] nf_nat_fn+0x84/0x152
 [] nf_nat_in+0x2f/0x71
 [] nf_iterate+0x48/0x85
 [] ? ip_rcv_finish+0x0/0x35d
 [] nf_hook_slow+0x63/0xcb
 [] ? ip_rcv_finish+0x0/0x35d
 [] ? __slab_alloc+0x413/0x4bd
 [] ip_rcv+0x257/0x297
 [] netif_receive_skb+0x1f1/0x263
 [] e1000_receive_skb+0x46/0x5d
 [] e1000_clean_rx_irq+0x20e/0x2a6
 [] ? getnstimeofday+0x3f/0xa0
 [] e1000_clean+0x6d/0x218
 [] ? hrtimer_get_next_event+0xa8/0xb8
 [] net_rx_action+0xa9/0x17c
 [] __do_softirq+0x65/0xd5
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x39/0x77
 [] irq_exit+0x44/0x85
 [] do_IRQ+0x147/0x16a
 [] ret_from_intr+0x0/0xa
 [] ? acpi_idle_enter_bm+0x2a7/0x317
 [] ? acpi_idle_enter_bm+0x29d/0x317
 [] ? menu_select+0x75/0x9e
 [] ? cpuidle_idle_call+0x75/0xa7
 [] ? cpu_idle+0x69/0x8c
 [] ? rest_init+0x61/0x63
 [] ? start_kernel+0x2ad/0x2b9
 [] ? x86_64_start_reservations+0x84/0x88
 [] ? x86_64_start_kernel+0xe4/0xeb

Code: ff 5b 41 5c 41 5d 41 5e c9 c3 55 48 89 e5 41 55 41 54 53 48 83 ec 08 e8 c6 a8 c2 ff 4c 8b 66 20 48 89 fb 49 89 f5 4d 85 e4 74 51 <49> f7 44 24 78 80 01 00 00 74 46 48 c7 c7 78 6a 9e 80 e8 8f 2e
RIP [] nf_nat_move_storage+0x21/0x7a
 RSP
---[ end trace 6f6148e13aab302e ]---
Kernel panic - not syncing: Aiee, killing interrupt handler!

>>
>> As there are always some lines about e1000 in the backtraces, I tried to
>> boot without LAN cable connected, and it worked, and crashed afterwards when
>> I plugged the cable in, with a bug in net/core/dev.c.
>>
>> Should I copy the messages with CONFIG_SLUB_DEBUG_ON by hand, or are just
>> some parts important?
>
> There were some e1000 patches in flight on LKML recently; you might be
> able to find them and see if it helps you. It also seems that some
> changes were just committed to -git, so I guess you should try the
> very latest from there.

I reverted the most recent e1000 patches one by one, but the ~12 I have
reverted so far didn't solve the problem.

>
> You also Cced netdev from the start, so somebody from there should be
> able to help you more from here than I. :-)
>
>
> Vegard
>

cu
Dieter

-- 
3rd Law of Computing:
	Anything that can go wr
fortune: Segmentation violation -- Core dumped