Subject: Re: Kernel crash after using new Intel NIC (igb)
From: Eric Dumazet
To: Maximilian Engelhardt
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, StuStaNet Vorstand
In-Reply-To: <201104250033.03401.maxi@daemonizer.de>
References: <201104250033.03401.maxi@daemonizer.de>
Date: Wed, 27 Apr 2011 06:24:00 +0200
Message-ID: <1303878240.2699.41.camel@edumazet-laptop>

On Monday, 25 April 2011 at 00:32 +0200, Maximilian Engelhardt wrote:
> Hello,
>
> some time ago we switched some of our servers to a new networking card that
> uses the Intel igb driver. Since that time we see regular kernel crashes.
> The crashes happen at very irregular intervals, sometimes after a week uptime,
> sometimes after a month or even more. They seem to be independent of the
> server load as they also happen in the night when there is low traffic.
>
> The affected server is used as a NAT device with some iptables rules and serves
> about 2000 people.
>
> Attached are two logs of the crashes as well as the output of dmesg, lspci,
> and /proc/interrupts as well as the used kernel config.
>
> I have no idea what might be wrong but I think it is a kernel bug. Perhaps
> someone with more knowledge has a clue.
>
> If needed I can provide additional information or build different kernels.
>
> Greetings,
> Maxi

Hello Maximilian

We had similar reports in the past that disappeared when "slab_nomerge"
was added to the boot parameters.

We suspect memory corruption of 64-byte kmemcache objects by another part
of the kernel.

In 2.6.37, the inetpeer code uses 64-byte objects.

Using slab_nomerge with the SLUB allocator (as you already do) makes sure
the inetpeer kmemcache won't be shared with other 64-byte objects in the
kernel.

In 2.6.38 and up, inetpeer objects are larger, so you could also try the
latest linux-2.6 tree, just to make sure the inetpeer code is not faulty.

Thanks

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [] cleanup_once+0x3f/0xa0
PGD 12d82a067 PUD 12ea49067 PMD 0
Oops: 0002 [#1] PREEMPT SMP
last sysfs file: /sys/devices/virtual/vc/vcsa5/uevent
CPU 0
Pid: 0, comm: swapper Not tainted 2.6.37.1 #1 Supermicro X7SB4/E/X7SB4/E
RIP: 0010:[] [] cleanup_once+0x3f/0xa0
RSP: 0018:ffff8800cfc03e40 EFLAGS: 00010202
RAX: ffff880128167798 RBX: ffff880128167780 RCX: 0000000000000000
RDX: c398112e00026cf7 RSI: 00000000000001a2 RDI: ffffffff8166ce10
RBP: 0000000000024702 R08: 00000000003d0900 R09: 00040ea8ea5b7700
R10: ffffffff814f312d R11: 0000000000000010 R12: ffffffff8161ffd8
R13: 0000000000000102 R14: ffffffff8174b4e0 R15: ffffffff8161ffd8
FS: 0000000000000000(0000) GS:ffff8800cfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000012fe67000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff8161e000, task ffffffff81638020)
Stack:
 ffff8800cfc11f00 0000000111034f87 0000000000024702 ffffffff8145ed68
 ffffffff8174a4c0 ffffffff8174a4c0 ffff8800cfc03eb0 ffffffff81044cb8
 ffffffff81034079 ffffffff8145ed30 0000000000000000 ffffffff8174b8e0
Call Trace:
 [] ? peer_check_expire+0x38/0x110
 [] ? run_timer_softirq+0x138/0x250
 [] ? scheduler_tick+0xd9/0x2e0
 [] ? peer_check_expire+0x0/0x110
 [] ? __do_softirq+0x9d/0x130
 [] ? call_softirq+0x1c/0x30
 [] ? do_softirq+0x4d/0x80
 [] ? irq_exit+0x8d/0x90
 [] ? smp_apic_timer_interrupt+0x6a/0xa0
 [] ? apic_timer_interrupt+0x13/0x20
 [] ? mwait_idle+0x6a/0x80
 [] ? cpu_idle+0x58/0xb0
 [] ? start_kernel+0x334/0x33f
 [] ? x86_64_start_kernel+0xf3/0xf7
Code: 00 48 8b 05 84 e3 20 00 48 3d 00 ce 66 81 74 5c 48 8d 58 e8 48 8b 15 31 5e 22 00 2b 53 28 48 39 ea 72 49 48 8b 4b 18 48 8b 53 20 <48> 89 51 08 48 89 0a 48 89 43 18 48 89 43 20 f0 ff 40 14 48 c7
RIP [] cleanup_once+0x3f/0xa0
 RSP
CR2: 0000000000000008
---[ end trace 904f16191de0663c ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G D 2.6.37.1 #1
Call Trace:
 [] ? panic+0xa1/0x19e
 [] ? oops_end+0x9b/0xa0
 [] ? no_context+0x103/0x270
 [] ? do_page_fault+0x290/0x430
 [] ? __alloc_skb+0x72/0x160
 [] ? swiotlb_dma_mapping_error+0x10/0x20
 [] ? igb_alloc_rx_buffers_adv+0x208/0x3a0
 [] ? page_fault+0x1f/0x30
 [] ? cleanup_once+0x3f/0xa0
 [] ? peer_check_expire+0x38/0x110
 [] ? run_timer_softirq+0x138/0x250
 [] ? scheduler_tick+0xd9/0x2e0
 [] ? peer_check_expire+0x0/0x110
 [] ? __do_softirq+0x9d/0x130
 [] ? call_softirq+0x1c/0x30
 [] ? do_softirq+0x4d/0x80
 [] ? irq_exit+0x8d/0x90
 [] ? smp_apic_timer_interrupt+0x6a/0xa0
 [] ? apic_timer_interrupt+0x13/0x20
 [] ? mwait_idle+0x6a/0x80
 [] ? cpu_idle+0x58/0xb0
 [] ? start_kernel+0x334/0x33f
 [] ? x86_64_start_kernel+0xf3/0xf7
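
For the slab_nomerge suggestion in the reply, a quick way to confirm that the option took effect after a reboot is to look at the kernel command line and at which slab caches are still merged. The following is only a minimal sketch, not a tested diagnostic: it assumes a SLUB kernel that exposes /sys/kernel/slab and represents merged caches there as symlinks to a shared directory, and it also accepts "slub_nomerge" on the assumption that older SLUB-only kernels use that spelling. The function names are made up for illustration.

```python
import os

# Spellings of the boot option; "slub_nomerge" is accepted as an assumption
# about older SLUB-only kernels, "slab_nomerge" is the one named in the reply.
NOMERGE_OPTIONS = {"slab_nomerge", "slub_nomerge"}


def nomerge_requested(cmdline):
    """Return True if a no-merge option appears as a whole word in cmdline."""
    return any(word in NOMERGE_OPTIONS for word in cmdline.split())


def merged_aliases(slab_dir="/sys/kernel/slab"):
    """Group cache names by the shared directory their symlink points to.

    Assumption: a merged cache shows up as a symlink whose target is the
    directory of the cache it was merged into. Plain directories are skipped.
    """
    groups = {}
    for name in sorted(os.listdir(slab_dir)):
        path = os.path.join(slab_dir, name)
        if os.path.islink(path):
            target = os.path.basename(os.readlink(path))
            groups.setdefault(target, []).append(name)
    return groups


def main():
    # Both paths are guarded so the script degrades quietly on other systems.
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            print("no-merge option on command line:", nomerge_requested(f.read()))
    if os.path.isdir("/sys/kernel/slab"):
        for target, names in merged_aliases().items():
            print(target, "<-", ", ".join(names))


if __name__ == "__main__":
    main()
```

If the option is active, the expectation under these assumptions is that the inetpeer cache no longer appears as an alias of a shared 64-byte cache.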