Date: Sat, 18 Jan 2014 00:44:16 -0800 (PST)
From: dormando
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: kmem_cache_alloc panic in 3.10+

Hello again!

We've had a rare crash that has existed from 3.10.0 through at least 3.10.15 (I'm trying newer stables now, but I can't tell whether it has been fixed, and it takes weeks to reproduce).

Unfortunately, I can only get 8k back from pstore. The panic appears to be longer than what gets caught in the log, but the bottom part is almost always the same as this trace:

Panic#6 Part1
<4>[1197485.199166] [] tcp_push+0x6c/0x90
<4>[1197485.199171] [] tcp_sendmsg+0x109/0xd40
<4>[1197485.199179] [] ? put_page+0x35/0x40
<4>[1197485.199185] [] inet_sendmsg+0x45/0xb0
<4>[1197485.199191] [] sock_aio_write+0x11e/0x130
<4>[1197485.199196] [] ? inet_recvmsg+0x4f/0x80
<4>[1197485.199203] [] do_sync_readv_writev+0x6d/0xa0
<4>[1197485.199209] [] do_readv_writev+0xfb/0x2f0
<4>[1197485.199215] [] ? __free_pages+0x35/0x40
<4>[1197485.199220] [] ? free_pages+0x46/0x50
<4>[1197485.199226] [] ? SyS_mincore+0x152/0x690
<4>[1197485.199231] [] vfs_writev+0x48/0x60
<4>[1197485.199236] [] SyS_writev+0x5f/0xd0
<4>[1197485.199243] [] system_call_fastpath+0x16/0x1b
<4>[1197485.199247] Code: 65 4c 03 04 25 c8 cb 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 84 00 00 00 48 85 c0 74 7f 49 63 44 24 20 49 8b 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
<1>[1197485.199290] RIP [] kmem_cache_alloc+0x5a/0x130
<4>[1197485.199296] RSP
<4>[1197485.199299] CR2: 0000000100000000
<4>[1197485.199343] ---[ end trace 90fee06aa40b7304 ]---
<1>[1197485.263911] BUG: unable to handle kernel paging request at 0000000100000000
<1>[1197485.263923] IP: [] kmem_cache_alloc+0x5a/0x130
<4>[1197485.263932] PGD 3f43e5c067 PUD 0
<4>[1197485.263937] Oops: 0000 [#5] SMP
<4>[1197485.263941] Modules linked in: ntfs vfat msdos fat macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode sb_edac edac_core lpc_ich mfd_core ixgbe igb i2c_algo_bit mdio ptp pps_core
<4>[1197485.263966] CPU: 0 PID: 233846 Comm: cache-worker Tainted: G D 3.10.15 #1
<4>[1197485.263972] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 2.0a 03/07/2013
<4>[1197485.263976] task: ffff883427f9dc00 ti: ffff8830d4312000 task.ti: ffff8830d4312000
<4>[1197485.263982] RIP: 0010:[] [] kmem_cache_alloc+0x5a/0x130
<4>[1197485.263990] RSP: 0018:ffff881fffc038c8 EFLAGS: 00010286
<4>[1197485.263994] RAX: 0000000000000000 RBX: ffffffff81c8c740 RCX: 00000000ffffffff
<4>[1197485.263999] RDX: 0000000029273024 RSI: 0000000000000020 RDI: 0000000000015680
<4>[1197485.264004] RBP: ffff881fffc03908 R08: ffff881fffc15680 R09: ffffffff815bdd4b
<4>[1197485.264009] R10: ffff881c65d21800 R11: 0000000000000000 R12: ffff881fff803800
<4>[1197485.264014] R13: 0000000100000000 R14: 00000000ffffffff R15: 0000000000000000
<4>[1197485.264019] FS: 00007f8d855eb700(0000) GS:ffff881fffc00000(0000) knlGS:0000000000000000
<4>[1197485.264024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[1197485.264028] CR2: 0000000100000000 CR3: 000000308f258000 CR4: 00000000000407f0
<4>[1197485.264032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[1197485.264037] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[1197485.264041] Stack:
<4>[1197485.264044] ffff881fffc03928 00000020815d0d95 ffff881fffc03938 ffffffff81c8c740
<4>[1197485.264050] ffff881fce210000 0000000000000001 00000000ffffffff 0000000000000000
<4>[1197485.264056] ffff881fffc03958 ffffffff815bdd4b ffff881fffc039a8 0000000000000000
<4>[1197485.264063] Call Trace:
<4>[1197485.264066]
<4>[1197485.264069] [] dst_alloc+0x5b/0x190
<4>[1197485.264080] [] rt_dst_alloc+0x4c/0x50
<4>[1197485.264085] [] __ip_route_output_key+0x270/0x880
<4>[1197485.264092] [] ? try_to_wake_up+0x23e/0x2b0
<4>[1197485.264097] [] ip_route_output_flow+0x27/0x60
<4>[1197485.264102] [] ip_queue_xmit+0x36a/0x390
<4>[1197485.264108] [] tcp_transmit_skb+0x485/0x890
<4>[1197485.264113] [] tcp_send_ack+0xf1/0x130
<4>[1197485.264118] [] __tcp_ack_snd_check+0x5e/0xa0
<4>[1197485.264123] [] tcp_rcv_state_process+0x8b2/0xb20
<4>[1197485.264128] [] tcp_v4_do_rcv+0x191/0x4f0
<4>[1197485.264133] [] tcp_v4_rcv+0x5fc/0x750
<4>[1197485.264138] [] ? ip_rcv+0x350/0x350
<4>[1197485.264143] [] ? nf_hook_slow+0x7d/0x160
<4>[1197485.264147] [] ? ip_rcv+0x350/0x350
<4>[1197485.264152] [] ip_local_deliver_finish+0xce/0x250
<4>[1197485.264156] [] ip_local_deliver+0x4c/0x80
<4>[1197485.264161] [] ip_rcv_finish+0x119/0x360
<4>[1197485.264165] [] ip_rcv+0x230/0x350
<4>[1197485.264170] [] __netif_receive_skb_core+0x477/0x600
<4>[1197485.264175] [] __netif_receive_skb+0x27/0x70
<4>[1197485.264180] [] process_backlog+0xf4/0x1e0
<4>[1197485.264184] [] net_rx_action+0xf5/0x250
<4>[1197485.264190] [] __do_softirq+0xef/0x270
<4>[1197485.264196] [] call_softirq+0x1c/0x30
<4>[1197485.264199]
<4>[1197485.264201] [] do_softirq+0x55/0x90
<4>[1197485.264209] [] local_bh_enable+0x94/0xa0
<4>[1197485.264215] [] ipt_do_table+0x22a/0x680
<4>[1197485.264221] [] ? skb_clone_tx_timestamp+0x31/0x110
<4>[1197485.264231] [] ? ixgbe_xmit_frame_ring+0x4c0/0xd40 [ixgbe]
<4>[1197485.264239] [] ? ixgbe_xmit_frame+0x43/0x90 [ixgbe]
<4>[1197485.264245] [] iptable_raw_hook+0x33/0x70
<4>[1197485.264252] [] nf_iterate+0x87/0xb0
<4>[1197485.264256] [] ? ip_options_echo+0x420/0x420
<4>[1197485.264261] [] nf_hook_slow+0x7d/0x160
<4>[1197485.264266] [] ? ip_options_echo+0x420/0x420
<4>[1197485.264270] [] __ip_local_out+0xa0/0xb0
<4>[1197485.264275] [] ip_local_out+0x16/0x30
<4>[1197485.264280] [] ip_queue_xmit+0x15a/0x390
<4>[1197485.264286] [] ? tcp_v4_md5_lookup+0x13/0x20
<4>[1197485.264290] [] tcp_transmit_skb+0x485/0x890
<4>[1197485.264295] [] tcp_write_xmit+0x1b8/0xa50
<4>[1197485.264300] [] ? __alloc_skb+0xa8/0x1f0
<4>[1197485.264304] [] tcp_push_one+0x30/0x40
<4>[1197485.264309] [] tcp_sendmsg+0xbe4/0xd40
<4>[1197485.264315] [] ? put_page+0x35/0x40
<4>[1197485.264321] [] inet_sendmsg+0x45/0xb0
<4>[1197485.264326] [] sock_aio_write+0x11e/0x130
<4>[1197485.264331] [] ? inet_recvmsg+0x4f/0x80
<4>[1197485.264337] [] do_sync_readv_writev+0x6d/0xa0
<4>[1197485.264343] [] do_readv_writev+0xfb/0x2f0
<4>[1197485.264347] [] ? __free_pages+0x35/0x40
<4>[1197485.264352] [] ? free_pages+0x46/0x50
<4>[1197485.264357] [] ? SyS_mincore+0x152/0x690
<4>[1197485.264363] [] vfs_writev+0x48/0x60
<4>[1197485.264367] [] SyS_writev+0x5f/0xd0
<4>[1197485.264373] [] system_call_fastpath+0x16/0x1b
<4>[1197485.264377] Code: 65 4c 03 04 25 c8 cb 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 84 00 00 00 48 85 c0 74 7f 49 63 44 24 20 49 8b 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
<1>[1197485.264417] RIP [] kmem_cache_alloc+0x5a/0x130
<4>[1197485.264424] RSP
<4>[1197485.264427] CR2: 0000000100000000
<4>[1197485.264431] ---[ end trace 90fee06aa40b7305 ]---
<0>[1197485.325141] Kernel panic - not syncing: Fatal exception in interrupt

... way down in the TCP code.

Any help would be appreciated :) I'll do what I can to help, but iterating on this particular crash is very hard because of how long it takes to reproduce. Since we have a large number of machines, some are always crashing here and there, but once one does, it's not going to happen again for a while.

Thanks!
-Dormando