Subject: PROBLEM: reproducible crash KVM+nf_conntrack all recent 2.6 kernels
From: Jon Masters
To: linux-kernel
Cc: netdev, netfilter-devel@vger.kernel.org
Date: Thu, 28 Jan 2010 00:45:59 -0500
Message-Id: <1264657559.2793.103.camel@tonnant>

Folks,

A number of people seem to have reported this crash in various forms, but I have yet to see a solution. I reproduced it on 2.6.33-rc5 this evening, so I know it's still present in the latest upstream kernels too.

Userspace is Fedora 12. The crash happens on all recent F12 kernels (sporadic in 2.6.31 until recently, solidly reproducible on 2.6.32), on upstream 2.6.32, and on 2.6.33-rc5 as well, so it is hard to find a "known good" kernel.

The problem happens when using netfilter with KVM (it does not occur without the firewall loaded, for example) and will occur within a few minutes of starting or stopping a guest that is connecting to the network. The easiest way to reproduce it so far is simply to start up a bunch of Fedora guests and have them run a "yum update" cycle.
All of the crashes appear similar to the following (2.6.33-rc5):

general protection fault: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 6
Pid: 2982, comm: qemu-kvm Not tainted 2.6.33-rc5 #2 0F9382/Precision WorkStation 490
RIP: 0010:[] [] destroy_conntrack+0x82/0x114
RSP: 0018:ffff880028383c48 EFLAGS: 00010202
RAX: 0000000080000001 RBX: ffffffff81af33a0 RCX: 0000000000007530
RDX: dead000000200200 RSI: 0000000000000011 RDI: ffffffff81af33a0
RBP: ffff880028383c58 R08: ffff8802171b14d0 R09: 000000000000000a
R10: 00000040283957c0 R11: ffff8800283838a8 R12: ffffffff81ddbce0
R13: ffffffffa0281389 R14: 0000000000000000 R15: ffff88021140f430
FS: 00007fc17b7d2780(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fc12c038000 CR3: 00000001db1bb000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-kvm (pid: 2982, threadinfo ffff8801dab40000, task ffff8801dab38000)
Stack:
 ffff88021140f400 ffff88021360e410 ffff880028383c68 ffffffff813b2016
<0> ffff880028383c88 ffffffff8138dbc3 ffff880028383c88 ffff88021140f400
<0> ffff880028383ca8 ffffffff8138d925 0000000300000000 ffff88021140f400
Call Trace:
 [] nf_conntrack_destroy+0x1b/0x1d
 [] skb_release_head_state+0x77/0xb9
 [] __kfree_skb+0x16/0x82
 [] kfree_skb+0x6a/0x73
 [] ip6_mc_input+0x214/0x221 [ipv6]
 [] ip6_rcv_finish+0x27/0x2b [ipv6]
 [] ipv6_rcv+0x306/0x33f [ipv6]
 [] ? nf_hook_slow+0x6a/0xcb
 [] ? netif_receive_skb+0x0/0x3c6
 [] netif_receive_skb+0x3a1/0x3c6
 [] br_handle_frame_finish+0x104/0x13c [bridge]
 [] br_handle_frame+0x191/0x1aa [bridge]
 [] netif_receive_skb+0x30d/0x3c6
 [] process_backlog+0x8a/0xc3
 [] net_rx_action+0x78/0x17e
 [] __do_softirq+0xe5/0x1a6
 [] call_softirq+0x1c/0x30
 [] ? do_softirq+0x46/0x83
 [] netif_rx_ni+0x26/0x2b
 [] tun_chr_aio_write+0x3ce/0x429 [tun]
 [] ? tun_chr_aio_write+0x0/0x429 [tun]
 [] do_sync_readv_writev+0xc1/0x100
 [] ? selinux_file_permission+0xa7/0xb3
 [] ? copy_from_user+0x2f/0x31
 [] ? security_file_permission+0x16/0x18
 [] do_readv_writev+0xa7/0x127
 [] ? unlock_timer+0x12/0x14
 [] ? sys_timer_settime+0x258/0x2aa
 [] vfs_writev+0x43/0x4e
 [] sys_writev+0x4a/0x93
 [] system_call_fastpath+0x16/0x1b
Code: c7 00 cd dd 81 e8 67 f6 ff ff 48 89 df e8 90 28 00 00 f6 43 78 08 75 2a 48 8b 53 10 48 85 d2 75 04 0f 0b eb fe 48 8b 43 08 a8 01 <48> 89 02 75 04 48 89 50 08 48 b8 00 02 20 00 00 00 ad de 48 89
RIP [] destroy_conntrack+0x82/0x114
 RSP
---[ end trace ee1619cd5f767f78 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 2982, comm: qemu-kvm Tainted: G D 2.6.33-rc5 #2
Call Trace:
 [] panic+0x7a/0x13d
 [] oops_end+0xb7/0xc7
 [] die+0x5a/0x63

Several people have suggested various sysctls. I note that my F12 box now has the following set by default:

# Disable netfilter on bridges.
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

This does not fix the problem, although I am indeed using bridged networking for the guest instances.

For now, I've disabled loading the firewall modules on this box, since it's behind a firewall anyway and I need it to keep running for more than ten minutes at a time :) I am obviously still interested in helping to track this down and fix it, but I don't know the code in question and won't have time to poke much further until the weekend.

Jon.