Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755643AbZDNXMf (ORCPT ); Tue, 14 Apr 2009 19:12:35 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752783AbZDNXMU (ORCPT ); Tue, 14 Apr 2009 19:12:20 -0400 Received: from smtp1.linux-foundation.org ([140.211.169.13]:59013 "EHLO smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752280AbZDNXMT (ORCPT ); Tue, 14 Apr 2009 19:12:19 -0400 Date: Tue, 14 Apr 2009 16:09:26 -0700 From: Stephen Hemminger To: Ed Tomlinson Cc: Avi Kivity , bridge@lists.linux-foundation.org, LKML , netdev@vger.kernel.org, linux-acpi@vger.kernel.org Subject: Re: 2.6.30-rc1 A few issues and a stall (nmi_watchdog traceback) Message-ID: <20090414160926.52e07a3d@nehalam> In-Reply-To: <200904141854.43205.edt@aei.ca> References: <200904121124.07193.edt@aei.ca> <200904140832.25057.edt@aei.ca> <49E4840E.6010300@redhat.com> <200904141854.43205.edt@aei.ca> Organization: Linux Foundation X-Mailer: Claws Mail 3.6.1 (GTK+ 2.16.1; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9434 Lines: 152 On Tue, 14 Apr 2009 18:54:42 -0400 Ed Tomlinson wrote: > On Tuesday 14 April 2009 08:39:42 Avi Kivity wrote: > > Ed Tomlinson wrote: > > > On Tuesday 14 April 2009 06:03:58 Avi Kivity wrote: > > > > > >> Ed Tomlinson wrote: > > >> > > >>> Once the above networking stuff is setup I start kvm with the command below > > >>> > > >>> QEMU_AUDIO_DRV=alsa kvm -m 1024 -cdrom /mnt/sdc4/divx/archlinux-2009.02-ftp-i686.iso -boot c -smp 3 -usb -usbdevice tablet -vga std -drive file=arch.img -net nic,macaddr=52:54:00:12:34:23 -net tap,ifname=qtap0,script=no -soundhw all -mem-path /hugepages > > >>> > > >>> which works and the kvm session boots just fine. > > >>> > > >>> Issue 2. When I attempt to ping outside the kvm session the pc (not just the kvm session) hangs. > > >>> Its impossible to kill the kvm session and there are numerious info messages from RCU (tree RCU enabled) > > >>> about stalls. > > >>> > > >>> > > >> The rcu messages are likely because a processor has died. > > >> > > >> Do things work if you drop -mem-path? > > >> > > > > > > It makes no difference. I did notice that one cpu is peged at 100% though. I'll be trying > > > CONFIG_PROVE_LOCKING tonight for another problem - might give interesting results. > > > > > > > Oh, so this goes away without CONFIG_PROVE_LOCKING? > > > > With CONFIG_PROVE_LOCKING, DEBUG_PAGE_ALLOC and the nmi_watchdog=2, I get the following info: > > I used the following sequence to start the kvm: > > ed@grover ~/vm $ sudo modprobe tun > ed@grover ~/vm $ sudo modprobe nf_nat > ed@grover ~/vm $ sudo modprobe ipt_MASQUERADE > ed@grover ~/vm $ sudo brctl addbr br0 > ed@grover ~/vm $ sudo ifconfig br0 192.168.100.254 netmask 255.255.255.0 up > ed@grover ~/vm $ sudo tunctl -b -u ed -t qtap0 > qtap0 > ed@grover ~/vm $ sudo brctl addif br0 qtap0 > ed@grover ~/vm $ sudo ifconfig qtap0 up 0.0.0.0 promisc > ed@grover ~/vm $ sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE > iptables v1.4.2: can't initialize iptables table `nat': Table does not exist (do you need to insmod?) > Perhaps iptables or your kernel needs to be upgraded. > ed@grover ~/vm $ sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE > > (Yes I really did have to run iptables twice and it did work the second time...) > > Looks like the spinning lock is the the bridge code. > > [ 369.387893] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) > [ 393.537476] device qtap0 entered promiscuous mode > [ 394.585998] br0: no IPv6 routers present > [ 398.171309] br0: port 1(qtap0) entering learning state > [ 403.081910] ip_tables: (C) 2000-2006 Netfilter Core Team > [ 408.945932] qtap0: no IPv6 routers present > [ 413.183725] br0: topology change detected, propagating > [ 413.189027] br0: port 1(qtap0) entering forwarding state > [ 434.211892] kvm: 7758: cpu0 kvm_set_msr_common: MSR_IA32_MC0_STATUS 0x0, nop > [ 434.420571] kvm: 7758: cpu1 kvm_set_msr_common: MSR_IA32_MC0_STATUS 0x0, nop > [ 434.452731] kvm: 7758: cpu2 kvm_set_msr_common: MSR_IA32_MC0_STATUS 0x0, nop > [ 499.270674] BUG: NMI Watchdog detected LOCKUP on CPU2, ip ffffffff802c97a3, registers: > [ 499.270674] CPU 2 > [ 499.270674] Modules linked in: iptable_nat ip_tables ipt_MASQUERADE x_tables nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 > btrfs zlib_deflate zlib_inflate crc32c libcrc32c radeon drm bridge rfcomm stp llc bnep l2cap bluetooth tun sit tunnel4 ipv6 af_packet > snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss edac_core kvm_amd kvm video1394 sbp2 l > gdt330x em28xx_dvb dvb_core em28xx_alsa tuner_xc2028 tuner tvp5150 em28xx ir_common v4l2_common videodev v4l2_compat_ioctl32 videobuf > _vmalloc videobuf_core tveeprom pl2303 i2c_core usbserial usbhid usb_storage sr_mod cdrom asus_atk0110 rtc_cmos snd_hda_codec_atihdmi > rtc_core snd_hda_codec_realtek rtc_lib ohci1394 ieee1394 snd_hda_intel sky2 snd_hda_codec wmi thermal processor snd_pcm button pcspk > r snd_timer snd ata_generic soundcore snd_page_alloc ehci_hcd pata_atiixp ohci_hcd evdev unix [last unloaded: floppy] > [ 499.270674] Pid: 7759, comm: kvm Tainted: G D 2.6.30-rc1-crc #7 System Product Name > [ 499.270674] RIP: 0010:[] [] ftrace_likely_update+0x23/0x50 > [ 499.270674] RSP: 0018:ffff88002813aa68 EFLAGS: 00000002 > [ 499.270674] RAX: 000000005c8eefb8 RBX: 0000000000000000 RCX: 0000000000000000 > [ 499.270674] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8087e280 > [ 499.270674] RBP: ffff88002813aa78 R08: 0000000000000000 R09: 0000000000000001 > [ 499.270674] R10: 0000000000000000 R11: ffff880140c75e20 R12: 0000000049d2f025 > [ 499.270674] R13: 0000000000000001 R14: 0000000049d2f068 R15: 0000000000000002 > [ 499.270674] FS: 00007f8ba8a4d950(0000) GS:ffff880028137000(0000) knlGS:0000000000000000 > [ 499.270674] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 499.270674] CR2: 00007fcea464e680 CR3: 0000000145013000 CR4: 00000000000006e0 > [ 499.270674] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 499.270674] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 499.270674] Process kvm (pid: 7759, threadinfo ffff8801450de000, task ffff880140c75e20) > [ 499.270674] Stack: > [ 499.270674] 000000005c8eefb8 000000005c8eefb8 ffff88002813aac8 ffffffff80445f3d > [ 499.270674] 000000005c8eefb8 0000000000000046 000000005c8eefb8 000000000393fe6c > [ 499.270674] ffff88013f947a38 000000007d7d7530 0000000000000001 0000000000000000 > [ 499.270674] Call Trace: > [ 499.270674] <0> [] delay_tsc+0x10d/0x140 > [ 499.270674] [] __delay+0x22/0x40 > [ 499.270674] [] _raw_spin_lock+0x16f/0x1f0 > [ 499.270674] [] _spin_lock_irqsave+0x5d/0x80 > [ 499.270674] [] ? __wake_up_sync_key+0x78/0xd0 > [ 499.270674] [] __wake_up_sync_key+0x78/0xd0 > [ 499.270674] [] ? br_flood+0xf9/0x110 [bridge] > [ 499.270674] [] __wake_up_sync+0x23/0x40 > [ 500.092533] [] tun_sock_write_space+0x57/0xb0 [tun] > [ 500.092533] [] sock_wfree+0x79/0x90 > [ 500.092533] [] ? trace_hardirqs_on_caller+0x32/0x1e0 > [ 500.092533] [] skb_release_head_state+0x98/0x120 > [ 500.092533] [] ? netif_receive_skb+0x118/0x4a0 > [ 500.092533] [] __kfree_skb+0x25/0xe0 > [ 500.092533] [] kfree_skb+0x138/0x180 > [ 500.092533] [] br_flood+0xf9/0x110 [bridge] > [ 500.092533] [] ? netif_receive_skb+0x118/0x4a0 > [ 500.092533] [] ? __br_forward+0x0/0x120 [bridge] > [ 500.092533] [] br_flood_forward+0x28/0x50 [bridge] > [ 500.157359] [] br_handle_frame_finish+0x19b/0x1c0 [bridge] > [ 500.157359] [] ? netif_receive_skb+0x118/0x4a0 > [ 500.157359] [] br_handle_frame+0x24a/0x260 [bridge] > [ 500.157359] [] netif_receive_skb+0x263/0x4a0 > [ 500.157359] [] ? netif_receive_skb+0x118/0x4a0 > [ 500.157359] [] process_backlog+0x97/0xf0 > [ 500.157359] [] net_rx_action+0x277/0x370 > [ 500.157359] [] ? net_rx_action+0xe5/0x370 > [ 500.157359] [] __do_softirq+0x142/0x2e0 > [ 500.157359] [] call_softirq+0x1c/0xb0 > [ 500.157359] <0> [] do_softirq+0x9d/0x150 > [ 500.157359] [] netif_rx_ni+0xa5/0xc0 > [ 500.157359] [] tun_chr_aio_write+0x3c1/0x4b0 [tun] > [ 500.157359] [] ? tun_chr_aio_write+0x0/0x4b0 [tun] > [ 500.157359] [] do_sync_readv_writev+0x103/0x160 > [ 500.157359] [] ? autoremove_wake_function+0x0/0x70 > [ 500.157359] [] ? kfree+0xd5/0x270 > [ 500.157359] [] ? security_file_permission+0x29/0x50 > [ 500.157359] [] ? rw_verify_area+0x10f/0x180 > [ 500.157359] [] do_readv_writev+0xec/0x220 > [ 500.283996] [] ? sub_preempt_count+0x69/0x70 > [ 500.283996] [] ? __rcu_read_unlock+0x66/0xa0 > [ 500.283996] [] ? fget_light+0xb7/0x190 > [ 500.283996] [] vfs_writev+0x68/0x90 > [ 500.283996] [] sys_writev+0x66/0xf0 > [ 500.283996] [] system_call_fastpath+0x16/0x1b > [ 500.283996] Code: ff 90 90 90 90 90 90 90 55 48 89 e5 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 39 d6 74 15 48 ff 4 > 7 20 48 8b 45 f8 <65> 48 33 04 25 28 00 00 00 75 0c c9 c3 48 ff 47 18 0f 1f 40 00 > [ 500.283996] ---[ end trace 19d93ad2f835aa80 ]--- > > Ideas? > Ed Since bridge is echoing back packets to TUN device (flood), maybe there is some way the tun socket is still locked in that path? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/