Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754885AbZDNWzO (ORCPT ); Tue, 14 Apr 2009 18:55:14 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752011AbZDNWyy (ORCPT ); Tue, 14 Apr 2009 18:54:54 -0400 Received: from mail001.aei.ca ([206.123.6.130]:51391 "EHLO mail001.aei.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751833AbZDNWyw (ORCPT ); Tue, 14 Apr 2009 18:54:52 -0400 From: Ed Tomlinson To: Avi Kivity , bridge@lists.linux-foundation.org, shemminger@linux-foundation.org Subject: Re: 2.6.30-rc1 A few issues and a stall (nmi_watchdog traceback) Date: Tue, 14 Apr 2009 18:54:42 -0400 User-Agent: KMail/1.11.2 (Linux/2.6.30-rc1-crc; KDE/4.2.2; x86_64; ; ) Cc: LKML , netdev@vger.kernel.org, linux-acpi@vger.kernel.org References: <200904121124.07193.edt@aei.ca> <200904140832.25057.edt@aei.ca> <49E4840E.6010300@redhat.com> In-Reply-To: <49E4840E.6010300@redhat.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200904141854.43205.edt@aei.ca> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8953 Lines: 146 On Tuesday 14 April 2009 08:39:42 Avi Kivity wrote: > Ed Tomlinson wrote: > > On Tuesday 14 April 2009 06:03:58 Avi Kivity wrote: > > > >> Ed Tomlinson wrote: > >> > >>> Once the above networking stuff is setup I start kvm with the command below > >>> > >>> QEMU_AUDIO_DRV=alsa kvm -m 1024 -cdrom /mnt/sdc4/divx/archlinux-2009.02-ftp-i686.iso -boot c -smp 3 -usb -usbdevice tablet -vga std -drive file=arch.img -net nic,macaddr=52:54:00:12:34:23 -net tap,ifname=qtap0,script=no -soundhw all -mem-path /hugepages > >>> > >>> which works and the kvm session boots just fine. > >>> > >>> Issue 2. When I attempt to ping outside the kvm session the pc (not just the kvm session) hangs. > >>> Its impossible to kill the kvm session and there are numerious info messages from RCU (tree RCU enabled) > >>> about stalls. > >>> > >>> > >> The rcu messages are likely because a processor has died. > >> > >> Do things work if you drop -mem-path? > >> > > > > It makes no difference. I did notice that one cpu is peged at 100% though. I'll be trying > > CONFIG_PROVE_LOCKING tonight for another problem - might give interesting results. > > > > Oh, so this goes away without CONFIG_PROVE_LOCKING? > With CONFIG_PROVE_LOCKING, DEBUG_PAGE_ALLOC and the nmi_watchdog=2, I get the following info: I used the following sequence to start the kvm: ed@grover ~/vm $ sudo modprobe tun ed@grover ~/vm $ sudo modprobe nf_nat ed@grover ~/vm $ sudo modprobe ipt_MASQUERADE ed@grover ~/vm $ sudo brctl addbr br0 ed@grover ~/vm $ sudo ifconfig br0 192.168.100.254 netmask 255.255.255.0 up ed@grover ~/vm $ sudo tunctl -b -u ed -t qtap0 qtap0 ed@grover ~/vm $ sudo brctl addif br0 qtap0 ed@grover ~/vm $ sudo ifconfig qtap0 up 0.0.0.0 promisc ed@grover ~/vm $ sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE iptables v1.4.2: can't initialize iptables table `nat': Table does not exist (do you need to insmod?) Perhaps iptables or your kernel needs to be upgraded. ed@grover ~/vm $ sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE (Yes I really did have to run iptables twice and it did work the second time...) Looks like the spinning lock is the the bridge code. [ 369.387893] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) [ 393.537476] device qtap0 entered promiscuous mode [ 394.585998] br0: no IPv6 routers present [ 398.171309] br0: port 1(qtap0) entering learning state [ 403.081910] ip_tables: (C) 2000-2006 Netfilter Core Team [ 408.945932] qtap0: no IPv6 routers present [ 413.183725] br0: topology change detected, propagating [ 413.189027] br0: port 1(qtap0) entering forwarding state [ 434.211892] kvm: 7758: cpu0 kvm_set_msr_common: MSR_IA32_MC0_STATUS 0x0, nop [ 434.420571] kvm: 7758: cpu1 kvm_set_msr_common: MSR_IA32_MC0_STATUS 0x0, nop [ 434.452731] kvm: 7758: cpu2 kvm_set_msr_common: MSR_IA32_MC0_STATUS 0x0, nop [ 499.270674] BUG: NMI Watchdog detected LOCKUP on CPU2, ip ffffffff802c97a3, registers: [ 499.270674] CPU 2 [ 499.270674] Modules linked in: iptable_nat ip_tables ipt_MASQUERADE x_tables nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 btrfs zlib_deflate zlib_inflate crc32c libcrc32c radeon drm bridge rfcomm stp llc bnep l2cap bluetooth tun sit tunnel4 ipv6 af_packet snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss edac_core kvm_amd kvm video1394 sbp2 l gdt330x em28xx_dvb dvb_core em28xx_alsa tuner_xc2028 tuner tvp5150 em28xx ir_common v4l2_common videodev v4l2_compat_ioctl32 videobuf _vmalloc videobuf_core tveeprom pl2303 i2c_core usbserial usbhid usb_storage sr_mod cdrom asus_atk0110 rtc_cmos snd_hda_codec_atihdmi rtc_core snd_hda_codec_realtek rtc_lib ohci1394 ieee1394 snd_hda_intel sky2 snd_hda_codec wmi thermal processor snd_pcm button pcspk r snd_timer snd ata_generic soundcore snd_page_alloc ehci_hcd pata_atiixp ohci_hcd evdev unix [last unloaded: floppy] [ 499.270674] Pid: 7759, comm: kvm Tainted: G D 2.6.30-rc1-crc #7 System Product Name [ 499.270674] RIP: 0010:[] [] ftrace_likely_update+0x23/0x50 [ 499.270674] RSP: 0018:ffff88002813aa68 EFLAGS: 00000002 [ 499.270674] RAX: 000000005c8eefb8 RBX: 0000000000000000 RCX: 0000000000000000 [ 499.270674] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8087e280 [ 499.270674] RBP: ffff88002813aa78 R08: 0000000000000000 R09: 0000000000000001 [ 499.270674] R10: 0000000000000000 R11: ffff880140c75e20 R12: 0000000049d2f025 [ 499.270674] R13: 0000000000000001 R14: 0000000049d2f068 R15: 0000000000000002 [ 499.270674] FS: 00007f8ba8a4d950(0000) GS:ffff880028137000(0000) knlGS:0000000000000000 [ 499.270674] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 499.270674] CR2: 00007fcea464e680 CR3: 0000000145013000 CR4: 00000000000006e0 [ 499.270674] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 499.270674] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 499.270674] Process kvm (pid: 7759, threadinfo ffff8801450de000, task ffff880140c75e20) [ 499.270674] Stack: [ 499.270674] 000000005c8eefb8 000000005c8eefb8 ffff88002813aac8 ffffffff80445f3d [ 499.270674] 000000005c8eefb8 0000000000000046 000000005c8eefb8 000000000393fe6c [ 499.270674] ffff88013f947a38 000000007d7d7530 0000000000000001 0000000000000000 [ 499.270674] Call Trace: [ 499.270674] <0> [] delay_tsc+0x10d/0x140 [ 499.270674] [] __delay+0x22/0x40 [ 499.270674] [] _raw_spin_lock+0x16f/0x1f0 [ 499.270674] [] _spin_lock_irqsave+0x5d/0x80 [ 499.270674] [] ? __wake_up_sync_key+0x78/0xd0 [ 499.270674] [] __wake_up_sync_key+0x78/0xd0 [ 499.270674] [] ? br_flood+0xf9/0x110 [bridge] [ 499.270674] [] __wake_up_sync+0x23/0x40 [ 500.092533] [] tun_sock_write_space+0x57/0xb0 [tun] [ 500.092533] [] sock_wfree+0x79/0x90 [ 500.092533] [] ? trace_hardirqs_on_caller+0x32/0x1e0 [ 500.092533] [] skb_release_head_state+0x98/0x120 [ 500.092533] [] ? netif_receive_skb+0x118/0x4a0 [ 500.092533] [] __kfree_skb+0x25/0xe0 [ 500.092533] [] kfree_skb+0x138/0x180 [ 500.092533] [] br_flood+0xf9/0x110 [bridge] [ 500.092533] [] ? netif_receive_skb+0x118/0x4a0 [ 500.092533] [] ? __br_forward+0x0/0x120 [bridge] [ 500.092533] [] br_flood_forward+0x28/0x50 [bridge] [ 500.157359] [] br_handle_frame_finish+0x19b/0x1c0 [bridge] [ 500.157359] [] ? netif_receive_skb+0x118/0x4a0 [ 500.157359] [] br_handle_frame+0x24a/0x260 [bridge] [ 500.157359] [] netif_receive_skb+0x263/0x4a0 [ 500.157359] [] ? netif_receive_skb+0x118/0x4a0 [ 500.157359] [] process_backlog+0x97/0xf0 [ 500.157359] [] net_rx_action+0x277/0x370 [ 500.157359] [] ? net_rx_action+0xe5/0x370 [ 500.157359] [] __do_softirq+0x142/0x2e0 [ 500.157359] [] call_softirq+0x1c/0xb0 [ 500.157359] <0> [] do_softirq+0x9d/0x150 [ 500.157359] [] netif_rx_ni+0xa5/0xc0 [ 500.157359] [] tun_chr_aio_write+0x3c1/0x4b0 [tun] [ 500.157359] [] ? tun_chr_aio_write+0x0/0x4b0 [tun] [ 500.157359] [] do_sync_readv_writev+0x103/0x160 [ 500.157359] [] ? autoremove_wake_function+0x0/0x70 [ 500.157359] [] ? kfree+0xd5/0x270 [ 500.157359] [] ? security_file_permission+0x29/0x50 [ 500.157359] [] ? rw_verify_area+0x10f/0x180 [ 500.157359] [] do_readv_writev+0xec/0x220 [ 500.283996] [] ? sub_preempt_count+0x69/0x70 [ 500.283996] [] ? __rcu_read_unlock+0x66/0xa0 [ 500.283996] [] ? fget_light+0xb7/0x190 [ 500.283996] [] vfs_writev+0x68/0x90 [ 500.283996] [] sys_writev+0x66/0xf0 [ 500.283996] [] system_call_fastpath+0x16/0x1b [ 500.283996] Code: ff 90 90 90 90 90 90 90 55 48 89 e5 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 39 d6 74 15 48 ff 4 7 20 48 8b 45 f8 <65> 48 33 04 25 28 00 00 00 75 0c c9 c3 48 ff 47 18 0f 1f 40 00 [ 500.283996] ---[ end trace 19d93ad2f835aa80 ]--- Ideas? Ed -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/