Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752022AbbLEEsv (ORCPT ); Fri, 4 Dec 2015 23:48:51 -0500
Received: from mailrelay.lanline.com ([216.187.10.16]:59490 "EHLO mailrelay.lanline.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750968AbbLEEst (ORCPT ); Fri, 4 Dec 2015 23:48:49 -0500
X-Greylist: delayed 1213 seconds by postgrey-1.27 at vger.kernel.org; Fri, 04 Dec 2015 23:48:49 EST
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <22114.26609.457727.203801@quad.stoffel.home>
Date: Fri, 4 Dec 2015 23:28:33 -0500
From: "John Stoffel"
To: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
Subject: 4.4-rc3, KVM, br0 and instant hang
X-Mailer: VM 8.2.0b under 24.4.1 (x86_64-pc-linux-gnu)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6181
Lines: 157

Hi all,

I've been trying to upgrade to something newer than 4.2.6, since I
want to use LVM Cache on my home box, which is my NFS fileserver, KVM
server, test server, etc. When it goes down, I lose all my other
systems which mount stuff from it.

Right now I'm trying to figure out how to use Netconsole to grab a
dump of the oops, but it's not working well. But let me describe the
situation as I've found it so far.

When the system boots up, it first starts with eth0 on the network,
then switches to br0, since I have a KVM bridge set up so my VMs can
run on the same home network, 192.168.1.0/24, which is pretty
standard.

The system is an AMD Phenom(tm) II X4 945 Processor, running at a max
of 3 GHz, with 16 GB of RAM and an mpt2 LSI PCI-E 8-port SATA
controller, on an ASUS motherboard. I can get details if you like.
It's an older box, but still runs really well, so why change?

Anyway, if I try to boot up anything past the 4.2.6 kernel, the system
locks up pretty quickly with an oops message, most of which scrolls
off the screen.
I've got some pictures which I'll attach in a bit; maybe they'll help.

So at first I thought it was something to do with bad kworker threads,
or SCSI or SATA interactions. But as I tried to configure Netconsole
to log to my BeagleBone Black SBC, I found that if I compiled and
installed 4.4-rc3, started the bridge (br0), and even started KVM, but
did NOT start my VMs, the system was stable.

And if I didn't start br0, I could start a VM and the system wouldn't
crash. The VM wasn't on the network... but the system didn't crash.

So I think I've found a weird interaction here. Both of my KVM guests
are Debian images, with 1-2 GB of RAM and 1 CPU each. Nothing strange.

My network config is:

  > cat /etc/network/interfaces
  # This file describes the network interfaces available on your system
  # and how to activate them. For more information, see interfaces(5).

  # The loopback network interface
  auto lo
  iface lo inet loopback

  # Bridge for VMs
  auto br0
  iface br0 inet static
      address 192.168.1.6
      netmask 255.255.255.0
      network 192.168.1.0
      gateway 192.168.1.254
      bridge_ports eth0
      bridge_stp on
      bridge_maxwait 0
      bridge_fd 0

  # Old setup
  # auto eth0
  # iface eth0 inet static
  #     address 192.168.1.6
  #     netmask 255.255.255.0
  #     gateway 192.168.1.254

The currently running system version is:

  > cat /proc/version
  Linux version 4.4.0-rc3 (john@quad) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Thu Dec 3 12:13:30 EST 2015

And more detailed CPU info:

  > cat /proc/cpuinfo
  .....
  processor        : 3
  vendor_id        : AuthenticAMD
  cpu family       : 16
  model            : 4
  model name       : AMD Phenom(tm) II X4 945 Processor
  stepping         : 3
  microcode        : 0x10000b6
  cpu MHz          : 800.000
  cache size       : 512 KB
  physical id      : 0
  siblings         : 4
  core id          : 3
  cpu cores        : 4
  apicid           : 3
  initial apicid   : 3
  fpu              : yes
  fpu_exception    : yes
  cpuid level      : 5
  wp               : yes
  flags            : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save vmmcall
  bugs             : tlb_mmatch apic_c1e fxsave_leak sysret_ss_attrs
  bogomips         : 6027.13
  TLB size         : 1024 4K pages
  clflush size     : 64
  cache_alignment  : 64
  address sizes    : 48 bits physical, 48 bits virtual
  power management : ts ttp tm stc 100mhzsteps hwpstate

Here are my bootup messages; unfortunately I don't have any oops
messages. For whatever reason, the hang kicks in so quickly that I
can't get anything out over the network. I'm going to see if I can
stuff another network card in there and use that to send traffic,
instead of going over the bridge.

My next step is going to be to try disabling some of the bridge
settings, like bridge_stp, bridge_maxwait and bridge_fd, to just
accept the defaults. I set this up under Debian Wheezy a long time ago
and haven't touched it since.
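
For the second-NIC idea above, here is a minimal sketch of how the
netconsole= option string could be assembled so that log traffic
bypasses br0. The layout follows the kernel's netconsole documentation
([src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]); the
helper name netconsole_param, the interface eth1, and the 192.168.2.x
addresses are assumptions made up for illustration, not values from
this system.

```python
import ipaddress


def netconsole_param(src_ip, dev, tgt_ip, tgt_mac="", src_port=6665, tgt_port=6666):
    """Build the value for the kernel's netconsole= option.

    Layout: [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
    6665 and 6666 are the documented default source and target ports.
    An empty tgt_mac leaves the MAC field blank (the kernel then uses
    the broadcast address).
    """
    # Validate addresses early; ip_address() raises ValueError on bad input.
    ipaddress.ip_address(src_ip)
    ipaddress.ip_address(tgt_ip)
    for port in (src_port, tgt_port):
        if not 0 < port < 65536:
            raise ValueError(f"port out of range: {port}")
    return f"{src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


if __name__ == "__main__":
    # Hypothetical values: eth1 is the extra NIC on its own subnet and
    # 192.168.2.50 is the machine receiving the log packets.
    param = netconsole_param("192.168.2.6", "eth1", "192.168.2.50")
    print("modprobe netconsole netconsole=" + param)
```

The point of naming a device that is not enslaved to the bridge is
that the log packets never pass through br0, which is the path that
appears to be involved in the hang.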

The live interface state is:

  quad:~> ifconfig -a
  br0       Link encap:Ethernet  HWaddr 20:cf:30:95:5f:2f
            inet addr:192.168.1.6  Bcast:192.168.1.255  Mask:255.255.255.0
            inet6 addr: 2002:42bd:1ac0:1:22cf:30ff:fe95:5f2f/64 Scope:Global
            inet6 addr: fe80::22cf:30ff:fe95:5f2f/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:24154 errors:0 dropped:0 overruns:0 frame:0
            TX packets:16103 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:68682293 (65.5 MiB)  TX bytes:2563964 (2.4 MiB)

  eth0      Link encap:Ethernet  HWaddr 20:cf:30:95:5f:2f
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:66460 errors:0 dropped:0 overruns:0 frame:0
            TX packets:18157 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:71819217 (68.4 MiB)  TX bytes:2782126 (2.6 MiB)

  lo        Link encap:Local Loopback
            inet addr:127.0.0.1  Mask:255.0.0.0
            inet6 addr: ::1/128 Scope:Host
            UP LOOPBACK RUNNING  MTU:65536  Metric:1
            RX packets:7308 errors:0 dropped:0 overruns:0 frame:0
            TX packets:7308 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:1539613 (1.4 MiB)  TX bytes:1539613 (1.4 MiB)

Any suggestions on what else I can do to help debug this issue? It's
amazing how quickly the system locks up once all three pieces (br0 up,
KVM started, a VM running) are in place.

John