Date: Tue, 17 Aug 2010 13:08:41 +0200 (CEST)
From: Thomas Habets
To: Eric Dumazet
Cc: Thomas Habets, linux-kernel@vger.kernel.org, netdev
Subject: Re: BUG: IPv6 stops working after a while, needs ip ne del command to reset

Aha! New development: the Cisco router can't discover the address of the
Linux box because Linux doesn't seem to be listening to ff02::1 (all-nodes).

-----------
cisco#ping ff02::1
Output Interface: GigabitEthernet1/2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to FF02::1, timeout is 2 seconds:
Packet sent with a source address of FE80::222:55FF:FE17:4B80%GigabitEthernet1/2
Request 0 timed out
Request 1 timed out
Request 2 timed out
Request 3 timed out
Request 4 timed out
Success rate is 0 percent (0/5)
0 multicast replies and 0 errors.
------------

If I set promiscuous mode on the interface (tcpdump without -p, or "ip link
set promisc on eth0"), it starts working: both normal ping and the above ping
from the Cisco to ff02::1. It keeps working until, I guess, the neighbor table
on the Cisco times out (leaving it overnight seems to be enough idle time) or
I manually do a "clear ipv6 neig".

So, great news!
I can reproduce it at will with no waiting time! Right after rebooting the
Linux box I run "clear ipv6 neighbors" on the Cisco, and Linux can no longer
ping the router. I tested reproducing it immediately after a reboot.

The Linux box itself can ping ff02::1%eth0 with no problem, and gets replies
from the fe80:: link-local addresses of both itself and the Cisco router.

So could it be that, for some reason, the NIC isn't listening to the
multicast MAC address 33:33:ff:5c:00:02? Is there a way to see the list of
addresses that get past the NIC? Or could this perhaps be filtered after the
NIC, but before tcpdump -p?

Since this now looks like a NIC thing, here's some info about eth0:

$ dmesg | grep eth0
[...]
tg3 0000:03:04.0: eth0: Tigon3 [partno(N/A) rev 9003] (PCIX:133MHz:64-bit) MAC address 00:24:81:a3:44:24
tg3 0000:03:04.0: eth0: attached PHY is 5714 (10/100/1000Base-T Ethernet) (WireSpeed[1])
tg3 0000:03:04.0: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
tg3 0000:03:04.0: eth0: dma_rwctrl[76148000] dma_mask[40-bit]
[...]
$ sudo lspci -v -s 03:04.0
03:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5715 Gigabit Ethernet (rev a3)
        Subsystem: Hewlett-Packard Company NC326i PCIe Dual Port Gigabit Server Adapter
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 47
        Memory at fdff0000 (64-bit, non-prefetchable) [size=64K]
        Memory at fdfe0000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at [disabled]
        Capabilities: [40] PCI-X non-bridge device
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable+
        Kernel driver in use: tg3
        Kernel modules: tg3

$ sudo ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:24:81:a3:44:24
          inet addr:x.x.x.x  Bcast:x.x.x.x  Mask:255.255.255.252
          inet6 addr: 2a00:800:752:1::5c:2/112 Scope:Global
          inet6 addr: fe80::224:81ff:fea3:4424/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:928 errors:0 dropped:0 overruns:0 frame:0
          TX packets:834 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:142281 (138.9 KiB)  TX bytes:154616 (150.9 KiB)
          Interrupt:16

I have double-checked iptables, ip6tables and arptables: they are either not
compiled into the kernel or are empty ACCEPT lists.

I have answered your questions below, even if they may no longer be
applicable.

On Tue, 17 Aug 2010, Eric Dumazet wrote:
>> $ ip -6 ne sh
>> 2a00:800:752:1::5c:1 dev eth0 lladdr 00:22:55:17:4b:80 router STALE
>>
>> [try ping6 again, no reply]
>>
>> $ ip -6 ne sh
>> 2a00:800:752:1::5c:1 dev eth0 lladdr 00:22:55:17:4b:80 router DELAY
>>
>> [try ping6 again, no reply]
>>
>> $ ip -6 ne sh
>> 2a00:800:752:1::5c:1 dev eth0 lladdr 00:22:55:17:4b:80 router REACHABLE
>>
> This seems a bit different than previous mail. Apparently discovery now
> works ?

Last time I didn't post the "ip -6 ne sh" output taken immediately after the
ping attempt, so I'm not sure this has changed since then.
But the tcpdump output from last time seems to indicate that ND did work
then, at least in one direction, even though the solicitation came from the
link-local address and not the global one. The solicitation was answered,
after all (as seen in the tcpdump in the original mail).

> Could you have a tcpdump on both sides ?

Not easily. The other end is a Cisco and a bit inconvenient to get to. I'm
going there tomorrow night, so I can hook up a cable and set up a monitor port
then if needed.

---------
typedef struct me_s {
  char name[]      = { "Thomas Habets" };
  char email[]     = { "thomas@habets.pp.se" };
  char kernel[]    = { "Linux" };
  char *pgpKey[]   = { "http://www.habets.pp.se/pubkey.txt" };
  char pgp[] = { "A8A3 D1DD 4AE0 8467 7FDE 0945 286A E90A AD48 E854" };
  char coolcmd[]   = { "echo '. ./_&. ./_'>_;. ./_" };
} me_t;