2001-02-16 04:16:23

by Chris Friesen

[permalink] [raw]
Subject: calling all gurus! odd and subtle network problem

I am trying to get some ideas on what the heck caused a problem with the network
at work, and I was hoping someone might have some ideas.

Yesterday we were having some major network problems, many machines were
completely bogged down. This morning I came in to work to find my linux box
unplugged from the network and a note saying to call the network engineering
dept.

We have a large pool of IP addresses set aside for assignment by DHCP to support
laptops and whatnot. Apparently the network problems were caused by a
particular MAC address being associated with essentially the entire pool of
DHCP-assigned addresses, so the DHCP client boxes were all trying to negotiate
for an address and kept getting error messages and trying again. This
apparently caused enough traffic that it bogged down the rest of the network.

The kicker is that the NIC with the MAC address in question happened to be in my
G4 box running linux (yellowdog, 2.2.17 kernel). It was a D-Link 530TX NIC, if
it matters. The linux box was not configured as a DHCP server or client, and
both interfaces on the box were configured with static IP addresses. The
motherboard interface was eth0 and was set to an address on the corporate LAN.
The other NIC was eth1 and was set to an address in the 192.168 range for
testing. The machine has been up and running in this configuration since
september of last year with no known issues. I made no changes at the time the
problems started.

My understanding of the evidence is that 1) in the routers my MAC address was
associated with hundreds if not thousands of IP addresses. 2) it was sending
out packets to all boxes configured via DHCP that there was an IP address
conflict and that it in fact owned that IP address (not sure exactly what packet
this would be, but I saw a printout of an error message from a DHCP-configured
printer). 3) when they pulled my machine off the LAN, the problem stopped. 4)
today we pulled the NIC with the MAC address in question and hooked the box back
up using only eth0, and everything seems to be working fine.

On my box, the linux kernel had no knowledge of the IP addresses, "ifconfig" and
"ip addr" both showed just the two addresses assigned to it (I checked it during
the problems for work related reasons). /etc/hosts has about 9 entries, all
ones that I've put in.

Does anyone have any ideas as to what was going on? My only theory is that
something is screwy with the card or the drivers, but I have no idea why it
would run fine for almost 6 months then suddenly start causing problems.

I eagerly await your opinions.

Chris


2001-02-16 04:49:41

by Chris Friesen

[permalink] [raw]
Subject: Re: calling all gurus! odd and subtle network problem -- followup, possible answer

Chris Friesen wrote:

> The kicker is that the NIC with the MAC address in question happened to be in my
> G4 box running linux (yellowdog, 2.2.17 kernel). It was a D-Link 530TX NIC, if
> it matters. The linux box was not configured as a DHCP server or client, and
> both interfaces on the box were configured with static IP addresses. The
> motherboard interface was eth0 and was set to an address on the corporate LAN.
> The other NIC was eth1 and was set to an address in the 192.168 range for
> testing. The machine has been up and running in this configuration since
> september of last year with no known issues. I made no changes at the time the
> problems started.

Okay, so some further digging revealed that someone was testing out the
"ethertap" device, which according to him requires ip_forwarding and proxy_arp
to be turned on. I suspect this is what changed and caused the problems. Since
rebooting cleaned up the /proc filesystem and reset these to zero, the problems
went away.

This feels like the answer to me. It saw the arp requests coming in the one
interface, sent them out the other interface, and the other machines got
confused because they saw that arps coming from a different MAC address.

Guess this goes to show the problems people can get into with stuff they don't
understand....


Chris