2000-12-01 07:04:07

by Mike Perry

Subject: 2.2.17 IP masq bug

While setting up a cluster, I've stumbled across what appears to be a bug
in the IP masquerading of 2.2.17.

Here's my setup:
15 machines on 10.0.0.1-15. 10.0.0.1 has IP aliasing and is also on the
external net, so it can IP masq the other 14 machines. The machines are on a
switch, and share a semi-switched network segment with a bunch of other
external IP'd machines (we are all in the same lab, actually).

The bug:
When I make a connection from any internal node to one of the other
externally routed machines in my lab, then close it, this external machine then
becomes unreachable to successive connects from that node.

ex:
[root@node2 /root]# telnet 128.174.21.2 22
Trying 128.174.21.2...
Connected to fake.ip.uiuc.edu (128.174.21.2).
Escape character is '^]'.
SSH-1.5-1.2.27
^]
telnet> q
Connection closed.

[root@node2 /root]# telnet 128.174.21.2 22
Trying 128.174.21.2...

...

The problem also happens if I telnet to a closed port a few times in a row.
Soon the machine is unreachable by any network traffic from that node. If I
switch to a new node, I can connect just once from that node, and then
silence.

This problem does NOT manifest itself for connecting to machines outside of
the local network. That seems to work fine.

More detailed setup info:
If it matters, all internal machines use eepro100's, and netboot via
dhcp/PXE off of the 10.0.0.1 machine.

Here's a sample routing table of the internal machines:
[root@node2 /root]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         10.0.0.1        0.0.0.0         UG    0      0        0 eth0

And the world node:
[root@world /root]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
128.174.21.50   0.0.0.0         255.255.255.255 UH    0      0        0 eth0
128.174.21.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         128.174.21.11   0.0.0.0         UG    0      0        0 eth0

And the script to load the IP masq modules and set up ipchains:
#!/bin/sh
/sbin/ifconfig eth0:0 10.0.0.1
echo 1 > /proc/sys/net/ipv4/ip_forward
/sbin/ipchains -F input
/sbin/ipchains -F output
/sbin/ipchains -F forward
/sbin/ipchains -P input ACCEPT
/sbin/ipchains -P output ACCEPT
/sbin/ipchains -P forward DENY
/sbin/ipchains -A forward -s 10.0.0.0/8 -j MASQ

All the IP masq modules are loaded:
[root@world /root]# lsmod
Module                  Size  Used by
ip_masq_vdolive         1336   0  (unused)
ip_masq_user            2632   0  (unused)
ip_masq_raudio          3000   0  (unused)
ip_masq_quake           1352   0  (unused)
ip_masq_portfw          2560   0  (unused)
ip_masq_mfw             3144   0  (unused)
ip_masq_irc             1592   0  (unused)
ip_masq_ftp             2616   0  (unused)
ip_masq_cuseeme         1080   0  (unused)
ip_masq_autofw          2480   0  (unused)

The problem still occurs with no modules loaded.

--
Mike Perry
http://so.fscked.org


2000-12-01 07:46:29

by Julian Anastasov

Subject: Re: 2.2.17 IP masq bug


Hello,

On Fri, 1 Dec 2000, Mike Perry wrote:

> external net, so it can IP masq the other 14 machines. The machines are on a
> switch, and share a semi-switched network segment with a bunch of other
> external IP'd machines (we are all in the same lab, actually).
>
> The bug:
> When I make a connection from any internal node to one of the other
> externally routed machines in my lab, then close it, this external machine then
> becomes unreachable to successive connects from that node.

This problem can be caused by ICMP redirects. Can these
commands help?

echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects
echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects


Regards

--
Julian Anastasov <[email protected]>
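
The two commands above cover "all" and eth0. As far as I understand the
kernel's behaviour, redirects stay enabled on an interface if either the
"all" entry or the per-interface entry is non-zero, which is presumably why
both are cleared. On a masq box with more interfaces, a loop over every
entry may be more convenient. The following is only a sketch: the function
name and its optional directory argument are hypothetical, added so it can
be tried on a scratch directory before touching /proc.

```sh
#!/bin/sh
# Sketch: write 0 to every send_redirects entry under the conf directory.
# The optional first argument is a hypothetical override for testing;
# by default the real /proc path from the message above is used.
disable_send_redirects() {
    conf_dir=${1:-/proc/sys/net/ipv4/conf}
    for f in "$conf_dir"/*/send_redirects; do
        # Skip the literal pattern if the glob matched nothing.
        [ -e "$f" ] || continue
        echo 0 > "$f"
    done
}
```

Run as root with no argument (`disable_send_redirects`) to act on the live
settings. The setting does not survive a reboot, so it would have to be
called from the same startup script that sets up ipchains.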

2000-12-01 08:35:22

by Mike Perry

Subject: Re: 2.2.17 IP masq bug

Thus spake Julian Anastasov ([email protected]):

> On Fri, 1 Dec 2000, Mike Perry wrote:
>
> > The bug:
> > When I make a connection from any internal node to one of the other
> > externally routed machines in my lab, then close it, this external machine then
> > becomes unreachable to successive connects from that node.
>
> This problem can be caused by ICMP redirects. Can these
> commands help?
>
> echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects
> echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects

Why yes they do. Problem seems to be completely solved. *blush*
At least it wasn't in the HOWTO.

Thanks!

--
Mike Perry
http://so.fscked.org
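
To confirm that the change is actually in effect on the masq box, the values
can be read back. The sketch below assumes the rule that redirects are sent
on an interface when either the "all" entry or the interface's own entry is
non-zero. The function name and the optional directory argument are
hypothetical and not part of any standard tool.

```sh
#!/bin/sh
# Sketch: report whether ICMP redirects would still be sent on an interface.
# $1 = interface name (e.g. eth0)
# $2 = optional conf directory (hypothetical override for testing)
send_redirects_state() {
    iface=$1
    conf_dir=${2:-/proc/sys/net/ipv4/conf}
    all=$(cat "$conf_dir/all/send_redirects")
    dev=$(cat "$conf_dir/$iface/send_redirects")
    # Assumed rule: enabled if either value is non-zero.
    if [ "$all" -ne 0 ] || [ "$dev" -ne 0 ]; then
        echo enabled
    else
        echo disabled
    fi
}
```

On the masq box, `send_redirects_state eth0` should print "disabled" once
both echo commands quoted above have been run.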

2000-12-01 10:40:15

by Ben McCann

Subject: Re: 2.2.17 IP masq bug

I'm curious about how ICMP redirect is causing this problem.
Would you elaborate on how ICMP is involved?

-Ben McCann

Julian Anastasov wrote:
>
> Hello,
>
> On Fri, 1 Dec 2000, Mike Perry wrote:
>
> > external net, so it can IP masq the other 14 machines. The machines are on a
> > switch, and share a semi-switched network segment with a bunch of other
> > external IP'd machines (we are all in the same lab, actually).
> >
> > The bug:
> > When I make a connection from any internal node to one of the other
> > externally routed machines in my lab, then close it, this external machine then
> > becomes unreachable to successive connects from that node.
>
> This problem can be caused by ICMP redirects. Can these
> commands help?
>
> echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects
> echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects
>
> Regards
>
> --
> Julian Anastasov <[email protected]>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> Please read the FAQ at http://www.tux.org/lkml/

--
Ben McCann Indus River Networks
31 Nagog Park
Acton, MA, 01720
email: [email protected] web: http://www.indusriver.com
phone: (978) 266-8140 fax: (978) 266-8111

2000-12-01 21:18:19

by Julian Anastasov

Subject: Re: 2.2.17 IP masq bug


Hello,

On Fri, 1 Dec 2000, Ben McCann wrote:

> I'm curious about how ICMP redirect is causing this problem.
> Would you elaborate on how ICMP is involved?

The masq box sends ICMP redirects to the internal host
when the destination host is on the same shared media, i.e.
"please, go directly to the destination". When the internal host
accepts these redirects, it reroutes its packets directly to the
destination, which is on the same LAN. The packets then reach the
destination with saddr=10/8 because they are no longer masqueraded.
It seems the destination has no direct route to 10/8, so the traffic
is never answered and the connection cannot be established. When we
block these redirects, the internal hosts continue to forward their
packets to the masq box without learning that the destination is
directly connected.

> -Ben McCann

Regards

--
Julian Anastasov <[email protected]>
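
The on-link reasoning in the explanation above can be checked with a little
netmask arithmetic. This is only a sketch of the address comparison, not of
the kernel's actual redirect logic. The helper names are hypothetical, and
the lab host's routing table is not shown in the thread, so the last check
assumes it has only 128.174.21.0/24 on-link.

```sh
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if address $1 lies inside network $2 with netmask $3.
on_link() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# node2 only has 10.0.0.0/8 on-link, so it must use its gateway at first.
on_link 128.174.21.2 10.0.0.0 255.0.0.0 \
    || echo "node2: 128.174.21.2 is off-link, send via 10.0.0.1"

# world has 128.174.21.0/24 on the same eth0 the packet arrived on.
on_link 128.174.21.2 128.174.21.0 255.255.255.0 \
    && echo "world: 128.174.21.2 is directly connected on eth0"

# Assumption: the lab host only knows 128.174.21.0/24 as on-link, so a
# packet with an unmasqueraded 10/8 source cannot be answered directly.
on_link 10.0.0.2 128.174.21.0 255.255.255.0 \
    || echo "lab host: 10.0.0.2 is off-link, reply goes to its default route"
```

The first two checks use the routing tables posted at the start of the
thread. Together they show why world considers node2's detour through it
unnecessary, while the third shows why the direct path fails in the reverse
direction.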