2002-08-13 00:15:56

by Olaf Dabrunz

Subject: TCP/IP connection setup using ECN: interaction with firewall problems

Hello all,

I am using the explicit congestion notification feature by switching it on
with "echo 1 >/proc/sys/net/ipv4/tcp_ecn". Usually this works fine. But
there are certain hosts whose firewalls consider the ECN flags to be an
indicator for a scan or attack. This (mis-)behaviour has also been
described in the standards track protocol definition for ECN, RFC 3168.

Though the firewalls are the faulty components (see RFC 3168), my
experience with similar problems (server-side firewalls that drop all ICMP
packets, causing a loss of service to clients with a smaller MTU than the
server) has shown that these kinds of firewall problems are difficult to
overcome on the firewall configuration side (it seems this is due to lack
of time and/or problem-awareness of the administrators).

In order to still work together with such hosts, RFC 3168 section 6.1.1.1
suggests a change in behaviour of the ECN-capable clients. It states that

    A host that receives no reply to an ECN-setup SYN within the normal
    SYN retransmission timeout interval MAY resend the SYN and any
    subsequent SYN retransmissions with CWR and ECE cleared. To overcome
    normal packet loss that results in the original SYN being lost, the
    originating host may retransmit one or more ECN-setup SYN packets
    before giving up and retransmitting the SYN with the CWR and ECE bits
    cleared.

Also, in case the firewall responds to an ECN-setup SYN by sending a
packet with the RST flag set, it states that

    [...] a host that receives a RST in response to the transmission
    of an ECN-setup SYN packet MAY resend a SYN with CWR and ECE cleared.
    This could result in a TCP connection being established without using
    ECN.
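
The two fallback rules quoted above can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions, not kernel code: the function next_syn_flags(), the enum syn_event and the ecn_syn_budget parameter are hypothetical names invented for this sketch, and the RFC does not prescribe how many ECN-setup SYNs to try before falling back. Only the flag bit values come from the TCP header layout.

```c
#include <assert.h>
#include <stdint.h>

/* TCP header flag bits (RFC 793, extended by RFC 3168). */
#define FLAG_SYN 0x02
#define FLAG_ECE 0x40
#define FLAG_CWR 0x80

/* Hypothetical event type: what happened to the last ECN-setup SYN. */
enum syn_event { SYN_TIMEOUT, SYN_GOT_RST };

/*
 * Return the TCP flags to use for the next SYN.
 *
 * ecn_syns_sent:  number of ECN-setup SYNs already sent (at least 1).
 * ecn_syn_budget: how many ECN-setup SYNs to try before falling back;
 *                 an illustrative knob, not a value defined by the RFC.
 */
static uint8_t next_syn_flags(enum syn_event ev, int ecn_syns_sent,
                              int ecn_syn_budget)
{
    uint8_t flags = FLAG_SYN | FLAG_ECE | FLAG_CWR; /* ECN-setup SYN */

    assert(ecn_syns_sent >= 1);

    /* A RST, or an exhausted budget of unanswered ECN-setup SYNs,
     * triggers the fallback to a SYN with CWR and ECE cleared. */
    if (ev == SYN_GOT_RST || ecn_syns_sent >= ecn_syn_budget)
        flags &= (uint8_t)~(FLAG_ECE | FLAG_CWR);

    return flags;
}
```

With a budget of 3, the first two timeouts still yield an ECN-setup SYN (flags 0xC2), while the third timeout or any RST yields a plain SYN (flags 0x02).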

I actually experience the first problem stated above. The tcpdump trace
below shows what happens when I try to connect to http://www.nvidia.com.

19:10:37.192791 123.45.67.89.34342 > 209.213.198.80.http: S [ECN-Echo,CWR] 2222688860:2222688860(0) win 5808 <mss 1452,sackOK,timestamp 15670700,nop,wscale 0> (DF)
19:10:40.183059 123.45.67.89.34342 > 209.213.198.80.http: S [ECN-Echo,CWR] 2222688860:2222688860(0) win 5808 <mss 1452,sackOK,timestamp 15673700,nop,wscale 0> (DF)
19:10:46.183270 123.45.67.89.34342 > 209.213.198.80.http: S [ECN-Echo,CWR] 2222688860:2222688860(0) win 5808 <mss 1452,sackOK,timestamp 15679700,nop,wscale 0> (DF)
19:10:58.183715 123.45.67.89.34342 > 209.213.198.80.http: S [ECN-Echo,CWR] 2222688860:2222688860(0) win 5808 <mss 1452,sackOK,timestamp 15691700,nop,wscale 0> (DF)

AFAICS from the kernel ChangeLogs Linux versions 2.4.* and 2.5.* do not
implement the interoperability features described above. Is that correct?
Is someone working on a patch that implements these features?

Thanks, Olaf.


2002-08-13 00:21:43

by David Miller

Subject: Re: TCP/IP connection setup using ECN: interaction with firewall problems

> From: Olaf Dabrunz
> Date: Tue, 13 Aug 2002 02:19:44 +0200
>
> AFAICS from the kernel ChangeLogs Linux versions 2.4.* and 2.5.* do not
> implement the interoperability features described above. Is that correct?
> Is someone working on a patch that implements these features?

We purposely do not implement the interoperability features because
they have known holes and also we totally disagree with them in
principle.

2002-08-13 04:43:52

by Willy Tarreau

Subject: Re: TCP/IP connection setup using ECN: interaction with firewall problems

On Tue, Aug 13, 2002 at 02:19:44AM +0200, Olaf Dabrunz wrote:

> I actually experience the first problem stated above. The tcpdump trace
> below shows what happens when I try to connect to http://www.nvidia.com.

I also incidentally noticed that nvidia drops ECN packets the first and only
time I tried to reach their site. IIRC, they also have other problems with
MTU. I think that their drivers are closed source because their developers
are as good as the network administrators, or even the same people :-/

Regards,
Willy

2002-08-13 18:26:55

by Kevin Buhr

Subject: Re: TCP/IP connection setup using ECN: interaction with firewall problems

Olaf Dabrunz writes:
>
> AFAICS from the kernel ChangeLogs Linux versions 2.4.* and 2.5.* do not
> implement the interoperability features described above. Is that correct?
> Is someone working on a patch that implements these features?

Olaf:

Here's a small patch I put together a while ago and have been using
with some success. It implements *only* the SYN retransmission in the
"no reply" case (after a user-configurable number of lost SYN packets)
but won't help with the RST case.

It adds a new sysctl variable "tcp_ecn_retries". The default value of
zero gives the old behaviour. But, for example:

echo 3 >/proc/sys/net/ipv4/tcp_ecn_retries

will retry with an ECN-disabled SYN after three unanswered ECN-enabled
SYNs (i.e., after a 20 second delay or so). Of course, it doesn't
keep track of what hosts need this kluge. Every new TCP connection to
a naughty host will be negotiated the same way with a long initial
delay.
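
As a rough check of the "20 second" figure, the delay can be modelled in userspace C. This sketch rests on assumptions that are not stated in the patch: an initial SYN retransmission timeout of 3 seconds that doubles on every timeout (consistent with the 3 s, 6 s and 12 s gaps in the tcpdump trace earlier in the thread), and tp->retransmits counting only the retransmissions made before the check runs. The helper name ecn_fallback_delay_secs() is hypothetical, and the model ignores the limit on SYN retransmissions (tcp_syn_retries) and any cap on the timeout.

```c
#include <assert.h>

/* Assumed initial SYN retransmission timeout, in seconds. */
#define INITIAL_RTO_SECS 3

/*
 * Seconds from the first SYN until the first SYN sent without ECN,
 * for a given tcp_ecn_retries value. Userspace model only.
 * Returns -1 when ecn_retries is 0, i.e. the fallback is disabled.
 */
static int ecn_fallback_delay_secs(int ecn_retries)
{
    int rto = INITIAL_RTO_SECS;
    int elapsed = 0;
    int retransmits = 0; /* models tp->retransmits */

    assert(ecn_retries >= 0 && ecn_retries < 16);
    if (ecn_retries == 0)
        return -1;

    for (;;) {
        elapsed += rto; /* the retransmission timer fires */

        /* Same condition as in the patch. */
        if (retransmits + 1 >= ecn_retries)
            return elapsed; /* this retransmission goes out without ECN */

        retransmits++;
        rto *= 2; /* exponential backoff */
    }
}
```

Under these assumptions, tcp_ecn_retries=3 sends ECN-setup SYNs at 0, 3 and 9 seconds and the first ECN-disabled SYN at 21 seconds, which matches the estimate above.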

The following patch is against 2.4.19-pre10-ac2. I imagine it'll
apply cleanly to more recent kernels except for the index in
"sysctl.h".

I have my doubts about it (in other words, I really don't understand
enough about the network code to do it right), but after I got it
working for myself, I kind of lost interest.

Anyway, hope this helps.

Kevin Buhr

* * *

diff -ru linux-2.4.19-pre10-ac2/include/linux/sysctl.h linux-2.4.19-pre10-ac2-local/include/linux/sysctl.h
--- linux-2.4.19-pre10-ac2/include/linux/sysctl.h Thu Jun 6 15:16:50 2002
+++ linux-2.4.19-pre10-ac2-local/include/linux/sysctl.h Thu Jun 6 15:51:07 2002
@@ -298,7 +298,8 @@
 	NET_IPV4_NONLOCAL_BIND=88,
 	NET_IPV4_ICMP_RATELIMIT=89,
 	NET_IPV4_ICMP_RATEMASK=90,
-	NET_TCP_TW_REUSE=91
+	NET_TCP_TW_REUSE=91,
+	NET_IPV4_TCP_ECN_RETRIES=92,
 };
 
 enum {
diff -ru linux-2.4.19-pre10-ac2/include/net/tcp.h linux-2.4.19-pre10-ac2-local/include/net/tcp.h
--- linux-2.4.19-pre10-ac2/include/net/tcp.h Thu Jun 6 15:16:02 2002
+++ linux-2.4.19-pre10-ac2-local/include/net/tcp.h Thu Jun 6 15:51:08 2002
@@ -454,6 +454,7 @@
 extern int sysctl_tcp_fack;
 extern int sysctl_tcp_reordering;
 extern int sysctl_tcp_ecn;
+extern int sysctl_tcp_ecn_retries;
 extern int sysctl_tcp_dsack;
 extern int sysctl_tcp_mem[3];
 extern int sysctl_tcp_wmem[3];
diff -ru linux-2.4.19-pre10-ac2/include/net/tcp_ecn.h linux-2.4.19-pre10-ac2-local/include/net/tcp_ecn.h
--- linux-2.4.19-pre10-ac2/include/net/tcp_ecn.h Fri Nov 2 17:43:26 2001
+++ linux-2.4.19-pre10-ac2-local/include/net/tcp_ecn.h Thu Jun 6 15:42:48 2002
@@ -38,6 +38,12 @@
 }
 
 static __inline__ void
+TCP_ECN_noecn_syn(struct sk_buff *skb)
+{
+	TCP_SKB_CB(skb)->flags &= ~(TCPCB_FLAG_ECE|TCPCB_FLAG_CWR);
+}
+
+static __inline__ void
 TCP_ECN_make_synack(struct open_request *req, struct tcphdr *th)
 {
 	if (req->ecn_ok)
diff -ru linux-2.4.19-pre10-ac2/net/ipv4/sysctl_net_ipv4.c linux-2.4.19-pre10-ac2-local/net/ipv4/sysctl_net_ipv4.c
--- linux-2.4.19-pre10-ac2/net/ipv4/sysctl_net_ipv4.c Thu Jun 6 15:16:03 2002
+++ linux-2.4.19-pre10-ac2-local/net/ipv4/sysctl_net_ipv4.c Thu Jun 6 15:42:48 2002
@@ -203,6 +203,8 @@
 	 &sysctl_tcp_reordering, sizeof(int), 0644, NULL, &proc_dointvec},
 	{NET_TCP_ECN, "tcp_ecn",
 	 &sysctl_tcp_ecn, sizeof(int), 0644, NULL, &proc_dointvec},
+	{NET_IPV4_TCP_ECN_RETRIES, "tcp_ecn_retries",
+	 &sysctl_tcp_ecn_retries, sizeof(int), 0644, NULL, &proc_dointvec},
 	{NET_TCP_DSACK, "tcp_dsack",
 	 &sysctl_tcp_dsack, sizeof(int), 0644, NULL, &proc_dointvec},
 	{NET_TCP_MEM, "tcp_mem",
diff -ru linux-2.4.19-pre10-ac2/net/ipv4/tcp_timer.c linux-2.4.19-pre10-ac2-local/net/ipv4/tcp_timer.c
--- linux-2.4.19-pre10-ac2/net/ipv4/tcp_timer.c Mon Oct 1 09:19:57 2001
+++ linux-2.4.19-pre10-ac2-local/net/ipv4/tcp_timer.c Tue Aug 13 10:43:07 2002
@@ -30,6 +30,7 @@
 int sysctl_tcp_retries1 = TCP_RETR1;
 int sysctl_tcp_retries2 = TCP_RETR2;
 int sysctl_tcp_orphan_retries;
+int sysctl_tcp_ecn_retries;
 
 static void tcp_write_timer(unsigned long);
 static void tcp_delack_timer(unsigned long);
@@ -373,6 +374,11 @@
 	}
 
 	tcp_enter_loss(sk, 0);
+
+	/* If this is a SYN packet, retry with ECN disabled */
+	if (sk->state == TCP_SYN_SENT
+	    && sysctl_tcp_ecn_retries && tp->retransmits+1 >= sysctl_tcp_ecn_retries)
+		TCP_ECN_noecn_syn(skb_peek(&sk->write_queue));
 
 	if (tcp_retransmit_skb(sk, skb_peek(&sk->write_queue)) > 0) {
 		/* Retransmission failed because of local congestion,