2008-11-04 14:32:41

by Daniel J Blueman

Subject: time for TCP ECN defaulting to on?

Is it time to enable TCP ECN by default and get the benefits, since
router support has been around and known about for a really considerable
time?

Perhaps it should be a question of enabling it, and educating people
to disable it if they run into issues, since we'll probably be in the
same situation in 5 years...and it'll be some time before these
kernels hit devices/servers anyway.

Daniel
--
Daniel J Blueman
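
For readers who want to check the setting under discussion: on Linux it is exposed as the `net.ipv4.tcp_ecn` sysctl, readable at `/proc/sys/net/ipv4/tcp_ecn`. The helper below is a minimal sketch; the function name is made up for illustration, and it returns None rather than guessing when the file is missing or unreadable.

```python
from pathlib import Path
from typing import Optional

# Location of the sysctl on Linux; other systems will not have this file.
TCP_ECN_SYSCTL = Path("/proc/sys/net/ipv4/tcp_ecn")


def read_tcp_ecn(path: Path = TCP_ECN_SYSCTL) -> Optional[int]:
    """Return the current tcp_ecn value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Not Linux, no permission, or unexpected file contents.
        return None


if __name__ == "__main__":
    print("net.ipv4.tcp_ecn =", read_tcp_ecn())
```

A value of 0 means ECN is off. Changing the value requires root, for example by writing to the same file.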


2008-11-04 16:34:32

by Dave Hudson

Subject: Re: time for TCP ECN defaulting to on?

Daniel J Blueman wrote:
> Is it time to enable TCP ECN per default and get the benefits, since
> router support has been around and known-about for really considerable
> time?
>
> Perhaps it should be a question of enabling it, and educating people
> to disable it if they run into issues, since we'll probably be in the
> same situation in 5 years...and it'll be some time before these
> kernels hit devices/servers anyway.
>
> Daniel

Unfortunately I think you'll find there are sufficiently large numbers
of broken SOHO routers out there that if you try this you'll cause a lot
of problems. The problems range from no connectivity to, in a few
extreme cases, routers actually crashing or behaving in very
unpredictable ways. Here's one summary that got presented to the IETF
about 18 months ago:

http://www.ietf.org/proceedings/07mar/slides/tsvarea-3/sld6.htm

When a clueless end-user gets a Linux-enabled netbook that crashes their
router while their existing Vista or XP systems appear to work just fine
then the Linux network stack will get the blame for being buggy, not the
router :-(


Regards,
Dave

2008-11-04 22:53:23

by David Miller

Subject: Re: time for TCP ECN defaulting to on?

From: Dave Hudson <[email protected]>
Date: Tue, 04 Nov 2008 16:16:03 +0000

> Daniel J Blueman wrote:
> > Is it time to enable TCP ECN per default and get the benefits, since
> > router support has been around and known-about for really considerable
> > time?
> > Perhaps it should be a question of enabling it, and educating people
> > to disable it if they run into issues, since we'll probably be in the
> > same situation in 5 years...and it'll be some time before these
> > kernels hit devices/servers anyway.
> > Daniel
>
> Unfortunately I think you'll find there are sufficiently large
> numbers of broken SOHO routers out there that if you try this you'll
> cause a lot of problems. The problems range from no connectivity to
> in a few extreme cases routers actually crashing or behaving in very
> unpredictable ways. Here's one summary that got presented to the
> IETF about 18 months ago:
>
> http://www.ietf.org/proceedings/07mar/slides/tsvarea-3/sld6.htm

Another issue is that, even if we turn it on by default, it won't
be on for a significant number of network cards out there.

This is because TSO, which is on by default, doesn't support ECN
in many implementations.

2008-11-05 01:18:57

by Michael Chan

Subject: Re: time for TCP ECN defaulting to on?


On Tue, 2008-11-04 at 14:52 -0800, David Miller wrote:
> From: Dave Hudson <[email protected]>
> Date: Tue, 04 Nov 2008 16:16:03 +0000
>
> > Daniel J Blueman wrote:
> > > Is it time to enable TCP ECN per default and get the benefits, since
> > > router support has been around and known-about for really considerable
> > > time?
> > > Perhaps it should be a question of enabling it, and educating people
> > > to disable it if they run into issues, since we'll probably be in the
> > > same situation in 5 years...and it'll be some time before these
> > > kernels hit devices/servers anyway.
> > > Daniel
> >
> > Unfortunately I think you'll find there are sufficiently large
> > numbers of broken SOHO routers out there that if you try this you'll
> > cause a lot of problems. The problems range from no connectivity to
> > in a few extreme cases routers actually crashing or behaving in very
> > unpredictable ways. Here's one summary that got presented to the
> > IETF about 18 months ago:
> >
> > http://www.ietf.org/proceedings/07mar/slides/tsvarea-3/sld6.htm
>
> Another issue is that, even if we turn it on by default, it won't
> be on for a significant number of network cards out there.
>
> This is because TSO, which is on by default, doesn't support ECN
> in many implementations.

I think this is no longer a limitation. The GSO code will take care of
ECN properly if the hardware does not support it when doing TSO.

2008-11-05 05:59:24

by David Miller

Subject: Re: time for TCP ECN defaulting to on?

From: "Michael Chan" <[email protected]>
Date: Tue, 04 Nov 2008 17:16:03 -0800

> I think this is no longer a limitation. The GSO code will take care
> of ECN properly if the hardware does not support it when doing TSO.

Hmm, good point, but if that is what happens I don't know if I agree
with it.

If "take care of ECN" means doing TSO in software, that's in my
opinion the wrong thing to do.

2008-11-05 06:31:32

by Michael Chan

Subject: Re: time for TCP ECN defaulting to on?

David Miller wrote:

> From: "Michael Chan" <[email protected]>
> Date: Tue, 04 Nov 2008 17:16:03 -0800
>
> > I think this is no longer a limitation. The GSO code will take care
> > of ECN properly if the hardware does not support it when doing TSO.
>
> Hmm, good point, but if that is what happens I don't know if I agree
> with it.
>
> If "take care of ECN" means doing TSO in software, that's in my
> opinion the wrong thing to do.
>
Right, it means TSO will be done in software by the GSO code if
ECE or CWR is set in a TSO frame and the driver indicates that
the hardware cannot segment such packets properly.

This allows TSO and ECN to coexist. Before this, ECN was always
disabled when TSO was enabled.

Assuming ECE and CWR are set infrequently on TSO frames, we still
benefit from hardware TSO most of the time. Why is it the wrong
thing to do?
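
The fallback described in this message can be sketched as a small decision function. This is only an illustration of the rule as stated above, not the kernel's actual code: the function name, the boolean parameters, and the returned strings are all hypothetical.

```python
def choose_segmentation(ece: bool, cwr: bool,
                        hw_tso: bool, hw_tso_ecn: bool) -> str:
    """Decide who segments a large TCP frame (illustrative names only).

    ece, cwr    -- ECN flags set on the frame to be segmented
    hw_tso      -- the hardware offers TSO at all
    hw_tso_ecn  -- the driver says the hardware segments ECN frames correctly
    """
    if not hw_tso:
        return "software"  # no hardware TSO available
    if (ece or cwr) and not hw_tso_ecn:
        return "software"  # hardware would mishandle the ECN flags
    return "hardware"      # the common case keeps the offload


if __name__ == "__main__":
    print(choose_segmentation(ece=False, cwr=False, hw_tso=True, hw_tso_ecn=False))
    print(choose_segmentation(ece=False, cwr=True, hw_tso=True, hw_tso_ecn=False))
```

Because only frames carrying ECE or CWR take the software path, hardware without ECN-aware TSO still handles every other frame.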

2008-11-05 07:30:00

by David Miller

Subject: Re: time for TCP ECN defaulting to on?

From: "Michael Chan" <[email protected]>
Date: Tue, 4 Nov 2008 22:31:00 -0800

> Assuming ECE and CWR are set infrequently on TSO frames, we still
> benefit from hardware TSO most of the time. Why is it the wrong
> thing to do?

I had forgotten about that aspect, and yes this is a
good tradeoff considering that.

2008-11-05 22:20:43

by Mikael Abrahamsson

Subject: Re: time for TCP ECN defaulting to on?

On Tue, 4 Nov 2008, Daniel J Blueman wrote:

> Is it time to enable TCP ECN per default and get the benefits, since
> router support has been around and known-about for really considerable
> time?

I think enabling ECN by default is a bad idea.

Looking at the largest core router vendor out there:

http://www.cisco.com/en/US/docs/ios/12_2t/12_2t8/feature/guide/ftwrdecn.html#wp1031751

ECN has to actually be turned on in the routers along the way; it's not
default behaviour. No ISP I know of does this, but I can do a poll of
other ISP engineers in case more information is wanted.

So the upside of enabling it is minimal (I'd gladly see data proving the
opposite) and the downside is a lot of trouble with lots of older devices
which behave badly when ECN is enabled.

--
Mikael Abrahamsson email: [email protected]

2008-11-05 23:10:59

by David Miller

Subject: Re: time for TCP ECN defaulting to on?

From: Mikael Abrahamsson <[email protected]>
Date: Wed, 5 Nov 2008 23:20:25 +0100 (CET)

> So the upside of enabling it is minimal (I'd gladly see data proving
> the opposite) and the downside is a lot of trouble with lots of
> older devices which behave badly when ECN is enabled.

This kind of thinking just perpetuates the problem forever.

If nothing important on end nodes enables it by default, people
running core routers have no reason to turn it on, and so on and so
forth.

Linux is much bigger and smarter than that, so we should break
the loop and enable it by default at some point soon.

2008-11-07 04:46:46

by Mikael Abrahamsson

Subject: Re: time for TCP ECN defaulting to on?

On Wed, 5 Nov 2008, David Miller wrote:

> This kind of thinking just perpetuates the problem forever.

Well, I also think IPv6 breaks things for some people (mostly buggy DNS
resolvers) but I wholly support it being on by default. The ISP business is
going in the direction of faster links and smaller interface buffers,
meaning WRED is used less and less, thus lessening the benefit of ECN.

I see that in
<http://www.icir.org/floyd/papers/draft-ietf-tsvwg-tcp-ecn-00.txt> there
is a recommendation to not use ECN on retransmits. Is there code right now
(or planned) to do some kind of "ECN blackhole detection", i.e. if no
response is received to a SYN with ECN set, continue by sending the second
SYN without ECN and keep this information for the duration of the TCP
session?

If there is, I do agree with you that enabling ECN by default is a good
idea. Without such code, I still believe there are enough broken devices
out there that will create problems for people.

It's like the TCP option order "bug", where some devices would drop the
packets because of buggy implementations. That was changed in Linux to
work around others' buggy code, and I see "ECN blackhole detection" as a
similar measure.

--
Mikael Abrahamsson email: [email protected]
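
The "ECN blackhole detection" asked about in this message can be sketched as follows. This illustrates the proposal only; it is not something the thread says Linux implements. The `send_syn` callback is a hypothetical stand-in for sending one SYN and waiting for a SYN-ACK, and the attempt limit is an arbitrary assumption.

```python
from typing import Callable, Tuple


def connect_with_ecn_fallback(send_syn: Callable[[bool], bool],
                              max_attempts: int = 3) -> Tuple[bool, bool]:
    """Return (connected, ecn_in_use) for one connection attempt.

    send_syn(use_ecn) is assumed to send a single SYN, with or without
    ECN negotiation, and return True if a SYN-ACK came back in time.
    """
    use_ecn = True
    for _ in range(max_attempts):
        if send_syn(use_ecn):
            return True, use_ecn
        # No response: retry without ECN and keep that choice for
        # the rest of this connection.
        use_ecn = False
    return False, False


if __name__ == "__main__":
    # Simulated path that silently drops any SYN requesting ECN.
    print(connect_with_ecn_fallback(lambda ecn: not ecn))
```

On a path that drops ECN SYNs, the first attempt times out and the second succeeds without ECN, so the connection works but pays one retransmission timeout.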

2008-11-07 04:49:44

by David Miller

Subject: Re: time for TCP ECN defaulting to on?

From: Mikael Abrahamsson <[email protected]>
Date: Fri, 7 Nov 2008 05:46:28 +0100 (CET)

> I see that in
> <http://www.icir.org/floyd/papers/draft-ietf-tsvwg-tcp-ecn-00.txt>
> there is a recommendation to not use ECN on retransmits, is there
> code right now (or planned) to do some kind of "ECN blackhole
> detection", ie if no response is received to SYN with ECN set,
> continue by sending the second SYN without ECN and keep this
> information for the duration of the TCP session?

No, we are firmly against any form of ECN blackhole detection. Alexey Kuznetsov and
Sally Floyd argued this out exhaustively several years ago.

2008-11-07 07:53:35

by Ilpo Järvinen

Subject: Re: time for TCP ECN defaulting to on?

On Fri, 7 Nov 2008, Mikael Abrahamsson wrote:

> On Wed, 5 Nov 2008, David Miller wrote:
>
> > This kind of thinking just perpetuates the problem forever.
>
> It's like the TCP option order "bug", where some devices would drop the
> packets because of buggy implementations, that was changed in Linux to work
> around others buggy code, and I see "ECN blackhole detection" as a similar
> measure.

That is an entirely bogus claim! The different ordering of options cost us
nothing, while disabling ECN certainly has an incalculable cost, both in
performance and in nobody taking the initiative, which makes the situation
worse for everybody.

And about somebody earlier claiming that people will get the impression that
the Linux stack is broken (if such people even know that there's some network
stack in Linux :-))... I'm rather sure those ISP support desks etc. will put
the blame on us anyway, even if loads of counter-evidence existed, because
it's just cheaper to do nothing and blame Linux instead. Also, some claims
asserted by incompetent people easily start to live on in random forums;
an example from the previous incident: "since disabling timestamps helps,
it must be that timestamps are broken" (and somebody even "more clueful"
added that they got enabled for 2.6.27?!?). Needless to say, neither
holds.


--
i.

2008-11-07 08:12:10

by Mikael Abrahamsson

Subject: Re: time for TCP ECN defaulting to on?

On Fri, 7 Nov 2008, Ilpo Järvinen wrote:

> On Fri, 7 Nov 2008, Mikael Abrahamsson wrote:
>
>> On Wed, 5 Nov 2008, David Miller wrote:
>>
>>> This kind of thinking just perpetuates the problem forever.
>>
>> It's like the TCP option order "bug", where some devices would drop the
>> packets because of buggy implementations, that was changed in Linux to work
>> around others buggy code, and I see "ECN blackhole detection" as a similar
>> measure.
>
> That is entirely bogus claim! The different ordering of options cost us
> nothing, while disabling ECN certainly has an innumerable cost both in
> performance and in nobody taking the initiative which makes the situation
> worse for everybody.

I can't comment on whether "ECN blackhole detection" has a cost or not, since
I haven't been able to find the discussion between Alexey Kuznetsov and
Sally Floyd that David Miller was referring to. Anything more to go on? A
direct link to the thread would be great.

I have sent an email (which will hopefully initiate a discussion) to a
mailing list populated by a lot of the operational ISP community and asked
around about ECN and views on it. I also checked around on core router
platforms (Cisco 12000 and Cisco CRS-1, which are definitely two of the top
three core router platforms deployed in the world) and it seems they do
not support ECN as far as I can discern. This pretty much puts widespread
ECN support in the major core ISP networks within the next 5 years out
of the question, leaving ECN support on the slower links, where it might be
deployed faster. I doubt it though.

> And about somebody earlier claiming that they'll get an impressions that
> Linux stack is broken (if such people even know that there's some network
> stack in Linux :-))... I'm rather sure those isp supports etc. put a blame
> on us anyway even when loads of counterproof would exists because it's
> just cheaper to do nothing and blame linux instead. Also some claims
> asserted by incompetent people easily start to live among random forums;
> an example from the previous incident: "since disabling timestamps helps,
> it must be that timestamps are broken" (and somebody even "more clueful"
> added that they got enabled for 2.6.27?!?), needless to say, neither
> holds.

People just want it to work. People disable IPv6 because their DNS servers
don't respond properly to AAAA queries; they just want everything to work,
they don't want to understand.

Now, IPv6 for me is crucial to the continuing life and prosperity of the
Internet (NAT is bad). ECN is "nice to have".

But let me check out what the ISP community has to say before we get too
upset. It might be that people agree and will start requesting ECN in the
core equipment (I know I will), and then it might be worthwhile after all.

I do see Linux (and Linux users) as leader(s) in deploying new technology,
with ECN being one of them. Question is how much hurt we're going to take
for it.

<http://www.merit.edu/mail.archives/nanog/msg12756.html> is a link to my
email to the NANOG ML referenced above.

--
Mikael Abrahamsson email: [email protected]

2008-11-07 09:25:46

by Bjørn Mork

Subject: Re: time for TCP ECN defaulting to on?

Mikael Abrahamsson <[email protected]> writes:

> I have sent an email (which will hopefully initiate a discussion) to a
> mailinglist populated by a lot of the operational ISP community and
> asked around about ECN and views on that. I also checked around on
> core router platforms (Cisco 12000 and Cisco CRS-1, which definitely
> is two of the top three core router platforms deployed in the world)
> and it seems they do not support ECN as far as I can discern.

I believe you can forget about ECN in core networks as long as these
searches fail:

http://www.google.com/search?q=%22rfc+5129%22+site%3Acisco.com
http://www.google.com/search?q=%22rfc+5129%22+site%3Ajuniper.net



Bjørn

2008-11-07 11:17:21

by Mikael Abrahamsson

Subject: Re: time for TCP ECN defaulting to on?

On Fri, 7 Nov 2008, Bjørn Mork wrote:

> I believe you can forget about ECN in core networks as long as these
> searches fail:

A lot of ISPs use MPLS yes, but it would be a simplification to say that
it's useless without MPLS support, because it's quite common that
congestion happens at the interconnections/borders between ISPs and they
are very rarely MPLS-labeled.

--
Mikael Abrahamsson email: [email protected]

2008-11-07 11:19:26

by Dave Hudson

Subject: Re: time for TCP ECN defaulting to on?

Ilpo Järvinen wrote:
> On Fri, 7 Nov 2008, Mikael Abrahamsson wrote:
>
> And about somebody earlier claiming that they'll get an impressions that
> Linux stack is broken (if such people even know that there's some network
> stack in Linux :-))... I'm rather sure those isp supports etc. put a blame
> on us anyway even when loads of counterproof would exists because it's
> just cheaper to do nothing and blame linux instead. Also some claims
> asserted by incompetent people easily start to live among random forums;
> an example from the previous incident: "since disabling timestamps helps,
> it must be that timestamps are broken" (and somebody even "more clueful"
> added that they got enabled for 2.6.27?!?), needless to say, neither
> holds.

Not all of the routers in question (the ones that crash, block packets
or otherwise misbehave) are provided by ISPs - in fact a huge number of
them are and have been sold retail. Over time most of those boxes will
get replaced with ones that don't have the problem because most
(probably all major) SOHO router suppliers now test that they don't
break with ECN so eventually there will be a point where enabling ECN by
default will make a lot of sense (there will be too few broken routers
to care about).

What I do believe (having spent a lot of years writing embedded device
and router code - and no, not the ones that crash ;-)) is that if you
enable a feature that causes just 1% of users to have an out-of-the-box
problem you'll see a seriously disproportionate response from end users.
Most people (and engineers are not "most people" :-)) will blame the
new thing that they've just added or changed, not the old thing that was
broken to begin with (it's human nature not to truly understand cause
and effect).

Whether we like it or not there's currently a known problem deploying
ECN on a wide scale - it has been sufficient to stop pretty-much
everyone from enabling it by default so far.


Regards,
Dave

2008-11-07 11:28:34

by Ilpo Järvinen

Subject: Re: time for TCP ECN defaulting to on?

On Fri, 7 Nov 2008, Mikael Abrahamsson wrote:

> On Fri, 7 Nov 2008, Ilpo Järvinen wrote:
>
> > On Fri, 7 Nov 2008, Mikael Abrahamsson wrote:
> >
> > > On Wed, 5 Nov 2008, David Miller wrote:
> > >
> > > > This kind of thinking just perpetuates the problem forever.
> > >
> > > It's like the TCP option order "bug", where some devices would drop the
> > > packets because of buggy implementations, that was changed in Linux to
> > > work around others buggy code, and I see "ECN blackhole detection" as a
> > > similar measure.
> >
> > That is entirely bogus claim! The different ordering of options cost us
> > nothing, while disabling ECN certainly has an innumerable cost both in
> > performance and in nobody taking the initiative which makes the situation
> > worse for everybody.
>
> I can't comment on "ECN blackhole detection" costing or costing none since I
> haven't been able to find the discussion between Alexey Kuznetsov and Sally
> Floyd that David Miller was referring to. Anything more to go on? A direct
> link to the thread would be great.

No idea about the mail. But anyway, some cost comes from the fact that
there is then no desire to fix broken things, nor even to start making
compliant equipment, thus losing the potential benefits of ECN. It
has been around for years and we're still having this discussion about
blackhole detection being necessary to keep operating, which is
ridiculous.

And were there a need to reorder the TCP headers, it would certainly
get done, with all the associated breakage (not very likely that the need
will arise, though, because those parts of the header are well utilized
already). It would basically be the same as with things like window
scaling: there's no window scaling blackhole detection in the kernel
besides manually turning it off. Were there detection, why would those
window-scaling-broken devices ever get fixed (and the corresponding end
hosts would be doomed to a 64k window forever)... Not to mention other
similar examples.

> I have sent an email (which will hopefully initiate a discussion) to a
> mailinglist populated by a lot of the operational ISP community and asked
> around about ECN and views on that. I also checked around on core router
> platforms (Cisco 12000 and Cisco CRS-1, which definitely is two of the top
> three core router platforms deployed in the world) and it seems they do not
> support ECN as far as I can discern. This pretty much in the next 5 year
> timeframe ECN widespread support in the major core ISP networks out of the
> question, leaving ECN support on the slower links where it might be deployed
> faster. I doubt it though.

I think you partially miss the point here. In many cases not every single
router has to _support_ ECN to get its benefits; not supporting it is not the
problem in itself (though it would be nice to get that "fixed" as well),
but breaking ECN-enabled connections is. I suppose you didn't check that
aspect? I'd guess those mentioned devices will interoperate just fine,
since one can mostly connect OK with ECN too, with rare exceptions,
rather than the other way around.

The most crucial components are anyway the points of congestion. I don't
know enough ISP topologies, but I suppose those core routers are not the
ones where traffic towards subscribers' devices congests?

> Now, IPv6 for me is crucial to the continuing life and prosperity of the
> Internet (NAT is bad). ECN is "nice to have".

Sure.

> I do see Linux (and Linux users) as leader(s) in deploying new technology,
> with ECN being one of them. Question is how much hurt we're going to take for
> it.

I doubt it is any worse than with e.g. timestamps.


--
i.

2008-11-07 11:39:22

by Mikael Abrahamsson

Subject: Re: time for TCP ECN defaulting to on?

On Fri, 7 Nov 2008, Ilpo Järvinen wrote:

> I think you partially miss the point here. In many cases not every
> single router has to _support_ ECN to get its benefits, not-supporting
> is not the problem in itself (though it would be nice to get that
> "fixed" as well) but breaking ecn-enabled connections. I suppose you
> didn't check that aspect? I'd guess those mentioned devices will
> interoperate just fine since one can mostly connect ok with ecn too
> besides rare exceptions rather than things being vice-versa.

I don't understand. My point is that most of the ISP core equipment out
there doesn't act on ECN, rendering it mostly useless. The N in ECN is
rendered useless because there is no device doing the *notification*. They'll
just pass the traffic without acting on it differently, regardless of whether
ECN is on or off.

> The most crucial components are anyway the points of congestion, I don't know
> enough isp topologies but I suppose those core routers are not the ones where
> towards subscribers device traffic congests?

There can be congestion anywhere in the network; best would be if all
routers supported it. My problem with ECN is that the most advanced
routers do not support it, and it's useless with L2/L3 switches (as they have
very small buffers, there is "nothing" to do WRED on). That leaves
potential implementation by either DSLAM/BRAS vendors (where Cisco BRAS
does support it but it needs to be enabled by the ISP) or the SOHO devices
which run Linux and might implement it, but I'd rather see them do active
queue management at all (fair-queue for instance) before asking them to do
ECN. Of course, if users start to ask for ECN and we get fair-queue at the
same time, all the better. One very common congestion point is definitely
the upstream connection of someone's cable or DSL modem.

> I doubt it any worse than with eg. timestamps.

According to <http://www.imperialviolet.org/binary/ecntest.pdf> it's 0.5%
of hosts that drop packets when ECN is enabled. It's a substantial part of
the Internet. Yes, not doing blackhole detection might get these hosts
fixed faster, but at the expense of more end user hurt.

--
Mikael Abrahamsson email: [email protected]

2008-11-07 12:22:35

by Ilpo Järvinen

Subject: Re: time for TCP ECN defaulting to on?

On Fri, 7 Nov 2008, Mikael Abrahamsson wrote:

> On Fri, 7 Nov 2008, Ilpo Järvinen wrote:
>
> > I think you partially miss the point here. In many cases not every single
> > router has to _support_ ECN to get its benefits, not-supporting is not the
> > problem in itself (though it would be nice to get that "fixed" as well) but
> > breaking ecn-enabled connections. I suppose you didn't check that aspect?
> > I'd guess those mentioned devices will interoperate just fine since one can
> > mostly connect ok with ecn too besides rare exceptions rather than things
> > being vice-versa.
>
> I don't understand. My point is that most of the ISP core equipment out there
> doesn't act on ECN rendering it mostly useless. The N in ECN renders useless
> because there is no device doing the *notification*. They'll just pass the
> traffic without acting on it differently regardless if ECN is on or off.

Likewise, not enabling ECN renders any device doing notification useless.

One alternative to fully enabling it would be to enable it only on the
listening side, allowing the client end to decide, but that is effectively
the same as not enabling it at all (using the same logic as above).

> > The most crucial components are anyway the points of congestion, I don't
> > know enough isp topologies but I suppose those core routers are not the ones
> > where towards subscribers device traffic congests?
>
> There can be congestion anywhere in the network, best would be if all routers
> supported it.

I agree. But...

> My problem with ECN is that the most advanced routers do not
> support it, it's useless with L2/L3 switches (as they have very small buffers,
> there is "nothing" to do WRED on), so that leaves potential implementation by
> either DSLAM/BRAS vendors (where Cisco BRAS does support it but it needs to be
> enabled by the ISP) or the SOHO devices which run Linux and might implement
> it, but I'd rather see them do active queue management at all (fair-queue for
> instance) before asking them to do ECN. Of course, if users start to ask for
> ECN and we get fair-queue at the same time, all the better. One very common
> congestion point is definitely the upstream connection of someones cable or
> DSL modem.

...I'd assume that marking on the end(s) of the cable/DSL link, which is
often congested, is the lowest-hanging fruit, and like you seem to say (if I
understood you correctly), there exists some support already which could
then incrementally be turned on. I just realized, though, that it might,
after all, not be what ISPs like doing, since just getting higher-bandwidth
subscriptions is perhaps more rewarding from their perspective (please
don't take this as an offense, it's not meant to be one :-))...

IMHO, the end user's end has less to gain, since those packets usually travel
just locally before getting dropped, but ECN-aware streaming might change
that as well.

> > I doubt it any worse than with eg. timestamps.
>
> According to <http://www.imperialviolet.org/binary/ecntest.pdf> it's 0.5% of
> hosts that drop packets when ECN is enabled. It's a substantial part of the
> Internet. Yes, not doing blackhole detection might get these hosts fixed
> faster, but at the expense of more end user hurt.

Did you notice:

However, the failing hosts are not distributed randomly. The
7,627 hosts span only 4,613 /24 subnets. The top twenty
subnets account for 778 failing hosts (10%). WHOIS[1] information
for 18 of those 20 subnets suggests that they are located in China.

Just some bigger device(s) perhaps?


--
i.
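
The concentration figures quoted in this message can be checked with a few lines of arithmetic. The numbers come straight from the quoted passage; the variable names are made up for illustration, and nothing here is taken from the paper beyond what the thread quotes.

```python
# Figures as quoted from the ecntest paper in the message above.
failing_hosts = 7627        # hosts that failed with ECN enabled
failing_subnets = 4613      # distinct /24 subnets containing them
top20_failing_hosts = 778   # failing hosts inside the 20 worst subnets

# Share of all failures concentrated in the top twenty subnets.
top20_share = top20_failing_hosts / failing_hosts

# Average failing hosts per subnet: overall versus the top twenty.
avg_per_subnet = failing_hosts / failing_subnets
avg_per_top20_subnet = top20_failing_hosts / 20

print(f"top-20 share of failures: {top20_share:.1%}")
print(f"average failing hosts per /24: {avg_per_subnet:.2f}")
print(f"average failing hosts per top-20 /24: {avg_per_top20_subnet:.1f}")
```

The top twenty subnets average about 39 failing hosts each against an overall average below 2 per /24, which is consistent with the suggestion that a few larger devices account for a disproportionate share of the failures.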

2008-11-07 13:43:58

by Daniel J Blueman

Subject: Re: time for TCP ECN defaulting to on?

On Fri, Nov 7, 2008 at 12:22 PM, Ilpo Järvinen
<[email protected]> wrote:
> On Fri, 7 Nov 2008, Mikael Abrahamsson wrote:
>> On Fri, 7 Nov 2008, Ilpo Järvinen wrote:
>> > I think you partially miss the point here. In many cases not every
>> > single router has to _support_ ECN to get its benefits, not-supporting
>> > is not the problem in itself (though it would be nice to get that
>> > "fixed" as well) but breaking ecn-enabled connections. I suppose you
>> > didn't check that aspect? I'd guess those mentioned devices will
>> > interoperate just fine since one can mostly connect ok with ecn too
>> > besides rare exceptions rather than things being vice-versa.
>>
>> I don't understand. My point is that most of the ISP core equipment out
>> there doesn't act on ECN rendering it mostly useless. The N in ECN renders
>> useless because there is no device doing the *notification*. They'll just
>> pass the traffic without acting on it differently regardless if ECN is on
>> or off.

I've been running with ECN enabled on all my client Linux systems and
(personal) webservers for the past 6 or so years. When I've
encountered issues accessing particular hosts, I turn it and TCP
window scaling off, but invariably the cause turns out to be something else.

If most ECN-broken hardware is embedded consumer appliances (which are
generally short-lifespan and moving more and more to linux), then we
avoid hurting these users by enabling ECN per default when eg
CONFIG_IP_ADVANCED_ROUTER is set (to little direct benefit of course).
It's a start and a constructive idea; by doing this and documenting
it, we provide a wake-up call for vendors, laying the path for
enabling it for all types of host in a few years. Even enabling ECN
for -rc kernels will raise awareness.

Alternatively, an ECN-day could be publicised targeting the linux tech
community, where we can report failing networks/sites to a central
website to quantify actual potential negative impact.

But doing nothing is cyclic - when will the natural break suddenly occur?
--
Daniel J Blueman

2008-11-07 14:29:22

by Andi Kleen

Subject: Re: time for TCP ECN defaulting to on?

Dave Hudson <[email protected]> writes:
>
> Not all of the routers in question (the ones that crash, block packets
> or otherwise misbehave) are provided by ISPs - in fact a huge number
> of them are and have been sold retail. Over time most of those boxes
> will get replaced with ones that don't have the problem because most
> (probably all major) SOHO router suppliers now test that they don't
> break with ECN so eventually there will be a point where enabling ECN
> by default will make a lot of sense (there will be too few broken
> routers to care about).

One option would be also to enable it by default for IPv6 only.

-Andi

--
[email protected]

2008-11-07 14:30:27

by Alan

Subject: Re: time for TCP ECN defaulting to on?

> Not all of the routers in question (the ones that crash, block packets
> or otherwise misbehave) are provided by ISPs

The ones that block and misbehave are the bigger problem. The ones that
crash are less of a problem and I think they are less common. Certainly
if they were common then people would be abusing the flaw routinely.
Similarly, end users tend to grasp "if it keeps crashing blame the supplier".

When stuff just mysteriously doesn't work it is a whole lot more
problematic. I think however Sally Floyd had it right and Alexey has it
wrong (as does Davem).

If you turn it off on a retransmit then you provide an immediate
incentive for everyone on the web server end of the business to fix their
network. Especially if you turn it off for second retransmit. That will
cause faulty ECN handling sites to feel "a bit slow" and we know from
marketing data that web site performance is crucial to customer base. A
three or four second delay getting a page up translates into dramatically
reduced hit counts.

Alan

2008-11-07 14:30:47

by Ilpo Järvinen

Subject: Re: time for TCP ECN defaulting to on?

On Fri, 7 Nov 2008, Daniel J Blueman wrote:

> then we
> avoid hurting these users by enabling ECN per default when eg
> CONFIG_IP_ADVANCED_ROUTER is set (to little direct benefit of course).

I suppose all distros enable that anyway in generic kernels so it's not
going to be any different from just enabling it.

> It's a start and a constructive idea; by doing this and documenting
> it, we provide a wake-up call for vendors, laying the path for
> enabling it for all types of host in a few years. Even enabling ECN
> for -rc kernels will raise awareness.
>
> Alternatively, an ECN-day could be publicised targeting the linux tech
> community, where we can report failing networks/sites to a central
> website to quantify actual potential negative impact.

This will still miss a lot. E.g., the ordering problems were not
discovered, afaik, until the 2.6.27 release. That is quite a long time of
testing without anybody noticing that it was broken (some distro circles
may have seen it with some -rcX kernels if they were using them, but it
didn't gain much attention until 2.6.27 was already out). And at that
time the imminent Ubuntu release made testers a much more abundant
resource than with other kernel versions.

Agreed that we definitely should do more than just turn it on and wait for
trouble, but educating users might turn out to be quite a hard problem.
And there will certainly be trouble, as even the most comprehensive
attempts within the Linux dev+tester community are going to leave major
holes, as was proven by the TCP option ordering saga.


--
i.

2008-11-07 14:46:21

by David Newall

Subject: Re: time for TCP ECN defaulting to on?

Isn't this a question for the IETF to answer? Are they saying turn on
ECN now?

2008-11-07 15:07:53

by Rémi Denis-Courmont

Subject: Re: time for TCP ECN defaulting to on?

On Friday 07 November 2008 16:45:55 ext David Newall, you wrote:
> Isn't this a question for the IETF to answer? Are they saying turn on
> ECN now?

For what it's worth, the IESG says RFC3168 is a PROPOSED STANDARD.
This is the "entry-level maturity for the standards track":
(ftp://ftp.rfc-editor.org/in-notes/bcp/bcp9.txt)

A Proposed Standard specification is generally stable, has resolved
known design choices, is believed to be well-understood, has received
significant community review, and appears to enjoy enough community
interest to be considered valuable. However, further experience
might result in a change or even retraction of the specification
before it advances.
(...)
Implementors should treat Proposed Standards as immature
specifications. It is desirable to implement them in order to gain
experience and to validate, test, and clarify the specification.
However, since the content of Proposed Standards may be changed if
problems are found or better solutions are identified, deploying
implementations of such standards into a disruption-sensitive
environment is not recommended.

--
Rémi Denis-Courmont
Maemo Software, Nokia Devices R&D

2008-11-07 18:44:01

by John Heffner

Subject: Re: time for TCP ECN defaulting to on?

The IETF has provided a spec, and additional documents on deployment
issues. They have provided all the guidance they are going to. It's
now up to implementers to weigh the trade-offs.

My own observations and opinions, for what they're worth:

Turning on ECN doesn't hurt as much as it used to. Back in the early
'00s, there were a lot of devices sold especially to financial
institutions to "protect" their web sites. These devices dropped any
packets with (previously) reserved header bits set, because some
people used these as a covert information channel. I believe these
devices are not as common as they once were, but there are still a few
big sites that black hole these packets. (I know that southwest.com
is still an offender.)
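
To make the "reserved header bits" point concrete, here is a small Python sketch. ECN took over two TCP flag bits that RFC 793 had left reserved, ECE (0x40) and CWR (0x80), and an ECN-setup SYN sets both. The `strict_filter_drops` function is a made-up stand-in for the behaviour described above, not the logic of any real product, and the port and sequence numbers are arbitrary.

```python
import struct

SYN = 0x02
ECE, CWR = 0x40, 0x80           # reserved in RFC 793, assigned by RFC 3168
FORMERLY_RESERVED = ECE | CWR


def tcp_flags(header):
    """Extract the 8-bit flags field (byte 13) from a raw TCP header."""
    return header[13]


def strict_filter_drops(header):
    """Hypothetical middlebox rule: drop if any formerly reserved bit is set."""
    return bool(tcp_flags(header) & FORMERLY_RESERVED)


def make_syn(ecn):
    """Build a minimal 20-byte TCP header for a SYN (checksum left as 0)."""
    flags = SYN | (FORMERLY_RESERVED if ecn else 0)
    offset_and_flags = (5 << 12) | flags  # data offset = 5 32-bit words
    return struct.pack("!HHIIHHHH",
                       40000, 80,          # source port, destination port
                       1, 0,               # sequence number, ack number
                       offset_and_flags,
                       65535, 0, 0)        # window, checksum, urgent pointer


if __name__ == "__main__":
    for ecn in (False, True):
        verdict = "dropped" if strict_filter_drops(make_syn(ecn)) else "passed"
        print("ECN-setup SYN" if ecn else "plain SYN", verdict)
```

A filter written this way passes a plain SYN and silently discards an ECN-setup SYN, so from the client's side the site simply never answers.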

I have not actually heard of any issues with consumer-grade stuff, but
that may be because ECN has been disabled by default for so long.

Almost no network operators turn on ECN marking in their routers. In
fact, almost none care to do any sort of AQM. The practical benefits
of ECN are still somewhat unclear for most people. For example, it
can help with latency-sensitive applications, but mostly requires a
big queue to work well, so doesn't help as much as you would hope.
There are some interesting ideas on how to better use ECN information,
but these are mostly still research.

ECN black hole detection is pretty simple, and I don't see much reason
not to do it.

-John


On Fri, Nov 7, 2008 at 6:45 AM, David Newall <[email protected]> wrote:
> Isn't this a question for the IETF to answer? Are they saying turn on
> ECN now?

2008-11-16 09:13:36

by Herbert Xu

Subject: Re: time for TCP ECN defaulting to on?

David Miller <[email protected]> wrote:
>
> This is because TSO, which is on by default, doesn't support ECN
> in many implementations.

Dave, we fixed that ages ago.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2008-11-16 09:24:25

by David Miller

Subject: Re: time for TCP ECN defaulting to on?

From: Herbert Xu <[email protected]>
Date: Sun, 16 Nov 2008 17:13:06 +0800

> David Miller <[email protected]> wrote:
> >
> > This is because TSO, which is on by default, doesn't support ECN
> > in many implementations.
>
> Dave, we fixed that ages ago.

I know, Michael Chan corrected me in a followup posting :)