2002-12-12 20:13:39

by Andreani Stefano

Subject: R: Kernel bug handling TCP_RTO_MAX?

Never say never ;-)
I need to change it now as a temporary workaround for a problem in the UMTS core network of my company. But I think there could be thousands of situations where a fine tuning of this TCP parameter could be useful.

Any contributions on the problem?

Stefano.

-----Original Message-----
From: David S. Miller [mailto:[email protected]]
Sent: Thursday, December 12, 2002 20:59
To: Andreani Stefano
Cc: [email protected]; [email protected]
Subject: Re: Kernel bug handling TCP_RTO_MAX?


From: "Andreani Stefano" <[email protected]>
Date: Thu, 12 Dec 2002 20:15:42 +0100

Problem: I need to change the max value of the TCP retransmission
timeout.

Why? There should be zero reason to change this value.


2002-12-12 20:28:52

by David Miller

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

From: "Andreani Stefano" <[email protected]>
Date: Thu, 12 Dec 2002 21:18:21 +0100

Never say never ;-)
I need to change it now as a temporary workaround for a problem in
the UMTS core network of my company. But I think there could be
thousands of situations where a fine tuning of this TCP parameter
could be useful.

You still aren't giving specific examples and details of
the problem you are seeing.

2002-12-12 20:31:22

by Alan

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

On Thu, 2002-12-12 at 20:18, Andreani Stefano wrote:
> Never say never ;-)
> I need to change it now as a temporary workaround for a problem in the UMTS core network of my company. But I think there could be thousands of situations where a fine tuning of this TCP parameter could be useful.
>
The default is too short?

2002-12-12 20:33:32

by Nivedita Singhvi

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

> Never say never ;-)
> I need to change it now as a temporary workaround for a problem in the UMTS core
> network of my company. But I think there could be thousands of situations where a
> fine tuning of this TCP parameter could be useful.
>
> Any contributions on the problem?


If what you are trying to do is terminate the connection earlier,
then reduce the TCP sysctl variable tcp_retries2. This should be the
maximum number of retransmits TCP will make in established state.

The TCP_RTO_MAX parameter is simply an *upper bound* on the
value of the retransmission timeout, which increases exponentially
from the original timeout value.
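
For illustration, a minimal userspace sketch (mine, not anything in the
kernel tree) of turning that knob by writing the procfs entry; the
value 8 here is only an example, not a recommendation:

#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/net/ipv4/tcp_retries2", "w");

        if (!f) {
                perror("fopen");        /* needs root and a mounted /proc */
                return 1;
        }
        /* Fewer retries => established connections are declared dead sooner. */
        fprintf(f, "%d\n", 8);
        fclose(f);
        return 0;
}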

thanks,
Nivedita

2002-12-13 02:24:11

by Nivedita Singhvi

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

Alan Cox wrote:
>
> On Thu, 2002-12-12 at 20:18, Andreani Stefano wrote:
> > Never say never ;-)
> > I need to change it now as a temporary workaround for a
> > problem in the UMTS core network of my company. But I think
> > there could be thousands of situations where a fine tuning
> > of this TCP parameter could be useful.
> >
> The default is too short?

Short?? :). On the contrary...

[I apologize for the length of this note, it became a river]

here's what it would roughly look like:

assuming HZ = 100 (2.4)

tcp_retries2 = 15 (default) /* The # of retransmits */

TCP_RTO_MAX = 120*HZ = 120 seconds = 120000ms
TCP_RTO_MAX2 = 6*HZ = 6 seconds = 6000 ms /* modified value */

TCP_RTO_MIN = HZ/5 = 200ms

Assuming you are on a local lan, your round trip
times are going to be much less than 200 ms, and
so using the TCP_RTO_MIN of 200ms ("The algorithm
ensures that the rto can't go below that").

At each retransmit, TCP backs off exponentially:

Retransmission #    Default rto (ms)    With TCP_RTO_MAX2 (ms)
        1                  200                   200
        2                  400                   400
        3                  800                   800
        4                 1600                  1600
        5                 3200                  3200
        6                 6400                  6000
        7                12800                  6000
        8                25600                  6000
        9                51200                  6000
       10               102400                  6000
       11               120000                  6000
       12               120000                  6000
       13               120000                  6000
       14               120000                  6000
       15               120000                  6000

Total time         804.6 seconds         66.2 seconds
                   (13.4 minutes)

So the minimum total time to time out a tcp connection
(barring application close) would be ~13 minutes in the
default case and 66 seconds with a modified TCP_RTO_MAX
of 6*HZ.
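
Here's a small userspace sketch (again just illustration, not kernel
code) that reproduces the arithmetic in the table above: the rto starts
at TCP_RTO_MIN and doubles on every retransmission, clamped at the
given maximum:

#include <stdio.h>

int main(void)
{
        const int retries = 15;                 /* tcp_retries2 default */
        const long rto_min = 200;               /* TCP_RTO_MIN, in ms */
        const long caps[2] = { 120000, 6000 };  /* default / modified TCP_RTO_MAX, ms */
        int c, i;

        for (c = 0; c < 2; c++) {
                long rto = rto_min, total = 0;

                for (i = 1; i <= retries; i++) {
                        total += rto;           /* wait this long for retransmit #i */
                        rto *= 2;               /* then back off exponentially, */
                        if (rto > caps[c])
                                rto = caps[c];  /* clamped at the max */
                }
                printf("cap %6ld ms -> total %ld.%ld s\n",
                       caps[c], total / 1000, (total % 1000) / 100);
        }
        return 0;
}

It prints 804.6 s and 66.2 s, matching the totals above.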

I can see the argument for lowering both the TCP_RTO_MAX
and the TCP_RTO_MIN default values.

I just did a bunch of testing over satellite, and round trip
times were of the order of 850ms ~ 4000ms.

The max retransmission timeout of 120 seconds is two orders of
magnitude larger than even the slowest round trip times
probably experienced on this planet.. (Are we trying to make this
work to the moon and back? Surely NASA has its own code??)

Particularly since we also retransmit 15 times, can't we conclude
"It's dead, Jim" earlier??

200ms for the minimum retransmission timeout is roughly a
thousand times, if not more, the round trip time on a
fast lan. Since the algorithm is adaptive (a function of the
measured round trip times), what would be the negative
repercussions of lowering this?

It may not be a good idea to make either tunable, but what about
the default init rto value, TCP_TIMEOUT_INIT, since that would allow a
starting point of something close to a suitable value?

The problem with all of the above is that the TCP engine is
global and undifferentiated, and tuning for at least these parameters
is the same regardless of the interface or route or environment..

Yes, we should and want to meet the standards for the internet, and
behave in a network friendly fashion. But all networks != internet.

I'm thinking, e.g., of a dedicated fast gigabit or better connection
between a tier 2 webserver and a backend database, one that
has every need of performance and little need of standards compliance..

It would be wonderful if we could tune TCP on a per-interface or a
per-route basis (everything public, for a start, considered the
internet, and non-routable networks (10, etc) could be configured
suitably for its environment). (TCP over private LAN - rfc?) Trusting
users would be a big issue..

Any thoughts? How stupid is this? Old hat??

thanks,
Nivedita

2002-12-13 03:31:45

by Matti Aarnio

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

On Thu, Dec 12, 2002 at 06:26:45PM -0800, Nivedita Singhvi wrote:
> Alan Cox wrote:
> > The default is too short?
>
> Short?? :). On the contrary...
>
> here's what it would roughly look like:
>
> assuming HZ = 100 (2.4)
>
> tcp_retries2 = 15 (default) /* The # of retransmits */
>
> TCP_RTO_MAX = 120*HZ = 120 seconds = 120000ms
> TCP_RTO_MAX2 = 6*HZ = 6 seconds = 6000 ms /* modified value */
>
> TCP_RTO_MIN = HZ/5 = 200ms
>
> Assuming you are on a local lan, your round trip
> times are going to be much less than 200 ms, and
> so using the TCP_RTO_MIN of 200ms ("The algorithm
> ensures that the rto can't go below that").

The RTO steps in only when there is a need to RETRANSMIT.
For that reason, it makes no sense to place its start
any shorter.

> At each retransmit, TCP backs off exponentially:
>
> Retransmission # Default rto (ms) With TCP_RTO_MAX(2) (ms)
> 1 200 200
...
> 14 120000 6000
> 15 120000 6000
>
> Total time = 804.6 seconds 66.2 seconds
> 13.4 minutes
>
> So the minimum total time to time out a tcp connection
> (barring application close) would be ~13 minutes in the
> default case and 66 seconds with a modified TCP_RTO_MAX
> of 6*HZ.

You can have this by doing careful non-blocking socket
coding, and protocol traffic monitoring along with
protocol level keepalive ping-pong packets to have
something flying around (like NJE ping-pong, not
that every IBM person knows what that is/was..)
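
Roughly, as a sketch (plain userspace C; wait_for_peer() is a made-up
helper, and the deadline would be whatever the protocol's ping-pong
interval allows):

#include <poll.h>
#include <stdio.h>

/*
 * Wait up to deadline_ms for data on fd. Returns 1 when data arrived,
 * 0 on timeout, -1 on error. On timeout the application -- not the
 * TCP retransmit machinery -- decides the peer is dead and closes
 * the socket itself.
 */
int wait_for_peer(int fd, int deadline_ms)
{
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int n = poll(&pfd, 1, deadline_ms);

        if (n < 0) {
                perror("poll");
                return -1;
        }
        return n;
}

Combine that with periodic application-level keepalive writes and
the application gets sub-minute failure detection without touching
TCP at all.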

> I can see the argument for lowering both the TCP_RTO_MAX
> and the TCP_RTO_MIN default values.

I don't.

> I just did a bunch of testing over satellite, and round trip
> times were of the order of 850ms ~ 4000ms.
>
> The max retransmission timeout of 120 seconds is two orders of
> magnitude larger than even the slowest round trip times
> probably experienced on this planet.. (Are we trying to make this
> work to the moon and back? Surely NASA has its own code??)

We try not to kill overloaded network routers while they
are trying to compensate for some line breakage and doing
large-scale network topology re-routing.

> Particularly since we also retransmit 15 times, can't we conclude
> "It's dead, Jim" earlier??

No. I have had LAN spanning-tree flaps taking 60 seconds
(actually a bit over 30 seconds), and years ago Linux's
TCP code timed out in that. It was most annoying to
use some remote system thru such a network...

> 200ms for the minimum retransmission timeout is roughly a
> thousand times, if not more, the round trip time on a
> fast lan. Since the algorithm is adaptive (a function of the
> measured round trip times), what would be the negative
> repercussions of lowering this?

When things _fail_ in the lan, what would be a sensible value?
How long will such an abnormality last?

In overload, resending quickly won't help a bit, just raise
the backoff (and prolong overload.)

Losing a packet sometimes, and that way needing to retransmit,
is the gray area I can't define quickly. If it is rare, it
really does not matter. If it happens often, there could be
such serious trouble that having a quicker retransmit will only
aggravate the trouble more.

> It may not be a good idea to make either tunable, but what about
> the default init rto value, TCP_TIMEOUT_INIT, since that would allow a
> starting point of something close to a suitable value?
>
> The problem with all of the above is that the TCP engine is
> global and undifferentiated, and tuning for at least these parameters
> is the same regardless of the interface or route or environment..

You are looking for "STP" perhaps ?
It has a feature of waking all streams retransmits, in between
particular machines, when at least one STP frame travels in between
the hosts.

I can't find it now from my RFC collection. Odd at that..
Neither as a draft. has it been abandoned ?

> Yes, we should and want to meet the standards for the internet, and
> behave in a network friendly fashion. But all networks != internet.
>
> I'm thinking for eg of a dedicated fast gigabit or better connection
> between a tier 2 webserver and a backend database, for example, that
> has every need of performance and few of standards compliance..
>
> It would be wonderful if we could tune TCP on a per-interface or a
> per-route basis (everything public, for a start, considered the
> internet, and non-routable networks (10, etc), could be configured
> suitably for its environment. (TCP over private LAN - rfc?). Trusting
> users would be a big issue..
>
> Any thoughts? How stupid is this? Old hat??

More and more of STP ..

> thanks,
> Nivedita

/Matti Aarnio

2002-12-13 04:39:36

by Nivedita Singhvi

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

Matti Aarnio wrote:

> > Assuming you are on a local lan, your round trip
> > times are going to be much less than 200 ms, and
> > so using the TCP_RTO_MIN of 200ms ("The algorithm
> > ensures that the rto can't go below that").
>
> The RTO steps in only when there is a need to RETRANSMIT.
> For that reason, it makes no sense to place its start
> any shorter.

Not sure I understood your point clearly here - that things
are going to be broken, so don't kick it off too early?

For the most part, dropped packets are recovered by fast
retransmit getting triggered. So when the retransmission
timer goes off, I'd agree things are in all likelihood
messed up. BUT..the default TCP_TIMEOUT_INIT = 300ms, which
is what the timeout calculation engine is fed to begin
with. After that, the actual measured round trip times
smooth out and help make the retransmit timeout accurate.

TCP_RTO_MIN is the lower bound for the rto. On fast
lans, though, if measured round trip times are say .01ms,
and our MIN is 200ms, that's thousands of times the value - which
means that we are reacting to events too far back in time
on the fast lan scale. If there was congestion
way back then, does that reflect conditions now??


> > So the minimum total time to time out a tcp connection
> > (barring application close) would be ~13 minutes in the
> > default case and 66 seconds with a modified TCP_RTO_MAX
> > of 6*HZ.
>
> You can have this by doing carefull non-blocking socket
> coding, and protocol traffic monitoring along with
> protocol level keepalive ping-pong packets to have
> something flying around (like NJE ping-pong, not
> that every IBM person knows what that is/was..)

Er, this IBMer is unfortunately rather underinformed on that
subject ;) I'll look it up, but I can guesstimate what you are
referring to.. True, but for the most part, getting every
application to be performant, knowledgeable about
network conditions, and programmed accordingly is hard :). And
if by protocol level you mean transport level, then we're back to
altering the protocol. Wouldn't ping-pongs just add to the
traffic under all conditions? (I admit this is a rather lame point :))

> We try not to kill overloaded network routers while they
> are trying to compensate some line breakage and doing
> large-scale network topology re-routing.

Good point! :). I have little experience with Internet router traffic
snarls, and am certainly not arguing for a major alteration to
TCP exponential backoff :). See below..(the environment I was
thinking of..)

> > Particularly since we also retransmit 15 times, can't we conclude
> > "It's dead, Jim" earlier??
>
> No. I have had LAN spanning-tree flaps taking 60 seconds
> (actually a bit over 30 seconds), and years ago Linux's
> TCP code timed out in that. It was most annoying to
> use some remote system thru such a network...

Urgh. Bletch. OK. But minor nit here - how often does that
happen? What's the right thing to do in that situation?
Which situation should we optimize our settings for?
I accept, though, that we need that kind of time frame..

> When things _fail_ in the lan, what would be a sensible value?
> How long will such an abnormality last?

Hmm, good questions, but ones I'm going to handwave at :).

One, my assumption is that the ratio of the (say) average expected
round trip times to the rto value should be around the same -
i.e. why not be as conservative/aggressive as the normal default:

our default init rto is 300, so currently we're going to timeout
on anything that's 100ms over the min of 200. That is far
less conservative than setting an rto of 200 when your round
trip time is a thousand or 10,000 times less..does that make sense?

The other assumption that I'm operating under is that when
things fail talking to a directly attached host - its because
that host has died (even if its only the app or the NIC, whatever).
i.e. the situation is that your connection is going to break,
except you are going to futilely retransmit 15 times and
wait an interminably long time before you do..hence the
advantage of learning whats happening quickly..

> In overload, resending quickly won't help a bit, just raise
> the backoff (and prolong overload.)

See above..

> Losing a packet sometimes, and that way needing to retransmit,
> is the gray area I can't define quickly. If it is rare, it
> really does not matter. If it happens often, there could be
> such serious trouble that having a quicker retransmit will only
> aggravate the trouble more.

That's true..

> You are looking for "STP" perhaps ?
> It has a feature of waking all streams retransmits, in between
> particular machines, when at least one STP frame travels in between
> the hosts.
>
> I can't find it now from my RFC collection. Odd at that..
> Neither as a draft. has it been abandoned ?

Learn something new every day :). Thanks for the ptr. I'll
look it up..

> > It would be wonderful if we could tune TCP on a per-interface or a
> > per-route basis (everything public, for a start, considered the
> > internet, and non-routable networks (10, etc), could be configured
> > suitably for its environment. (TCP over private LAN - rfc?). Trusting
> > users would be a big issue..
> >
> > Any thoughts? How stupid is this? Old hat??
>
> More and more of STP ..

thanks,
Nivedita

2002-12-13 05:21:41

by David Miller

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

From: Matti Aarnio <[email protected]>
Date: Fri, 13 Dec 2002 05:39:28 +0200

On Thu, Dec 12, 2002 at 06:26:45PM -0800, Nivedita Singhvi wrote:
> Assuming you are on a local lan, your round trip
> times are going to be much less than 200 ms, and
> so using the TCP_RTO_MIN of 200ms ("The algorithm
> ensures that the rto can't go below that").

The RTO steps in only when there is a need to RETRANSMIT.
For that reason, it makes no sense to place its start
any shorter.

Actually, TCP_RTO_MIN cannot be made any smaller without
some serious thought.

The reason it is 200ms is due to the granularity of the BSD
TCP socket timers.

In short, the repercussions are not exactly well known, so it's
a research problem to fiddle here.

2002-12-13 06:21:05

by Nivedita Singhvi

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

Nivedita Singhvi wrote:

> our default init rto is 300, so currently we're going to timeout
> on anything that's 100ms over the min of 200. That is far
> less conservative than setting an rto of 200 when your round
> trip time is a thousand or 10,000 times less..does that make sense?

Doh! init rto is NOT 300ms, it's 3 seconds. That minor blooper
shreds my comparison argument a tad :)..but Dave's point renders
that moot, in any case..

"David S. Miller" wrote:

> Actually, TCP_RTO_MIN cannot be made any smaller without
> some serious thought.
>
> The reason it is 200ms is due to the granularity of the BSD
> TCP socket timers.
>
> In short, the repercussions are not exactly well known, so it's
> a research problem to fiddle here.

Ack.

Sometime in the not too distant future, the next generation of
infrastructure will require this to be reworked :).

thanks,
Nivedita

2002-12-13 06:50:12

by David Stevens

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

I believe the very large BSD number was based on the large
granularity of the timer (500ms for slowtimeout), designed for use on a VAX
780. The PC on my desk is 3500 times faster than a VAX 780, and you can
send a lot of data on Gigabit Ethernet instead of sitting on your hands for
an enormous min timeout on modern hardware. Switched gigabit isn't exactly
the same kind of environment as shared 10 Mbps (or 2 Mbps) when that stuff
went in, but the min timeouts are the same.
I think the exponential back-off should handle most issues for
underestimated timers, and the min RTO should be the timer granularity.
Variability in that is already accounted for by the RTT estimator.
I certainly agree it needs careful investigation, but it's been a pet
peeve of mine for years on BSD systems that they forced an arbitrary minimum
that had no accounting for hardware differences over the last 20 years.

+-DLS


"David S. Miller" <[email protected]>@vger.kernel.org on 12/12/2002 09:23:35
PM

Sent by: [email protected]


To: [email protected]
cc: [email protected], [email protected],
[email protected], [email protected],
[email protected]
Subject: Re: R: Kernel bug handling TCP_RTO_MAX?



From: Matti Aarnio <[email protected]>
Date: Fri, 13 Dec 2002 05:39:28 +0200

On Thu, Dec 12, 2002 at 06:26:45PM -0800, Nivedita Singhvi wrote:
> Assuming you are on a local lan, your round trip
> times are going to be much less than 200 ms, and
> so using the TCP_RTO_MIN of 200ms ("The algorithm
> ensures that the rto can't go below that").

The RTO steps in only when there is a need to RETRANSMIT.
For that reason, it makes no sense to place its start
any shorter.

Actually, TCP_RTO_MIN cannot be made any smaller without
some serious thought.

The reason it is 200ms is due to the granularity of the BSD
TCP socket timers.

In short, the repercussions are not exactly well known, so it's
a research problem to fiddle here.

2002-12-13 06:56:50

by David Miller

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

From: David Stevens <[email protected]>
Date: Thu, 12 Dec 2002 23:55:35 -0700

I believe the very large BSD number was based on the large
granularity of the timer (500ms for slowtimeout), designed for use on a VAX
780. The PC on my desk is 3500 times faster than a VAX 780, and you can
send a lot of data on Gigabit Ethernet instead of sitting on your hands for
an enormous min timeout on modern hardware. Switched gigabit isn't exactly
the same kind of environment as shared 10 Mbps (or 2 Mbps) when that stuff
went in, but the min timeouts are the same.

This is well understood, the problem is that BSD's coarse timers are
going to cause all sorts of problems when a Linux stack with a reduced
MIN RTO talks to it.

Consider also, delayed ACKs and possible false retransmits this could
induce with a smaller MIN RTO.

2002-12-13 11:38:33

by Bogdan Costescu

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

On Thu, 12 Dec 2002, David S. Miller wrote:

> This is well understood, the problem is that BSD's coarse timers are
> going to cause all sorts of problems when a Linux stack with a reduced
> MIN RTO talks to it.

Sorry to jump into the discussion without a good understanding of the inner
workings of TCP; I just want to share my view as a possible user of this:
one of the messages at the beginning of the thread said that this would be
useful on a closed network, and I think that this point was overlooked.

Think of a closed network with only Linux machines on it (world
domination, right :-)) like a Beowulf cluster, web frontends talking to
NFS fileservers, web frontends talking to database backends, etc. Again,
as proposed earlier, border hosts (those connected to both the closed
network and the outside one) could change their communication parameters
based on device or route, and this would become an internal affair that
would not affect communication with other stacks.

I don't want to suggest making this the default behaviour; rather, make
it a parameter that can be changed by the sysadmin and keep the current
value as the default.

--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [email protected]

2002-12-13 11:45:46

by Andrew McGregor

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

Er, wasn't that SCTP? If so, that's RFC 3309 and many, many drafts. You
might also want to look at DCCP (draft-ietf-dccp-*) and the various
documents from the IETF's PILC group. There is also a proposal for a new
TCP-style protocol with a real differential controller, the name of which I
can't recall right now.

See also draft-allman-tcp-sack for another proposal for a fix that won't
break old stacks. Also draft-ietf-tsvwg-tcp-eifel-alg,
draft-ietf-tsvwg-tcp-eifel-response and many more.

I can't claim to be a TCP expert, but TCP_RTO_MIN can certainly have a
different value for IPv6, where I believe millisecond resolution timers are
required, so 2ms would be correct.

Unfortunately, TCP is incredibly subtle. So, the IETF are really
conservative about even suggesting modifications to it, because a common
and badly behaved stack can cause major disasters in the 'net.

Andrew

--On Thursday, December 12, 2002 20:45:24 -0800 Nivedita Singhvi
<[email protected]> wrote:

>> You are looking for "STP" perhaps ?
>> It has a feature of waking all streams retransmits, in between
>> particular machines, when at least one STP frame travels in between
>> the hosts.
>>
>> I can't find it now from my RFC collection. Odd at that..
>> Neither as a draft. has it been abandoned ?
>
> Learn something new every day :). Thanks for the ptr. I'll
> look it up..
>
>> > It would be wonderful if we could tune TCP on a per-interface or a
>> > per-route basis (everything public, for a start, considered the
>> > internet, and non-routable networks (10, etc), could be configured
>> > suitably for its environment. (TCP over private LAN - rfc?). Trusting
>> > users would be a big issue..
>> >
>> > Any thoughts? How stupid is this? Old hat??
>>
>> More and more of STP ..
>
> thanks,
> Nivedita


2002-12-13 11:54:24

by Andrew McGregor

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

You're going to make lots of IETFers really annoyed by suggesting that :-)

Honestly, there are lots of other ways to solve this, and it would be nice
if the IETF's recent additions got implemented; there are many relevant
things going on there. Those interested should just talk to the draft
authors about implementing things. It's an open organisation just like
linux-kernel after all, just a bit more formal.

In a closed network, why not have SOCK_STREAM map to something faster than
TCP anyway? That is, if I connect(address matching localnet), SOCK_STREAM
maps to (eg) SCTP. That would be a far more dramatic performance hack!

Andrew

--On Friday, December 13, 2002 12:46:15 +0100 Bogdan Costescu
<[email protected]> wrote:

> On Thu, 12 Dec 2002, David S. Miller wrote:
>
>> This is well understood, the problem is that BSD's coarse timers are
>> going to cause all sorts of problems when a Linux stack with a reduced
>> MIN RTO talks to it.
>
> Sorry to jump into the discussion without a good understanding of inner
> workings of TCP, I just want to share my view as a possible user of this:
> one of the messages at the beginning of the thread said that this would
> be useful on a closed network and I think that this point was overlooked.
>
> Think of a closed network with only Linux machines on it (world
> domination, right :-)) like a Beowulf cluster, web frontends talking to
> NFS fileservers, web frontends talking to database backends, etc. Again
> as proposed earlier, border hosts (those connected to both the closed
> network and outside one) could change their communication parameters
> based on device or route and this would become an internal affair that
> would not affect communication with other stacks.
>
> I don't want to suggest to make this the default behaviour; rather, have
> it a parameter that can be changed by the sysadmin and have the current
> value as default.
>
> --
> Bogdan Costescu
>
> IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
> Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
> Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
> E-mail: [email protected]

2002-12-13 12:25:28

by Bogdan Costescu

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

On Sat, 14 Dec 2002, Andrew McGregor wrote:

> You're going to make lots of IETFers really annoyed by suggesting that :-)

I hope not. That was the reason for allowing it to be tuned and for having
a default value equal to the existing one.

> In a closed network, why not have SOCK_STREAM map to something faster than
> TCP anyway?

Sure, just give me a protocol that:
- is reliable
- has low latency
- comes with the standard kernel
and I'll just use it. But you always get only 2 out of 3...

--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [email protected]

2002-12-13 13:14:38

by Andrew McGregor

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

--On Friday, December 13, 2002 13:33:16 +0100 Bogdan Costescu
<[email protected]> wrote:

> On Sat, 14 Dec 2002, Andrew McGregor wrote:
>
>> You're going to make lots of IETFers really annoyed by suggesting that
>> :-)
>
> I hope not. That was the reason for allowing it to be tuned and for
> having a default value equal to the existing one.

I know the folks in question :-) Actually, they'd be nice about it, but
say something like:

Well, RFC 2988 says that the present value is too small and should be 1s,
although I take it from other discussion that experiment shows 200ms to be
OK.

Instead, RFCs 3042 and 3390 present the IETF's preferred approach that has
actually made it through the process. But there are lots of drafts in
progress, so that isn't the final word, although it is certainly better
than tuning down RTO_MAX.

Now, I have no idea if the kernel presently implements the latter two by
default (and on a quick look I can't find either in the code). If not, it
should. Shouldn't the initial window be a tunable?
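
For reference, RFC 3390's increased initial window is just
min(4*MSS, max(2*MSS, 4380 bytes)); a quick sketch:

#include <stdio.h>

/* RFC 3390 initial congestion window, in bytes:
 * IW = min(4*MSS, max(2*MSS, 4380)) */
static long initial_window(long mss)
{
        long iw = 2 * mss > 4380 ? 2 * mss : 4380;

        return 4 * mss < iw ? 4 * mss : iw;
}

int main(void)
{
        printf("MSS 1460 -> IW %ld bytes\n", initial_window(1460)); /* 4380 */
        printf("MSS  536 -> IW %ld bytes\n", initial_window(536));  /* 2144 */
        return 0;
}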

>> In a closed network, why not have SOCK_STREAM map to something faster
>> than TCP anyway?
>
> Sure, just give me a protocol that:
> - is reliable
> - has low latency
> - comes with the standard kernel
> and I'll just use it. But you always get only 2 out of 3...
>
> --
> Bogdan Costescu

SCTP is in 2.5 now. Does that not fit the bill? I admit, I don't know
about the reliability, although I guess I'm going to find out as I have
cause to use it shortly. Wearing an IETF hat, I'd like to hear about this,
as I'm on a bit of a practicality crusade there :-)

Andrew

2002-12-13 18:01:11

by Nivedita Singhvi

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

Andrew McGregor wrote:

> In a closed network, why not have SOCK_STREAM map to something faster than
> TCP anyway? That is, if I connect(address matching localnet), SOCK_STREAM
> maps to (eg) SCTP. That would be a far more dramatic performance hack!
>
> Andrew

Not that simple. SCTP (if that is what Matti was referring to) is
a SOCK_STREAM socket, with a protocol of IPPROTO_SCTP. I'm just
getting done implementing a testsuite against the SCTP API.

i.e. You have to know you want an SCTP socket at the time you
open the socket. You certainly have no idea whether you're on
a closed network or not, for that matter, the app may want to talk
on multiple interfaces etc. (Most hosts will have one interface
on a public net)..

Currently, Linux SCTP doesn't yet support TCP-style (i.e. SOCK_STREAM)
sockets; we only do UDP-style sockets (SOCK_SEQPACKET). We will be
putting in SOCK_STREAM support next, but understand that performance
is not something that has been addressed yet, and a performant SCTP
is still some ways away (though I'm sure Jon and Sridhar will be
working their tails off to do so ;)).
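
So for now an open has to look like this UDP-style sketch (the #ifndef
fallback is only for headers that don't define IPPROTO_SCTP yet; 132 is
the IANA-assigned protocol number):

#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>

#ifndef IPPROTO_SCTP
#define IPPROTO_SCTP 132        /* IANA protocol number for SCTP */
#endif

int main(void)
{
        /* UDP-style SCTP: SOCK_SEQPACKET; SOCK_STREAM isn't there yet */
        int fd = socket(AF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);

        if (fd < 0) {
                perror("socket");       /* e.g. kernel built without SCTP */
                return 1;
        }
        return 0;
}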

But don't expect SCTP to be the surreptitious underlying layer
carrying TCP traffic, if that's an expectation that anyone has :)

Solving this problem without application involvement is a
more limited scenario..

thanks,
Nivedita

2002-12-13 22:33:34

by Andrew McGregor

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

--On Friday, December 13, 2002 10:07:01 -0800 Nivedita Singhvi
<[email protected]> wrote:

> Andrew McGregor wrote:
>
>> In a closed network, why not have SOCK_STREAM map to something faster
>> than TCP anyway? That is, if I connect(address matching localnet),
>> SOCK_STREAM maps to (eg) SCTP. That would be a far more dramatic
>> performance hack!
>>
>> Andrew
>
> Not that simple. SCTP (if that is what Matti was referring to) is
> a SOCK_STREAM socket, with a protocol of IPPROTO_SCTP. I'm just
> getting done implementing a testsuite against the SCTP API.
>
> i.e. You have to know you want an SCTP socket at the time you
> open the socket. You certainly have no idea whether youre on
> a closed network or not, for that matter, the app may want to talk
> on multiple interfaces etc. (Most hosts will have one interface
> on a public net)..

Things are never that simple. But I was basically talking about a local
policy to change the (semantics of the) API in certain cases. It's
probably a bad idea and would cause all kinds of breakage, but it is
interesting to think about.

>
> Currently, Linux SCTP doesn't yet support TCP-style (i.e. SOCK_STREAM)
> sockets; we only do UDP-style sockets (SOCK_SEQPACKET). We will be
> putting in SOCK_STREAM support next, but understand that performance
> is not something that has been addressed yet, and a performant SCTP
> is still some ways away (though I'm sure Jon and Sridhar will be
> working their tails off to do so ;)).

I wasn't aware of the current status. Ok, that's just where it's at.

>
> But don't expect SCTP to be the surreptitious underlying layer
> carrying TCP traffic, if that's an expectation that anyone has :)

That's my particular kind of crazy idea.

>
> Solving this problem without application involvement is a
> more limited scenario..

Indeed.

>
> thanks,
> Nivedita

Andrew

2002-12-13 22:50:32

by Matti Aarnio

Subject: Re: R: Kernel bug handling TCP_RTO_MAX?

On Fri, Dec 13, 2002 at 10:07:01AM -0800, Nivedita Singhvi wrote:
> Andrew McGregor wrote:
> > In a closed network, why not have SOCK_STREAM map to something faster than
> > TCP anyway? That is, if I connect(address matching localnet), SOCK_STREAM
> > maps to (eg) SCTP. That would be a far more dramatic performance hack!
> >
> > Andrew
>
> Not that simple. SCTP (if that is what Matti was referring to) is
> a SOCK_STREAM socket, with a protocol of IPPROTO_SCTP. I'm just
> getting done implementing a testsuite against the SCTP API.

Most likely that is what I did mean.
Things in IETF do on occasion change names, or I don't always
remember all characters in (E)TLA-acronyms I use rarely...

...
> But don't expect SCTP to be the surreptitious underlying layer
> carrying TCP traffic, if that's an expectation that anyone has :)

At least I didn't expect that, don't know of others.

It all depends on the application coders whether users will be able
to use arbitrary network protocols -- say, any SOCK_STREAM
protocol supported now, and in the future, by the system kernel.
Ever heard of "TLI"?

> Solving this problem without application involvement is a
> more limited scenario..

Yes, but sufficiently important on occasion.

Doing things like this mapping might make limited sense
via routing table lookups.

> thanks,
> Nivedita

/Matti Aarnio