2010-07-14 10:50:34

by Ed W

[permalink] [raw]
Subject: Raise initial congestion window size / speedup slow start?

Hi, my network connection looks like 500Kbit/s with a round-trip latency
of perhaps 1s+ (it's a satellite link).

From what I can see the Linux initial congestion window is significantly
limiting me here, with slow start taking many, many seconds to open up
the window wide enough to get the data flowing. For protocols like http
this really hurts, with all the short-lived connections never really
getting up to speed. (Throw in some random packet loss and things
really screech to a halt.)

Reading around, there appear to have been several previous attempts to
modify the kernel to start with a slightly wider initial congestion
window, say 10 packets. (It seems even Google did some work on this and
agreed that a bump of the initial cwnd to 10 or so would help even many
non-satellite users.) However, all the work I can find is quite old and
doesn't give me much of a leg up in terms of experimenting with such
changes on a modern kernel.

Does someone have some pointers on where to look to modify initial
congestion window please?


Thanks

Ed W


2010-07-14 11:49:07

by Alan

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

> From what I can see the Linux initial congestion window is significantly
> limiting me here, with slow start taking many many seconds to open up
> the window wide enough to get the data flowing? For protocols like http
> this is really hurting with all the short lived connections never really
> getting up to speed. (throw in some random packet loss and things
> really screech to a halt)

For http it's the window at the server end that will matter, as most
data goes server->client. http also has persistent connections in
HTTP/1.1, so in the normal case you don't get lots of small connections.

An http request is normally sub-MTU sized (unless it's got auth and lots
of cookie crap), so your congestion window should be irrelevant. It's the
cwnd at the other end that will matter.

Have you considered running a web proxy at the other end of the link?
That would keep the DNS lookup work on the fast-RTT side, and mean that
if your web browser is being sane you are maintaining one connection for
most of your work. It also means you can run advert and junk filters at
the better end.

If you want to explore it further you want: [email protected] really.

Alan

2010-07-14 15:21:18

by Bill Davidsen

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

Ed W wrote:
> Hi, my network connection looks like 500Kbits with a round trip latency
> of perhaps 1s+ (it's a satellite link).
>
> From what I can see the Linux initial congestion window is significantly
> limiting me here, with slow start taking many many seconds to open up
> the window wide enough to get the data flowing? For protocols like http
> this is really hurting with all the short lived connections never really
> getting up to speed. (throw in some random packet loss and things
> really screech to a halt)
>
> Reading around there appear to be several previous attempts to modify
> the kernel to start with a slightly wider initial congestion window, say
> 10 packets. (Seems even google did some work on this and agreed that a
> bump of the initial cwnd to 10 or so would help even many non-satellite
> users?) However, all the work I can find is quite old and doesn't seem
> to give me much of a leg up in terms of experimenting with such changes
> on a modern kernel?
>
> Does someone have some pointers on where to look to modify initial
> congestion window please?
>
Are you sure that's the issue? The backlog is on the incoming side, is it not?

Having dealt with moderately long delays pushing TB between time zones:
have you set your window size up? Set
/proc/sys/net/ipv4/tcp_adv_win_scale to 5 or 6 and see if that helps.
You may have to go into /proc/sys/net/core and crank up the rmem_*
settings, depending on your distribution.

This allows the server to push a lot of data without an ack, which is
what you want; the ack will be delayed by the long latency, so this
helps. You can calculate how large to make the setting, but "make it
bigger until tcpdump never shows the window size < 2k" is the best way.
Even a bit larger than that won't hurt, although it will take a bit of
memory.
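For concreteness, the knobs mentioned above can be inspected and set
like this (a sketch only; the values are illustrative, not
recommendations):

```shell
# Read the current advertised-window scaling and receive-buffer limits.
sysctl net.ipv4.tcp_adv_win_scale
cat /proc/sys/net/core/rmem_default /proc/sys/net/core/rmem_max

# Raise them as suggested above (needs root; values illustrative only).
sysctl -w net.ipv4.tcp_adv_win_scale=5
sysctl -w net.core.rmem_max=4194304
```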

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2010-07-14 18:15:39

by David Miller

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

From: Bill Davidsen <[email protected]>
Date: Wed, 14 Jul 2010 11:21:15 -0400

> You may have to go into /proc/sys/net/core and crank up the
> rmem_* settings, depending on your distribution.

You should never, ever, have to touch the various networking sysctl
values to get good performance in any normal setup. If you do, it's a
bug, report it so we can fix it.

I cringe every time someone says to do this, so please do me a favor
and don't spread this further. :-)

For one thing, TCP dynamically adjusts the socket buffer sizes based
upon the behavior of traffic on the connection.

And the TCP memory limit sysctls (not the core socket ones) are sized
based upon available memory. They are there to protect you from
situations such as having so much memory dedicated to socket buffers
that there is none left to do other things effectively. It's a
protective limit, rather than a setting meant to increase or improve
performance. So like the others, leave these alone too.

2010-07-14 18:32:58

by Ed W

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?


>> Does someone have some pointers on where to look to modify initial
>> congestion window please?
>>
> Are you sure that's the issue? The backlog is in incoming, is it not?

Well, I was simplifying a little bit; actually I have a bunch of
protocols in use, and http is one of them


> Having dealt with moderately long delays push TB between timezones,
> have you set your window size up? Set
> /proc/sys/net/ipv4/tcp_adv_win_scale to 5 or 6 and see if that helps.
> You may have to go into /proc/sys/net/core and crank up the rmem_*
> settings, depending on your distribution.
>
> This allows the server to push a lot of data without an ack, which is
> what you want, the ack will be delayed by the long latency, so this helps.

I think I'm misunderstanding something fundamental here:

- Surely the limited congestion window is what throttles me at
connection initialisation time, and this will not be affected by
changing the params you mention above? For sure the sliding window will
be relevant vs my bandwidth-delay product once the tcp connection
reaches steady state, but I'm mostly worried here about performance
right at the creation of the connection.

- Both you and Alan mention that the bulk of the traffic is "incoming" -
this implies you think it's relevant? Obviously I'm missing something
fundamental here, because my understanding is that the congestion window
shuts us down in both directions (at the start of the connection)?

Thanks for the replies - I will take it over to netdev

Ed W

2010-07-14 18:48:39

by Ed W

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On 14/07/2010 19:15, David Miller wrote:
> From: Bill Davidsen<[email protected]>
> Date: Wed, 14 Jul 2010 11:21:15 -0400
>
>
>> You may have to go into /proc/sys/net/core and crank up the
>> rmem_* settings, depending on your distribution.
>>
> You should never, ever, have to touch the various networking sysctl
> values to get good performance in any normal setup. If you do, it's a
> bug, report it so we can fix it.
>

Just checking the basics here, because I don't think this is a bug so
much as a less common installation that differs from the "normal" case.

- When we create a tcp connection we always start with tcp slow start
- This sets the congestion window to effectively 4 packets?
- This applies in both directions?
- The remote sender responds to my hypothetical http request with the
first 4 packets of data
- We need to wait one RTT for the ack to come back, and now we can send
the next 8 packets
- Wait for the next ack, and at 16 packets we are now moving at a
sensible fraction of the bandwidth-delay product?
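The arithmetic above can be sketched numerically; assuming an initial
window of 4 segments, plain doubling per RTT, and the 500Kbit/s, 1s-RTT
link from the start of the thread (all simplifying assumptions):

```shell
#!/bin/sh
# How many RTTs does slow start need before cwnd covers the
# bandwidth-delay product of a 500 kbit/s link with a 1 s RTT?
link_bits=500000   # link speed, bits per second
rtt_s=1            # round-trip time, seconds
mss=1460           # Ethernet-sized segments

# Packets that must be in flight to fill the pipe (integer arithmetic).
bdp_pkts=$(( link_bits * rtt_s / 8 / mss ))

cwnd=4             # assumed initial congestion window, in segments
rtts=0
while [ "$cwnd" -lt "$bdp_pkts" ]; do
    cwnd=$(( cwnd * 2 ))
    rtts=$(( rtts + 1 ))
done
echo "BDP ~ $bdp_pkts packets; cwnd reaches $cwnd after $rtts RTTs"
```

On a 1 s RTT each of those doublings costs a full second before the
link is anywhere near full, which matches the complaint above.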

So just to be clear:
- We don't seem to have any user-space tuning knobs to influence this
right now?
- In this age of short attention spans, a couple of extra seconds
between clicking something and it responding is worth optimising (IMHO)
- I think I need to take this to netdev, but anyone else with any ideas
happy to hear them?

Thanks

Ed W

2010-07-14 19:10:44

by Stephen Hemminger

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On Wed, 14 Jul 2010 19:48:36 +0100
Ed W <[email protected]> wrote:

> On 14/07/2010 19:15, David Miller wrote:
> > From: Bill Davidsen<[email protected]>
> > Date: Wed, 14 Jul 2010 11:21:15 -0400
> >
> >
> >> You may have to go into /proc/sys/net/core and crank up the
> >> rmem_* settings, depending on your distribution.
> >>
> > You should never, ever, have to touch the various networking sysctl
> > values to get good performance in any normal setup. If you do, it's a
> > bug, report it so we can fix it.
> >
>
> Just checking the basics here because I don't think this is a bug so
> much as a, less common installation that differs from the "normal" case.
>
> - When we create a tcp connection we always start with tcp slow start
> - This sets the congestion window to effectively 4 packets?
> - This applies in both directions?
> - Remote sender responds to my hypothetical http request with the first
> 4 packets of data
> - We need to wait one RTT for the ack to come back and now we can send
> the next 8 packets,
> - Wait for the next ack and at 16 packets we are now moving at a
> sensible fraction of the bandwidth delay product?
>
> So just to be clear:
> - We don't seem to have any user-space tuning knobs to influence this
> right now?
> - In this age of short attention spans, a couple of extra seconds
> between clicking something and it responding is worth optimising (IMHO)
> - I think I need to take this to netdev, but anyone else with any ideas
> happy to hear them?
>
> Thanks
>
> Ed W

TCP slow start is required by the RFC. It is there to prevent TCP
congestion collapse. The HTTP problem is exacerbated by things beyond
the user's control:
1. stupid server software that dribbles out data and doesn't use the
full payload of the packets
2. web pages with data from multiple sources (ads especially), each of
which requires a new connection
3. pages with huge graphics.

Most of this is because of sites that haven't figured out that somebody
on a phone across the globe might not have the same RTT and bandwidth as
the developer on a local network who created them. Changing the initial
cwnd isn't going to fix it.

2010-07-14 20:17:31

by Rick Jones

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

Ed W wrote:

>
> Just checking the basics here because I don't think this is a bug so
> much as a, less common installation that differs from the "normal" case.
>
> - When we create a tcp connection we always start with tcp slow start
> - This sets the congestion window to effectively 4 packets?
> - This applies in both directions?

Any TCP sender in some degree of compliance with the RFCs on the topic will
employ slow-start.

Linux adds auto-tuning of the receiver's advertised window. It will
start at a small size, and then grow it as it sees fit.

> - Remote sender responds to my hypothetical http request with the first
> 4 packets of data
> - We need to wait one RTT for the ack to come back and now we can send
> the next 8 packets,
> - Wait for the next ack and at 16 packets we are now moving at a
> sensible fraction of the bandwidth delay product?

There may be some wrinkles depending on how many ACKs the receiver
generates (LRO being enabled and such) and how the ACKs get counted.

> So just to be clear:
> - We don't seem to have any user-space tuning knobs to influence this
> right now?
> - In this age of short attention spans, a couple of extra seconds
> between clicking something and it responding is worth optimising (IMHO)

There is an effort under way, led by some folks at Google and including
some others, to get the RFCs enhanced in support of the concept of
larger initial congestion windows. Some of the discussion may be in the
"tcpm" mailing list (assuming I've not gotten my mailing lists
confused). There may be some previous discussion of that work in the
netdev archives as well.

rick jones

> - I think I need to take this to netdev, but anyone else with any ideas
> happy to hear them?
>
> Thanks
>
> Ed W
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-07-14 20:46:05

by Hagen Paul Pfeifer

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

* Rick Jones | 2010-07-14 13:17:24 [-0700]:

>There is an effort under way, lead by some folks at Google and
>including some others, to get the RFC's enhanced in support of the
>concept of larger initial congestion windows. Some of the discussion
>may be in the "tcpm" mailing list (assuming I've not gotten my
>mailing lists confused). There may be some previous discussion of
>that work in the netdev archives as well.

tcpm is the right mailing list, but there is currently no effort to
develop this topic. Why? Because it is not a standardization issue;
rather, it is a technical issue. You cannot raise the initial CWND and
expect fair behaviour. This was discussed several times and is
documented in several documents and RFCs.

See RFC 5681, Section 3.1. Google employees should start with Section 3.
This topic pops up every two months in netdev, and until now I have
_never_ read a consolidated contribution.

Partial local issues can already be "fixed" via route-specific ip
options - see initcwnd.
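For reference, the route-specific knob being pointed at here is the
initcwnd metric in iproute2; a hedged sketch (needs root, and the
gateway/device names below are placeholders, not taken from this
thread):

```shell
# Show the current default route, e.g. "default via 192.168.1.1 dev eth0".
ip route show default

# Re-install it with a larger initial congestion window (substitute your
# own gateway and device; 10 segments follows the Google proposal).
ip route change default via 192.168.1.1 dev eth0 initcwnd 10
```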

HGN





2010-07-14 21:55:33

by David Miller

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

From: Hagen Paul Pfeifer <[email protected]>
Date: Wed, 14 Jul 2010 22:39:19 +0200

> * Rick Jones | 2010-07-14 13:17:24 [-0700]:
>
>>There is an effort under way, lead by some folks at Google and
>>including some others, to get the RFC's enhanced in support of the
>>concept of larger initial congestion windows. Some of the discussion
>>may be in the "tcpm" mailing list (assuming I've not gotten my
>>mailing lists confused). There may be some previous discussion of
>>that work in the netdev archives as well.
>
> tcpm is the right mailing list but there is currently no effort to develop
> this topic. Why? Because is not a standardization issue, rather it is a
> technical issue. You cannot rise the initial CWND and expect a fair behavior.
> This was discussed several times and is documented in several documents and
> RFCs.
>
> RFC 5681 Section 3.1. Google employees should start with Section 3. This topic
> pop's of every two months in netdev and until now I _never_ read a
> consolidated contribution.
>
> Partial local issues can already be "fixed" via route specific ip options -
> see initcwnd.

Although section 3 of RFC 5681 is a great text, it does not say at all
that increasing the initial CWND would lead to fairness issues.

To be honest, I think google's proposal holds a lot of weight. If
over time link sizes and speeds are increasing (they are) then nudging
the initial CWND every so often is a legitimate proposal. Were
someone to claim that utilization is lower than it could be because of
the currently specified initial CWND, I would have no problem
believing them.

And I'm happy to make Linux use an increased value once it has
traction in the standardization community.

But for all we know this side discussion about initial CWND settings
could have nothing to do with the issue being reported at the start of
this thread. :-)

2010-07-14 22:05:35

by Ed W

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On 14/07/2010 21:39, Hagen Paul Pfeifer wrote:
> * Rick Jones | 2010-07-14 13:17:24 [-0700]:
>
>
>> There is an effort under way, lead by some folks at Google and
>> including some others, to get the RFC's enhanced in support of the
>> concept of larger initial congestion windows. Some of the discussion
>> may be in the "tcpm" mailing list (assuming I've not gotten my
>> mailing lists confused). There may be some previous discussion of
>> that work in the netdev archives as well.
>>
> tcpm is the right mailing list but there is currently no effort to develop
> this topic. Why? Because is not a standardization issue, rather it is a
> technical issue. You cannot rise the initial CWND and expect a fair behavior.
> This was discussed several times and is documented in several documents and
> RFCs.
>

I'm sure you have covered this to the point you are fed up, but my
searches turn up only a smattering of posts covering this - could you
summarise why "you cannot raise the initial cwnd and expect a fair
behaviour"?

The initial cwnd was changed (increased) in the past (RFC 3390), and
that RFC claims that studies then suggested the benefits were all
positive. Some reasonably smart people have suggested that it might be
time to review the status quo again, so it doesn't seem completely
obvious that the current number is optimal?

> RFC 5681 Section 3.1. Google employees should start with Section 3. This topic
> pop's of every two months in netdev and until now I _never_ read a
> consolidated contribution.
>

Sorry, what do you mean by a "consolidated contribution"?

That RFC is a subtle read - it appears to give more specific guidance on
what to do in certain situations, but I'm not sure I see that it
improves slow start convergence speed for my situation (large RTT)?
Would you mind highlighting the new bits for those of us a bit newer to
the subject?

> Partial local issues can already be "fixed" via route specific ip options -
> see initcwnd.
>

Oh, excellent. This seems like exactly what I'm after. (Thanks Stephen
Hemminger!)

Many thanks

Ed W

2010-07-14 22:13:07

by Hagen Paul Pfeifer

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

* David Miller | 2010-07-14 14:55:47 [-0700]:

>Although section 3 of RFC 5681 is a great text, it does not say at all
>that increasing the initial CWND would lead to fairness issues.

Because it is only one side of the coin; conservatively probing the
available link capacity in conjunction with n simultaneously probing
TCP/SCTP/DCCP instances is another.

>To be honest, I think google's proposal holds a lot of weight. If
>over time link sizes and speeds are increasing (they are) then nudging
>the initial CWND every so often is a legitimate proposal. Were
>someone to claim that utilization is lower than it could be because of
>the currenttly specified initial CWND, I would have no problem
>believing them.
>
>And I'm happy to make Linux use an increased value once it has
>traction in the standardization community.

Currently I know of no working link-capacity probing approach, without
active network feedback, that conservatively probes the available link
capacity with a high CWND. I am curious about any future trends.

>But for all we know this side discussion about initial CWND settings
>could have nothing to do with the issue being reported at the start of
>this thread. :-)

;-) sure, but it is often wise to thwart these kinds of discussions. It
seems these CWND discussions turn up once every other month. ;-)

Hagen

2010-07-14 22:19:39

by Rick Jones

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

Hagen Paul Pfeifer wrote:
> * David Miller | 2010-07-14 14:55:47 [-0700]:
>>But for all we know this side discussion about initial CWND settings
>>could have nothing to do with the issue being reported at the start of
>>this thread. :-)
>
>
> ;-) sure, but it is often wise to thwart these kind of discussions. It seems
> these CWND discussions turn up once every other month. ;-)

Which suggests there is a constant "force" out there yet to be reckoned with. :)

rick jones

2010-07-14 22:36:37

by Hagen Paul Pfeifer

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

* Ed W | 2010-07-14 23:05:31 [+0100]:

>Initial cwnd was changed (increased) in the past (rfc3390) and the
>RFC claims that studies then suggested that the benefits were all
>positive. Some reasonably smart people have suggested that it might
>be time to review the status quo again so it doesn't seem completely
>obvious that the current number is optimal?

Do you cite "An Argument for Increasing TCP's Initial Congestion
Window"? People at Google stated that a CWND of 10 seems to be fair in
their measurements. 10, because the test setup was equipped with a
reasonably large link capacity? Did they analyse their modification in
environments with a small BDP (e.g. a multihop MANET setup, ...)? I am
curious, but we will see what happens if TCPM adopts this.

>That RFC is a subtle read - it appears to give more specific guidance
>on what to do in certain situations, but I'm not sure I see that it
>improves slow start convergence speed for my situation (large RTT)?
>Would you mind highlighting the new bits for those of us a bit newer
>to the subject?

The objection/hint was more of a general nature - not specific to
larger RTTs. Environments with larger RTTs are disadvantaged because TCP
is ACK-clocked. A half-truth statement on my part, because RTT fairness
is and was an issue in the development of new congestion control
algorithms: BIC, CUBIC and friends.

>>Partial local issues can already be "fixed" via route specific ip options -
>>see initcwnd.
>
>Oh, excellent. This seems like exactly what I'm after. (Thanks
>Stephen Hemminger!)

Great, you are welcome! ;-)


Hagen

2010-07-14 22:38:10

by Mitchell Erblich

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?


On Jul 14, 2010, at 12:10 PM, Stephen Hemminger wrote:

> On Wed, 14 Jul 2010 19:48:36 +0100
> Ed W <[email protected]> wrote:
>
>> On 14/07/2010 19:15, David Miller wrote:
>>> From: Bill Davidsen<[email protected]>
>>> Date: Wed, 14 Jul 2010 11:21:15 -0400
>>>
>>>
>>>> You may have to go into /proc/sys/net/core and crank up the
>>>> rmem_* settings, depending on your distribution.
>>>>
>>> You should never, ever, have to touch the various networking sysctl
>>> values to get good performance in any normal setup. If you do, it's a
>>> bug, report it so we can fix it.
>>>
>>
>> Just checking the basics here because I don't think this is a bug so
>> much as a, less common installation that differs from the "normal" case.
>>
>> - When we create a tcp connection we always start with tcp slow start
>> - This sets the congestion window to effectively 4 packets?
>> - This applies in both directions?
>> - Remote sender responds to my hypothetical http request with the first
>> 4 packets of data
>> - We need to wait one RTT for the ack to come back and now we can send
>> the next 8 packets,
>> - Wait for the next ack and at 16 packets we are now moving at a
>> sensible fraction of the bandwidth delay product?
>>
>> So just to be clear:
>> - We don't seem to have any user-space tuning knobs to influence this
>> right now?
>> - In this age of short attention spans, a couple of extra seconds
>> between clicking something and it responding is worth optimising (IMHO)
>> - I think I need to take this to netdev, but anyone else with any ideas
>> happy to hear them?
>>
>> Thanks
>>
>> Ed W
>
> TCP slow start is required by the RFC. It is there to prevent a TCP congestion
> collapse. The HTTP problem is exacerbated by things beyond the user's control:
> 1. stupid server software that dribbles out data and doesn't use the full
> payload of the packets
> 2. web pages with data from multiple sources (ads especially), each of which
> requires a new connection
> 3. pages with huge graphics.
>
> Most of this is because of sites that haven't figured out that somebody on a phone
> across the globe might not have the same RTT and bandwidth as the developer on a
> local network who created them. Changing the initial cwnd isn't going to fix it.
> --

IMO, in theory one of the RFCs states a window of 4 ETH-MTU-sized
packets/segments (~6k window) to allow a fast retransmit if a pkt is
dropped.

I thought there is a fast-rexmit knob of 2 or 3 DUPACKs, for faster loss
recovery. Theoretically it could be set to 1 DUPACK for lossy
environments.

Now, the original slow start doubles the number of pkts per RTT assuming
no loss, which is a faster ramp-up vs the original congestion avoidance.

Now, with IPv4 with a default of 576-byte segments, without invalidating
the amount of data, 12 pkts could be sent. This would be helpful if your
app only generates smaller buffers: it gets more ACKs in return, which
sets the ACK clocking at a faster rate. To compensate for the smaller
pkt, the ABC Experimental RFC does byte counting to suggest fairness.

During a few round trips, the pkt size could be increased to the 1.5k
ETH MTU and hopefully even to a 9k jumbo, probing with one
increasing-sized pkt. (To prevent rexmit of a too-large pkt, perhaps
overlap the increasing pkt with the next one?)
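The RFC alluded to in the first paragraph is presumably RFC 3390, whose
published upper bound on the initial window can be sketched as follows
(a sketch of the formula as specified, not of any particular kernel's
code):

```shell
#!/bin/sh
# RFC 3390: initial window IW = min(4*MSS, max(2*MSS, 4380 bytes)).
iw_bytes() {
    mss=$1
    four=$(( 4 * mss ))
    floor=$(( 2 * mss > 4380 ? 2 * mss : 4380 ))
    echo $(( four < floor ? four : floor ))
}

for mss in 536 1460; do
    iw=$(iw_bytes "$mss")
    echo "MSS=$mss -> IW=$iw bytes ($(( iw / mss )) segments)"
done
```

The 4380-byte cap is why Ethernet-sized segments get 3 initial packets
while small 536-byte segments get 4, rather than the 12 estimated above.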

Mitchell Erblich


2010-07-14 22:40:27

by Hagen Paul Pfeifer

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

* Rick Jones | 2010-07-14 15:19:35 [-0700]:

>>;-) sure, but it is often wise to thwart these kind of discussions. It seems
>>these CWND discussions turn up once every other month. ;-)
>
>Which suggests there is a constant "force" out there yet to be rekoned with. :)

;-) I am _not_ unconscious, but the better address for this kind of
discussion is still tcpm.

Hagen

2010-07-14 22:52:08

by Ed W

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?


>> Although section 3 of RFC 5681 is a great text, it does not say at all
>> that increasing the initial CWND would lead to fairness issues.
>>
> Because it is only one side of the medal, probing conservative the available
> link capacity in conjunction with n simultaneous probing TCP/SCTP/DCCP
> instances is another.
>

So let's define the problem more succinctly:
- New TCP connections are assumed to have no knowledge of current
network conditions (bah)
- We desire the connection to consume the maximum amount of bandwidth
possible, while staying ever so fractionally under the maximum link
bandwidth

> Currently I know no working link capacity probing approach, without active
> network feedback, to conservatively probing the available link capacity with a
> high CWND. I am curious about any future trends.
>

Sounds like smarter people than I have played this game, but just to
chuck out one idea: how about attacking the idea that we have no
knowledge of network conditions? After all, we have a bunch of
information about:

1) very good information about the size of the link to the first hop (eg
the modem/network card reported rate)
2) often a reasonably good idea of the bandwidth to the first
"restrictive" router along our default path (ie usually the situation is
that there is a pool of high-speed network locally, with more limited
connectivity between our network and other networks; we can look at the
maximum flows through our network device to outside our subnet and infer
an approximate link speed from that)
3) often moderate-quality information about the size of the link between
us and a specific destination IP

So here goes: the heuristic could be to examine current flows through
our interface, use this to offer hints to the remote end during the SYN
handshake as to a recommended starting size, and additionally the client
side can examine the implied RTT of the SYN/ACK to further fine-tune the
initial cwnd.

In practice this could be implemented in other ways, such as examining
recent TCP congestion windows and using some heuristic to start "near"
those, or remembering congestion windows recently used for popular
destinations. We can also benefit the receiver of our data - if we see
some app open up 16 http connections to some poor server, then some of
those connections will NOT be given a large initial cwnd.

Essentially perhaps we can refine our initial cwnd heuristic somewhat if
we assume better than zero knowledge about the network link?


Out of curiosity, why has it taken so long for active feedback to
appear? If every router simply added a hint to the packet as to the max
bandwidth it can offer then we would appear to be able to make massively
better decisions on window sizes. Furthermore routers have the ability
to put backpressure on classes of traffic as appropriate. I guess the
speed at which ECN has been adopted answers the question of why nothing
more exotic has appeared?

>> But for all we know this side discussion about initial CWND settings
>> could have nothing to do with the issue being reported at the start of
>> this thread. :-)
>>

Actually the original question was mine, and it was literally: can I
adjust the initial cwnd for users of my very specific satellite network,
which has a high RTT? I believe Stephen Hemminger has been kind enough
to recently add the facility to experiment with this to the ip utility,
and so I am now in a position to go do some testing - thanks Stephen.


Cheers

Ed W

2010-07-14 23:01:07

by Hagen Paul Pfeifer

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

* Ed W | 2010-07-14 23:52:02 [+0100]:

>Out of curiousity, why has it taken so long for active feedback to
>appear? If every router simply added a hint to the packet as to the
>max bandwidth it can offer then we would appear to be able to make
>massively better decisions on window sizes. Furthermore routers have
>the ability to put backpressure on classes of traffic as appropriate.
>I guess the speed at which ECN has been adopted answers the question
>of why nothing more exotic has appeared?

It is quite late here, so I will quickly write two sentences about ECN:
one month ago Lars Eggers posted a link on the tcpm mailing list where
google (not really sure if it was google) analysed the deployment of ECN
- the usage was really low. Search for the PDF, it is quite an
interesting one.

Hagen

2010-07-14 23:01:40

by Ed W

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?


> Do you cite "An Argument for Increasing TCP's Initial Congestion Window"?
> People at google stated that a CWND of 10 seems to be fair in their
> measurements. 10 because the test setup was equipped with a reasonable large
> link capacity? Do they analyse their modification in environments with a small
> BDP (e.g. multihop MANET setup, ...)? I am curious, but We will see what
> happens if TCPM adopts this.
>

Well, I personally would shoot for starting from the position of
assuming better than zero knowledge about our link and incorporating
that into the initial cwnd estimate...

We know something about the RTT from the syn/ack times and the speed of
the local link, quickly we will learn about median window sizes to other
destinations, and additionally the kernel has some knowledge of other
connections currently in progress. With all that information perhaps we
can make a more informed choice than just a hard-coded magic number? (Oh,
and let's make the option pluggable so that we can soon have 10
different kernel options...)

Seems like there is evidence that networks are starting to cluster into
groups that would benefit from a range of cwnd options (higher/lower) -
perhaps there is some way to identify these clusters with a reasonable
heuristic and choose a better starting value?

Cheers

Ed W

2010-07-14 23:05:51

by Ed W

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On 15/07/2010 00:01, Hagen Paul Pfeifer wrote:
> It is quite late here so I will quickly write two sentence about ECN: one
> month ago Lars Eggers posted a link at the tcpm maillinglist where google (not
> really sure if it was google) analysed the employment of ECN - the usage was
> really low. Search the PDF, it is quite interesting one.
>

I would speculate that this is because there is a big warning on ECN
saying that it may cause you to lose customers who can't connect to
you... Businesses are driven by needing to support the most common case,
not the most optimal (witness the pain of html development and needing
to consider IE6...)

What would be more useful is for google to survey how many devices are
unable to interoperate with ECN; if that number turned out to be
extremely low, and this fact were advertised, then I suspect we might
see a mass increase in its deployment. I know I have it turned off on
all my servers because I worry more about losing one customer than
about improving the experience for all customers...

Cheers

Ed W

2010-07-15 02:52:46

by Bill Fink

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On Wed, 14 Jul 2010, David Miller wrote:

> From: Bill Davidsen <[email protected]>
> Date: Wed, 14 Jul 2010 11:21:15 -0400
>
> > You may have to go into /proc/sys/net/core and crank up the
> > rmem_* settings, depending on your distribution.
>
> You should never, ever, have to touch the various networking sysctl
> values to get good performance in any normal setup. If you do, it's a
> bug, report it so we can fix it.
>
> I cringe every time someone says to do this, so please do me a favor
> and don't spread this further. :-)
>
> For one thing, TCP dynamically adjusts the socket buffer sizes based
> upon the behavior of traffic on the connection.
>
> And the TCP memory limit sysctls (not the core socket ones) are sized
> based upon available memory. They are there to protect you from
> situations such as having so much memory dedicated to socket buffers
> that there is none left to do other things effectively. It's a
> protective limit, rather than a setting meant to increase or improve
> performance. So like the others, leave these alone too.

What's normal? :-)

netem1% cat /proc/version
Linux version 2.6.30.10-105.2.23.fc11.x86_64 ([email protected]) (gcc version 4.4.1 20090725 (Red Hat 4.4.1-2) (GCC) ) #1 SMP Thu Feb 11 07:06:34 UTC 2010

Linux TCP autotuning across an 80 ms RTT cross country network path:

netem1% nuttcp -T10 -i1 192.168.1.18
14.1875 MB / 1.00 sec = 119.0115 Mbps 0 retrans
558.0000 MB / 1.00 sec = 4680.7169 Mbps 0 retrans
872.8750 MB / 1.00 sec = 7322.3527 Mbps 0 retrans
869.6875 MB / 1.00 sec = 7295.5478 Mbps 0 retrans
858.4375 MB / 1.00 sec = 7201.0165 Mbps 0 retrans
857.3750 MB / 1.00 sec = 7192.2116 Mbps 0 retrans
865.5625 MB / 1.00 sec = 7260.7193 Mbps 0 retrans
872.3750 MB / 1.00 sec = 7318.2095 Mbps 0 retrans
862.7500 MB / 1.00 sec = 7237.2571 Mbps 0 retrans
857.6250 MB / 1.00 sec = 7194.1864 Mbps 0 retrans

7504.2771 MB / 10.09 sec = 6236.5068 Mbps 11 %TX 25 %RX 0 retrans 80.59 msRTT

Manually specified 100 MB TCP socket buffer on the same path:

netem1% nuttcp -T10 -i1 -w100m 192.168.1.18
106.8125 MB / 1.00 sec = 895.9598 Mbps 0 retrans
1092.0625 MB / 1.00 sec = 9160.3254 Mbps 0 retrans
1111.2500 MB / 1.00 sec = 9322.6424 Mbps 0 retrans
1115.4375 MB / 1.00 sec = 9356.2569 Mbps 0 retrans
1116.4375 MB / 1.00 sec = 9365.6937 Mbps 0 retrans
1115.3125 MB / 1.00 sec = 9356.2749 Mbps 0 retrans
1121.2500 MB / 1.00 sec = 9405.6233 Mbps 0 retrans
1125.5625 MB / 1.00 sec = 9441.6949 Mbps 0 retrans
1130.0000 MB / 1.00 sec = 9478.7479 Mbps 0 retrans
1139.0625 MB / 1.00 sec = 9555.8559 Mbps 0 retrans

10258.5120 MB / 10.20 sec = 8440.3558 Mbps 15 %TX 40 %RX 0 retrans 80.59 msRTT

The manually selected TCP socket buffer size both ramps up
quicker and achieves a much higher steady state rate.
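As a rough cross-check (my arithmetic, not from the posting): the window needed to keep a path full is bandwidth x RTT. A quick back-of-the-envelope sketch, assuming a ~10 Gbit/s path and the 80.59 ms RTT reported above:

```shell
# Bandwidth-delay product for an assumed 10 Gbit/s path at 80.59 ms RTT.
rate_bps=10000000000   # assumed line rate: 10 Gbit/s
rtt_us=80590           # 80.59 ms RTT, in microseconds
bdp_bytes=$(( rate_bps / 8 * rtt_us / 1000000 ))
echo "BDP: ${bdp_bytes} bytes (~$(( bdp_bytes / 1048576 )) MiB)"
```

That comes out near 100 MB, which would suggest why the manually specified 100 MB socket buffer gets close to line rate while autotuning tops out lower.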

-Bill

2010-07-15 03:49:22

by Bill Fink

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On Thu, 15 Jul 2010, Hagen Paul Pfeifer wrote:

> * David Miller | 2010-07-14 14:55:47 [-0700]:
>
> >Although section 3 of RFC 5681 is a great text, it does not say at all
> >that increasing the initial CWND would lead to fairness issues.
>
> Because it is only one side of the medal, probing conservative the available
> link capacity in conjunction with n simultaneous probing TCP/SCTP/DCCP
> instances is another.
>
> >To be honest, I think google's proposal holds a lot of weight. If
> >over time link sizes and speeds are increasing (they are) then nudging
> >the initial CWND every so often is a legitimate proposal. Were
> >someone to claim that utilization is lower than it could be because of
> >the currenttly specified initial CWND, I would have no problem
> >believing them.
> >
> >And I'm happy to make Linux use an increased value once it has
> >traction in the standardization community.
>
> Currently I know no working link capacity probing approach, without active
> network feedback, to conservatively probing the available link capacity with a
> high CWND. I am curious about any future trends.

A long, long time ago, I suggested a Path BW Discovery mechanism
to the IETF, analogous to the Path MTU Discovery mechanism, but
it didn't get any traction. Such information could be extremely
useful to TCP endpoints, to determine a maximum window size to
use, to effectively rate limit a much stronger sender from
overpowering a much weaker receiver (for example 10-GigE -> GigE),
resulting in abominable performance across large RTT paths
(as low as 12 Mbps), even in the absence of any real network
contention.

-Bill

2010-07-15 04:12:39

by Tom Herbert

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On Wed, Jul 14, 2010 at 1:39 PM, Hagen Paul Pfeifer <[email protected]> wrote:
> * Rick Jones | 2010-07-14 13:17:24 [-0700]:
>
>>There is an effort under way, lead by some folks at Google and
>>including some others, to get the RFC's enhanced in support of the
>>concept of larger initial congestion windows. Some of the discussion
>>may be in the "tcpm" mailing list (assuming I've not gotten my
>>mailing lists confused). There may be some previous discussion of
>>that work in the netdev archives as well.
>
> tcpm is the right mailing list but there is currently no effort to develop
> this topic. Why? Because is not a standardization issue, rather it is a
> technical issue. You cannot rise the initial CWND and expect a fair behavior.
> This was discussed several times and is documented in several documents and
> RFCs.
>
> RFC 5681 Section 3.1. Google employees should start with Section 3. This topic
> pop's of every two months in netdev and until now I _never_ read a
> consolidated contribution.
>

There is an Internet draft
(http://datatracker.ietf.org/doc/draft-hkchu-tcpm-initcwnd/) on
raising the default Initial Congestion window to 10 segments, as well
as a SIGCOMM paper (http://ccr.sigcomm.org/online/?q=node/621). We
presented this proposal and data supporting it at the Anaheim IETF, and
will be following up in the Netherlands with more data, including some
which should further address the fairness questions.

In terms of Linux implementation, setting the ICW via ip route is
sufficient support on the server side. There is also a proposed patch
which would allow applications to set the ICW themselves (in the hope
that applications can then reduce their number of simultaneous
connections). On the client side we can now adjust the receive window to
advertise larger initial windows. Among current implementations, Linux
advertises the smallest default receive window of the major OSes, so it
turns out Linux clients won't get the lower-latency benefits currently
(so we'll probably ask to raise the default some day :-)).
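For reference, a sketch of the route-based knob mentioned above (addresses and device names are placeholders; requires root and an iproute2 build that understands initcwnd):

```shell
# Raise the initial congestion window to 10 segments on the default route.
# The gateway address and interface here are hypothetical - adjust to
# your own setup.
ip route change default via 192.168.1.1 dev eth0 initcwnd 10

# Confirm the route now carries the option:
ip route show default
```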

Tom

> Partial local issues can already be "fixed" via route specific ip options -
> see initcwnd.
>
> HGN
>
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2010-07-15 04:51:54

by H.K. Jerry Chu

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On Wed, Jul 14, 2010 at 11:15 AM, David Miller <[email protected]> wrote:
> From: Bill Davidsen <[email protected]>
> Date: Wed, 14 Jul 2010 11:21:15 -0400
>
>> You may have to go into /proc/sys/net/core and crank up the
>> rmem_* settings, depending on your distribution.
>
> You should never, ever, have to touch the various networking sysctl
> values to get good performance in any normal setup. If you do, it's a
> bug, report it so we can fix it.

Agreed, except there are indeed bugs in the code today, in that the
code in various places assumes an initcwnd as per RFC 3390. So when
initcwnd is raised, the actual value may be limited unnecessarily by
the initial wmem/sk_sndbuf.

Will try to find time to submit a patch.

Jerry

>
> I cringe every time someone says to do this, so please do me a favor
> and don't spread this further. :-)
>
> For one thing, TCP dynamically adjusts the socket buffer sizes based
> upon the behavior of traffic on the connection.
>
> And the TCP memory limit sysctls (not the core socket ones) are sized
> based upon available memory. They are there to protect you from
> situations such as having so much memory dedicated to socket buffers
> that there is none left to do other things effectively. It's a
> protective limit, rather than a setting meant to increase or improve
> performance. So like the others, leave these alone too.

2010-07-15 05:09:08

by H.K. Jerry Chu

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On Wed, Jul 14, 2010 at 1:39 PM, Hagen Paul Pfeifer <[email protected]> wrote:
> * Rick Jones | 2010-07-14 13:17:24 [-0700]:
>
>>There is an effort under way, lead by some folks at Google and
>>including some others, to get the RFC's enhanced in support of the
>>concept of larger initial congestion windows. Some of the discussion
>>may be in the "tcpm" mailing list (assuming I've not gotten my
>>mailing lists confused). There may be some previous discussion of
>>that work in the netdev archives as well.
>
> tcpm is the right mailing list but there is currently no effort to develop
> this topic. Why? Because is not a standardization issue, rather it is a

Please don't mislead. Raising the initcwnd is actively being pursued at
the IETF right now. If not there, where else? It is following the same
path by which initcwnd was first raised in the late '90s through
RFC 2414/RFC 3390.

The IETF is not a standards organization just for protocol lawyers to
play word games; it is responsible for solving real technical issues as
well.

Jerry

> technical issue. You cannot rise the initial CWND and expect a fair behavior.
> This was discussed several times and is documented in several documents and
> RFCs.
>
> RFC 5681 Section 3.1. Google employees should start with Section 3. This topic
> pop's of every two months in netdev and until now I _never_ read a
> consolidated contribution.
>
> Partial local issues can already be "fixed" via route specific ip options -
> see initcwnd.
>
> HGN

2010-07-15 05:29:26

by H.K. Jerry Chu

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On Wed, Jul 14, 2010 at 8:49 PM, Bill Fink <[email protected]> wrote:
> On Thu, 15 Jul 2010, Hagen Paul Pfeifer wrote:
>
>> * David Miller | 2010-07-14 14:55:47 [-0700]:
>>
>> >Although section 3 of RFC 5681 is a great text, it does not say at all
>> >that increasing the initial CWND would lead to fairness issues.
>>
>> Because it is only one side of the medal, probing conservative the available
>> link capacity in conjunction with n simultaneous probing TCP/SCTP/DCCP
>> instances is another.
>>
>> >To be honest, I think google's proposal holds a lot of weight. If
>> >over time link sizes and speeds are increasing (they are) then nudging
>> >the initial CWND every so often is a legitimate proposal. Were
>> >someone to claim that utilization is lower than it could be because of
>> >the currenttly specified initial CWND, I would have no problem
>> >believing them.
>> >
>> >And I'm happy to make Linux use an increased value once it has
>> >traction in the standardization community.
>>
>> Currently I know no working link capacity probing approach, without active
>> network feedback, to conservatively probing the available link capacity with a
>> high CWND. I am curious about any future trends.
>
> A long, long time ago, I suggested a Path BW Discovery mechanism
> to the IETF, analogous to the Path MTU Discovery mechanism, but
> it didn't get any traction. Such information could be extremely
> useful to TCP endpoints, to determine a maximum window size to
> use, to effectively rate limit a much stronger sender from
> overpowering a much weaker receiver (for example 10-GigE -> GigE),
> resulting in abominable performance across large RTT paths
> (as low as 12 Mbps), even in the absence of any real network
> contention.

Unfortunately that is not going to help initcwnd (unless one can invent
a PBWD protocol from just the 3WHS), and the web is dominated by
short-lived connections, so the small initcwnd becomes a choke point.

Jerry

>
>        -Bill

2010-07-15 07:48:38

by Ed W

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On 15/07/2010 05:12, Tom Herbert wrote:
> There is an Internet draft
> (http://datatracker.ietf.org/doc/draft-hkchu-tcpm-initcwnd/) on
> raising the default Initial Congestion window to 10 segments, as well
> as a SIGCOMM paper (http://ccr.sigcomm.org/online/?q=node/621).
>

You guys have obviously done a lot of work on this, however, it seems
that there is a case for introducing some heuristics into the choice of
init cwnd as well as offering the option to go larger? An initial size
of 10 packets is just another magic number that obviously works with the
median bandwidth delay product on today's networks - can we not do
better still?

Seems like a bunch of clever folks have already suggested tweaks to the
steady-state congestion avoidance, but so far everyone is afraid to
touch the early-stage heuristics?

Also, would you guys not benefit from wider deployment of ECN? Can you
not help find some ways that deployment could be increased? At present
there are big warnings all over the option that it causes some problems,
but there is no quantification of how much, or of whether this warning
is still appropriate?

Ed W

2010-07-15 10:24:23

by Alan

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On Thu, 15 Jul 2010 00:13:01 +0200
Hagen Paul Pfeifer <[email protected]> wrote:

> * David Miller | 2010-07-14 14:55:47 [-0700]:
>
> >Although section 3 of RFC 5681 is a great text, it does not say at all
> >that increasing the initial CWND would lead to fairness issues.
>
> Because it is only one side of the medal, probing conservative the available
> link capacity in conjunction with n simultaneous probing TCP/SCTP/DCCP
> instances is another.
>
> >To be honest, I think google's proposal holds a lot of weight. If
> >over time link sizes and speeds are increasing (they are) then nudging
> >the initial CWND every so often is a legitimate proposal. Were
> >someone to claim that utilization is lower than it could be because of
> >the currenttly specified initial CWND, I would have no problem
> >believing them.
> >
> >And I'm happy to make Linux use an increased value once it has
> >traction in the standardization community.
>
> Currently I know no working link capacity probing approach, without active
> network feedback, to conservatively probing the available link capacity with a
> high CWND. I am curious about any future trends.

Given perfect information from the network nodes, you still need to
traverse the network in each direction and then return an answer. That
means that with a 0.5 sec end-to-end time, as in the original posting,
causality itself demands 1.5 seconds to get an answer - an answer which
is itself incomplete and obsolete by then.

Causality isn't showing any signs of going away soon.

2010-07-15 15:10:34

by Bill Davidsen

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

Ed W wrote:
>
>>> Does someone have some pointers on where to look to modify initial
>>> congestion window please?
>>>
>> Are you sure that's the issue? The backlog is in incoming, is it not?
>
> Well, I was simplifying a little bit, actually I have a bunch of
> protocols in use, http is one of them
>
>
>> Having dealt with moderately long delays push TB between timezones,
>> have you set your window size up? Set
>> /proc/sys/net/ipv4/tcp_adv_win_scale to 5 or 6 and see if that helps.
>> You may have to go into /proc/sys/net/core and crank up the rmem_*
>> settings, depending on your distribution.
>>
>> This allows the server to push a lot of data without an ack, which is
>> what you want, the ack will be delayed by the long latency, so this
>> helps.
>
> I think I'm misunderstanding something fundamental here:
>
> - Surely the limited congestion window is what throttles me at
> connection initialisation time and this will not be affected by
> changing the params you mention above? For sure the sliding window
> will be relevant vs my bandwidth delay product once the tcp connection
> reaches steady state, but I'm mostly worried here about performance
> right at the creation of the connection?
>
> - Both you and Alan mention that the bulk of the traffic is "incoming"
> - this implies you think it's relevant? Obviously I'm missing
> something fundamental here because my understanding is that the
> congestion window shuts us down in both directions (at the start of
> the connection?)
>
> Thanks for the replies - I will take it over to netdev
>
Perhaps they will give you an answer you like better.

--
Bill Davidsen <[email protected]>
"We can't solve today's problems by using the same thinking we
used in creating them." - Einstein

2010-07-15 17:36:08

by Jerry Chu

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On Thu, Jul 15, 2010 at 12:48 AM, Ed W <[email protected]> wrote:
>
> On 15/07/2010 05:12, Tom Herbert wrote:
>>
>> There is an Internet draft
>> (http://datatracker.ietf.org/doc/draft-hkchu-tcpm-initcwnd/) on
>> raising the default Initial Congestion window to 10 segments, as well
>> as a SIGCOMM paper (http://ccr.sigcomm.org/online/?q=node/621).
>>
>
> You guys have obviously done a lot of work on this, however, it seems that there is a case for introducing some heuristics into the choice of init cwnd as well as offering the option to go larger? An initial size of 10 packets is just another magic number that obviously works with the median bandwidth delay product on today's networks - can we not do better still?
>
> Seems like a bunch of clever folks have already suggested tweaks to the steady stage congestion avoidance, but so far everyone is afraid to touch the early stage heuristics?

This is because there is not enough info for deriving any heuristic.
For initcwnd one is constrained to info from the 3WHS only. This
includes a rough estimate of the RTT plus all the bits in the
SYN/SYN-ACK headers. I'm assuming a stateless approach. We've tried a
stateful solution (i.e., seeding initcwnd from past history) but found
its complexity outweighs the gain.
(See http://www.ietf.org/proceedings/77/slides/tcpm-4.pdf)

>
> Also would you guys not benefit from wider deployment of ECN? ?Can you not help find some ways that deployment could be increased? ?At present there are big warnings all over the option that it causes some problems, but there is no quantification of how much and really whether this warning is still appropriate?

That will add yet another hoop for us to jump through. Also I'm not sure
a couple of bits are sufficient for a guesstimate of what initcwnd ought
to be.

Our reasoning is simple - there has been tremendous b/w growth since
RFC 2414 was published. Even the lowest common denominator (i.e., dialup
links) has moved from 9.6Kbps to 56Kbps. That's roughly a sixfold
increase. If you believe initcwnd should grow proportionally to the
buffer sizes in access links, and the buffer sizes grow proportionally
to b/w, then the initcwnd ought to be 3*6 = 18 today.
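The scaling argument above, written out as a quick sanity check (my restatement; note that 56/9.6 is nearer 5.8, rounded here to six):

```shell
# Scale the RFC 3390 initcwnd of 3 by the dialup bandwidth growth
# since RFC 2414 (9.6 Kbps -> 56 Kbps).
old_bps=9600            # RFC 2414-era dialup floor
new_bps=56000           # today's dialup floor
initcwnd_old=3          # segments, per RFC 3390
ratio_x10=$(( new_bps * 10 / old_bps ))     # 58, i.e. ~5.8x growth
growth=6                                    # the rounded factor
echo "b/w growth: ~${ratio_x10}/10 x"
echo "scaled initcwnd: $(( initcwnd_old * growth )) segments"
```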

We chose a modest increase (10) in the hope of expediting the
standardization process (and would certainly appreciate help from folks
on this list). 10 is very conservative considering that many
deployments have gone beyond 3, including the Linux stack, which allows
one additional pkt if it's the last data pkt.

Longer term it would be nice to find a way to get rid of this fixed,
somewhat arbitrary initcwnd. Mark Allman's JumpStart is one idea, but
it'd be a much longer route.

Jerry

>
> Ed W
>

2010-07-15 19:51:28

by Rick Jones

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

I have to wonder if the only heuristic one could employ for divining the initial
congestion window is to be either pessimistic/conservative or
optimistic/liberal - or, for that matter, whether that is the only one one really needs here.

That's what it comes down to doesn't it? At any one point in time, we don't
*really* know the state of the network and whether it can handle the load we
might wish to put upon it. We are always reacting to it. Up until now, it has
been felt necessary to be pessimistic/conservative at time of connection
establishment and not rely as much on the robustness of the "control" part of
avoidance and control.

Now, the folks at Google have lots of data to suggest we don't need to be so
pessimistic/conservative and so we have to decide if we are willing to be more
optimistic/liberal. Broadly handwaving, the "netdev we" seems to be willing to
be more optimistic/liberal in at least a few cases, and the question comes down
to whether or not the "IETF we" will be similarly willing.

rick jones

2010-07-15 20:49:09

by Stephen Hemminger

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On Thu, 15 Jul 2010 12:51:22 -0700
Rick Jones <[email protected]> wrote:

> I have to wonder if the only heuristic one could employ for divining the initial
> congestion window is to be either pessimistic/conservative or
> optimistic/liberal. Or for that matter the only one one really needs here?
>
> That's what it comes down to doesn't it? At any one point in time, we don't
> *really* know the state of the network and whether it can handle the load we
> might wish to put upon it. We are always reacting to it. Up until now, it has
> been felt necessary to be pessimistic/conservative at time of connection
> establishment and not rely as much on the robustness of the "control" part of
> avoidance and control.
>
> Now, the folks at Google have lots of data to suggest we don't need to be so
> pessimistic/conservative and so we have to decide if we are willing to be more
> optimistic/liberal. Broadly handwaving, the "netdev we" seems to be willing to
> be more optimistic/liberal in at least a few cases, and the question comes down
> to whether or not the "IETF we" will be similarly willing.

I am not convinced that a host being aggressive with initial cwnd (Linux) would
not end up unfairly monopolizing available bandwidth compared to older more conservative
implementations (Windows). Whether fairness is important or not is another debate.

2010-07-15 23:14:30

by Bill Davidsen

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

David Miller wrote:
> From: Bill Davidsen <[email protected]>
> Date: Wed, 14 Jul 2010 11:21:15 -0400
>
>> You may have to go into /proc/sys/net/core and crank up the
>> rmem_* settings, depending on your distribution.
>
> You should never, ever, have to touch the various networking sysctl
> values to get good performance in any normal setup. If you do, it's a
> bug, report it so we can fix it.
>
> I cringe every time someone says to do this, so please do me a favor
> and don't spread this further. :-)
>
I think transit time measured in 1/10th sec would disqualify this as a "normal
setup."

High bandwidth and high latency don't work well because you get "send until the
window is full then wait for ack" and poor performance. I saw this with sat feed
to Wyoming from GE's Research Center in upstate NY in the late 80's or early
90's. (I think this was NYSERNet at that time). I did feeds from the NYC area to
California and Hawaii with SBC in the early-to-mid 2000s. In every case
SunOS, Solaris, AIX and Linux all failed to hit anything like reasonable
transfer speeds without manual tweaking, and I got the advice on increasing
the window size from network engineers at ISPs and backbone providers.

The O.P. may have other issues, and may benefit from doing other things as well,
but raising window size is a reasonable thing to do on links with RTT in
hundreds of ms, and it's easy to try without changing config files.
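For the original poster's link, the numbers are easy to sketch (my arithmetic, assuming the ~500 kbit/s and ~1 s RTT stated at the top of the thread, and a 1460-byte MSS):

```shell
# Window needed to fill a 500 kbit/s, 1 s RTT satellite link, and how
# many RTTs classic slow start needs to reach it from an initcwnd of 3.
rate_bps=500000
rtt_ms=1000
mss=1460
win_bytes=$(( rate_bps / 8 * rtt_ms / 1000 ))   # bandwidth * RTT
segs=$(( win_bytes / mss ))
echo "needed window: ${win_bytes} bytes (~${segs} segments)"

cwnd=3; rtts=0
while [ "$cwnd" -lt "$segs" ]; do
    cwnd=$(( cwnd * 2 ))        # slow start roughly doubles per RTT
    rtts=$(( rtts + 1 ))
done
echo "slow start needs ~${rtts} RTTs (~${rtts} s here) to open the window"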

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2010-07-16 00:23:43

by H.K. Jerry Chu

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

I don't even consider a modest IW increase to 10 to be aggressive. The
scaling of the IW is only adequate, IMO, given the huge b/w growth in
the past decade. Remember there could be plenty of flows sending large
cwnd bursts at twice the bottleneck link rate at any point in time in
the network anyway, so the "fairness" question may already be
ill-defined. In any case we're trying to conduct some experiments in a
private testbed to hopefully get some insights with real data.

Jerry

On Thu, Jul 15, 2010 at 1:48 PM, Stephen Hemminger
<[email protected]> wrote:
> On Thu, 15 Jul 2010 12:51:22 -0700
> Rick Jones <[email protected]> wrote:
>
>> I have to wonder if the only heuristic one could employ for divining the initial
>> congestion window is to be either pessimistic/conservative or
>> optimistic/liberal. Or for that matter the only one one really needs here?
>>
>> That's what it comes down to doesn't it? At any one point in time, we don't
>> *really* know the state of the network and whether it can handle the load we
>> might wish to put upon it. We are always reacting to it. Up until now, it has
>> been felt necessary to be pessimistic/conservative at time of connection
>> establishment and not rely as much on the robustness of the "control" part of
>> avoidance and control.
>>
>> Now, the folks at Google have lots of data to suggest we don't need to be so
>> pessimistic/conservative and so we have to decide if we are willing to be more
>> optimistic/liberal. Broadly handwaving, the "netdev we" seems to be willing to
>> be more optimistic/liberal in at least a few cases, and the question comes down
>> to whether or not the "IETF we" will be similarly willing.
>
> I am not convinced that a host being aggressive with initial cwnd (Linux) would
> not end up unfairly monopolizing available bandwidth compared to older more conservative
> implementations (Windows). Whether fairness is important or not is another debate.
>
>

Subject: Re: Raise initial congestion window size / speedup slow start?

On Wed, 14 Jul 2010, Ed W wrote:
> Hi, my network connection looks like 500Kbits with a round trip
> latency of perhaps 1s+ (it's a satellite link).

Last time I dealt with such stuff (hundreds of VSATs across the whole
country, arriving at a Satellite Base Station), you absolutely had to use
protocol enhancement proxies in the SBS AND in the VSAT clients to get good
performance for typical end-user Internet usage. This was a few years ago,
but it probably hasn't changed much. I don't recall what proprietary stuff
was used for the proxy, but...

http://en.wikipedia.org/wiki/Performance_Enhancing_Proxy
http://sourceforge.net/projects/pepsal/

A Google search for pepsal will return a link to a PDF explaining the
design. Maybe that could be of some help for you?

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2010-07-16 09:03:22

by Hagen Paul Pfeifer

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?


On Wed, 14 Jul 2010 23:49:17 -0400, Bill Fink wrote:

> A long, long time ago, I suggested a Path BW Discovery mechanism
> to the IETF, analogous to the Path MTU Discovery mechanism, but
> it didn't get any traction. Such information could be extremely
> useful to TCP endpoints, to determine a maximum window size to
> use, to effectively rate limit a much stronger sender from
> overpowering a much weaker receiver (for example 10-GigE -> GigE),
> resulting in abominable performance across large RTT paths
> (as low as 12 Mbps), even in the absence of any real network
> contention.

A much weaker middlebox? The windowing mechanism should be sufficient to
keep endpoints from over-committing.

Anyway, your proposed draft (I didn't search for it) sounds like a
mechanism similar to RFC 4782: Quick-Start for TCP and IP.


This document specifies an optional Quick-Start mechanism for
transport protocols, in cooperation with routers, to determine an
allowed sending rate at the start and, at times, in the middle of a
data transfer (e.g., after an idle period). While Quick-Start is
designed to be used by a range of transport protocols, in this
document we only specify its use with TCP. Quick-Start is designed
to allow connections to use higher sending rates when there is
significant unused bandwidth along the path, and the sender and all
of the routers along the path approve the Quick-Start Request.


Cheers, Hagen

2010-07-16 17:13:27

by Patrick McManus

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On Wed, 2010-07-14 at 21:51 -0700, H.K. Jerry Chu wrote:
> except there are indeed bugs in the code today in that the
> code in various places assumes initcwnd as per RFC3390. So when
> initcwnd is raised, that actual value may be limited unnecessarily by
> the initial wmem/sk_sndbuf.

Thanks for the discussion!

can you tell us more about the impl concerns of initcwnd stored on the
route?

and while I'm asking for info, can you expand on the conclusion
regarding poor cache hit rates for reusing learned cwnds? (ok, I admit I
only read the slides.. maybe the paper has more info?)

The article and slides are much appreciated and very interesting. I've
long been of the opinion that the downsides of being too aggressive once
in a while aren't all that serious anymore... as someone else said, in a
non-reservation world you are always trying to predict the future anyhow,
and therefore overflowing a queue is always possible no matter how
conservative you are.



2010-07-16 17:41:47

by Ed W

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?


> and while I'm asking for info, can you expand on the conclusion
> regarding poor cache hit rates for reusing learned cwnds? (ok, I admit I
> only read the slides.. maybe the paper has more info?)
>

My guess is that this result is specific to google and their servers?

I guess we can probably stereotype the world into two pools of devices:

1) Devices in a pool of fast networking, but connected to the rest of
the world through a relatively slow router
2) Devices connected via a high speed network and largely the bottleneck
device is many hops down the line and well away from us

I'm thinking here 1) client users behind broadband routers, wireless,
3G, dialup, etc and 2) public servers that have obviously been
deliberately placed in locations with high levels of interconnectivity.

I think history information could be more useful for clients in category
1) because there is a much higher probability that their most restrictive
device is one hop away, and hence affects all connections; only relatively
occasionally is the bottleneck multiple hops away. For
devices in category 2) it's much harder because the restriction will
usually be lots of hops away and effectively you are trying to figure
out and cache the speed of every ADSL router out there... For sure you
can probably figure out how to cluster this stuff and say that pool
there is 56K dialup, that pool there is "broadband", that pool is cell
phone, etc, but probably it's hard to do better than that?

So my guess is this is why google have had poor results investigating
cwnd caching?

However, I would suggest that whilst it's of little value for the server
side, it still remains a very interesting idea for the client side and
the cache hit ratio would seem to be dramatically higher here?


I haven't studied the code, but given there is a userspace ability to
change init cwnd through the IP utility, it would seem likely that
relatively little coding would now be required to implement some kind of
limited cwnd caching and experiment with whether this is a valuable
addition? I would have thought if you are only fiddling with devices
behind a broadband router then there is little chance of you "crashing
the internet" with these kind of experiments?

Good luck

Ed W
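For what it's worth, the userspace knob Ed refers to is the initcwnd option of the ip route command; roughly as below (the gateway, device, and window size are illustrative, and the command needs root plus a kernel that honours per-route initcwnd):

```
# Show the current default route so its parameters can be reused below
ip route show default

# Raise the initial congestion window on the default route to 10 segments
# (substitute the gateway and device printed by the previous command)
ip route change default via 192.168.1.1 dev eth0 initcwnd 10
```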

2010-07-17 00:36:49

by H.K. Jerry Chu

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On Fri, Jul 16, 2010 at 10:01 AM, Patrick McManus <[email protected]> wrote:
> On Wed, 2010-07-14 at 21:51 -0700, H.K. Jerry Chu wrote:
>> except there are indeed bugs in the code today in that the
>> code in various places assumes initcwnd as per RFC3390. So when
>> initcwnd is raised, that actual value may be limited unnecessarily by
>> the initial wmem/sk_sndbuf.
>
> Thanks for the discussion!
>
> can you tell us more about the impl concerns of initcwnd stored on the
> route?

We have found two issues when altering initcwnd through the ip route cmd:
1. initcwnd is actually capped by sndbuf (i.e., tcp_wmem[1], which
defaults to a small value of 16KB). This problem has been obscured
by the TSO code, which fudges the flow control limit (and could be a bug by
itself).

2. the congestion backoff code is supposed to take inflight, rather than cwnd,
but initcwnd presents a special case. I don't yet understand the code well
enough to propose a fix.

>
> and while I'm asking for info, can you expand on the conclusion
> regarding poor cache hit rates for reusing learned cwnds? (ok, I admit I
> only read the slides.. maybe the paper has more info?)

This is partly due to our load balancer policy resulting in poor cache
hit rates, partly due to the sheer volume of remote clients. Some of my
colleagues tried to change the host cache to a /24 subnet cache, but the
result wasn't that good either (sorry, I don't remember all the details).

>
> article and slides much appreciated and very interesting. I've long been
> of the opinion that the downsides of being too aggressive once in a
> while aren't all that serious anymore.. as someone else said in a
> non-reservation world you are always trying to predict the future anyhow
> and therefore overflowing a queue is always possible no matter how
> conservative.

Please voice your support to TCPM then :)

Jerry


2010-07-17 01:23:59

by H.K. Jerry Chu

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On Fri, Jul 16, 2010 at 10:41 AM, Ed W <[email protected]> wrote:
>
>> and while I'm asking for info, can you expand on the conclusion
>> regarding poor cache hit rates for reusing learned cwnds? (ok, I admit I
>> only read the slides.. maybe the paper has more info?)
>>
>
> My guess is that this result is specific to google and their servers?
>
> I guess we can probably stereotype the world into two pools of devices:
>
> 1) Devices in a pool of fast networking, but connected to the rest of the
> world through a relatively slow router
> 2) Devices connected via a high speed network and largely the bottleneck
> device is many hops down the line and well away from us
>
> I'm thinking here 1) client users behind broadband routers, wireless, 3G,
> dialup, etc and 2) public servers that have obviously been deliberately
> placed in locations with high levels of interconnectivity.
>
> I think history information could be more useful for clients in category 1)
> because there is a much higher probability that their most restrictive
> device is one hop away and hence affects all connections and relatively
> occasionally the bottleneck is multiple hops away. For devices in category
> 2) it's much harder because the restriction will usually be lots of hops
> away and effectively you are trying to figure out and cache the speed of
> every ADSL router out there... For sure you can probably figure out how to
> cluster this stuff and say that pool there is 56K dialup, that pool there is
> "broadband", that pool is cell phone, etc, but probably it's hard to do
> better than that?
>
> So my guess is this is why google have had poor results investigating cwnd
> caching?

Actually we have investigated two types of caches: a short-history,
limited-size internal cache that is subject to an LRU replacement policy,
which greatly limits the cache hit rate, and a long-history external cache,
which provides much more accurate per-subnet cwnd history but comes with
high complexity and deployment headaches.

Also, we have set a much more ambitious goal: not just to speed up our own
services, but to provide a solution that could benefit the whole web
(see http://code.google.com/speed/index.html). The latter pretty much
precludes the complex external cache scheme mentioned above.

Jerry

>
> However, I would suggest that whilst it's of little value for the server
> side, it still remains a very interesting idea for the client side and the
> cache hit ratio would seem to be dramatically higher here?
>
>
> I haven't studied the code, but given there is a userspace ability to change
> init cwnd through the IP utility, it would seem likely that relatively
> little coding would now be required to implement some kind of limited cwnd
> caching and experiment with whether this is a valuable addition? I would
> have thought if you are only fiddling with devices behind a broadband router
> then there is little chance of you "crashing the internet" with these kind
> of experiments?
>
> Good luck
>
> Ed W
>

2010-07-19 17:08:30

by Rick Jones

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

H.K. Jerry Chu wrote:
> On Fri, Jul 16, 2010 at 10:01 AM, Patrick McManus <[email protected]> wrote:
>>can you tell us more about the impl concerns of initcwnd stored on the
>>route?
>
>
> We have found two issues when altering initcwnd through the ip route cmd:
> 1. initcwnd is actually capped by sndbuf (i.e., tcp_wmem[1], which
> defaults to a small value of 16KB). This problem has been obscured
> by the TSO code, which fudges the flow control limit (and could be a bug
> by itself).

I'll ask my Emily Litella question of the day and inquire as to why that would
be unique to altering initcwnd via the route?

The slightly less Emily Litella-esque question is why an application with a
desire to know it could send more than 16K at one time wouldn't have either
asked via its install docs to have the minimum tweaked (certainly if one is
already tweaking routes...), or "gone all the way" and made an explicit
setsockopt(SO_SNDBUF) call? We are in a realm of applications for which there
was a proposal to allow them to pick their own initcwnd, right? Having them pick
an SO_SNDBUF size would seem to be no more to ask.

rick jones

sendbuf_init = max(tcp_mem,initcwnd)?

2010-07-19 22:51:16

by H.K. Jerry Chu

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

On Mon, Jul 19, 2010 at 10:08 AM, Rick Jones <[email protected]> wrote:
> H.K. Jerry Chu wrote:
>>
>> On Fri, Jul 16, 2010 at 10:01 AM, Patrick McManus <[email protected]>
>> wrote:
>>>
>>> can you tell us more about the impl concerns of initcwnd stored on the
>>> route?
>>
>>
>> We have found two issues when altering initcwnd through the ip route cmd:
>> 1. initcwnd is actually capped by sndbuf (i.e., tcp_wmem[1], which
>> defaults to a small value of 16KB). This problem has been obscured
>> by the TSO code, which fudges the flow control limit (and could be a bug
>> by itself).
>
> I'll ask my Emily Litella question of the day and inquire as to why that
> would be unique to altering initcwnd via the route?
>
> The slightly less Emily Litella-esque question is why an application with a
> desire to know it could send more than 16K at one time wouldn't have either
> asked via its install docs to have the minimum tweaked (certainly if one is
> already tweaking routes...), or "gone all the way" and made an explicit
> setsockopt(SO_SNDBUF) call? We are in a realm of applications for which
> there was a proposal to allow them to pick their own initcwnd, right? Having

Per-app setting of initcwnd is just one case. Another is per-route setting
of initcwnd through the ip route cmd. For the latter, the initcwnd change is
more or less supposed to be transparent to apps.

This isn't a big issue and can probably be easily fixed by initializing
sk_sndbuf to max(tcp_wmem[1], initcwnd), as you alluded to below. It is just
that our experiments were hindered by this little bug, and we weren't aware
of it sooner due to TSO fudging sndbuf.

Jerry

> them pick an SO_SNDBUF size would seem to be no more to ask.
>
> rick jones
>
> sendbuf_init = max(tcp_mem,initcwnd)?
>

2010-07-19 23:42:19

by Hagen Paul Pfeifer

[permalink] [raw]
Subject: Re: Raise initial congestion window size / speedup slow start?

Maybe someone is interested: on the Transport Modeling Research Group (TMRG)
mailing list, a new thread named "Proposal to increase TCP initial CWND" was
started a day ago.

Cheers, Hagen