This patch will make TCP use only linear timeouts if the stream is thin. This will help to avoid the very high latencies that thin streams suffer because of exponential backoff. This mechanism is only active if enabled via a socket option or sysctl and the stream is identified as thin.
Signed-off-by: Andreas Petlund <[email protected]>
---
include/linux/tcp.h | 3 +++
include/net/tcp.h | 1 +
net/ipv4/sysctl_net_ipv4.c | 8 ++++++++
net/ipv4/tcp.c | 5 +++++
net/ipv4/tcp_timer.c | 17 ++++++++++++++++-
5 files changed, 33 insertions(+), 1 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 61723a7..e64368d 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -96,6 +96,7 @@ enum {
#define TCP_QUICKACK 12 /* Block/reenable quick acks */
#define TCP_CONGESTION 13 /* Congestion control algorithm */
#define TCP_MD5SIG 14 /* TCP MD5 Signature (RFC2385) */
+#define TCP_THIN_RM_EXPB 15 /* Remove exp. backoff for thin streams*/
#define TCPI_OPT_TIMESTAMPS 1
#define TCPI_OPT_SACK 2
@@ -299,6 +300,8 @@ struct tcp_sock {
u16 advmss; /* Advertised MSS */
u8 frto_counter; /* Number of new acks after RTO */
u8 nonagle; /* Disable Nagle algorithm? */
+ u8 thin_rm_expb:1, /* Remove exp. backoff for thin streams */
+ thin_undef : 7;
/* RTT measurement */
u32 srtt; /* smoothed round trip time << 3 */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7c4482f..412c1bd 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -237,6 +237,7 @@ extern int sysctl_tcp_base_mss;
extern int sysctl_tcp_workaround_signed_windows;
extern int sysctl_tcp_slow_start_after_idle;
extern int sysctl_tcp_max_ssthresh;
+extern int sysctl_tcp_force_thin_rm_expb;
extern atomic_t tcp_memory_allocated;
extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 2dcf04d..7458f37 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -713,6 +713,14 @@ static struct ctl_table ipv4_table[] = {
.proc_handler = proc_dointvec,
},
{
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "tcp_force_thin_rm_expb",
+ .data = &sysctl_tcp_force_thin_rm_expb,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec
+ },
+ {
.ctl_name = CTL_UNNUMBERED,
.procname = "udp_mem",
.data = &sysctl_udp_mem,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 90b2e06..b4b0931 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2134,6 +2134,11 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
}
break;
+ case TCP_THIN_RM_EXPB:
+ if (val)
+ tp->thin_rm_expb = 1;
+ break;
+
case TCP_CORK:
/* When set indicates to always queue non-full frames.
* Later the user clears this option and we transmit
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index cdb2ca7..24d6dc3 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -29,6 +29,7 @@ int sysctl_tcp_keepalive_intvl __read_mostly = TCP_KEEPALIVE_INTVL;
int sysctl_tcp_retries1 __read_mostly = TCP_RETR1;
int sysctl_tcp_retries2 __read_mostly = TCP_RETR2;
int sysctl_tcp_orphan_retries __read_mostly;
+int sysctl_tcp_force_thin_rm_expb __read_mostly;
static void tcp_write_timer(unsigned long);
static void tcp_delack_timer(unsigned long);
@@ -386,7 +387,21 @@ void tcp_retransmit_timer(struct sock *sk)
icsk->icsk_retransmits++;
out_reset_timer:
- icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
+ if ((tp->thin_rm_expb || sysctl_tcp_force_thin_rm_expb) &&
+ tcp_stream_is_thin(tp) && sk->sk_state == TCP_ESTABLISHED) {
+ /* If stream is thin, remove exponential backoff.
+ * Since 'icsk_backoff' is used to reset timer, set to 0
+ * Recalculate 'icsk_rto' as this might be increased if
+ * stream oscillates between thin and thick, thus the old
+ * value might already be too high compared to the value
+ * set by 'tcp_set_rto' in tcp_input.c which resets the
+ * rto without backoff. */
+ icsk->icsk_backoff = 0;
+ icsk->icsk_rto = min(((tp->srtt >> 3) + tp->rttvar), TCP_RTO_MAX);
+ } else {
+ /* Use normal backoff */
+ icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
+ }
inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX);
if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1))
__sk_dst_reset(sk);
--
1.6.0.4
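For context, a minimal application-side sketch (not part of the patch; the helper name below is illustrative) of how a thin-stream application could enable the proposed option, given that TCP_THIN_RM_EXPB has the value 15 from the hunk above:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <stdio.h>

#ifndef TCP_THIN_RM_EXPB
#define TCP_THIN_RM_EXPB 15     /* value from this patch; not in stock headers */
#endif

/* Hedged sketch: request linear retransmission timeouts for this socket
 * while the kernel classifies the stream as thin. */
static int enable_thin_linear_timeouts(int sock_fd)
{
        int one = 1;

        if (setsockopt(sock_fd, IPPROTO_TCP, TCP_THIN_RM_EXPB,
                       &one, sizeof(one)) < 0) {
                perror("setsockopt(TCP_THIN_RM_EXPB)");
                return -1;
        }
        return 0;
}

Alternatively, the behaviour can be forced globally for all sockets via the new net.ipv4.tcp_force_thin_rm_expb sysctl added in the sysctl_net_ipv4.c hunk.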
Andreas Petlund wrote:
> This patch will make TCP use only linear timeouts if the stream is thin. This will help to avoid the very high latencies that thin streams suffer because of exponential backoff. This mechanism is only active if enabled via a socket option or sysctl and the stream is identified as thin.
>
Won't this reduce the session timeout to something very small, i.e. 15 retransmits, way under the minute?
Sorry to be too picky about the naming, but "rm_expb" really doesn't
mean what is actually done. Perhaps TCP_THIN_LINEAR_BACKOFF and
sysctl_tcp_thin_linear_backoff?
Also, as debated on some other recent patches, shouldn't the global
sysctl specify the default, and the per socket option specify the
forced override?
Eric Dumazet wrote:
> Andreas Petlund wrote:
>> This patch will make TCP use only linear timeouts if the stream is thin. This will help to avoid the very high latencies that thin streams suffer because of exponential backoff. This mechanism is only active if enabled via a socket option or sysctl and the stream is identified as thin.
>>
>
> Won't this reduce the session timeout to something very small, i.e. 15 retransmits, way under the minute?
The session timeout no longer depends on the actual number of retransmits. Instead it's a time
interval, which is roughly equivalent to the time a TCP performing exponential backoff would
need to perform 15 retransmits.
However, addressing the proposal:
I wonder how one can seriously suggest to just skip congestion response during timeout-based
loss recovery? I believe that in a heavily congested scenario, this would lead to a goodput
disaster... Not to mention that in a heavily congested scenario, suddenly every flow
will become "thin", so this will even amplify the problems. Or did I miss something?
Best regards,
Arnd
On Tue, 27 Oct 2009, Andreas Petlund wrote:
> This patch will make TCP use only linear timeouts if the stream is thin. [...]
>
> [...]
>
> @@ -386,7 +387,21 @@ void tcp_retransmit_timer(struct sock *sk)
> icsk->icsk_retransmits++;
>
> out_reset_timer:
> - icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
> + if ((tp->thin_rm_expb || sysctl_tcp_force_thin_rm_expb) &&
> + tcp_stream_is_thin(tp) && sk->sk_state == TCP_ESTABLISHED) {
> + /* If stream is thin, remove exponential backoff.
> + * Since 'icsk_backoff' is used to reset timer, set to 0
> + * Recalculate 'icsk_rto' as this might be increased if
> + * stream oscillates between thin and thick, thus the old
> + * value might already be too high compared to the value
> + * set by 'tcp_set_rto' in tcp_input.c which resets the
> + * rto without backoff. */
> + icsk->icsk_backoff = 0;
> + icsk->icsk_rto = min(((tp->srtt >> 3) + tp->rttvar), TCP_RTO_MAX);
The first part is nowadays done with __tcp_set_rto(tp).
--
i.
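For reference, a sketch (an assumption, not a reposted patch) of how the hunk above would presumably look after that simplification, with the open-coded srtt/rttvar arithmetic replaced by the existing helper:

out_reset_timer:
        if ((tp->thin_rm_expb || sysctl_tcp_force_thin_rm_expb) &&
            tcp_stream_is_thin(tp) && sk->sk_state == TCP_ESTABLISHED) {
                /* Backoff-free RTO recomputed from srtt/rttvar via the
                 * helper in include/net/tcp.h instead of open-coding it. */
                icsk->icsk_backoff = 0;
                icsk->icsk_rto = min(__tcp_set_rto(tp), TCP_RTO_MAX);
        } else {
                /* Use normal exponential backoff */
                icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
        }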
On Wed, 28 Oct 2009, Arnd Hannemann wrote:
> Eric Dumazet wrote:
> > Andreas Petlund wrote:
> >> This patch will make TCP use only linear timeouts if the stream is
> >> thin. This will help to avoid the very high latencies that thin
> >> streams suffer because of exponential backoff. This mechanism is only
> >> active if enabled via a socket option or sysctl and the stream is
> >> identified as thin.
...I don't see how high latency has any connection to the stream being
"thin" or not, btw. If all ACKs are lost it usually requires silence for
the full RTT, which affects a stream regardless of its size. ...If not all
ACKs are lost, then the dupACK approach in the other patch should cover
it already.
> However, addressing the proposal:
> I wonder how one can seriously suggest to just skip congestion response
> during timeout-based loss recovery? I believe that in a heavily
> congested scenario, this would lead to a goodput disaster... Not to
> mention that in a heavily congested scenario, suddenly every flow will
> become "thin", so this will even amplify the problems. Or did I miss
> something?
Good point. I suppose such an under-provisioned network can certainly be
there. I have heard that at least some people who remove exponential
backoff apply it later, on the nth retransmission, as very often there
really isn't such a super heavy congestion scenario but something
completely unrelated to congestion which causes the RTO.
--
i.
On 28 Oct 2009, at 04:20, William Allen Simpson wrote:
> Sorry to be too picky about the naming, but "rm_expb" really doesn't
> mean what is actually done. Perhaps TCP_THIN_LINEAR_BACKOFF and
> sysctl_tcp_thin_linear_backoff?
>
I agree that the name should be changed. As it represents linear
timeouts and does not back off exponentially, I suggest
TCP_THIN_LINEAR_TIMEOUTS and sysctl_tcp_thin_linear_timeouts.
> Also, as debated on some other recent patches, shouldn't the global
> sysctl specify the default, and the per socket option specify the
> forced override?
The rationale behind the suggested model is to allow the modifications
to be forced active in cases where the application that generates the
thin stream is proprietary or legacy, or where the code for other reasons
cannot be modified (as is the case, for instance, for many game
clients).
On 28 Oct 2009, at 15:31, Ilpo Järvinen wrote:
> On Wed, 28 Oct 2009, Arnd Hannemann wrote:
>
>> Eric Dumazet wrote:
>>> Andreas Petlund wrote:
>>>> This patch will make TCP use only linear timeouts if the stream is
>>>> thin. This will help to avoid the very high latencies that thin
>>>> streams suffer because of exponential backoff. This mechanism is
>>>> only active if enabled via a socket option or sysctl and the stream
>>>> is identified as thin.
>
> ...I don't see how high latency has any connection to the stream being
> "thin" or not, btw. If all ACKs are lost it usually requires silence for
> the full RTT, which affects a stream regardless of its size. ...If not
> all ACKs are lost, then the dupACK approach in the other patch should
> cover it already.
>
The increased latency that we observed does not arise from lost ACKs,
but from the lack of enough packets in flight to be able to trigger
fast retransmits. This effectively limits the retransmission options
to retransmission by timeout, which again will increase exponentially
with each subsequent retransmission. We have also found that the
"thin" stream patterns are very often generated by applications where
human interaction is the purpose. Such applications will give a
degraded experience to the user if such high latencies happen often.
In-depth discussion of these effects can be found in the papers I
linked to.
If the application produces less than one packet per RTT, the dupACK
modification will be ineffective and any improved latency will come
from linear timeouts. If the number of packets in flight is 2-4, no
fast retransmissions may be triggered by the 3-dupACK scheme, but a
retransmission upon the first indication of loss will improve
retransmission latency.
>> However, addressing the proposal:
>> I wonder how one can seriously suggest to just skip congestion
>> response during timeout-based loss recovery? I believe that in a
>> heavily congested scenario, this would lead to a goodput disaster...
>> Not to mention that in a heavily congested scenario, suddenly every
>> flow will become "thin", so this will even amplify the problems. Or
>> did I miss something?
>
> Good point. I suppose such an under-provisioned network can certainly
> be there. I have heard that at least some people who remove exponential
> backoff apply it later, on the nth retransmission, as very often there
> really isn't such a super heavy congestion scenario but something
> completely unrelated to congestion which causes the RTO.
>
> --
> i.
The removal of exponential backoff on a general basis has been
investigated and discussed already, for instance here:
http://ccr.sigcomm.org/online/?q=node/416
Such steps are, however, considered drastic, and I agree that care
must be taken to thoroughly investigate the effects of such changes.
The changes introduced by the proposed patches, however, are not
default behaviour, but an option for applications that suffer from the
increased retransmission latencies of thin-stream TCP. They will, as
such, not affect all streams. In addition, the changes will only be
active for streams which are perpetually thin or in the early phase of
expanding their cwnd. Also, experiments performed on congested
bottlenecks with tail-drop queues show very little (if any) effect
on goodput for the modified scenario compared to a scenario with
unmodified TCP streams.
Graphs for both latency results and fairness tests can be found here:
http://folk.uio.no/apetlund/lktmp/
-AP-
Andreas Petlund wrote:
>
> The removal of exponential backoff on a general basis has been
> investigated and discussed already, for instance here:
> http://ccr.sigcomm.org/online/?q=node/416
> Such steps are, however, considered drastic, and I agree that care
> must be taken to thoroughly investigate the effects of such changes.
> The changes introduced by the proposed patches, however, are not default
> behaviour, but an option for applications that suffer from the
> increased retransmission latencies of thin-stream TCP. They will, as such,
> not affect all streams. In addition, the changes will only be active for
> streams which are perpetually thin or in the early phase of expanding
> their cwnd. Also, experiments performed on congested bottlenecks with
> tail-drop queues show very little (if any) effect on goodput for
> the modified scenario compared to a scenario with unmodified TCP streams.
>
> Graphs for both latency results and fairness tests can be found here:
> http://folk.uio.no/apetlund/lktmp/
>
There should be a limit to linear timeouts, say ... no more than 6 retransmits
(possibly tunable), then switch to exponential backoff. Maybe your patch
already implements such a heuristic?
True link collapses do happen; it would be good if not all streams wake up in the
same second and make recovery very slow.
It's too easy to accept possibly dangerous features with the excuse of saying
"it won't be used very much", because you cannot predict the future.
I apologise that some of you received this mail more than once. My email
client played an HTML trick on me.
>> + icsk->icsk_backoff = 0;
>> + icsk->icsk_rto = min(((tp->srtt >> 3) + tp->rttvar), TCP_RTO_MAX);
>
> The first part is nowadays done with __tcp_set_rto(tp).
>
> --
> i.
>
I will address this in the next iteration of the patch.
-AP
I apologise that some of you received this mail more than once. My email
client played an HTML trick on me.
> Eric Dumazet wrote:
>> Andreas Petlund wrote:
>>> This patch will make TCP use only linear timeouts if the stream is
>>> thin. This will help to avoid the very high latencies that thin
>>> streams suffer because of exponential backoff. This mechanism is only
>>> active if enabled via a socket option or sysctl and the stream is
>>> identified as thin.
>> Won't this reduce the session timeout to something very small, i.e. 15
>> retransmits, way under the minute?
>
> The session timeout no longer depends on the actual number of
> retransmits. Instead it's a time interval, which is roughly equivalent
> to the time a TCP performing exponential backoff would need to perform
> 15 retransmits.
>
> However, addressing the proposal:
> I wonder how one can seriously suggest to just skip congestion response
> during timeout-based loss recovery? I believe that in a heavily
> congested scenario, this would lead to a goodput disaster... Not to
> mention that in a heavily congested scenario, suddenly every flow will
> become "thin", so this will even amplify the problems. Or did I miss
> something?
We have found no noticeable degradation of the goodput in a series of
experiments we have performed in order to map the effects of the
modifications. Furthermore, the modifications implemented in the patches
are explicitly enabled only for applications where the developer knows
that streams will be thin, thus only a small subset of the streams will
apply the modifications.
Graphs presenting results from experiments performed to analyse latency
and fairness issues can be found here:
http://folk.uio.no/apetlund/lktmp/
-AP
> Andreas Petlund wrote:
>
>> The removal of exponential backoff on a general basis has been
>> investigated and discussed already, for instance here:
>> http://ccr.sigcomm.org/online/?q=node/416
>> Such steps are, however, considered drastic, and I agree that care
>> must be taken to thoroughly investigate the effects of such changes.
>> The changes introduced by the proposed patches, however, are not
>> default behaviour, but an option for applications that suffer from the
>> increased retransmission latencies of thin-stream TCP. They will, as
>> such, not affect all streams. In addition, the changes will only be
>> active for streams which are perpetually thin or in the early phase of
>> expanding their cwnd. Also, experiments performed on congested
>> bottlenecks with tail-drop queues show very little (if any) effect on
>> goodput for the modified scenario compared to a scenario with
>> unmodified TCP streams.
>> Graphs for both latency results and fairness tests can be found here:
>> http://folk.uio.no/apetlund/lktmp/
>
> There should be a limit to linear timeouts, say ... no more than 6
> retransmits (possibly tunable), then switch to exponential backoff.
> Maybe your patch already implements such a heuristic?
>
The limitation you suggest to the linear timeouts makes very good sense.
Our experiments performed on the Internet indicate that it is extremely
rare that more than 6 retransmissions are needed to recover. It is not
included in the current patch, so I will include this in the next
iteration.
> True link collapses do happen; it would be good if not all streams wake
> up in the same second and make recovery very slow.
>
Each stream will have its own schedule for wakeup, so such events will
still be subject to coincidence. The timer granularity of the TCP wakeup
timer will also influence how many streams will wake at the same time. The
experiments we have performed on severely congested bottlenecks (link
above) indicate that the modifications will not create a large negative
effect. In fact, when goodput is drastically reduced due to severe
overload, regular TCP and the LT and dupACK modifications seem to perform
nearly identically. Other scenarios may exist where different effects can
be observed, and I am open to suggestions for further testing.
> It's too easy to accept possibly dangerous features with the excuse of
> saying "it won't be used very much", because you cannot predict the future.
I agree that it is no argument to say that it won't be used much; indeed,
my hope is that it will be used much. However, our experiments indicate no
negative effects while showing a large improvement on retransmission
latency for the scenario in question. I therefore think that the option
for such an improvement should be made available for time-dependent
thin-stream applications.
-AP
[email protected] wrote:
>> Andreas Petlund wrote:
>> There should be a limit to linear timeouts, say ... no more than 6
>> retransmits (possibly tunable), then switch to exponential backoff.
>> Maybe your patch already implements such a heuristic?
>>
>
> The limitation you suggest to the linear timeouts makes very good sense.
> Our experiments performed on the Internet indicate that it is extremely
> rare that more than 6 retransmissions are needed to recover. It is not
> included in the current patch, so I will include this in the next
> iteration.
>
> [...]
Thanks! I must say I am very interested in these experiments; I am looking
forward to your next iteration.
Andreas Petlund wrote:
> We have found no noticeable degradation of the goodput in a series of
> experiments we have performed in order to map the effects of the
> modifications. Furthermore, the modifications implemented in the patches
> are explicitly enabled only for applications where the developer knows
> that streams will be thin, thus only a small subset of the streams will
> apply the modifications.
>
> Graphs presenting results from experiments performed to analyse latency
> and fairness issues can be found here:
> http://folk.uio.no/apetlund/lktmp/
How often did you hit consecutive RTOs in these measurements?
As I see you did a measurement with 512 thick vs. 512 thin streams.
Let's do a hypothetical calculation with only 512 "thin" streams.
Let's further assume the RTT is low, so that the RTO is around 200ms.
Assume each segment has 128 bytes (already very small...).
Assume that after a period of normal operation all streams are in
timeout-based loss recovery (e.g. because the destination endpoint
suddenly behaves like a black hole).
As all streams are in timeout-based loss recovery, each stream
will transmit 5 segments each second with your modification.
This would result in a throughput of around 512*5*1024 bit = 2560 kbit/s
and a goodput of 0 kbit/s (because the receiver is a black hole).
So you can easily saturate a 2 Mbit/s link with retransmissions alone.
Unfortunately in Germany an ADSL uplink of 786 kbit/s is still quite
common, and it's already called "broadband"...
Regarding the "small subset", why have a global sysctl option, then?
And I think "tcp_stream_is_thin(tp)" will be true for every flow
in the RTO case, at least for consecutive RTOs.
Best regards,
Arnd Hannemann
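For reference, tcp_stream_is_thin() itself is not part of the patch shown here; based on the "fewer packets in flight than needed for three dupACKs" criterion discussed in this thread, a plausible sketch of the helper (an assumption, not the posted code) would be:

/* Assumed sketch: a stream is "thin" when it has too few packets in
 * flight to trigger a fast retransmit via three duplicate ACKs.
 * A complete version would probably also want to exclude connections
 * that are merely in their initial slow-start phase. */
static inline unsigned int tcp_stream_is_thin(const struct tcp_sock *tp)
{
        return tp->packets_out < 4;
}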
> Andreas Petlund wrote:
>> We have found no noticeable degradation of the goodput in a series of
>> experiments we have performed in order to map the effects of the
>> modifications. Furthermore, the modifications implemented in the
>> patches are explicitly enabled only for applications where the
>> developer knows that streams will be thin, thus only a small subset of
>> the streams will apply the modifications.
>> Graphs presenting results from experiments performed to analyse latency
>> and fairness issues can be found here:
>> http://folk.uio.no/apetlund/lktmp/
>
> How often did you hit consecutive RTOs in these measurements?
> As I see you did a measurement with 512 thick vs. 512 thin streams.
> Let's do a hypothetical calculation with only 512 "thin" streams. Let's
> further assume the RTT is low, so that the RTO is around 200ms. Assume
> each segment has 128 bytes (already very small...).
> Assume that after a period of normal operation all streams are in
> timeout-based loss recovery (e.g. because the destination endpoint
> suddenly behaves like a black hole).
> As all streams are in timeout-based loss recovery, each stream
> will transmit 5 segments each second with your modification.
> This would result in a throughput of around 512*5*1024 bit = 2560 kbit/s
> and a goodput of 0 kbit/s (because the receiver is a black hole). So you
> can easily saturate a 2 Mbit/s link with retransmissions alone.
I have not yet performed experiments where the receiver becomes a black
hole, but I recognise the problem. Eric Dumazet suggested that the
mechanism switch to exponential backoff after 6 linear retries. This would
avoid situations where the link stays congested indefinitely, and I will
implement this in the next iteration.
> Unfortunately in Germany an ADSL uplink of 786 kbit/s is still quite
> common, and it's already called "broadband"...
I believe that a subscriber for such an uplink would not keep several
hundred thin-stream connections, though accidents do happen.
> Regarding the "small subset", why have a global sysctl option, then? And
I think "tcp_stream_is_thin(tp)" will be true for every flow in the RTO
case, at least for consecutive RTOs.
The sysctl is meant for cases of proprietary code that will benefit from
the modifications. In our experiments, we have found it useful in many
cases for such applications (like game clients).
Regards,
Andreas
Just how thin can a thin stream be when a thin stream is found thin? (to the
cadence of "How much wood could a woodchuck chuck if a woodchuck could chuck wood?")
Does a stream get so thin that a user's send could not be split into four,
sub-MSS TCP segments?
rick jones
On Thu, 29 Oct 2009, [email protected] wrote:
> > Andreas Petlund wrote:
> >
> > [...]
> >
> > There should be a limit to linear timeouts, say ... no more than 6
> > retransmits (possibly tunable), then switch to exponential backoff.
> > Maybe your patch already implements such a heuristic?
>
> The limitation you suggest to the linear timeouts makes very good sense.
> Our experiments performed on the Internet indicate that it is extremely
> rare that more than 6 retransmissions are needed to recover. It is not
> included in the current patch, so I will include this in the next
> iteration.
I've heard that BSD would use linear backoff for the first three retransmits
and then exponential, but this is based on some gossip (which could well turn
out to be a myth) rather than checking it out myself. But if it is true, it
certainly hasn't been that devastating.
> > True link collapses do happen; it would be good if not all streams
> > wake up in the same second and make recovery very slow.
> >
>
> Each stream will have its own schedule for wakeup, so such events will
> still be subject to coincidence. The timer granularity of the TCP wakeup
> timer will also influence how many streams will wake at the same time. The
> experiments we have performed on severely congested bottlenecks (link
> above) indicate that the modifications will not create a large negative
> effect. In fact, when goodput is drastically reduced due to severe
> overload, regular TCP and the LT and dupACK modifications seem to perform
> nearly identically. Other scenarios may exist where different effects can
> be observed, and I am open to suggestions for further testing.
Could you point out where exactly the goodput results are? ...I only
seem to find latency results, which is not exactly the same. I don't expect
something on the order of what Nagle talks about (32kbps -> 40bps, iirc),
but a 10-50% goodput reduction over a relatively short period of time
(until RTOs top RTTs once again, preventing spurious RTOs, and thus
segment duplication due to retransmissions also ceases).
Were these results obtained with Linux, and if so what was FRTO set to?
> > It's too easy to accept possibly dangerous features with the excuse of
> > saying "it won't be used very much", because you cannot predict the future.
>
> I agree that it is no argument to say that it won't be used much; indeed,
> my hope is that it will be used much. However, our experiments indicate no
> negative effects while showing a large improvement on retransmission
> latency for the scenario in question. I therefore think that the option
> for such an improvement should be made available for time-dependent
> thin-stream applications.
Everyone can tell right away that most RTOs are not due to extreme
congestion, so some linear backoff seems sensible when dupACK feedback
is lacking for some reason. Of course it is a tradeoff, as there's the
chance of getting only 1/(n+1) goodput (where n is the number of linear
steps) if the RTOs were spurious (and without FRTO even more unnecessary
retransmissions will be triggered, so in fact it could even be slightly
worse in theory). But for that to happen in the first place requires, of
course, this RTT > RTO situation, which is hard to see being a persisting
state.
--
i.
> Just how thin can a thin stream be when a thin stream is found thin? (to
> the cadence of "How much wood could a woodchuck chuck if a woodchuck
> could chuck wood?")
>
> Does a stream get so thin that a user's send could not be split into
> four, sub-MSS TCP segments?
That was a nifty idea: Anti-Nagle the segments to be able to trigger fast
retransmissions. I think it is possible.
Besides using more resources on each send, this scheme will introduce the
need to delay parts of the segment, which is undesirable for
time-dependent applications (the intended target of the mechanisms).
I think it would be fun to implement and play around with such a mechanism
to see the effects.
Regards,
Andreas
> On Thu, 29 Oct 2009, [email protected] wrote:
>
>> [...]
>
> Could you point out where exactly the goodput results are? ...I only
> seem to find latency results, which is not exactly the same. I don't
> expect something on the order of what Nagle talks about (32kbps ->
> 40bps, iirc), but a 10-50% goodput reduction over a relatively short
> period of time (until RTOs top RTTs once again, preventing spurious
> RTOs, and thus segment duplication due to retransmissions also ceases).
The plot can be found here:
http://folk.uio.no/apetlund/lktmp/n-vs-n-fairness.pdf
I'm sorry that I didn't explain at once, as the parameters and setup are
not obvious. The boxplot shows aggregate throughput of all the unmodified,
greedy TCP New Reno streams when competing with thin streams using TCP New
Reno, linear timeouts, modified dupACK, RDB (which is not included in this
patch set), and the combination of all the modifications. The streams
compete for a 1 Mbps bottleneck that uses tc with a tail-dropping queue to
limit bandwidth and netem to create loss and delay.
The RTT for the test is 100ms and the packet interarrival time for the
thin streams is 85ms.
> Were these results obtained with Linux, and if so what was FRTO set to?
The results are from our Linux implementation of the mechanisms. FRTO was
disabled and Nagle was disabled for all test sets.
>> > It's too easy to accept possibly dangerous features with the excuse
>> > of saying "it won't be used very much", because you cannot predict
>> > the future.
>> I agree that it is no argument to say that it won't be used much; indeed,
>> my hope is that it will be used much. However, our experiments indicate no
>> negative effects while showing a large improvement on retransmission
>> latency for the scenario in question. I therefore think that the option
>> for such an improvement should be made available for time-dependent
>> thin-stream applications.
>
> Everyone can tell right away that most RTOs are not due to extreme
> congestion, so some linear backoff seems sensible when dupACK feedback
> is lacking for some reason. Of course it is a tradeoff, as there's the
> chance of getting only 1/(n+1) goodput (where n is the number of linear
> steps) if the RTOs were spurious (and without FRTO even more unnecessary
> retransmissions will be triggered, so in fact it could even be slightly
> worse in theory). But for that to happen in the first place requires, of
> course, this RTT > RTO situation, which is hard to see being a
> persisting state.
Actually, we have found the low number of packets in flight to be a
persisting state in a large number of applications that are interactive or
time-dependent. Some examples can be found in the table linked to below:
http://folk.uio.no/apetlund/lktmp/thin_apps_table.pdf
It seems that human interaction, sensor networks, and several other
scenarios that are not inherently greedy will produce a steady trickle of
data segments that falls into the "thin-stream" category and stays there.
Regards,
Andreas
[email protected] wrote:
>> Just how thin can a thin stream be when a thin stream is found thin? (to
>> the cadence of "How much wood could a woodchuck chuck if a woodchuck could
>> chuck wood?")
>> Does a stream get so thin that a user's send could not be split into four,
>> sub-MSS TCP segments?
>
>
> That was a nifty idea: Anti-Nagle the segments to be able to trigger fast
> retransmissions. I think it is possible.
>
> Besides using more resources on each send, this scheme will introduce the
> need to delay parts of the segment, which is undesirable for
> time-dependent applications (the intended target of the mechanisms).
>
> I think it would be fun to implement and play around with such a mechanism
> to see the effects.
Indeed, it does feel a bit "anti-nagle" but at the same time, these thin streams
are supposed to be quite rare right? I mean we have survived 20 odd years of
congestion control and fast retransmission without it being a big issue.
They are also supposed to not have terribly high bandwidth requirements yes?
Suppose that instead of an explicit "I promise to be thin" setsockopt(), they
instead set a Very Small (tm) in today's thinking socket buffer size and the
stack then picks the MSS to be no more than 1/4 that size? Or for that matter,
assuming the permissions are acceptable, the thin application makes a
setsockopt(TCP_MAXSEG) call such that the actual MSS is small enough to allow
the send()'s to be four (or more) segments. And, if one wants to spin-away the
anti-Nagle, Nagle is defined by the send() being smaller than the MSS, so if the
MSS is smaller, it isn't anti-Nagle :)
Further blue-skying...
If SACK were also enabled, it would seem that only loss of the last segment in
the "thin train" would be an issue? Presumably, the thin stream receiver would
be in a position to detect this, perhaps with an application-level timeout.
Whether then it would suffice to allow the receiving app to make a setsockopt()
call to force an extra ACK or two I'm not sure. Perhaps if the thin-stream had
a semi-aggressive "heartbeat" going...
But it does seem that it should be possible to deal with this sort of thing
without having to make wholesale changes to TCP's RTO policies and whatnot?
rick jones
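For what it's worth, a minimal sketch of the TCP_MAXSEG idea above (the helper name and the divide-by-four policy are illustrative; TCP_MAXSEG generally has to be set before connect() to influence the negotiated MSS, and the kernel clamps the value to its own limits):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <stdio.h>

/* Shrink the MSS so that a typical application write spans at least four
 * segments, making losses recoverable via fast retransmit rather than RTOs. */
static int shrink_mss_for_thin_stream(int sock_fd, int typical_write_size)
{
        int mss = typical_write_size / 4;

        if (setsockopt(sock_fd, IPPROTO_TCP, TCP_MAXSEG,
                       &mss, sizeof(mss)) < 0) {
                perror("setsockopt(TCP_MAXSEG)");
                return -1;
        }
        return 0;
}

For example, an application whose writes are around 512 bytes would call shrink_mss_for_thin_stream(fd, 512) on the socket before connecting.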
Rick Jones wrote:
> [email protected] wrote:
>>> Just how thin can a thin stream be when a thin stream is found thin?
>>> (to the cadence of "How much wood could a woodchuck chuck if a
>>> woodchuck could chuck wood?")
>
>>> Does a stream get so thin that a user's send could not be split into
>>> four,
>>> sub-MSS TCP segments?
>>
>>
>> That was a nifty idea: Anti-Nagle the segments to be able to trigger fast
>> retransmissions. I think it is possible.
>>
>> Besides using more resources on each send, this scheme will introduce the
>> need to delay parts of the segment, which is undesirable for
>> time-dependent applications (the intended target of the mechanisms).
>>
>> I think it would be fun to implement and play around with such a
>> mechanism
>> to see the effects.
>
> Indeed, it does feel a bit "anti-nagle" but at the same time, these thin
> streams are supposed to be quite rare right? I mean we have survived 20
> odd years of congestion control and fast retransmission without it being
> a big issue.
>
> They are also supposed to not have terribly high bandwidth requirements
> yes? Suppose that instead of an explicit "I promise to be thin"
> setsockopt(), they instead set a Very Small (tm) in today's thinking
> socket buffer size and the stack then picks the MSS to be no more than
> 1/4 that size? Or for that matter, assuming the permissions are
> acceptable, the thin application makes a setsockopt(TCP_MAXSEG) call
> such that the actual MSS is small enough to allow the send()'s to be
> four (or more) segments. And, if one wants to spin-away the anti-Nagle,
> Nagle is defined by the send() being smaller than the MSS, so if the MSS
> is smaller, it isn't anti-Nagle :)
>
This is not a new idea. Folks used to set the MSS really low for M$
Windows, so that their short little packets went over dialup links more
quickly and they saw a little bit more of their graphic as it crawled to
the screen. Even though it was actually slower in total time, it "felt"
faster because of the continuing visual feedback. It depended upon VJ
Header Prediction to keep the overhead down for the link.
These are/were called "TCP mice", and the result was routers and servers
being nibbled by mice. Not pleasant.
> Further blue-skying...
>
> If SACK were also enabled, it would seem that only loss of the last
> segment in the "thin train" would be an issue? Presumably, the thin
> stream receiver would be in a position to detect this, perhaps with an
> application-level timeout. Whether then it would suffice to allow the
> receiving app to make a setsockopt() call to force an extra ACK or two
> I'm not sure. Perhaps if the thin-stream had a semi-aggressive
> "heartbeat" going...
>
Heartbeats are the usual solution for gaming. They handle a host of
issues, including detection of clients that have become unreachable.
(No, these are not the same as TCP keep-alives.)
Besides my code in the field and widespread discussion, I know that Paul
Francis had several related papers a decade or so ago. My memory is that
younger game coders weren't particularly avid readers....
> But it does seem that it should be possible to deal with this sort of
> thing without having to make wholesale changes to TCP's RTO policies and
> whatnot?
>
Yep.
William Allen Simpson wrote:
>> Further blue-skying...
>>
>> If SACK were also enabled, it would seem that only loss of the last
>> segment in the "thin train" would be an issue? Presumably, the thin
>> stream receiver would be in a position to detect this, perhaps with an
>> application-level timeout. Whether then it would suffice to allow the
>> receiving app to make a setsockopt() call to force an extra ACK or two
>> I'm not sure. Perhaps if the thin-stream had a semi-aggressive
>> "heartbeat" going...
>>
> Heartbeats are the usual solution for gaming. Handles a host of
> issues, including detection of clients that have become unreachable.
>
> (No, these are not the same as TCP keep-alives.)
>
> Beside my code in the field and widespread discussion, I know that Paul
> Francis had several related papers a decade or so ago. My memory is that
> younger game coders weren't particularly avid readers....
>
>> But it does seem that it should be possible to deal with this sort of
>> thing without having to make wholesale changes to TCP's RTO policies
>> and whatnot?
>>
> Yep.
We recognise the possibility of increasing the aggressiveness of the
application's send rate in order to counteract the effect of thin streams on
retransmission latency. Applications are by nature uninformed about the state
of the layers below. To work around the fast-retransmit latency problems, an
application would have to keep a very aggressive heartbeat rate even when
there is no data to send, thus spamming the network with unneeded traffic.
To exemplify this, let's choose an SSH session from this set of statistics:
http://folk.uio.no/apetlund/lktmp/thin_apps_table.pdf. This thin stream has
an average packet interarrival time of 323ms. The application developer would
have to consider how many "duds" to send in order to ensure a low
retransmission latency. Let's say he considers RTTs lower than 60ms harmless;
he would then need to send more than 4 packets per 60ms. This would mean a
heartbeat rate of one packet every 15ms. Considering this, the aggressively
heartbeated application would send 67 packets per second compared to 3 in
the original stream.
By including thin-stream semantics into the TCP code, informed decisions
can be made to minimise the overhead while still reducing the retransmission
latency.
Best regards,
Andreas