2020-03-19 06:39:49

by Pengcheng Yang

Subject: [PATCH RFC net-next] tcp: make cwnd-limited not affected by tcp internal pacing

Currently, cwnd-limited is set when the cwnd is fully used
(inflight >= cwnd), which allows the congestion control algorithm
to accurately determine whether the cwnd needs to be increased.

However, this can break down when TCP internal pacing is in use.
In the congestion avoidance phase, when a burst of packets is
acked by a stretched ACK or by a burst of ACKs, inflight drops
sharply within a short time. The sender, which then sends data at
the pacing rate, cannot fill the cwnd, so cwnd-limited is not set.
In the worst case, cwnd-limited is set only after the last packet
in a window is sent. This makes the congestion control algorithm
too conservative in increasing the cwnd.

The idea is that once cwnd-limited is set, it is kept for one window
period (until snd_una passes the snd_nxt recorded at that time), during
which the cwnd is considered limited. This keeps the congestion control
algorithm unaffected by TCP internal pacing.

Signed-off-by: Pengcheng Yang <[email protected]>
---
include/linux/tcp.h | 2 +-
net/ipv4/tcp_output.c | 14 ++++++++------
2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 3dc9640..3b3329f 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -286,7 +286,7 @@ struct tcp_sock {
 	u32 packets_out; /* Packets which are "in flight" */
 	u32 retrans_out; /* Retransmitted packets out */
 	u32 max_packets_out; /* max packets_out in last window */
-	u32 max_packets_seq; /* right edge of max_packets_out flight */
+	u32 cwnd_limited_seq; /* snd_nxt at cwnd limited */

 	u16 urg_data; /* Saved octet of OOB data and control flags */
 	u8 ecn_flags; /* ECN status bits. */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 306e25d..31dd6dc 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1705,14 +1705,16 @@ static void tcp_cwnd_validate(struct sock *sk, bool is_cwnd_limited)
 	const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
 	struct tcp_sock *tp = tcp_sk(sk);

-	/* Track the maximum number of outstanding packets in each
-	 * window, and remember whether we were cwnd-limited then.
+	/* Remember whether we were cwnd-limited in last window,
+	 * and track the maximum number of outstanding packets in each window.
 	 */
-	if (!before(tp->snd_una, tp->max_packets_seq) ||
-	    tp->packets_out > tp->max_packets_out) {
-		tp->max_packets_out = tp->packets_out;
-		tp->max_packets_seq = tp->snd_nxt;
+	if (is_cwnd_limited ||
+	    !before(tp->snd_una, tp->cwnd_limited_seq)) {
 		tp->is_cwnd_limited = is_cwnd_limited;
+		tp->cwnd_limited_seq = tp->snd_nxt;
+		tp->max_packets_out = tp->packets_out;
+	} else if (tp->packets_out > tp->max_packets_out) {
+		tp->max_packets_out = tp->packets_out;
 	}

if (tcp_is_cwnd_limited(sk)) {
--
1.8.3.1


2020-03-19 18:52:31

by Neal Cardwell

Subject: Re: [PATCH RFC net-next] tcp: make cwnd-limited not affected by tcp internal pacing

On Thu, Mar 19, 2020 at 2:33 AM Pengcheng Yang <[email protected]> wrote:

Thanks for sending this patch! Our team recently ran into this bug
as well, and we have been iterating on patches to fix it.

I think the approach in this patch does not properly persist
max_packets_out until all the ACKs for a flight of data have been
received. As a consequence, cwnd would not grow properly in slow start
in cases where max_packets_out is high enough to merit growing cwnd,
but the connection is not strictly cwnd-limited.

I'm a bit busy this week but I will try to put together and send out a
proposed patch ASAP.

best,
neal