When CONFIG_HZ defaults to 1000Hz and the network transmission time is
less than 1ms, lsndtime and lrcvtime are likely to be equal, which will
lead to hundreds of interactions before entering pingpong mode.

Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3")
Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: LemmyHuang <[email protected]>
---
v2:
* Use !after() wrapping the values. (Jakub Kicinski)

v1: https://lore.kernel.org/netdev/[email protected]/
---
 net/ipv4/tcp_output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 858a15cc2..c1c95dc40 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -172,7 +172,7 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
 	 * and it is a reply for ato after last received packet,
 	 * increase pingpong count.
 	 */
-	if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
+	if (!after(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
 	    (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
 		inet_csk_inc_pingpong_cnt(sk);
--
2.27.0
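
For readers following along, here is a minimal userspace sketch of the
comparison the patch changes. It assumes the usual wraparound-safe
definition of before()/after() (the 32-bit difference interpreted as a
signed value); ts_before(), ts_after(), check_v0() and check_v2() are
hypothetical names used only for illustration, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is earlier than b": look at the sign of the
 * 32-bit difference instead of comparing the raw values. */
static bool ts_before(uint32_t a, uint32_t b)
{
	return (int32_t)(uint32_t)(a - b) < 0;
}

/* "a is later than b" is the same test with the arguments swapped. */
static bool ts_after(uint32_t a, uint32_t b)
{
	return ts_before(b, a);
}

/* Timestamp half of the condition before the patch: strictly earlier. */
static bool check_v0(uint32_t lsndtime, uint32_t lrcvtime)
{
	return ts_before(lsndtime, lrcvtime);
}

/* Timestamp half of the condition with the patch: earlier or equal. */
static bool check_v2(uint32_t lsndtime, uint32_t lrcvtime)
{
	return !ts_after(lsndtime, lrcvtime);
}

/* Self-check covering the equal-jiffy case and a wrapped counter. */
static void check_comparisons(void)
{
	/* Same jiffy: the old test fails, the patched test passes. */
	assert(!check_v0(1000u, 1000u));
	assert(check_v2(1000u, 1000u));

	/* The send timestamp precedes a receive timestamp that has wrapped. */
	assert(check_v0(0xFFFFFFF0u, 0x10u));
	assert(check_v2(0xFFFFFFF0u, 0x10u));

	/* The send is later than the receive: both tests fail. */
	assert(!check_v0(1001u, 1000u));
	assert(!check_v2(1001u, 1000u));
}
```

The only input on which the two predicates differ is
lsndtime == lrcvtime, which is the sub-jiffy case described in the
commit message.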
On Wed, Jul 20, 2022 at 3:25 AM LemmyHuang <[email protected]> wrote:
>
> When CONFIG_HZ defaults to 1000Hz and the network transmission time is
> less than 1ms, lsndtime and lrcvtime are likely to be equal, which will
> lead to hundreds of interactions before entering pingpong mode.
>
> Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3")
> Suggested-by: Jakub Kicinski <[email protected]>
> Signed-off-by: LemmyHuang <[email protected]>
> ---
> v2:
> * Use !after() wrapping the values. (Jakub Kicinski)
>
> v1: https://lore.kernel.org/netdev/[email protected]/
> ---
> net/ipv4/tcp_output.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 858a15cc2..c1c95dc40 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -172,7 +172,7 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
> * and it is a reply for ato after last received packet,
> * increase pingpong count.
> */
> - if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
> + if (!after(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
> (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
> inet_csk_inc_pingpong_cnt(sk);
>
> --
Thanks for pointing out this problem!

AFAICT this patch would result in incorrect behavior.

With this patch, we could have cases where tp->lsndtime ==
icsk->icsk_ack.lrcvtime and (u32)(now - icsk->icsk_ack.lrcvtime) <
icsk->icsk_ack.ato and yet we do not really have a ping-pong exchange.

For example, with this patch we could have:

T1: jiffies=J1; host B receives RPC request from host A
T2: jiffies=J1; host B sends first RPC response data packet to host A;
    -> calls inet_csk_inc_pingpong_cnt()
T3: jiffies=J1; host B sends second RPC response data packet to host A;
    -> calls inet_csk_inc_pingpong_cnt()

In this scenario there is only one ping-pong exchange but the code
calls inet_csk_inc_pingpong_cnt() twice.

So I'm hoping we can come up with a better fix.

A simpler approach might be to simplify the model and go back to
having a single ping-pong interaction cause delayed ACKs to be enabled
on a connection endpoint. Our team has been seeing good results for a
while with the simpler approach. What do folks think?

neal
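
The T1..T3 scenario above, and the same-jiffy case from the original
report, can be reproduced with a small userspace model. This is only a
sketch under stated assumptions: struct conn_model and the model_*()
functions are hypothetical stand-ins for the relevant fields and for
the check in tcp_event_data_sent(), the jiffies values and ato = 40 are
arbitrary, and lsndtime is assumed to be updated to "now" after the
check on every send.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the fields the check reads and updates. */
struct conn_model {
	uint32_t lsndtime;		/* jiffies of the last data send */
	uint32_t lrcvtime;		/* jiffies of the last data receive */
	uint32_t ato;			/* delayed-ACK timeout, in jiffies */
	unsigned int pingpong_cnt;	/* counted ping-pong exchanges */
};

/* Wraparound-safe "a is earlier than b". */
static bool ts_before(uint32_t a, uint32_t b)
{
	return (int32_t)(uint32_t)(a - b) < 0;
}

static void model_data_received(struct conn_model *c, uint32_t now)
{
	c->lrcvtime = now;
}

/* patched == false: before(lsndtime, lrcvtime)
 * patched == true:  !after(lsndtime, lrcvtime) */
static void model_data_sent(struct conn_model *c, uint32_t now, bool patched)
{
	bool ordered = patched ? !ts_before(c->lrcvtime, c->lsndtime)
			       : ts_before(c->lsndtime, c->lrcvtime);

	if (ordered && (uint32_t)(now - c->lrcvtime) < c->ato)
		c->pingpong_cnt++;
	c->lsndtime = now;	/* assumed: updated after the check */
}

/* T1..T3: one request answered by two data packets in the same jiffy. */
static unsigned int model_one_request_two_packets(bool patched)
{
	struct conn_model c = { .lsndtime = 990, .lrcvtime = 0, .ato = 40 };

	model_data_received(&c, 1000);		/* T1 */
	model_data_sent(&c, 1000, patched);	/* T2 */
	model_data_sent(&c, 1000, patched);	/* T3 */
	return c.pingpong_cnt;
}

/* Original report: request/response rounds that all fall within the
 * same jiffy as the previous send. */
static unsigned int model_same_jiffy_rounds(bool patched, int rounds)
{
	struct conn_model c = { .lsndtime = 1000, .lrcvtime = 0, .ato = 40 };

	for (int i = 0; i < rounds; i++) {
		model_data_received(&c, 1000);
		model_data_sent(&c, 1000, patched);
	}
	return c.pingpong_cnt;
}

/* Self-check of the two observations discussed in the thread. */
static void check_models(void)
{
	assert(model_one_request_two_packets(false) == 1);
	assert(model_one_request_two_packets(true) == 2);
	assert(model_same_jiffy_rounds(false, 3) == 0);
	assert(model_same_jiffy_rounds(true, 3) == 3);
}
```

In this model the unpatched check never counts same-jiffy exchanges,
while the patched check counts one exchange twice when the response
spans two data packets, which matches the two concerns raised in the
thread.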
At 2022-07-21 02:49:35, "Neal Cardwell" <[email protected]> wrote:
>
> Thanks for pointing out this problem!
>
> AFAICT this patch would result in incorrect behavior.
>
> With this patch, we could have cases where tp->lsndtime ==
> icsk->icsk_ack.lrcvtime and (u32)(now - icsk->icsk_ack.lrcvtime) <
> icsk->icsk_ack.ato and yet we do not really have a ping-pong exchange.
>
> For example, with this patch we could have:
>
> T1: jiffies=J1; host B receives RPC request from host A
> T2: jiffies=J1; host B sends first RPC response data packet to host A;
> -> calls inet_csk_inc_pingpong_cnt()
> T3: jiffies=J1; host B sends second RPC response data packet to host A;
> -> calls inet_csk_inc_pingpong_cnt()
>
> In this scenario there is only one ping-pong exchange but the code
> calls inet_csk_inc_pingpong_cnt() twice.
>
> So I'm hoping we can come up with a better fix.
>
> A simpler approach might be to simplify the model and go back to
> having a single ping-pong interaction cause delayed ACKs to be enabled
> on a connection endpoint. Our team has been seeing good results for a
> while with the simpler approach. What do folks think?
>
>
> neal
It seems better to go back to the simpler model.
See this revert patch:
https://lore.kernel.org/netdev/[email protected]/