2024-06-14 06:00:45

by luoxuanqiang

[permalink] [raw]
Subject: [PATCH v1 1/1] Fix race for duplicate reqsk on identical SYN

When bonding is configured in BOND_MODE_BROADCAST mode, if two identical SYN packets
are received at the same time and processed on different CPUs, it can potentially
create the same sk (sock) but two different reqsk (request_sock) in tcp_conn_request().

These two different reqsk will respond with two SYNACK packets, and since the generation
of the seq (ISN) incorporates a timestamp, the final two SYNACK packets will have
different seq values.

The consequence is that when the Client receives and replies with an ACK to the earlier
SYNACK packet, we will reset(RST) it.

====================================================================================

This behavior is consistently reproducible in my local setup, which comprises:

| NETA1 ------ NETB1 |
PC_A --- bond --- | | --- bond --- PC_B
| NETA2 ------ NETB2 |

- PC_A is the Server and has two network cards, NETA1 and NETA2. I have bonded these two
cards using BOND_MODE_BROADCAST mode and configured them to be handled by different CPU.

- PC_B is the Client, also equipped with two network cards, NETB1 and NETB2, which are
also bonded and configured in BOND_MODE_BROADCAST mode.

If the client attempts a TCP connection to the server, it might encounter a failure.
Capturing packets from the server side reveals:

10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855116,
localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855123, <==
10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117, <==
localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117,

Two SYNACKs with different seq numbers are sent by localhost, resulting in an anomaly.

====================================================================================

The attempted solution is as follows:
In the tcp_conn_request(), while inserting reqsk into the ehash table, it also checks
if an entry already exists. If found, it avoids reinsertion and releases it.

Simultaneously, In the reqsk_queue_hash_req(), the start of the req->rsk_timer is
adjusted to be after successful insertion.

Signed-off-by: luoxuanqiang <[email protected]>
---
include/net/inet_connection_sock.h | 2 +-
net/dccp/ipv4.c | 2 +-
net/dccp/ipv6.c | 2 +-
net/ipv4/inet_connection_sock.c | 16 ++++++++++++----
net/ipv4/tcp_input.c | 11 ++++++++++-
5 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 7d6b1254c92d..8773d161d184 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -264,7 +264,7 @@ struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
struct request_sock *req,
struct sock *child);
void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
- unsigned long timeout);
+ unsigned long timeout, bool *found_dup_sk);
struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child,
struct request_sock *req,
bool own_req);
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index ff41bd6f99c3..13aafdeb9205 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -657,7 +657,7 @@ int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
if (dccp_v4_send_response(sk, req))
goto drop_and_free;

- inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
+ inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
reqsk_put(req);
return 0;

diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 85f4b8fdbe5e..493cdb12ce2b 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -400,7 +400,7 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
if (dccp_v6_send_response(sk, req))
goto drop_and_free;

- inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
+ inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
reqsk_put(req);
return 0;

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index d81f74ce0f02..d9394db98a5a 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1123,12 +1123,17 @@ static void reqsk_timer_handler(struct timer_list *t)
}

static void reqsk_queue_hash_req(struct request_sock *req,
- unsigned long timeout)
+ unsigned long timeout, bool *found_dup_sk)
{
+
+ inet_ehash_insert(req_to_sk(req), NULL, found_dup_sk);
+ if(found_dup_sk && *found_dup_sk)
+ return;
+
+ /* The timer needs to be setup after a successful insertion. */
timer_setup(&req->rsk_timer, reqsk_timer_handler, TIMER_PINNED);
mod_timer(&req->rsk_timer, jiffies + timeout);

- inet_ehash_insert(req_to_sk(req), NULL, NULL);
/* before letting lookups find us, make sure all req fields
* are committed to memory and refcnt initialized.
*/
@@ -1137,9 +1142,12 @@ static void reqsk_queue_hash_req(struct request_sock *req,
}

void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
- unsigned long timeout)
+ unsigned long timeout, bool *found_dup_sk)
{
- reqsk_queue_hash_req(req, timeout);
+ reqsk_queue_hash_req(req, timeout, found_dup_sk);
+ if(found_dup_sk && *found_dup_sk)
+ return;
+
inet_csk_reqsk_queue_added(sk);
}
EXPORT_SYMBOL_GPL(inet_csk_reqsk_queue_hash_add);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9c04a9c8be9d..467f1b7bbd5a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -7255,8 +7255,17 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
} else {
tcp_rsk(req)->tfo_listener = false;
if (!want_cookie) {
+ bool found_dup_sk = false;
+
req->timeout = tcp_timeout_init((struct sock *)req);
- inet_csk_reqsk_queue_hash_add(sk, req, req->timeout);
+ inet_csk_reqsk_queue_hash_add(sk, req, req->timeout,
+ &found_dup_sk);
+
+ if(unlikely(found_dup_sk)){
+ reqsk_free(req);
+ return 0;
+ }
+
}
af_ops->send_synack(sk, dst, &fl, req, &foc,
!want_cookie ? TCP_SYNACK_NORMAL :
--
2.25.1



2024-06-14 06:50:15

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v1 1/1] Fix race for duplicate reqsk on identical SYN

On Fri, Jun 14, 2024 at 8:01 AM luoxuanqiang <[email protected]> wrote:
>
> When bonding is configured in BOND_MODE_BROADCAST mode, if two identical SYN packets
> are received at the same time and processed on different CPUs, it can potentially
> create the same sk (sock) but two different reqsk (request_sock) in tcp_conn_request().
>
> These two different reqsk will respond with two SYNACK packets, and since the generation
> of the seq (ISN) incorporates a timestamp, the final two SYNACK packets will have
> different seq values.
>
> The consequence is that when the Client receives and replies with an ACK to the earlier
> SYNACK packet, we will reset(RST) it.
>
> ====================================================================================
>
> This behavior is consistently reproducible in my local setup, which comprises:
>
> | NETA1 ------ NETB1 |
> PC_A --- bond --- | | --- bond --- PC_B
> | NETA2 ------ NETB2 |
>
> - PC_A is the Server and has two network cards, NETA1 and NETA2. I have bonded these two
> cards using BOND_MODE_BROADCAST mode and configured them to be handled by different CPU.
>
> - PC_B is the Client, also equipped with two network cards, NETB1 and NETB2, which are
> also bonded and configured in BOND_MODE_BROADCAST mode.
>
> If the client attempts a TCP connection to the server, it might encounter a failure.
> Capturing packets from the server side reveals:
>
> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855116,
> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855123, <==
> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117, <==
> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117,
>
> Two SYNACKs with different seq numbers are sent by localhost, resulting in an anomaly.
>
> ====================================================================================
>
> The attempted solution is as follows:
> In the tcp_conn_request(), while inserting reqsk into the ehash table, it also checks
> if an entry already exists. If found, it avoids reinsertion and releases it.
>
> Simultaneously, In the reqsk_queue_hash_req(), the start of the req->rsk_timer is
> adjusted to be after successful insertion.
>
> Signed-off-by: luoxuanqiang <[email protected]>
> ---
> include/net/inet_connection_sock.h | 2 +-
> net/dccp/ipv4.c | 2 +-
> net/dccp/ipv6.c | 2 +-
> net/ipv4/inet_connection_sock.c | 16 ++++++++++++----
> net/ipv4/tcp_input.c | 11 ++++++++++-
> 5 files changed, 25 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> index 7d6b1254c92d..8773d161d184 100644
> --- a/include/net/inet_connection_sock.h
> +++ b/include/net/inet_connection_sock.h
> @@ -264,7 +264,7 @@ struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
> struct request_sock *req,
> struct sock *child);
> void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
> - unsigned long timeout);
> + unsigned long timeout, bool *found_dup_sk);
> struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child,
> struct request_sock *req,
> bool own_req);
> diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
> index ff41bd6f99c3..13aafdeb9205 100644
> --- a/net/dccp/ipv4.c
> +++ b/net/dccp/ipv4.c
> @@ -657,7 +657,7 @@ int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> if (dccp_v4_send_response(sk, req))
> goto drop_and_free;
>
> - inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
> + inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
> reqsk_put(req);
> return 0;
>
> diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
> index 85f4b8fdbe5e..493cdb12ce2b 100644
> --- a/net/dccp/ipv6.c
> +++ b/net/dccp/ipv6.c
> @@ -400,7 +400,7 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
> if (dccp_v6_send_response(sk, req))
> goto drop_and_free;
>
> - inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
> + inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
> reqsk_put(req);
> return 0;
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index d81f74ce0f02..d9394db98a5a 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -1123,12 +1123,17 @@ static void reqsk_timer_handler(struct timer_list *t)
> }
>
> static void reqsk_queue_hash_req(struct request_sock *req,
> - unsigned long timeout)
> + unsigned long timeout, bool *found_dup_sk)
> {
> +
> + inet_ehash_insert(req_to_sk(req), NULL, found_dup_sk);
> + if(found_dup_sk && *found_dup_sk)
> + return;
> +
> + /* The timer needs to be setup after a successful insertion. */

I am pretty sure we had a prior attempt to fix this issue, and the fix
was problematic.

You are moving the inet_ehash_insert() before the mod_timer(), this
will add races.

Hint here is the use of TIMER_PINNED.

CCing Florian, because he just removed TIMER_PINNED for TW, he might
have the context
to properly fix this issue.

> timer_setup(&req->rsk_timer, reqsk_timer_handler, TIMER_PINNED);
> mod_timer(&req->rsk_timer, jiffies + timeout);
>
> - inet_ehash_insert(req_to_sk(req), NULL, NULL);
> /* before letting lookups find us, make sure all req fields
> * are committed to memory and refcnt initialized.
> */
> @@ -1137,9 +1142,12 @@ static void reqsk_queue_hash_req(struct request_sock *req,
> }
>
> void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
> - unsigned long timeout)
> + unsigned long timeout, bool *found_dup_sk)
> {
> - reqsk_queue_hash_req(req, timeout);
> + reqsk_queue_hash_req(req, timeout, found_dup_sk);
> + if(found_dup_sk && *found_dup_sk)
> + return;
> +
> inet_csk_reqsk_queue_added(sk);
> }
> EXPORT_SYMBOL_GPL(inet_csk_reqsk_queue_hash_add);
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 9c04a9c8be9d..467f1b7bbd5a 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -7255,8 +7255,17 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
> } else {
> tcp_rsk(req)->tfo_listener = false;
> if (!want_cookie) {
> + bool found_dup_sk = false;
> +
> req->timeout = tcp_timeout_init((struct sock *)req);
> - inet_csk_reqsk_queue_hash_add(sk, req, req->timeout);
> + inet_csk_reqsk_queue_hash_add(sk, req, req->timeout,
> + &found_dup_sk);
> +
> + if(unlikely(found_dup_sk)){
> + reqsk_free(req);
> + return 0;
> + }
> +
> }
> af_ops->send_synack(sk, dst, &fl, req, &foc,
> !want_cookie ? TCP_SYNACK_NORMAL :
> --
> 2.25.1
>

2024-06-14 08:17:46

by luoxuanqiang

[permalink] [raw]
Subject: Re: [PATCH v1 1/1] Fix race for duplicate reqsk on identical SYN

On Fri, Jun 14, 2024 at 8:01 AM luoxuanqiang <[email protected]> wrote:
>>
>> When bonding is configured in BOND_MODE_BROADCAST mode, if two identical SYN packets
>> are received at the same time and processed on different CPUs, it can potentially
>> create the same sk (sock) but two different reqsk (request_sock) in tcp_conn_request().
>>
>> These two different reqsk will respond with two SYNACK packets, and since the generation
>> of the seq (ISN) incorporates a timestamp, the final two SYNACK packets will have
>> different seq values.
>>
>> The consequence is that when the Client receives and replies with an ACK to the earlier
>> SYNACK packet, we will reset(RST) it.
>>
>> ====================================================================================
>>
>> This behavior is consistently reproducible in my local setup, which comprises:
>>
>> | NETA1 ------ NETB1 |
>> PC_A --- bond --- | | --- bond --- PC_B
>> | NETA2 ------ NETB2 |
>>
>> - PC_A is the Server and has two network cards, NETA1 and NETA2. I have bonded these two
>> cards using BOND_MODE_BROADCAST mode and configured them to be handled by different CPU.
>>
>> - PC_B is the Client, also equipped with two network cards, NETB1 and NETB2, which are
>> also bonded and configured in BOND_MODE_BROADCAST mode.
>>
>> If the client attempts a TCP connection to the server, it might encounter a failure.
>> Capturing packets from the server side reveals:
>>
>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855116,
>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855123, <==
>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117, <==
>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117,
>>
>> Two SYNACKs with different seq numbers are sent by localhost, resulting in an anomaly.
>>
>> ====================================================================================
>>
>> The attempted solution is as follows:
>> In the tcp_conn_request(), while inserting reqsk into the ehash table, it also checks
>> if an entry already exists. If found, it avoids reinsertion and releases it.
>>
>> Simultaneously, In the reqsk_queue_hash_req(), the start of the req->rsk_timer is
>> adjusted to be after successful insertion.
>>
>> Signed-off-by: luoxuanqiang <[email protected]>
>> ---
>> include/net/inet_connection_sock.h | 2 +-
>> net/dccp/ipv4.c | 2 +-
>> net/dccp/ipv6.c | 2 +-
>> net/ipv4/inet_connection_sock.c | 16 ++++++++++++----
>> net/ipv4/tcp_input.c | 11 ++++++++++-
>> 5 files changed, 25 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
>> index 7d6b1254c92d..8773d161d184 100644
>> --- a/include/net/inet_connection_sock.h
>> +++ b/include/net/inet_connection_sock.h
>> @@ -264,7 +264,7 @@ struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
>> struct request_sock *req,
>> struct sock *child);
>> void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
>> - unsigned long timeout);
>> + unsigned long timeout, bool *found_dup_sk);
>> struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child,
>> struct request_sock *req,
>> bool own_req);
>> diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
>> index ff41bd6f99c3..13aafdeb9205 100644
>> --- a/net/dccp/ipv4.c
>> +++ b/net/dccp/ipv4.c
>> @@ -657,7 +657,7 @@ int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>> if (dccp_v4_send_response(sk, req))
>> goto drop_and_free;
>>
>> - inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
>> + inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
>> reqsk_put(req);
>> return 0;
>>
>> diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
>> index 85f4b8fdbe5e..493cdb12ce2b 100644
>> --- a/net/dccp/ipv6.c
>> +++ b/net/dccp/ipv6.c
>> @@ -400,7 +400,7 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
>> if (dccp_v6_send_response(sk, req))
>> goto drop_and_free;
>>
>> - inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
>> + inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
>> reqsk_put(req);
>> return 0;
>>
>> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
>> index d81f74ce0f02..d9394db98a5a 100644
>> --- a/net/ipv4/inet_connection_sock.c
>> +++ b/net/ipv4/inet_connection_sock.c
>> @@ -1123,12 +1123,17 @@ static void reqsk_timer_handler(struct timer_list *t)
>> }
>>
>> static void reqsk_queue_hash_req(struct request_sock *req,
>> - unsigned long timeout)
>> + unsigned long timeout, bool *found_dup_sk)
>> {
>> +
>> + inet_ehash_insert(req_to_sk(req), NULL, found_dup_sk);
>> + if(found_dup_sk && *found_dup_sk)
>> + return;
>> +
>> + /* The timer needs to be setup after a successful insertion. */
>
>I am pretty sure we had a prior attempt to fix this issue, and the fix
>was problematic.
>
>You are moving the inet_ehash_insert() before the mod_timer(), this
>will add races.
Could you kindly explain what "races" refer to here? Thank you!

>
>Hint here is the use of TIMER_PINNED.
>
>CCing Florian, because he just removed TIMER_PINNED for TW, he might
>have the context
>to properly fix this issue.
>
>> timer_setup(&req->rsk_timer, reqsk_timer_handler, TIMER_PINNED);
>> mod_timer(&req->rsk_timer, jiffies + timeout);
>>
>> - inet_ehash_insert(req_to_sk(req), NULL, NULL);
>> /* before letting lookups find us, make sure all req fields
>> * are committed to memory and refcnt initialized.
>> */
>> @@ -1137,9 +1142,12 @@ static void reqsk_queue_hash_req(struct request_sock *req,
>> }
>>
>> void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
>> - unsigned long timeout)
>> + unsigned long timeout, bool *found_dup_sk)
>> {
>> - reqsk_queue_hash_req(req, timeout);
>> + reqsk_queue_hash_req(req, timeout, found_dup_sk);
>> + if(found_dup_sk && *found_dup_sk)
>> + return;
>> +
>> inet_csk_reqsk_queue_added(sk);
>> }
>> EXPORT_SYMBOL_GPL(inet_csk_reqsk_queue_hash_add);
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index 9c04a9c8be9d..467f1b7bbd5a 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -7255,8 +7255,17 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>> } else {
>> tcp_rsk(req)->tfo_listener = false;
>> if (!want_cookie) {
>> + bool found_dup_sk = false;
>> +
>> req->timeout = tcp_timeout_init((struct sock *)req);
>> - inet_csk_reqsk_queue_hash_add(sk, req, req->timeout);
>> + inet_csk_reqsk_queue_hash_add(sk, req, req->timeout,
>> + &found_dup_sk);
>> +
>> + if(unlikely(found_dup_sk)){
>> + reqsk_free(req);
>> + return 0;
>> + }
>> +
>> }
>> af_ops->send_synack(sk, dst, &fl, req, &foc,
>> !want_cookie ? TCP_SYNACK_NORMAL :
>> --
>> 2.25.1
>>

2024-06-14 09:42:11

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v1 1/1] Fix race for duplicate reqsk on identical SYN

On Fri, Jun 14, 2024 at 10:59 AM luoxuanqiang <[email protected]> wrote:
>
> On Fri, Jun 14, 2024 at 8:01 AM luoxuanqiang <[email protected]> wrote:
> >>
> >> When bonding is configured in BOND_MODE_BROADCAST mode, if two identical SYN packets
> >> are received at the same time and processed on different CPUs, it can potentially
> >> create the same sk (sock) but two different reqsk (request_sock) in tcp_conn_request().
> >>
> >> These two different reqsk will respond with two SYNACK packets, and since the generation
> >> of the seq (ISN) incorporates a timestamp, the final two SYNACK packets will have
> >> different seq values.
> >>
> >> The consequence is that when the Client receives and replies with an ACK to the earlier
> >> SYNACK packet, we will reset(RST) it.
> >>
> >> ====================================================================================
> >>
> >> This behavior is consistently reproducible in my local setup, which comprises:
> >>
> >> | NETA1 ------ NETB1 |
> >> PC_A --- bond --- | | --- bond --- PC_B
> >> | NETA2 ------ NETB2 |
> >>
> >> - PC_A is the Server and has two network cards, NETA1 and NETA2. I have bonded these two
> >> cards using BOND_MODE_BROADCAST mode and configured them to be handled by different CPU.
> >>
> >> - PC_B is the Client, also equipped with two network cards, NETB1 and NETB2, which are
> >> also bonded and configured in BOND_MODE_BROADCAST mode.
> >>
> >> If the client attempts a TCP connection to the server, it might encounter a failure.
> >> Capturing packets from the server side reveals:
> >>
> >> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
> >> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
> >> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855116,
> >> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855123, <==
> >> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
> >> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
> >> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117, <==
> >> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117,
> >>
> >> Two SYNACKs with different seq numbers are sent by localhost, resulting in an anomaly.
> >>
> >> ====================================================================================
> >>
> >> The attempted solution is as follows:
> >> In the tcp_conn_request(), while inserting reqsk into the ehash table, it also checks
> >> if an entry already exists. If found, it avoids reinsertion and releases it.
> >>
> >> Simultaneously, In the reqsk_queue_hash_req(), the start of the req->rsk_timer is
> >> adjusted to be after successful insertion.
> >>
> >> Signed-off-by: luoxuanqiang <[email protected]>
> >> ---
> >> include/net/inet_connection_sock.h | 2 +-
> >> net/dccp/ipv4.c | 2 +-
> >> net/dccp/ipv6.c | 2 +-
> >> net/ipv4/inet_connection_sock.c | 16 ++++++++++++----
> >> net/ipv4/tcp_input.c | 11 ++++++++++-
> >> 5 files changed, 25 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> >> index 7d6b1254c92d..8773d161d184 100644
> >> --- a/include/net/inet_connection_sock.h
> >> +++ b/include/net/inet_connection_sock.h
> >> @@ -264,7 +264,7 @@ struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
> >> struct request_sock *req,
> >> struct sock *child);
> >> void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
> >> - unsigned long timeout);
> >> + unsigned long timeout, bool *found_dup_sk);
> >> struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child,
> >> struct request_sock *req,
> >> bool own_req);
> >> diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
> >> index ff41bd6f99c3..13aafdeb9205 100644
> >> --- a/net/dccp/ipv4.c
> >> +++ b/net/dccp/ipv4.c
> >> @@ -657,7 +657,7 @@ int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> >> if (dccp_v4_send_response(sk, req))
> >> goto drop_and_free;
> >>
> >> - inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
> >> + inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
> >> reqsk_put(req);
> >> return 0;
> >>
> >> diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
> >> index 85f4b8fdbe5e..493cdb12ce2b 100644
> >> --- a/net/dccp/ipv6.c
> >> +++ b/net/dccp/ipv6.c
> >> @@ -400,7 +400,7 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
> >> if (dccp_v6_send_response(sk, req))
> >> goto drop_and_free;
> >>
> >> - inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
> >> + inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
> >> reqsk_put(req);
> >> return 0;
> >>
> >> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> >> index d81f74ce0f02..d9394db98a5a 100644
> >> --- a/net/ipv4/inet_connection_sock.c
> >> +++ b/net/ipv4/inet_connection_sock.c
> >> @@ -1123,12 +1123,17 @@ static void reqsk_timer_handler(struct timer_list *t)
> >> }
> >>
> >> static void reqsk_queue_hash_req(struct request_sock *req,
> >> - unsigned long timeout)
> >> + unsigned long timeout, bool *found_dup_sk)
> >> {
> >> +
> >> + inet_ehash_insert(req_to_sk(req), NULL, found_dup_sk);
> >> + if(found_dup_sk && *found_dup_sk)
> >> + return;
> >> +
> >> + /* The timer needs to be setup after a successful insertion. */
> >
> >I am pretty sure we had a prior attempt to fix this issue, and the fix
> >was problematic.
> >
> >You are moving the inet_ehash_insert() before the mod_timer(), this
> >will add races.
> Could you kindly explain what "races" refer to here? Thank you!


Hmmm... maybe this is ok. Please respin your patch after fixing
checkpatch issues, and add a 'net' tag

( See https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
for all warnings / fails)

Please CC Kuniyuki Iwashima <[email protected]> because I will be OOO
for about 4 days.

2024-06-14 10:35:17

by luoxuanqiang

[permalink] [raw]
Subject: Re: [PATCH v1 1/1] Fix race for duplicate reqsk on identical SYN


在 2024/6/14 17:41, Eric Dumazet 写道:
> On Fri, Jun 14, 2024 at 10:59 AM luoxuanqiang <[email protected]> wrote:
>> On Fri, Jun 14, 2024 at 8:01 AM luoxuanqiang <[email protected]> wrote:
>>>> When bonding is configured in BOND_MODE_BROADCAST mode, if two identical SYN packets
>>>> are received at the same time and processed on different CPUs, it can potentially
>>>> create the same sk (sock) but two different reqsk (request_sock) in tcp_conn_request().
>>>>
>>>> These two different reqsk will respond with two SYNACK packets, and since the generation
>>>> of the seq (ISN) incorporates a timestamp, the final two SYNACK packets will have
>>>> different seq values.
>>>>
>>>> The consequence is that when the Client receives and replies with an ACK to the earlier
>>>> SYNACK packet, we will reset(RST) it.
>>>>
>>>> ====================================================================================
>>>>
>>>> This behavior is consistently reproducible in my local setup, which comprises:
>>>>
>>>> | NETA1 ------ NETB1 |
>>>> PC_A --- bond --- | | --- bond --- PC_B
>>>> | NETA2 ------ NETB2 |
>>>>
>>>> - PC_A is the Server and has two network cards, NETA1 and NETA2. I have bonded these two
>>>> cards using BOND_MODE_BROADCAST mode and configured them to be handled by different CPU.
>>>>
>>>> - PC_B is the Client, also equipped with two network cards, NETB1 and NETB2, which are
>>>> also bonded and configured in BOND_MODE_BROADCAST mode.
>>>>
>>>> If the client attempts a TCP connection to the server, it might encounter a failure.
>>>> Capturing packets from the server side reveals:
>>>>
>>>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
>>>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
>>>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855116,
>>>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855123, <==
>>>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
>>>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
>>>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117, <==
>>>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117,
>>>>
>>>> Two SYNACKs with different seq numbers are sent by localhost, resulting in an anomaly.
>>>>
>>>> ====================================================================================
>>>>
>>>> The attempted solution is as follows:
>>>> In the tcp_conn_request(), while inserting reqsk into the ehash table, it also checks
>>>> if an entry already exists. If found, it avoids reinsertion and releases it.
>>>>
>>>> Simultaneously, In the reqsk_queue_hash_req(), the start of the req->rsk_timer is
>>>> adjusted to be after successful insertion.
>>>>
>>>> Signed-off-by: luoxuanqiang <[email protected]>
>>>> ---
>>>> include/net/inet_connection_sock.h | 2 +-
>>>> net/dccp/ipv4.c | 2 +-
>>>> net/dccp/ipv6.c | 2 +-
>>>> net/ipv4/inet_connection_sock.c | 16 ++++++++++++----
>>>> net/ipv4/tcp_input.c | 11 ++++++++++-
>>>> 5 files changed, 25 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
>>>> index 7d6b1254c92d..8773d161d184 100644
>>>> --- a/include/net/inet_connection_sock.h
>>>> +++ b/include/net/inet_connection_sock.h
>>>> @@ -264,7 +264,7 @@ struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
>>>> struct request_sock *req,
>>>> struct sock *child);
>>>> void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
>>>> - unsigned long timeout);
>>>> + unsigned long timeout, bool *found_dup_sk);
>>>> struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child,
>>>> struct request_sock *req,
>>>> bool own_req);
>>>> diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
>>>> index ff41bd6f99c3..13aafdeb9205 100644
>>>> --- a/net/dccp/ipv4.c
>>>> +++ b/net/dccp/ipv4.c
>>>> @@ -657,7 +657,7 @@ int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>>>> if (dccp_v4_send_response(sk, req))
>>>> goto drop_and_free;
>>>>
>>>> - inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
>>>> + inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
>>>> reqsk_put(req);
>>>> return 0;
>>>>
>>>> diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
>>>> index 85f4b8fdbe5e..493cdb12ce2b 100644
>>>> --- a/net/dccp/ipv6.c
>>>> +++ b/net/dccp/ipv6.c
>>>> @@ -400,7 +400,7 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
>>>> if (dccp_v6_send_response(sk, req))
>>>> goto drop_and_free;
>>>>
>>>> - inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
>>>> + inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
>>>> reqsk_put(req);
>>>> return 0;
>>>>
>>>> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
>>>> index d81f74ce0f02..d9394db98a5a 100644
>>>> --- a/net/ipv4/inet_connection_sock.c
>>>> +++ b/net/ipv4/inet_connection_sock.c
>>>> @@ -1123,12 +1123,17 @@ static void reqsk_timer_handler(struct timer_list *t)
>>>> }
>>>>
>>>> static void reqsk_queue_hash_req(struct request_sock *req,
>>>> - unsigned long timeout)
>>>> + unsigned long timeout, bool *found_dup_sk)
>>>> {
>>>> +
>>>> + inet_ehash_insert(req_to_sk(req), NULL, found_dup_sk);
>>>> + if(found_dup_sk && *found_dup_sk)
>>>> + return;
>>>> +
>>>> + /* The timer needs to be setup after a successful insertion. */
>>> I am pretty sure we had a prior attempt to fix this issue, and the fix
>>> was problematic.
>>>
>>> You are moving the inet_ehash_insert() before the mod_timer(), this
>>> will add races.
>> Could you kindly explain what "races" refer to here? Thank you!
>
> Hmmm... maybe this is ok. Please respin your patch after fixing
> checkpatch issues, and add a 'net' tag
>
> ( See https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
> for all warnings / fails)
>
> Please CC Kuniyuki Iwashima <[email protected]> because I will be OOO
> for about 4 days.

The formatting error has been resolved and the V2 version has been sent. Enjoy your holiday! Thank you!