2024-06-14 10:34:48

by luoxuanqiang

[permalink] [raw]
Subject: [PATCH net v2] Fix race for duplicate reqsk on identical SYN

When bonding is configured in BOND_MODE_BROADCAST mode, if two identical
SYN packets are received at the same time and processed on different CPUs,
it can potentially create the same sk (sock) but two different reqsk
(request_sock) in tcp_conn_request().

These two different reqsk will respond with two SYNACK packets, and since
the generation of the seq (ISN) incorporates a timestamp, the final two
SYNACK packets will have different seq values.

The consequence is that when the Client receives and replies with an ACK
to the earlier SYNACK packet, we will reset(RST) it.

========================================================================

This behavior is consistently reproducible in my local setup,
which comprises:

| NETA1 ------ NETB1 |
PC_A --- bond --- | | --- bond --- PC_B
| NETA2 ------ NETB2 |

- PC_A is the Server and has two network cards, NETA1 and NETA2. I have
bonded these two cards using BOND_MODE_BROADCAST mode and configured
them to be handled by different CPU.

- PC_B is the Client, also equipped with two network cards, NETB1 and
NETB2, which are also bonded and configured in BOND_MODE_BROADCAST mode.

If the client attempts a TCP connection to the server, it might encounter
a failure. Capturing packets from the server side reveals:

10.10.10.10.45182 > localhost: Flags [S], seq 320236027,
10.10.10.10.45182 > localhost: Flags [S], seq 320236027,
localhost > 10.10.10.10.45182: Flags [S.], seq 2967855116,
localhost > 10.10.10.10.45182: Flags [S.], seq 2967855123, <==
10.10.10.10.45182 > localhost: Flags [.], ack 4294967290,
10.10.10.10.45182 > localhost: Flags [.], ack 4294967290,
localhost > 10.10.10.10.45182: Flags [R], seq 2967855117, <==
localhost > 10.10.10.10.45182: Flags [R], seq 2967855117,

Two SYNACKs with different seq numbers are sent by localhost,
resulting in an anomaly.

========================================================================

The attempted solution is as follows:
In the tcp_conn_request(), while inserting reqsk into the ehash table,
it also checks if an entry already exists. If found, it avoids
reinsertion and releases it.

Simultaneously, In the reqsk_queue_hash_req(), the start of the
req->rsk_timer is adjusted to be after successful insertion.

Signed-off-by: luoxuanqiang <[email protected]>
---
include/net/inet_connection_sock.h | 2 +-
net/dccp/ipv4.c | 2 +-
net/dccp/ipv6.c | 2 +-
net/ipv4/inet_connection_sock.c | 15 +++++++++++----
net/ipv4/tcp_input.c | 11 ++++++++++-
5 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 7d6b1254c92d..8773d161d184 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -264,7 +264,7 @@ struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
struct request_sock *req,
struct sock *child);
void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
- unsigned long timeout);
+ unsigned long timeout, bool *found_dup_sk);
struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child,
struct request_sock *req,
bool own_req);
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index ff41bd6f99c3..13aafdeb9205 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -657,7 +657,7 @@ int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
if (dccp_v4_send_response(sk, req))
goto drop_and_free;

- inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
+ inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
reqsk_put(req);
return 0;

diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 85f4b8fdbe5e..493cdb12ce2b 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -400,7 +400,7 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
if (dccp_v6_send_response(sk, req))
goto drop_and_free;

- inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
+ inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
reqsk_put(req);
return 0;

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index d81f74ce0f02..045d0701acfd 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1123,12 +1123,16 @@ static void reqsk_timer_handler(struct timer_list *t)
}

static void reqsk_queue_hash_req(struct request_sock *req,
- unsigned long timeout)
+ unsigned long timeout, bool *found_dup_sk)
{
+ inet_ehash_insert(req_to_sk(req), NULL, found_dup_sk);
+ if (found_dup_sk && *found_dup_sk)
+ return;
+
+ /* The timer needs to be setup after a successful insertion. */
timer_setup(&req->rsk_timer, reqsk_timer_handler, TIMER_PINNED);
mod_timer(&req->rsk_timer, jiffies + timeout);

- inet_ehash_insert(req_to_sk(req), NULL, NULL);
/* before letting lookups find us, make sure all req fields
* are committed to memory and refcnt initialized.
*/
@@ -1137,9 +1141,12 @@ static void reqsk_queue_hash_req(struct request_sock *req,
}

void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
- unsigned long timeout)
+ unsigned long timeout, bool *found_dup_sk)
{
- reqsk_queue_hash_req(req, timeout);
+ reqsk_queue_hash_req(req, timeout, found_dup_sk);
+ if (found_dup_sk && *found_dup_sk)
+ return;
+
inet_csk_reqsk_queue_added(sk);
}
EXPORT_SYMBOL_GPL(inet_csk_reqsk_queue_hash_add);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9c04a9c8be9d..49876477c2b9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -7255,8 +7255,17 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
} else {
tcp_rsk(req)->tfo_listener = false;
if (!want_cookie) {
+ bool found_dup_sk = false;
+
req->timeout = tcp_timeout_init((struct sock *)req);
- inet_csk_reqsk_queue_hash_add(sk, req, req->timeout);
+ inet_csk_reqsk_queue_hash_add(sk, req, req->timeout,
+ &found_dup_sk);
+
+ if (unlikely(found_dup_sk)) {
+ reqsk_free(req);
+ return 0;
+ }
+
}
af_ops->send_synack(sk, dst, &fl, req, &foc,
!want_cookie ? TCP_SYNACK_NORMAL :
--
2.25.1



2024-06-14 10:55:18

by Florian Westphal

[permalink] [raw]
Subject: Re: [PATCH net v2] Fix race for duplicate reqsk on identical SYN

luoxuanqiang <[email protected]> wrote:
> include/net/inet_connection_sock.h | 2 +-
> net/dccp/ipv4.c | 2 +-
> net/dccp/ipv6.c | 2 +-
> net/ipv4/inet_connection_sock.c | 15 +++++++++++----
> net/ipv4/tcp_input.c | 11 ++++++++++-
> 5 files changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> index 7d6b1254c92d..8773d161d184 100644
> --- a/include/net/inet_connection_sock.h
> +++ b/include/net/inet_connection_sock.h
> @@ -264,7 +264,7 @@ struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
> struct request_sock *req,
> struct sock *child);
> void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
> - unsigned long timeout);
> + unsigned long timeout, bool *found_dup_sk);

Nit:

I think it would be preferrable to change retval to bool rather than
bool *found_dup_sk extra arg, so one can do

bool inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
unsigned long timeout)
{
if (!reqsk_queue_hash_req(req, timeout))
return false;

i.e. let retval indicate wheter reqsk was inserted or not.

Patch looks good to me otherwise.

2024-06-14 12:43:50

by luoxuanqiang

[permalink] [raw]
Subject: Re: [PATCH net v2] Fix race for duplicate reqsk on identical SYN


在 2024/6/14 18:54, Florian Westphal 写道:
> luoxuanqiang <[email protected]> wrote:
>> include/net/inet_connection_sock.h | 2 +-
>> net/dccp/ipv4.c | 2 +-
>> net/dccp/ipv6.c | 2 +-
>> net/ipv4/inet_connection_sock.c | 15 +++++++++++----
>> net/ipv4/tcp_input.c | 11 ++++++++++-
>> 5 files changed, 24 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
>> index 7d6b1254c92d..8773d161d184 100644
>> --- a/include/net/inet_connection_sock.h
>> +++ b/include/net/inet_connection_sock.h
>> @@ -264,7 +264,7 @@ struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
>> struct request_sock *req,
>> struct sock *child);
>> void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
>> - unsigned long timeout);
>> + unsigned long timeout, bool *found_dup_sk);
> Nit:
>
> I think it would be preferrable to change retval to bool rather than
> bool *found_dup_sk extra arg, so one can do
>
> bool inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
> unsigned long timeout)
> {
> if (!reqsk_queue_hash_req(req, timeout))
> return false;
>
> i.e. let retval indicate wheter reqsk was inserted or not.
>
> Patch looks good to me otherwise.

Thank you for your confirmation!

Regarding your suggestion, I had considered it before,
but besides tcp_conn_request() calling inet_csk_reqsk_queue_hash_add(),
dccp_v4(v6)_conn_request() also calls it. However, there is no
consideration for a failed insertion within that function, so it's
reasonable to let the caller decide whether to check for duplicate
reqsk.

The purpose of my modification this time is solely to confirm if a
reqsk for the same connection has already been inserted into the ehash.
If the insertion fails, inet_ehash_insert() will handle the
non-insertion gracefully, and I only need to release the duplicate
reqsk. I believe this change is minimal and effective.

Those are my considerations.


2024-06-14 22:25:02

by Kuniyuki Iwashima

[permalink] [raw]
Subject: Re: [PATCH net v2] Fix race for duplicate reqsk on identical SYN

From: luoxuanqiang <[email protected]>
Date: Fri, 14 Jun 2024 20:42:07 +0800
> 在 2024/6/14 18:54, Florian Westphal 写道:
> > luoxuanqiang <[email protected]> wrote:
> >> include/net/inet_connection_sock.h | 2 +-
> >> net/dccp/ipv4.c | 2 +-
> >> net/dccp/ipv6.c | 2 +-
> >> net/ipv4/inet_connection_sock.c | 15 +++++++++++----
> >> net/ipv4/tcp_input.c | 11 ++++++++++-
> >> 5 files changed, 24 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> >> index 7d6b1254c92d..8773d161d184 100644
> >> --- a/include/net/inet_connection_sock.h
> >> +++ b/include/net/inet_connection_sock.h
> >> @@ -264,7 +264,7 @@ struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
> >> struct request_sock *req,
> >> struct sock *child);
> >> void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
> >> - unsigned long timeout);
> >> + unsigned long timeout, bool *found_dup_sk);
> > Nit:
> >
> > I think it would be preferrable to change retval to bool rather than
> > bool *found_dup_sk extra arg, so one can do

+1


> >
> > bool inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
> > unsigned long timeout)
> > {
> > if (!reqsk_queue_hash_req(req, timeout))
> > return false;
> >
> > i.e. let retval indicate wheter reqsk was inserted or not.
> >
> > Patch looks good to me otherwise.
>
> Thank you for your confirmation!
>
> Regarding your suggestion, I had considered it before,
> but besides tcp_conn_request() calling inet_csk_reqsk_queue_hash_add(),
> dccp_v4(v6)_conn_request() also calls it. However, there is no
> consideration for a failed insertion within that function, so it's
> reasonable to let the caller decide whether to check for duplicate
> reqsk.

I guess you followed 01770a1661657 where found_dup_sk was introduced,
but note that the commit is specific to TCP SYN Cookie and TCP Fast Open
and DCCP is not related.

Then, own_req is common to TCP and DCCP, so found_dup_sk was added as an
additional argument.

However, another similar commit 5e0724d027f05 actually added own_req check
in DCCP path.

I personally would'nt care if DCCP was not changed to handle such a
failure because DCCP will be removed next year, but I still prefer
Florian's suggestion.

Thanks

2024-06-15 06:40:56

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH net v2] Fix race for duplicate reqsk on identical SYN

On Sat, Jun 15, 2024 at 12:24 AM Kuniyuki Iwashima <[email protected]> wrote:
>
> From: luoxuanqiang <[email protected]>
> Date: Fri, 14 Jun 2024 20:42:07 +0800
> > 在 2024/6/14 18:54, Florian Westphal 写道:
> > > luoxuanqiang <[email protected]> wrote:
> > >> include/net/inet_connection_sock.h | 2 +-
> > >> net/dccp/ipv4.c | 2 +-
> > >> net/dccp/ipv6.c | 2 +-
> > >> net/ipv4/inet_connection_sock.c | 15 +++++++++++----
> > >> net/ipv4/tcp_input.c | 11 ++++++++++-
> > >> 5 files changed, 24 insertions(+), 8 deletions(-)
> > >>
> > >> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> > >> index 7d6b1254c92d..8773d161d184 100644
> > >> --- a/include/net/inet_connection_sock.h
> > >> +++ b/include/net/inet_connection_sock.h
> > >> @@ -264,7 +264,7 @@ struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
> > >> struct request_sock *req,
> > >> struct sock *child);
> > >> void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
> > >> - unsigned long timeout);
> > >> + unsigned long timeout, bool *found_dup_sk);
> > > Nit:
> > >
> > > I think it would be preferrable to change retval to bool rather than
> > > bool *found_dup_sk extra arg, so one can do
>
> +1
>
>
> > >
> > > bool inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
> > > unsigned long timeout)
> > > {
> > > if (!reqsk_queue_hash_req(req, timeout))
> > > return false;
> > >
> > > i.e. let retval indicate wheter reqsk was inserted or not.
> > >
> > > Patch looks good to me otherwise.
> >
> > Thank you for your confirmation!
> >
> > Regarding your suggestion, I had considered it before,
> > but besides tcp_conn_request() calling inet_csk_reqsk_queue_hash_add(),
> > dccp_v4(v6)_conn_request() also calls it. However, there is no
> > consideration for a failed insertion within that function, so it's
> > reasonable to let the caller decide whether to check for duplicate
> > reqsk.
>
> I guess you followed 01770a1661657 where found_dup_sk was introduced,
> but note that the commit is specific to TCP SYN Cookie and TCP Fast Open
> and DCCP is not related.
>
> Then, own_req is common to TCP and DCCP, so found_dup_sk was added as an
> additional argument.
>
> However, another similar commit 5e0724d027f05 actually added own_req check
> in DCCP path.
>
> I personally would'nt care if DCCP was not changed to handle such a
> failure because DCCP will be removed next year, but I still prefer
> Florian's suggestion.
>

Other things to consider :

- I presume this patch targets net tree, and luoxuanqiang needs the
fix to reach stable trees.

- This means a Fixes: tag is needed

- This also means that we should favor a patch with no or trivial
conflicts for stable backports.

Should the patch target the net-next tree, then the requirements can
be different.