Received: by 2002:a05:6a10:16a7:0:0:0:0 with SMTP id gp39csp3334395pxb; Mon, 9 Nov 2020 08:31:41 -0800 (PST) X-Google-Smtp-Source: ABdhPJz30tpInP6rnSi2Es2VfcM2SzUKnGzrwwel0+T1ajTx5hJGytsRaayXBKI6qIQqlx9BhGMT X-Received: by 2002:aa7:de89:: with SMTP id j9mr16587620edv.363.1604939501121; Mon, 09 Nov 2020 08:31:41 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1604939501; cv=none; d=google.com; s=arc-20160816; b=WxyFYZZbUtQsgt1cGJLsJBme1B+vgUWxGETlgHZxwn1TD7JMrDTvN9YbPIlKgdvtyW Ruicqce9NhcceRCzjUqYM+aeywwPCVhHwEFBacZEn3G1i7q65UssCwD5+gbypzrPE1YA fnIsf1h2nO5hMZmS8iZs+JNoZA2ymu5TTG6kFYP5EmnBKLFWDsufgYPRoJgsNVFJEPaJ wzOT1McsoN3uGfkXNVQ3kMLPqntoXHpCeh4K4vURu3WHawQ461lRrNniPlxTrqAvapfV +H5uYV5AZl0ff+F4vrgg6zQAHQ2tJcSzPNN1aCIWsOUaLfLCZOhKX+WXt6W8JlUZ1xv6 WZLQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=xWumfhF+4cyjGgATytRWJ7/lhYkpBLx8iwpCjuaDTdc=; b=TAtdj6eceeI6Is9d/ZrPxh6vAyDbpnFfKgGP205WXAitqjQhukPWKNWrwrvhcoXj1d cAk/awLBusI/dl3cQvjNBNB6yvD44WxWSYKk7vVLi1DEKyg7tciKXU1LZJUJyY2CQYZ+ 6r2fupdzG4IPlhjlcWwF4vb02xMtkk0u3drCE+7y7MOADO/VaSxnZRO17HHQ/nIF8xjP r5zHxxZkxkYjSywbLYSumZr+e5AsbHPI8QBcnE9lQL6qzHS9eAoDn+UbNdDwcLOrF9Ls mnqbixZFx1MwTNjt4NK5Y6ByHhpOv+lQMtFjk3jx+nbFnKJsMGII2FINVappst5Dd3Nm yu1g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=XmcAmTDS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id qc4si7018340ejb.200.2020.11.09.08.31.16; Mon, 09 Nov 2020 08:31:41 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=XmcAmTDS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730260AbgKIQ3q (ORCPT + 99 others); Mon, 9 Nov 2020 11:29:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729982AbgKIQ3p (ORCPT ); Mon, 9 Nov 2020 11:29:45 -0500 Received: from mail-il1-x133.google.com (mail-il1-x133.google.com [IPv6:2607:f8b0:4864:20::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A5D37C0613D3 for ; Mon, 9 Nov 2020 08:29:45 -0800 (PST) Received: by mail-il1-x133.google.com with SMTP id q1so8853394ilt.6 for ; Mon, 09 Nov 2020 08:29:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=xWumfhF+4cyjGgATytRWJ7/lhYkpBLx8iwpCjuaDTdc=; b=XmcAmTDSM6cmvtBIgljy8h0/ZJXlrDXbLlLnm1ta7C+6kuNrmKyn5uddT6zLD1NPYW 6opR9dbnnkxhGgjJWzw+Oow3d4v5RLT0eYCRzw6OmHvMSFrk+Ya0V0Xv64HYgg/h13qm uUul12xO+UaqI6PTjj45FOIEOsFSj+M0SHsKjYUghoo87GxSvVc+YiCAXN+WSAn0coyJ YeIERDvuamNtQeyFwwqglk7hU2tyKjaR8sVnJ8InuXm07Ief1Cdnw5wwAEeUUOblLCNw TFTt7Tko82Jjx97rvgZTsakzAMrRioxQVZyR+K/JexDtmMKyn4Vr2mS7Iudhs+bNLM+I lQ+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=xWumfhF+4cyjGgATytRWJ7/lhYkpBLx8iwpCjuaDTdc=; b=c2+DzI/iCBGHuagaQP3kGMMq/B+bh7u2CijCedcpsy2UM76HTF+IMxIhJsYQyXmKbr uIzHCjmhSjbWQMq3P96E9ulWb41hKA54Te4IBce2alsAddY10fOvTnaiYfIdNeP6Ugh7 slK5BVsZ2BOWE5bSbRpGz8cwoWOp1pImc8ogRqvaGW6KD0bebtnQ7PZQN7FKa1yYa+U3 /mK1HxpbOudp52cMxYZ/AqoqhCZiYAt3Sm/DtGbhsMZ/1ulL8jHCBUvQvUbq8wjYj58t w/d8NX8zFnHK3iqBxxBtQXAn+0+HYZ9pVxYNsniJdWMZawnVHuVsgu+ki8+nsHAGvXbc cb3g== X-Gm-Message-State: AOAM530zsOJUIEB5z5rAM4QoKbOAB8BhlNO8RzCiAQjN413K895zEhXn 4Sf9nmnu7/hcou9ytJRsaD7Mj7dWgDYVljzVxz5h/OoQZuY= X-Received: by 2002:a92:6f11:: with SMTP id k17mr10807878ilc.69.1604939384629; Mon, 09 Nov 2020 08:29:44 -0800 (PST) MIME-Version: 1.0 References: <20201109161146.GA629827@rdias-suse-pc.lan> In-Reply-To: <20201109161146.GA629827@rdias-suse-pc.lan> From: Eric Dumazet Date: Mon, 9 Nov 2020 17:29:33 +0100 Message-ID: Subject: Re: [PATCH v2] tcp: fix race condition when creating child sockets from syncookies To: Ricardo Dias Cc: David Miller , Jakub Kicinski , Alexey Kuznetsov , Hideaki YOSHIFUJI , netdev , LKML Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Nov 9, 2020 at 5:11 PM Ricardo Dias wrote: > > When the TCP stack is in SYN flood mode, the server child socket is > created from the SYN cookie received in a TCP packet with the ACK flag > set. > > The child socket is created when the server receives the first TCP > packet with a valid SYN cookie from the client. Usually, this packet > corresponds to the final step of the TCP 3-way handshake, the ACK > packet. But is also possible to receive a valid SYN cookie from the > first TCP data packet sent by the client, and thus create a child socket > from that SYN cookie. > > Since a client socket is ready to send data as soon as it receives the > SYN+ACK packet from the server, the client can send the ACK packet (sent > by the TCP stack code), and the first data packet (sent by the userspace > program) almost at the same time, and thus the server will equally > receive the two TCP packets with valid SYN cookies almost at the same > instant. > > When such event happens, the TCP stack code has a race condition that > occurs between the momement a lookup is done to the established > connections hashtable to check for the existence of a connection for the > same client, and the moment that the child socket is added to the > established connections hashtable. As a consequence, this race condition > can lead to a situation where we add two child sockets to the > established connections hashtable and deliver two sockets to the > userspace program to the same client. > > This patch fixes the race condition by checking if an existing child > socket exists for the same client when we are adding the second child > socket to the established connections socket. If an existing child > socket exists, we return that socket and use it to process the TCP > packet received, and discard the second child socket to the same client. > > Signed-off-by: Ricardo Dias > Reported-by: kernel test robot The kernel test robot reported a bug on your v1, you do not have to claim the bot found this issue. > --- > v2 (2020-11-09): > * Changed the author's email domain. > * Removed the helper function inet_ehash_insert_chk_dup and moved the > logic to the existing inet_ehash_insert. > * Updated the callers of iner_ehash_nolisten to deal with the new > logic. > > > include/net/inet_hashtables.h | 6 ++-- > net/dccp/ipv4.c | 4 ++- > net/dccp/ipv6.c | 4 ++- > net/ipv4/inet_hashtables.c | 63 +++++++++++++++++++++++++++++------ > net/ipv4/syncookies.c | 5 ++- > net/ipv4/tcp_ipv4.c | 12 ++++++- > net/ipv6/tcp_ipv6.c | 19 ++++++++++- > 7 files changed, 94 insertions(+), 19 deletions(-) > > diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h > index 92560974ea67..dffa345d52a7 100644 > --- a/include/net/inet_hashtables.h > +++ b/include/net/inet_hashtables.h > @@ -247,9 +247,9 @@ void inet_hashinfo2_init(struct inet_hashinfo *h, const char *name, > unsigned long high_limit); > int inet_hashinfo2_init_mod(struct inet_hashinfo *h); > > -bool inet_ehash_insert(struct sock *sk, struct sock *osk); > -bool inet_ehash_nolisten(struct sock *sk, struct sock *osk); > -int __inet_hash(struct sock *sk, struct sock *osk); > +bool inet_ehash_insert(struct sock *sk, struct sock **osk); > +bool inet_ehash_nolisten(struct sock *sk, struct sock **osk); > +int __inet_hash(struct sock *sk, struct sock **osk); > int inet_hash(struct sock *sk); > void inet_unhash(struct sock *sk); > > diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c > index 9c28c8251125..99bbba478991 100644 > --- a/net/dccp/ipv4.c > +++ b/net/dccp/ipv4.c > @@ -400,6 +400,7 @@ struct sock *dccp_v4_request_recv_sock(const struct sock *sk, > struct inet_request_sock *ireq; > struct inet_sock *newinet; > struct sock *newsk; > + struct sock *osk; > > if (sk_acceptq_is_full(sk)) > goto exit_overflow; > @@ -427,7 +428,8 @@ struct sock *dccp_v4_request_recv_sock(const struct sock *sk, > > if (__inet_inherit_port(sk, newsk) < 0) > goto put_and_exit; > - *own_req = inet_ehash_nolisten(newsk, req_to_sk(req_unhash)); > + osk = req_to_sk(req_unhash); > + *own_req = inet_ehash_nolisten(newsk, &osk); > if (*own_req) > ireq->ireq_opt = NULL; > else > diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c > index ef4ab28cfde0..91a825c00a97 100644 > --- a/net/dccp/ipv6.c > +++ b/net/dccp/ipv6.c > @@ -407,6 +407,7 @@ static struct sock *dccp_v6_request_recv_sock(const struct sock *sk, > struct inet_sock *newinet; > struct dccp6_sock *newdp6; > struct sock *newsk; > + struct sock *osk; > > if (skb->protocol == htons(ETH_P_IP)) { > /* > @@ -533,7 +534,8 @@ static struct sock *dccp_v6_request_recv_sock(const struct sock *sk, > dccp_done(newsk); > goto out; > } > - *own_req = inet_ehash_nolisten(newsk, req_to_sk(req_unhash)); > + osk = req_to_sk(req_unhash); > + *own_req = inet_ehash_nolisten(newsk, &osk); > /* Clone pktoptions received with SYN, if we own the req */ > if (*own_req && ireq->pktopts) { > newnp->pktoptions = skb_clone(ireq->pktopts, GFP_ATOMIC); > diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c > index 239e54474b65..8d62b22b9a95 100644 > --- a/net/ipv4/inet_hashtables.c > +++ b/net/ipv4/inet_hashtables.c > @@ -510,17 +510,27 @@ static u32 inet_sk_port_offset(const struct sock *sk) > inet->inet_dport); > } > > -/* insert a socket into ehash, and eventually remove another one > - * (The another one can be a SYN_RECV or TIMEWAIT > +/* Insert a socket into ehash, and eventually remove another one > + * (The another one can be a SYN_RECV or TIMEWAIT) > + * If an existing socket already exists, it returns that socket > + * through the osk parameter. > */ > -bool inet_ehash_insert(struct sock *sk, struct sock *osk) > +bool inet_ehash_insert(struct sock *sk, struct sock **osk) > { > struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; > struct hlist_nulls_head *list; > struct inet_ehash_bucket *head; > - spinlock_t *lock; > + const struct hlist_nulls_node *node; > + struct sock *esk; > + spinlock_t *lock; /* protects hashinfo socket entry */ > + struct net *net = sock_net(sk); > + const int dif = sk->sk_bound_dev_if; > + const int sdif = sk->sk_bound_dev_if; > bool ret = true; > > + INET_ADDR_COOKIE(acookie, sk->sk_daddr, sk->sk_rcv_saddr); > + const __portpair ports = INET_COMBINED_PORTS(sk->sk_dport, sk->sk_num); > + This does not work for IPv6. This function is used both for IPv4 and IPv6 Please test your changes for IPv6, thank you ! > WARN_ON_ONCE(!sk_unhashed(sk)); > > sk->sk_hash = sk_ehashfn(sk); > @@ -529,17 +539,48 @@ bool inet_ehash_insert(struct sock *sk, struct sock *osk) > lock = inet_ehash_lockp(hashinfo, sk->sk_hash); > > spin_lock(lock); > - if (osk) { > - WARN_ON_ONCE(sk->sk_hash != osk->sk_hash); > - ret = sk_nulls_del_node_init_rcu(osk); > + if (osk && *osk) { > + WARN_ON_ONCE(sk->sk_hash != (*osk)->sk_hash); > + ret = sk_nulls_del_node_init_rcu(*osk); > + } else if (osk && !*osk) { > +begin: > + sk_nulls_for_each_rcu(esk, node, list) { > + if (esk->sk_hash != sk->sk_hash) > + continue; > + if (likely(INET_MATCH(esk, net, acookie, > + sk->sk_daddr, > + sk->sk_rcv_saddr, ports, > + dif, sdif))) { > + if (unlikely(!refcount_inc_not_zero(&esk->sk_refcnt))) > + goto out; > + if (unlikely(!INET_MATCH(esk, net, acookie, > + sk->sk_daddr, > + sk->sk_rcv_saddr, > + ports, > + dif, sdif))) { This can not happen, since you own the spinlock protecting the hash bucket.