Received: by 2002:a05:6602:2086:0:0:0:0 with SMTP id a6csp4452035ioa; Wed, 27 Apr 2022 04:17:57 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxPOYXL2nAWRs58Q6QziId/4ESgPh2FVV27ng1vBt/51rJbIjkG4kZM+79zmsk8cTyfPbis X-Received: by 2002:a17:90b:4b12:b0:1d2:8bda:ef7 with SMTP id lx18-20020a17090b4b1200b001d28bda0ef7mr43550420pjb.174.1651058277258; Wed, 27 Apr 2022 04:17:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1651058277; cv=none; d=google.com; s=arc-20160816; b=xLQEWM7URzvmfbWq9CriZeOV4rvxCHgImy3Uz1KsNnbWjcIxFhE+tYmTh2U2tnSM4p ZF0pJjn1daJuiw05y6O863UeDcl10U4kEYOYtSRrgI0ryNLbOjeLyNWslrDVkNHEIYGJ 7ebIlIbzZbtHnk9erQF6kcjU9odf/GxsQSGMMRcU5oObO9JkKy9KQgVU+mubiOnnT7LU unA+PNq5R3E8e7QXj/+J1HypEf6JG6+OrAf10BJOPiG+NGDmPznbmsoFJdUdIbpWhXc5 PBriUMO4BTHgEC0yHbx/k/2Hz7OEpAqKUZyl2Iam/3+6YGjD47nxZ2rIYss6QEDvBJ4K mOWw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=9N/FgfTVX6C143pRAhiYosWy/nOP0pKS/QyYwEp904I=; b=SnbrLMTB0gcbj5OCHLL6hyzQeTgSSkciciGkq3gONz4IkVaUhksqGRX50sIlu/WobN 4JmeQftGBZF9XajLqXjkKEw4nkcvw10xZKffQLdf6yOYtmWwSJd4RD+iS0OyI6lSZ4eA TyaDT8x3EhO7krhyKjODm/6R9HceEa2YRFgpQ3ZGnyEPYW75XSq9spcTgwwkMR/yKnVF 6mwWiA+usfSIh/pdD4nE4UowUG7/NZe33mt7gJZ0XJiZwL6tcSpyCHbcrGjSjqSuQ7pw GL7niUHXyfJ4e3o5mj3l0spUuvAFpEKGlcJB3q0odTv6ByPenHtKF3KoJyPlcJTtvICf QMKw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=rDGuvNz4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id z11-20020a63c04b000000b003ab6bb4164dsi1121796pgi.856.2022.04.27.04.17.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 Apr 2022 04:17:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=rDGuvNz4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: from out1.vger.email (out1.vger.email [IPv6:2620:137:e000::1:20]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 0E3694630FD; Wed, 27 Apr 2022 03:11:53 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239638AbiDZIiR (ORCPT + 99 others); Tue, 26 Apr 2022 04:38:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39912 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345435AbiDZIeg (ORCPT ); Tue, 26 Apr 2022 04:34:36 -0400 Received: from sin.source.kernel.org (sin.source.kernel.org [IPv6:2604:1380:40e1:4800::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C0F57629A; Tue, 26 Apr 2022 01:27:20 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sin.source.kernel.org (Postfix) with ESMTPS id 87F9FCE1BB9; Tue, 26 Apr 2022 08:27:18 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3B6BAC385A4; Tue, 26 Apr 2022 08:27:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1650961636; bh=20+meZ/1nobmL5ZVTR+I2Z8l9mZWKUh6uugxe+UszFc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=rDGuvNz4cIPmLCMcPqtQWE84WqC9Gw0q5lrTwJQJRwt5r03Ver3zk79yApc7rt7Ug kBvcUnYLP7Pd9on20qGZ83lmey7mH8nBd8HWpuilM+RV5gCtvAiQLgTm61ruM2uyuW kG8n1Whdo9nRVs4CVEElj5+SyHIbaJsq9DDkbtDE= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Ricardo Dias , Eric Dumazet , Jakub Kicinski , Sasha Levin Subject: [PATCH 4.19 08/53] tcp: fix race condition when creating child sockets from syncookies Date: Tue, 26 Apr 2022 10:20:48 +0200 Message-Id: <20220426081735.900497168@linuxfoundation.org> X-Mailer: git-send-email 2.36.0 In-Reply-To: <20220426081735.651926456@linuxfoundation.org> References: <20220426081735.651926456@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Ricardo Dias [ Upstream commit 01770a166165738a6e05c3d911fb4609cc4eb416 ] When the TCP stack is in SYN flood mode, the server child socket is created from the SYN cookie received in a TCP packet with the ACK flag set. The child socket is created when the server receives the first TCP packet with a valid SYN cookie from the client. Usually, this packet corresponds to the final step of the TCP 3-way handshake, the ACK packet. But is also possible to receive a valid SYN cookie from the first TCP data packet sent by the client, and thus create a child socket from that SYN cookie. Since a client socket is ready to send data as soon as it receives the SYN+ACK packet from the server, the client can send the ACK packet (sent by the TCP stack code), and the first data packet (sent by the userspace program) almost at the same time, and thus the server will equally receive the two TCP packets with valid SYN cookies almost at the same instant. When such event happens, the TCP stack code has a race condition that occurs between the momement a lookup is done to the established connections hashtable to check for the existence of a connection for the same client, and the moment that the child socket is added to the established connections hashtable. As a consequence, this race condition can lead to a situation where we add two child sockets to the established connections hashtable and deliver two sockets to the userspace program to the same client. This patch fixes the race condition by checking if an existing child socket exists for the same client when we are adding the second child socket to the established connections socket. If an existing child socket exists, we drop the packet and discard the second child socket to the same client. Signed-off-by: Ricardo Dias Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20201120111133.GA67501@rdias-suse-pc.lan Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- include/net/inet_hashtables.h | 5 +- net/dccp/ipv4.c | 2 - net/dccp/ipv6.c | 2 - net/ipv4/inet_connection_sock.c | 2 - net/ipv4/inet_hashtables.c | 68 +++++++++++++++++++++++++++++++++++----- net/ipv4/tcp_ipv4.c | 15 +++++++- net/ipv6/tcp_ipv6.c | 13 +++++++ 7 files changed, 91 insertions(+), 16 deletions(-) --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -232,8 +232,9 @@ void inet_hashinfo2_init(struct inet_has unsigned long low_limit, unsigned long high_limit); -bool inet_ehash_insert(struct sock *sk, struct sock *osk); -bool inet_ehash_nolisten(struct sock *sk, struct sock *osk); +bool inet_ehash_insert(struct sock *sk, struct sock *osk, bool *found_dup_sk); +bool inet_ehash_nolisten(struct sock *sk, struct sock *osk, + bool *found_dup_sk); int __inet_hash(struct sock *sk, struct sock *osk); int inet_hash(struct sock *sk); void inet_unhash(struct sock *sk); --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -428,7 +428,7 @@ struct sock *dccp_v4_request_recv_sock(c if (__inet_inherit_port(sk, newsk) < 0) goto put_and_exit; - *own_req = inet_ehash_nolisten(newsk, req_to_sk(req_unhash)); + *own_req = inet_ehash_nolisten(newsk, req_to_sk(req_unhash), NULL); if (*own_req) ireq->ireq_opt = NULL; else --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -538,7 +538,7 @@ static struct sock *dccp_v6_request_recv dccp_done(newsk); goto out; } - *own_req = inet_ehash_nolisten(newsk, req_to_sk(req_unhash)); + *own_req = inet_ehash_nolisten(newsk, req_to_sk(req_unhash), NULL); /* Clone pktoptions received with SYN, if we own the req */ if (*own_req && ireq->pktopts) { newnp->pktoptions = skb_clone(ireq->pktopts, GFP_ATOMIC); --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -793,7 +793,7 @@ static void reqsk_queue_hash_req(struct timer_setup(&req->rsk_timer, reqsk_timer_handler, TIMER_PINNED); mod_timer(&req->rsk_timer, jiffies + timeout); - inet_ehash_insert(req_to_sk(req), NULL); + inet_ehash_insert(req_to_sk(req), NULL, NULL); /* before letting lookups find us, make sure all req fields * are committed to memory and refcnt initialized. */ --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -24,6 +24,9 @@ #include #include #include +#if IS_ENABLED(CONFIG_IPV6) +#include +#endif #include #include #include @@ -513,10 +516,52 @@ static u32 inet_sk_port_offset(const str inet->inet_dport); } -/* insert a socket into ehash, and eventually remove another one - * (The another one can be a SYN_RECV or TIMEWAIT +/* Searches for an exsiting socket in the ehash bucket list. + * Returns true if found, false otherwise. */ -bool inet_ehash_insert(struct sock *sk, struct sock *osk) +static bool inet_ehash_lookup_by_sk(struct sock *sk, + struct hlist_nulls_head *list) +{ + const __portpair ports = INET_COMBINED_PORTS(sk->sk_dport, sk->sk_num); + const int sdif = sk->sk_bound_dev_if; + const int dif = sk->sk_bound_dev_if; + const struct hlist_nulls_node *node; + struct net *net = sock_net(sk); + struct sock *esk; + + INET_ADDR_COOKIE(acookie, sk->sk_daddr, sk->sk_rcv_saddr); + + sk_nulls_for_each_rcu(esk, node, list) { + if (esk->sk_hash != sk->sk_hash) + continue; + if (sk->sk_family == AF_INET) { + if (unlikely(INET_MATCH(esk, net, acookie, + sk->sk_daddr, + sk->sk_rcv_saddr, + ports, dif, sdif))) { + return true; + } + } +#if IS_ENABLED(CONFIG_IPV6) + else if (sk->sk_family == AF_INET6) { + if (unlikely(INET6_MATCH(esk, net, + &sk->sk_v6_daddr, + &sk->sk_v6_rcv_saddr, + ports, dif, sdif))) { + return true; + } + } +#endif + } + return false; +} + +/* Insert a socket into ehash, and eventually remove another one + * (The another one can be a SYN_RECV or TIMEWAIT) + * If an existing socket already exists, socket sk is not inserted, + * and sets found_dup_sk parameter to true. + */ +bool inet_ehash_insert(struct sock *sk, struct sock *osk, bool *found_dup_sk) { struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; struct hlist_nulls_head *list; @@ -535,16 +580,23 @@ bool inet_ehash_insert(struct sock *sk, if (osk) { WARN_ON_ONCE(sk->sk_hash != osk->sk_hash); ret = sk_nulls_del_node_init_rcu(osk); + } else if (found_dup_sk) { + *found_dup_sk = inet_ehash_lookup_by_sk(sk, list); + if (*found_dup_sk) + ret = false; } + if (ret) __sk_nulls_add_node_rcu(sk, list); + spin_unlock(lock); + return ret; } -bool inet_ehash_nolisten(struct sock *sk, struct sock *osk) +bool inet_ehash_nolisten(struct sock *sk, struct sock *osk, bool *found_dup_sk) { - bool ok = inet_ehash_insert(sk, osk); + bool ok = inet_ehash_insert(sk, osk, found_dup_sk); if (ok) { sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); @@ -588,7 +640,7 @@ int __inet_hash(struct sock *sk, struct int err = 0; if (sk->sk_state != TCP_LISTEN) { - inet_ehash_nolisten(sk, osk); + inet_ehash_nolisten(sk, osk, NULL); return 0; } WARN_ON(!sk_unhashed(sk)); @@ -683,7 +735,7 @@ int __inet_hash_connect(struct inet_time tb = inet_csk(sk)->icsk_bind_hash; spin_lock_bh(&head->lock); if (sk_head(&tb->owners) == sk && !sk->sk_bind_node.next) { - inet_ehash_nolisten(sk, NULL); + inet_ehash_nolisten(sk, NULL, NULL); spin_unlock_bh(&head->lock); return 0; } @@ -759,7 +811,7 @@ ok: inet_bind_hash(sk, tb, port); if (sk_unhashed(sk)) { inet_sk(sk)->inet_sport = htons(port); - inet_ehash_nolisten(sk, (struct sock *)tw); + inet_ehash_nolisten(sk, (struct sock *)tw, NULL); } if (tw) inet_twsk_bind_unhash(tw, hinfo); --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1415,6 +1415,7 @@ struct sock *tcp_v4_syn_recv_sock(const bool *own_req) { struct inet_request_sock *ireq; + bool found_dup_sk = false; struct inet_sock *newinet; struct tcp_sock *newtp; struct sock *newsk; @@ -1485,12 +1486,22 @@ struct sock *tcp_v4_syn_recv_sock(const if (__inet_inherit_port(sk, newsk) < 0) goto put_and_exit; - *own_req = inet_ehash_nolisten(newsk, req_to_sk(req_unhash)); + *own_req = inet_ehash_nolisten(newsk, req_to_sk(req_unhash), + &found_dup_sk); if (likely(*own_req)) { tcp_move_syn(newtp, req); ireq->ireq_opt = NULL; } else { - newinet->inet_opt = NULL; + if (!req_unhash && found_dup_sk) { + /* This code path should only be executed in the + * syncookie case only + */ + bh_unlock_sock(newsk); + sock_put(newsk); + newsk = NULL; + } else { + newinet->inet_opt = NULL; + } } return newsk; --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1090,6 +1090,7 @@ static struct sock *tcp_v6_syn_recv_sock struct ipv6_txoptions *opt; struct tcp6_sock *newtcp6sk; struct inet_sock *newinet; + bool found_dup_sk = false; struct tcp_sock *newtp; struct sock *newsk; #ifdef CONFIG_TCP_MD5SIG @@ -1258,7 +1259,8 @@ static struct sock *tcp_v6_syn_recv_sock tcp_done(newsk); goto out; } - *own_req = inet_ehash_nolisten(newsk, req_to_sk(req_unhash)); + *own_req = inet_ehash_nolisten(newsk, req_to_sk(req_unhash), + &found_dup_sk); if (*own_req) { tcp_move_syn(newtp, req); @@ -1273,6 +1275,15 @@ static struct sock *tcp_v6_syn_recv_sock skb_set_owner_r(newnp->pktoptions, newsk); } } + } else { + if (!req_unhash && found_dup_sk) { + /* This code path should only be executed in the + * syncookie case only + */ + bh_unlock_sock(newsk); + sock_put(newsk); + newsk = NULL; + } } return newsk;