From: Eric Dumazet
Date: Sat, 14 Jan 2023 10:45:23 +0100
Subject: Re: [PATCH net] tcp: avoid the lookup process failing to get sk in ehash table
To: Jason Xing
Cc: davem@davemloft.net, yoshfuji@linux-ipv6.org, dsahern@kernel.org,
    kuba@kernel.org--cc, pabeni@redhat.com, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org, Jason Xing
In-Reply-To: <20230112065336.41034-1-kerneljasonxing@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 12, 2023 at 7:54 AM Jason Xing wrote:
>
> From: Jason Xing
>
> While one cpu is working on looking up the right socket from ehash
> table,
> another cpu is done deleting the request socket and is about
> to add (or is adding) the big socket to the table. It means that
> we could miss both of them, even though the chance is small.
>
> Let me draw a call trace map of the server side.
>    CPU 0                           CPU 1
>    -----                           -----
> tcp_v4_rcv()                  syn_recv_sock()
>                             inet_ehash_insert()
>                             -> sk_nulls_del_node_init_rcu(osk)
> __inet_lookup_established()
>                             -> __sk_nulls_add_node_rcu(sk, list)
>
> Notice that CPU 0 is receiving the data after the final ack of the
> 3-way handshake while CPU 1 is still handling the final ack.
>
> Why could this be a real problem?
> This case happens only when the final ack and the first data are
> received by different CPUs. Then the server, receiving data with the
> ACK flag set, tries to find the proper established socket in the ehash
> table, but it fails, as my map shows above. After that, the server
> fetches a listener socket and then sends an RST because it finds an
> ACK flag in the skb (data), which obeys the RST definition in RFC 793.
>
> Many thanks to Eric for great help from beginning to end.
>
> Fixes: 5e0724d027f0 ("tcp/dccp: fix hashdance race for passive sessions")
> Signed-off-by: Jason Xing
> ---
>  net/ipv4/inet_hashtables.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> index 24a38b56fab9..18f88cb4efcb 100644
> --- a/net/ipv4/inet_hashtables.c
> +++ b/net/ipv4/inet_hashtables.c
> @@ -650,7 +650,16 @@ bool inet_ehash_insert(struct sock *sk, struct sock *osk, bool *found_dup_sk)
>         spin_lock(lock);
>         if (osk) {
>                 WARN_ON_ONCE(sk->sk_hash != osk->sk_hash);
> +               if (sk_hashed(osk))
> +                       /* Before deleting the node, we insert a new one to make
> +                        * sure that the look-up=sk process would not miss either
> +                        * of them and that at least one node would exist in ehash
> +                        * table all the time. Otherwise there's a tiny chance
> +                        * that lookup process could find nothing in ehash table.
> +                        */
> +                       __sk_nulls_add_node_rcu(sk, list);

In our private email exchange, I suggested inserting sk at the _tail_
of the hash bucket.

Inserting it at the _head_ would still leave a race condition, because
a concurrent reader might have already started the bucket traversal,
and would not see 'sk'.

Thanks.

>                 ret = sk_nulls_del_node_init_rcu(osk);
> +               goto unlock;
>         } else if (found_dup_sk) {
>                 *found_dup_sk = inet_ehash_lookup_by_sk(sk, list);
>                 if (*found_dup_sk)
> @@ -660,6 +669,7 @@ bool inet_ehash_insert(struct sock *sk, struct sock *osk, bool *found_dup_sk)
>         if (ret)
>                 __sk_nulls_add_node_rcu(sk, list);
>
> +unlock:
>         spin_unlock(lock);
>
>         return ret;
> --
> 2.37.3
>
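For anyone following the head-versus-tail point in the reply above, here is a minimal userspace sketch of the interleaving. It is not kernel code: `struct node`, `struct bucket`, `add_head()`, `add_tail()`, `del_node()` and `reader_sees_new_sk()` are hypothetical names invented for illustration, the list is a plain singly linked list rather than an hlist_nulls chain, and RCU, nulls markers and the bucket lock are all left out. The "concurrent reader" is modelled as a saved cursor that has already moved past the bucket head before the writer runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy bucket: a plain singly linked list, NOT the kernel's hlist_nulls. */
struct node {
	int id;
	struct node *next;
};

struct bucket {
	struct node *first;
};

enum { ID_OTHER = 1, ID_OSK = 2, ID_SK = 3 };

static void add_head(struct bucket *b, struct node *n)
{
	assert(b && n);
	n->next = b->first;
	b->first = n;
}

static void add_tail(struct bucket *b, struct node *n)
{
	struct node **pp;

	assert(b && n);
	n->next = NULL;
	for (pp = &b->first; *pp; pp = &(*pp)->next)
		;
	*pp = n;
}

static void del_node(struct bucket *b, struct node *n)
{
	struct node **pp;

	assert(b && n);
	for (pp = &b->first; *pp && *pp != n; pp = &(*pp)->next)
		;
	if (*pp)
		*pp = n->next;
}

/* Resume a traversal that already reached 'pos'; report whether 'id' is seen. */
static bool reader_resume(const struct node *pos, int id)
{
	for (; pos; pos = pos->next)
		if (pos->id == id)
			return true;
	return false;
}

/*
 * Bucket starts as [other, osk]. The reader has loaded the head and stands
 * on 'other'. The writer then inserts sk (head or tail) and unlinks osk,
 * in the same order as the patch. Returns true if the reader still finds sk.
 */
bool reader_sees_new_sk(bool insert_at_tail)
{
	struct node other = { .id = ID_OTHER, .next = NULL };
	struct node osk = { .id = ID_OSK, .next = NULL };
	struct node sk = { .id = ID_SK, .next = NULL };
	struct bucket b = { .first = NULL };
	const struct node *reader_pos;

	add_tail(&b, &other);
	add_tail(&b, &osk);

	reader_pos = b.first;		/* reader is already past the head */

	if (insert_at_tail)
		add_tail(&b, &sk);
	else
		add_head(&b, &sk);
	del_node(&b, &osk);

	return reader_resume(reader_pos, ID_SK);
}
```

In this toy model, `reader_sees_new_sk(false)` (head insertion) leaves the resumed reader with neither osk nor sk, while `reader_sees_new_sk(true)` (tail insertion) lets it reach sk. The real lookup path has more machinery than this, so treat the sketch only as a picture of the ordering argument.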