From: Kuniyuki Iwashima
Subject: Re: [PATCH v7 bpf-next 07/11] tcp: Migrate TCP_NEW_SYN_RECV requests at receiving the final ACK.
Date: Fri, 11 Jun 2021 07:56:04 +0900
Message-ID: <20210610225604.618-1-kuniyu@amazon.co.jp>
In-Reply-To: <89c4ce38-fe2c-1d80-f814-c4b3a5e4781d@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From:   Eric Dumazet
Date:   Thu, 10 Jun 2021 22:36:27 +0200
> On 5/21/21 8:21 PM, Kuniyuki Iwashima wrote:
> > This patch also changes the code to call reuseport_migrate_sock() and
> > inet_reqsk_clone(), but unlike the other cases, we do not call
> > inet_reqsk_clone() right after reuseport_migrate_sock().
> >
> > Currently, in the receive path for TCP_NEW_SYN_RECV sockets, its listener
> > has three kinds of refcnt:
> >
> >   (A) for listener itself
> >   (B) carried by request_sock
> >   (C) sock_hold() in tcp_v[46]_rcv()
> >
> > While processing the req, (A) may disappear by close(listener). Also, (B)
> > can disappear by accept(listener) once we put the req into the accept
> > queue. So, we have to hold another refcnt (C) for the listener to prevent
> > use-after-free.
> >
> > For socket migration, we call reuseport_migrate_sock() to select a listener
> > with (A) and to increment the new listener's refcnt in tcp_v[46]_rcv().
> > This refcnt corresponds to (C) and is cleaned up later in tcp_v[46]_rcv().
> > Thus we have to take another refcnt (B) for the newly cloned request_sock.
> >
> > In inet_csk_complete_hashdance(), we hold the count (B), clone the req, and
> > try to put the new req into the accept queue. By migrating req after
> > winning the "own_req" race, we can avoid such a worst situation:
> >
> >   CPU 1 looks up req1
> >   CPU 2 looks up req1, unhashes it, then CPU 1 loses the race
> >   CPU 3 looks up req2, unhashes it, then CPU 2 loses the race
> >   ...
> >
> > Signed-off-by: Kuniyuki Iwashima
> > Acked-by: Martin KaFai Lau
> > ---
> >  net/ipv4/inet_connection_sock.c | 34 ++++++++++++++++++++++++++++++---
> >  net/ipv4/tcp_ipv4.c             | 20 +++++++++++++------
> >  net/ipv4/tcp_minisocks.c        |  4 ++--
> >  net/ipv6/tcp_ipv6.c             | 14 +++++++++++---
> >  4 files changed, 58 insertions(+), 14 deletions(-)
> >
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index c1f068464363..b795198f919a 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -1113,12 +1113,40 @@ struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child,
> >  					 struct request_sock *req, bool own_req)
> >  {
> >  	if (own_req) {
> > -		inet_csk_reqsk_queue_drop(sk, req);
> > -		reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
> > -		if (inet_csk_reqsk_queue_add(sk, req, child))
> > +		inet_csk_reqsk_queue_drop(req->rsk_listener, req);
> > +		reqsk_queue_removed(&inet_csk(req->rsk_listener)->icsk_accept_queue, req);
> > +
> > +		if (sk != req->rsk_listener) {
> > +			/* another listening sk has been selected,
> > +			 * migrate the req to it.
> > +			 */
> > +			struct request_sock *nreq;
> > +
> > +			/* hold a refcnt for the nreq->rsk_listener
> > +			 * which is assigned in inet_reqsk_clone()
> > +			 */
> > +			sock_hold(sk);
> > +			nreq = inet_reqsk_clone(req, sk);
> > +			if (!nreq) {
> > +				inet_child_forget(sk, req, child);
>
> Don't you need a sock_put(sk) here ?

Yes. If nreq == NULL, inet_reqsk_clone() calls sock_put().


> >
> > +				goto child_put;
> > +			}
> > +
> > +			refcount_set(&nreq->rsk_refcnt, 1);
> > +			if (inet_csk_reqsk_queue_add(sk, nreq, child)) {
> > +				reqsk_migrate_reset(req);
> > +				reqsk_put(req);
> > +				return child;
> > +			}
> > +
> > +			reqsk_migrate_reset(nreq);
> > +			__reqsk_free(nreq);
> > +		} else if (inet_csk_reqsk_queue_add(sk, req, child)) {
> >  			return child;
> > +		}
> >