Received: by 2002:a05:6a10:a841:0:0:0:0 with SMTP id d1csp4165222pxy; Mon, 26 Apr 2021 20:49:38 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyCuAC2qf5OSnqP8xxc4GOQLy/D+28ju5Yk61BqgO65nZP8d0T4iFwMJiiAxvb/6E9ZcLEA X-Received: by 2002:a63:b09:: with SMTP id 9mr20036050pgl.107.1619495378074; Mon, 26 Apr 2021 20:49:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1619495378; cv=none; d=google.com; s=arc-20160816; b=oo7txQKBlU8irR19xHyrNNiArzx1d87V4FiB2xy2o15p/Izj7y0HUP8ndeXvb8CYBH syJAt4YtiXDMv96I6Hda+XDpZ0N37wHlVbEdeRoLlk8uty5aS9F9r10gQjfH9flfWajM Vw8lz0AD5HTC9esBRjlG1Rc3fLpyUGp1MWtNvsCtMlzktKLG/sItCeNJOBGly6t7vtKp +8Vu9tPugoMZf/2/xyTcOb3VelBVeBVn7slkglPQ9ri2JesfilySiVhlelaa2OdvXy6K iYWTrjyzYJlF+1SE4hsUxauMNkgxnIjMGYf3d0nc3HNiUzQKmGBizvsBpwX1znRKrIIy G+Yg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=StTHGdJH0e7a7hTsUp4VoJJabGCqsz7CPVVNUY3bueU=; b=FfIsQ4YuysFkQwkL9d5pkJ0NyjBOY9qp5qdOkEgM3PtXAOkDxzosC4mqUAbKpO8T5A wAtCVQbpvFDX9SYdSuKIfKA/o0ByR2YV7ULWdYaqylhQgpc0NxteAV+Jdzv/JA7fXPGz mZi0OLUeSDxiMCPAimgPTYIOIDZ8gD/2ZQMcraBoxMh1ZXyl5uKdP2CorS9091dWwxDX 1ZEi7TVrSVROHgo6wpsdxqvSdohsA/rLH40aKNDk16UId21JmsDlRvSGoabA2uoMNRlQ ymxot9tFjxbZrICYnRZmVeZya7h4kpELwKP6OqbWQ6/fxgTQlr2J4ROEcKViBBULIMz3 yDbQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@amazon.co.jp header.s=amazon201209 header.b=sniXgPyx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.co.jp Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g1si20492023pgp.134.2021.04.26.20.49.25; Mon, 26 Apr 2021 20:49:38 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@amazon.co.jp header.s=amazon201209 header.b=sniXgPyx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.co.jp Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234717AbhD0DsR (ORCPT + 99 others); Mon, 26 Apr 2021 23:48:17 -0400 Received: from smtp-fw-6002.amazon.com ([52.95.49.90]:34631 "EHLO smtp-fw-6002.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234038AbhD0DsR (ORCPT ); Mon, 26 Apr 2021 23:48:17 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.jp; i=@amazon.co.jp; q=dns/txt; s=amazon201209; t=1619495256; x=1651031256; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=StTHGdJH0e7a7hTsUp4VoJJabGCqsz7CPVVNUY3bueU=; b=sniXgPyxZeMB0gYM5TVfl1fuiYCkjzS5ZkkHBq99oV82InKqP6jf44Ya QV/Yy7wIbkCRR606ZhQx1fIgoa8d/4J+r45X9O0jEz1Aa3zTCAjrb2efR UeYRSgIL/XjDRo0qsh6vvtCa4+hCjXpj8jUPo/WNk1uo8SHuFKdGQO/Ch E=; X-IronPort-AV: E=Sophos;i="5.82,254,1613433600"; d="scan'208";a="108548950" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO email-inbound-relay-2a-e7be2041.us-west-2.amazon.com) ([10.43.8.2]) by smtp-border-fw-6002.iad6.amazon.com with ESMTP; 27 Apr 2021 03:47:34 +0000 Received: from EX13MTAUWB001.ant.amazon.com (pdx1-ws-svc-p6-lb9-vlan2.pdx.amazon.com [10.236.137.194]) by email-inbound-relay-2a-e7be2041.us-west-2.amazon.com (Postfix) with ESMTPS id ACB73A0576; Tue, 27 Apr 2021 03:47:32 +0000 (UTC) Received: from EX13D04ANC001.ant.amazon.com (10.43.157.89) by EX13MTAUWB001.ant.amazon.com (10.43.161.249) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 27 Apr 2021 03:47:32 +0000 Received: from 88665a182662.ant.amazon.com (10.43.162.93) by EX13D04ANC001.ant.amazon.com (10.43.157.89) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 27 Apr 2021 03:47:27 +0000 From: Kuniyuki Iwashima To: "David S . Miller" , Jakub Kicinski , Eric Dumazet , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau CC: Benjamin Herrenschmidt , Kuniyuki Iwashima , Kuniyuki Iwashima , , , Subject: [PATCH v4 bpf-next 02/11] tcp: Add num_closed_socks to struct sock_reuseport. Date: Tue, 27 Apr 2021 12:46:14 +0900 Message-ID: <20210427034623.46528-3-kuniyu@amazon.co.jp> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210427034623.46528-1-kuniyu@amazon.co.jp> References: <20210427034623.46528-1-kuniyu@amazon.co.jp> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-Originating-IP: [10.43.162.93] X-ClientProxiedBy: EX13D18UWC002.ant.amazon.com (10.43.162.88) To EX13D04ANC001.ant.amazon.com (10.43.157.89) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org As noted in the following commit, a closed listener has to hold the reference to the reuseport group for socket migration. This patch adds a field (num_closed_socks) to struct sock_reuseport to manage closed sockets within the same reuseport group. Moreover, this and the following commits introduce some helper functions to split socks[] into two sections and keep TCP_LISTEN and TCP_CLOSE sockets in each section. Like a double-ended queue, we will place TCP_LISTEN sockets from the front and TCP_CLOSE sockets from the end. TCP_LISTEN----------> <-------TCP_CLOSE +---+---+ --- +---+ --- +---+ --- +---+ | 0 | 1 | ... | i | ... | j | ... | k | +---+---+ --- +---+ --- +---+ --- +---+ i = num_socks - 1 j = max_socks - num_closed_socks k = max_socks - 1 This patch also extends reuseport_add_sock() and reuseport_grow() to support num_closed_socks. Signed-off-by: Kuniyuki Iwashima --- include/net/sock_reuseport.h | 5 ++- net/core/sock_reuseport.c | 76 +++++++++++++++++++++++++++--------- 2 files changed, 60 insertions(+), 21 deletions(-) diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h index 505f1e18e9bf..0e558ca7afbf 100644 --- a/include/net/sock_reuseport.h +++ b/include/net/sock_reuseport.h @@ -13,8 +13,9 @@ extern spinlock_t reuseport_lock; struct sock_reuseport { struct rcu_head rcu; - u16 max_socks; /* length of socks */ - u16 num_socks; /* elements in socks */ + u16 max_socks; /* length of socks */ + u16 num_socks; /* elements in socks */ + u16 num_closed_socks; /* closed elements in socks */ /* The last synq overflow event timestamp of this * reuse->socks[] group. */ diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c index b065f0a103ed..079bd1aca0e7 100644 --- a/net/core/sock_reuseport.c +++ b/net/core/sock_reuseport.c @@ -18,6 +18,49 @@ DEFINE_SPINLOCK(reuseport_lock); static DEFINE_IDA(reuseport_ida); +static int reuseport_sock_index(struct sock *sk, + struct sock_reuseport *reuse, + bool closed) +{ + int left, right; + + if (!closed) { + left = 0; + right = reuse->num_socks; + } else { + left = reuse->max_socks - reuse->num_closed_socks; + right = reuse->max_socks; + } + + for (; left < right; left++) + if (reuse->socks[left] == sk) + return left; + return -1; +} + +static void __reuseport_add_sock(struct sock *sk, + struct sock_reuseport *reuse) +{ + reuse->socks[reuse->num_socks] = sk; + /* paired with smp_rmb() in reuseport_select_sock() */ + smp_wmb(); + reuse->num_socks++; +} + +static bool __reuseport_detach_sock(struct sock *sk, + struct sock_reuseport *reuse) +{ + int i = reuseport_sock_index(sk, reuse, false); + + if (i == -1) + return false; + + reuse->socks[i] = reuse->socks[reuse->num_socks - 1]; + reuse->num_socks--; + + return true; +} + static struct sock_reuseport *__reuseport_alloc(unsigned int max_socks) { unsigned int size = sizeof(struct sock_reuseport) + @@ -72,9 +115,8 @@ int reuseport_alloc(struct sock *sk, bool bind_inany) } reuse->reuseport_id = id; - reuse->socks[0] = sk; - reuse->num_socks = 1; reuse->bind_inany = bind_inany; + __reuseport_add_sock(sk, reuse); rcu_assign_pointer(sk->sk_reuseport_cb, reuse); out: @@ -98,6 +140,7 @@ static struct sock_reuseport *reuseport_grow(struct sock_reuseport *reuse) return NULL; more_reuse->num_socks = reuse->num_socks; + more_reuse->num_closed_socks = reuse->num_closed_socks; more_reuse->prog = reuse->prog; more_reuse->reuseport_id = reuse->reuseport_id; more_reuse->bind_inany = reuse->bind_inany; @@ -105,9 +148,13 @@ static struct sock_reuseport *reuseport_grow(struct sock_reuseport *reuse) memcpy(more_reuse->socks, reuse->socks, reuse->num_socks * sizeof(struct sock *)); + memcpy(more_reuse->socks + + (more_reuse->max_socks - more_reuse->num_closed_socks), + reuse->socks + reuse->num_socks, + reuse->num_closed_socks * sizeof(struct sock *)); more_reuse->synq_overflow_ts = READ_ONCE(reuse->synq_overflow_ts); - for (i = 0; i < reuse->num_socks; ++i) + for (i = 0; i < reuse->max_socks; ++i) rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb, more_reuse); @@ -158,7 +205,7 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany) return -EBUSY; } - if (reuse->num_socks == reuse->max_socks) { + if (reuse->num_socks + reuse->num_closed_socks == reuse->max_socks) { reuse = reuseport_grow(reuse); if (!reuse) { spin_unlock_bh(&reuseport_lock); @@ -166,10 +213,7 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany) } } - reuse->socks[reuse->num_socks] = sk; - /* paired with smp_rmb() in reuseport_select_sock() */ - smp_wmb(); - reuse->num_socks++; + __reuseport_add_sock(sk, reuse); rcu_assign_pointer(sk->sk_reuseport_cb, reuse); spin_unlock_bh(&reuseport_lock); @@ -183,7 +227,6 @@ EXPORT_SYMBOL(reuseport_add_sock); void reuseport_detach_sock(struct sock *sk) { struct sock_reuseport *reuse; - int i; spin_lock_bh(&reuseport_lock); reuse = rcu_dereference_protected(sk->sk_reuseport_cb, @@ -200,16 +243,11 @@ void reuseport_detach_sock(struct sock *sk) bpf_sk_reuseport_detach(sk); rcu_assign_pointer(sk->sk_reuseport_cb, NULL); + __reuseport_detach_sock(sk, reuse); + + if (reuse->num_socks + reuse->num_closed_socks == 0) + call_rcu(&reuse->rcu, reuseport_free_rcu); - for (i = 0; i < reuse->num_socks; i++) { - if (reuse->socks[i] == sk) { - reuse->socks[i] = reuse->socks[reuse->num_socks - 1]; - reuse->num_socks--; - if (reuse->num_socks == 0) - call_rcu(&reuse->rcu, reuseport_free_rcu); - break; - } - } spin_unlock_bh(&reuseport_lock); } EXPORT_SYMBOL(reuseport_detach_sock); @@ -274,7 +312,7 @@ struct sock *reuseport_select_sock(struct sock *sk, prog = rcu_dereference(reuse->prog); socks = READ_ONCE(reuse->num_socks); if (likely(socks)) { - /* paired with smp_wmb() in reuseport_add_sock() */ + /* paired with smp_wmb() in __reuseport_add_sock() */ smp_rmb(); if (!prog || !skb) -- 2.30.2