Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp2206581pxj; Sun, 16 May 2021 18:35:26 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxeHutMFf0LeKTQxGGm9pOSr6H+bIN4FEtgiMUrI5NCXxvm5df3960KTMazEG0WUWNQQeFW X-Received: by 2002:a05:6402:4313:: with SMTP id m19mr67716801edc.263.1621215326531; Sun, 16 May 2021 18:35:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1621215326; cv=none; d=google.com; s=arc-20160816; b=RVMjAxLPCN5cPzKHCWFSCNGI5IQFLwNo7unXJuAWZSlN8TYuumIHXCxjRLaeTBJE66 icps4YUBkEPgBFHCPCI0SJlNpL7kVFH+Lw7maRhdvue1b5p5afm/wj3rZc47v90mq8Pu R1zsgSAcPZVPeSUecIKtHrch+G7ozfLaoiqZAJZ4OCk5jHxHTVi9fPfWlJQ+S8urfj5I xnCtoQdGd1j1uLlzA9ipU4LW0819abNTyS+QRh5i4ZmthUoH89LwUSId82s8RH0dUg4s M62Is0Tr5+nTYSIos2lAxXRGcP471B/am5GHcA04a0e0UQhz7Wny/nkTKFdQ09g6wpNB Nccg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=g6x1WCv+DxHVu1dnCoY7n/b4DxS2J1V7lVJBJSaEQd0=; b=Of5GkT4RL0LgBflnmVaEE6sILM0IiEZkoXa2uElR0newsvG+5Bpukpmpsr967LXsL/ tV1wiVLCAsDWGge/w063ZEozsP9HCErwoqqbDkiSZTVMAB1rq6ruFMqFCEon2G9uOuGG KB01SCWrQ3HUO49LYTSjrdbbk3Ws02oLue5omOo8HPCIMnZIMNfLx47G0EKHUKCKEMJS 5fRB+X1Kghn6BNyZ82eMR6QhYm6BP1GwO3mcLDBqjC95RPHFcy9wrIaxZ9Y7UjthxMCh Y7lpygYUsR4vcnx1YAFKUwtmrH99uzYvQdDGoOm5sPKBJ5QxWONCO4/Pkv+FyFTVdA/Q FrVQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@amazon.co.jp header.s=amazon201209 header.b=u2ohqiY1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.co.jp Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id p2si12415133edr.154.2021.05.16.18.35.01; Sun, 16 May 2021 18:35:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@amazon.co.jp header.s=amazon201209 header.b=u2ohqiY1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.co.jp Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234814AbhEQAZL (ORCPT + 99 others); Sun, 16 May 2021 20:25:11 -0400 Received: from smtp-fw-80006.amazon.com ([99.78.197.217]:57659 "EHLO smtp-fw-80006.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234789AbhEQAZL (ORCPT ); Sun, 16 May 2021 20:25:11 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.jp; i=@amazon.co.jp; q=dns/txt; s=amazon201209; t=1621211036; x=1652747036; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=g6x1WCv+DxHVu1dnCoY7n/b4DxS2J1V7lVJBJSaEQd0=; b=u2ohqiY1K4S8LF5qvlH5nePJK5seigLgsb5mqoxb7ysN2hlxL+kDlA/w VmFX1D/X/NifenHCyojvY3eOWy/Ino/XHHJJ8fNPsVMqA7PkBcDl1FFCP el18UgzEDM6rSnaX5JnuDmAhdFXMqGVxeqnAIjCE0cy0rU3TkdUbyktev I=; X-IronPort-AV: E=Sophos;i="5.82,306,1613433600"; d="scan'208";a="1587787" Received: from pdx4-co-svc-p1-lb2-vlan2.amazon.com (HELO email-inbound-relay-1d-474bcd9f.us-east-1.amazon.com) ([10.25.36.210]) by smtp-border-fw-80006.pdx80.corp.amazon.com with ESMTP; 17 May 2021 00:23:55 +0000 Received: from EX13MTAUWB001.ant.amazon.com (iad12-ws-svc-p26-lb9-vlan2.iad.amazon.com [10.40.163.34]) by email-inbound-relay-1d-474bcd9f.us-east-1.amazon.com (Postfix) with ESMTPS id 13A52A1C0E; Mon, 17 May 2021 00:23:51 +0000 (UTC) Received: from EX13D04ANC001.ant.amazon.com (10.43.157.89) by EX13MTAUWB001.ant.amazon.com (10.43.161.249) with Microsoft SMTP Server (TLS) id 15.0.1497.18; Mon, 17 May 2021 00:23:51 +0000 Received: from 88665a182662.ant.amazon.com (10.43.162.28) by EX13D04ANC001.ant.amazon.com (10.43.157.89) with Microsoft SMTP Server (TLS) id 15.0.1497.18; Mon, 17 May 2021 00:23:46 +0000 From: Kuniyuki Iwashima To: "David S . Miller" , Jakub Kicinski , Eric Dumazet , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau CC: Benjamin Herrenschmidt , Kuniyuki Iwashima , Kuniyuki Iwashima , , , Subject: [PATCH v6 bpf-next 02/11] tcp: Add num_closed_socks to struct sock_reuseport. Date: Mon, 17 May 2021 09:22:49 +0900 Message-ID: <20210517002258.75019-3-kuniyu@amazon.co.jp> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210517002258.75019-1-kuniyu@amazon.co.jp> References: <20210517002258.75019-1-kuniyu@amazon.co.jp> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-Originating-IP: [10.43.162.28] X-ClientProxiedBy: EX13D21UWA004.ant.amazon.com (10.43.160.252) To EX13D04ANC001.ant.amazon.com (10.43.157.89) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org As noted in the following commit, a closed listener has to hold the reference to the reuseport group for socket migration. This patch adds a field (num_closed_socks) to struct sock_reuseport to manage closed sockets within the same reuseport group. Moreover, this and the following commits introduce some helper functions to split socks[] into two sections and keep TCP_LISTEN and TCP_CLOSE sockets in each section. Like a double-ended queue, we will place TCP_LISTEN sockets from the front and TCP_CLOSE sockets from the end. TCP_LISTEN----------> <-------TCP_CLOSE +---+---+ --- +---+ --- +---+ --- +---+ | 0 | 1 | ... | i | ... | j | ... | k | +---+---+ --- +---+ --- +---+ --- +---+ i = num_socks - 1 j = max_socks - num_closed_socks k = max_socks - 1 This patch also extends reuseport_add_sock() and reuseport_grow() to support num_closed_socks. Signed-off-by: Kuniyuki Iwashima Acked-by: Martin KaFai Lau --- include/net/sock_reuseport.h | 5 ++- net/core/sock_reuseport.c | 76 +++++++++++++++++++++++++++--------- 2 files changed, 60 insertions(+), 21 deletions(-) diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h index 505f1e18e9bf..0e558ca7afbf 100644 --- a/include/net/sock_reuseport.h +++ b/include/net/sock_reuseport.h @@ -13,8 +13,9 @@ extern spinlock_t reuseport_lock; struct sock_reuseport { struct rcu_head rcu; - u16 max_socks; /* length of socks */ - u16 num_socks; /* elements in socks */ + u16 max_socks; /* length of socks */ + u16 num_socks; /* elements in socks */ + u16 num_closed_socks; /* closed elements in socks */ /* The last synq overflow event timestamp of this * reuse->socks[] group. */ diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c index b065f0a103ed..079bd1aca0e7 100644 --- a/net/core/sock_reuseport.c +++ b/net/core/sock_reuseport.c @@ -18,6 +18,49 @@ DEFINE_SPINLOCK(reuseport_lock); static DEFINE_IDA(reuseport_ida); +static int reuseport_sock_index(struct sock *sk, + struct sock_reuseport *reuse, + bool closed) +{ + int left, right; + + if (!closed) { + left = 0; + right = reuse->num_socks; + } else { + left = reuse->max_socks - reuse->num_closed_socks; + right = reuse->max_socks; + } + + for (; left < right; left++) + if (reuse->socks[left] == sk) + return left; + return -1; +} + +static void __reuseport_add_sock(struct sock *sk, + struct sock_reuseport *reuse) +{ + reuse->socks[reuse->num_socks] = sk; + /* paired with smp_rmb() in reuseport_select_sock() */ + smp_wmb(); + reuse->num_socks++; +} + +static bool __reuseport_detach_sock(struct sock *sk, + struct sock_reuseport *reuse) +{ + int i = reuseport_sock_index(sk, reuse, false); + + if (i == -1) + return false; + + reuse->socks[i] = reuse->socks[reuse->num_socks - 1]; + reuse->num_socks--; + + return true; +} + static struct sock_reuseport *__reuseport_alloc(unsigned int max_socks) { unsigned int size = sizeof(struct sock_reuseport) + @@ -72,9 +115,8 @@ int reuseport_alloc(struct sock *sk, bool bind_inany) } reuse->reuseport_id = id; - reuse->socks[0] = sk; - reuse->num_socks = 1; reuse->bind_inany = bind_inany; + __reuseport_add_sock(sk, reuse); rcu_assign_pointer(sk->sk_reuseport_cb, reuse); out: @@ -98,6 +140,7 @@ static struct sock_reuseport *reuseport_grow(struct sock_reuseport *reuse) return NULL; more_reuse->num_socks = reuse->num_socks; + more_reuse->num_closed_socks = reuse->num_closed_socks; more_reuse->prog = reuse->prog; more_reuse->reuseport_id = reuse->reuseport_id; more_reuse->bind_inany = reuse->bind_inany; @@ -105,9 +148,13 @@ static struct sock_reuseport *reuseport_grow(struct sock_reuseport *reuse) memcpy(more_reuse->socks, reuse->socks, reuse->num_socks * sizeof(struct sock *)); + memcpy(more_reuse->socks + + (more_reuse->max_socks - more_reuse->num_closed_socks), + reuse->socks + reuse->num_socks, + reuse->num_closed_socks * sizeof(struct sock *)); more_reuse->synq_overflow_ts = READ_ONCE(reuse->synq_overflow_ts); - for (i = 0; i < reuse->num_socks; ++i) + for (i = 0; i < reuse->max_socks; ++i) rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb, more_reuse); @@ -158,7 +205,7 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany) return -EBUSY; } - if (reuse->num_socks == reuse->max_socks) { + if (reuse->num_socks + reuse->num_closed_socks == reuse->max_socks) { reuse = reuseport_grow(reuse); if (!reuse) { spin_unlock_bh(&reuseport_lock); @@ -166,10 +213,7 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany) } } - reuse->socks[reuse->num_socks] = sk; - /* paired with smp_rmb() in reuseport_select_sock() */ - smp_wmb(); - reuse->num_socks++; + __reuseport_add_sock(sk, reuse); rcu_assign_pointer(sk->sk_reuseport_cb, reuse); spin_unlock_bh(&reuseport_lock); @@ -183,7 +227,6 @@ EXPORT_SYMBOL(reuseport_add_sock); void reuseport_detach_sock(struct sock *sk) { struct sock_reuseport *reuse; - int i; spin_lock_bh(&reuseport_lock); reuse = rcu_dereference_protected(sk->sk_reuseport_cb, @@ -200,16 +243,11 @@ void reuseport_detach_sock(struct sock *sk) bpf_sk_reuseport_detach(sk); rcu_assign_pointer(sk->sk_reuseport_cb, NULL); + __reuseport_detach_sock(sk, reuse); + + if (reuse->num_socks + reuse->num_closed_socks == 0) + call_rcu(&reuse->rcu, reuseport_free_rcu); - for (i = 0; i < reuse->num_socks; i++) { - if (reuse->socks[i] == sk) { - reuse->socks[i] = reuse->socks[reuse->num_socks - 1]; - reuse->num_socks--; - if (reuse->num_socks == 0) - call_rcu(&reuse->rcu, reuseport_free_rcu); - break; - } - } spin_unlock_bh(&reuseport_lock); } EXPORT_SYMBOL(reuseport_detach_sock); @@ -274,7 +312,7 @@ struct sock *reuseport_select_sock(struct sock *sk, prog = rcu_dereference(reuse->prog); socks = READ_ONCE(reuse->num_socks); if (likely(socks)) { - /* paired with smp_wmb() in reuseport_add_sock() */ + /* paired with smp_wmb() in __reuseport_add_sock() */ smp_rmb(); if (!prog || !skb) -- 2.30.2