From: Kuniyuki Iwashima
To: "David S . Miller", Jakub Kicinski, Eric Dumazet, Neal Cardwell,
    Yuchung Cheng, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
    Martin KaFai Lau
CC: Benjamin Herrenschmidt, Kuniyuki Iwashima
Subject: [PATCH v8 bpf-next 04/11] tcp: Add reuseport_migrate_sock() to select a new listener.
Date: Sat, 12 Jun 2021 21:32:17 +0900
Message-ID: <20210612123224.12525-5-kuniyu@amazon.co.jp>
In-Reply-To: <20210612123224.12525-1-kuniyu@amazon.co.jp>
References: <20210612123224.12525-1-kuniyu@amazon.co.jp>
X-Mailing-List: linux-kernel@vger.kernel.org

reuseport_migrate_sock() does the same check done in
reuseport_listen_stop_sock(). If the reuseport group is capable of
migration, reuseport_migrate_sock() selects a new listener by the child
socket hash and increments the listener's sk_refcnt beforehand. Thus, if we
fail in the migration, we have to decrement it later.

We will support migration by eBPF in the later commits.

Signed-off-by: Kuniyuki Iwashima
Signed-off-by: Martin KaFai Lau
Reviewed-by: Eric Dumazet
---
 include/net/sock_reuseport.h |  3 ++
 net/core/sock_reuseport.c    | 78 +++++++++++++++++++++++++++++-------
 2 files changed, 67 insertions(+), 14 deletions(-)

diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h
index 1333d0cddfbc..473b0b0fa4ab 100644
--- a/include/net/sock_reuseport.h
+++ b/include/net/sock_reuseport.h
@@ -37,6 +37,9 @@ extern struct sock *reuseport_select_sock(struct sock *sk,
 					  u32 hash,
 					  struct sk_buff *skb,
 					  int hdr_len);
+struct sock *reuseport_migrate_sock(struct sock *sk,
+				    struct sock *migrating_sk,
+				    struct sk_buff *skb);
 extern int reuseport_attach_prog(struct sock *sk, struct bpf_prog *prog);
 extern int reuseport_detach_prog(struct sock *sk);
 
diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index 41fcd55ab5ae..b239f8cd9d39 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -44,7 +44,7 @@ static void __reuseport_add_sock(struct sock *sk,
 				 struct sock_reuseport *reuse)
 {
 	reuse->socks[reuse->num_socks] = sk;
-	/* paired with smp_rmb() in reuseport_select_sock() */
+	/* paired with smp_rmb() in reuseport_(select|migrate)_sock() */
 	smp_wmb();
 	reuse->num_socks++;
 }
@@ -434,6 +434,23 @@ static struct sock *run_bpf_filter(struct sock_reuseport *reuse, u16 socks,
 	return reuse->socks[index];
 }
 
+static struct sock *reuseport_select_sock_by_hash(struct sock_reuseport *reuse,
+						  u32 hash, u16 num_socks)
+{
+	int i, j;
+
+	i = j = reciprocal_scale(hash, num_socks);
+	while (reuse->socks[i]->sk_state == TCP_ESTABLISHED) {
+		i++;
+		if (i >= num_socks)
+			i = 0;
+		if (i == j)
+			return NULL;
+	}
+
+	return reuse->socks[i];
+}
+
 /**
  *  reuseport_select_sock - Select a socket from an SO_REUSEPORT group.
  *  @sk: First socket in the group.
@@ -477,19 +494,8 @@ struct sock *reuseport_select_sock(struct sock *sk,
 
 select_by_hash:
 		/* no bpf or invalid bpf result: fall back to hash usage */
-		if (!sk2) {
-			int i, j;
-
-			i = j = reciprocal_scale(hash, socks);
-			while (reuse->socks[i]->sk_state == TCP_ESTABLISHED) {
-				i++;
-				if (i >= socks)
-					i = 0;
-				if (i == j)
-					goto out;
-			}
-			sk2 = reuse->socks[i];
-		}
+		if (!sk2)
+			sk2 = reuseport_select_sock_by_hash(reuse, hash, socks);
 	}
 
 out:
@@ -498,6 +504,50 @@ struct sock *reuseport_select_sock(struct sock *sk,
 }
 EXPORT_SYMBOL(reuseport_select_sock);
 
+/**
+ *  reuseport_migrate_sock - Select a socket from an SO_REUSEPORT group.
+ *  @sk: close()ed or shutdown()ed socket in the group.
+ *  @migrating_sk: ESTABLISHED/SYN_RECV full socket in the accept queue or
+ *    NEW_SYN_RECV request socket during 3WHS.
+ *  @skb: skb to run through BPF filter.
+ *  Returns a socket (with sk_refcnt +1) that should accept the child socket
+ *  (or NULL on error).
+ */
+struct sock *reuseport_migrate_sock(struct sock *sk,
+				    struct sock *migrating_sk,
+				    struct sk_buff *skb)
+{
+	struct sock_reuseport *reuse;
+	struct sock *nsk = NULL;
+	u16 socks;
+	u32 hash;
+
+	rcu_read_lock();
+
+	reuse = rcu_dereference(sk->sk_reuseport_cb);
+	if (!reuse)
+		goto out;
+
+	socks = READ_ONCE(reuse->num_socks);
+	if (unlikely(!socks))
+		goto out;
+
+	/* paired with smp_wmb() in __reuseport_add_sock() */
+	smp_rmb();
+
+	hash = migrating_sk->sk_hash;
+	if (sock_net(sk)->ipv4.sysctl_tcp_migrate_req)
+		nsk = reuseport_select_sock_by_hash(reuse, hash, socks);
+
+	if (nsk && unlikely(!refcount_inc_not_zero(&nsk->sk_refcnt)))
+		nsk = NULL;
+
+out:
+	rcu_read_unlock();
+	return nsk;
+}
+EXPORT_SYMBOL(reuseport_migrate_sock);
+
 int reuseport_attach_prog(struct sock *sk, struct bpf_prog *prog)
 {
 	struct sock_reuseport *reuse;
-- 
2.30.2
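
Note for readers: the selection and reference-count behaviour described in
the commit message can be tried out in userspace. The following is only a
minimal sketch under stated assumptions, not the kernel implementation.
Every demo_* name is hypothetical, struct demo_sock stands in for struct
sock, and C11 atomics stand in for refcount_t. RCU, the memory barriers, and
the tcp_migrate_req sysctl check are deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types; not the real struct sock. */
enum demo_state { DEMO_LISTEN, DEMO_ESTABLISHED };

struct demo_sock {
	enum demo_state state;
	atomic_uint refcnt;
};

/* Map a 32-bit hash onto [0, n) without a division. */
static uint32_t demo_reciprocal_scale(uint32_t val, uint32_t n)
{
	return (uint32_t)(((uint64_t)val * n) >> 32);
}

/* Start at the hashed slot and probe forward, skipping ESTABLISHED entries. */
static struct demo_sock *demo_select_by_hash(struct demo_sock **socks,
					     uint32_t hash, uint16_t num_socks)
{
	uint32_t i, j;

	assert(num_socks > 0);
	i = j = demo_reciprocal_scale(hash, num_socks);
	while (socks[i]->state == DEMO_ESTABLISHED) {
		i++;
		if (i >= num_socks)
			i = 0;
		if (i == j)
			return NULL;	/* wrapped around: nothing usable */
	}
	return socks[i];
}

/* Take a reference only if the count has not already dropped to zero. */
static bool demo_ref_inc_not_zero(atomic_uint *ref)
{
	unsigned int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Pick a new listener by hash and pin it. On success the caller owns one
 * reference and has to drop it again if the migration fails later.
 */
static struct demo_sock *demo_migrate_sock(struct demo_sock **socks,
					   uint32_t hash, uint16_t num_socks)
{
	struct demo_sock *nsk;

	if (!num_socks)
		return NULL;
	nsk = demo_select_by_hash(socks, hash, num_socks);
	if (nsk && !demo_ref_inc_not_zero(&nsk->refcnt))
		nsk = NULL;
	return nsk;
}
```

With a two-entry group whose first slot is ESTABLISHED, a hash that maps to
slot 0 probes forward and returns the second entry with its count raised by
one. A candidate whose count is already zero is reported as NULL instead of
being revived, which is why the caller can treat a non-NULL return as a
reference it owns.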