From: Bobby Eshleman
Date: Wed, 19 Jul 2023 00:50:09 +0000
Subject: [PATCH RFC net-next v5 05/14] af_vsock: use a separate dgram bind table
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Message-Id: <20230413-b4-vsock-dgram-v5-5-581bd37fdb26@bytedance.com>
References: <20230413-b4-vsock-dgram-v5-0-581bd37fdb26@bytedance.com>
In-Reply-To: <20230413-b4-vsock-dgram-v5-0-581bd37fdb26@bytedance.com>
To: Stefan Hajnoczi, Stefano Garzarella, "Michael S. Tsirkin", Jason Wang,
 Xuan Zhuo, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 "K. Y.
 Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
 VMware PV-Drivers Reviewers
Cc: Dan Carpenter, Simon Horman, Krasnov Arseniy, kvm@vger.kernel.org,
 virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
 bpf@vger.kernel.org, Bobby Eshleman
X-Mailer: b4 0.12.2
X-Mailing-List: linux-kernel@vger.kernel.org

This commit adds support for bound dgram sockets to be tracked in a
separate bind table from connectible sockets in order to avoid address
collisions. With this commit, users can simultaneously bind a dgram
socket and connectible socket to the same CID and port.

Signed-off-by: Bobby Eshleman
---
 net/vmw_vsock/af_vsock.c | 103 ++++++++++++++++++++++++++++++++++-------------
 1 file changed, 76 insertions(+), 27 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 88100154156c..0895f4c1d340 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -10,18 +10,23 @@
  * - There are two kinds of sockets: those created by user action (such as
  * calling socket(2)) and those created by incoming connection request packets.
  *
- * - There are two "global" tables, one for bound sockets (sockets that have
- * specified an address that they are responsible for) and one for connected
- * sockets (sockets that have established a connection with another socket).
- * These tables are "global" in that all sockets on the system are placed
- * within them. - Note, though, that the bound table contains an extra entry
- * for a list of unbound sockets and SOCK_DGRAM sockets will always remain in
- * that list. The bound table is used solely for lookup of sockets when packets
- * are received and that's not necessary for SOCK_DGRAM sockets since we create
- * a datagram handle for each and need not perform a lookup. Keeping SOCK_DGRAM
- * sockets out of the bound hash buckets will reduce the chance of collisions
- * when looking for SOCK_STREAM sockets and prevents us from having to check the
- * socket type in the hash table lookups.
+ * - There are three "global" tables, one for bound connectible (stream /
+ * seqpacket) sockets, one for bound datagram sockets, and one for connected
+ * sockets. Bound sockets are sockets that have specified an address that
+ * they are responsible for. Connected sockets are sockets that have
+ * established a connection with another socket. These tables are "global" in
+ * that all sockets on the system are placed within them. - Note, though,
+ * that the bound tables contain an extra entry for a list of unbound
+ * sockets. The bound tables are used solely for lookup of sockets when packets
+ * are received.
+ *
+ * - There are separate bind tables for connectible and datagram sockets to avoid
+ * address collisions between stream/seqpacket sockets and datagram sockets.
+ *
+ * - Transports may elect to NOT use the global datagram bind table by
+ * implementing the ->dgram_bind() callback. If that callback is implemented,
+ * the global bind table is not used and the responsibility of bound datagram
+ * socket tracking is deferred to the transport.
  *
  * - Sockets created by user action will either be "client" sockets that
  * initiate a connection or "server" sockets that listen for connections; we do
@@ -115,6 +120,7 @@
 static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
 static void vsock_sk_destruct(struct sock *sk);
 static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
+static bool sock_type_connectible(u16 type);
 
 /* Protocol family. */
 struct proto vsock_proto = {
@@ -151,21 +157,25 @@ static DEFINE_MUTEX(vsock_register_mutex);
  * VSocket is stored in the connected hash table.
  *
  * Unbound sockets are all put on the same list attached to the end of the hash
- * table (vsock_unbound_sockets). Bound sockets are added to the hash table in
- * the bucket that their local address hashes to (vsock_bound_sockets(addr)
- * represents the list that addr hashes to).
+ * tables (vsock_unbound_sockets/vsock_unbound_dgram_sockets). Bound sockets
+ * are added to the hash table in the bucket that their local address hashes to
+ * (vsock_bound_sockets(addr) and vsock_bound_dgram_sockets(addr) represents
+ * the list that addr hashes to).
  *
- * Specifically, we initialize the vsock_bind_table array to a size of
- * VSOCK_HASH_SIZE + 1 so that vsock_bind_table[0] through
- * vsock_bind_table[VSOCK_HASH_SIZE - 1] are for bound sockets and
- * vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets. The hash function
- * mods with VSOCK_HASH_SIZE to ensure this.
+ * Specifically, taking connectible sockets as an example we initialize the
+ * vsock_bind_table array to a size of VSOCK_HASH_SIZE + 1 so that
+ * vsock_bind_table[0] through vsock_bind_table[VSOCK_HASH_SIZE - 1] are for
+ * bound sockets and vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets.
+ * The hash function mods with VSOCK_HASH_SIZE to ensure this.
+ * Datagrams and vsock_dgram_bind_table operate in the same way.
  */
 #define MAX_PORT_RETRIES 24
 
 #define VSOCK_HASH(addr) ((addr)->svm_port % VSOCK_HASH_SIZE)
 #define vsock_bound_sockets(addr) (&vsock_bind_table[VSOCK_HASH(addr)])
+#define vsock_bound_dgram_sockets(addr) (&vsock_dgram_bind_table[VSOCK_HASH(addr)])
 #define vsock_unbound_sockets (&vsock_bind_table[VSOCK_HASH_SIZE])
+#define vsock_unbound_dgram_sockets (&vsock_dgram_bind_table[VSOCK_HASH_SIZE])
 
 /* XXX This can probably be implemented in a better way. */
 #define VSOCK_CONN_HASH(src, dst) \
@@ -181,6 +191,8 @@ struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
 EXPORT_SYMBOL_GPL(vsock_connected_table);
 DEFINE_SPINLOCK(vsock_table_lock);
 EXPORT_SYMBOL_GPL(vsock_table_lock);
+static struct list_head vsock_dgram_bind_table[VSOCK_HASH_SIZE + 1];
+static DEFINE_SPINLOCK(vsock_dgram_table_lock);
 
 /* Autobind this socket to the local address if necessary. */
 static int vsock_auto_bind(struct vsock_sock *vsk)
@@ -203,6 +215,9 @@ static void vsock_init_tables(void)
 
 	for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
 		INIT_LIST_HEAD(&vsock_connected_table[i]);
+
+	for (i = 0; i < ARRAY_SIZE(vsock_dgram_bind_table); i++)
+		INIT_LIST_HEAD(&vsock_dgram_bind_table[i]);
 }
 
 static void __vsock_insert_bound(struct list_head *list,
@@ -270,13 +285,28 @@ static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
 	return NULL;
 }
 
-static void vsock_insert_unbound(struct vsock_sock *vsk)
+static void __vsock_insert_dgram_unbound(struct vsock_sock *vsk)
+{
+	spin_lock_bh(&vsock_dgram_table_lock);
+	__vsock_insert_bound(vsock_unbound_dgram_sockets, vsk);
+	spin_unlock_bh(&vsock_dgram_table_lock);
+}
+
+static void __vsock_insert_connectible_unbound(struct vsock_sock *vsk)
 {
 	spin_lock_bh(&vsock_table_lock);
 	__vsock_insert_bound(vsock_unbound_sockets, vsk);
 	spin_unlock_bh(&vsock_table_lock);
 }
 
+static void vsock_insert_unbound(struct vsock_sock *vsk)
+{
+	if (sock_type_connectible(sk_vsock(vsk)->sk_type))
+		__vsock_insert_connectible_unbound(vsk);
+	else
+		__vsock_insert_dgram_unbound(vsk);
+}
+
 void vsock_insert_connected(struct vsock_sock *vsk)
 {
 	struct list_head *list = vsock_connected_sockets(
@@ -288,6 +318,14 @@ void vsock_insert_connected(struct vsock_sock *vsk)
 }
 EXPORT_SYMBOL_GPL(vsock_insert_connected);
 
+static void vsock_remove_dgram_bound(struct vsock_sock *vsk)
+{
+	spin_lock_bh(&vsock_dgram_table_lock);
+	if (__vsock_in_bound_table(vsk))
+		__vsock_remove_bound(vsk);
+	spin_unlock_bh(&vsock_dgram_table_lock);
+}
+
 void vsock_remove_bound(struct vsock_sock *vsk)
 {
 	spin_lock_bh(&vsock_table_lock);
@@ -339,7 +377,10 @@ EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
 
 void vsock_remove_sock(struct vsock_sock *vsk)
 {
-	vsock_remove_bound(vsk);
+	if (sock_type_connectible(sk_vsock(vsk)->sk_type))
+		vsock_remove_bound(vsk);
+	else
+		vsock_remove_dgram_bound(vsk);
 	vsock_remove_connected(vsk);
 }
 EXPORT_SYMBOL_GPL(vsock_remove_sock);
@@ -722,11 +763,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
 	return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
 }
 
-static int __vsock_bind_dgram(struct vsock_sock *vsk,
-			      struct sockaddr_vm *addr)
+static int vsock_bind_dgram(struct vsock_sock *vsk,
+			    struct sockaddr_vm *addr)
 {
-	if (!vsk->transport || !vsk->transport->dgram_bind)
-		return -EINVAL;
+	if (!vsk->transport || !vsk->transport->dgram_bind) {
+		int retval;
+
+		spin_lock_bh(&vsock_dgram_table_lock);
+		retval = vsock_bind_common(vsk, addr, vsock_dgram_bind_table,
+					   VSOCK_HASH_SIZE);
+		spin_unlock_bh(&vsock_dgram_table_lock);
+
+		return retval;
+	}
 
 	return vsk->transport->dgram_bind(vsk, addr);
 }
@@ -757,7 +806,7 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
 		break;
 
 	case SOCK_DGRAM:
-		retval = __vsock_bind_dgram(vsk, addr);
+		retval = vsock_bind_dgram(vsk, addr);
 		break;
 
 	default:

-- 
2.30.2
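
For readers who want to see the effect of the split tables outside the kernel, here is a minimal userspace sketch. It is not kernel code and not part of the patch: every toy_* name is hypothetical, the bucket count of 251 is an assumed stand-in for VSOCK_HASH_SIZE, and locking, VMADDR_CID_ANY handling and the unbound list are left out. It only shows that a connectible socket and a datagram socket can hold the same CID and port once each socket type looks up and inserts into its own table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed stand-in for VSOCK_HASH_SIZE; hypothetical for this sketch. */
#define TOY_HASH_SIZE 251u
/* Same idea as VSOCK_HASH(): bucket index is the port modulo the size. */
#define TOY_HASH(port) ((port) % TOY_HASH_SIZE)

/* Hypothetical, simplified stand-in for a bound vsock socket. */
struct toy_sock {
	unsigned int cid;
	unsigned int port;
	bool connectible;	/* true: stream/seqpacket, false: datagram */
	struct toy_sock *next;	/* singly linked bucket chain */
};

/*
 * Two independent tables. Buckets 0..TOY_HASH_SIZE-1 hold bound sockets.
 * The extra last bucket mirrors the slot the patch reserves for unbound
 * sockets; this sketch never uses it.
 */
static struct toy_sock *toy_bind_table[TOY_HASH_SIZE + 1];
static struct toy_sock *toy_dgram_bind_table[TOY_HASH_SIZE + 1];

/* Select the table by socket type, as the patch does for insert/remove. */
static struct toy_sock **toy_table_for(const struct toy_sock *sk)
{
	return sk->connectible ? toy_bind_table : toy_dgram_bind_table;
}

/* Walk one bucket of one table looking for an exact (cid, port) match. */
static bool toy_addr_in_use(struct toy_sock **table,
			    unsigned int cid, unsigned int port)
{
	struct toy_sock *s;

	for (s = table[TOY_HASH(port)]; s != NULL; s = s->next)
		if (s->cid == cid && s->port == port)
			return true;
	return false;
}

/* Returns 0 on success, -1 if the address is taken in this type's table. */
static int toy_bind(struct toy_sock *sk, unsigned int cid, unsigned int port)
{
	struct toy_sock **table = toy_table_for(sk);

	if (toy_addr_in_use(table, cid, port))
		return -1;

	sk->cid = cid;
	sk->port = port;
	sk->next = table[TOY_HASH(port)];	/* push onto the bucket chain */
	table[TOY_HASH(port)] = sk;
	return 0;
}
```

In this model, binding one connectible and one datagram toy_sock to CID 3, port 1234 succeeds for both, while a second datagram toy_sock asking for the same address gets -1. Collisions are only detected within a table, never across the two.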