Date: Mon, 6 Nov 2023 13:17:59 -0800
References: <20231106024413.2801438-1-almasrymina@google.com>
 <20231106024413.2801438-11-almasrymina@google.com>
Subject: Re: [RFC PATCH v3 10/12] tcp: RX path for devmem TCP
From: Stanislav Fomichev
To: Mina Almasry
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
 linaro-mm-sig@lists.linaro.org, "David S. Miller", Eric Dumazet,
 Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas,
 Arnd Bergmann, David Ahern, Willem de Bruijn, Shuah Khan, Sumit Semwal,
 Christian König, Shakeel Butt, Jeroen de Borst, Praveen Kaligineedi,
 Willem de Bruijn, Kaiyuan Zhang
List-ID: linux-kernel@vger.kernel.org

On 11/06, Mina Almasry wrote:
> On Mon, Nov 6, 2023 at 10:44 AM Stanislav Fomichev wrote:
> >
> > On 11/05, Mina Almasry wrote:
> > > In tcp_recvmsg_locked(), detect if the skb being received by the user
> > > is a devmem skb. In this case - if the user provided the MSG_SOCK_DEVMEM
> > > flag - pass it to tcp_recvmsg_devmem() for custom handling.
> > >
> > > tcp_recvmsg_devmem() copies any data in the skb header to the linear
> > > buffer, and returns a cmsg to the user indicating the number of bytes
> > > returned in the linear buffer.
> > >
> > > tcp_recvmsg_devmem() then loops over the unaccessible devmem skb frags,
> > > and returns to the user a cmsg_devmem indicating the location of the
> > > data in the dmabuf device memory. cmsg_devmem contains this information:
> > >
> > > 1. the offset into the dmabuf where the payload starts. 'frag_offset'.
> > > 2. the size of the frag. 'frag_size'.
> > > 3. an opaque token 'frag_token' to return to the kernel when the buffer
> > > is to be released.
> > >
> > > The pages awaiting freeing are stored in the newly added
> > > sk->sk_user_pages, and each page passed to userspace is get_page()'d.
> > > This reference is dropped once the userspace indicates that it is
> > > done reading this page. All pages are released when the socket is
> > > destroyed.
> > >
> > > Signed-off-by: Willem de Bruijn
> > > Signed-off-by: Kaiyuan Zhang
> > > Signed-off-by: Mina Almasry
> > >
> > > ---
> > >
> > > RFC v3:
> > > - Fixed issue with put_cmsg() failing silently.
> > >
> > > ---
> > >  include/linux/socket.h            |   1 +
> > >  include/net/page_pool/helpers.h   |   9 ++
> > >  include/net/sock.h                |   2 +
> > >  include/uapi/asm-generic/socket.h |   5 +
> > >  include/uapi/linux/uio.h          |   6 +
> > >  net/ipv4/tcp.c                    | 189 +++++++++++++++++++++++++++++-
> > >  net/ipv4/tcp_ipv4.c               |   7 ++
> > >  7 files changed, 214 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/include/linux/socket.h b/include/linux/socket.h
> > > index cfcb7e2c3813..fe2b9e2081bb 100644
> > > --- a/include/linux/socket.h
> > > +++ b/include/linux/socket.h
> > > @@ -326,6 +326,7 @@ struct ucred {
> > >                                         * plain text and require encryption
> > >                                         */
> > >
> > > +#define MSG_SOCK_DEVMEM 0x2000000     /* Receive devmem skbs as cmsg */
> >
> > Sharing the feedback that I've been providing internally on the public list:
> >
>
> There may have been a miscommunication. I don't recall hearing this
> specific feedback from you, at least in the last few months. Sorry if
> it seemed like I'm ignoring feedback :)

No worries, there was a thread a long time ago about this whole token
interface and whether it should support out-of-order refills, etc.

> > IMHO, we need a better UAPI to receive the tokens and give them back to
> > the kernel. CMSG + setsockopt(SO_DEVMEM_DONTNEED) get the job done,
> > but look dated and hacky :-(
> >
> > We should either do some kind of user/kernel shared memory queue to
> > receive/return the tokens (similar to what Jonathan was doing in his
> > proposal?)
>
> I'll take a look at Jonathan's proposal, sorry, I'm not immediately
> familiar but I wanted to respond :-) But is the suggestion here to
> build a new kernel-user communication channel primitive for the
> purpose of passing the information in the devmem cmsg? IMHO that seems
> like an overkill. Why add 100-200 lines of code to the kernel to add
> something that can already be done with existing primitives? I don't
> see anything concretely wrong with cmsg & setsockopt approach, and if
> we switch to something I'd prefer to switch to an existing primitive
> for simplicity?
>
> The only other existing primitive to pass data outside of the linear
> buffer is the MSG_ERRQUEUE that is used for zerocopy. Is that
> preferred? Any other suggestions or existing primitives I'm not aware
> of?

I guess I'm just wondering whether other people have any suggestions
here. Not sure Jonathan's way was better, but we fundamentally
have two queues between the kernel and the userspace:
- userspace receiving tokens (recvmsg + magical flag)
- userspace refilling tokens (setsockopt + magical flag)

So having some kind of shared memory producer-consumer queue feels natural.
And using the 'classic' socket API here feels like a stretch, idk.

But maybe I'm overthinking and overcomplicating :-)

> > or bite the bullet and switch to io_uring.
> >
>
> IMO io_uring & socket support are orthogonal, and one doesn't preclude
> the other. As you know we like to use sockets and I believe there are
> issues with io_uring adoption at Google that I'm not familiar with
> (and could be wrong). I'm interested in exploring io_uring support as
> a follow up but I think David Wei will be interested in io_uring
> support as well anyway.

Ack, might be one more reason on our side to adopt io_uring :-p

> > I was also suggesting to do it via netlink initially, but it's probably
> > a bit slow for these purpose, idk.
>
> Yeah, I hear netlink is reserved for control paths and is
> inappropriate for data path, but I'll let folks correct me if wrong.
>
> -- 
> Thanks,
> Mina
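
To make the "shared memory producer-consumer queue" idea from this thread a
bit more concrete, here is a minimal single-producer/single-consumer ring
sketch in plain C11. It is only an illustration of the general technique:
struct token_ring, ring_push(), ring_pop(), RING_SIZE and the 32-bit token
type are all hypothetical names and choices, and none of this comes from
the patch series or from Jonathan's proposal. For the receive direction the
kernel would be the producer and userspace the consumer; the refill
direction would be a second ring with the roles swapped.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring capacity; must be a power of two so that the
 * free-running indices can be masked instead of taken modulo. */
#define RING_SIZE 8u

_Static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
	       "RING_SIZE must be a power of two");

/* Hypothetical layout of a region shared between one producer and one
 * consumer. head is only written by the producer, tail only by the
 * consumer. */
struct token_ring {
	_Atomic uint32_t head;		/* next slot to fill */
	_Atomic uint32_t tail;		/* next slot to drain */
	uint32_t tokens[RING_SIZE];	/* opaque tokens (assumed 32-bit) */
};

/* Producer side: returns false when the ring is full. */
static bool ring_push(struct token_ring *r, uint32_t token)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return false;

	r->tokens[head & (RING_SIZE - 1)] = token;
	/* Release: the slot write must be visible before the new head. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool ring_pop(struct token_ring *r, uint32_t *token)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;

	*token = r->tokens[tail & (RING_SIZE - 1)];
	/* Release: the slot read must complete before the slot is reused. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

A ring like this hands tokens over in FIFO order without a syscall per
token, but it leaves the points raised above open: out-of-order refills,
how the consumer sleeps and gets woken, and how the kernel would validate
what userspace writes into the shared region.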