From: Mina Almasry
Date: Mon, 6 Nov 2023 11:29:33 -0800
Subject: Re: [RFC PATCH v3 10/12] tcp: RX path for devmem TCP
To: Stanislav Fomichev
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas, Arnd Bergmann, David Ahern, Willem de Bruijn, Shuah Khan, Sumit Semwal, Christian König, Shakeel Butt, Jeroen de Borst, Praveen Kaligineedi, Willem de Bruijn, Kaiyuan Zhang

On Mon, Nov 6, 2023 at 10:44 AM Stanislav Fomichev wrote:
>
> On 11/05, Mina Almasry wrote:
> > In tcp_recvmsg_locked(), detect if the skb being received by the user
> > is a devmem skb. In this case - if the user provided the MSG_SOCK_DEVMEM
> > flag - pass it to tcp_recvmsg_devmem() for custom handling.
> >
> > tcp_recvmsg_devmem() copies any data in the skb header to the linear
> > buffer, and returns a cmsg to the user indicating the number of bytes
> > returned in the linear buffer.
> >
> > tcp_recvmsg_devmem() then loops over the unaccessible devmem skb frags,
> > and returns to the user a cmsg_devmem indicating the location of the
> > data in the dmabuf device memory. cmsg_devmem contains this information:
> >
> > 1. the offset into the dmabuf where the payload starts. 'frag_offset'.
> > 2. the size of the frag. 'frag_size'.
> > 3. an opaque token 'frag_token' to return to the kernel when the buffer
> > is to be released.
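To make the receive-side contract above concrete, here is a minimal userspace sketch of walking the control messages a devmem recvmsg() would return: one cmsg carrying the byte count placed in the linear buffer, then one per frag carrying frag_offset, frag_size and frag_token. It is an illustration under stated assumptions, not the patch's UAPI: the cmsg type values (DEMO_SCM_DEVMEM_LINEAR, DEMO_SCM_DEVMEM_FRAG), the struct demo_cmsg_devmem layout and all demo_* helpers are hypothetical, because the include/uapi hunks that define the real ones are not quoted in this mail. demo_put_cmsg() is a stand-in for the kernel's put_cmsg() so the parser can be exercised without devmem-capable hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical placeholder cmsg types; NOT the values used by the patch. */
#define DEMO_SCM_DEVMEM_LINEAR 0x7f01 /* data: uint32_t bytes in linear buffer */
#define DEMO_SCM_DEVMEM_FRAG   0x7f02 /* data: struct demo_cmsg_devmem */

/* Hypothetical layout mirroring the three fields in the commit message. */
struct demo_cmsg_devmem {
	uint64_t frag_offset; /* where the payload starts in the dmabuf */
	uint32_t frag_size;   /* length of the frag */
	uint32_t frag_token;  /* opaque token to hand back to the kernel */
};

#define DEMO_MAX_FRAGS 16

struct demo_rx_summary {
	uint32_t linear_bytes;           /* bytes copied into the user buffer */
	uint64_t devmem_bytes;           /* bytes that stayed in device memory */
	unsigned int nfrags;
	uint32_t tokens[DEMO_MAX_FRAGS]; /* tokens to return to the kernel later */
};

/* Stand-in for put_cmsg(): append one SOL_SOCKET cmsg to an aligned control
 * buffer. Returns the new number of bytes used, or 0 if it does not fit. */
static size_t demo_put_cmsg(void *ctrl, size_t used, size_t cap,
			    int type, const void *data, size_t len)
{
	struct cmsghdr *cm = (struct cmsghdr *)((char *)ctrl + used);

	if (used + CMSG_SPACE(len) > cap)
		return 0;
	memset(cm, 0, CMSG_SPACE(len));
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = type;
	cm->cmsg_len = CMSG_LEN(len);
	memcpy(CMSG_DATA(cm), data, len);
	return used + CMSG_SPACE(len);
}

/* Userspace side: walk the cmsgs of a received msghdr, total the bytes and
 * collect the frag tokens. Returns -1 if there are more frags than tracked. */
static int demo_parse_devmem_cmsgs(struct msghdr *msg, struct demo_rx_summary *out)
{
	struct cmsghdr *cm;

	memset(out, 0, sizeof(*out));
	for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level != SOL_SOCKET)
			continue;
		if (cm->cmsg_type == DEMO_SCM_DEVMEM_LINEAR) {
			memcpy(&out->linear_bytes, CMSG_DATA(cm), sizeof(out->linear_bytes));
		} else if (cm->cmsg_type == DEMO_SCM_DEVMEM_FRAG) {
			struct demo_cmsg_devmem frag;

			if (out->nfrags == DEMO_MAX_FRAGS)
				return -1;
			memcpy(&frag, CMSG_DATA(cm), sizeof(frag));
			out->devmem_bytes += frag.frag_size;
			out->tokens[out->nfrags++] = frag.frag_token;
		}
	}
	return 0;
}
```

The tokens collected in tokens[] are what userspace would later hand back once it has finished with the corresponding device memory.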
> >
> > The pages awaiting freeing are stored in the newly added
> > sk->sk_user_pages, and each page passed to userspace is get_page()'d.
> > This reference is dropped once the userspace indicates that it is
> > done reading this page. All pages are released when the socket is
> > destroyed.
> >
> > Signed-off-by: Willem de Bruijn
> > Signed-off-by: Kaiyuan Zhang
> > Signed-off-by: Mina Almasry
> >
> > ---
> >
> > RFC v3:
> > - Fixed issue with put_cmsg() failing silently.
> >
> > ---
> >  include/linux/socket.h            |   1 +
> >  include/net/page_pool/helpers.h   |   9 ++
> >  include/net/sock.h                |   2 +
> >  include/uapi/asm-generic/socket.h |   5 +
> >  include/uapi/linux/uio.h          |   6 +
> >  net/ipv4/tcp.c                    | 189 +++++++++++++++++++++++++++++-
> >  net/ipv4/tcp_ipv4.c               |   7 ++
> >  7 files changed, 214 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/linux/socket.h b/include/linux/socket.h
> > index cfcb7e2c3813..fe2b9e2081bb 100644
> > --- a/include/linux/socket.h
> > +++ b/include/linux/socket.h
> > @@ -326,6 +326,7 @@ struct ucred {
> >                                           * plain text and require encryption
> >                                           */
> >
> > +#define MSG_SOCK_DEVMEM 0x2000000      /* Receive devmem skbs as cmsg */
>
> Sharing the feedback that I've been providing internally on the public list:
>

There may have been a miscommunication. I don't recall hearing this
specific feedback from you, at least in the last few months. Sorry if
it seemed like I was ignoring feedback :)

> IMHO, we need a better UAPI to receive the tokens and give them back to
> the kernel. CMSG + setsockopt(SO_DEVMEM_DONTNEED) get the job done,
> but look dated and hacky :-(
>
> We should either do some kind of user/kernel shared memory queue to
> receive/return the tokens (similar to what Jonathan was doing in his
> proposal?)
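The page lifetime described in the quoted commit message (a get_page() reference taken for each page handed to userspace, dropped when userspace returns the token, and anything left over released at socket destruction) is the bookkeeping that any token-return channel has to drive. The following is a toy userspace model under assumptions, not kernel code: a "page" is only a refcount assumed to start at 1 for its owner, tokens are plain array indices rather than the patch's opaque values, and demo_give_to_user(), demo_dontneed() and demo_sock_destroy() are hypothetical stand-ins for the hand-off in tcp_recvmsg_devmem(), the setsockopt(SO_DEVMEM_DONTNEED) return path, and socket teardown.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only, not kernel code: a "page" is reduced to its refcount. */
struct demo_page {
	int refcount; /* assumed to start at 1 for the page's owner */
};

#define DEMO_MAX_USER_PAGES 64

/* Hypothetical stand-in for sk->sk_user_pages: slot index == token. */
struct demo_sock {
	struct demo_page *user_pages[DEMO_MAX_USER_PAGES];
};

/* Hand a page to userspace: take a reference (like get_page()) and
 * return the token naming it, or -1 if no slot is free. */
static int demo_give_to_user(struct demo_sock *sk, struct demo_page *page)
{
	for (int tok = 0; tok < DEMO_MAX_USER_PAGES; tok++) {
		if (sk->user_pages[tok] == NULL) {
			page->refcount++;
			sk->user_pages[tok] = page;
			return tok;
		}
	}
	return -1;
}

/* Userspace is done with these tokens (the role SO_DEVMEM_DONTNEED plays):
 * drop each page's reference (like put_page()). Unknown or already
 * returned tokens are skipped. Returns how many tokens were released. */
static int demo_dontneed(struct demo_sock *sk, const uint32_t *tokens, size_t n)
{
	int released = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t tok = tokens[i];

		if (tok >= DEMO_MAX_USER_PAGES || sk->user_pages[tok] == NULL)
			continue;
		sk->user_pages[tok]->refcount--;
		sk->user_pages[tok] = NULL;
		released++;
	}
	return released;
}

/* Socket destruction: drop every reference userspace never gave back.
 * Returns how many references were dropped. */
static int demo_sock_destroy(struct demo_sock *sk)
{
	int released = 0;

	for (int tok = 0; tok < DEMO_MAX_USER_PAGES; tok++) {
		if (sk->user_pages[tok] != NULL) {
			sk->user_pages[tok]->refcount--;
			sk->user_pages[tok] = NULL;
			released++;
		}
	}
	return released;
}
```

In this sketch a stale or duplicate token is silently ignored rather than reported as an error; how the actual patch treats that case is not shown in this excerpt.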

I'll take a look at Jonathan's proposal; sorry, I'm not immediately
familiar with it, but I wanted to respond :-) But is the suggestion
here to build a new kernel-user communication channel primitive for
the purpose of passing the information in the devmem cmsg? IMHO that
seems like overkill. Why add 100-200 lines of code to the kernel to
add something that can already be done with existing primitives? I
don't see anything concretely wrong with the cmsg & setsockopt
approach, and if we switch to something else, I'd prefer to switch to
an existing primitive for simplicity.

The only other existing primitive to pass data outside of the linear
buffer is the MSG_ERRQUEUE that is used for zerocopy. Is that
preferred? Any other suggestions or existing primitives I'm not aware
of?

> or bite the bullet and switch to io_uring.
>

IMO io_uring & socket support are orthogonal, and one doesn't preclude
the other. As you know, we like to use sockets, and I believe there are
issues with io_uring adoption at Google that I'm not familiar with (I
could be wrong). I'm interested in exploring io_uring support as a
follow-up, but I think David Wei will be interested in io_uring support
as well anyway.

> I was also suggesting to do it via netlink initially, but it's probably
> a bit slow for these purpose, idk.

Yeah, I hear netlink is reserved for control paths and is inappropriate
for the data path, but I'll let folks correct me if I'm wrong.

--
Thanks,
Mina