References: <20231106024413.2801438-1-almasrymina@google.com> <20231106024413.2801438-10-almasrymina@google.com> <19129763-6f74-4b04-8a5f-441255b76d34@kernel.org>
In-Reply-To: <19129763-6f74-4b04-8a5f-441255b76d34@kernel.org>
From: Mina Almasry
Date: Mon, 6 Nov 2023 12:31:44 -0800
Subject: Re: [RFC PATCH v3 09/12] net: add support for skbs with unreadable frags
To: David Ahern
Cc: Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org,
    linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
    linaro-mm-sig@lists.linaro.org, "David S. Miller", Eric Dumazet,
    Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas,
    Arnd Bergmann, Willem de Bruijn, Shuah Khan, Sumit Semwal,
    Christian König, Shakeel Butt, Jeroen de Borst, Praveen Kaligineedi,
    Willem de Bruijn, Kaiyuan Zhang

On Mon, Nov 6, 2023 at 11:34 AM David Ahern wrote:
>
> On 11/6/23 11:47 AM, Stanislav Fomichev wrote:
> > On 11/05, Mina Almasry wrote:
> >> For device memory TCP, we expect the skb headers to be available in host
> >> memory for access, and we expect the skb frags to be in device memory
> >> and unaccessible to the host. We expect there to be no mixing and
> >> matching of device memory frags (unaccessible) with host memory frags
> >> (accessible) in the same skb.
> >>
> >> Add a skb->devmem flag which indicates whether the frags in this skb
> >> are device memory frags or not.
> >>
> >> __skb_fill_page_desc() now checks frags added to skbs for page_pool_iovs,
> >> and marks the skb as skb->devmem accordingly.
> >>
> >> Add checks through the network stack to avoid accessing the frags of
> >> devmem skbs and avoid coalescing devmem skbs with non devmem skbs.
> >>
> >> Signed-off-by: Willem de Bruijn
> >> Signed-off-by: Kaiyuan Zhang
> >> Signed-off-by: Mina Almasry
> >>
> >> ---
> >>  include/linux/skbuff.h | 14 +++++++-
> >>  include/net/tcp.h      |  5 +--
> >>  net/core/datagram.c    |  6 ++++
> >>  net/core/gro.c         |  5 ++-
> >>  net/core/skbuff.c      | 77 ++++++++++++++++++++++++++++++++++++------
> >>  net/ipv4/tcp.c         |  6 ++++
> >>  net/ipv4/tcp_input.c   | 13 +++++--
> >>  net/ipv4/tcp_output.c  |  5 ++-
> >>  net/packet/af_packet.c |  4 +--
> >>  9 files changed, 115 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >> index 1fae276c1353..8fb468ff8115 100644
> >> --- a/include/linux/skbuff.h
> >> +++ b/include/linux/skbuff.h
> >> @@ -805,6 +805,8 @@ typedef unsigned char *sk_buff_data_t;
> >>   *    @csum_level: indicates the number of consecutive checksums found in
> >>   *        the packet minus one that have been verified as
> >>   *        CHECKSUM_UNNECESSARY (max 3)
> >> + *    @devmem: indicates that all the fragments in this skb are backed by
> >> + *        device memory.
> >>   *    @dst_pending_confirm: need to confirm neighbour
> >>   *    @decrypted: Decrypted SKB
> >>   *    @slow_gro: state present at GRO time, slower prepare step required
> >> @@ -991,7 +993,7 @@ struct sk_buff {
> >>  #if IS_ENABLED(CONFIG_IP_SCTP)
> >>      __u8            csum_not_inet:1;
> >>  #endif
> >> -
> >> +    __u8            devmem:1;
> >>  #if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
> >>      __u16           tc_index;    /* traffic control index */
> >>  #endif
> >> @@ -1766,6 +1768,12 @@ static inline void skb_zcopy_downgrade_managed(struct sk_buff *skb)
> >>          __skb_zcopy_downgrade_managed(skb);
> >>  }
> >>
> >> +/* Return true if frags in this skb are not readable by the host. */
> >> +static inline bool skb_frags_not_readable(const struct sk_buff *skb)
> >> +{
> >> +    return skb->devmem;
> >
> > bikeshedding: should we also rename 'devmem' sk_buff flag to 'not_readable'?
> > It better communicates the fact that the stack shouldn't dereference the
> > frags (because it has 'devmem' fragments or for some other potential
> > future reason).
>
> +1.
>
> Also, the flag on the skb is an optimization - a high level signal that
> one or more frags is in unreadable memory. There is no requirement that
> all of the frags are in the same memory type.

The flag indicates that the skb contains all devmem dma-buf memory
specifically, not generic 'not_readable' frags, as the comment says:

+ *    @devmem: indicates that all the fragments in this skb are backed by
+ *        device memory.

The reason it's not a generic 'not_readable' flag is that handing
off a generic not_readable skb to userspace is semantically not
what we're doing. recvmsg() is augmented in this patch series to
return a devmem skb to the user via a cmsg_devmem struct which refers
specifically to the memory in the dma-buf. recvmsg() in this patch
series is not augmented to give any 'not_readable' skb to
userspace.

IMHO skb->devmem + an skb_frags_not_readable() as implemented is
correct. If a new type of unreadable skb is introduced to the stack,
I imagine the stack would implement:

1. new header flag: skb->newmem

2. static inline bool skb_frags_not_readable(const struct sk_buff *skb)
   {
       return skb->devmem || skb->newmem;
   }

3. tcp_recvmsg_devmem() would handle skb->devmem skbs as in this patch
   series, but tcp_recvmsg_newmem() would handle skb->newmem skbs.

--
Thanks,
Mina
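
For illustration only, the three-point scheme above could be modelled
in plain userspace C roughly as follows. This is a sketch, not kernel
code: struct mock_skb, the MOCK_RECV_* values and mock_pick_recv_path()
are hypothetical names invented here, and 'newmem' is just the
placeholder for a possible future memory type. Only the devmem flag and
skb_frags_not_readable() exist in the patch series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff. Only the per-memory-type
 * flag bits are modelled; 'newmem' is an assumed future flag. */
struct mock_skb {
	unsigned int devmem:1;	/* all frags are devmem dma-buf memory */
	unsigned int newmem:1;	/* all frags are some other unreadable type */
};

/* Generic check used by the stack: may the host touch the frags?
 * Mirrors skb_frags_not_readable() extended with the extra flag. */
static inline bool mock_skb_frags_not_readable(const struct mock_skb *skb)
{
	return skb->devmem || skb->newmem;
}

/* Hypothetical receive paths, one per memory type. */
enum mock_recv_path {
	MOCK_RECV_HOST,		/* ordinary readable skb */
	MOCK_RECV_DEVMEM,	/* would map to tcp_recvmsg_devmem() */
	MOCK_RECV_NEWMEM,	/* would map to tcp_recvmsg_newmem() */
};

/* The receive side dispatches on the specific flag, not on the
 * generic "not readable" property. */
static inline enum mock_recv_path
mock_pick_recv_path(const struct mock_skb *skb)
{
	if (skb->devmem)
		return MOCK_RECV_DEVMEM;
	if (skb->newmem)
		return MOCK_RECV_NEWMEM;
	return MOCK_RECV_HOST;
}

/* Small self-check exercising both helpers. */
static inline void mock_self_check(void)
{
	struct mock_skb host = { 0 };
	struct mock_skb dev = { .devmem = 1 };

	assert(!mock_skb_frags_not_readable(&host));
	assert(mock_skb_frags_not_readable(&dev));
	assert(mock_pick_recv_path(&dev) == MOCK_RECV_DEVMEM);
}
```

The point of the split is that the generic helper answers "can the
stack dereference these frags?", while the specific flag tells the
receive path which userspace-facing mechanism applies.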