References: <20231217080913.2025973-1-almasrymina@google.com>
 <20231217080913.2025973-4-almasrymina@google.com>
 <1195676f-59a4-40d8-b459-d2668eb8c5fe@huawei.com>
In-Reply-To: <1195676f-59a4-40d8-b459-d2668eb8c5fe@huawei.com>
From: Mina Almasry
Date: Mon, 18 Dec 2023 12:22:11 -0800
Subject: Re: [PATCH net-next v2 3/3] net: add netmem_t to skb_frag_t
To: Yunsheng Lin
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, kvm@vger.kernel.org,
 virtualization@lists.linux.dev, "David S. Miller", Eric Dumazet,
 Jakub Kicinski, Paolo Abeni, Stefan Hajnoczi, Stefano Garzarella,
 Jason Gunthorpe, Christian König, Shakeel Butt, Willem de Bruijn

On Mon, Dec 18, 2023 at 4:39 AM Yunsheng Lin wrote:
>
> On 2023/12/17 16:09, Mina Almasry wrote:
> > Use netmem_t instead of page directly in skb_frag_t. Currently netmem_t
> > is always a struct page underneath, but the abstraction allows efforts
> > to add support for skb frags not backed by pages.
> >
> > There is unfortunately 1 instance where the skb_frag_t is assumed to be
> > a bio_vec in kcm. For this case, add a debug assert that the skb frag is
> > indeed backed by a page, and do a cast.
> >
> > Add skb[_frag]_fill_netmem_*() and skb_add_rx_frag_netmem() helpers so
> > that the API can be used to create netmem skbs.
> >
> > Signed-off-by: Mina Almasry
> >
> > ...
> >
> > -typedef struct bio_vec skb_frag_t;
> > +typedef struct skb_frag {
> > +	struct netmem *bv_page;
>
> bv_page -> bv_netmem?
>

bv_page, bv_len & bv_offset are indeed all misnomers after this change,
because bv_ refers to bio_vec and skb_frag_t is no longer a bio_vec.
However, I'm hoping the wholesale renaming can be done in a separate
series. Maybe I'll just apply the bv_page -> bv_netmem change; that
doesn't seem to be much code churn, and it makes things much less
confusing.
> > +	unsigned int bv_len;
> > +	unsigned int bv_offset;
> > +} skb_frag_t;
> >
> >  /**
> >   * skb_frag_size() - Returns the size of a skb fragment
> > @@ -2431,22 +2436,37 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb)
> >  	return skb_headlen(skb) + __skb_pagelen(skb);
> >  }
> >
> > ...
> >
> >  /**
> > @@ -2462,10 +2482,10 @@ static inline void skb_len_add(struct sk_buff *skb, int delta)
> >  }
> >
> >  /**
> > - * __skb_fill_page_desc - initialise a paged fragment in an skb
> > + * __skb_fill_netmem_desc - initialise a paged fragment in an skb
> >   * @skb: buffer containing fragment to be initialised
> >   * @i: paged fragment index to initialise
> > - * @page: the page to use for this fragment
> > + * @netmem: the netmem to use for this fragment
> >   * @off: the offset to the data with @page
> >   * @size: the length of the data
> >   *
> > @@ -2474,10 +2494,13 @@ static inline void skb_len_add(struct sk_buff *skb, int delta)
> >   *
> >   * Does not take any additional reference on the fragment.
> >   */
> > -static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
> > -					struct page *page, int off, int size)
> > +static inline void __skb_fill_netmem_desc(struct sk_buff *skb, int i,
> > +					  struct netmem *netmem, int off,
> > +					  int size)
> >  {
> > -	__skb_fill_page_desc_noacc(skb_shinfo(skb), i, page, off, size);
> > +	struct page *page = netmem_to_page(netmem);
> > +
> > +	__skb_fill_netmem_desc_noacc(skb_shinfo(skb), i, netmem, off, size);
> >
> >  	/* Propagate page pfmemalloc to the skb if we can.
The problem is
> >  	 * that not all callers have unique ownership of the page but rely
> > @@ -2485,7 +2508,21 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
> >  	 */
> >  	page = compound_head(page);
> >  	if (page_is_pfmemalloc(page))
> > -		skb->pfmemalloc = true;
> > +		skb->pfmemalloc = true;
>
> Is it possible to introduce netmem_is_pfmemalloc() and netmem_compound_head()
> for netmem,

That is exactly the plan, and I added these helpers in the follow-up
series which introduces devmem support:

https://patchwork.kernel.org/project/netdevbpf/patch/20231218024024.3516870-8-almasrymina@google.com/

> and have some build-time testing to ensure the implementation
> is the same between page_is_pfmemalloc()/compound_head() and
> netmem_is_pfmemalloc()/netmem_compound_head()?

That doesn't seem desirable to me. It's too hacky IMO to duplicate the
implementation details of the MM stack in the net stack, and that is not
the implementation you see in the patch that adds these helpers above.

> So that we can avoid the
> netmem_to_page() as much as possible, especially in the driver.
>

Agreed.

> > +}
> > +
> > +static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
> > +					struct page *page, int off, int size)
> > +{
> > +	__skb_fill_netmem_desc(skb, i, page_to_netmem(page), off, size);
> > +}
> > +
> > ...
> >
> >   */
> >  static inline struct page *skb_frag_page(const skb_frag_t *frag)
> >  {
> > -	return frag->bv_page;
> > +	return netmem_to_page(frag->bv_page);
>
> It seems we are not able to have safe type protection for the above
> function, as the driver may be able to pass a devmem frag as a param
> here and pass the returned page into the mm subsystem, and the compiler
> is not able to catch it when compiling.
>

That depends on the implementation of netmem_to_page().
As I implemented it in the follow-up series, netmem_to_page() always
checks that netmem is actually a page before doing the conversion, by
checking the LSB.

It's of course unacceptable to make an unconditional cast here. That
would get around the type safety, as you point out, and defeat the
point. But I'm not doing that.

I can add a comment above netmem_to_page():

/* Returns page* if the netmem is backed by a page, NULL otherwise. Currently
 * netmem can only be backed by a page, so we always return the underlying
 * page.
 */
static inline struct page *netmem_to_page(struct netmem *netmem);

> >  }
> >
> >  /**
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 83af8aaeb893..053d220aa2f2 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -845,16 +845,24 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
> >  }
> >  EXPORT_SYMBOL(__napi_alloc_skb);
> >
> > ...
> >
> > diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
> > index 65d1f6755f98..5c46db045f4c 100644
> > --- a/net/kcm/kcmsock.c
> > +++ b/net/kcm/kcmsock.c
> > @@ -636,9 +636,15 @@ static int kcm_write_msgs(struct kcm_sock *kcm)
> >  		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
> >  			msize += skb_shinfo(skb)->frags[i].bv_len;
> >
> > +		/* The cast to struct bio_vec* here assumes the frags are
> > +		 * struct page based.
> > +		 */
> > +		DEBUG_NET_WARN_ON_ONCE(
> > +			!skb_frag_page(&skb_shinfo(skb)->frags[0]));
>
> It seems skb_frag_page() always returns non-NULL in this patch, so the
> above check seems unnecessary?

We're doing a cast below, and the cast is only valid if the frag has a
page underneath. This check makes sure the skb has a page. In the
series that adds devmem support, skb_frag_page() returns NULL as the
frag has no page. Since this patch adds the dangerous cast, I think it
may be reasonable for it to also add the check.
I can add a comment above skb_frag_page() to indicate the intention
again:

 * Returns the &struct page associated with @frag. Returns NULL if this frag
 * has no associated page.

But as of this series netmem can only be a page, so I think adding that
information may be premature.

-- 
Thanks,
Mina