Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp37793299rwd; Tue, 11 Jul 2023 21:00:45 -0700 (PDT) X-Google-Smtp-Source: APBJJlHnmL5gvRBOYlTzMu6ocw8tMf2naNHjeg4upQb8S0Ma7xafRVjquH8qQjhdxeAtdI73ABhl X-Received: by 2002:a05:6402:1b0e:b0:51d:d5f2:121e with SMTP id by14-20020a0564021b0e00b0051dd5f2121emr13018824edb.5.1689134445058; Tue, 11 Jul 2023 21:00:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689134445; cv=none; d=google.com; s=arc-20160816; b=FSi7BRajxJlxIwDpSl7GYMelkMXDWCNdqQomPTaRDtfnJ5UbLCKSAwq90Eh/aZCiu4 2uizNrS4CQzfS9+GYpx0e+BJNjGhRlpNvpBScxTTLIOuGdsrXXb3evr0vJUHVhOoguYE XWQVRYiXbH5flfO8tX694ux2b/TeIgJt3U43CYk2ZDt5iexI7xSYvK12bWX/l6n8CwY0 s72mD+PaQ1jutcu7JK4mA32WXan3ZFga/CmFBIQffOVDgAzvxW10evCvfgg1GPSgFTa7 qAYlQGNBWVqrBJbzeE0O+/CIrCJAw78DrfQcs1UxgCHOnesnoigeC2zXFs5PNPvUcDrN xM+A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=AGC+6rPypaLJlCQX7D5f9PQLl8/lGoMDODsVm7b1HB0=; fh=L3iumA2yHW3WfkTyifwAiXtvfBlBG/8FZ1olkm1cBBw=; b=PQYHCsyZjo5HvzezyHOQeG5juribTud6z7tGM5nGtW0XYufpa9knB4A6UWij33YR2a VENu5FaL9QuKNzb0RTRD6wlYvh8z8dJRJ6J0Vi/RlW/3fgrz40WkEOJRVMppTEy6TEQH KVyuk9i8djl9ggAUH0vUPzU1qQrLx1NmGd7F/g7ZCbIBX9+/0PoT2qAGAnLXs1O7vfVN +GMH/VnEmtOB6TqUGNfeASVTvNvtW735JYluxiXcfwAHnTmtgHUkeqUHu/5kpZwcUhrY zeNfX13nASbUQPydtESuRi8o5k5tLiFWv3JK/04uo0nvUu+Ee9uNckTstM8xqVg0POad p3vA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=3GLId8Tp; spf=pass (google.com: domain of linux-wireless-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-wireless-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id y8-20020aa7ccc8000000b0051ddbdeb0c7si3657918edt.350.2023.07.11.21.00.26; Tue, 11 Jul 2023 21:00:45 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-wireless-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=3GLId8Tp; spf=pass (google.com: domain of linux-wireless-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-wireless-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231417AbjGLDmk (ORCPT + 60 others); Tue, 11 Jul 2023 23:42:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41912 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229668AbjGLDmi (ORCPT ); Tue, 11 Jul 2023 23:42:38 -0400 Received: from mail-qt1-x82e.google.com (mail-qt1-x82e.google.com [IPv6:2607:f8b0:4864:20::82e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A529810CF for ; Tue, 11 Jul 2023 20:42:36 -0700 (PDT) Received: by mail-qt1-x82e.google.com with SMTP id d75a77b69052e-403c0f12e77so6941981cf.2 for ; Tue, 11 Jul 2023 20:42:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1689133356; x=1691725356; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=AGC+6rPypaLJlCQX7D5f9PQLl8/lGoMDODsVm7b1HB0=; b=3GLId8TpPFMbUN9+cqTJNNfvBWQHOwWggxKJsfePx31Keuk7IDA1cPTlVt51od2qSi YDxpkLxjSX5K7RB9R0IMlh54mcS5SYJRZ6P/MzOUrhAILetGevNyuWSehecm0MkkDY4B cGtibi5VjM67BkjWh3puy/uVWENRwqa13UsXS3DU2fV4AZYmAdnfgTUqRuEGYcJEos5i 3GK0UELsxf/4Sb87gmmGZVXLaxsYWNHu7nvKWg9GMmPqrKrHBZIuaZzh7JyXtSERo8y4 mTUDMdrI4PgfLk7lPMQBVMLpNHTuOXBGIXb1AK9mchDingX0e+fVFsGsAW8sBts2z52d HkIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689133356; x=1691725356; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=AGC+6rPypaLJlCQX7D5f9PQLl8/lGoMDODsVm7b1HB0=; b=i4GIUmAkYHOaN6xW1PcOmCj7UHkmsbG9VycV0cUeopwQNzRQh8P27iQo6eOe2XS5Pn dcTYJA8qBlvdrr6nCmgY++i7X/ow1+rHNmpXvseRY/kJYq7fkSHKLdIl3b3iizNL3y5J k5a3EDJ5tDtq8VYdqFGx8pdkR1/+8M5b5H73MRvmcnaNWYAoFu4/L44Yp9g4TqJY47Ig Tud/9t2mAfXUB5FLgAVGMRv+gbE9I5NDooWgQ8MlgFnx3HY4v/0DuH3JQn4mXk42mMmj nJhNOGa3FTDEx23V+1VyBW9/UM21dQIskroaxYj/IDYESEzjZy3VUrIqnl4zasDGf2GP 0FZg== X-Gm-Message-State: ABy/qLaOzmUDtUxw1OJzDN62Ys9Eux9ShW69ymwqtidxKoK7DWT3oZD9 D6MY13UmyF09cl7gAzbtQwkgdXZnkDAHTYqa+5AQUA== X-Received: by 2002:ac8:7c50:0:b0:403:c417:ba9d with SMTP id o16-20020ac87c50000000b00403c417ba9dmr1033325qtv.16.1689133355634; Tue, 11 Jul 2023 20:42:35 -0700 (PDT) MIME-Version: 1.0 References: <5e0ac5bb-2cfa-3b58-9503-1e161f3c9bd5@kernel.org> <20230711042708.GA18658@lst.de> <20230710215906.49514550@kernel.org> <20230711050445.GA19323@lst.de> <20230711090047.37d7fe06@kernel.org> <04187826-8dad-d17b-2469-2837bafd3cd5@kernel.org> <20230711093224.1bf30ed5@kernel.org> <20230711133915.03482fdc@kernel.org> <2263ae79-690e-8a4d-fca2-31aacc5c9bc6@kernel.org> In-Reply-To: <2263ae79-690e-8a4d-fca2-31aacc5c9bc6@kernel.org> From: Mina Almasry Date: Tue, 11 Jul 2023 20:42:24 -0700 Message-ID: Subject: Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag) To: David Ahern , Samiullah Khawaja , Willem de Bruijn Cc: Jakub Kicinski , Jason Gunthorpe , Christoph Hellwig , John Hubbard , Dan Williams , Jesper Dangaard Brouer , brouer@redhat.com, Alexander Duyck , Yunsheng Lin , davem@davemloft.net, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Lorenzo Bianconi , Yisen Zhuang , Salil Mehta , Eric Dumazet , Sunil Goutham , Geetha sowjanya , Subbaraya Sundeep , hariprasad , Saeed Mahameed , Leon Romanovsky , Felix Fietkau , Ryder Lee , Shayne Chen , Sean Wang , Kalle Valo , Matthias Brugger , AngeloGioacchino Del Regno , Jesper Dangaard Brouer , Ilias Apalodimas , linux-rdma@vger.kernel.org, linux-wireless@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, Jonathan Lemon , =?UTF-8?Q?Christian_K=C3=B6nig?= , logang@deltatee.com, Bjorn Helgaas Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-wireless@vger.kernel.org On Tue, Jul 11, 2023 at 2:39=E2=80=AFPM David Ahern wr= ote: > > On 7/11/23 2:39 PM, Jakub Kicinski wrote: > > On Tue, 11 Jul 2023 10:06:28 -0700 Mina Almasry wrote: > >>>> Any reason not to allow an alternative representation for skb frags = than > >>>> struct page? > >>> > >>> I don't think there's a hard technical reason. We can make it work. > >> > >> I also think we can switch the representation for skb frags to > >> something else. However - please do correct me if I'm wrong - I don't > >> think that is sufficient for device memory TCP. My understanding is > >> that we also need to modify any NIC drivers that want to use device > >> memory TCP to understand a new memory type, and the page pool as well > >> if that's involved. I think in particular modifying the memory type in > >> all the NIC drivers that want to do device memory TCP is difficult. Do > >> you think this is feasible? > > > > That's why I was thinking about adding an abstraction between > > the page pool and the driver. Instead of feeding driver pages > > a new abstraction could feed the driver just an identifier and a PA. > > skb frag is currently a bio_vec. Overloading the 'struct page' address > in that struct with another address is easy to do. Requiring a certain > alignment on the address gives you a few low bits to use a flags / magic > / etc. > > Overloading len and offset is not really possible - way too much code is > affected (e.g., iov walking and MSS / TSO segmenting). > > ie., you could overload page address with a pointer to an object in your > new abstraction layer and the struct has the other meta data. > > typedef struct skb_frag { > union { > struct bio_vec bvec; > struct new_abstraction abs; > }; > } skb_frag_t; > > where > > struct new_abstraction { > void *addr, > unsigned int len; > unsigned int offset; > }; > > I have been playing with a similar and it co-exists with the existing > code quite well with the constraint on location of len and offset. > > > > > Whether we want to support fragmentation in that model or not would > > have to be decided. > > > > We can take pages from the page pool and feed them to drivers via > > such an API, but drivers need to stop expecting pages. > > yes, drivers would have to be updated to understand the new format. A > downside, but again relatively easy to manage. > I'm glad to see that you're open to this approach. As far as I understand, getting device memory in a struct page form would still be preferred, no? And the approach you point to would be a backup plan I presume? Since the good folks on this thread have pointed me to p2pdma to address my use case, I've been doing some homework to see if it can apply. AFACT so far, it applies, and Willem actually had a prototype of it working a while back. The rough approach Willem and I are thinking of would be something like: 1. The device memory driver would be the p2pdma provider. It would expose a user API which allocates a device memory region, calls pci_p2pdma_add_resource() and pci_p2pmem_publish() on it, and returns a reference to it to the userspace. 2. The NIC driver would be the p2pdma client and orchestrator. It would expose a user API which binds an rxq to a pci device. Prior to the bind the user API would check that the pci device has published p2p memory (pci_has_p2pmem()), and check the the p2p mem is accessible to the driver (pci_p2pdma_distance() I think), etc. 3. The NIC would allocate pages from the p2pdma provider for incoming packets, and create devmem skbs, and deliver the devmem skbs to the user using the support in my RFC. AFACT all that code need not be changed. AFAICT, all the concerns brought up in this thread are sidestepped by using p2pdma. I need not allocate struct pages in the core dma-buf code anymore (or anywhere), and I need not allocate pgmaps. I would just re-use the p2pdma support. Anyone see any glaring issues with this approach? I plan on trying to implement a PoC and sending an RFC v2. The only pending concern is integration with the page pool, but we already have some ideas on how to solve that. > > > > That's for data buffers only, obviously. We can keep using pages > > and raw page pool for headers. > > yes. --=20 Thanks, Mina