From: Mina Almasry
Date: Sun, 16 Jul 2023 19:05:21 -0700
Subject: Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag)
To: David Ahern
Cc: Christian König, Hari Ramakrishnan, Jason Gunthorpe, Samiullah Khawaja, Willem de Bruijn, Jakub Kicinski, Christoph Hellwig, John Hubbard, Dan Williams, Jesper Dangaard Brouer, brouer@redhat.com, Alexander Duyck, Yunsheng Lin, davem@davemloft.net, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Lorenzo Bianconi, Yisen Zhuang, Salil Mehta, Eric Dumazet, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep, hariprasad, Saeed Mahameed, Leon Romanovsky, Felix Fietkau, Ryder Lee, Shayne Chen, Sean Wang, Kalle Valo, Matthias Brugger, AngeloGioacchino Del Regno, Jesper Dangaard Brouer, Ilias Apalodimas, linux-rdma@vger.kernel.org, linux-wireless@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, Jonathan Lemon, logang@deltatee.com, Bjorn Helgaas

On Fri, Jul 14, 2023 at 8:19 AM David Ahern wrote:
>
> On 7/14/23 8:55 AM, Mina Almasry wrote:
> >
> > I guess the remaining option not fully explored is the idea of
> > getting the networking stack to consume the scatterlist that
> > dma_buf_map_attachment() provides for the device memory. The very
> > rough approach I have in mind (for the RX path) is:
> >
> > 1. Some uapi that binds a dmabuf to an RX queue. It will do a
> > dma_buf_map_attachment() and get the sg table.
> >
> > 2. We need to feed the scatterlist entries to some allocator that
> > will chunk them up into pieces that can be allocated by the NIC for
> > incoming traffic. I'm thinking genalloc may work for this as-is,
> > but I may need to add one or use something else if I run into some
> > issue.
> >
> > 3. We can implement a memory_provider that allocates these chunks,
> > wraps them in a struct new_abstraction (as David called it), and
> > feeds those into the page pool.
> >
> > 4. The page pool would need to be able to process these struct
> > new_abstraction alongside the struct pages it normally gets from
> > providers. This is maybe the most complicated part, but looking at
> > the page pool code it doesn't seem that big of a hurdle (though I
> > have not tried a POC yet).
> >
> > 5. The drivers (I looked at mlx5) seem to avoid making any mm calls
> > on the struct pages returned by the pool; the pool abstracts
> > everything already. The changes to the drivers may be minimal..?
> >
> > 6. We would need to add a new helper, skb_add_rx_new_abstraction_frag,
> > that creates a frag out of a new_abstraction rather than a struct
> > page.
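To make steps 1 and 2 a bit more concrete, something like the sketch
below is what I have in mind. It is completely untested and all the
devmem_* names are made up; the only real APIs here are genalloc and
the sgtable iterators. The idea is to walk the sg table returned by
dma_buf_map_attachment() and register each DMA span with a gen_pool,
which then carves out chunks the NIC can post as RX buffers:

#include <linux/dma-buf.h>
#include <linux/genalloc.h>
#include <linux/scatterlist.h>

/*
 * Step 2 sketch (hypothetical): hand every DMA span of the dma-buf's
 * sg table to a gen_pool so NIC-sized chunks can be allocated out of
 * the device memory.
 */
static struct gen_pool *devmem_chunk_pool_create(struct sg_table *sgt)
{
	struct scatterlist *sg;
	struct gen_pool *pool;
	int i;

	/* PAGE_SIZE allocation granularity, no NUMA node preference. */
	pool = gen_pool_create(PAGE_SHIFT, -1);
	if (!pool)
		return NULL;

	for_each_sgtable_dma_sg(sgt, sg, i) {
		if (gen_pool_add(pool, (unsigned long)sg_dma_address(sg),
				 sg_dma_len(sg), -1)) {
			gen_pool_destroy(pool);
			return NULL;
		}
	}

	return pool;
}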
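And for step 3, the wrapper object could be as dumb as the following
(again untested, names invented). The refcount here is also roughly
how I picture the "mapping stays alive via some refcounting" teardown
story further down in this mail:

#include <linux/genalloc.h>
#include <linux/refcount.h>
#include <linux/slab.h>

/* Step 3 sketch (hypothetical): what the memory provider would hand
 * to the page pool in place of a struct page. */
struct new_abstraction {
	dma_addr_t dma_addr;	/* chunk of device memory from the pool */
	size_t len;
	refcount_t ref;		/* stands in for get_page()/put_page() */
	struct gen_pool *pool;	/* owning pool, for the free path */
};

static struct new_abstraction *devmem_chunk_alloc(struct gen_pool *pool,
						  size_t len)
{
	struct new_abstraction *na;
	unsigned long addr;

	na = kzalloc(sizeof(*na), GFP_ATOMIC);
	if (!na)
		return NULL;

	addr = gen_pool_alloc(pool, len);
	if (!addr) {
		kfree(na);
		return NULL;
	}

	na->dma_addr = (dma_addr_t)addr;
	na->len = len;
	na->pool = pool;
	refcount_set(&na->ref, 1);

	return na;
}

static void devmem_chunk_put(struct new_abstraction *na)
{
	/* Last reference returns the chunk to the gen_pool. */
	if (!refcount_dec_and_test(&na->ref))
		return;

	gen_pool_free(na->pool, (unsigned long)na->dma_addr, na->len);
	kfree(na);
}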
> > Once the skb frags with struct new_abstraction are in the TCP
> > stack, they will need some special handling in code accessing the
> > frags. But my RFC already addressed that somewhat because the frags
> > were inaccessible in that case. In this case the frags will be both
> > inaccessible and will not be struct pages at all (things like
> > get_page() will not work), so more special handling may be
> > required.
> >
> > I imagine the TX path would be considerably less complicated
> > because the allocator and page pool are not involved (I think).
> >
> > Anyone see any glaring issues with this approach?
>
> Moving skb_frags to an alternative scheme is essential to make this
> work. The current page scheme to go from user virtual to pages to
> physical is not needed for the dmabuf use case.
>
> For the driver and hardware queue: don't you need a dedicated queue
> for the flow(s) in question?

In the RFC and the implementation I'm thinking of, the queue is
'dedicated' in that each queue will be either a devmem TCP queue or a
regular queue. devmem queues generate devmem skbs and non-devmem
queues generate non-devmem skbs. We support switching queues between
devmem mode and non-devmem mode via a uapi.

> If not, how can you properly handle the teardown case (e.g., app
> crashes and you need to ensure all references to GPU memory are
> removed from NIC descriptors)?

Jason and Christian will correct me if I'm wrong, but AFAICT the
dma-buf API requires the dma-buf exporter to keep the attachment's
mapping alive for as long as the importer requires it. The dma-buf API
gives the importer dma_buf_map_attachment() and
dma_buf_unmap_attachment() APIs, but there is no callback for the
exporter to inform the importer that it has to take the mapping away.
The closest thing I saw was the move_notify() callback, but that is
optional.

In my mind the way it works is that there will be some uapi that binds
a dma-buf to an RX queue; that will create the attachment and the
mapping. If the user crashes or closes the dma-buf handle, that will
unbind the dma-buf from the RX queue, but the mapping will remain
alive (via some refcounting) until all the NIC descriptors are freed
and the mapping is no longer in use. Usually this will happen at the
next driver reset, which destroys and recreates the RX queues and
thereby frees all the NIC descriptors (but this could be a new API so
that we don't have to rely on a driver reset).

> If you agree on this point, then you can require the dedicated queue
> management in the driver to use and expect only the alternative frag
> addressing scheme. I.e., it knows the address is not a struct page
> (validated by checking an skb flag, a frag flag, or address magic),
> but a reference to, say, a page_pool entry (if you are using
> page_pool for management of the dmabuf slices) which contains the
> metadata needed for the use case.

Honestly, if my understanding above doesn't match what you want, I
could implement 'dedicated queues' instead; just let me know what you
want in some future iteration.

Now, I'm more worried about the memory format issue, and I'm working
on an RX prototype without struct pages. So far, purely technically
speaking, it seems possible.

--
Thanks,
Mina