From: Mina Almasry
Date: Wed, 5 Jul 2023 18:17:39 -0700
Subject: Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag)
To: Jason Gunthorpe
Cc: David Ahern, Jakub Kicinski, Jesper Dangaard Brouer, brouer@redhat.com,
    Alexander Duyck,
    Yunsheng Lin, davem@davemloft.net, pabeni@redhat.com,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Lorenzo Bianconi,
    Yisen Zhuang, Salil Mehta, Eric Dumazet, Sunil Goutham, Geetha sowjanya,
    Subbaraya Sundeep, hariprasad, Saeed Mahameed, Leon Romanovsky,
    Felix Fietkau, Ryder Lee, Shayne Chen, Sean Wang, Kalle Valo,
    Matthias Brugger, AngeloGioacchino Del Regno, Ilias Apalodimas,
    linux-rdma@vger.kernel.org, linux-wireless@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org,
    Jonathan Lemon

On Mon, Jul 3, 2023 at 2:43 PM Jason Gunthorpe wrote:
>
> On Sun, Jul 02, 2023 at 11:22:33PM -0700, Mina Almasry wrote:
> > On Sun, Jul 2, 2023 at 9:20 PM David Ahern wrote:
> > >
> > > On 6/29/23 8:27 PM, Mina Almasry wrote:
> > > >
> > > > Hello Jakub, I'm looking into device memory (peer-to-peer)
> > > > networking actually, and I plan to pursue using the page pool as
> > > > a front end.
> > > >
> > > > A quick description of what I have so far: the current
> > > > implementation uses device memory with struct pages; I am putting
> > > > all those pages in a gen_pool, and we have written an allocator
> > > > that allocates pages from the gen_pool. In the driver, we use
> > > > this allocator instead of alloc_page() (the driver in question is
> > > > gve, which currently doesn't use the page pool). When the driver
> > > > is done with the p2p page, it simply decrements the refcount on
> > > > it and the page is freed back to the gen_pool.
> >
> > Quick update here: I was able to get my implementation working with
> > the page pool as a front end, using the memory provider API Jakub
> > wrote here:
> > https://github.com/kuba-moo/linux/tree/pp-providers
> >
> > The main complication was indeed the fact that my device memory
> > pages are ZONE_DEVICE pages, which are incompatible with the
> > page_pool due to the union in struct page. I thought of a couple of
> > approaches to resolve that:
> >
> > 1. Make my device memory pages non-ZONE_DEVICE pages.
>
> Hard no on this from a mm perspective. We need P2P memory to be
> properly tagged and to have the expected struct pages to be DMA
> mappable; otherwise, you totally break everything if you try to do
> this.
>
> > 2. Convert the pages from ZONE_DEVICE pages to page_pool pages and
> > vice versa as they're being inserted and removed from the page pool.
>
> This is kind of scary; it is very, very fragile to rework the pages
> like this. E.g., what happens when the owning device unplugs and
> needs to revoke these pages? I think it would likely crash.
>
> I think it also technically breaks the DMA API, as we may need to
> look into the pgmap to do cache ops on some architectures.
>
> I suggest you try to work with 8k folios; then the tail page's struct
> page is empty enough to store the information you need.

Hi Jason, sorry for the late reply. I think this could work, and the
page pool already supports order > 0 allocations; roughly what I have
in mind is sketched below.
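This is a minimal sketch only: the pool size, flags, and the helper
name are illustrative placeholders, not actual gve code.

/* Sketch: a page_pool configured to hand out order-1 (8K) pages. */
#include <linux/dma-direction.h>
#include <net/page_pool.h>

static struct page_pool *create_8k_pool(struct device *dev)
{
	struct page_pool_params pp_params = {
		.flags		= PP_FLAG_DMA_MAP, /* pool manages DMA mapping */
		.order		= 1,	/* 2^1 contiguous 4K pages == 8K */
		.pool_size	= 1024,	/* illustrative recycle-ring size */
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE, /* device writes RX data */
	};

	/* Returns an ERR_PTR() on failure. */
	return page_pool_create(&pp_params);
}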
It may end up being a big change to the gve driver, which as I
understand it currently deals exclusively with order-0 allocations.
Another issue is that on networks with a low MTU we could be DMAing
only 1400/1500 bytes into each allocation, which is wasteful if the
allocation is 8K+. I would need to investigate a bit to see if/how to
solve that, and we may end up having to split the page and again run
into the 'not enough room in struct page' problem.

> Or allocate per page memory and do a memdesc like thing..

I need to review memdesc more closely. Do you imagine I'd add a
pointer in struct page that points to the memdesc, or implement a
page-to-memdesc mapping in the page_pool? Either approach could work.
The concern would be that looking up the memdesc entries may cost a
cache miss that is unacceptable in fast paths, but I already
dereference page->pgmap in a few places and it doesn't seem to be an
issue.

> Though overall, you won't find devices creating struct pages for
> their P2P memory today, so I'm not sure what the purpose is. Jonathan
> already got highly slammed for proposing code to the kernel that was
> unusable. Please don't repeat that. Other than a special NVMe use
> case, the interface for P2P is DMABUF right now and it is not struct
> page backed.

Our approach is actually to extend DMABUF to provide struct page
backed attachment mappings, which as far as I understand sidesteps the
issues Jonathan ran into. Our code is fully functional with any device
that supports dmabuf, and in fact a lot of my tests use udmabuf to
minimize the dependencies. The RFC may come with a udmabuf selftest to
showcase that any dmabuf, even a mocked one, would be supported; see
the P.S. below for a rough sketch of that.

> Even if we did get to struct pages for device memory, it is highly
> likely the cases you are interested in will be using larger than 4k
> folios, so the page pool would need to cope with this nicely as well.

--
Thanks,
Mina
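P.S. For concreteness, this is roughly how a mock dmabuf gets created
from userspace via udmabuf; a sketch only, with error handling
trimmed, and the helper name is mine rather than the actual selftest
code.

/* Sketch: back a dma-buf with ordinary anonymous memory via udmabuf. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

static int create_mock_dmabuf(size_t size) /* size must be page-aligned */
{
	/* udmabuf requires the backing memfd to be sealed against shrinking. */
	int memfd = memfd_create("udmabuf-test", MFD_ALLOW_SEALING);

	if (memfd < 0 || ftruncate(memfd, size) < 0)
		return -1;
	fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

	int devfd = open("/dev/udmabuf", O_RDWR);

	if (devfd < 0)
		return -1;

	struct udmabuf_create create = {
		.memfd	= memfd,
		.flags	= UDMABUF_FLAGS_CLOEXEC,
		.offset	= 0,
		.size	= size,
	};

	/* On success this returns a dma-buf fd any importer can attach to. */
	return ioctl(devfd, UDMABUF_CREATE, &create);
}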