From: Mina Almasry
Date: Wed, 15 Nov 2023 11:05:31 -0800
Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider
To: Yunsheng Lin
Cc: Willem de Bruijn, Jakub Kicinski, davem@davemloft.net, pabeni@redhat.com,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Willem de Bruijn,
    Kaiyuan Zhang, Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet,
    Christian König, Jason Gunthorpe, Matthew Wilcox, Linux-MM

On Wed, Nov 15, 2023 at 10:07 AM Mina Almasry wrote:
>
> On Wed, Nov 15, 2023 at 1:29 AM Yunsheng Lin wrote:
> >
> > On 2023/11/14 23:41, Willem de Bruijn wrote:
> > >>
> > >> I am not sure dma-buf maintainer's concern is still there with this patchset.
> > >>
> > >> Whatever name you calling it for the struct, however you arrange each field
> > >> in the struct, some metadata is always needed for dmabuf to intergrate into
> > >> page pool.
> > >>
> > >> If the above is true, why not utilize the 'struct page' to have more unified
> > >> handling?
> > >
> > > My understanding is that there is a general preference to simplify struct
> > > page, and at the least not move in the other direction by overloading the
> > > struct in new ways.
> >
> > As my understanding, the new struct is just mirroring the struct page pool
> > is already using, see:
> > https://elixir.free-electrons.com/linux/v6.7-rc1/source/include/linux/mm_types.h#L119
> >
> > If there is simplifying to the struct page_pool is using, I think the new
> > stuct the devmem memory provider is using can adjust accordingly.
> >
> > As a matter of fact, I think the way 'struct page' for devmem is decoupled
> > from mm subsystem may provide a way to simplify or decoupled the already
> > existing 'struct page' used in netstack from mm subsystem, before this
> > patchset, it seems we have the below types of 'struct page':
> > 1. page allocated in the netstack using page pool.
> > 2. page allocated in the netstack using buddy allocator.
> > 3. page allocated in other subsystem and passed to the netstack, such as
> > zcopy or spliced page?
> >
> > If we can decouple 'struct page' for devmem from mm subsystem, we may be able
> > to decouple the above 'struct page' from mm subsystem one by one.
> >
> > >
> > > If using struct page for something that is not memory, there is ZONE_DEVICE.
> > > But using that correctly is non-trivial:
> > >
> > > https://lore.kernel.org/all/ZKyZBbKEpmkFkpWV@ziepe.ca/
> > >
> > > Since all we need is a handle that does not leave the network stack,
> > > a network specific struct like page_pool_iov entirely avoids this issue.
> >
> > Yes, I am agree about the network specific struct.
> > I am wondering if we can make the struct more generic if we want to
> > intergrate it into page_pool and use it in net stack.
> >
> > > RFC v3 seems like a good simplification over RFC v1 in that regard to me.
> > > I was also pleasantly surprised how minimal the change to the users of
> > > skb_frag_t actually proved to be.
> >
> > Yes, I am agreed about that too. Maybe we can make it simpler by using
> > a more abstract struct as page_pool, and utilize some features of
> > page_pool too.
> >
> > For example, from the page_pool doc, page_pool have fast cache and
> > ptr-ring cache as below, but if napi_frag_unref() call
> > page_pool_page_put_many() and return the dmabuf chunk directly to
> > gen_pool in the memory provider, then it seems we are bypassing the
> > below caches in the page_pool.
> >
>
> I think you're just misunderstanding the code. The page recycling
> works with my patchset. napi_frag_unref() calls napi_pp_put_page() if
> recycle == true, and that works the same with devmem as with regular
> pages.
>
> If recycle == false, we call page_pool_page_put_many() which will call
> put_page() for regular pages and page_pool_iov_put_many() for devmem
> pages. So, the memory recycling works exactly the same as before with
> devmem as with regular pages. In my tests I do see the devmem being
> recycled correctly. We are not bypassing any caches.
>

Ah, taking a closer look here, the devmem recycling works for me, but I
think that's a side effect of the fact that the page_pool support I
implemented in GVE is unusual. I currently allocate pages from the
page_pool but do not set skb_mark_for_recycle(). The page recycling
still happens when GVE is done with the page and calls
page_pool_put_full_page(), as that eventually checks the refcount on
the devmem and recycles it. I will fix up the GVE driver to call
skb_mark_for_recycle() and ensure the napi_pp_put_page() path recycles
the devmem or page correctly in the next version. (A rough sketch of
both paths is appended below.)

> >                 +------------------+
> >                 |      Driver      |
> >                 +------------------+
> >                          ^
> >                          |
> >                          |
> >                          |
> >                          v
> >    +--------------------------------------------+
> >    |               request memory               |
> >    +--------------------------------------------+
> >        ^                               ^
> >        |                               |
> >        | Pool empty                    | Pool has entries
> >        |                               |
> >        v                               v
> >    +-----------------------+     +------------------------+
> >    | alloc (and map) pages |     |  get page from cache   |
> >    +-----------------------+     +------------------------+
> >                                       ^                  ^
> >                                       |                  |
> >                                       | cache available  | No entries, refill
> >                                       |                  | from ptr-ring
> >                                       |                  |
> >                                       v                  v
> >                              +-----------------+     +------------------+
> >                              |   Fast cache    |     |  ptr-ring cache  |
> >                              +-----------------+     +------------------+
> >
> > .
> >
>
> --
> Thanks,
> Mina

-- 
Thanks,
Mina
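For reference, a rough sketch of the two paths discussed above. It follows
this RFC rather than mainline: page_is_page_pool_iov(),
page_to_page_pool_iov() and page_pool_iov_put_many() are names taken or
assumed from the patchset, the page_pool_page_put_many() dispatch is only
summarized inline, and signatures are simplified. Treat it as an
illustration of the dispatch, not the actual implementation.

/*
 * Illustrative only: helpers marked "patchset" are taken from (or assumed
 * for) this RFC and are not mainline API; signatures are simplified.
 */
#include <linux/mm.h>
#include <linux/skbuff.h>
#include <net/page_pool/helpers.h>	/* page_pool_iov helpers per the RFC */

/* Unref path: recycle through the pool when marked, otherwise release. */
static void frag_unref_sketch(struct page *page, bool recycle)
{
	/*
	 * Recycle path: napi_pp_put_page() feeds the page back into the
	 * page_pool caches (fast cache / ptr-ring); with the patchset it
	 * handles a devmem page_pool_iov the same way as a regular page.
	 */
	if (recycle && napi_pp_put_page(page, false))
		return;

	/* Non-recycle path: dispatch on the underlying memory type. */
	if (page_is_page_pool_iov(page))				/* patchset */
		page_pool_iov_put_many(page_to_page_pool_iov(page), 1);	/* patchset */
	else
		put_page(page);
}

/*
 * Driver-side change described above: mark the skb so the core takes the
 * napi_pp_put_page() recycle path when its frags are released.
 */
static void rx_add_frag_sketch(struct sk_buff *skb, struct page *page,
			       unsigned int off, unsigned int len)
{
	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, off, len,
			PAGE_SIZE);
	skb_mark_for_recycle(skb);	/* frags come from a page_pool */
}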