From: Mina Almasry
Date: Tue, 12 Dec 2023 06:47:14 -0800
Subject: Re: [net-next v1 02/16] net: page_pool: create hooks for custom page providers
To: Ilias Apalodimas
Cc: Shailend Chand, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-arch@vger.kernel.org,
    linux-kselftest@vger.kernel.org, bpf@vger.kernel.org,
    linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
    "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Jonathan Corbet, Jeroen de Borst, Praveen Kaligineedi,
    Jesper Dangaard Brouer, Arnd Bergmann, David Ahern,
    Willem de Bruijn, Shuah Khan, Sumit Semwal, Christian König,
    Yunsheng Lin, Harshitha Ramamurthy, Shakeel Butt
References: <20231208005250.2910004-1-almasrymina@google.com>
    <20231208005250.2910004-3-almasrymina@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Dec 12, 2023 at 12:07 AM Ilias Apalodimas wrote:
>
> Hi Mina,
>
> Apologies for not participating in the party earlier.
>

No worries, thanks for looking.

> On Fri, 8 Dec 2023 at 02:52, Mina Almasry wrote:
> >
> > From: Jakub Kicinski
> >
> > The page providers which try to reuse the same pages will
> > need to hold onto the ref, even if page gets released from
> > the pool - as in releasing the page from the pp just transfers
> > the "ownership" reference from pp to the provider, and provider
> > will wait for other references to be gone before feeding this
> > page back into the pool.
> >
> > Signed-off-by: Jakub Kicinski
> > Signed-off-by: Mina Almasry
> >
> > ---
> >
> > This is implemented by Jakub in his RFC:
> > https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.com/T/
> >
> > I take no credit for the idea or implementation; I only added minor
> > edits to make this workable with device memory TCP, and removed some
> > hacky test code.
> > This is a critical dependency of device memory TCP,
> > and thus I'm pulling it into this series to make it reviewable and
> > mergeable.
> >
> > RFC v3 -> v1
> > - Removed unused mem_provider. (Yunsheng).
> > - Replaced memory_provider & mp_priv with netdev_rx_queue (Jakub).
> >
> > ---
> >  include/net/page_pool/types.h | 12 ++++++++++
> >  net/core/page_pool.c          | 43 +++++++++++++++++++++++++++++++----
> >  2 files changed, 50 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> > index ac286ea8ce2d..0e9fa79a5ef1 100644
> > --- a/include/net/page_pool/types.h
> > +++ b/include/net/page_pool/types.h
> > @@ -51,6 +51,7 @@ struct pp_alloc_cache {
> >   * @dev: device, for DMA pre-mapping purposes
> >   * @netdev: netdev this pool will serve (leave as NULL if none or multiple)
> >   * @napi: NAPI which is the sole consumer of pages, otherwise NULL
> > + * @queue: struct netdev_rx_queue this page_pool is being created for.
> >   * @dma_dir: DMA mapping direction
> >   * @max_len: max DMA sync memory size for PP_FLAG_DMA_SYNC_DEV
> >   * @offset: DMA sync address offset for PP_FLAG_DMA_SYNC_DEV
> > @@ -63,6 +64,7 @@ struct page_pool_params {
> >         int nid;
> >         struct device *dev;
> >         struct napi_struct *napi;
> > +       struct netdev_rx_queue *queue;
> >         enum dma_data_direction dma_dir;
> >         unsigned int max_len;
> >         unsigned int offset;
> > @@ -125,6 +127,13 @@ struct page_pool_stats {
> >  };
> >  #endif
> >
> > +struct memory_provider_ops {
> > +       int (*init)(struct page_pool *pool);
> > +       void (*destroy)(struct page_pool *pool);
> > +       struct page *(*alloc_pages)(struct page_pool *pool, gfp_t gfp);
> > +       bool (*release_page)(struct page_pool *pool, struct page *page);
> > +};
> > +
> >  struct page_pool {
> >         struct page_pool_params_fast p;
> >
> > @@ -174,6 +183,9 @@ struct page_pool {
> >          */
> >         struct ptr_ring ring;
> >
> > +       void *mp_priv;
> > +       const struct memory_provider_ops *mp_ops;
> > +
> >  #ifdef CONFIG_PAGE_POOL_STATS
> >         /* recycle stats are per-cpu to avoid locking */
> >         struct page_pool_recycle_stats __percpu *recycle_stats;
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index ca1b3b65c9b5..f5c84d2a4510 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -25,6 +25,8 @@
> >
> >  #include "page_pool_priv.h"
> >
> > +static DEFINE_STATIC_KEY_FALSE(page_pool_mem_providers);
>
> We could add the existing page pool mechanisms as another 'provider',
> but I assume this is coded like this for performance reasons (IOW skip
> the expensive ptr call for the default case?)
>

Correct, it's done like this for performance reasons.

> > +
> >  #define DEFER_TIME (msecs_to_jiffies(1000))
> >  #define DEFER_WARN_INTERVAL (60 * HZ)
> >
> > @@ -174,6 +176,7 @@ static int page_pool_init(struct page_pool *pool,
> >                           const struct page_pool_params *params)
> >  {
> >         unsigned int ring_qsize = 1024; /* Default */
> > +       int err;
> >
> >         memcpy(&pool->p, &params->fast, sizeof(pool->p));
> >         memcpy(&pool->slow, &params->slow, sizeof(pool->slow));
> > @@ -234,10 +237,25 @@ static int page_pool_init(struct page_pool *pool,
> >         /* Driver calling page_pool_create() also call page_pool_destroy() */
> >         refcount_set(&pool->user_cnt, 1);
> >
> > +       if (pool->mp_ops) {
> > +               err = pool->mp_ops->init(pool);
> > +               if (err) {
> > +                       pr_warn("%s() mem-provider init failed %d\n",
> > +                               __func__, err);
> > +                       goto free_ptr_ring;
> > +               }
> > +
> > +               static_branch_inc(&page_pool_mem_providers);
> > +       }
> > +
> >         if (pool->p.flags & PP_FLAG_DMA_MAP)
> >                 get_device(pool->p.dev);
> >
> >         return 0;
> > +
> > +free_ptr_ring:
> > +       ptr_ring_cleanup(&pool->ring, NULL);
> > +       return err;
> >  }
> >
> >  static void page_pool_uninit(struct page_pool *pool)
> > @@ -519,7 +537,10 @@ struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp)
> >                 return page;
> >
> >         /* Slow-path: cache empty, do real allocation */
> > -       page = __page_pool_alloc_pages_slow(pool, gfp);
> > +       if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
>
> Why do we need && pool->mp_ops? On the init function, we only bump
> page_pool_mem_providers if the ops are there
>

Note that page_pool_mem_providers is a static variable (not part of the
page_pool struct), so if you have 2 page_pools on the system, one using
devmem and one not, we need to check pool->mp_ops to make sure this
page_pool is using a memory provider.

> > +               page = pool->mp_ops->alloc_pages(pool, gfp);
> > +       else
> > +               page = __page_pool_alloc_pages_slow(pool, gfp);
> >         return page;
> >  }
> >  EXPORT_SYMBOL(page_pool_alloc_pages);
> > @@ -576,10 +597,13 @@ void __page_pool_release_page_dma(struct page_pool *pool, struct page *page)
> >  void page_pool_return_page(struct page_pool *pool, struct page *page)
> >  {
> >         int count;
> > +       bool put;
> >
> > -       __page_pool_release_page_dma(pool, page);
> > -
> > -       page_pool_clear_pp_info(page);
> > +       put = true;
> > +       if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
>
> ditto
>
> > +               put = pool->mp_ops->release_page(pool, page);
> > +       else
> > +               __page_pool_release_page_dma(pool, page);
> >
> >         /* This may be the last page returned, releasing the pool, so
> >          * it is not safe to reference pool afterwards.
> > @@ -587,7 +611,10 @@ void page_pool_return_page(struct page_pool *pool, struct page *page)
> >         count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt);
> >         trace_page_pool_state_release(pool, page, count);
> >
> > -       put_page(page);
> > +       if (put) {
> > +               page_pool_clear_pp_info(page);
> > +               put_page(page);
> > +       }
> >         /* An optimization would be to call __free_pages(page, pool->p.order)
> >          * knowing page is not part of page-cache (thus avoiding a
> >          * __page_cache_release() call).
> > @@ -857,6 +884,12 @@ static void __page_pool_destroy(struct page_pool *pool)
> >
> >         page_pool_unlist(pool);
> >         page_pool_uninit(pool);
> > +
> > +       if (pool->mp_ops) {
>
> Same here. Using a mix of pool->mp_ops and page_pool_mem_providers
> will work, but since we always check the ptr on init, can't we simply
> rely on page_pool_mem_providers for the rest of the code?
>
> Thanks
> /Ilias
>
> > +               pool->mp_ops->destroy(pool);
> > +               static_branch_dec(&page_pool_mem_providers);
> > +       }
> > +
> >         kfree(pool);
> >  }
> >
> > --
> > 2.43.0.472.g3155946c3a-goog
> >

--
Thanks,
Mina