Received: by 2002:a05:7412:8d10:b0:f3:1519:9f41 with SMTP id bj16csp3241113rdb; Sat, 9 Dec 2023 19:05:36 -0800 (PST) X-Google-Smtp-Source: AGHT+IEXZ7mkSS1UEg9rm07le13o9tBzKB/4wS2Kr6LNrNykt3vYfJWhNa9fgm1SuMdmNVreCEnX X-Received: by 2002:a05:6359:6ec1:b0:170:8da:1f50 with SMTP id tj1-20020a0563596ec100b0017008da1f50mr2725422rwb.20.1702177535754; Sat, 09 Dec 2023 19:05:35 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702177535; cv=none; d=google.com; s=arc-20160816; b=h/k3xDqR2R82Kh7/30ZcKiMFrUsNrwEldVMRsarWmGMajPEa1cXnZBmItB5T6rJatf 1n5QPW3DXED78s73sw98Tqw+Ia5FC+GBIkRSV/byGlGVASaU8S+tbSwFVVpufpGNR/IL YGhM4NOfbo+LkX+jYytMiCnypd1GWkbvrHRu7Iwh0hczVgHSaxoQJmOy55c8lgdj56Vn VCRtevhIthnfv8idzxKxMjf3IpvEdMg1mOLnBD37RIa0CUDlU3BucAXqtzDFdZZwrQgz USk6ZE+A1I2kOd/4lO2VzuRoqLNHiOIx0ZIubJ2jknZOKm5GduDKlEoginBKdZapyUP9 sjlQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=Wxl9nnVIfiBOr8Rw+j4D/uD/Nj3r0oHUhj6QGiUGdeU=; fh=g6R8eOJ4BHfVwsSKujzw1tI9AAeFHxXwUAvPpPn0m0M=; b=tVGI8+ni2+lulL30zRWtfCEbdbUEBqn5fX9iaAScvrZESK8000ieYzIpohqHwM/7W0 YN1kXxdy/UAfQPBYTmyUjossq/cqNho18VCBSV0U7/RYTXK8jgR38iHetuDyhLJFzlA/ 24b++BqeDsARU+S/SsNLEZPf1EnkPPbQThF63tmfwQPBR7j6CRKIpd4njFWtTCPZ8Xdo 7wje6+YxdNxru/KWBbMXKhRhVACYXeMg/PKAz26UaW/nLUkhcsOXuAdn861GEHphSvsM M04DicchEFcZf5BdMUP7Pg2hiblR9SORUYwo8M3lhnxBkwABGPJmr6hmDDjr0Zw8d3MA my9w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b="LoxyH//m"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from groat.vger.email (groat.vger.email. [23.128.96.35]) by mx.google.com with ESMTPS id n15-20020a170903110f00b001d0a7201fe4si3912346plh.310.2023.12.09.19.05.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 09 Dec 2023 19:05:35 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) client-ip=23.128.96.35; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b="LoxyH//m"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id A089D8087FF1; Sat, 9 Dec 2023 19:05:32 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231414AbjLJDE5 (ORCPT + 99 others); Sat, 9 Dec 2023 22:04:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52422 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229488AbjLJDEz (ORCPT ); Sat, 9 Dec 2023 22:04:55 -0500 Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com [IPv6:2a00:1450:4864:20::334]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E0379B3; Sat, 9 Dec 2023 19:05:01 -0800 (PST) Received: by mail-wm1-x334.google.com with SMTP id 5b1f17b1804b1-40c1e3ea2f2so36402845e9.2; Sat, 09 Dec 2023 19:05:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1702177500; x=1702782300; darn=vger.kernel.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=Wxl9nnVIfiBOr8Rw+j4D/uD/Nj3r0oHUhj6QGiUGdeU=; b=LoxyH//m1WFoAMXNwx3y8v+IwHs6ATUWaj36ukRaIT4y83df4SPIPRUfmUMMy8R3XX Ues1NiyS+Q+YNwVa9L6lXBfKABFGkF/c7+RzNIrE4X7aySTz+KqMNMmNfdDWXqX45VtW B0yzygKqaqRG7mw1i1q5E32ybQ93m3CP1PEx0mF5GTk9Bdd3c+L/QlBbV20FiPyDTsJ0 MGdB+RQ1lz0r6WSkSO1QDdXgfIQ7GYhzH9S73cyFLo8mtRkfznLYVt58V0kHpXXsuQdN iiY7kS0/HZWZbMzKKiEPRGgRcdjQDpSDueXV+1Zq6Wwh4z926z5fObOSIqdqEnqEO2SO o+nQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702177500; x=1702782300; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Wxl9nnVIfiBOr8Rw+j4D/uD/Nj3r0oHUhj6QGiUGdeU=; b=P8ar+Epx4ClcZp/udhkseNpty+T5ylx98BzwMkuiFHBnX6doNdPxGB9Nv2buqIWAcI Bknpum3/GXCx8m+gQDVHXHXx5i7IQ84/cd1XfQfKJrY4e16fx8cRFmwNs3W89zMJU2gv ivc2xBip95ONWt7/CrHEpVy94SdA0H/RpsqkCrfS2VNOV3boBZOGYf+o6MKiL9eyJIqo XAAjswsdRiKK0FYLh5BgFkTf+v/v0T9e31NCuSoaYlspuPWkIuXizduIpYl+wUFU63OX bb4LOUdonodr04UoRF3elaaKYLE5zJ4NXN/6ZS9aH13cteitMGEsLu6JntvZjS+6Fs7u /DCA== X-Gm-Message-State: AOJu0YwJ0zpnSO4XgLAscswxIqDppGM0aic2MHJj2t2sH1CpKY+YRpel MSTKgtb3g2pfXJ/AxZqqDFg= X-Received: by 2002:adf:e282:0:b0:332:ef1e:bb88 with SMTP id v2-20020adfe282000000b00332ef1ebb88mr1266810wri.33.1702177500084; Sat, 09 Dec 2023 19:05:00 -0800 (PST) Received: from [192.168.8.100] ([85.255.236.102]) by smtp.gmail.com with ESMTPSA id fm21-20020a05600c0c1500b0040c03c3289bsm8341129wmb.37.2023.12.09.19.04.57 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 09 Dec 2023 19:04:59 -0800 (PST) Message-ID: Date: Sun, 10 Dec 2023 03:03:37 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [net-next v1 08/16] memory-provider: dmabuf devmem memory provider Content-Language: en-US To: Mina Almasry Cc: Shailend Chand , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org, bpf@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jonathan Corbet , Jeroen de Borst , Praveen Kaligineedi , Jesper Dangaard Brouer , Ilias Apalodimas , Arnd Bergmann , David Ahern , Willem de Bruijn , Shuah Khan , Sumit Semwal , =?UTF-8?Q?Christian_K=C3=B6nig?= , Yunsheng Lin , Harshitha Ramamurthy , Shakeel Butt , Willem de Bruijn , Kaiyuan Zhang References: <20231208005250.2910004-1-almasrymina@google.com> <20231208005250.2910004-9-almasrymina@google.com> From: Pavel Begunkov In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.6 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Sat, 09 Dec 2023 19:05:32 -0800 (PST) On 12/8/23 23:25, Mina Almasry wrote: > On Fri, Dec 8, 2023 at 2:56 PM Pavel Begunkov wrote: >> >> On 12/8/23 00:52, Mina Almasry wrote: > ... >>> + if (pool->p.queue) >>> + binding = READ_ONCE(pool->p.queue->binding); >>> + >>> + if (binding) { >>> + pool->mp_ops = &dmabuf_devmem_ops; >>> + pool->mp_priv = binding; >>> + } >> >> Hmm, I don't understand why would we replace a nice transparent >> api with page pool relying on a queue having devmem specific >> pointer? It seemed more flexible and cleaner in the last RFC. >> > > Jakub requested this change and may chime in, but I suspect it's to > further abstract the devmem changes from driver. In this iteration, > the driver grabs the netdev_rx_queue and passes it to the page_pool, > and any future configurations between the net stack and page_pool can > be passed this way with the driver unbothered. Ok, that makes sense, but even if passed via an rx queue I'd at least hope it keeping abstract provider parameters, e.g. ops, but not hard coded with devmem specific code. It might even be better done with a helper like create_page_pool_from_queue(), unless there is some deeper interaction b/w pp and rx queues is predicted. >>> + >>> if (pool->mp_ops) { >>> err = pool->mp_ops->init(pool); >>> if (err) { >>> @@ -1020,3 +1033,77 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid) >>> } >>> } >>> EXPORT_SYMBOL(page_pool_update_nid); >>> + >>> +void __page_pool_iov_free(struct page_pool_iov *ppiov) >>> +{ >>> + if (WARN_ON(ppiov->pp->mp_ops != &dmabuf_devmem_ops)) >>> + return; >>> + >>> + netdev_free_dmabuf(ppiov); >>> +} >>> +EXPORT_SYMBOL_GPL(__page_pool_iov_free); >> >> I didn't look too deep but I don't think I immediately follow >> the pp refcounting. It increments pages_state_hold_cnt on >> allocation, but IIUC doesn't mark skbs for recycle? Then, they all >> will be put down via page_pool_iov_put_many() bypassing >> page_pool_return_page() and friends. That will call >> netdev_free_dmabuf(), which doesn't bump pages_state_release_cnt. >> >> At least I couldn't make it work with io_uring, and for my purposes, >> I forced all puts to go through page_pool_return_page(), which calls >> the ->release_page callback. The callback will put the reference and >> ask its page pool to account release_cnt. It also gets rid of >> __page_pool_iov_free(), as we'd need to add a hook there for >> customization otherwise. >> >> I didn't care about overhead because the hot path for me is getting >> buffers from a ring, which is somewhat analogous to sock_devmem_dontneed(), >> but done on pp allocations under napi, and it's done separately. >> >> Completely untested with TCP devmem: >> >> https://github.com/isilence/linux/commit/14bd56605183dc80b540999e8058c79ac92ae2d8 >> > > This was a mistake in the last RFC, which should be fixed in v1. In > the RFC I was not marking the skbs as skb_mark_for_recycle(), so the > unreffing path wasn't as expected. > > In this iteration, that should be completely fixed. I suspect since I > just posted this you're actually referring to the issue tested on the > last RFC? Correct me if wrong. Right, it was with RFCv3 > In this iteration, the reffing story: > > - memory provider allocs ppiov and returns it to the page pool with > ppiov->refcount == 1. > - The page_pool gives the page to the driver. The driver may > obtain/release references with page_pool_page_[get|put]_many(), but > the driver is likely not doing that unless it's doing its own page > recycling. > - The net stack obtains references via skb_frag_ref() -> > page_pool_page_get_many() > - The net stack drops references via skb_frag_unref() -> > napi_pp_put_page() -> page_pool_return_page() and friends. > > Thus, the issue where the unref path was skipping > page_pool_return_page() and friends should be resolved in this > iteration, let me know if you think otherwise, but I think this was an > issue limited to the last RFC. Then page_pool_iov_put_many() should and supposedly would never be called by non devmap code because all puts must circle back into ->release_page. Why adding it to into page_pool_page_put_many()? @@ -731,6 +731,29 @@ __page_pool_put_page(struct page_pool *pool, struct page *page, + if (page_is_page_pool_iov(page)) { ... + page_pool_page_put_many(page, 1); + return NULL; + } Well, I'm looking at this new branch from Patch 10, it can put the buffer, but what if we race at it's actually the final put? Looks like nobody is going to to bump up pages_state_release_cnt If you remove the branch, let it fall into ->release and rely on refcounting there, then the callback could also fix up release_cnt or ask pp to do it, like in the patch I linked above -- Pavel Begunkov