Date: Thu, 6 May 2021 15:58:14 +0300
From: Ilias Apalodimas
To: Yunsheng Lin
Cc: Matteo Croce, netdev@vger.kernel.org, linux-mm@kvack.org, Ayush Sawal, Vinay Kumar Yadav, Rohit Maheshwari, "David S. Miller", Jakub Kicinski, Thomas Petazzoni, Marcin Wojtas, Russell King, Mirko Lindner, Stephen Hemminger, Tariq Toukan, Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Boris Pismenny, Arnd Bergmann, Andrew Morton, "Peter Zijlstra (Intel)", Vlastimil Babka, Yu Zhao, Will Deacon, Michel Lespinasse, Fenghua Yu, Roman Gushchin, Hugh Dickins, Peter Xu, Jason Gunthorpe, Guoqing Jiang, Jonathan Lemon, Alexander Lobakin, Cong Wang, wenxu, Kevin Hao, Aleksandr Nogikh, Jakub Sitnicki, Marco Elver, Willem de Bruijn, Miaohe Lin, Guillaume Nault, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, bpf@vger.kernel.org, Matthew Wilcox, Eric Dumazet, David Ahern, Lorenzo Bianconi, Saeed Mahameed, Andrew Lunn, Paolo Abeni
Subject: Re: [PATCH net-next v3 0/5] page_pool: recycle buffers
References: <20210409223801.104657-1-mcroce@linux.microsoft.com> <9bf7c5b3-c3cf-e669-051f-247aa8df5c5a@huawei.com> <33b02220-cc50-f6b2-c436-f4ec041d6bc4@huawei.com>
In-Reply-To: <33b02220-cc50-f6b2-c436-f4ec041d6bc4@huawei.com>

On Thu, May 06, 2021 at 08:34:48PM +0800, Yunsheng Lin wrote:
> On 2021/5/1 0:24, Ilias Apalodimas wrote:
> > [...]
> >>>>
> >>>> 1. skb frag page recycling does not need "struct xdp_rxq_info" or
> >>>> "struct xdp_mem_info" to bind the relation between "struct page" and
> >>>> "struct page_pool", which seems unnecessary at this point if binding
> >>>> a "struct page_pool" pointer directly in "struct page" does not cause
> >>>> a space increase.
> >>>
> >>> We can't do that. The reason we need those structs is that we rely on the
> >>> existing XDP code, which already recycles its buffers, to enable
> >>> recycling.
> >>> Since we allocate a page per packet when using page_pool for a
> >>> driver, the same ideas apply to an SKB and XDP frame. We just recycle the
> >>
> >> I am not really familiar with XDP here, but a packet from hw is either a
> >> "struct xdp_frame/xdp_buff" for XDP or a "struct sk_buff" for the TCP/IP stack;
> >> a packet cannot be both "struct xdp_frame/xdp_buff" and "struct sk_buff" at
> >> the same time, right?
> >>
> >
> > Yes, but the payload is irrelevant in both cases and that's what we use
> > page_pool for. You can't use this patchset unless your driver uses
> > build_skb(). So in both cases you just allocate memory for the payload and
>
> I am not sure I understood why build_skb() matters here. If the head data of
> a skb is a page frag and is from a page pool, then its page->signature should be
> PP_SIGNATURE; otherwise its page->signature is zero. So a recyclable skb does
> not require its head data to come from a page pool, right?
>

Correct, and that's the big improvement compared to the original RFC. The
wording was a bit off in my initial response. I was trying to point out that
you can recycle *any* buffer coming from page_pool, and one of the ways you
can do that in your driver is to use build_skb() while the payload is
allocated by page_pool.

> > decide what to wrap the buffer with (XDP or SKB) later.
>
> [...]
>
> >>
> >> I am not sure I understand what you meant by "free the skb"; does it mean
> >> that kfree_skb() is called to free the skb?
> >
> > Yes
> >
> >>
> >> As I understand it, if the skb completely owns the page (which means
> >> page_count() == 1) when kfree_skb() is called, __page_pool_put_page() is
> >> called; otherwise page_ref_dec() is called, which is exactly what
> >> page_pool_atomic_sub_if_positive() tries to handle atomically.
> >>
> >
> > Not really, the opposite is happening here. If the pp_recycle bit is set we
> > will always call page_pool_return_skb_page().
> > If the page signature matches
> > the 'magic' set by page pool we will always call xdp_return_skb_frame(),
> > which will end up calling __page_pool_put_page(). If the refcnt is 1 we'll
> > try to recycle the page. If it's not, we'll release it from page_pool
> > (releasing some internal references we keep), unmap the buffer and
> > decrement the refcnt.
>
> Yes, I understood that the above is what the page pool does now.
>
> But the question is: who is still holding an extra reference to the page when
> kfree_skb() is called? Perhaps a cloned and pskb_expand_head()'ed skb is
> holding an extra reference to the same page? So why not just do a
> page_ref_dec() if the original skb is freed first, and call
> __page_pool_put_page() when the cloned skb is freed later? That way we can
> always reuse the recyclable page from a recyclable skb. This may make the
> page_pool_destroy() process take longer than before, but I suppose the
> page_pool_destroy() delay for the cloned-skb case does not really matter here.
>
> If the above works, I think similar handling can be added to RX zerocopy, if
> RX zerocopy also holds extra references to the recyclable page from a
> recyclable skb.
>

Right, this sounds doable, but I'll have to go back, code it, and see if it
really makes sense. However, I'd still prefer the support to go in as-is
(including the struct xdp_mem_info in struct page, instead of a page_pool
pointer). There are a couple of reasons for that. If we keep struct
xdp_mem_info, we can in the future recycle different kinds of buffers using
__xdp_return(). And this is a non-intrusive change if we choose to store the
page_pool address directly in the future. It just affects the internal
contract between the page_pool code and struct page, so it won't affect any
drivers that already use the feature. Regarding the page_ref_dec(), which as
I said sounds doable, I'd prefer playing it safe for now and getting rid of
the buffers that somehow ended up holding an extra reference.
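(Editor's illustration: the free-time decision described above can be modeled in a few lines of self-contained C. The struct fields and the PP_SIGNATURE value here are hypothetical simplifications for illustration; the real kernel page_pool code is more involved.)

```c
#include <assert.h>

#define PP_SIGNATURE 0x40UL /* hypothetical value; the kernel defines its own */

enum pp_action { PP_RECYCLE, PP_RELEASE, PP_NOT_POOL };

/* Simplified stand-in for the bits of struct page the decision reads. */
struct fake_page {
    unsigned long signature; /* set to PP_SIGNATURE by the pool */
    int refcount;
};

/*
 * Model of the decision: a page only goes back to the pool cache when it
 * carries the pool signature and is exclusively owned (refcount == 1);
 * with extra references it is released from the pool instead, dropping
 * our reference.
 */
static enum pp_action return_skb_page(struct fake_page *p)
{
    if (p->signature != PP_SIGNATURE)
        return PP_NOT_POOL;   /* not a page_pool page at all */
    if (p->refcount == 1)
        return PP_RECYCLE;    /* sole owner: reuse without reallocating */
    p->refcount--;            /* someone else still holds it: release */
    return PP_RELEASE;
}
```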
Once this gets approved we can go back and try to save the extra space. I
hope I am not wrong, but the changes required to support a few extra
refcounts should not change the current patches much.

Thanks for taking the time on this!
/Ilias

> >
> > [1] https://lore.kernel.org/netdev/154413868810.21735.572808840657728172.stgit@firesoul/
> >
> > Cheers
> > /Ilias
> >
> > .
> >
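(Editor's illustration: the "subtract if positive" behavior that page_pool_atomic_sub_if_positive() is described as providing can be sketched with C11 atomics. This is a hypothetical helper written for illustration, not the kernel's implementation.)

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Atomically drop one reference unless the count is already zero, and
 * report whether we dropped the last reference (i.e. whether the caller
 * should hand the page back to the pool).
 */
static bool ref_dec_and_test(atomic_int *refcnt)
{
    int old = atomic_load(refcnt);
    while (old > 0) {
        /* On CAS failure 'old' is reloaded, so the loop re-checks it. */
        if (atomic_compare_exchange_weak(refcnt, &old, old - 1))
            return old == 1;
    }
    return false; /* count was already zero; nothing to drop */
}
```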