Date: Mon, 19 Apr 2021 18:43:17 +0300
From: Ilias Apalodimas
To: Shakeel Butt
Cc: Jesper Dangaard Brouer, Matthew Wilcox, Matteo Croce, netdev, Linux MM, Ayush Sawal, Vinay Kumar Yadav, Rohit Maheshwari, "David S.
Miller", Jakub Kicinski, Thomas Petazzoni, Marcin Wojtas, Russell King, Mirko Lindner, Stephen Hemminger, Tariq Toukan, Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Boris Pismenny, Arnd Bergmann, Andrew Morton, "Peter Zijlstra (Intel)", Vlastimil Babka, Yu Zhao, Will Deacon, Michel Lespinasse, Fenghua Yu, Roman Gushchin, Hugh Dickins, Peter Xu, Jason Gunthorpe, Guoqing Jiang, Jonathan Lemon, Alexander Lobakin, Cong Wang, wenxu, Kevin Hao, Aleksandr Nogikh, Jakub Sitnicki, Marco Elver, Willem de Bruijn, Miaohe Lin, Yunsheng Lin, Guillaume Nault, LKML, linux-rdma@vger.kernel.org, bpf, Eric Dumazet, David Ahern, Lorenzo Bianconi, Saeed Mahameed, Andrew Lunn, Paolo Abeni
Subject: Re: [PATCH net-next v3 2/5] mm: add a signature in struct page
Message-ID:
References: <20210409223801.104657-1-mcroce@linux.microsoft.com> <20210409223801.104657-3-mcroce@linux.microsoft.com> <20210410154824.GZ2531743@casper.infradead.org> <20210414214132.74f721dd@carbon>
In-Reply-To:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Shakeel,

On Mon, Apr 19, 2021 at 07:57:03AM -0700, Shakeel Butt wrote:
> On Sun, Apr 18, 2021 at 10:12 PM Ilias Apalodimas wrote:
> >
> > On Wed, Apr 14, 2021 at 01:09:47PM -0700, Shakeel Butt wrote:
> > > On Wed, Apr 14, 2021 at 12:42 PM Jesper Dangaard Brouer wrote:
> > > >
> > > [...]
> > > > > >
> > > > > > Can this page_pool be used for TCP RX zerocopy? If yes then PageType
> > > > > > can not be used.
> > > > >
> > > > > Yes it can, since it's going to be used as your default allocator for
> > > > > payloads, which might end up on an SKB.
> > > >
> > > > I'm not sure we want or should "allow" page_pool to be used for TCP RX
> > > > zerocopy, for several reasons.
> > > >
> > > > (1) This implies mapping these pages to userspace, which AFAIK
> > > > means using page->mapping and page->index members (right?).
> > > >
> > >
> > > No, only page->_mapcount is used.
> > >
> >
> > I am not sure I like leaving out TCP RX zerocopy. Since we want drivers to
> > adopt the recycling mechanism, we should try to preserve the current
> > functionality of the network stack.
> >
> > The question is how it works with the current drivers that already have an
> > internal page recycling mechanism.
> >
>
> I think the current drivers check page_ref_count(page) to decide whether to
> reuse (or not) the already allocated pages.
>
> Some examples from the drivers:
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:ixgbe_can_reuse_rx_page()
> drivers/net/ethernet/intel/igb/igb_main.c:igb_can_reuse_rx_page()
> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:mlx5e_rx_cache_get()
>

Yes, that's how internal recycling is done in drivers. As Jesper mentioned,
the refcnt of the page is 1 for page_pool-owned pages, and that's how we
decide what to do with the page.

> > > > (2) It feels wrong (security wise) to keep the DMA-mapping (for the
> > > > device) and also map this page into userspace.
> > > >
> > >
> > > I think this is already the case, i.e. pages still DMA-mapped and also
> > > mapped into userspace.
> > >
> > > > (3) The page_pool is optimized for the refcnt==1 case, and AFAIK TCP RX
> > > > zerocopy will bump the refcnt, which means the page_pool will not
> > > > recycle the page when it sees the elevated refcnt (it will instead
> > > > release its DMA-mapping).
> > >
> > > Yes, this is right, but the userspace might have already consumed and
> > > unmapped the page before the driver considers recycling the page.
> >
> > Same question here. I'll have a closer look in a few days and make sure we are
> > not breaking anything wrt zerocopy.
> >
>
> Pages mapped into userspace have their refcnt elevated, so the
> page_ref_count() check by the drivers indicates not to reuse such
> pages.
>

When tcp_zerocopy_receive() is invoked, it will call
tcp_zerocopy_vm_insert_batch(), which will end up doing a get_page().
What you are saying is that once the zerocopy is done, skb_release_data()
won't be called, but put_page() will be instead? If that's the case, then
we are indeed leaking DMA mappings and memory. That sounds weird though,
since the refcnt will be one in that case (zerocopy will do +1/-1 once it's
done), so who eventually frees the page? If kfree_skb() (or any wrapper that
calls skb_release_data()) is eventually called, we'll end up properly
recycling the page into our pool.

> > > > (4) I remember vaguely that this code path (for TCP RX zerocopy) uses
> > > > page->private for tricks. And our patch [3/5] uses page->private for
> > > > storing xdp_mem_info.
> > > >
> > > > IMHO when the SKB travels into this TCP RX zerocopy code path, we
> > > > should call page_pool_release_page() to release its DMA-mapping.
> > > >
> > >
> > > I will let the TCP RX zerocopy experts respond to this, but from my
> > > high-level code inspection, I didn't see page->private usage.
> >
> > Shakeel, are you aware of any 'easy' way I can have RX zerocopy running?
> >
>
> I would recommend tools/testing/selftests/net/tcp_mmap.c.

Ok, thanks, I'll have a look.

Cheers
/Ilias