From: Yunsheng Lin <linyunsheng@huawei.com>
Subject: [PATCH RFC 2/7] skbuff: add interface to manipulate frag count for tx recycling
Date: Wed, 18 Aug 2021 11:32:18 +0800
Message-ID: <1629257542-36145-3-git-send-email-linyunsheng@huawei.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1629257542-36145-1-git-send-email-linyunsheng@huawei.com>
References:
 <1629257542-36145-1-git-send-email-linyunsheng@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain
X-Mailing-List: linux-kernel@vger.kernel.org

skb->pp_recycle and page->pp_magic may not be enough to track whether a
frag page comes from a page pool once __skb_frag_ref() has been called,
mostly because of a data race; see commit 2cc3aeb5eccc ("skbuff: Fix a
potential race while recycling page_pool packets").

In the TCP case, fragmenting, coalescing or retransmitting may lose
track of whether a frag page is from a page pool or not. So increment
the frag count when __skb_frag_ref() is called, and use bit 0 in
frag->bv_page to indicate that a page is from a page pool. The bit is
carried along automatically to another frag->bv_page when doing a
'*new_frag = *frag' or when memcpying the shinfo.

It seems the same trick could be done for rx too, if it makes sense.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 include/linux/skbuff.h  | 43 ++++++++++++++++++++++++++++++++++++++++---
 include/net/page_pool.h |  5 +++++
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6bdb0db..2878d26 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -331,6 +331,11 @@ static inline unsigned int skb_frag_size(const skb_frag_t *frag)
 	return frag->bv_len;
 }
 
+static inline bool skb_frag_is_pp(const skb_frag_t *frag)
+{
+	return (unsigned long)frag->bv_page & 1UL;
+}
+
 /**
  * skb_frag_size_set() - Sets the size of a skb fragment
  * @frag: skb fragment
@@ -2190,6 +2195,21 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
 		skb->pfmemalloc = true;
 }
 
+static inline void __skb_fill_pp_page_desc(struct sk_buff *skb, int i,
+					   struct page *page, int off,
+					   int size)
+{
+	skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+	frag->bv_page = (struct page *)((unsigned long)page | 0x1UL);
+	frag->bv_offset = off;
+	skb_frag_size_set(frag, size);
+
+	page = compound_head(page);
+	if (page_is_pfmemalloc(page))
+		skb->pfmemalloc = true;
+}
+
 /**
  * skb_fill_page_desc - initialise a paged fragment in an skb
  * @skb: buffer containing fragment to be initialised
@@ -2211,6 +2231,14 @@ static inline void skb_fill_page_desc(struct sk_buff *skb, int i,
 	skb_shinfo(skb)->nr_frags = i + 1;
 }
 
+static inline void skb_fill_pp_page_desc(struct sk_buff *skb, int i,
+					 struct page *page, int off,
+					 int size)
+{
+	__skb_fill_pp_page_desc(skb, i, page, off, size);
+	skb_shinfo(skb)->nr_frags = i + 1;
+}
+
 void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
 		     int size, unsigned int truesize);
 
@@ -3062,7 +3090,10 @@ static inline void skb_frag_off_copy(skb_frag_t *fragto,
  */
 static inline struct page *skb_frag_page(const skb_frag_t *frag)
 {
-	return frag->bv_page;
+	unsigned long page = (unsigned long)frag->bv_page;
+
+	page &= ~1UL;
+	return (struct page *)page;
 }
 
 /**
@@ -3073,7 +3104,12 @@ static inline struct page *skb_frag_page(const skb_frag_t *frag)
  */
 static inline void __skb_frag_ref(skb_frag_t *frag)
 {
-	get_page(skb_frag_page(frag));
+	struct page *page = skb_frag_page(frag);
+
+	if (skb_frag_is_pp(frag))
+		page_pool_atomic_inc_frag_count(page);
+	else
+		get_page(page);
 }
 
 /**
@@ -3101,7 +3137,8 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
 	struct page *page = skb_frag_page(frag);
 
 #ifdef CONFIG_PAGE_POOL
-	if (recycle && page_pool_return_skb_page(page))
+	if ((recycle || skb_frag_is_pp(frag)) &&
+	    page_pool_return_skb_page(page))
 		return;
 #endif
 	put_page(page);
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 8d4ae4b..86babb2 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -270,6 +270,11 @@ static inline long page_pool_atomic_sub_frag_count_return(struct page *page,
 	return ret;
 }
 
+static inline void page_pool_atomic_inc_frag_count(struct page *page)
+{
+	atomic_long_inc(&page->pp_frag_count);
+}
+
 static inline bool is_page_pool_compiled_in(void)
 {
 #ifdef CONFIG_PAGE_POOL
-- 
2.7.4