Date: Mon, 4 Oct 2021 08:50:32 +0300
From: Ilias Apalodimas
To: Yunsheng Lin
Cc: davem@davemloft.net, kuba@kernel.org, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org, linuxarm@openeuler.org, hawk@kernel.org,
 jonathan.lemon@gmail.com, alobakin@pm.me, willemb@google.com,
 cong.wang@bytedance.com, pabeni@redhat.com, haokexin@gmail.com,
 nogikh@google.com, elver@google.com, memxor@gmail.com,
 edumazet@google.com, alexander.duyck@gmail.com, dsahern@gmail.com
Subject: Re: [PATCH net-next v4 3/3] skbuff: keep track of pp page when pp_frag_count is used
References: <20210930080747.28297-1-linyunsheng@huawei.com>
 <20210930080747.28297-4-linyunsheng@huawei.com>
In-Reply-To: <20210930080747.28297-4-linyunsheng@huawei.com>

On Thu, Sep 30, 2021 at 04:07:47PM +0800, Yunsheng Lin wrote:
> skb->pp_recycle and page->pp_magic may not be enough to track whether a
> frag page comes from a page pool once __skb_frag_ref() has been called,
> mostly because of a data race; see commit 2cc3aeb5eccc ("skbuff: Fix a
> potential race while recycling page_pool packets").
>
> Clone and expand-head cases may also lose track of whether a frag page
> is from a page pool.
>
> Not being able to keep track of pp pages also causes a problem for the
> skb_split() case in tso_fragment(). Suppose an skb has three frag
> pages, all from a page pool, and is split into skb1 and skb2:
>   skb1: first frag page + first half of second frag page
>   skb2: second half of second frag page + third frag page
>
> How do we set skb->pp_recycle for skb1 and skb2?
> 1. If we set both to 1, we may hit a race similar to the one in the
>    commit above on the second frag page.
> 2. If we set only one of them to 1, we may leak resources, as both the
>    first and the third frag page really are from a page pool.
>
> So, to keep track of pp frag pages, increment the page's pp_frag_count
> in __skb_frag_ref(), and use only page->pp_magic to identify a pp frag
> page in __skb_frag_unref().
>
> The head page of an skb gets similar handling.
>
> Since we need the head page of a compound page to decide whether it is
> from a page pool in the first place, __page_frag_cache_drain() and
> page_ref_inc() are used to avoid unnecessary compound_head() calls.
>
> Signed-off-by: Yunsheng Lin
> ---
>  include/linux/skbuff.h  | 30 ++++++++++++++++++++----------
>  include/net/page_pool.h | 24 +++++++++++++++++++++++-
>  net/core/page_pool.c    | 17 ++---------------
>  net/core/skbuff.c       | 10 ++++++++--
>  4 files changed, 53 insertions(+), 28 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 841e2f0f5240..aeee150d4a04 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -3073,7 +3073,19 @@ static inline struct page *skb_frag_page(const skb_frag_t *frag)
>   */
>  static inline void __skb_frag_ref(skb_frag_t *frag)
>  {
> -        get_page(skb_frag_page(frag));
> +        struct page *page = skb_frag_page(frag);
> +
> +        page = compound_head(page);
> +
> +#ifdef CONFIG_PAGE_POOL
> +        if (page_pool_is_pp_page(page) &&
> +            page_pool_is_pp_page_frag(page)) {
> +                page_pool_atomic_inc_frag_count(page);
> +                return;
> +        }
> +#endif
> +
> +        page_ref_inc(page);

There's a BUG_ON in get_page() that we are now ignoring.
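
For reference, get_page() on this kernel looks roughly like the
following (quoted from memory for a ~v5.15 tree, so minor details may
differ between versions):

        static inline void get_page(struct page *page)
        {
                page = compound_head(page);
                /*
                 * Getting a normal page or the head of a compound page
                 * requires an already elevated page->_refcount.
                 */
                VM_BUG_ON_PAGE(page_ref_count(page) <= 0, page);
                page_ref_inc(page);
        }

The open-coded page_ref_inc() above silently skips that sanity check,
so taking a reference on a 0-refcount page would no longer be caught
here.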
>  }
>
>  /**
> @@ -3100,11 +3112,16 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
>  {
>          struct page *page = skb_frag_page(frag);
>
> +        page = compound_head(page);
> +
>  #ifdef CONFIG_PAGE_POOL
> -        if (recycle && page_pool_return_skb_page(page))
> +        if (page_pool_is_pp_page(page) &&
> +            (recycle || page_pool_is_pp_page_frag(page))) {
> +                page_pool_return_skb_page(page);
>                  return;
> +        }
>  #endif
> -        put_page(page);
> +        __page_frag_cache_drain(page, 1);

Same here, freeing the page is not the only thing put_page does.
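
For completeness, put_page() around this kernel is roughly the
following (also from memory, so treat it as an approximation):

        static inline void put_page(struct page *page)
        {
                page = compound_head(page);

                /*
                 * devmap managed pages need the refcount transition
                 * from 2 to 1 reported to the device driver.
                 */
                if (page_is_devmap_managed(page)) {
                        put_devmap_managed_page(page);
                        return;
                }

                if (put_page_testzero(page))
                        __put_page(page);
        }

__page_frag_cache_drain() only drops the refcount and frees the page,
i.e. it would skip the devmap-managed branch above.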
>  }
>
>  /**
> @@ -4718,12 +4735,5 @@ static inline void skb_mark_for_recycle(struct sk_buff *skb)
>  }
>  #endif
>
> -static inline bool skb_pp_recycle(struct sk_buff *skb, void *data)
> -{
> -        if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
> -                return false;
> -        return page_pool_return_skb_page(virt_to_page(data));
> -}
> -
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_SKBUFF_H */
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index 3855f069627f..740a8ca7f4a6 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -164,7 +164,7 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
>          return pool->p.dma_dir;
>  }
>
> -bool page_pool_return_skb_page(struct page *page);
> +void page_pool_return_skb_page(struct page *page);
>
>  struct page_pool *page_pool_create(const struct page_pool_params *params);
>
> @@ -231,6 +231,28 @@ static inline void page_pool_set_frag_count(struct page *page, long nr)
>          atomic_long_set(&page->pp_frag_count, nr);
>  }
>
> +static inline void page_pool_atomic_inc_frag_count(struct page *page)
> +{
> +        atomic_long_inc(&page->pp_frag_count);
> +}
> +
> +static inline bool page_pool_is_pp_page(struct page *page)
> +{
> +        /* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
> +         * in order to preserve any existing bits, such as bit 0 for the
> +         * head page of compound page and bit 1 for pfmemalloc page, so
> +         * mask those bits for freeing side when doing below checking,
> +         * and page_is_pfmemalloc() is checked in __page_pool_put_page()
> +         * to avoid recycling the pfmemalloc page.
> +         */
> +        return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
> +}
> +
> +static inline bool page_pool_is_pp_page_frag(struct page *page)
> +{
> +        return !!atomic_long_read(&page->pp_frag_count);
> +}
> +
>  static inline long page_pool_atomic_sub_frag_count_return(struct page *page,
>                                                            long nr)
>  {
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 2c643b72ce16..d141e00459c9 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -219,6 +219,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
>  {
>          page->pp = pool;
>          page->pp_magic |= PP_SIGNATURE;
> +        page_pool_set_frag_count(page, 0);
>  }
>
>  static void page_pool_clear_pp_info(struct page *page)
> @@ -736,22 +737,10 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
>  }
>  EXPORT_SYMBOL(page_pool_update_nid);
>
> -bool page_pool_return_skb_page(struct page *page)
> +void page_pool_return_skb_page(struct page *page)
>  {
>          struct page_pool *pp;
>
> -        page = compound_head(page);
> -
> -        /* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
> -         * in order to preserve any existing bits, such as bit 0 for the
> -         * head page of compound page and bit 1 for pfmemalloc page, so
> -         * mask those bits for freeing side when doing below checking,
> -         * and page_is_pfmemalloc() is checked in __page_pool_put_page()
> -         * to avoid recycling the pfmemalloc page.
> -         */
> -        if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
> -                return false;
> -
>          pp = page->pp;
>
>          /* Driver set this to memory recycling info. Reset it on recycle.
> @@ -760,7 +749,5 @@ bool page_pool_return_skb_page(struct page *page)
>           * 'flipped' fragment being in use or not.
>           */
>          page_pool_put_full_page(pp, page, false);
> -
> -        return true;
>  }
>  EXPORT_SYMBOL(page_pool_return_skb_page);
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 74601bbc56ac..e3691b025d30 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -646,9 +646,15 @@ static void skb_free_head(struct sk_buff *skb)
>          unsigned char *head = skb->head;
>
>          if (skb->head_frag) {
> -                if (skb_pp_recycle(skb, head))
> +                struct page *page = virt_to_head_page(head);
> +
> +                if (page_pool_is_pp_page(page) &&
> +                    (skb->pp_recycle || page_pool_is_pp_page_frag(page))) {
> +                        page_pool_return_skb_page(page);
>                          return;
> -                skb_free_frag(head);
> +                }
> +
> +                __page_frag_cache_drain(page, 1);
>          } else {
>                  kfree(head);
>          }
> --
> 2.33.0
>

Regardless of the comments above, providing some numbers on how the
patches affect performance (at least on hns3) would be good to have.

I'll try giving this another look. I still think having three
indicators to look at before recycling the page is not ideal.

Regards
/Ilias