Subject: Re: [PATCH][v2] page_pool: handle page recycle for NUMA_NO_NODE condition
From: Yunsheng Lin
To: Saeed Mahameed, brouer@redhat.com
CC: ilias.apalodimas@linaro.org, jonathan.lemon@gmail.com, Li Rongqing,
    netdev@vger.kernel.org, Greg Kroah-Hartman, linux-kernel@vger.kernel.org
Date: Thu, 12 Dec 2019 09:34:14 +0800
Message-ID: <0a252066-fdc3-a81d-7a36-8f49d2babc01@huawei.com>
References: <1575624767-3343-1-git-send-email-lirongqing@baidu.com>
            <9fecbff3518d311ec7c3aee9ae0315a73682a4af.camel@mellanox.com>
            <20191211194933.15b53c11@carbon>
            <831ed886842c894f7b2ffe83fe34705180a86b3b.camel@mellanox.com>
In-Reply-To: <831ed886842c894f7b2ffe83fe34705180a86b3b.camel@mellanox.com>

+CC Michal, Peter, Greg and Bjorn,
because there has been discussion before about where and how NUMA_NO_NODE
should be handled.

On 2019/12/12 5:24, Saeed Mahameed wrote:
> On Wed, 2019-12-11 at 19:49 +0100, Jesper Dangaard Brouer wrote:
>> On Sat, 7 Dec 2019 03:52:41 +0000, Saeed Mahameed wrote:
>>
>>> I don't think it is correct to check that the page nid is the same as
>>> numa_mem_id() if the pool is NUMA_NO_NODE. In that case we should
>>> allow all pages to recycle, because you can't assume where pages are
>>> allocated from and where they are being handled.
>>
>> I agree, using numa_mem_id() is not valid, because it takes the NUMA
>> node id from the executing CPU, and the call to __page_pool_put_page()
>> can happen on a remote CPU (e.g. cpumap redirect, and in the future
>> SKBs).
>>
>>> I suggest the following:
>>>
>>>     return !page_is_pfmemalloc(page) &&
>>>            (page_to_nid(page) == pool->p.nid ||
>>>             pool->p.nid == NUMA_NO_NODE);
>>
>> Above code doesn't generate optimal ASM code, I suggest:
>>
>>   static bool pool_page_reusable(struct page_pool *pool, struct page *page)
>>   {
>>           return !page_is_pfmemalloc(page) &&
>>                  pool->p.nid != NUMA_NO_NODE &&
>>                  page_to_nid(page) == pool->p.nid;
>>   }
>>
> This is not equivalent to the above: here, in case pool->p.nid is
> NUMA_NO_NODE, pool_page_reusable() will always be false.
>
> We can avoid the extra check in the data path.
> How about avoiding NUMA_NO_NODE in page_pool altogether, and forcing
> numa_mem_id() as pool->p.nid when the user requests NUMA_NO_NODE at
> page pool init, as already done in alloc_pages_node()?
That means we will not support page reuse migration for NUMA_NO_NODE,
which is not the same semantic as alloc_pages_node() handling
NUMA_NO_NODE, because alloc_pages_node() will allocate the page based on
the node of the currently running CPU.

Also, there seems to be some wild guessing of the node id here, which has
been discussed before without reaching an agreement yet.

> which will imply recycling without adding any extra condition to the
> data path.
>
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index a6aefe989043..00c99282a306 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -28,6 +28,9 @@ static int page_pool_init(struct page_pool *pool,
>
>         memcpy(&pool->p, params, sizeof(pool->p));
>
> +       /* overwrite to allow recycling.. */
> +       if (pool->p.nid == NUMA_NO_NODE)
> +               pool->p.nid = numa_mem_id();
> +
>
> After a quick look, I don't see any reason to keep NUMA_NO_NODE in
> pool->p.nid.
>
>> I have compiled different variants and looked at the ASM code
>> generated by GCC. This seems to give the best result.
>>
>>> 1) never recycle emergency pages, regardless of pool nid.
>>> 2) always recycle if pool is NUMA_NO_NODE.
>>
>> Yes, this defines the semantics: a page_pool configured with
>> NUMA_NO_NODE means skip NUMA checks. I think that sounds okay...
>>
>>> the above change should not add any overhead; a modest branch
>>> predictor will handle this with no effort.
>>
>> It still annoys me that we keep adding instructions to this code
>> hot-path (I counted 34 bytes and 11 instructions in my proposed
>> function).
>>
>> I think it might be possible to move these NUMA checks to the
>> alloc-side (instead of the return/recycle side as today), and perhaps
>> only on the slow path when dequeuing from the ptr_ring (as recycles
>> that call __page_pool_recycle_direct() will be pinned during NAPI).
>> But let's focus on a smaller fix for the immediate issue...
>>
> I know.
> It annoys me too, but we need recycling to work in production, where
> rings/NAPI can migrate and NUMA nodes can be NUMA_NO_NODE :-(