Date: Tue, 20 Nov 2018 09:45:44 +0800
From: Aaron Lu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: Andrew Morton, Paweł Staszewski, Jesper Dangaard Brouer, Eric Dumazet,
 Tariq Toukan, Ilias Apalodimas, Yoel Caspersen, Mel Gorman,
 Saeed Mahameed, Michal Hocko, Vlastimil Babka, Dave Hansen,
 Alexander Duyck, Ian Kumlien
Subject: [PATCH v2 RESEND update 1/2] mm/page_alloc: free order-0 pages through PCP in page_frag_free()
Message-ID: <20181120014544.GB10657@intel.com>
References: <20181119134834.17765-1-aaron.lu@intel.com>
 <20181119134834.17765-2-aaron.lu@intel.com>
In-Reply-To: <20181119134834.17765-2-aaron.lu@intel.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Mailing-List: linux-kernel@vger.kernel.org

page_frag_free() calls __free_pages_ok() to free the page back to
Buddy. This is OK for high-order pages, but for order-0 pages it misses
the optimization opportunity of using Per-Cpu-Pages and can cause zone
lock contention when called frequently.

Paweł Staszewski recently shared his results on 'how Linux kernel
handles normal traffic'[1], and from the perf data Jesper Dangaard
Brouer found that the lock contention comes from the page allocator:

  mlx5e_poll_tx_cq
  |
   --16.34%--napi_consume_skb
             |
             |--12.65%--__free_pages_ok
             |          |
             |           --11.86%--free_one_page
             |                     |
             |                     |--10.10%--queued_spin_lock_slowpath
             |                     |
             |                      --0.65%--_raw_spin_lock
             |
             |--1.55%--page_frag_free
             |
              --1.44%--skb_release_data

Jesper explained how it happened: the mlx5 driver's RX-page recycle
mechanism is not effective in this workload, so pages have to go
through the page allocator. The lock contention happens during the
mlx5 DMA TX completion cycle, and the page allocator cannot keep up at
these speeds.[2]

I thought that __free_pages_ok() was mostly freeing high-order pages
and that this was lock contention for high-order pages, but Jesper
explained in detail that __free_pages_ok() here is actually freeing
order-0 pages, because mlx5 is using order-0 pages to satisfy its page
pool allocation requests.[3]

The free path, as pointed out by Jesper, is:

  skb_free_head()
    -> skb_free_frag()
      -> page_frag_free()

And the pages being freed on this path are order-0 pages.

Fix this by doing the same thing as __page_frag_cache_drain(): send
the page being freed to PCP if it is an order-0 page, or directly to
Buddy if it is a high-order page.
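
To illustrate why the order check matters for the zone lock, here is a
minimal userspace sketch, not kernel code. Every name prefixed with
sketch_ or SKETCH_ is a hypothetical stand-in invented for this
example, and the batch size and the "drain everything once a batch has
accumulated" policy are simplifying assumptions rather than the real
PCP behaviour. It only counts how often a pretend zone lock would be
taken on each path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
#define SKETCH_PCP_BATCH 31	/* assumed batch size, for illustration only */

struct sketch_page {
	unsigned int order;	/* 0 for a single page, >0 for a compound page */
	int refcount;		/* references still held on the page */
};

struct sketch_zone {
	unsigned long lock_acquisitions;	/* times the pretend zone lock was taken */
	unsigned int pcp_count;			/* pages parked on the per-CPU list */
	unsigned long pages_in_buddy;		/* base pages handed back to "Buddy" */
};

/* Drop one reference; returns true when it was the last one. */
static bool sketch_put_page_testzero(struct sketch_page *page)
{
	assert(page->refcount > 0);
	return --page->refcount == 0;
}

/* High-order path: every single free takes the zone lock. */
static void sketch_free_to_buddy(struct sketch_zone *zone, unsigned int order)
{
	zone->lock_acquisitions++;
	zone->pages_in_buddy += 1UL << order;
}

/*
 * Order-0 path: park the page on the per-CPU list without locking.
 * Once a full batch has accumulated, take the lock once and drain it.
 */
static void sketch_free_to_pcp(struct sketch_zone *zone)
{
	if (++zone->pcp_count < SKETCH_PCP_BATCH)
		return;

	zone->lock_acquisitions++;
	zone->pages_in_buddy += zone->pcp_count;
	zone->pcp_count = 0;
}

/* Same dispatch shape as the fix: order 0 goes to PCP, the rest to Buddy. */
static void sketch_page_frag_free(struct sketch_zone *zone,
				  struct sketch_page *page)
{
	if (!sketch_put_page_testzero(page))
		return;

	if (page->order == 0)
		sketch_free_to_pcp(zone);
	else
		sketch_free_to_buddy(zone, page->order);
}
```

Under these assumptions, freeing SKETCH_PCP_BATCH order-0 pages costs
one lock acquisition instead of one per page, while a high-order page
still takes the lock on its final reference drop.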

With this change, Paweł has not seen lock contention in his workload so
far, and Jesper has measured a 7% performance improvement with a micro
benchmark, with the lock contention gone. Ilias' test on a 'low' speed
1Gbit interface on a Cortex-A53 shows a ~11% performance boost with
64-byte packets, and __free_pages_ok() disappeared from perf top.

[1]: https://www.spinics.net/lists/netdev/msg531362.html
[2]: https://www.spinics.net/lists/netdev/msg531421.html
[3]: https://www.spinics.net/lists/netdev/msg531556.html

Reported-by: Paweł Staszewski
Analysed-by: Jesper Dangaard Brouer
Acked-by: Vlastimil Babka
Acked-by: Mel Gorman
Acked-by: Jesper Dangaard Brouer
Acked-by: Ilias Apalodimas
Tested-by: Ilias Apalodimas
Acked-by: Alexander Duyck
Acked-by: Tariq Toukan
Signed-off-by: Aaron Lu
---
update: fix Tariq's email tag.

 mm/page_alloc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 421c5b652708..8f8c6b33b637 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4677,8 +4677,14 @@ void page_frag_free(void *addr)
 {
 	struct page *page = virt_to_head_page(addr);
 
-	if (unlikely(put_page_testzero(page)))
-		__free_pages_ok(page, compound_order(page));
+	if (unlikely(put_page_testzero(page))) {
+		unsigned int order = compound_order(page);
+
+		if (order == 0)
+			free_unref_page(page);
+		else
+			__free_pages_ok(page, order);
+	}
 }
 EXPORT_SYMBOL(page_frag_free);
 
-- 
2.17.2