From: Huang Ying
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    Matthew Wilcox, Linus Torvalds, Peter Xu, Hugh Dickins,
    Johannes Weiner, Mel Gorman, Rik van Riel, Andrea Arcangeli,
    Michal Hocko, Dave Hansen, Tim Chen
Subject: [PATCH] mm: move idle swap cache pages to the tail of LRU after COW
Date: Wed, 19 May 2021 09:33:13 +0800
Message-Id: <20210519013313.1274454-1-ying.huang@intel.com>
X-Mailer: git-send-email 2.30.2

With commit 09854ba94c6a ("mm: do_wp_page() simplification"), after COW,
an idle swap cache page (a page in the swap cache that is no longer
mapped, and whose swap entry is no longer referenced by any process) is
left at its original position in the LRU list.  It may be in the active
list or near the head of the inactive list, so vmscan may need extra
overhead or time to reclaim these actually unused pages.

To help page reclaim, this patch tries to move such idle swap cache
pages to the tail of the inactive LRU list after COW.  To avoid adding
much overhead to the hot COW code path, all locks are acquired with
trylock only.

To test the patch, we ran the pmbench memory accessing benchmark with a
working set larger than the available memory on a 2-socket Intel server
with an NVMe SSD as the swap device.  The test results show that the
pmbench score increases by up to 21.8%, while the swap cache size and
the swapin throughput decrease.

Signed-off-by: "Huang, Ying"
Suggested-by: Matthew Wilcox
Cc: Linus Torvalds
Cc: Peter Xu
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Dave Hansen
Cc: Tim Chen
---
 include/linux/memcontrol.h | 10 ++++++++++
 include/linux/swap.h       |  3 +++
 mm/memcontrol.c            | 12 ++++++++++++
 mm/memory.c                |  5 +++++
 mm/swapfile.c              | 29 +++++++++++++++++++++++++++++
 5 files changed, 59 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0ce97eff79e2..68956db13772 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -761,6 +761,7 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm);
 
 struct lruvec *lock_page_lruvec(struct page *page);
 struct lruvec *lock_page_lruvec_irq(struct page *page);
+struct lruvec *trylock_page_lruvec_irq(struct page *page);
 struct lruvec *lock_page_lruvec_irqsave(struct page *page,
 						unsigned long *flags);
 
@@ -1251,6 +1252,15 @@ static inline struct lruvec *lock_page_lruvec_irq(struct page *page)
 	return &pgdat->__lruvec;
 }
 
+static inline struct lruvec *trylock_page_lruvec_irq(struct page *page)
+{
+	struct pglist_data *pgdat = page_pgdat(page);
+
+	if (spin_trylock_irq(&pgdat->__lruvec.lru_lock))
+		return &pgdat->__lruvec;
+	return NULL;
+}
+
 static inline struct lruvec *lock_page_lruvec_irqsave(struct page *page,
 		unsigned long *flagsp)
 {
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 46d51d058d05..d344b0fa7925 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -504,6 +504,7 @@ extern struct swap_info_struct *page_swap_info(struct page *);
 extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
 extern bool reuse_swap_page(struct page *, int *);
 extern int try_to_free_swap(struct page *);
+extern void try_to_free_idle_swapcache(struct page *page);
 struct backing_dev_info;
 extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
 extern void exit_swap_address_space(unsigned int type);
@@ -668,6 +669,8 @@ static inline int try_to_free_swap(struct page *page)
 	return 0;
 }
 
+static inline void try_to_free_idle_swapcache(struct page *page) {}
+
 static inline swp_entry_t get_swap_page(struct page *page)
 {
 	swp_entry_t entry;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index db29b96f7311..e3e813bfebe2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1213,6 +1213,18 @@ struct lruvec *lock_page_lruvec_irq(struct page *page)
 	return lruvec;
 }
 
+struct lruvec *trylock_page_lruvec_irq(struct page *page)
+{
+	struct lruvec *lruvec;
+
+	lruvec = mem_cgroup_page_lruvec(page);
+	if (spin_trylock_irq(&lruvec->lru_lock)) {
+		lruvec_memcg_debug(lruvec, page);
+		return lruvec;
+	}
+	return NULL;
+}
+
 struct lruvec *lock_page_lruvec_irqsave(struct page *page, unsigned long *flags)
 {
 	struct lruvec *lruvec;
diff --git a/mm/memory.c b/mm/memory.c
index b83f734c4e1d..2b6847f4c03e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3012,6 +3012,11 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 				munlock_vma_page(old_page);
 			unlock_page(old_page);
 		}
+		if (page_copied && PageSwapCache(old_page) &&
+		    !page_mapped(old_page) && trylock_page(old_page)) {
+			try_to_free_idle_swapcache(old_page);
+			unlock_page(old_page);
+		}
 		put_page(old_page);
 	}
 	return page_copied ? VM_FAULT_WRITE : 0;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2aad85751991..e0dd8937de4e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -40,6 +40,7 @@
 #include
 #include
 #include
+#include <linux/mm_inline.h>
 
 #include
 #include
@@ -1788,6 +1789,34 @@ int try_to_free_swap(struct page *page)
 	return 1;
 }
 
+void try_to_free_idle_swapcache(struct page *page)
+{
+	struct lruvec *lruvec;
+	swp_entry_t entry;
+
+	if (!PageSwapCache(page))
+		return;
+	if (PageWriteback(page))
+		return;
+	if (!PageLRU(page))
+		return;
+	if (page_mapped(page))
+		return;
+	entry.val = page_private(page);
+	if (__swap_count(entry))
+		return;
+
+	lruvec = trylock_page_lruvec_irq(page);
+	if (!lruvec)
+		return;
+
+	del_page_from_lru_list(page, lruvec);
+	ClearPageActive(page);
+	ClearPageReferenced(page);
+	add_page_to_lru_list_tail(page, lruvec);
+
+	unlock_page_lruvec_irq(lruvec);
+}
 /*
  * Free the swap entry like above, but also try to
  * free the page cache entry if it is the last user.
-- 
2.30.2
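
[Illustration, not part of the patch: the sketch below is a rough userspace
reproducer of the access pattern the commit message describes.  It assumes a
swap device is configured and that MADV_PAGEOUT (Linux 5.4+) is available;
the buffer size and the exact kernel behaviour depend on the system, so treat
it only as a sketch.  Anonymous pages are populated, pushed out to swap,
swapped back in by read faults (so they stay in the swap cache), and then
written, so each write fault takes the COW path and can leave the old page
behind as idle swap cache.  Watching SwapCached in /proc/meminfo while it
runs is one way to observe the effect.]

#define _GNU_SOURCE		/* MAP_ANONYMOUS / MADV_* on some libcs */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21		/* reclaim these pages (Linux 5.4+) */
#endif

#define SIZE (256UL << 20)	/* 256MB of anonymous memory; adjust to taste */

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	volatile unsigned char sink = 0;
	unsigned char *buf;
	size_t i;

	if (page <= 0)
		page = 4096;

	buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* 1. Populate the anonymous pages. */
	memset(buf, 0x5a, SIZE);

	/* 2. Ask the kernel to reclaim them to swap. */
	if (madvise(buf, SIZE, MADV_PAGEOUT))
		perror("madvise(MADV_PAGEOUT)");

	/* 3. Read faults: swap the pages back in; they stay in the swap cache. */
	for (i = 0; i < SIZE; i += page)
		sink += buf[i];

	/* 4. Write faults: do_wp_page() copies each page because the swap cache
	 *    still holds a reference, leaving the old page as idle swap cache.
	 */
	for (i = 0; i < SIZE; i += page)
		buf[i] = 0xa5;

	(void)sink;
	return 0;
}

Without a configured swap device, madvise(MADV_PAGEOUT) cannot move the pages
to swap, and the later faults will not exercise the swap cache path at all.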