From: "Huang, Ying"
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, "Kirill A.
Shutemov", Andrea Arcangeli, Michal Hocko, Johannes Weiner, Shaohua Li, Hugh Dickins, Minchan Kim, Rik van Riel, Dave Hansen, Naoya Horiguchi, Zi Yan, Daniel Jordan
Subject: [PATCH -mm -v4 07/21] mm, THP, swap: Support PMD swap mapping in split_swap_cluster()
Date: Fri, 22 Jun 2018 11:51:37 +0800
Message-Id: <20180622035151.6676-8-ying.huang@intel.com>
In-Reply-To: <20180622035151.6676-1-ying.huang@intel.com>
References: <20180622035151.6676-1-ying.huang@intel.com>

From: Huang Ying

When splitting a THP in swap cache, or when failing to allocate a THP
on swapin of a huge swap cluster, the huge swap cluster will be split.
In addition to clearing the huge flag of the swap cluster, the PMD swap
mapping count recorded in cluster_count() will be set to 0.  But the
PMD swap mappings themselves will not be touched, because it is
sometimes hard to find them all.  When the PMD swap mappings are
operated on later, it will be found that the huge swap cluster has
already been split, and the PMD swap mappings will be split at that
time.

Unless splitting a THP in swap cache (specified via the "force"
parameter), split_swap_cluster() will return -EEXIST if the
SWAP_HAS_CACHE flag is set in swap_map[offset], because that indicates
there is a THP corresponding to this huge swap cluster, and it isn't
desirable to split the THP.

When splitting a THP in swap cache, the call to split_swap_cluster() is
moved to before unlocking the sub-pages, so that all sub-pages are kept
locked from the time the THP is split until the huge swap cluster is
split.  This makes the code much easier to reason about.

Signed-off-by: "Huang, Ying"
Cc: "Kirill A.
Shutemov"
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shaohua Li
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Naoya Horiguchi
Cc: Zi Yan
Cc: Daniel Jordan
---
 include/linux/swap.h |  4 ++--
 mm/huge_memory.c     | 18 ++++++++++++------
 mm/swapfile.c        | 45 ++++++++++++++++++++++++++++++---------------
 3 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index bb9de2cb952a..878f132dabc0 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -617,10 +617,10 @@ static inline swp_entry_t get_swap_page(struct page *page)
 #endif /* CONFIG_SWAP */
 
 #ifdef CONFIG_THP_SWAP
-extern int split_swap_cluster(swp_entry_t entry);
+extern int split_swap_cluster(swp_entry_t entry, bool force);
 extern int split_swap_cluster_map(swp_entry_t entry);
 #else
-static inline int split_swap_cluster(swp_entry_t entry)
+static inline int split_swap_cluster(swp_entry_t entry, bool force)
 {
 	return 0;
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2d615328d77f..586d8693b8af 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2502,6 +2502,17 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 	unfreeze_page(head);
 
+	/*
+	 * Split swap cluster before unlocking sub-pages. So all
+	 * sub-pages will be kept locked from the time the THP is
+	 * split until the swap cluster is split.
+	 */
+	if (PageSwapCache(head)) {
+		swp_entry_t entry = { .val = page_private(head) };
+
+		split_swap_cluster(entry, true);
+	}
+
 	for (i = 0; i < HPAGE_PMD_NR; i++) {
 		struct page *subpage = head + i;
 		if (subpage == page)
@@ -2728,12 +2739,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 			__dec_node_page_state(page, NR_SHMEM_THPS);
 		spin_unlock(&pgdata->split_queue_lock);
 		__split_huge_page(page, list, flags);
-		if (PageSwapCache(head)) {
-			swp_entry_t entry = { .val = page_private(head) };
-
-			ret = split_swap_cluster(entry);
-		} else
-			ret = 0;
+		ret = 0;
 	} else {
 		if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
 			pr_alert("total_mapcount: %u, page_count(): %u\n",
diff --git a/mm/swapfile.c b/mm/swapfile.c
index a0141307f3ac..5ff2da89b77c 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1410,21 +1410,6 @@ static void swapcache_free_cluster(swp_entry_t entry)
 			}
 		}
 	}
 }
-
-int split_swap_cluster(swp_entry_t entry)
-{
-	struct swap_info_struct *si;
-	struct swap_cluster_info *ci;
-	unsigned long offset = swp_offset(entry);
-
-	si = _swap_info_get(entry);
-	if (!si)
-		return -EBUSY;
-	ci = lock_cluster(si, offset);
-	cluster_clear_huge(ci);
-	unlock_cluster(ci);
-	return 0;
-}
 #else
 static inline void swapcache_free_cluster(swp_entry_t entry)
 {
@@ -4069,6 +4054,36 @@ int split_swap_cluster_map(swp_entry_t entry)
 	unlock_cluster(ci);
 	return 0;
 }
+
+int split_swap_cluster(swp_entry_t entry, bool force)
+{
+	struct swap_info_struct *si;
+	struct swap_cluster_info *ci;
+	unsigned long offset = swp_offset(entry);
+	int ret = 0;
+
+	si = get_swap_device(entry);
+	if (!si)
+		return -EINVAL;
+	ci = lock_cluster(si, offset);
+	/* The swap cluster has been split by someone else */
+	if (!cluster_is_huge(ci))
+		goto out;
+	VM_BUG_ON(!is_cluster_offset(offset));
+	VM_BUG_ON(cluster_count(ci) < SWAPFILE_CLUSTER);
+	/* If not forced, don't split a swap cluster that has swap cache */
+	if (!force && si->swap_map[offset] & SWAP_HAS_CACHE) {
+		ret = -EEXIST;
+		goto out;
+	}
+	cluster_set_count(ci, SWAPFILE_CLUSTER);
+	cluster_clear_huge(ci);
+
+out:
+	unlock_cluster(ci);
+	put_swap_device(si);
+	return ret;
+}
 #endif
 
 static int __init swapfile_init(void)
-- 
2.16.4