From: Zi Yan <zi.yan@sent.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, "Kirill A. Shutemov", Andrew Morton,
    Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove,
    Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 05/31] mem_defrag: split a THP if either src or dst is THP only.
Date: Fri, 15 Feb 2019 14:08:30 -0800
Message-Id: <20190215220856.29749-6-zi.yan@sent.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Zi Yan

While generating physically contiguous memory, we may want to move a THP
to a destination that currently holds 512 base pages. The exchange-pages
code does not yet implement exchanging a THP with 512 base pages, so
instead we split the THP and exchange its 512 subpages individually.
This increases the chance of creating a large contiguous region. A split
THP can be promoted back once all 512 pages have been moved to the
destination, or if none of its subpages has been moved. In-place THP
promotion will be introduced later in this patch series.

Signed-off-by: Zi Yan
---
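A note for readers: the control flow this patch adds boils down to a
three-way decision. If src and dst are both THPs or both base pages,
exchange them directly; if only the source is a THP, split it and restart
the scan of this range; if only the destination is a THP, split it and
retry the same address. Below is a small userspace model of that decision
in plain C, illustrative only (all names are invented; this is not the
kernel implementation):

#include <stdbool.h>
#include <stdio.h>

enum defrag_action {
	EXCHANGE,		/* src and dst agree: exchange in kind */
	SPLIT_SRC_THEN_RESTART,	/* THP src vs. base-page dst */
	SPLIT_DST_THEN_RETRY	/* base-page src vs. THP dst */
};

static enum defrag_action choose_action(bool src_thp, bool dst_thp)
{
	if (src_thp == dst_thp)
		return EXCHANGE;
	return src_thp ? SPLIT_SRC_THEN_RESTART : SPLIT_DST_THEN_RETRY;
}

int main(void)
{
	static const char *const names[] = {
		"exchange", "split src THP, restart scan",
		"split dst THP, retry this address"
	};

	for (int s = 0; s <= 1; s++)
		for (int d = 0; d <= 1; d++)
			printf("src_thp=%d dst_thp=%d -> %s\n",
			       s, d, names[choose_action(s, d)]);
	return 0;
}

Splitting instead of failing outright is what raises the chance of
assembling a large contiguous region, at the cost of redoing the scan.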
Shutemov" , Andrew Morton , Vlastimil Babka , Mel Gorman , John Hubbard , Mark Hairgrove , Nitin Gupta , David Nellans , Zi Yan Subject: [RFC PATCH 05/31] mem_defrag: split a THP if either src or dst is THP only. Date: Fri, 15 Feb 2019 14:08:30 -0800 Message-Id: <20190215220856.29749-6-zi.yan@sent.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com> References: <20190215220856.29749-1-zi.yan@sent.com> Reply-To: ziy@nvidia.com MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Zi Yan During the process of generating physically contiguous memory, it is possible that we want to move a THP to a place with 512 base pages. Exchange pages has not implemented the exchange of a THP and 512 base pages. Instead, we can split the THP and exchange 512 base pages. This increases the chance of creating a large contiguous region. A split THP could be promoted back after all 512 pages are moved to the destination or if none of its subpages is moved. In-place THP promotion will be introduced later in this patch serie. Signed-off-by: Zi Yan --- mm/internal.h | 4 ++ mm/mem_defrag.c | 155 +++++++++++++++++++++++++++++++++++++----------- mm/page_alloc.c | 45 ++++++++++++++ 3 files changed, 168 insertions(+), 36 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 4fe8d1a4d7bb..70a6ef603e5b 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -574,6 +574,10 @@ void expand(struct zone *zone, struct page *page, int low, int high, struct free_area *area, int migratetype); +int expand_free_page(struct zone *zone, struct page *buddy_head, + struct page *page, int buddy_order, int page_order, + struct free_area *area, int migratetype); + void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags, unsigned int alloc_flags); diff --git a/mm/mem_defrag.c b/mm/mem_defrag.c index 414909e1c19c..4d458b125c95 100644 --- a/mm/mem_defrag.c +++ b/mm/mem_defrag.c @@ -643,6 +643,15 @@ static void exchange_free(struct page *freepage, unsigned long data) head->num_freepages++; } +static bool page_can_migrate(struct page *page) +{ + if (PageAnon(page)) + return true; + if (page_mapping(page)) + return true; + return false; +} + int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long start_addr, unsigned long end_addr, struct page *anchor_page, unsigned long page_vaddr, @@ -655,6 +664,7 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma, int not_present = 0; bool src_thp = false; +restart: for (scan_address = start_addr; scan_address < end_addr; scan_address += page_size) { struct page *scan_page; @@ -683,6 +693,8 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma, if ((scan_page == compound_head(scan_page)) && PageTransHuge(scan_page) && !PageHuge(scan_page)) src_thp = true; + else + src_thp = false; /* Allow THPs */ if (PageCompound(scan_page) && !src_thp) { @@ -720,13 +732,17 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma, } retry_defrag: - /* migrate */ - if (PageBuddy(dest_page)) { + /* free pages */ + if (page_count(dest_page) == 0 && dest_page->mapping == NULL) { + int buddy_page_order = 0; + unsigned long pfn = page_to_pfn(dest_page); + unsigned long buddy_pfn; + struct page *buddy = dest_page; struct zone *zone = page_zone(dest_page); spinlock_t *zone_lock = &zone->lock; unsigned long zone_lock_flags; unsigned long free_page_order = 0; - 
@@ -778,7 +839,7 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma,
 
 freepage_isolate_fail:
 			spin_unlock_irqrestore(zone_lock, zone_lock_flags);
-
+freepage_isolate_fail_unlocked:
 			if (err < 0) {
 				failed += (page_size/PAGE_SIZE);
 				defrag_stats->dst_isolate_free_failed += (page_size/PAGE_SIZE);
@@ -844,6 +905,8 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma,
 		if ((dest_page == compound_head(dest_page)) &&
 			PageTransHuge(dest_page) && !PageHuge(dest_page))
 			dst_thp = true;
+		else
+			dst_thp = false;
 
 		if (PageCompound(dest_page) && !dst_thp) {
 			failed += get_contig_page_size(dest_page);
@@ -854,37 +917,56 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma,
 		}
 
 		if (src_thp != dst_thp) {
-			failed += get_contig_page_size(scan_page);
-			if (src_thp && !dst_thp)
-				defrag_stats->src_thp_dst_not_failed +=
-					page_size/PAGE_SIZE;
-			else /* !src_thp && dst_thp */
-				defrag_stats->dst_thp_src_not_failed +=
-					page_size/PAGE_SIZE;
+			if (src_thp && !dst_thp) {
+				int ret;
+
+				if (!page_can_migrate(dest_page)) {
+					failed += get_contig_page_size(scan_page);
+					defrag_stats->not_defrag_vpn = scan_address + page_size;
+					goto quit_defrag;
+				}
+				get_page(scan_page);
+				lock_page(scan_page);
+				if (!PageCompound(scan_page) || is_huge_zero_page(scan_page)) {
+					ret = 0;
+					src_thp = false;
+					goto split_src_done;
+				}
+				ret = split_huge_page(scan_page);
+split_src_done:
+				unlock_page(scan_page);
+				put_page(scan_page);
+				if (ret)
+					defrag_stats->src_thp_dst_not_failed += page_size/PAGE_SIZE;
+				else
+					goto restart;
+			} else { /* !src_thp && dst_thp */
+				int ret;
+
+				get_page(dest_page);
+				lock_page(dest_page);
+				if (!PageCompound(dest_page) || is_huge_zero_page(dest_page)) {
+					ret = 0;
+					dst_thp = false;
+					goto split_dst_done;
+				}
+				ret = split_huge_page(dest_page);
+split_dst_done:
+				unlock_page(dest_page);
+				put_page(dest_page);
+				if (ret)
+					defrag_stats->dst_thp_src_not_failed += page_size/PAGE_SIZE;
+				else
+					goto retry_defrag;
+			}
+
+			failed += get_contig_page_size(scan_page);
 
 			defrag_stats->not_defrag_vpn = scan_address + page_size;
 			goto quit_defrag; /*continue;*/
 		}
 
-		/* free page on pcplist */
-		if (page_count(dest_page) == 0) {
-			/* not managed pages */
-			if (!dest_page->flags) {
-				failed += 1;
-				defrag_stats->dst_out_of_bound_failed += 1;
-
-				defrag_stats->not_defrag_vpn = scan_address + page_size;
-				goto quit_defrag;
-			}
-			/* spill order-0 pages to buddy allocator from pcplist */
-			if (!page_drained) {
-				drain_all_pages(NULL);
-				page_drained = 1;
-				goto retry_defrag;
-			}
-		}
-
 		if (PageAnon(dest_page)) {
 			count_vm_events(MEM_DEFRAG_DST_ANON_PAGES,
 					1<<scan_page_order);
[...]
 				defrag_stats->dst_anon_failed += 1<<scan_page_order;
[...]
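The core of the mm/page_alloc.c hunk below is the order-splitting loop:
to carve a 2^page_order block containing a given page out of a free
2^buddy_order block, it repeatedly halves the block, frees (or guards)
the half that does not contain the page, and descends into the half that
does. A worked userspace example of the same arithmetic (plain C,
illustrative only; the page numbers are invented):

#include <stdio.h>

int main(void)
{
	unsigned long head = 0;		/* block head, in page units */
	unsigned long page = 5;		/* the page we want to keep */
	int buddy_order = 4, page_order = 1;
	unsigned long size = 1UL << buddy_order;

	while (buddy_order > page_order) {
		unsigned long to_free;

		buddy_order--;
		size >>= 1;
		if (page < head + size) {
			to_free = head + size;	/* free the upper half */
		} else {
			to_free = head;		/* free the lower half */
			head += size;		/* descend into upper half */
		}
		printf("free block [%lu, %lu) at order %d\n",
		       to_free, to_free + size, buddy_order);
	}
	printf("kept block [%lu, %lu) containing page %lu\n",
	       head, head + size, page);
	return 0;
}

With head 0, page 5, and orders 4 down to 1, this frees [8,16) at order 3,
[0,4) at order 2, and [6,8) at order 1, keeping [4,6) around page 5.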
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
[...]
+int expand_free_page(struct zone *zone, struct page *buddy_head,
+		struct page *page, int buddy_order, int page_order,
+		struct free_area *area, int migratetype)
+{
+	unsigned long size = 1 << buddy_order;
+
+	if (!(page >= buddy_head &&
+	      page < (buddy_head + (1<<buddy_order))))
+		return -EINVAL;
+
+	while (buddy_order > page_order) {
+		struct page *page_to_free;
+
+		area--;
+		buddy_order--;
+		size >>= 1;
+
+		if (page < (buddy_head + size))
+			page_to_free = buddy_head + size;
+		else {
+			page_to_free = buddy_head;
+			buddy_head = buddy_head + size;
+		}
+
+		/*
+		 * Mark as guard page(s), which allows the block to be
+		 * merged back into the allocator when its buddy is freed.
+		 * The corresponding page table entries are not touched;
+		 * the pages stay not-present in the virtual address space.
+		 */
+		if (set_page_guard(zone, page_to_free, buddy_order, migratetype))
+			continue;
+
+		list_add(&page_to_free->lru, &area->free_list[migratetype]);
+		area->nr_free++;
+		set_page_order(page_to_free, buddy_order);
+	}
+	return 0;
+}
+
 static void check_new_page_bad(struct page *page)
 {
 	const char *bad_reason = NULL;
-- 
2.20.1