From: "Huang, Ying"
To: Zi Yan
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Zi Yan,
    Ryan Roberts, Andrew Morton, "Matthew Wilcox (Oracle)",
    David Hildenbrand, "Yin, Fengwei", Yu Zhao, Vlastimil Babka,
    Johannes Weiner, Baolin Wang, Kemeng Shi, Mel Gorman, Rohan Puri,
    Mcgrof Chamberlain, Adam Manzanares, John Hubbard
Subject: Re: [RFC PATCH 1/4] mm/compaction: add support for >0 order folio memory compaction.
References: <20230912162815.440749-1-zi.yan@sent.com>
    <20230912162815.440749-2-zi.yan@sent.com>
Date: Tue, 10 Oct 2023 16:07:00 +0800
In-Reply-To: <20230912162815.440749-2-zi.yan@sent.com> (Zi Yan's message of
    "Tue, 12 Sep 2023 12:28:12 -0400")
Message-ID: <87mswqhpej.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
X-Mailing-List: linux-kernel@vger.kernel.org

Zi Yan writes:

> From: Zi Yan
>
> Before, memory compaction only migrates order-0 folios and skips >0 order
> folios. This commit adds support for >0 order folio compaction by keeping
> isolated free pages at their original size without splitting them into
> order-0 pages and using them directly during migration process.
>
> What is different from the prior implementation:
> 1. All isolated free pages are kept in a MAX_ORDER+1 array of page lists,
>    where each page list stores free pages in the same order.
> 2. All free pages are not post_alloc_hook() processed nor buddy pages,
>    although their orders are stored in first page's private like buddy
>    pages.
> 3. During migration, in new page allocation time (i.e., in
>    compaction_alloc()), free pages are then processed by post_alloc_hook().
>    When migration fails and a new page is returned (i.e., in
>    compaction_free()), free pages are restored by reversing the
>    post_alloc_hook() operations.
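
(Illustration only, not part of the patch or of the review.) The per-order
bookkeeping described in points 1 to 3 above can be sketched in plain
userspace C. Everything prefixed with sketch_ or SKETCH_ is a hypothetical
stand-in chosen for this example; none of it is kernel API, and the real
patch uses struct page, list_head and MAX_ORDER instead. The sketch only
shows two things: a block is filed under its own order rather than split
into order-0 pages, and the global count moves by 1 << order base pages.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ORDER 10	/* assumption: stands in for the kernel's MAX_ORDER */

/* Hypothetical stand-in for an isolated free page; this is not struct page. */
struct sketch_block {
	struct sketch_block *next;
	unsigned int order;	/* the block spans 1 << order base pages */
};

/* One bucket per order, mirroring "a MAX_ORDER+1 array of page lists". */
struct sketch_bucket {
	struct sketch_block *head;
	unsigned long nr_free;
};

struct sketch_control {
	struct sketch_bucket freepages[SKETCH_MAX_ORDER + 1];
	unsigned long nr_freepages;	/* counted in base pages */
};

/* File an isolated block under its own order instead of splitting it. */
static void sketch_put(struct sketch_control *cc, struct sketch_block *b)
{
	struct sketch_bucket *bucket;

	assert(b->order <= SKETCH_MAX_ORDER);
	bucket = &cc->freepages[b->order];
	b->next = bucket->head;
	bucket->head = b;
	bucket->nr_free++;
	cc->nr_freepages += 1UL << b->order;
}

/* Take a block of exactly the requested order, or NULL if none is held. */
static struct sketch_block *sketch_take(struct sketch_control *cc,
					unsigned int order)
{
	struct sketch_bucket *bucket;
	struct sketch_block *b;

	if (order > SKETCH_MAX_ORDER)
		return NULL;
	bucket = &cc->freepages[order];
	b = bucket->head;
	if (!b)
		return NULL;
	bucket->head = b->next;
	b->next = NULL;
	bucket->nr_free--;
	cc->nr_freepages -= 1UL << order;
	return b;
}
```

In this sketch a request for order 1 fails even when an order-2 block is
held, because nothing splits a larger block; that matches the quoted
compaction_alloc(), which only looks at the list for the source folio's
order.
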
>
> Step 3 is done for a latter optimization that splitting and/or merging free
> pages during compaction becomes easier.
>
> Signed-off-by: Zi Yan
> ---
>  mm/compaction.c | 108 +++++++++++++++++++++++++++++++++++++++---------
>  mm/internal.h   |   7 +++-
>  2 files changed, 94 insertions(+), 21 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 01ba298739dd..868e92e55d27 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -107,6 +107,44 @@ static void split_map_pages(struct list_head *list)
>  	list_splice(&tmp_list, list);
>  }
>
> +static unsigned long release_free_list(struct free_list *freepages)
> +{
> +	int order;
> +	unsigned long high_pfn = 0;
> +
> +	for (order = 0; order <= MAX_ORDER; order++) {
> +		struct page *page, *next;
> +
> +		list_for_each_entry_safe(page, next, &freepages[order].pages, lru) {
> +			unsigned long pfn = page_to_pfn(page);
> +
> +			list_del(&page->lru);
> +			/*
> +			 * Convert free pages into post allocation pages, so
> +			 * that we can free them via __free_page.
> +			 */
> +			post_alloc_hook(page, order, __GFP_MOVABLE);
> +			__free_pages(page, order);
> +			if (pfn > high_pfn)
> +				high_pfn = pfn;
> +		}
> +	}
> +	return high_pfn;
> +}
> +
> +static void sort_free_pages(struct list_head *src, struct free_list *dst)
> +{
> +	unsigned int order;
> +	struct page *page, *next;
> +
> +	list_for_each_entry_safe(page, next, src, lru) {
> +		order = buddy_order(page);
> +
> +		list_move(&page->lru, &dst[order].pages);
> +		dst[order].nr_free++;
> +	}
> +}
> +
>  #ifdef CONFIG_COMPACTION
>  bool PageMovable(struct page *page)
>  {
> @@ -1422,6 +1460,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
>  {
>  	unsigned long start_pfn, end_pfn;
>  	struct page *page;
> +	LIST_HEAD(freelist);
>
>  	/* Do not search around if there are enough pages already */
>  	if (cc->nr_freepages >= cc->nr_migratepages)
> @@ -1439,7 +1478,8 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
>  	if (!page)
>  		return;
>
> -	isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
> +	isolate_freepages_block(cc, &start_pfn, end_pfn, &freelist, 1, false);
> +	sort_free_pages(&freelist, cc->freepages);
>
>  	/* Skip this pageblock in the future as it's full or nearly full */
>  	if (start_pfn == end_pfn && !cc->no_set_skip_hint)
> @@ -1568,7 +1608,7 @@ static void fast_isolate_freepages(struct compact_control *cc)
>  				nr_scanned += nr_isolated - 1;
>  				total_isolated += nr_isolated;
>  				cc->nr_freepages += nr_isolated;
> -				list_add_tail(&page->lru, &cc->freepages);
> +				list_add_tail(&page->lru, &cc->freepages[order].pages);
>  				count_compact_events(COMPACTISOLATED, nr_isolated);
>  			} else {
>  				/* If isolation fails, abort the search */
> @@ -1642,13 +1682,13 @@ static void isolate_freepages(struct compact_control *cc)
>  	unsigned long isolate_start_pfn; /* exact pfn we start at */
>  	unsigned long block_end_pfn; /* end of current pageblock */
>  	unsigned long low_pfn; /* lowest pfn scanner is able to scan */
> -	struct list_head *freelist = &cc->freepages;
>  	unsigned int stride;
> +	LIST_HEAD(freelist);
>
>  	/* Try a small search of the free lists for a candidate */
>  	fast_isolate_freepages(cc);
>  	if (cc->nr_freepages)
> -		goto splitmap;
> +		return;
>
>  	/*
>  	 * Initialise the free scanner. The starting point is where we last
> @@ -1708,7 +1748,8 @@ static void isolate_freepages(struct compact_control *cc)
>
>  		/* Found a block suitable for isolating free pages from. */
>  		nr_isolated = isolate_freepages_block(cc, &isolate_start_pfn,
> -					block_end_pfn, freelist, stride, false);
> +					block_end_pfn, &freelist, stride, false);
> +		sort_free_pages(&freelist, cc->freepages);
>
>  		/* Update the skip hint if the full pageblock was scanned */
>  		if (isolate_start_pfn == block_end_pfn)
> @@ -1749,10 +1790,6 @@ static void isolate_freepages(struct compact_control *cc)
>  	 * and the loop terminated due to isolate_start_pfn < low_pfn
>  	 */
>  	cc->free_pfn = isolate_start_pfn;
> -
> -splitmap:
> -	/* __isolate_free_page() does not map the pages */
> -	split_map_pages(freelist);
>  }
>
>  /*
> @@ -1763,18 +1800,21 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
>  {
>  	struct compact_control *cc = (struct compact_control *)data;
>  	struct folio *dst;
> +	int order = folio_order(src);
>
> -	if (list_empty(&cc->freepages)) {
> +	if (!cc->freepages[order].nr_free) {
>  		isolate_freepages(cc);
> -
> -		if (list_empty(&cc->freepages))
> +		if (!cc->freepages[order].nr_free)
>  			return NULL;
>  	}
>
> -	dst = list_entry(cc->freepages.next, struct folio, lru);
> +	dst = list_first_entry(&cc->freepages[order].pages, struct folio, lru);
> +	cc->freepages[order].nr_free--;
>  	list_del(&dst->lru);
> -	cc->nr_freepages--;
> -
> +	post_alloc_hook(&dst->page, order, __GFP_MOVABLE);
> +	if (order)
> +		prep_compound_page(&dst->page, order);
> +	cc->nr_freepages -= 1 << order;
>  	return dst;
>  }
>
> @@ -1786,9 +1826,34 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
>  static void compaction_free(struct folio *dst, unsigned long data)
>  {
>  	struct compact_control *cc = (struct compact_control *)data;
> +	int order = folio_order(dst);
> +	struct page *page = &dst->page;
>
> -	list_add(&dst->lru, &cc->freepages);
> -	cc->nr_freepages++;
> +	if (order) {
> +		int i;
> +
> +		page[1].flags &= ~PAGE_FLAGS_SECOND;
> +		for (i = 1; i < (1 << order); i++) {
> +			page[i].mapping = NULL;
> +			clear_compound_head(&page[i]);
> +			page[i].flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
> +		}
> +
> +	}
> +	/* revert post_alloc_hook() operations */
> +	page->mapping = NULL;
> +	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
> +	set_page_count(page, 0);
> +	page_mapcount_reset(page);
> +	reset_page_owner(page, order);
> +	page_table_check_free(page, order);
> +	arch_free_page(page, order);
> +	set_page_private(page, order);
> +	INIT_LIST_HEAD(&dst->lru);
> +
> +	list_add(&dst->lru, &cc->freepages[order].pages);
> +	cc->freepages[order].nr_free++;
> +	cc->nr_freepages += 1 << order;
>  }
>
>  /* possible outcome of isolate_migratepages */
> @@ -2412,6 +2477,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  	const bool sync = cc->mode != MIGRATE_ASYNC;
>  	bool update_cached;
>  	unsigned int nr_succeeded = 0;
> +	int order;
>
>  	/*
>  	 * These counters track activities during zone compaction. Initialize
> @@ -2421,7 +2487,10 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  	cc->total_free_scanned = 0;
>  	cc->nr_migratepages = 0;
>  	cc->nr_freepages = 0;
> -	INIT_LIST_HEAD(&cc->freepages);
> +	for (order = 0; order <= MAX_ORDER; order++) {
> +		INIT_LIST_HEAD(&cc->freepages[order].pages);
> +		cc->freepages[order].nr_free = 0;
> +	}
>  	INIT_LIST_HEAD(&cc->migratepages);
>
>  	cc->migratetype = gfp_migratetype(cc->gfp_mask);
> @@ -2607,7 +2676,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  	 * so we don't leave any returned pages behind in the next attempt.
>  	 */
>  	if (cc->nr_freepages > 0) {
> -		unsigned long free_pfn = release_freepages(&cc->freepages);
> +		unsigned long free_pfn = release_free_list(cc->freepages);
>
>  		cc->nr_freepages = 0;
>  		VM_BUG_ON(free_pfn == 0);
> @@ -2626,7 +2695,6 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>
>  	trace_mm_compaction_end(cc, start_pfn, end_pfn, sync, ret);
>
> -	VM_BUG_ON(!list_empty(&cc->freepages));
>  	VM_BUG_ON(!list_empty(&cc->migratepages));
>
>  	return ret;
> diff --git a/mm/internal.h b/mm/internal.h
> index 8c90e966e9f8..f5c691bb5c1c 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -465,6 +465,11 @@ int split_free_page(struct page *free_page,
>  /*
>   * in mm/compaction.c
>   */
> +
> +struct free_list {
> +	struct list_head pages;
> +	unsigned long nr_free;

Do we really need nr_free?  Is it enough just to use
list_empty(&free_list->pages)?

> +};
>  /*
>   * compact_control is used to track pages being migrated and the free pages
>   * they are being migrated to during memory compaction. The free_pfn starts
> @@ -473,7 +478,7 @@ int split_free_page(struct page *free_page,
>   * completes when free_pfn <= migrate_pfn
>   */
>  struct compact_control {
> -	struct list_head freepages; /* List of free pages to migrate to */
> +	struct free_list freepages[MAX_ORDER + 1]; /* List of free pages to migrate to */
>  	struct list_head migratepages; /* List of pages being migrated */
>  	unsigned int nr_freepages; /* Number of isolated free pages */
>  	unsigned int nr_migratepages; /* Number of pages to migrate */

--
Best Regards,
Huang, Ying
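
(Illustration only, not part of the original message.) The question about
nr_free can be made concrete with a small userspace sketch. It assumes a
circular doubly linked list that imitates the kernel's list_head; all demo_
names are hypothetical and exist only for this example. The point it shows
is that, when the counter is used purely as an "is anything here?" test, the
list head already carries that information, so a free list without a counter
can serve the same check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace imitation of the kernel's circular list_head. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

static void demo_init_list_head(struct demo_list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* An empty circular list is one whose head points back at itself. */
static bool demo_list_empty(const struct demo_list_head *h)
{
	return h->next == h;
}

/* Insert entry right after the head, as the kernel's list_add() does. */
static void demo_list_add(struct demo_list_head *entry,
			  struct demo_list_head *h)
{
	entry->next = h->next;
	entry->prev = h;
	h->next->prev = entry;
	h->next = entry;
}

/* Unlink entry and leave it pointing at itself. */
static void demo_list_del(struct demo_list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry;
	entry->prev = entry;
}

/* Variant of the free list without a counter. */
struct demo_free_list {
	struct demo_list_head pages;
};

/* Hypothetical helper: detach and return the first entry, or NULL if empty. */
static struct demo_list_head *demo_take_first(struct demo_free_list *fl)
{
	struct demo_list_head *first;

	if (demo_list_empty(&fl->pages))
		return NULL;
	first = fl->pages.next;
	demo_list_del(first);
	return first;
}
```

A separate counter is still useful when the number of entries matters, not
just whether there are any; it also has to be updated at every place that
adds or removes an entry, which the list-based check avoids.
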