Subject: Re: [PATCH 5/7] mm: move zone iterator outside of deferred_init_maxorder()
To: Daniel Jordan, Andrew Morton, Herbert Xu, Steffen Klassert
Cc: Alex Williamson, Dan Williams, Dave Hansen,
    David Hildenbrand, Jason Gunthorpe, Jonathan Corbet, Josh Triplett,
    Kirill Tkhai, Michal Hocko, Pavel Machek, Pavel Tatashin,
    Peter Zijlstra, Randy Dunlap, Shile Zhang, Tejun Heo, Zi Yan,
    linux-crypto@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
References: <20200430201125.532129-1-daniel.m.jordan@oracle.com>
    <20200430201125.532129-6-daniel.m.jordan@oracle.com>
From: Alexander Duyck
Date: Thu, 30 Apr 2020 14:43:28 -0700
In-Reply-To: <20200430201125.532129-6-daniel.m.jordan@oracle.com>

On 4/30/2020 1:11 PM, Daniel Jordan wrote:
> padata will soon divide up pfn ranges between threads when parallelizing
> deferred init, and deferred_init_maxorder() complicates that by using an
> opaque index in addition to start and end pfns. Move the index outside
> the function to make splitting the job easier, and simplify the code
> while at it.
>
> deferred_init_maxorder() now always iterates within a single pfn range
> instead of potentially multiple ranges, and advances start_pfn to the
> end of that range instead of the max-order block so partial pfn ranges
> in the block aren't skipped in a later iteration. The section alignment
> check in deferred_grow_zone() is removed as well since this alignment is
> no longer guaranteed. It's not clear what value the alignment provided
> originally.
>
> Signed-off-by: Daniel Jordan

So part of the reason for splitting it up along section-aligned boundaries
was that we already had existing functionality in deferred_grow_zone that
pulled out a section-aligned chunk and processed it, to prepare enough
memory for other threads to keep running. I suspect the section alignment
was used because, as I understand it, that is normally also the alignment
for memory onlining.

With this already breaking things up over multiple threads, how does this
work with deferred_grow_zone? Which thread is it trying to allocate from
if it needs to allocate some memory for itself? Also, what is to prevent
deferred_grow_zone from bailing out in the middle of a max-order page
block if there is a hole in the middle of the block?

> ---
>  mm/page_alloc.c | 88 +++++++++++++++----------------------------------
>  1 file changed, 27 insertions(+), 61 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 68669d3a5a665..990514d8f0d94 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1708,55 +1708,23 @@ deferred_init_mem_pfn_range_in_zone(u64 *i, struct zone *zone,
>  }
>
>  /*
> - * Initialize and free pages. We do it in two loops: first we initialize
> - * struct page, then free to buddy allocator, because while we are
> - * freeing pages we can access pages that are ahead (computing buddy
> - * page in __free_one_page()).
> - *
> - * In order to try and keep some memory in the cache we have the loop
> - * broken along max page order boundaries. This way we will not cause
> - * any issues with the buddy page computation.
> + * Initialize the struct pages and then free them to the buddy allocator at
> + * most a max order block at a time because while we are freeing pages we can
> + * access pages that are ahead (computing buddy page in __free_one_page()).
> + * It's also cache friendly.
>   */
>  static unsigned long __init
> -deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn,
> -                       unsigned long *end_pfn)
> +deferred_init_maxorder(struct zone *zone, unsigned long *start_pfn,
> +                       unsigned long end_pfn)
>  {
> -        unsigned long mo_pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES);
> -        unsigned long spfn = *start_pfn, epfn = *end_pfn;
> -        unsigned long nr_pages = 0;
> -        u64 j = *i;
> -
> -        /* First we loop through and initialize the page values */
> -        for_each_free_mem_pfn_range_in_zone_from(j, zone, start_pfn, end_pfn) {
> -                unsigned long t;
> -
> -                if (mo_pfn <= *start_pfn)
> -                        break;
> -
> -                t = min(mo_pfn, *end_pfn);
> -                nr_pages += deferred_init_pages(zone, *start_pfn, t);
> -
> -                if (mo_pfn < *end_pfn) {
> -                        *start_pfn = mo_pfn;
> -                        break;
> -                }
> -        }
> -
> -        /* Reset values and now loop through freeing pages as needed */
> -        swap(j, *i);
> -
> -        for_each_free_mem_pfn_range_in_zone_from(j, zone, &spfn, &epfn) {
> -                unsigned long t;
> -
> -                if (mo_pfn <= spfn)
> -                        break;
> +        unsigned long nr_pages, pfn;
>
> -                t = min(mo_pfn, epfn);
> -                deferred_free_pages(spfn, t);
> +        pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES);
> +        pfn = min(pfn, end_pfn);
>
> -                if (mo_pfn <= epfn)
> -                        break;
> -        }
> +        nr_pages = deferred_init_pages(zone, *start_pfn, pfn);
> +        deferred_free_pages(*start_pfn, pfn);
> +        *start_pfn = pfn;
>
>          return nr_pages;
>  }
> @@ -1814,9 +1782,11 @@ static int __init deferred_init_memmap(void *data)
>           * that we can avoid introducing any issues with the buddy
>           * allocator.
>           */
> -        while (spfn < epfn) {
> -                nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> -                cond_resched();
> +        for_each_free_mem_pfn_range_in_zone_from(i, zone, &spfn, &epfn) {
> +                while (spfn < epfn) {
> +                        nr_pages += deferred_init_maxorder(zone, &spfn, epfn);
> +                        cond_resched();
> +                }
>          }
>  zone_empty:
>          /* Sanity check that the next zone really is unpopulated */
> @@ -1883,22 +1853,18 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
>           * that we can avoid introducing any issues with the buddy
>           * allocator.
>           */
> -        while (spfn < epfn) {
> -                /* update our first deferred PFN for this section */
> -                first_deferred_pfn = spfn;
> -
> -                nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> -                touch_nmi_watchdog();
> -
> -                /* We should only stop along section boundaries */
> -                if ((first_deferred_pfn ^ spfn) < PAGES_PER_SECTION)
> -                        continue;
> -
> -                /* If our quota has been met we can stop here */
> -                if (nr_pages >= nr_pages_needed)
> -                        break;
> +        for_each_free_mem_pfn_range_in_zone_from(i, zone, &spfn, &epfn) {
> +                while (spfn < epfn) {
> +                        nr_pages += deferred_init_maxorder(zone, &spfn, epfn);
> +                        touch_nmi_watchdog();
> +
> +                        /* If our quota has been met we can stop here */
> +                        if (nr_pages >= nr_pages_needed)
> +                                goto out;
> +                }
>          }
>
> +out:
>          pgdat->first_deferred_pfn = spfn;
>          pgdat_resize_unlock(pgdat, &flags);
>
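To make the new control flow concrete, here is a minimal userspace sketch
of the restructured loop. Everything in it is a stand-in for illustration,
not the kernel API: the pfn ranges are made up, MAX_ORDER_NR_PAGES is set
to an example value (it is config-dependent in the kernel), ALIGN_UP is a
local macro rather than the kernel's ALIGN(), and init_maxorder() plays
the role of the new deferred_init_maxorder(). The outer loop stands in
for for_each_free_mem_pfn_range_in_zone_from() and steps over the hole
between ranges; the inner loop covers at most one max-order block per
iteration and never advances past the end of the current range.

/*
 * Userspace sketch of the restructured deferred-init loop.
 * Build: cc -o sketch sketch.c
 */
#include <stdio.h>

#define MAX_ORDER_NR_PAGES 1024UL /* example value; config-dependent in the kernel */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

struct pfn_range { unsigned long spfn, epfn; };

/* Stand-in for deferred_init_maxorder(): handle at most one max-order
 * block, clamp to the end of the current range, and advance *spfn. */
static unsigned long init_maxorder(unsigned long *spfn, unsigned long epfn)
{
        unsigned long pfn = ALIGN_UP(*spfn + 1, MAX_ORDER_NR_PAGES);
        unsigned long nr;

        if (pfn > epfn)
                pfn = epfn;
        nr = pfn - *spfn;
        printf("  init+free pfns [%lu, %lu)\n", *spfn, pfn);
        *spfn = pfn; /* never crosses into the hole past epfn */
        return nr;
}

int main(void)
{
        /* Two free ranges with a hole between them; the second range
         * starts in the middle of a max-order block. */
        struct pfn_range ranges[] = { { 0, 1500 }, { 2100, 4096 } };
        unsigned long nr_pages = 0;
        size_t i;

        /* Outer loop stands in for for_each_free_mem_pfn_range_in_zone_from(),
         * skipping the hole between ranges. */
        for (i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++) {
                unsigned long spfn = ranges[i].spfn, epfn = ranges[i].epfn;

                while (spfn < epfn)
                        nr_pages += init_maxorder(&spfn, epfn);
        }
        printf("total: %lu pages\n", nr_pages);
        return 0;
}

Under these assumptions the sketch processes [0, 1024), [1024, 1500),
[2100, 3072), [3072, 4096): the first range ends with a partial block,
the second begins with one, and because the stand-in clamps to epfn and
advances spfn only to the clamp point, no pfns in the hole are touched
and no partial pfn range is skipped, which is the behavioral change the
commit message above describes.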