Subject: Re: [PATCH v2 2/6] mm/cma: introduce new zone, ZONE_CMA
From: Rui Teng <rui.teng@linux.vnet.ibm.com>
To: js1304@gmail.com, Andrew Morton
Cc: Rik van Riel, Johannes Weiner, mgorman@techsingularity.net,
 Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
 "Aneesh Kumar K.V", Vlastimil Babka, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Joonsoo Kim
Date: Tue, 26 Apr 2016 17:38:18 +0800
Message-ID: <71acbf31-aba5-c6c3-9336-296ce1d8ad51@linux.vnet.ibm.com>
In-Reply-To: <1461561670-28012-3-git-send-email-iamjoonsoo.kim@lge.com>
References: <1461561670-28012-1-git-send-email-iamjoonsoo.kim@lge.com>
 <1461561670-28012-3-git-send-email-iamjoonsoo.kim@lge.com>

On 4/25/16 1:21 PM, js1304@gmail.com wrote:
> From: Joonsoo Kim
>
> Attached cover-letter:
>
> This series tries to solve problems of the current CMA implementation.
>
> CMA was introduced to provide physically contiguous pages at runtime
> without an exclusive reserved memory area. But the current
> implementation works like the previous reserved memory approach,
> because freepages in the CMA region are used only if there is no
> movable freepage. In other words, freepages in the CMA region are used
> only as a fallback. In that situation, kswapd is easily woken up since
> there is no unmovable or reclaimable freepage either. Once kswapd
> starts to reclaim memory, fallback allocation to MIGRATE_CMA doesn't
> occur any more since movable freepages are already refilled by kswapd,
> and most of the freepages in the CMA region are left free. This
> situation looks like the exclusive reserved memory case.
>
> In my experiment, I found that if the system has 1024 MB of memory and
> 512 MB is reserved for CMA, kswapd is mostly woken up when roughly
> 512 MB of free memory is left. The detailed reason is that, to keep
> enough free memory for unmovable and reclaimable allocations, kswapd
> uses the equation below when calculating free memory, and it easily
> goes under the watermark.
>
> Free memory for unmovable and reclaimable = Free total - Free CMA pages
>
> This is derived from the property of CMA freepages: a CMA freepage
> can't be used for unmovable and reclaimable allocations.
>
> Anyway, in this case, kswapd is woken up when (FreeTotal - FreeCMA) is
> lower than the low watermark and tries to reclaim until
> (FreeTotal - FreeCMA) is higher than the high watermark. The result is
> that FreeTotal consistently hovers around the 512 MB boundary, which
> means that we can't utilize the full memory capacity.
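For anyone following along, the behaviour described above comes from
the zone watermark check. Below is a simplified sketch of the relevant
part of __zone_watermark_ok() in mm/page_alloc.c (not the exact
mainline code; the helper name is made up and lowmem-reserve/per-order
handling is omitted):

	static bool watermark_ok_sketch(struct zone *z, unsigned long mark,
					int alloc_flags)
	{
		long free_pages = zone_page_state(z, NR_FREE_PAGES);

		/*
		 * Without ALLOC_CMA, free CMA pages do not count towards
		 * the watermark, so a mostly-free CMA area still looks
		 * like memory pressure and wakes kswapd.
		 */
		if (!(alloc_flags & ALLOC_CMA))
			free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);

		return free_pages > (long)mark;
	}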
>
> To fix this problem, I submitted some patches [1] about 10 months ago,
> but found some more problems that had to be fixed first. That approach
> requires many hooks in the allocator hotpath, so some developers don't
> like it. Instead, some of them suggested a different approach [2] to
> fix all the problems related to CMA, namely introducing a new zone to
> deal with free CMA pages. I agree that it is the best way to go, so I
> implement it here. Although the properties of ZONE_MOVABLE and
> ZONE_CMA are similar, I decided to add a new zone rather than
> piggyback on ZONE_MOVABLE since they have some differences. First,
> reserved CMA pages should not be offlined. If freepages for CMA were
> managed by ZONE_MOVABLE, we would need to keep the MIGRATE_CMA
> migratetype and insert many hooks into the memory hotplug code to
> distinguish hotpluggable memory from memory reserved for CMA in the
> same zone. It would make the already complicated memory hotplug code
> even more complicated. Second, cma_alloc() can be called more
> frequently than memory hotplug operations, and we may need to control
> the allocation rate of ZONE_CMA to optimize latency in the future. In
> this case, the separate zone approach is easy to modify. Third, I'd
> like to see statistics for CMA separately. Sometimes we need to debug
> why cma_alloc() fails, and separate statistics would be more helpful
> in this situation.
>
> Anyway, this patchset solves four problems related to the CMA
> implementation.
>
> 1) Utilization problem
> As mentioned above, we can't utilize the full memory capacity due to
> the limitations of CMA freepages and the fallback policy. This
> patchset implements a new zone for CMA and uses it for
> GFP_HIGHUSER_MOVABLE requests. This allocation type is used for page
> cache and anonymous pages, which occupy most of the memory usage in
> the normal case, so we can utilize the full memory capacity. Below is
> the experiment result for this problem.
>
> 8 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j16
>
> <Before>
> CMA reserve:            0 MB          512 MB
> Elapsed-time:           92.4          186.5
> pswpin:                 82            18647
> pswpout:                160           69839
>
> <After>
> CMA reserve:            0 MB          512 MB
> Elapsed-time:           93.1          93.4
> pswpin:                 84            46
> pswpout:                183           92
>
> FYI, there is another attempt [3] at solving this problem on lkml.
> And, as far as I know, Qualcomm also has an out-of-tree solution for
> this problem.
>
> 2) Reclaim problem
> Currently, there is no logic to distinguish CMA pages in the reclaim
> path. If reclaim is initiated for an unmovable or reclaimable
> allocation, reclaiming CMA pages doesn't help to satisfy the request,
> so reclaiming them is just a waste. By managing CMA pages in the new
> zone, we can skip reclaiming ZONE_CMA completely when it is
> unnecessary.
>
> 3) Atomic allocation failure problem
> Kswapd isn't woken to reclaim pages when the allocation request is of
> movable type and there are enough free pages in the CMA region. After
> a bunch of consecutive movable allocation requests, free pages in the
> ordinary region (not the CMA region) can be exhausted without waking
> up kswapd. At that point, if an atomic unmovable allocation comes in,
> it can't succeed since there are not enough pages in the ordinary
> region. This problem was reported by Aneesh [4] and can be solved by
> this patchset.
>
> 4) Inefficient work of compaction
> The usual high-order allocation request is of unmovable type and
> cannot be serviced from the CMA area. In compaction, the migration
> scanner doesn't distinguish migratable pages in the CMA area and
> migrates them anyway. In this case, even if we build a high-order page
> in that region, it cannot be used due to the type mismatch. This patch
> solves this problem by separating CMA pages from the ordinary zones.
>
> [1] https://lkml.org/lkml/2014/5/28/64
> [2] https://lkml.org/lkml/2014/11/4/55
> [3] https://lkml.org/lkml/2014/10/15/623
> [4] http://www.spinics.net/lists/linux-mm/msg100562.html
> [5] https://lkml.org/lkml/2014/5/30/320
>
> For this patch:
>
> Currently, reserved pages for CMA are managed together with normal
> pages. To distinguish them, we use a migratetype, MIGRATE_CMA, and do
> special handling for this migratetype. But it turns out that there are
> too many problems with this approach, and fixing all of them needs
> many more hooks in the page allocation and reclaim paths, so some
> developers have expressed their discomfort and the problems with CMA
> haven't been fixed for a long time.
>
> To end this situation and fix the CMA problems, this patch implements
> ZONE_CMA. Reserved pages for CMA will be managed in this new zone.
> This approach removes all existing hooks for MIGRATE_CMA, and many
> problems related to the CMA implementation will be solved.
>
> This patch only adds the basic infrastructure of ZONE_CMA. In the
> following patch, ZONE_CMA is actually populated and used.
>
> Adding a new zone could cause two possible problems. One is the
> overflow of page flags and the other is the GFP_ZONES_TABLE issue.
>
> Following is the page-flags layout described in page-flags-layout.h.
>
> 1. No sparsemem or sparsemem vmemmap:  | NODE | ZONE | ... | FLAGS |
> 2.      " plus space for last_cpupid:  | NODE | ZONE | LAST_CPUPID ... | FLAGS |
> 3. classic sparse with space for node: | SECTION | NODE | ZONE | ... | FLAGS |
> 4.      " plus space for last_cpupid:  | SECTION | NODE | ZONE | LAST_CPUPID ... | FLAGS |
> 5. classic sparse no space for node:   | SECTION | ZONE | ... | FLAGS |
>
> There is no problem with configurations #1 and #2 on 64-bit systems:
> there is enough room even for extremely large x86_64 systems, and
> 32-bit systems would not have many nodes, so they have no problem
> either. Systems with configurations #3, #4 and #5 could be affected by
> this zone addition, but thanks to the recent THP rework, which freed
> one page flag, the problem surface should be small. In some
> configurations a problem is still possible, but it highly depends on
> the individual configuration, so the impact cannot be easily
> estimated. I guess that the usual system with CONFIG_CMA would not be
> affected. If there is a problem, we can adjust the section width or
> node width for that architecture.
>
> Currently, GFP_ZONES_TABLE is a 32-bit value so that 32-bit systems
> can use 32-bit operations on it. If we add one more zone, it becomes
> 48 bits wide and a 32-bit operation is no longer possible. Although
> this causes a slight overhead, there is no other way, so this patch
> relaxes GFP_ZONES_TABLE's 32-bit limitation. 32-bit systems with
> CONFIG_CMA will be affected by this change, but the impact should be
> marginal.
>
> Note that there are many checkpatch warnings, but I think the current
> code is more readable than a fixed-up version would be.
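To spell out the 48-bit arithmetic above for other reviewers (my
reading of the patch, with a made-up helper name; the macros are the
ones from include/linux/gfp.h): the table packs one zone number per
combination of the four zone-selector GFP bits, i.e. 16 entries, and
each entry is GFP_ZONES_SHIFT bits wide. Once ZONE_CMA pushes the zone
count past four, GFP_ZONES_SHIFT becomes 3, so the table needs
16 * 3 = 48 bits and no longer fits in a 32-bit unsigned long, hence
the GFP_ZONE_TABLE_CAST widening in the gfp.h hunk below.

	/* Roughly how gfp_zone() indexes the table (simplified sketch): */
	static inline enum zone_type gfp_zone_sketch(gfp_t flags)
	{
		int bit = (__force int)(flags & GFP_ZONEMASK); /* low 4 bits */

		return (GFP_ZONE_TABLE >> (bit * GFP_ZONES_SHIFT)) &
			((1 << GFP_ZONES_SHIFT) - 1);
	}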
>
> Signed-off-by: Joonsoo Kim
> ---
>  arch/x86/mm/highmem_32.c          |  8 +++++
>  include/linux/gfp.h               | 29 +++++++++++-------
>  include/linux/mempolicy.h         |  2 +-
>  include/linux/mmzone.h            | 31 ++++++++++++++++++-
>  include/linux/vm_event_item.h     | 10 ++++++-
>  include/trace/events/compaction.h | 10 ++++++-
>  kernel/power/snapshot.c           |  8 +++++
>  mm/memory_hotplug.c               |  3 ++
>  mm/page_alloc.c                   | 63 +++++++++++++++++++++++++++++++++------
>  mm/vmstat.c                       |  9 +++++-
>  10 files changed, 148 insertions(+), 25 deletions(-)
>
> diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
> index a6d7392..a7fcb12 100644
> --- a/arch/x86/mm/highmem_32.c
> +++ b/arch/x86/mm/highmem_32.c
> @@ -120,6 +120,14 @@ void __init set_highmem_pages_init(void)
>  		if (!is_highmem(zone))
>  			continue;
>
> +		/*
> +		 * ZONE_CMA is a special zone that should not be
> +		 * participated in initialization because it's pages
> +		 * would be initialized by initialization of other zones.
> +		 */
> +		if (is_zone_cma(zone))
> +			continue;
> +
>  		zone_start_pfn = zone->zone_start_pfn;
>  		zone_end_pfn = zone_start_pfn + zone->spanned_pages;
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 570383a..4d6c008 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -301,6 +301,12 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
>  #define OPT_ZONE_DMA32 ZONE_NORMAL
>  #endif
>
> +#ifdef CONFIG_CMA
> +#define OPT_ZONE_CMA ZONE_CMA
> +#else
> +#define OPT_ZONE_CMA ZONE_MOVABLE
> +#endif
> +
>  /*
>   * GFP_ZONE_TABLE is a word size bitstring that is used for looking up the
>   * zone to use given the lowest 4 bits of gfp_t. Entries are ZONE_SHIFT long
> @@ -331,7 +337,6 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
>   *       0xe    => BAD (MOVABLE+DMA32+HIGHMEM)
>   *       0xf    => BAD (MOVABLE+DMA32+HIGHMEM+DMA)
>   *
> - * GFP_ZONES_SHIFT must be <= 2 on 32 bit platforms.
>   */
>
>  #if defined(CONFIG_ZONE_DEVICE) && (MAX_NR_ZONES-1) <= 4
> @@ -341,19 +346,21 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
>  #define GFP_ZONES_SHIFT ZONES_SHIFT
>  #endif
>
> -#if 16 * GFP_ZONES_SHIFT > BITS_PER_LONG
> -#error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
> +#if !defined(CONFIG_64BITS) && GFP_ZONES_SHIFT > 2
> +#define GFP_ZONE_TABLE_CAST unsigned long long
> +#else
> +#define GFP_ZONE_TABLE_CAST unsigned long
>  #endif
>
>  #define GFP_ZONE_TABLE ( \
> -	(ZONE_NORMAL << 0 * GFP_ZONES_SHIFT) \
> -	| (OPT_ZONE_DMA << ___GFP_DMA * GFP_ZONES_SHIFT) \
> -	| (OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * GFP_ZONES_SHIFT) \
> -	| (OPT_ZONE_DMA32 << ___GFP_DMA32 * GFP_ZONES_SHIFT) \
> -	| (ZONE_NORMAL << ___GFP_MOVABLE * GFP_ZONES_SHIFT) \
> -	| (OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * GFP_ZONES_SHIFT) \
> -	| (ZONE_MOVABLE << (___GFP_MOVABLE | ___GFP_HIGHMEM) * GFP_ZONES_SHIFT)\
> -	| (OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * GFP_ZONES_SHIFT)\
> +	((GFP_ZONE_TABLE_CAST) ZONE_NORMAL << 0 * GFP_ZONES_SHIFT) \
> +	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA << ___GFP_DMA * GFP_ZONES_SHIFT) \
> +	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * GFP_ZONES_SHIFT) \
> +	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA32 << ___GFP_DMA32 * GFP_ZONES_SHIFT) \
> +	| ((GFP_ZONE_TABLE_CAST) ZONE_NORMAL << ___GFP_MOVABLE * GFP_ZONES_SHIFT) \
> +	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * GFP_ZONES_SHIFT) \
> +	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_CMA << (___GFP_MOVABLE | ___GFP_HIGHMEM) * GFP_ZONES_SHIFT) \
> +	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * GFP_ZONES_SHIFT) \
>  )
>
>  /*
> diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
> index 4429d25..c4cc86e 100644
> --- a/include/linux/mempolicy.h
> +++ b/include/linux/mempolicy.h
> @@ -157,7 +157,7 @@ extern enum zone_type policy_zone;
>
>  static inline void check_highest_zone(enum zone_type k)
>  {
> -	if (k > policy_zone && k != ZONE_MOVABLE)
> +	if (k > policy_zone && k != ZONE_MOVABLE && !is_zone_cma_idx(k))
>  		policy_zone = k;
>  }
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index f4ae0abb..5c97ba9 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -322,6 +322,9 @@ enum zone_type {
>  	ZONE_HIGHMEM,
>  #endif
>  	ZONE_MOVABLE,
> +#ifdef CONFIG_CMA
> +	ZONE_CMA,
> +#endif
>  #ifdef CONFIG_ZONE_DEVICE
>  	ZONE_DEVICE,
>  #endif
> @@ -812,11 +815,37 @@ static inline int zone_movable_is_highmem(void)
>  }
>  #endif
>
> +static inline int is_zone_cma_idx(enum zone_type idx)
> +{
> +#ifdef CONFIG_CMA
> +	return idx == ZONE_CMA;
> +#else
> +	return 0;
> +#endif
> +}
> +
> +static inline int is_zone_cma(struct zone *zone)
> +{
> +	int zone_idx = zone_idx(zone);
> +
> +	return is_zone_cma_idx(zone_idx);
> +}
> +
> +static inline int zone_cma_is_highmem(void)
> +{
> +#ifdef CONFIG_HIGHMEM

Does it also need to check CONFIG_CMA here?

> +	return 1;
> +#else
> +	return 0;
> +#endif
> +}
> +
>  static inline int is_highmem_idx(enum zone_type idx)
>  {
>  #ifdef CONFIG_HIGHMEM
>  	return (idx == ZONE_HIGHMEM ||
> -		(idx == ZONE_MOVABLE && zone_movable_is_highmem()));
> +		(idx == ZONE_MOVABLE && zone_movable_is_highmem()) ||
> +		(is_zone_cma_idx(idx) && zone_cma_is_highmem()));

When CONFIG_HIGHMEM is defined, zone_cma_is_highmem() will always
return 1. I think it is not necessary to call the function here, or
even to define it.
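Something like this should be enough, I think (untested sketch; it
assumes is_zone_cma_idx() already compiles to 0 when CONFIG_CMA is not
set, as in this patch):

	static inline int is_highmem_idx(enum zone_type idx)
	{
	#ifdef CONFIG_HIGHMEM
		return (idx == ZONE_HIGHMEM ||
			(idx == ZONE_MOVABLE && zone_movable_is_highmem()) ||
			is_zone_cma_idx(idx));
	#else
		return 0;
	#endif
	}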
>  #else
>  	return 0;
>  #endif
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 9ec2940..8e25ba5 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -19,7 +19,15 @@
>  #define HIGHMEM_ZONE(xx)
>  #endif
>
> -#define FOR_ALL_ZONES(xx) DMA_ZONE(xx) DMA32_ZONE(xx) xx##_NORMAL, HIGHMEM_ZONE(xx) xx##_MOVABLE
> +#ifdef CONFIG_CMA
> +#define MOVABLE_ZONE(xx) xx##_MOVABLE,
> +#define CMA_ZONE(xx) xx##_CMA
> +#else
> +#define MOVABLE_ZONE(xx) xx##_MOVABLE
> +#define CMA_ZONE(xx)
> +#endif
> +
> +#define FOR_ALL_ZONES(xx) DMA_ZONE(xx) DMA32_ZONE(xx) xx##_NORMAL, HIGHMEM_ZONE(xx) MOVABLE_ZONE(xx) CMA_ZONE(xx)
>
>  enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>  		FOR_ALL_ZONES(PGALLOC),
> diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
> index 36e2d6f..9d3b254 100644
> --- a/include/trace/events/compaction.h
> +++ b/include/trace/events/compaction.h
> @@ -38,12 +38,20 @@
>  #define IFDEF_ZONE_HIGHMEM(X)
>  #endif
>
> +#ifdef CONFIG_CMA
> +#define IFDEF_ZONE_CMA(X, Y, Z) X Z
> +#else
> +#define IFDEF_ZONE_CMA(X, Y, Z) Y
> +#endif
> +
>  #define ZONE_TYPE \
>  	IFDEF_ZONE_DMA(		EM (ZONE_DMA,	 "DMA")) \
>  	IFDEF_ZONE_DMA32(	EM (ZONE_DMA32,	 "DMA32")) \
>  				EM (ZONE_NORMAL, "Normal") \
>  	IFDEF_ZONE_HIGHMEM(	EM (ZONE_HIGHMEM,"HighMem")) \
> -				EMe(ZONE_MOVABLE,"Movable")
> +	IFDEF_ZONE_CMA(		EM (ZONE_MOVABLE,"Movable"), \
> +				EMe(ZONE_MOVABLE,"Movable"), \
> +				EMe(ZONE_CMA,    "CMA"))
>
>  /*
>   * First define the enums in the above macros to be exported to userspace
> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> index 3a97060..e8a7d8f 100644
> --- a/kernel/power/snapshot.c
> +++ b/kernel/power/snapshot.c
> @@ -1042,6 +1042,14 @@ unsigned int snapshot_additional_pages(struct zone *zone)
>  {
>  	unsigned int rtree, nodes;
>
> +	/*
> +	 * Estimation of needed pages for ZONE_CMA is already considered
> +	 * when calculating other zones since span of ZONE_CMA is subset
> +	 * of other zones.
> +	 */
> +	if (is_zone_cma(zone))
> +		return 0;
> +
>  	rtree = nodes = DIV_ROUND_UP(zone->spanned_pages, BM_BITS_PER_BLOCK);
>  	rtree += DIV_ROUND_UP(rtree * sizeof(struct rtree_node),
>  			      LINKED_PAGE_DATA_SIZE);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index caf2a14..354fa9c 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1808,6 +1808,9 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  	if (zone_idx(zone) <= ZONE_NORMAL && !can_offline_normal(zone, nr_pages))
>  		return -EINVAL;
>
> +	if (is_zone_cma(zone))
> +		return -EINVAL;
> +
>  	/* set above range as isolated */
>  	ret = start_isolate_page_range(start_pfn, end_pfn,
>  				       MIGRATE_MOVABLE, true);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ffa93e0..987a87c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -202,6 +202,9 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
>  	 32,
>  #endif
>  	 32,
> +#ifdef CONFIG_CMA
> +	 32,
> +#endif
>  };
>
>  EXPORT_SYMBOL(totalram_pages);
> @@ -218,6 +221,9 @@ static char * const zone_names[MAX_NR_ZONES] = {
>  	 "HighMem",
>  #endif
>  	 "Movable",
> +#ifdef CONFIG_CMA
> +	 "CMA",
> +#endif
>  #ifdef CONFIG_ZONE_DEVICE
>  	 "Device",
>  #endif
> @@ -4896,6 +4902,15 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  	struct memblock_region *r = NULL, *tmp;
>  #endif
>
> +	/*
> +	 * Physical pages for ZONE_CMA are belong to other zones now. They
> +	 * are initialized when corresponding zone is initialized and they
> +	 * will be moved to ZONE_CMA later. Zone information will also be
> +	 * adjusted later.
> +	 */
> +	if (is_zone_cma_idx(zone))
> +		return;
> +
>  	if (highest_memmap_pfn < end_pfn - 1)
>  		highest_memmap_pfn = end_pfn - 1;
>
> @@ -5332,7 +5347,7 @@ static void __init find_usable_zone_for_movable(void)
>  {
>  	int zone_index;
>  	for (zone_index = MAX_NR_ZONES - 1; zone_index >= 0; zone_index--) {
> -		if (zone_index == ZONE_MOVABLE)
> +		if (zone_index == ZONE_MOVABLE || is_zone_cma_idx(zone_index))
>  			continue;
>
>  		if (arch_zone_highest_possible_pfn[zone_index] >
> @@ -5541,6 +5556,8 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
>  						unsigned long *zholes_size)
>  {
>  	unsigned long realtotalpages = 0, totalpages = 0;
> +	unsigned long zone_cma_start_pfn = UINT_MAX;
> +	unsigned long zone_cma_end_pfn = 0;
>  	enum zone_type i;
>
>  	for (i = 0; i < MAX_NR_ZONES; i++) {
> @@ -5548,6 +5565,13 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
>  		unsigned long zone_start_pfn, zone_end_pfn;
>  		unsigned long size, real_size;
>
> +		if (is_zone_cma_idx(i)) {
> +			zone->zone_start_pfn = zone_cma_start_pfn;
> +			size = zone_cma_end_pfn - zone_cma_start_pfn;
> +			real_size = 0;
> +			goto init_zone;
> +		}
> +
>  		size = zone_spanned_pages_in_node(pgdat->node_id, i,
>  						  node_start_pfn,
>  						  node_end_pfn,
> @@ -5557,13 +5581,23 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
>  		real_size = size - zone_absent_pages_in_node(pgdat->node_id, i,
>  						  node_start_pfn, node_end_pfn,
>  						  zholes_size);
> -		if (size)
> +		if (size) {
>  			zone->zone_start_pfn = zone_start_pfn;
> -		else
> +			if (zone_cma_start_pfn > zone_start_pfn)
> +				zone_cma_start_pfn = zone_start_pfn;
> +			if (zone_cma_end_pfn < zone_start_pfn + size)
> +				zone_cma_end_pfn = zone_start_pfn + size;
> +		} else
>  			zone->zone_start_pfn = 0;
> +
> +init_zone:
>  		zone->spanned_pages = size;
>  		zone->present_pages = real_size;
>
> +		/* Prevent to over-count node span */
> +		if (is_zone_cma_idx(i))
> +			size = 0;
> +
>  		totalpages += size;
>  		realtotalpages += real_size;
>  	}
> @@ -5705,6 +5739,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>  		struct zone *zone = pgdat->node_zones + j;
>  		unsigned long size, realsize, freesize, memmap_pages;
>  		unsigned long zone_start_pfn = zone->zone_start_pfn;
> +		bool zone_kernel = !is_highmem_idx(j) && !is_zone_cma_idx(j);
>
>  		size = zone->spanned_pages;
>  		realsize = freesize = zone->present_pages;
> @@ -5715,7 +5750,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>  		 * and per-cpu initialisations
>  		 */
>  		memmap_pages = calc_memmap_size(size, realsize);
> -		if (!is_highmem_idx(j)) {
> +		if (zone_kernel) {
>  			if (freesize >= memmap_pages) {
>  				freesize -= memmap_pages;
>  				if (memmap_pages)
> @@ -5734,7 +5769,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>  					zone_names[0], dma_reserve);
>  		}
>
> -		if (!is_highmem_idx(j))
> +		if (zone_kernel)
>  			nr_kernel_pages += freesize;
>  		/* Charge for highmem memmap if there are enough kernel pages */
>  		else if (nr_kernel_pages > memmap_pages * 2)
> @@ -5746,7 +5781,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>  		 * when the bootmem allocator frees pages into the buddy system.
>  		 * And all highmem pages will be managed by the buddy system.
>  		 */
> -		zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
> +		zone->managed_pages = zone_kernel ? freesize : realsize;
>  #ifdef CONFIG_NUMA
>  		zone->node = nid;
>  		setup_min_unmapped_ratio(zone);
> @@ -5763,7 +5798,12 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>  		mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
>
>  		lruvec_init(&zone->lruvec);
> -		if (!size)
> +
> +		/*
> +		 * ZONE_CMA should be initialized even if it has no present
> +		 * page now since pages will be moved to the zone later.
> +		 */
> +		if (!size && !is_zone_cma_idx(j))
>  			continue;
>
>  		set_pageblock_order();
> @@ -6217,7 +6257,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
>  	arch_zone_lowest_possible_pfn[0] = find_min_pfn_with_active_regions();
>  	arch_zone_highest_possible_pfn[0] = max_zone_pfn[0];
>  	for (i = 1; i < MAX_NR_ZONES; i++) {
> -		if (i == ZONE_MOVABLE)
> +		if (i == ZONE_MOVABLE || is_zone_cma_idx(i))
>  			continue;
>  		arch_zone_lowest_possible_pfn[i] =
>  			arch_zone_highest_possible_pfn[i-1];
> @@ -6234,7 +6274,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
>  	/* Print out the zone ranges */
>  	pr_info("Zone ranges:\n");
>  	for (i = 0; i < MAX_NR_ZONES; i++) {
> -		if (i == ZONE_MOVABLE)
> +		if (i == ZONE_MOVABLE || is_zone_cma_idx(i))
>  			continue;
>  		pr_info("  %-8s ", zone_names[i]);
>  		if (arch_zone_lowest_possible_pfn[i] ==
> @@ -7048,6 +7088,11 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  	 */
>  	if (zone_idx(zone) == ZONE_MOVABLE)
>  		return false;
> +
> +	/* ZONE_CMA never contains unmovable pages */
> +	if (is_zone_cma(zone))
> +		return false;
> +
>  	mt = get_pageblock_migratetype(page);
>  	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
>  		return false;
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 070fd90..e8c46ad 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -710,8 +710,15 @@ int fragmentation_index(struct zone *zone, unsigned int order)
>  #define TEXT_FOR_HIGHMEM(xx)
>  #endif
>
> +#ifdef CONFIG_CMA
> +#define TEXT_FOR_CMA(xx) xx "_cma",
> +#else
> +#define TEXT_FOR_CMA(xx)
> +#endif
> +
>  #define TEXTS_FOR_ZONES(xx) TEXT_FOR_DMA(xx) TEXT_FOR_DMA32(xx) xx "_normal", \
> -					TEXT_FOR_HIGHMEM(xx) xx "_movable",
> +					TEXT_FOR_HIGHMEM(xx) xx "_movable", \
> +					TEXT_FOR_CMA(xx)
>
>  const char * const vmstat_text[] = {
>  	/* enum zone_stat_item countes */
>
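One readability note on the two macro changes (vm_event_item.h and
vmstat.c): with CONFIG_CMA enabled, the new zone simply appends one
more entry per zone-keyed counter. For example, on a config with DMA
and DMA32 enabled and HIGHMEM disabled (my example config, not taken
from the patch), the expansions become roughly:

	/* FOR_ALL_ZONES(PGALLOC) in include/linux/vm_event_item.h */
	PGALLOC_DMA, PGALLOC_DMA32, PGALLOC_NORMAL, PGALLOC_MOVABLE, PGALLOC_CMA

	/* TEXTS_FOR_ZONES("pgalloc") in mm/vmstat.c */
	"pgalloc_dma", "pgalloc_dma32", "pgalloc_normal", "pgalloc_movable", "pgalloc_cma",

which is what gives the separate per-CMA statistics mentioned in the
cover letter.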