2015-02-12 07:30:11

by Joonsoo Kim

Subject: [RFC 00/16] Introduce ZONE_CMA

Hello,

This series tries to solve problems in the current CMA implementation.

CMA was introduced to provide physically contiguous pages at runtime
without an exclusive reserved memory area. But the current implementation
behaves much like the old reserved-memory approach, because freepages
in the CMA region are used only if there is no movable freepage left.
In other words, freepages in the CMA region are used only as a fallback.
In that situation, kswapd is easily woken up, since there are no
unmovable and reclaimable freepages either. Once kswapd starts to
reclaim memory, fallback allocation to MIGRATE_CMA no longer occurs,
because movable freepages have already been refilled by kswapd, so most
of the CMA freepages simply stay free. The result looks just like an
exclusively reserved memory region.
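
For reference, CMA freepages sit at the very end of the current fallback
order, so __rmqueue() only touches them once the movable freelists are
empty. The abridged sketch below mirrors the fallbacks[] table in
mm/page_alloc.c (the same table is visible where the last patch of this
series removes MIGRATE_CMA); the MIGRATE_RESERVE/MIGRATE_ISOLATE rows are
omitted:

static int fallbacks[MIGRATE_TYPES][4] = {
	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
#ifdef CONFIG_CMA
	/* MIGRATE_CMA is reachable only as the first fallback of MIGRATE_MOVABLE */
	[MIGRATE_MOVABLE]     = { MIGRATE_CMA, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
	[MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
#else
	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
#endif
};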

In my experiment, I found that on a system with 1024 MB of memory and
512 MB reserved for CMA, kswapd is usually woken up when roughly 512 MB
of free memory is left. The detailed reason is that, to keep enough free
memory for unmovable and reclaimable allocations, kswapd uses the
equation below when calculating free memory, and the result easily drops
under the watermark.

Free memory for unmovable and reclaimable = Free total - Free CMA pages

This is derived from the property of CMA freepages: they can't be used
for unmovable and reclaimable allocations.

Anyway, in this case, kswapd is woken up whenever (FreeTotal - FreeCMA)
drops below the low watermark and keeps reclaiming until
(FreeTotal - FreeCMA) is above the high watermark. As a result,
FreeTotal constantly hovers around the 512 MB boundary, which means we
can't utilize the full memory capacity.
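
To make this concrete, the check kswapd relies on is
__zone_watermark_ok(), sketched (abridged) below; the same lines are
removed later by patch 15. Because kswapd's balance check does not pass
ALLOC_CMA, all free CMA pages are subtracted, so a zone with 512 MB of
CMA freepages looks 512 MB "smaller" to this check even though those
pages are free:

	long free_cma = 0;

#ifdef CONFIG_CMA
	/* If allocation can't use CMA areas don't use free CMA pages */
	if (!(alloc_flags & ALLOC_CMA))
		free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
#endif

	if (free_pages - free_cma <= min + z->lowmem_reserve[classzone_idx])
		return false;	/* below the watermark -> kswapd keeps reclaiming */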

To fix this problem, I submitted some patches [1] about 10 months ago,
but found more problems that needed to be fixed first. That approach
requires many hooks in the allocator hotpath, so some developers didn't
like it. Instead, some of them suggested a different approach [2] to fix
all the problems related to CMA: introducing a new zone to deal with
free CMA pages. I agree that this is the best way to go, so I implement
it here. Although the properties of ZONE_MOVABLE and ZONE_CMA are
similar, I decided to add a new zone rather than piggyback on
ZONE_MOVABLE, since they have some differences. First, reserved CMA
pages should not be offlined. If CMA freepages were managed by
ZONE_MOVABLE, we would need to keep the MIGRATE_CMA migratetype and
insert many hooks into the memory hotplug code to distinguish
hotpluggable memory from memory reserved for CMA within the same zone.
That would make the already complicated memory hotplug code even more
complicated. Second, cma_alloc() can be called much more frequently than
memory hotplug operations, and we may need to control the allocation
rate of ZONE_CMA to optimize latency in the future. In that case, a
separate zone is easier to modify. Third, I'd like to see separate
statistics for CMA. Sometimes we need to debug why cma_alloc() fails,
and separate statistics would be more helpful in that situation.

Anyway, this patchset solves three problems in CMA all at once.

1) Utilization problem
As mentioned above, we can't utilize the full memory capacity due to the
limitation of CMA freepages and the fallback policy. This patchset
implements a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE
requests. This allocation type is used for page cache and anonymous
pages, which account for most memory usage in the normal case, so we can
utilize the full memory capacity. Below is the experiment result for
this problem.

8 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16

<Before this series>
CMA reserve:            0 MB        512 MB
Elapsed-time:           92.4        186.5
pswpin:                 82          18647
pswpout:                160         69839

<After this series>
CMA reserve:            0 MB        512 MB
Elapsed-time:           93.1        93.4
pswpin:                 84          46
pswpout:                183         92

FYI, there is another attempt [3] on lkml trying to solve this problem.
And, as far as I know, Qualcomm also has an out-of-tree solution for it.

2) Reclaim problem
Currently, there is no logic to distinguish CMA pages in the reclaim
path. If reclaim is initiated for an unmovable or reclaimable
allocation, reclaiming CMA pages doesn't help satisfy the request, so
reclaiming them is just a waste. By managing CMA pages in the new zone,
we can skip reclaiming ZONE_CMA entirely when it is unnecessary.
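
No CMA-specific reclaim logic is needed for that: since ZONE_CMA gets
the highest zone index, the ordinary zonelist walk already excludes it
for requests that cannot use CMA memory. A minimal illustration (not
code from this series, just the existing iterator applied to the new
zone layout):

	/*
	 * gfp_zone(GFP_KERNEL) is below ZONE_CMA, so the walk below never
	 * visits ZONE_CMA for unmovable/reclaimable requests, and its
	 * pages are never reclaimed on their behalf.
	 */
	for_each_zone_zonelist(zone, z, zonelist, gfp_zone(GFP_KERNEL)) {
		/* reclaim candidate zones; ZONE_CMA never appears here */
	}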

3) Incorrect watermark check problem
Currently, although we have per-order statistics for the number of
freepages in a zone, there are no per-order statistics for CMA
freepages. This causes an incorrect freepage calculation for high-order
allocation requests. For unmovable and reclaimable allocation requests,
we can't use CMA freepages, so their number should be subtracted in the
freepage calculation. But because we don't have that value per order,
the calculation is incorrect. Currently, we use a trick so that the
watermark check passes under a more relaxed condition [4]. With the new
zone, we don't need to worry about a correct calculation, because the
watermark check for ZONE_CMA is invoked only by requests that can use
CMA freepages, so the problem disappears by itself.
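
The problematic part is the per-order loop in __zone_watermark_ok(),
sketched (abridged) below: free_area[o].nr_free counts CMA and non-CMA
freepages together, so there is nothing to subtract per order and the
check ends up more relaxed than it should be for requests that cannot
use CMA pages.

	for (o = 0; o < order; o++) {
		/* At the next order, this order's pages become unavailable */
		free_pages -= z->free_area[o].nr_free << o;	/* includes CMA freepages */

		/* Require fewer higher order pages to be free */
		min >>= 1;

		if (free_pages <= min)	/* too relaxed for !ALLOC_CMA requests */
			return false;
	}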

There is one disadvantage to this implementation.

1) Breaks the non-overlapping zone assumption
CMA regions can be spread across the whole memory range, so to keep all
of them in one zone, the span of ZONE_CMA has to overlap other zones'.
I'm not sure whether there is a formal assumption that zones never
overlap, but if ZONE_CMA is introduced, overlapping zones become a
reality, so we should deal with this situation. I investigated most of
the sites that iterate over the pfns of a certain zone and found that
they normally don't consider zone overlap. I handle these cases early in
this series. I hope there are no remaining sites that depend on the
non-overlapping zone assumption when iterating over the pfns of a
certain zone.
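
The handling in patches 05-08 is the same simple pattern in every such
pfn walk: skip pages whose struct page says they belong to a different
zone. Roughly:

	for (pfn = zone->zone_start_pfn; pfn < zone_end_pfn(zone); pfn++) {
		if (!pfn_valid(pfn))
			continue;

		page = pfn_to_page(pfn);

		/* With overlapping zones, the span may contain foreign pages */
		if (page_zone(page) != zone)
			continue;

		/* ... operate only on pages that really belong to this zone ... */
	}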

This passed boot tests on x86, ARM32 and ARM64. I also ran some stress
tests on x86 and saw no problems. Feel free to try it out, and please
give me feedback. :)

This patchset is based on v3.18.

Thanks.

[1] https://lkml.org/lkml/2014/5/28/64
[2] https://lkml.org/lkml/2014/11/4/55
[3] https://lkml.org/lkml/2014/10/15/623
[4] https://lkml.org/lkml/2014/5/30/320


Joonsoo Kim (16):
mm/page_alloc: correct highmem memory statistics
mm/writeback: correct dirty page calculation for highmem
mm/highmem: make nr_free_highpages() handle all highmem zones by
itself
mm/vmstat: make node_page_state() handle all zones by itself
mm/vmstat: watch out zone range overlap
mm/page_alloc: watch out zone range overlap
mm/page_isolation: watch out zone range overlap
power: watch out zone range overlap
mm/cma: introduce cma_total_pages() for future use
mm/highmem: remove is_highmem_idx()
mm/page_alloc: clean-up free_area_init_core()
mm/cma: introduce new zone, ZONE_CMA
mm/cma: populate ZONE_CMA and use this zone for GFP_HIGHUSER_MOVABLE
mm/cma: print stolen page count
mm/cma: remove ALLOC_CMA
mm/cma: remove MIGRATE_CMA

arch/x86/include/asm/sparsemem.h | 2 +-
arch/x86/mm/highmem_32.c | 3 +
include/linux/cma.h | 9 ++
include/linux/gfp.h | 31 +++---
include/linux/mempolicy.h | 2 +-
include/linux/mm.h | 1 +
include/linux/mmzone.h | 58 +++++-----
include/linux/page-flags-layout.h | 2 +
include/linux/vm_event_item.h | 8 +-
include/linux/vmstat.h | 26 +----
kernel/power/snapshot.c | 15 +++
lib/show_mem.c | 2 +-
mm/cma.c | 70 ++++++++++--
mm/compaction.c | 6 +-
mm/highmem.c | 12 +-
mm/hugetlb.c | 2 +-
mm/internal.h | 3 +-
mm/memory_hotplug.c | 3 +
mm/mempolicy.c | 3 +-
mm/page-writeback.c | 8 +-
mm/page_alloc.c | 223 +++++++++++++++++++++----------------
mm/page_isolation.c | 14 ++-
mm/vmscan.c | 2 +-
mm/vmstat.c | 16 ++-
24 files changed, 317 insertions(+), 204 deletions(-)

--
1.7.9.5


2015-02-12 07:30:13

by Joonsoo Kim

Subject: [RFC 01/16] mm/page_alloc: correct highmem memory statistics

ZONE_MOVABLE can be treated as highmem, so we need to consider it for
accurate statistics. And, in the following patches, ZONE_CMA will be
introduced and it can be treated as highmem, too. So, instead of
manually adding the stats of ZONE_MOVABLE, loop over all zones, check
whether each zone is highmem or not, and add the stats of every zone
that can be treated as highmem.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_alloc.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 616a2c9..c784035 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3105,6 +3105,8 @@ void si_meminfo_node(struct sysinfo *val, int nid)
{
int zone_type; /* needs to be signed */
unsigned long managed_pages = 0;
+ unsigned long managed_highpages = 0;
+ unsigned long free_highpages = 0;
pg_data_t *pgdat = NODE_DATA(nid);

for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
@@ -3113,12 +3115,19 @@ void si_meminfo_node(struct sysinfo *val, int nid)
val->sharedram = node_page_state(nid, NR_SHMEM);
val->freeram = node_page_state(nid, NR_FREE_PAGES);
#ifdef CONFIG_HIGHMEM
- val->totalhigh = pgdat->node_zones[ZONE_HIGHMEM].managed_pages;
- val->freehigh = zone_page_state(&pgdat->node_zones[ZONE_HIGHMEM],
- NR_FREE_PAGES);
+ for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++) {
+ struct zone *zone = &pgdat->node_zones[zone_type];
+
+ if (is_highmem(zone)) {
+ managed_highpages += zone->managed_pages;
+ free_highpages += zone_page_state(zone, NR_FREE_PAGES);
+ }
+ }
+ val->totalhigh = managed_highpages;
+ val->freehigh = free_highpages;
#else
- val->totalhigh = 0;
- val->freehigh = 0;
+ val->totalhigh = managed_highpages;
+ val->freehigh = free_highpages;
#endif
val->mem_unit = PAGE_SIZE;
}
--
1.7.9.5

2015-02-12 07:30:17

by Joonsoo Kim

Subject: [RFC 02/16] mm/writeback: correct dirty page calculation for highmem

ZONE_MOVABLE can be treated as highmem, so we need to consider it for an
accurate calculation of dirty pages. And, in the following patches,
ZONE_CMA will be introduced and it can be treated as highmem, too. So,
instead of manually adding the stats of ZONE_MOVABLE, loop over all
zones, check whether each zone is highmem or not, and add the stats of
every zone that can be treated as highmem.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page-writeback.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 19ceae8..942f6b3 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -198,11 +198,15 @@ static unsigned long highmem_dirtyable_memory(unsigned long total)
#ifdef CONFIG_HIGHMEM
int node;
unsigned long x = 0;
+ int i;

for_each_node_state(node, N_HIGH_MEMORY) {
- struct zone *z = &NODE_DATA(node)->node_zones[ZONE_HIGHMEM];
+ for (i = 0; i < MAX_NR_ZONES; i++) {
+ struct zone *z = &NODE_DATA(node)->node_zones[i];

- x += zone_dirtyable_memory(z);
+ if (is_highmem(z))
+ x += zone_dirtyable_memory(z);
+ }
}
/*
* Unreclaimable memory (kernel memory or anonymous memory
--
1.7.9.5

2015-02-12 07:30:15

by Joonsoo Kim

Subject: [RFC 03/16] mm/highmem: make nr_free_highpages() handle all highmem zones by itself

nr_free_highpages() manually adds the statistics of each highmem zone
and returns the total. Whenever we add a new highmem zone, we need to
update this function, which is really troublesome. Make it handle all
highmem zones by itself.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/highmem.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/mm/highmem.c b/mm/highmem.c
index 123bcd3..50b4ca6 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -112,16 +112,12 @@ EXPORT_PER_CPU_SYMBOL(__kmap_atomic_idx);

unsigned int nr_free_highpages (void)
{
- pg_data_t *pgdat;
+ struct zone *zone;
unsigned int pages = 0;

- for_each_online_pgdat(pgdat) {
- pages += zone_page_state(&pgdat->node_zones[ZONE_HIGHMEM],
- NR_FREE_PAGES);
- if (zone_movable_is_highmem())
- pages += zone_page_state(
- &pgdat->node_zones[ZONE_MOVABLE],
- NR_FREE_PAGES);
+ for_each_populated_zone(zone) {
+ if (is_highmem(zone))
+ pages += zone_page_state(zone, NR_FREE_PAGES);
}

return pages;
--
1.7.9.5

2015-02-12 07:30:18

by Joonsoo Kim

Subject: [RFC 04/16] mm/vmstat: make node_page_state() handle all zones by itself

node_page_state() manually adds the statistics of each zone and returns
the total for all zones. Whenever we add a new zone, we need to update
this function, which is really troublesome. Make it handle all zones by
itself.

Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/vmstat.h | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 82e7db7..676488a 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -170,19 +170,13 @@ static inline unsigned long node_page_state(int node,
enum zone_stat_item item)
{
struct zone *zones = NODE_DATA(node)->node_zones;
+ int i;
+ unsigned long count = 0;

- return
-#ifdef CONFIG_ZONE_DMA
- zone_page_state(&zones[ZONE_DMA], item) +
-#endif
-#ifdef CONFIG_ZONE_DMA32
- zone_page_state(&zones[ZONE_DMA32], item) +
-#endif
-#ifdef CONFIG_HIGHMEM
- zone_page_state(&zones[ZONE_HIGHMEM], item) +
-#endif
- zone_page_state(&zones[ZONE_NORMAL], item) +
- zone_page_state(&zones[ZONE_MOVABLE], item);
+ for (i = 0; i < MAX_NR_ZONES; i++)
+ count += zone_page_state(zones + i, item);
+
+ return count;
}

extern void zone_statistics(struct zone *, struct zone *, gfp_t gfp);
--
1.7.9.5

2015-02-12 07:34:15

by Joonsoo Kim

Subject: [RFC 05/16] mm/vmstat: watch out zone range overlap

In the following patches, a new zone, ZONE_CMA, will be introduced, and
it will overlap with other zones. Currently, many places that iterate
over a pfn range don't consider the possibility of zone overlap, and
this can cause problems such as printing wrong statistics. To prevent
this, this patch adds some code to handle zone overlap before ZONE_CMA
is added.

pagetypeinfo_showblockcount_print() prints a zone's statistics, so it
should consider zone overlap.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/vmstat.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1b12d39..7a4ac8e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -984,6 +984,8 @@ static void pagetypeinfo_showblockcount_print(struct seq_file *m,
continue;

page = pfn_to_page(pfn);
+ if (page_zone(page) != zone)
+ continue;

/* Watch for unexpected holes punched in the memmap */
if (!memmap_valid_within(pfn, page, zone))
--
1.7.9.5

2015-02-12 07:33:55

by Joonsoo Kim

Subject: [RFC 06/16] mm/page_alloc: watch out zone range overlap

In the following patches, a new zone, ZONE_CMA, will be introduced, and
it will overlap with other zones. Currently, many places that iterate
over a pfn range don't consider the possibility of zone overlap, and
this can cause problems such as printing wrong statistics. To prevent
this, this patch adds some code to handle zone overlap before ZONE_CMA
is added.

setup_zone_migrate_reserve() reserves some pages for a specific zone, so
it should consider zone overlap.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c784035..1c45934b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4040,8 +4040,8 @@ static void setup_zone_migrate_reserve(struct zone *zone)
continue;
page = pfn_to_page(pfn);

- /* Watch out for overlapping nodes */
- if (page_to_nid(page) != zone_to_nid(zone))
+ /* Watch out for overlapping zones */
+ if (page_zone(page) != zone)
continue;

block_migratetype = get_pageblock_migratetype(page);
--
1.7.9.5

2015-02-12 07:33:33

by Joonsoo Kim

Subject: [RFC 07/16] mm/page_isolation: watch out zone range overlap

In the following patches, a new zone, ZONE_CMA, will be introduced, and
it will overlap with other zones. Currently, many places that iterate
over a pfn range don't consider the possibility of zone overlap, and
this can cause problems such as printing wrong statistics. To prevent
this, this patch adds some code to handle zone overlap before ZONE_CMA
is added.

The pfn range provided to test_pages_isolated() should be within a
single zone. If it isn't, the zone lock doesn't protect the free state
of the buddy freepages.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_isolation.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index c8778f7..883e78d 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -210,8 +210,8 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
* Returns 1 if all pages in the range are isolated.
*/
static int
-__test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
- bool skip_hwpoisoned_pages)
+__test_page_isolated_in_pageblock(struct zone *zone, unsigned long pfn,
+ unsigned long end_pfn, bool skip_hwpoisoned_pages)
{
struct page *page;

@@ -221,6 +221,9 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
continue;
}
page = pfn_to_page(pfn);
+ if (page_zone(page) != zone)
+ break;
+
if (PageBuddy(page)) {
/*
* If race between isolatation and allocation happens,
@@ -281,7 +284,7 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
/* Check all pages are free or marked as ISOLATED */
zone = page_zone(page);
spin_lock_irqsave(&zone->lock, flags);
- ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
+ ret = __test_page_isolated_in_pageblock(zone, start_pfn, end_pfn,
skip_hwpoisoned_pages);
spin_unlock_irqrestore(&zone->lock, flags);
return ret ? 0 : -EBUSY;
--
1.7.9.5

2015-02-12 07:30:50

by Joonsoo Kim

Subject: [RFC 08/16] power: watch out zone range overlap

In the following patches, a new zone, ZONE_CMA, will be introduced, and
it will overlap with other zones. Currently, many places that iterate
over a pfn range don't consider the possibility of zone overlap, and
this can cause problems such as printing wrong statistics. To prevent
this, this patch adds some code to handle zone overlap before ZONE_CMA
is added.

mark_free_pages() checks and marks the free status of pages in a zone.
We iterate over pfns to clear the free bit, but walk the buddy lists
directly to set it. If we don't check the page's zone before clearing
the bit, the following buddy-list walk misses pages that are not in this
zone and we can end up with a false free status.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_alloc.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1c45934b..7733663 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1365,6 +1365,9 @@ void mark_free_pages(struct zone *zone)
if (pfn_valid(pfn)) {
struct page *page = pfn_to_page(pfn);

+ if (page_zone(page) != zone)
+ continue;
+
if (!swsusp_page_is_forbidden(page))
swsusp_unset_page_free(page);
}
--
1.7.9.5

2015-02-12 07:33:06

by Joonsoo Kim

Subject: [RFC 09/16] mm/cma: introduce cma_total_pages() for future use

In the following patches, the total reserved page count is needed to
initialize ZONE_CMA. This is a preparation step for that.

Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/cma.h | 9 +++++++++
mm/cma.c | 17 +++++++++++++++++
2 files changed, 26 insertions(+)

diff --git a/include/linux/cma.h b/include/linux/cma.h
index a93438b..aeaea90 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -15,6 +15,9 @@

struct cma;

+#ifdef CONFIG_CMA
+extern unsigned long cma_total_pages(unsigned long node_start_pfn,
+ unsigned long node_end_pfn);
extern phys_addr_t cma_get_base(struct cma *cma);
extern unsigned long cma_get_size(struct cma *cma);

@@ -27,4 +30,10 @@ extern int cma_init_reserved_mem(phys_addr_t base,
struct cma **res_cma);
extern struct page *cma_alloc(struct cma *cma, int count, unsigned int align);
extern bool cma_release(struct cma *cma, struct page *pages, int count);
+
+#else
+static inline unsigned long cma_total_pages(unsigned long node_start_pfn,
+ unsigned long node_end_pfn) { return 0; }
+
+#endif /* CONFIG_CMA */
#endif
diff --git a/mm/cma.c b/mm/cma.c
index c35ceef..f817b91 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -40,6 +40,23 @@ struct cma cma_areas[MAX_CMA_AREAS];
unsigned cma_area_count;
static DEFINE_MUTEX(cma_mutex);

+unsigned long cma_total_pages(unsigned long node_start_pfn,
+ unsigned long node_end_pfn)
+{
+ int i;
+ unsigned long total_pages = 0;
+
+ for (i = 0; i < cma_area_count; i++) {
+ struct cma *cma = &cma_areas[i];
+
+ if (node_start_pfn <= cma->base_pfn &&
+ cma->base_pfn < node_end_pfn)
+ total_pages += cma->count;
+ }
+
+ return total_pages;
+}
+
phys_addr_t cma_get_base(struct cma *cma)
{
return PFN_PHYS(cma->base_pfn);
--
1.7.9.5

2015-02-12 07:32:37

by Joonsoo Kim

Subject: [RFC 10/16] mm/highmem: remove is_highmem_idx()

We can use is_highmem() at every callsite of is_highmem_idx(), so
is_highmem_idx() isn't really needed. And, if we introduce a new zone
for CMA, we would need to modify it to adapt to the new zone, which is
inconvenient. Therefore, this patch removes it before introducing the
new zone.

Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/mmzone.h | 18 ++++--------------
lib/show_mem.c | 2 +-
mm/page_alloc.c | 6 +++---
mm/vmscan.c | 2 +-
4 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ffe66e3..90237f2 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -854,16 +854,6 @@ static inline int zone_movable_is_highmem(void)
#endif
}

-static inline int is_highmem_idx(enum zone_type idx)
-{
-#ifdef CONFIG_HIGHMEM
- return (idx == ZONE_HIGHMEM ||
- (idx == ZONE_MOVABLE && zone_movable_is_highmem()));
-#else
- return 0;
-#endif
-}
-
/**
* is_highmem - helper function to quickly check if a struct zone is a
* highmem zone or not. This is an attempt to keep references
@@ -873,10 +863,10 @@ static inline int is_highmem_idx(enum zone_type idx)
static inline int is_highmem(struct zone *zone)
{
#ifdef CONFIG_HIGHMEM
- int zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones;
- return zone_off == ZONE_HIGHMEM * sizeof(*zone) ||
- (zone_off == ZONE_MOVABLE * sizeof(*zone) &&
- zone_movable_is_highmem());
+ int idx = zone_idx(zone);
+
+ return (idx == ZONE_HIGHMEM ||
+ (idx == ZONE_MOVABLE && zone_movable_is_highmem()));
#else
return 0;
#endif
diff --git a/lib/show_mem.c b/lib/show_mem.c
index 5e25627..f336c5b1 100644
--- a/lib/show_mem.c
+++ b/lib/show_mem.c
@@ -30,7 +30,7 @@ void show_mem(unsigned int filter)
total += zone->present_pages;
reserved += zone->present_pages - zone->managed_pages;

- if (is_highmem_idx(zoneid))
+ if (is_highmem(zone))
highmem += zone->present_pages;
}
pgdat_resize_unlock(pgdat, &flags);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7733663..416e036 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4151,7 +4151,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
INIT_LIST_HEAD(&page->lru);
#ifdef WANT_PAGE_VIRTUAL
/* The shift won't overflow because ZONE_NORMAL is below 4G. */
- if (!is_highmem_idx(zone))
+ if (!is_highmem(z))
set_page_address(page, __va(pfn << PAGE_SHIFT));
#endif
}
@@ -4881,7 +4881,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
zone_names[0], dma_reserve);
}

- if (!is_highmem_idx(j))
+ if (!is_highmem(zone))
nr_kernel_pages += freesize;
/* Charge for highmem memmap if there are enough kernel pages */
else if (nr_kernel_pages > memmap_pages * 2)
@@ -4895,7 +4895,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
* when the bootmem allocator frees pages into the buddy system.
* And all highmem pages will be managed by the buddy system.
*/
- zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
+ zone->managed_pages = is_highmem(zone) ? realsize : freesize;
#ifdef CONFIG_NUMA
zone->node = nid;
zone->min_unmapped_pages = (freesize*sysctl_min_unmapped_ratio)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dcb4707..30c34dc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3074,7 +3074,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
* has a highmem zone, force kswapd to reclaim from
* it to relieve lowmem pressure.
*/
- if (buffer_heads_over_limit && is_highmem_idx(i)) {
+ if (buffer_heads_over_limit && is_highmem(zone)) {
end_zone = i;
break;
}
--
1.7.9.5

2015-02-12 07:32:35

by Joonsoo Kim

Subject: [RFC 11/16] mm/page_alloc: clean-up free_area_init_core()

Some initialization steps can be done without any further information.
When ZONE_CMA is introduced, it has to be handled specially in
free_area_init_core() since there is not enough zone information for it.
But some of the data structures for ZONE_CMA should still be initialized
in this function, so this patch moves those steps up in preparation for
ZONE_CMA.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_alloc.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 416e036..6030525f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4845,11 +4845,19 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
init_waitqueue_head(&pgdat->kswapd_wait);
init_waitqueue_head(&pgdat->pfmemalloc_wait);
pgdat_page_cgroup_init(pgdat);
+ set_pageblock_order();

for (j = 0; j < MAX_NR_ZONES; j++) {
struct zone *zone = pgdat->node_zones + j;
unsigned long size, realsize, freesize, memmap_pages;

+ zone->name = zone_names[j];
+ spin_lock_init(&zone->lock);
+ spin_lock_init(&zone->lru_lock);
+ zone_seqlock_init(zone);
+ zone->zone_pgdat = pgdat;
+ lruvec_init(&zone->lruvec);
+
size = zone_spanned_pages_in_node(nid, j, node_start_pfn,
node_end_pfn, zones_size);
realsize = freesize = size - zone_absent_pages_in_node(nid, j,
@@ -4902,21 +4910,14 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
/ 100;
zone->min_slab_pages = (freesize * sysctl_min_slab_ratio) / 100;
#endif
- zone->name = zone_names[j];
- spin_lock_init(&zone->lock);
- spin_lock_init(&zone->lru_lock);
- zone_seqlock_init(zone);
- zone->zone_pgdat = pgdat;
zone_pcp_init(zone);

/* For bootup, initialized properly in watermark setup */
mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);

- lruvec_init(&zone->lruvec);
if (!size)
continue;

- set_pageblock_order();
setup_usemap(pgdat, zone, zone_start_pfn, size);
ret = init_currently_empty_zone(zone, zone_start_pfn,
size, MEMMAP_EARLY);
--
1.7.9.5

2015-02-12 07:30:19

by Joonsoo Kim

Subject: [RFC 12/16] mm/cma: introduce new zone, ZONE_CMA

Currently, reserved pages for CMA are managed together with normal
pages. To distinguish them, we use a migratetype, MIGRATE_CMA, and
handle it specially. But it turns out that there are too many problems
with this approach, and fixing all of them would need many more hooks in
the page allocation and reclaim paths, so some developers have expressed
their discomfort, and the problems with CMA have gone unfixed for a long
time.

To end this situation and fix the CMA problems, this patch implements
ZONE_CMA. Reserved pages for CMA will be managed in this new zone. This
approach removes all existing hooks for MIGRATE_CMA, and many problems,
such as the watermark check and reserved page utilization, resolve
themselves.

This patch only adds the basic infrastructure of ZONE_CMA. In the
following patch, ZONE_CMA is actually populated and used.

Signed-off-by: Joonsoo Kim <[email protected]>
---
arch/x86/include/asm/sparsemem.h | 2 +-
arch/x86/mm/highmem_32.c | 3 +++
include/linux/gfp.h | 20 ++++++++----------
include/linux/mempolicy.h | 2 +-
include/linux/mmzone.h | 33 +++++++++++++++++++++++++++--
include/linux/page-flags-layout.h | 2 ++
include/linux/vm_event_item.h | 8 +++++++-
kernel/power/snapshot.c | 15 ++++++++++++++
mm/memory_hotplug.c | 3 +++
mm/mempolicy.c | 3 ++-
mm/page_alloc.c | 41 +++++++++++++++++++++++++++++++++----
mm/vmstat.c | 10 ++++++++-
12 files changed, 119 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h
index 4517d6b..ac169a8 100644
--- a/arch/x86/include/asm/sparsemem.h
+++ b/arch/x86/include/asm/sparsemem.h
@@ -25,7 +25,7 @@
# define MAX_PHYSMEM_BITS 32
# endif
#else /* CONFIG_X86_32 */
-# define SECTION_SIZE_BITS 27 /* matt - 128 is convenient right now */
+# define SECTION_SIZE_BITS 28
# define MAX_PHYSADDR_BITS 44
# define MAX_PHYSMEM_BITS 46
#endif
diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index 4500142..182e2b6 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -133,6 +133,9 @@ void __init set_highmem_pages_init(void)
if (!is_highmem(zone))
continue;

+ if (is_zone_cma(zone))
+ continue;
+
zone_start_pfn = zone->zone_start_pfn;
zone_end_pfn = zone_start_pfn + zone->spanned_pages;

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 41b30fd..619eb20 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -219,19 +219,15 @@ static inline int gfpflags_to_migratetype(const gfp_t gfp_flags)
* ZONES_SHIFT must be <= 2 on 32 bit platforms.
*/

-#if 16 * ZONES_SHIFT > BITS_PER_LONG
-#error ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
-#endif
-
#define GFP_ZONE_TABLE ( \
- (ZONE_NORMAL << 0 * ZONES_SHIFT) \
- | (OPT_ZONE_DMA << ___GFP_DMA * ZONES_SHIFT) \
- | (OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * ZONES_SHIFT) \
- | (OPT_ZONE_DMA32 << ___GFP_DMA32 * ZONES_SHIFT) \
- | (ZONE_NORMAL << ___GFP_MOVABLE * ZONES_SHIFT) \
- | (OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * ZONES_SHIFT) \
- | (ZONE_MOVABLE << (___GFP_MOVABLE | ___GFP_HIGHMEM) * ZONES_SHIFT) \
- | (OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * ZONES_SHIFT) \
+ ((u64)ZONE_NORMAL << 0 * ZONES_SHIFT) \
+ | ((u64)OPT_ZONE_DMA << ___GFP_DMA * ZONES_SHIFT) \
+ | ((u64)OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * ZONES_SHIFT) \
+ | ((u64)OPT_ZONE_DMA32 << ___GFP_DMA32 * ZONES_SHIFT) \
+ | ((u64)ZONE_NORMAL << ___GFP_MOVABLE * ZONES_SHIFT) \
+ | ((u64)OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * ZONES_SHIFT) \
+ | ((u64)ZONE_MOVABLE << (___GFP_MOVABLE|___GFP_HIGHMEM) * ZONES_SHIFT)\
+ | ((u64)OPT_ZONE_DMA32 << (___GFP_MOVABLE|___GFP_DMA32) * ZONES_SHIFT)\
)

/*
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 3d385c8..ed01227 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -157,7 +157,7 @@ extern enum zone_type policy_zone;

static inline void check_highest_zone(enum zone_type k)
{
- if (k > policy_zone && k != ZONE_MOVABLE)
+ if (k > policy_zone && k != ZONE_MOVABLE && !is_zone_cma_idx(k))
policy_zone = k;
}

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 90237f2..991e20e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -319,6 +319,9 @@ enum zone_type {
ZONE_HIGHMEM,
#endif
ZONE_MOVABLE,
+#ifdef CONFIG_CMA
+ ZONE_CMA,
+#endif
__MAX_NR_ZONES
};

@@ -854,8 +857,33 @@ static inline int zone_movable_is_highmem(void)
#endif
}

+static inline int is_zone_cma_idx(enum zone_type idx)
+{
+#ifdef CONFIG_CMA
+ return idx == ZONE_CMA;
+#else
+ return 0;
+#endif
+}
+
+static inline int is_zone_cma(struct zone *zone)
+{
+ int zone_idx = zone_idx(zone);
+
+ return is_zone_cma_idx(zone_idx);
+}
+
+static inline int zone_cma_is_highmem(void)
+{
+#ifdef CONFIG_HIGHMEM
+ return 1;
+#else
+ return 0;
+#endif
+}
+
/**
- * is_highmem - helper function to quickly check if a struct zone is a
+ * is_highmem - helper function to quickly check if a struct zone is a
* highmem zone or not. This is an attempt to keep references
* to ZONE_{DMA/NORMAL/HIGHMEM/etc} in general code to a minimum.
* @zone - pointer to struct zone variable
@@ -866,7 +894,8 @@ static inline int is_highmem(struct zone *zone)
int idx = zone_idx(zone);

return (idx == ZONE_HIGHMEM ||
- (idx == ZONE_MOVABLE && zone_movable_is_highmem()));
+ (idx == ZONE_MOVABLE && zone_movable_is_highmem()) ||
+ (is_zone_cma_idx(idx) && zone_cma_is_highmem()));
#else
return 0;
#endif
diff --git a/include/linux/page-flags-layout.h b/include/linux/page-flags-layout.h
index da52366..77b078c 100644
--- a/include/linux/page-flags-layout.h
+++ b/include/linux/page-flags-layout.h
@@ -17,6 +17,8 @@
#define ZONES_SHIFT 1
#elif MAX_NR_ZONES <= 4
#define ZONES_SHIFT 2
+#elif MAX_NR_ZONES <= 8
+#define ZONES_SHIFT 3
#else
#error ZONES_SHIFT -- too many zones configured adjust calculation
#endif
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 730334c..9e4e07a 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -19,7 +19,13 @@
#define HIGHMEM_ZONE(xx)
#endif

-#define FOR_ALL_ZONES(xx) DMA_ZONE(xx) DMA32_ZONE(xx) xx##_NORMAL HIGHMEM_ZONE(xx) , xx##_MOVABLE
+#ifdef CONFIG_CMA
+#define CMA_ZONE(xx) , xx##_CMA
+#else
+#define CMA_ZONE(xx)
+#endif
+
+#define FOR_ALL_ZONES(xx) DMA_ZONE(xx) DMA32_ZONE(xx) xx##_NORMAL HIGHMEM_ZONE(xx) , xx##_MOVABLE CMA_ZONE(xx)

enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
FOR_ALL_ZONES(PGALLOC),
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 791a618..0e875e8 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -520,6 +520,13 @@ static int create_mem_extents(struct list_head *list, gfp_t gfp_mask)
unsigned long zone_start, zone_end;
struct mem_extent *ext, *cur, *aux;

+ /*
+ * ZONE_CMA is a virtual zone and its span is a subset of
+ * other zones, so we don't need to make separate mem_extents for it.
+ */
+ if (is_zone_cma(zone))
+ continue;
+
zone_start = zone->zone_start_pfn;
zone_end = zone_end_pfn(zone);

@@ -1060,6 +1067,14 @@ unsigned int snapshot_additional_pages(struct zone *zone)
{
unsigned int rtree, nodes;

+ /*
+ * Estimation of needed pages for ZONE_CMA is already reflected
+ * when calculating other zones since ZONE_CMA is a virtual zone and
+ * its span is a subset of other zones.
+ */
+ if (is_zone_cma(zone))
+ return 0;
+
rtree = nodes = DIV_ROUND_UP(zone->spanned_pages, BM_BITS_PER_BLOCK);
rtree += DIV_ROUND_UP(rtree * sizeof(struct rtree_node),
LINKED_PAGE_DATA_SIZE);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1bf4807..569ce48 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1694,6 +1694,9 @@ static int __ref __offline_pages(unsigned long start_pfn,
if (zone_idx(zone) <= ZONE_NORMAL && !can_offline_normal(zone, nr_pages))
goto out;

+ if (is_zone_cma(zone))
+ goto out;
+
/* set above range as isolated */
ret = start_isolate_page_range(start_pfn, end_pfn,
MIGRATE_MOVABLE, true);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index e58725a..be21b5b 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1666,7 +1666,8 @@ static int apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
{
enum zone_type dynamic_policy_zone = policy_zone;

- BUG_ON(dynamic_policy_zone == ZONE_MOVABLE);
+ BUG_ON(dynamic_policy_zone == ZONE_MOVABLE ||
+ is_zone_cma_idx(dynamic_policy_zone));

/*
* if policy->v.nodes has movable memory only,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6030525f..443f854 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -186,6 +186,9 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
32,
#endif
32,
+#ifdef CONFIG_CMA
+ 32,
+#endif
};

EXPORT_SYMBOL(totalram_pages);
@@ -202,6 +205,9 @@ static char * const zone_names[MAX_NR_ZONES] = {
"HighMem",
#endif
"Movable",
+#ifdef CONFIG_CMA
+ "CMA",
+#endif
};

int min_free_kbytes = 1024;
@@ -4106,6 +4112,15 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
unsigned long pfn;
struct zone *z;

+ /*
+ * ZONE_CMA is a virtual zone and its pages currently belong to other
+ * zones. Their initialization will be done together with the
+ * initialization of pages in the other zones. Later, we will move
+ * these pages to ZONE_CMA and reset their zone attributes.
+ */
+ if (is_zone_cma_idx(zone))
+ return;
+
if (highest_memmap_pfn < end_pfn - 1)
highest_memmap_pfn = end_pfn - 1;

@@ -4541,7 +4556,7 @@ static void __init find_usable_zone_for_movable(void)
{
int zone_index;
for (zone_index = MAX_NR_ZONES - 1; zone_index >= 0; zone_index--) {
- if (zone_index == ZONE_MOVABLE)
+ if (zone_index == ZONE_MOVABLE || is_zone_cma_idx(zone_index))
continue;

if (arch_zone_highest_possible_pfn[zone_index] >
@@ -4833,8 +4848,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
{
enum zone_type j;
int nid = pgdat->node_id;
- unsigned long zone_start_pfn = pgdat->node_start_pfn;
int ret;
+ unsigned long zone_start_pfn = pgdat->node_start_pfn;
+ unsigned long first_zone_start_pfn = zone_start_pfn;
+ unsigned long last_zone_end_pfn = zone_start_pfn;

pgdat_resize_init(pgdat);
#ifdef CONFIG_NUMA_BALANCING
@@ -4858,6 +4875,16 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
zone->zone_pgdat = pgdat;
lruvec_init(&zone->lruvec);

+ if (is_zone_cma_idx(j)) {
+ BUG_ON(j != MAX_NR_ZONES - 1);
+
+ zone_start_pfn = first_zone_start_pfn;
+ size = last_zone_end_pfn - first_zone_start_pfn;
+ realsize = freesize = 0;
+ memmap_pages = 0;
+ goto init_zone;
+ }
+
size = zone_spanned_pages_in_node(nid, j, node_start_pfn,
node_end_pfn, zones_size);
realsize = freesize = size - zone_absent_pages_in_node(nid, j,
@@ -4896,6 +4923,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
nr_kernel_pages -= memmap_pages;
nr_all_pages += freesize;

+init_zone:
zone->spanned_pages = size;
zone->present_pages = realsize;
/*
@@ -4924,6 +4952,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
BUG_ON(ret);
memmap_init(size, nid, j, zone_start_pfn);
zone_start_pfn += size;
+ last_zone_end_pfn = zone_start_pfn;
}
}

@@ -5332,7 +5361,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
arch_zone_lowest_possible_pfn[0] = find_min_pfn_with_active_regions();
arch_zone_highest_possible_pfn[0] = max_zone_pfn[0];
for (i = 1; i < MAX_NR_ZONES; i++) {
- if (i == ZONE_MOVABLE)
+ if (i == ZONE_MOVABLE || is_zone_cma_idx(i))
continue;
arch_zone_lowest_possible_pfn[i] =
arch_zone_highest_possible_pfn[i-1];
@@ -5341,6 +5370,10 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
}
arch_zone_lowest_possible_pfn[ZONE_MOVABLE] = 0;
arch_zone_highest_possible_pfn[ZONE_MOVABLE] = 0;
+#ifdef CONFIG_CMA
+ arch_zone_lowest_possible_pfn[ZONE_CMA] = 0;
+ arch_zone_highest_possible_pfn[ZONE_CMA] = 0;
+#endif

/* Find the PFNs that ZONE_MOVABLE begins at in each node */
memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
@@ -5349,7 +5382,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
/* Print out the zone ranges */
printk("Zone ranges:\n");
for (i = 0; i < MAX_NR_ZONES; i++) {
- if (i == ZONE_MOVABLE)
+ if (i == ZONE_MOVABLE || is_zone_cma_idx(i))
continue;
printk(KERN_CONT " %-8s ", zone_names[i]);
if (arch_zone_lowest_possible_pfn[i] ==
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 7a4ac8e..b362b8f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -747,8 +747,16 @@ static void walk_zones_in_node(struct seq_file *m, pg_data_t *pgdat,
#define TEXT_FOR_HIGHMEM(xx)
#endif

+#ifdef CONFIG_CMA
+#define TEXT_FOR_CMA(xx) xx "_cma",
+#else
+#define TEXT_FOR_CMA(xx)
+#endif
+
+
#define TEXTS_FOR_ZONES(xx) TEXT_FOR_DMA(xx) TEXT_FOR_DMA32(xx) xx "_normal", \
- TEXT_FOR_HIGHMEM(xx) xx "_movable",
+ TEXT_FOR_HIGHMEM(xx) xx "_movable", \
+ TEXT_FOR_CMA(xx)

const char * const vmstat_text[] = {
/* enum zone_stat_item countes */
--
1.7.9.5

2015-02-12 07:31:52

by Joonsoo Kim

Subject: [RFC 13/16] mm/cma: populate ZONE_CMA and use this zone for GFP_HIGHUSER_MOVABLE

Until now, reserved pages for CMA have been managed together with normal
pages in the same zone. This approach has numerous problems and fixing
them isn't easy. To fix this situation, ZONE_CMA was introduced in the
previous patch, but it is not yet populated. This patch implements the
population of ZONE_CMA by stealing reserved pages from normal zones.
This stealing breaks one unstated assumption about zones, namely that
zones don't overlap. Earlier in this series, checks were inserted into
every zone span iterator to handle zone overlap, so breaking this
assumption causes no problem.

To utilize this zone, users should use GFP_HIGHUSER_MOVABLE, because
these pages are only usable for movable allocations and ZONE_CMA could
contain highmem.

The implementation itself is very easy to understand: steal the pages
when a CMA area is initialized and recalculate the values of the
per-zone data structures.

Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/gfp.h | 10 ++++++++--
include/linux/mm.h | 1 +
mm/cma.c | 23 ++++++++++++++++-------
mm/page_alloc.c | 42 +++++++++++++++++++++++++++++++++++++++---
4 files changed, 64 insertions(+), 12 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 619eb20..d125440 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -186,6 +186,12 @@ static inline int gfpflags_to_migratetype(const gfp_t gfp_flags)
#define OPT_ZONE_DMA32 ZONE_NORMAL
#endif

+#ifdef CONFIG_CMA
+#define OPT_ZONE_CMA ZONE_CMA
+#else
+#define OPT_ZONE_CMA ZONE_MOVABLE
+#endif
+
/*
* GFP_ZONE_TABLE is a word size bitstring that is used for looking up the
* zone to use given the lowest 4 bits of gfp_t. Entries are ZONE_SHIFT long
@@ -226,7 +232,7 @@ static inline int gfpflags_to_migratetype(const gfp_t gfp_flags)
| ((u64)OPT_ZONE_DMA32 << ___GFP_DMA32 * ZONES_SHIFT) \
| ((u64)ZONE_NORMAL << ___GFP_MOVABLE * ZONES_SHIFT) \
| ((u64)OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * ZONES_SHIFT) \
- | ((u64)ZONE_MOVABLE << (___GFP_MOVABLE|___GFP_HIGHMEM) * ZONES_SHIFT)\
+ | ((u64)OPT_ZONE_CMA << (___GFP_MOVABLE|___GFP_HIGHMEM) * ZONES_SHIFT)\
| ((u64)OPT_ZONE_DMA32 << (___GFP_MOVABLE|___GFP_DMA32) * ZONES_SHIFT)\
)

@@ -412,7 +418,7 @@ extern int alloc_contig_range(unsigned long start, unsigned long end,
extern void free_contig_range(unsigned long pfn, unsigned nr_pages);

/* CMA stuff */
-extern void init_cma_reserved_pageblock(struct page *page);
+extern void init_cma_reserved_pageblock(unsigned long pfn);

#endif

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b464611..2d76446 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1731,6 +1731,7 @@ extern __printf(3, 4)
void warn_alloc_failed(gfp_t gfp_mask, int order, const char *fmt, ...);

extern void setup_per_cpu_pageset(void);
+extern void recalc_per_cpu_pageset(void);

extern void zone_pcp_update(struct zone *zone);
extern void zone_pcp_reset(struct zone *zone);
diff --git a/mm/cma.c b/mm/cma.c
index f817b91..267fa14 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -97,7 +97,7 @@ static int __init cma_activate_area(struct cma *cma)
int bitmap_size = BITS_TO_LONGS(cma_bitmap_maxno(cma)) * sizeof(long);
unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
unsigned i = cma->count >> pageblock_order;
- struct zone *zone;
+ int nid;

cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);

@@ -105,7 +105,7 @@ static int __init cma_activate_area(struct cma *cma)
return -ENOMEM;

WARN_ON_ONCE(!pfn_valid(pfn));
- zone = page_zone(pfn_to_page(pfn));
+ nid = page_to_nid(pfn_to_page(pfn));

do {
unsigned j;
@@ -115,16 +115,25 @@ static int __init cma_activate_area(struct cma *cma)
WARN_ON_ONCE(!pfn_valid(pfn));
/*
* alloc_contig_range requires the pfn range
- * specified to be in the same zone. Make this
- * simple by forcing the entire CMA resv range
- * to be in the same zone.
+ * specified to be in the same zone. We will
+ * achieve this goal by stealing pages from
+ * an ordinary zone into ZONE_CMA. But we need
+ * to make sure that the entire CMA resv range
+ * is in the same node. Otherwise, pages could
+ * end up in ZONE_CMA of different nodes.
*/
- if (page_zone(pfn_to_page(pfn)) != zone)
+ if (page_to_nid(pfn_to_page(pfn)) != nid)
goto err;
}
- init_cma_reserved_pageblock(pfn_to_page(base_pfn));
+ init_cma_reserved_pageblock(base_pfn);
} while (--i);

+ /*
+ * ZONE_CMA steals some managed pages from other zones,
+ * so we need to re-calculate pcp count for all zones.
+ */
+ recalc_per_cpu_pageset();
+
mutex_init(&cma->lock);
return 0;

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 443f854..f2844f0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -59,6 +59,7 @@
#include <linux/page-debug-flags.h>
#include <linux/hugetlb.h>
#include <linux/sched/rt.h>
+#include <linux/cma.h>

#include <asm/sections.h>
#include <asm/tlbflush.h>
@@ -807,16 +808,35 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
}

#ifdef CONFIG_CMA
+static void __init adjust_present_page_count(struct page *page, long count)
+{
+ struct zone *zone = page_zone(page);
+
+ zone->present_pages += count;
+}
+
/* Free whole pageblock and set its migration type to MIGRATE_CMA. */
-void __init init_cma_reserved_pageblock(struct page *page)
+void __init init_cma_reserved_pageblock(unsigned long pfn)
{
unsigned i = pageblock_nr_pages;
+ struct page *page = pfn_to_page(pfn);
struct page *p = page;
+ int nid = page_to_nid(page);
+
+ /*
+ * ZONE_CMA will steal present pages from other zones by changing
+ * page links, so adjust present_page count before stealing.
+ */
+ adjust_present_page_count(page, -pageblock_nr_pages);

do {
__ClearPageReserved(p);
set_page_count(p, 0);
- } while (++p, --i);
+
+ /* Steal page from other zones */
+ set_page_links(p, ZONE_CMA, nid, pfn);
+ mminit_verify_page_links(p, ZONE_CMA, nid, pfn);
+ } while (++p, ++pfn, --i);

set_pageblock_migratetype(page, MIGRATE_CMA);

@@ -4341,6 +4361,20 @@ void __init setup_per_cpu_pageset(void)
setup_zone_pageset(zone);
}

+void __init recalc_per_cpu_pageset(void)
+{
+ int cpu;
+ struct zone *zone;
+ struct per_cpu_pageset *pcp;
+
+ for_each_populated_zone(zone) {
+ for_each_possible_cpu(cpu) {
+ pcp = per_cpu_ptr(zone->pageset, cpu);
+ pageset_set_high_and_batch(zone, pcp);
+ }
+ }
+}
+
static noinline __init_refok
int zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages)
{
@@ -4880,7 +4914,9 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,

zone_start_pfn = first_zone_start_pfn;
size = last_zone_end_pfn - first_zone_start_pfn;
- realsize = freesize = 0;
+ realsize = freesize =
+ cma_total_pages(first_zone_start_pfn,
+ last_zone_end_pfn);
memmap_pages = 0;
goto init_zone;
}
--
1.7.9.5

2015-02-12 07:31:53

by Joonsoo Kim

Subject: [RFC 14/16] mm/cma: print stolen page count

Reserved pages for CMA could be in different zones. To figure out the
memory map correctly, the per-zone number of pages stolen for CMA is
needed, so print it.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/cma.c | 28 +++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/mm/cma.c b/mm/cma.c
index 267fa14..b165c1a 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -40,6 +40,8 @@ struct cma cma_areas[MAX_CMA_AREAS];
unsigned cma_area_count;
static DEFINE_MUTEX(cma_mutex);

+static unsigned long __initdata stealed_pages[MAX_NUMNODES][MAX_NR_ZONES];
+
unsigned long cma_total_pages(unsigned long node_start_pfn,
unsigned long node_end_pfn)
{
@@ -98,6 +100,7 @@ static int __init cma_activate_area(struct cma *cma)
unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
unsigned i = cma->count >> pageblock_order;
int nid;
+ int zone_index;

cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);

@@ -125,6 +128,8 @@ static int __init cma_activate_area(struct cma *cma)
if (page_to_nid(pfn_to_page(pfn)) != nid)
goto err;
}
+ zone_index = zone_idx(page_zone(pfn_to_page(base_pfn)));
+ stealed_pages[nid][zone_index] += pageblock_nr_pages;
init_cma_reserved_pageblock(base_pfn);
} while (--i);

@@ -145,7 +150,9 @@ err:

static int __init cma_init_reserved_areas(void)
{
- int i;
+ int i, j;
+ pg_data_t *pgdat;
+ struct zone *zone;

for (i = 0; i < cma_area_count; i++) {
int ret = cma_activate_area(&cma_areas[i]);
@@ -154,6 +161,25 @@ static int __init cma_init_reserved_areas(void)
return ret;
}

+ for (i = 0; i < MAX_NUMNODES; i++) {
+ for (j = 0; j < MAX_NR_ZONES; j++) {
+ if (stealed_pages[i][j])
+ goto print;
+ }
+ continue;
+
+print:
+ pgdat = NODE_DATA(i);
+ for (j = 0; j < MAX_NR_ZONES; j++) {
+ if (!stealed_pages[i][j])
+ continue;
+
+ zone = pgdat->node_zones + j;
+ pr_info("Steal %lu pages from %s\n",
+ stealed_pages[i][j], zone->name);
+ }
+ }
+
return 0;
}
core_initcall(cma_init_reserved_areas);
--
1.7.9.5

2015-02-12 07:31:25

by Joonsoo Kim

Subject: [RFC 15/16] mm/cma: remove ALLOC_CMA

Now that reserved pages for CMA are in ZONE_CMA and it only serves
MIGRATE_MOVABLE allocations, we don't need to consider ALLOC_CMA at all.

Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/compaction.c | 4 ----
mm/internal.h | 3 +--
mm/page_alloc.c | 16 ++--------------
3 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index f9792ba..b79134e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1312,10 +1312,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
if (!order || !may_enter_fs || !may_perform_io)
return COMPACT_SKIPPED;

-#ifdef CONFIG_CMA
- if (gfpflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
- alloc_flags |= ALLOC_CMA;
-#endif
/* Compact each zone in the list */
for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
nodemask) {
diff --git a/mm/internal.h b/mm/internal.h
index a4f90ba..9968dff 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -407,7 +407,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
#define ALLOC_HARDER 0x10 /* try to alloc harder */
#define ALLOC_HIGH 0x20 /* __GFP_HIGH set */
#define ALLOC_CPUSET 0x40 /* check for correct cpuset */
-#define ALLOC_CMA 0x80 /* allow allocations from CMA areas */
-#define ALLOC_FAIR 0x100 /* fair zone allocation */
+#define ALLOC_FAIR 0x80 /* fair zone allocation */

#endif /* __MM_INTERNAL_H */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f2844f0..551cc5b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1737,20 +1737,14 @@ static bool __zone_watermark_ok(struct zone *z, unsigned int order,
/* free_pages my go negative - that's OK */
long min = mark;
int o;
- long free_cma = 0;

free_pages -= (1 << order) - 1;
if (alloc_flags & ALLOC_HIGH)
min -= min / 2;
if (alloc_flags & ALLOC_HARDER)
min -= min / 4;
-#ifdef CONFIG_CMA
- /* If allocation can't use CMA areas don't use free CMA pages */
- if (!(alloc_flags & ALLOC_CMA))
- free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
-#endif

- if (free_pages - free_cma <= min + z->lowmem_reserve[classzone_idx])
+ if (free_pages <= min + z->lowmem_reserve[classzone_idx])
return false;
for (o = 0; o < order; o++) {
/* At the next order, this order's pages become unavailable */
@@ -2550,10 +2544,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
unlikely(test_thread_flag(TIF_MEMDIE))))
alloc_flags |= ALLOC_NO_WATERMARKS;
}
-#ifdef CONFIG_CMA
- if (gfpflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
- alloc_flags |= ALLOC_CMA;
-#endif
+
return alloc_flags;
}

@@ -2837,9 +2828,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
if (unlikely(!zonelist->_zonerefs->zone))
return NULL;

- if (IS_ENABLED(CONFIG_CMA) && migratetype == MIGRATE_MOVABLE)
- alloc_flags |= ALLOC_CMA;
-
retry_cpuset:
cpuset_mems_cookie = read_mems_allowed_begin();

--
1.7.9.5

2015-02-12 07:31:23

by Joonsoo Kim

Subject: [RFC 16/16] mm/cma: remove MIGRATE_CMA

Now, reserved pages for CMA are only in ZONE_CMA, so we don't need
MIGRATE_CMA to distinguish CMA freepages and handle them differently.
So, this patch removes MIGRATE_CMA along with all related code.

Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/gfp.h | 3 +-
include/linux/mmzone.h | 23 --------------
include/linux/vmstat.h | 8 -----
mm/cma.c | 2 +-
mm/compaction.c | 2 +-
mm/hugetlb.c | 2 +-
mm/page_alloc.c | 79 ++++++++++++++----------------------------------
mm/page_isolation.c | 5 ++-
mm/vmstat.c | 4 ---
9 files changed, 28 insertions(+), 100 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index d125440..1a6a5e2 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -413,8 +413,7 @@ static inline bool pm_suspended_storage(void)
#ifdef CONFIG_CMA

/* The below functions must be run on a range from a single zone. */
-extern int alloc_contig_range(unsigned long start, unsigned long end,
- unsigned migratetype);
+extern int alloc_contig_range(unsigned long start, unsigned long end);
extern void free_contig_range(unsigned long pfn, unsigned nr_pages);

/* CMA stuff */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 991e20e..738b7f8 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -41,34 +41,12 @@ enum {
MIGRATE_MOVABLE,
MIGRATE_PCPTYPES, /* the number of types on the pcp lists */
MIGRATE_RESERVE = MIGRATE_PCPTYPES,
-#ifdef CONFIG_CMA
- /*
- * MIGRATE_CMA migration type is designed to mimic the way
- * ZONE_MOVABLE works. Only movable pages can be allocated
- * from MIGRATE_CMA pageblocks and page allocator never
- * implicitly change migration type of MIGRATE_CMA pageblock.
- *
- * The way to use it is to change migratetype of a range of
- * pageblocks to MIGRATE_CMA which can be done by
- * __free_pageblock_cma() function. What is important though
- * is that a range of pageblocks must be aligned to
- * MAX_ORDER_NR_PAGES should biggest page be bigger then
- * a single pageblock.
- */
- MIGRATE_CMA,
-#endif
#ifdef CONFIG_MEMORY_ISOLATION
MIGRATE_ISOLATE, /* can't allocate from here */
#endif
MIGRATE_TYPES
};

-#ifdef CONFIG_CMA
-# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
-#else
-# define is_migrate_cma(migratetype) false
-#endif
-
#define for_each_migratetype_order(order, type) \
for (order = 0; order < MAX_ORDER; order++) \
for (type = 0; type < MIGRATE_TYPES; type++)
@@ -156,7 +134,6 @@ enum zone_stat_item {
WORKINGSET_ACTIVATE,
WORKINGSET_NODERECLAIM,
NR_ANON_TRANSPARENT_HUGEPAGES,
- NR_FREE_CMA_PAGES,
NR_VM_ZONE_STAT_ITEMS };

/*
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 676488a..681f8ae 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -271,14 +271,6 @@ static inline void drain_zonestat(struct zone *zone,
struct per_cpu_pageset *pset) { }
#endif /* CONFIG_SMP */

-static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
- int migratetype)
-{
- __mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
- if (is_migrate_cma(migratetype))
- __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
-}
-
extern const char * const vmstat_text[];

#endif /* _LINUX_VMSTAT_H */
diff --git a/mm/cma.c b/mm/cma.c
index b165c1a..46d3e79 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -405,7 +405,7 @@ struct page *cma_alloc(struct cma *cma, int count, unsigned int align)

pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
mutex_lock(&cma_mutex);
- ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
+ ret = alloc_contig_range(pfn, pfn + count);
mutex_unlock(&cma_mutex);
if (ret == 0) {
page = pfn_to_page(pfn);
diff --git a/mm/compaction.c b/mm/compaction.c
index b79134e..1b9f18e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -64,7 +64,7 @@ static void map_pages(struct list_head *list)

static inline bool migrate_async_suitable(int migratetype)
{
- return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
+ return migratetype == MIGRATE_MOVABLE;
}

/*
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9fd7227..2ba5802 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -706,7 +706,7 @@ static int __alloc_gigantic_page(unsigned long start_pfn,
unsigned long nr_pages)
{
unsigned long end_pfn = start_pfn + nr_pages;
- return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+ return alloc_contig_range(start_pfn, end_pfn);
}

static bool pfn_range_valid_gigantic(unsigned long start_pfn,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 551cc5b..24c2ab5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -571,7 +571,7 @@ static inline void __free_one_page(struct page *page,
*/
max_order = min(MAX_ORDER, pageblock_order + 1);
} else {
- __mod_zone_freepage_state(zone, 1 << order, migratetype);
+ __mod_zone_page_state(zone, NR_FREE_PAGES, 1 << order);
}

page_idx = pfn & ((1 << max_order) - 1);
@@ -592,8 +592,8 @@ static inline void __free_one_page(struct page *page,
clear_page_guard_flag(buddy);
set_page_private(buddy, 0);
if (!is_migrate_isolate(migratetype)) {
- __mod_zone_freepage_state(zone, 1 << order,
- migratetype);
+ __mod_zone_page_state(zone, NR_FREE_PAGES,
+ 1 << order);
}
} else {
list_del(&buddy->lru);
@@ -815,7 +815,7 @@ static void __init adjust_present_page_count(struct page *page, long count)
zone->present_pages += count;
}

-/* Free whole pageblock and set its migration type to MIGRATE_CMA. */
+/* Free whole pageblock and set its migration type to MIGRATE_MOVABLE. */
void __init init_cma_reserved_pageblock(unsigned long pfn)
{
unsigned i = pageblock_nr_pages;
@@ -838,7 +838,7 @@ void __init init_cma_reserved_pageblock(unsigned long pfn)
mminit_verify_page_links(p, ZONE_CMA, nid, pfn);
} while (++p, ++pfn, --i);

- set_pageblock_migratetype(page, MIGRATE_CMA);
+ set_pageblock_migratetype(page, MIGRATE_MOVABLE);

if (pageblock_order >= MAX_ORDER) {
i = pageblock_nr_pages;
@@ -895,8 +895,8 @@ static inline void expand(struct zone *zone, struct page *page,
set_page_guard_flag(&page[size]);
set_page_private(&page[size], high);
/* Guard pages are not available for any usage */
- __mod_zone_freepage_state(zone, -(1 << high),
- migratetype);
+ __mod_zone_page_state(zone, NR_FREE_PAGES,
+ -(1 << high));
continue;
}
#endif
@@ -997,12 +997,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
static int fallbacks[MIGRATE_TYPES][4] = {
[MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
-#ifdef CONFIG_CMA
- [MIGRATE_MOVABLE] = { MIGRATE_CMA, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
- [MIGRATE_CMA] = { MIGRATE_RESERVE }, /* Never used */
-#else
[MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-#endif
[MIGRATE_RESERVE] = { MIGRATE_RESERVE }, /* Never used */
#ifdef CONFIG_MEMORY_ISOLATION
[MIGRATE_ISOLATE] = { MIGRATE_RESERVE }, /* Never used */
@@ -1095,10 +1090,6 @@ static void change_pageblock_range(struct page *pageblock_page,
* allocation list. If falling back for a reclaimable kernel allocation, be
* more aggressive about taking ownership of free pages.
*
- * On the other hand, never change migration type of MIGRATE_CMA pageblocks
- * nor move CMA pages to different free lists. We don't want unmovable pages
- * to be allocated from MIGRATE_CMA areas.
- *
* Returns the new migratetype of the pageblock (or the same old migratetype
* if it was unchanged).
*/
@@ -1107,15 +1098,6 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,
{
int current_order = page_order(page);

- /*
- * When borrowing from MIGRATE_CMA, we need to release the excess
- * buddy pages to CMA itself. We also ensure the freepage_migratetype
- * is set to CMA so it is returned to the correct freelist in case
- * the page ends up being not actually allocated from the pcp lists.
- */
- if (is_migrate_cma(fallback_type))
- return fallback_type;
-
/* Take ownership for orders >= pageblock_order */
if (current_order >= pageblock_order) {
change_pageblock_range(page, current_order, start_type);
@@ -1182,8 +1164,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
new_type);
/* The freepage_migratetype may differ from pageblock's
* migratetype depending on the decisions in
- * try_to_steal_freepages. This is OK as long as it does
- * not differ for MIGRATE_CMA type.
+ * try_to_steal_freepages.
*/
set_freepage_migratetype(page, new_type);

@@ -1258,9 +1239,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
else
list_add_tail(&page->lru, list);
list = &page->lru;
- if (is_migrate_cma(get_freepage_migratetype(page)))
- __mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
- -(1 << order));
}
__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
spin_unlock(&zone->lock);
@@ -1521,7 +1499,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
return 0;

- __mod_zone_freepage_state(zone, -(1UL << order), mt);
+ __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
}

/* Remove page from free list */
@@ -1534,7 +1512,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
struct page *endpage = page + (1 << order) - 1;
for (; page < endpage; page += pageblock_nr_pages) {
int mt = get_pageblock_migratetype(page);
- if (!is_migrate_isolate(mt) && !is_migrate_cma(mt))
+ if (!is_migrate_isolate(mt))
set_pageblock_migratetype(page,
MIGRATE_MOVABLE);
}
@@ -1626,8 +1604,7 @@ again:
spin_unlock(&zone->lock);
if (!page)
goto failed;
- __mod_zone_freepage_state(zone, -(1 << order),
- get_freepage_migratetype(page));
+ __mod_zone_page_state(zone, NR_FREE_PAGES, -(1 << order));
}

__mod_zone_page_state(zone, NR_ALLOC_BATCH, -(1 << order));
@@ -3179,9 +3156,6 @@ static void show_migration_types(unsigned char type)
[MIGRATE_RECLAIMABLE] = 'E',
[MIGRATE_MOVABLE] = 'M',
[MIGRATE_RESERVE] = 'R',
-#ifdef CONFIG_CMA
- [MIGRATE_CMA] = 'C',
-#endif
#ifdef CONFIG_MEMORY_ISOLATION
[MIGRATE_ISOLATE] = 'I',
#endif
@@ -3233,8 +3207,7 @@ void show_free_areas(unsigned int filter)
" unevictable:%lu"
" dirty:%lu writeback:%lu unstable:%lu\n"
" free:%lu slab_reclaimable:%lu slab_unreclaimable:%lu\n"
- " mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
- " free_cma:%lu\n",
+ " mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n",
global_page_state(NR_ACTIVE_ANON),
global_page_state(NR_INACTIVE_ANON),
global_page_state(NR_ISOLATED_ANON),
@@ -3251,8 +3224,7 @@ void show_free_areas(unsigned int filter)
global_page_state(NR_FILE_MAPPED),
global_page_state(NR_SHMEM),
global_page_state(NR_PAGETABLE),
- global_page_state(NR_BOUNCE),
- global_page_state(NR_FREE_CMA_PAGES));
+ global_page_state(NR_BOUNCE));

for_each_populated_zone(zone) {
int i;
@@ -3285,7 +3257,6 @@ void show_free_areas(unsigned int filter)
" pagetables:%lukB"
" unstable:%lukB"
" bounce:%lukB"
- " free_cma:%lukB"
" writeback_tmp:%lukB"
" pages_scanned:%lu"
" all_unreclaimable? %s"
@@ -3316,7 +3287,6 @@ void show_free_areas(unsigned int filter)
K(zone_page_state(zone, NR_PAGETABLE)),
K(zone_page_state(zone, NR_UNSTABLE_NFS)),
K(zone_page_state(zone, NR_BOUNCE)),
- K(zone_page_state(zone, NR_FREE_CMA_PAGES)),
K(zone_page_state(zone, NR_WRITEBACK_TEMP)),
K(zone_page_state(zone, NR_PAGES_SCANNED)),
(!zone_reclaimable(zone) ? "yes" : "no")
@@ -6224,7 +6194,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
if (zone_idx(zone) == ZONE_MOVABLE)
return false;
mt = get_pageblock_migratetype(page);
- if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
+ if (mt == MIGRATE_MOVABLE)
return false;

pfn = page_to_pfn(page);
@@ -6372,15 +6342,11 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
* alloc_contig_range() -- tries to allocate given range of pages
* @start: start PFN to allocate
* @end: one-past-the-last PFN to allocate
- * @migratetype: migratetype of the underlaying pageblocks (either
- * #MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks
- * in range must have the same migratetype and it must
- * be either of the two.
*
* The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
* aligned, however it's the caller's responsibility to guarantee that
* we are the only thread that changes migrate type of pageblocks the
- * pages fall in.
+ * pages fall in, and that their migratetype is MIGRATE_MOVABLE.
*
* The PFN range must belong to a single zone.
*
@@ -6388,8 +6354,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
* pages which PFN is in [start, end) are allocated for the caller and
* need to be freed with free_contig_range().
*/
-int alloc_contig_range(unsigned long start, unsigned long end,
- unsigned migratetype)
+int alloc_contig_range(unsigned long start, unsigned long end)
{
unsigned long outer_start, outer_end;
int ret = 0, order;
@@ -6421,14 +6386,14 @@ int alloc_contig_range(unsigned long start, unsigned long end,
* allocator removing them from the buddy system. This way
* page allocator will never consider using them.
*
- * This lets us mark the pageblocks back as
- * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
- * aligned range but not in the unaligned, original range are
- * put back to page allocator so that buddy can use them.
+ * This lets us mark the pageblocks back as MIGRATE_MOVABLE
+ * so that free pages in the aligned range, but not in the
+ * unaligned, original range, are returned to the page
+ * allocator and the buddy allocator can use them.
*/

ret = start_isolate_page_range(pfn_max_align_down(start),
- pfn_max_align_up(end), migratetype,
+ pfn_max_align_up(end), MIGRATE_MOVABLE,
false);
if (ret)
return ret;
@@ -6490,7 +6455,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,

done:
undo_isolate_page_range(pfn_max_align_down(start),
- pfn_max_align_up(end), migratetype);
+ pfn_max_align_up(end), MIGRATE_MOVABLE);
return ret;
}

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 883e78d..bc1777a 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -57,13 +57,12 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
out:
if (!ret) {
unsigned long nr_pages;
- int migratetype = get_pageblock_migratetype(page);

set_pageblock_migratetype(page, MIGRATE_ISOLATE);
zone->nr_isolate_pageblock++;
nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);

- __mod_zone_freepage_state(zone, -nr_pages, migratetype);
+ __mod_zone_page_state(zone, NR_FREE_PAGES, -nr_pages);
}

spin_unlock_irqrestore(&zone->lock, flags);
@@ -116,7 +115,7 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
*/
if (!isolated_page) {
nr_pages = move_freepages_block(zone, page, migratetype);
- __mod_zone_freepage_state(zone, nr_pages, migratetype);
+ __mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
}
set_pageblock_migratetype(page, migratetype);
zone->nr_isolate_pageblock--;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b362b8f..f3285d2 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -677,9 +677,6 @@ static char * const migratetype_names[MIGRATE_TYPES] = {
"Reclaimable",
"Movable",
"Reserve",
-#ifdef CONFIG_CMA
- "CMA",
-#endif
#ifdef CONFIG_MEMORY_ISOLATION
"Isolate",
#endif
@@ -801,7 +798,6 @@ const char * const vmstat_text[] = {
"workingset_activate",
"workingset_nodereclaim",
"nr_anon_transparent_hugepages",
- "nr_free_cma",

/* enum writeback_stat_item counters */
"nr_dirty_threshold",
--
1.7.9.5

2015-02-13 06:40:13

by Gioh Kim

[permalink] [raw]
Subject: Re: [RFC 07/16] mm/page_isolation: watch out zone range overlap


> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index c8778f7..883e78d 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -210,8 +210,8 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> * Returns 1 if all pages in the range are isolated.
> */
> static int
> -__test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> - bool skip_hwpoisoned_pages)
> +__test_page_isolated_in_pageblock(struct zone *zone, unsigned long pfn,
> + unsigned long end_pfn, bool skip_hwpoisoned_pages)
> {
> struct page *page;
>
> @@ -221,6 +221,9 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> continue;
> }
> page = pfn_to_page(pfn);
> + if (page_zone(page) != zone)
> + break;
> +
> if (PageBuddy(page)) {
> /*
> * If race between isolatation and allocation happens,
> @@ -281,7 +284,7 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
> /* Check all pages are free or marked as ISOLATED */
> zone = page_zone(page);
> spin_lock_irqsave(&zone->lock, flags);
> - ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
> + ret = __test_page_isolated_in_pageblock(zone, start_pfn, end_pfn,
> skip_hwpoisoned_pages);
> spin_unlock_irqrestore(&zone->lock, flags);
> return ret ? 0 : -EBUSY;
>

What about checking the zone in test_pages_isolated()?
That check could be done a bit earlier and without taking the zone lock.

@@ -273,8 +273,14 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
* are not aligned to pageblock_nr_pages.
* Then we just check migratetype first.
*/
+
+ zone = page_zone(__first_valid_page(start_pfn, pageblock_nr_pages));
+
for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
page = __first_valid_page(pfn, pageblock_nr_pages);
+
+ if (page && page_zone(page) != zone)
+ break;
if (page && get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
break;
}
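
As a minimal sketch of the idea (illustrative only, not part of the posted
patches), the early check could also be factored into a small helper in
mm/page_isolation.c; it assumes __first_valid_page() may return NULL for a
pageblock with no valid pages, so the NULL test comes before touching the zone:

/*
 * Illustrative sketch only: verify, without taking the zone lock, that a
 * PFN range does not cross a zone boundary. Assumes it lives next to
 * __first_valid_page() in mm/page_isolation.c.
 */
static bool range_in_single_zone(unsigned long start_pfn, unsigned long end_pfn)
{
	struct zone *zone = NULL;
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
		struct page *page = __first_valid_page(pfn, pageblock_nr_pages);

		if (!page)			/* no valid page in this block */
			continue;
		if (!zone)			/* remember the first zone seen */
			zone = page_zone(page);
		else if (page_zone(page) != zone)
			return false;		/* range crosses a zone boundary */
	}
	return true;
}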

2015-02-14 05:02:21

by Gioh Kim

[permalink] [raw]
Subject: Re: [RFC 13/16] mm/cma: populate ZONE_CMA and use this zone when GFP_HIGHUSERMOVABLE



On 2015-02-12 at 4:32 PM, Joonsoo Kim wrote:
> Until now, reserved pages for CMA have been managed together with normal
> pages in the same zone. This approach has numerous problems and fixing
> them isn't easy. To fix this situation, ZONE_CMA was introduced in the
> previous patch, but it is not yet populated. This patch implements
> population of ZONE_CMA by stealing reserved pages from normal zones. This
> stealing breaks one implicit assumption about zones, namely that zones
> don't overlap. Earlier in this series, checks were inserted into every
> zone's span iterator to handle zone overlap, so breaking this assumption
> should not cause problems.
>
> To utilize this zone, user should use GFP_HIGHUSERMOVABLE, because

I think it might be a typo of GFP_HIGHUSER_MOVABLE.

> these pages are only usable for movable allocations and ZONE_CMA could
> contain highmem.
>
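
For reference, the zone-overlap handling described in the quoted text above
boils down to checks of the following shape; this is an illustrative sketch
under that assumption, not code taken from the posted series:

/*
 * Illustrative sketch: a PFN walk that tolerates zone overlap by skipping
 * pages owned by another zone, the kind of check the series adds to zone
 * span iterators once ZONE_CMA steals pageblocks from normal zones.
 */
static void walk_zone_pfn_range(struct zone *zone, unsigned long start_pfn,
				unsigned long end_pfn)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct page *page;

		if (!pfn_valid(pfn))
			continue;
		page = pfn_to_page(pfn);
		/* Zones may overlap, so double-check page ownership */
		if (page_zone(page) != zone)
			continue;
		/* ... operate only on pages that belong to 'zone' ... */
	}
}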

2015-02-17 05:22:17

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [RFC 07/16] mm/page_isolation: watch out zone range overlap

On Fri, Feb 13, 2015 at 03:40:08PM +0900, Gioh Kim wrote:
>
> > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > index c8778f7..883e78d 100644
> > --- a/mm/page_isolation.c
> > +++ b/mm/page_isolation.c
> > @@ -210,8 +210,8 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> > * Returns 1 if all pages in the range are isolated.
> > */
> > static int
> > -__test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> > - bool skip_hwpoisoned_pages)
> > +__test_page_isolated_in_pageblock(struct zone *zone, unsigned long pfn,
> > + unsigned long end_pfn, bool skip_hwpoisoned_pages)
> > {
> > struct page *page;
> >
> > @@ -221,6 +221,9 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> > continue;
> > }
> > page = pfn_to_page(pfn);
> > + if (page_zone(page) != zone)
> > + break;
> > +
> > if (PageBuddy(page)) {
> > /*
> > * If race between isolatation and allocation happens,
> > @@ -281,7 +284,7 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
> > /* Check all pages are free or marked as ISOLATED */
> > zone = page_zone(page);
> > spin_lock_irqsave(&zone->lock, flags);
> > - ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
> > + ret = __test_page_isolated_in_pageblock(zone, start_pfn, end_pfn,
> > skip_hwpoisoned_pages);
> > spin_unlock_irqrestore(&zone->lock, flags);
> > return ret ? 0 : -EBUSY;
> >
>
> What about checking the zone in test_pages_isolated()?
> That check could be done a bit earlier and without taking the zone lock.

Hello,

Will do in next spin.

Thanks.

2015-02-17 05:22:37

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [RFC 13/16] mm/cma: populate ZONE_CMA and use this zone when GFP_HIGHUSERMOVABLE

On Sat, Feb 14, 2015 at 02:02:16PM +0900, Gioh Kim wrote:
>
>
On 2015-02-12 at 4:32 PM, Joonsoo Kim wrote:
> > Until now, reserved pages for CMA have been managed together with normal
> > pages in the same zone. This approach has numerous problems and fixing
> > them isn't easy. To fix this situation, ZONE_CMA was introduced in the
> > previous patch, but it is not yet populated. This patch implements
> > population of ZONE_CMA by stealing reserved pages from normal zones. This
> > stealing breaks one implicit assumption about zones, namely that zones
> > don't overlap. Earlier in this series, checks were inserted into every
> > zone's span iterator to handle zone overlap, so breaking this assumption
> > should not cause problems.
> >
> > To utilize this zone, user should use GFP_HIGHUSERMOVABLE, because
>
> I think it might be a typo of GFP_HIGHUSER_MOVABLE.
>

Yes, I will correct it in the next version.

Thanks.