2014-01-09 07:04:33

by Joonsoo Kim

Subject: [PATCH 0/7] improve robustness on handling migratetype

Hello,

I found some weaknesses in the handling of migratetype during code review
and testing of CMA.

First, we don't have any synchronization method for get/set pageblock
migratetype. When we change a migratetype, we hold the zone lock, so a
writer-writer race doesn't exist. But while someone changes a migratetype,
others can read it. This may introduce a totally unintended value as the
migratetype. Although I haven't heard of any problem report about that,
it is better to protect it properly.
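
To illustrate the kind of torn read this allows (a minimal sketch in
plain C, not kernel code; it assumes MIGRATE_MOVABLE = 2 and
MIGRATE_CMA = 4, as with CONFIG_CMA):

/*
 * Illustrative sketch only: the pageblock bits are read one at a time,
 * so a reader can assemble a mix of old and new bits while a writer
 * rewrites the migratetype from MIGRATE_MOVABLE (010) to
 * MIGRATE_CMA (100).
 */
static int racy_read(const volatile unsigned long *bits)
{
	int mt = 0;

	mt |= (*bits >> 0) & 1;		/* bit 0 is 0 in both values */
	mt |= ((*bits >> 1) & 1) << 1;	/* sees the old bit 1: set */
	/* the writer rewrites the bits here: 010 -> 100 */
	mt |= ((*bits >> 2) & 1) << 2;	/* sees the new bit 2: set */

	return mt;	/* 6 (binary 110): not a valid migratetype */
}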

Second, (get/set)_freepage_migratetype isn't used properly. I guess it
was introduced for per-cpu page (pcp) performance, but it is also used
by memory isolation now. For that case, the information isn't enough to
use, so we need to fix it.

Third, there is a problem in the buddy allocator. It doesn't consider
migratetype when merging buddies, so pages from a CMA or isolate region
can be moved to another migratetype's freelist. This makes CMA fail over
and over. To prevent it, the buddy allocator should consider migratetype
if CMA/ISOLATE is enabled.

This patchset is aimed at fixing these problems and is based on v3.13-rc7.

Thanks.

Joonsoo Kim (7):
mm/page_alloc: synchronize get/set pageblock
mm/cma: fix cma free page accounting
mm/page_alloc: move set_freepage_migratetype() to better place
mm/isolation: remove invalid check condition
mm/page_alloc: separate interface to set/get migratetype of freepage
mm/page_alloc: store freelist migratetype to the page on buddy
properly
mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy

include/linux/mm.h | 35 +++++++++++++++++++++---
include/linux/mmzone.h | 2 ++
include/linux/page-isolation.h | 1 -
mm/page_alloc.c | 59 ++++++++++++++++++++++++++--------------
mm/page_isolation.c | 5 +---
5 files changed, 73 insertions(+), 29 deletions(-)

--
1.7.9.5


2014-01-09 07:05:00

by Joonsoo Kim

Subject: [PATCH 1/7] mm/page_alloc: synchronize get/set pageblock

Now get/set pageblock is done without any synchronization, so there is
a race condition and the migratetype can end up as an unintended value.
Sometimes we move pageblocks from one migratetype to another while, at
the same time, a page in such a pageblock is freed. In this case, we can
read a totally unintended value, since get/set pageblock doesn't access
the bitmap atomically; instead, it is accessed bit by bit.

Since set pageblock is used far less frequently than get pageblock, I
think a seqlock is the proper method to synchronize them. This type of
lock has minimal overhead when there are many readers and few writers,
so it fits this situation.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index bd791e4..feaa607 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -79,6 +79,7 @@ static inline int get_pageblock_migratetype(struct page *page)
{
return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
}
+void set_pageblock_migratetype(struct page *page, int migratetype);

struct free_area {
struct list_head free_list[MIGRATE_TYPES];
@@ -367,6 +368,7 @@ struct zone {
#endif
struct free_area free_area[MAX_ORDER];

+ seqlock_t pageblock_seqlock;
#ifndef CONFIG_SPARSEMEM
/*
* Flags for a pageblock_nr_pages block. See pageblock-flags.h.
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 3fff8e7..58e2a89 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -23,7 +23,6 @@ static inline bool is_migrate_isolate(int migratetype)

bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
bool skip_hwpoisoned_pages);
-void set_pageblock_migratetype(struct page *page, int migratetype);
int move_freepages_block(struct zone *zone, struct page *page,
int migratetype);
int move_freepages(struct zone *zone,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5248fe0..b36aa5a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4788,6 +4788,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
spin_lock_init(&zone->lock);
spin_lock_init(&zone->lru_lock);
zone_seqlock_init(zone);
+ seqlock_init(&zone->pageblock_seqlock);
zone->zone_pgdat = pgdat;
zone_pcp_init(zone);

@@ -5927,15 +5928,19 @@ unsigned long get_pageblock_flags_group(struct page *page,
unsigned long pfn, bitidx;
unsigned long flags = 0;
unsigned long value = 1;
+ unsigned int seq;

zone = page_zone(page);
pfn = page_to_pfn(page);
bitmap = get_pageblock_bitmap(zone, pfn);
bitidx = pfn_to_bitidx(zone, pfn);

- for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1)
- if (test_bit(bitidx + start_bitidx, bitmap))
- flags |= value;
+ do {
+ seq = read_seqbegin(&zone->pageblock_seqlock);
+ for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1)
+ if (test_bit(bitidx + start_bitidx, bitmap))
+ flags |= value;
+ } while (read_seqretry(&zone->pageblock_seqlock, seq));

return flags;
}
@@ -5954,6 +5959,7 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags,
unsigned long *bitmap;
unsigned long pfn, bitidx;
unsigned long value = 1;
+ unsigned long irq_flags;

zone = page_zone(page);
pfn = page_to_pfn(page);
@@ -5961,11 +5967,13 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags,
bitidx = pfn_to_bitidx(zone, pfn);
VM_BUG_ON(!zone_spans_pfn(zone, pfn));

+ write_seqlock_irqsave(&zone->pageblock_seqlock, irq_flags);
for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1)
if (flags & value)
__set_bit(bitidx + start_bitidx, bitmap);
else
__clear_bit(bitidx + start_bitidx, bitmap);
+ write_sequnlock_irqrestore(&zone->pageblock_seqlock, irq_flags);
}

/*
--
1.7.9.5

2014-01-09 07:05:13

by Joonsoo Kim

Subject: [PATCH 5/7] mm/page_alloc: separate interface to set/get migratetype of freepage

Currently, we use (set/get)_freepage_migratetype for two use cases.
One is to know the buddy freelist where this page will be linked, and
the other is to know the buddy freelist where this page is linked now.

But we should handle these two use cases differently, because the
information isn't sufficient for the second use case and properly
maintaining it incurs some overhead. Whenever a page is merged or split
in the buddy allocator, this information isn't properly re-assigned, so
it may not be reliable enough for the second use case.

This patch just separates the interfaces, so there is no functional
change. A following patch will take further steps on this issue.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3552717..2733e0b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -257,14 +257,31 @@ struct inode;
#define page_private(page) ((page)->private)
#define set_page_private(page, v) ((page)->private = (v))

-/* It's valid only if the page is free path or free_list */
-static inline void set_freepage_migratetype(struct page *page, int migratetype)
+/*
+ * It's valid only if the page is on buddy. It indicates
+ * which freelist the page is linked on.
+ */
+static inline void set_buddy_migratetype(struct page *page, int migratetype)
+{
+ page->index = migratetype;
+}
+
+static inline int get_buddy_migratetype(struct page *page)
+{
+ return page->index;
+}
+
+/*
+ * It's valid only if the page is on a pcp list. It indicates
+ * which buddy freelist the page should go to.
+ */
+static inline void set_pcp_migratetype(struct page *page, int migratetype)
{
page->index = migratetype;
}

-/* It's valid only if the page is free path or free_list */
-static inline int get_freepage_migratetype(struct page *page)
+/* It's valid only if the page is on a pcp list */
+static inline int get_pcp_migratetype(struct page *page)
{
return page->index;
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4913829..c9e6622 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -681,7 +681,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
page = list_entry(list->prev, struct page, lru);
/* must delete as __free_one_page list manipulates */
list_del(&page->lru);
- mt = get_freepage_migratetype(page);
+ mt = get_pcp_migratetype(page);
/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
__free_one_page(page, zone, 0, mt);
trace_mm_page_pcpu_drain(page, 0, mt);
@@ -745,7 +745,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
local_irq_save(flags);
__count_vm_events(PGFREE, 1 << order);
migratetype = get_pageblock_migratetype(page);
- set_freepage_migratetype(page, migratetype);
+ set_buddy_migratetype(page, migratetype);
free_one_page(page_zone(page), page, order, migratetype);
local_irq_restore(flags);
}
@@ -903,7 +903,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
rmv_page_order(page);
area->nr_free--;
expand(zone, page, order, current_order, area, migratetype);
- set_freepage_migratetype(page, migratetype);
+ set_pcp_migratetype(page, migratetype);
return page;
}

@@ -971,7 +971,7 @@ int move_freepages(struct zone *zone,
order = page_order(page);
list_move(&page->lru,
&zone->free_area[order].free_list[migratetype]);
- set_freepage_migratetype(page, migratetype);
+ set_buddy_migratetype(page, migratetype);
page += 1 << order;
pages_moved += 1 << order;
}
@@ -1094,12 +1094,11 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)

/* CMA pages cannot be stolen */
if (is_migrate_cma(migratetype)) {
- set_freepage_migratetype(page, migratetype);
+ set_pcp_migratetype(page, migratetype);
__mod_zone_page_state(zone,
NR_FREE_CMA_PAGES, -(1 << order));
} else {
- set_freepage_migratetype(page,
- start_migratetype);
+ set_pcp_migratetype(page, start_migratetype);
}

/* Remove the page from the freelists */
@@ -1346,7 +1345,7 @@ void free_hot_cold_page(struct page *page, int cold)
return;

migratetype = get_pageblock_migratetype(page);
- set_freepage_migratetype(page, migratetype);
+ set_pcp_migratetype(page, migratetype);
local_irq_save(flags);
__count_vm_event(PGFREE);

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 534fb3a..c341413 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -190,7 +190,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
* is MIGRATE_ISOLATE. Catch it and move the page into
* MIGRATE_ISOLATE list.
*/
- if (get_freepage_migratetype(page) != MIGRATE_ISOLATE) {
+ if (get_buddy_migratetype(page) != MIGRATE_ISOLATE) {
struct page *end_page;

end_page = page + (1 << page_order(page)) - 1;
--
1.7.9.5

2014-01-09 07:05:22

by Joonsoo Kim

Subject: [PATCH 3/7] mm/page_alloc: move set_freepage_migratetype() to better place

set_freepage_migratetype() informs us of the buddy freelist where the
page should be linked when it goes back to the buddy allocator.

Now, this is done in rmqueue_bulk(), so we have to call
get_pageblock_migratetype() to know its migratetype exactly if
CONFIG_CMA is enabled. That function has some overhead, so removing the
call is preferable.

To remove it, we move set_freepage_migratetype() to __rmqueue_fallback()
and __rmqueue_smallest(). In those functions, we already know the
migratetype, so we don't need to call get_pageblock_migratetype().

Removing is_migrate_isolate() is safe, since what we want to ensure is
that a page from CMA will not go to another migratetype's freelist.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1489c301..4913829 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -903,6 +903,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
rmv_page_order(page);
area->nr_free--;
expand(zone, page, order, current_order, area, migratetype);
+ set_freepage_migratetype(page, migratetype);
return page;
}

@@ -1093,8 +1094,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)

/* CMA pages cannot be stolen */
if (is_migrate_cma(migratetype)) {
+ set_freepage_migratetype(page, migratetype);
__mod_zone_page_state(zone,
NR_FREE_CMA_PAGES, -(1 << order));
+ } else {
+ set_freepage_migratetype(page,
+ start_migratetype);
}

/* Remove the page from the freelists */
@@ -1153,7 +1158,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
unsigned long count, struct list_head *list,
int migratetype, int cold)
{
- int mt = migratetype, i;
+ int i;

spin_lock(&zone->lock);
for (i = 0; i < count; ++i) {
@@ -1174,12 +1179,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
list_add(&page->lru, list);
else
list_add_tail(&page->lru, list);
- if (IS_ENABLED(CONFIG_CMA)) {
- mt = get_pageblock_migratetype(page);
- if (!is_migrate_cma(mt) && !is_migrate_isolate(mt))
- mt = migratetype;
- }
- set_freepage_migratetype(page, mt);
list = &page->lru;
}
__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
--
1.7.9.5

2014-01-09 07:05:17

by Joonsoo Kim

Subject: [PATCH 2/7] mm/cma: fix cma free page accounting

CMA pages can be allocated not only by order-0 requests but also by
high-order requests, so we should account for free CMA pages in both
places.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b36aa5a..1489c301 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1091,6 +1091,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
start_migratetype,
migratetype);

+ /* CMA pages cannot be stolen */
+ if (is_migrate_cma(migratetype)) {
+ __mod_zone_page_state(zone,
+ NR_FREE_CMA_PAGES, -(1 << order));
+ }
+
/* Remove the page from the freelists */
list_del(&page->lru);
rmv_page_order(page);
@@ -1175,9 +1181,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
}
set_freepage_migratetype(page, mt);
list = &page->lru;
- if (is_migrate_cma(mt))
- __mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
- -(1 << order));
}
__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
spin_unlock(&zone->lock);
--
1.7.9.5

2014-01-09 07:05:10

by Joonsoo Kim

Subject: [PATCH 7/7] mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy

If (MAX_ORDER-1) is greater than the pageblock order, it is possible
for pages of different migratetypes to be merged and linked onto an
unintended freelist.

While testing CMA, I saw CMA pages merged and linked into the MOVABLE
freelist by the above issue; the pages then changed their migratetype
to UNMOVABLE via try_to_steal_freepages(). After that, CMA allocation
in this region always fails.

To prevent this, we should not merge pages on the MIGRATE_(CMA|ISOLATE)
freelists.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2548b42..ea99cee 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -581,6 +581,15 @@ static inline void __free_one_page(struct page *page,
__mod_zone_freepage_state(zone, 1 << order,
migratetype);
} else {
+ int buddy_mt = get_buddy_migratetype(buddy);
+
+ /* We don't want to merge cma, isolate pages */
+ if (unlikely(order >= pageblock_order) &&
+ migratetype != buddy_mt &&
+ (migratetype >= MIGRATE_PCPTYPES ||
+ buddy_mt >= MIGRATE_PCPTYPES)) {
+ break;
+ }
list_del(&buddy->lru);
zone->free_area[order].nr_free--;
rmv_page_order(buddy);
--
1.7.9.5

2014-01-09 07:05:06

by Joonsoo Kim

Subject: [PATCH 6/7] mm/page_alloc: store freelist migratetype to the page on buddy properly

To maintain freelist migratetype information for buddy pages, the
migratetype should be set again whenever the page order changes.
set_page_order() is the best place to do this, because it is called
whenever the page order changes, so this patch adds
set_buddy_migratetype() to set_page_order().

This patch also makes set/get_buddy_migratetype() enabled only when
really needed, because they have some overhead.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2733e0b..046e09f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -258,6 +258,12 @@ struct inode;
#define set_page_private(page, v) ((page)->private = (v))

/*
+ * This is for tracking which freelist a buddy page is on.
+ * It imposes some performance overhead on the buddy allocator,
+ * so we enable it only when it is needed.
+ */
+#if defined(CONFIG_MEMORY_ISOLATION) || defined(CONFIG_CMA)
+/*
 * It's valid only if the page is on buddy. It indicates
 * which freelist the page is linked on.
*/
@@ -270,6 +276,10 @@ static inline int get_buddy_migratetype(struct page *page)
{
return page->index;
}
+#else
+static inline void set_buddy_migratetype(struct page *page, int migratetype) {}
+static inline int get_buddy_migratetype(struct page *page) { return 0; }
+#endif

/*
* It's valid only if the page is on pcp list. It represents
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c9e6622..2548b42 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -446,9 +446,11 @@ static inline void set_page_guard_flag(struct page *page) { }
static inline void clear_page_guard_flag(struct page *page) { }
#endif

-static inline void set_page_order(struct page *page, int order)
+static inline void set_page_order(struct page *page, int order,
+ int migratetype)
{
set_page_private(page, order);
+ set_buddy_migratetype(page, migratetype);
__SetPageBuddy(page);
}

@@ -588,7 +590,7 @@ static inline void __free_one_page(struct page *page,
page_idx = combined_idx;
order++;
}
- set_page_order(page, order);
+ set_page_order(page, order, migratetype);

/*
* If this is not the largest possible page, check if the buddy
@@ -745,7 +747,6 @@ static void __free_pages_ok(struct page *page, unsigned int order)
local_irq_save(flags);
__count_vm_events(PGFREE, 1 << order);
migratetype = get_pageblock_migratetype(page);
- set_buddy_migratetype(page, migratetype);
free_one_page(page_zone(page), page, order, migratetype);
local_irq_restore(flags);
}
@@ -834,7 +835,7 @@ static inline void expand(struct zone *zone, struct page *page,
#endif
list_add(&page[size].lru, &area->free_list[migratetype]);
area->nr_free++;
- set_page_order(&page[size], high);
+ set_page_order(&page[size], high, migratetype);
}
}

--
1.7.9.5

2014-01-09 07:05:04

by Joonsoo Kim

Subject: [PATCH 4/7] mm/isolation: remove invalid check condition

test_pages_isolated() checks the stability of pages. It checks two
conditions: one is that the page has the isolate migratetype, and the
other is that the page is on the buddy allocator's isolate freelist.
When these two conditions are satisfied, we can determine that the page
is stable and then go forward.

__test_page_isolated_in_pageblock() is one of the main functions for
this test. In that function, if it meets a page with page_count 0 and
the isolate migratetype, it decides that this page is stable. But this
is not true, because there is a possibility that this kind of page is
on the pcp list, and then it can be allocated by other users even
though we hold the zone lock. So remove this check.

Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index d1473b2..534fb3a 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -199,9 +199,6 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
}
pfn += 1 << page_order(page);
}
- else if (page_count(page) == 0 &&
- get_freepage_migratetype(page) == MIGRATE_ISOLATE)
- pfn += 1;
else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
/*
* The HWPoisoned page may be not in buddy
--
1.7.9.5

2014-01-09 09:27:36

by Mel Gorman

Subject: Re: [PATCH 0/7] improve robustness on handling migratetype

On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
> Hello,
>
> I found some weaknesses on handling migratetype during code review and
> testing CMA.
>
> First, we don't have any synchronization method on get/set pageblock
> migratetype. When we change migratetype, we hold the zone lock. So
> writer-writer race doesn't exist. But while someone changes migratetype,
> others can get migratetype. This may introduce totally unintended value
> as migratetype. Although I haven't heard of any problem report about
> that, it is better to protect properly.
>

This is deliberate. The migratetypes for the majority of users are advisory
and aimed for fragmentation avoidance. It was important that the cost of
that be kept as low as possible and the general case is that migration types
change very rarely. In many cases, the zone lock is held. In other cases,
such as splitting free pages, the cost is simply not justified.

I doubt there is any amount of data you could add in support that would
justify hammering the free fast paths (which call get_pageblock_migratetype).

> Second, (get/set)_freepage_migratetype isn't used properly. I guess that it
> would be introduced for per cpu page(pcp) performance, but, it is also
> used by memory isolation, now. For that case, the information isn't
> enough to use, so we need to fix it.
>
> Third, there is the problem on buddy allocator. It doesn't consider
> migratetype when merging buddy, so pages from cma or isolate region can
> be moved to other migratetype freelist. It makes CMA fail over and over.
> To prevent it, the buddy allocator should consider migratetype if
> CMA/ISOLATE is enabled.

Without looking at the patches, this is likely to add some cost to the
page free fast path -- a heavy cost if it's a pageblock lookup and a
lighter cost if you are using cached page information which is
potentially stale. Why not force CMA regions to be aligned on a
MAX_ORDER_NR_PAGES boundary instead, to avoid any possibility of
merging issues?
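
(Why alignment is sufficient, as a minimal sketch: the buddy index
arithmetic, cf. __find_buddy_index() in mm/page_alloc.c, only ever
flips one bit below MAX_ORDER, so a merge can never cross a
MAX_ORDER_NR_PAGES-aligned boundary:)

/*
 * Sketch of the buddy index arithmetic: the buddy of a block at
 * page_idx differs from it only in bit 'order'.
 */
static unsigned long find_buddy_index(unsigned long page_idx, unsigned int order)
{
	return page_idx ^ (1UL << order);
}

/*
 * Merging only happens for order < MAX_ORDER - 1, so page_idx and its
 * buddy always agree on bits (MAX_ORDER - 1) and above, i.e. they lie
 * in the same MAX_ORDER_NR_PAGES chunk. With CMA regions aligned to
 * MAX_ORDER_NR_PAGES, CMA and non-CMA pages can never merge.
 */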

> This patchset is aimed at fixing these problems and based on v3.13-rc7.
>
> mm/page_alloc: synchronize get/set pageblock

cost with no justification.

> mm/cma: fix cma free page accounting

sounds like it would be a fix but unrelated to the leader and should be
separated out on its own

> mm/page_alloc: move set_freepage_migratetype() to better place

Very vague. If this does something useful then it could do with a better
subject.

> mm/isolation: remove invalid check condition

Looks harmless.

> mm/page_alloc: separate interface to set/get migratetype of freepage
> mm/page_alloc: store freelist migratetype to the page on buddy
> properly

Potentially sounds useful

> mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy
>

Sounds unnecessary if CMA regions were MAX_ORDER_NR_PAGES aligned and
then the free paths would be unaffected for everybody.

I didn't look at the patches because it felt like cost without any supporting
justification for the patches. Superficially it looks like patch 1 needs to
go away and the last patch could be done without affecting !CMA users. The
rest are potentially useful but there should have been some supporting
data on how it helps CMA with some backup showing that the page allocation
paths are not impacted as a result.

--
Mel Gorman
SUSE Labs

2014-01-09 14:05:30

by Joonsoo Kim

Subject: Re: [PATCH 0/7] improve robustness on handling migratetype

2014/1/9 Michal Nazarewicz <[email protected]>:
> On Thu, Jan 09 2014, Joonsoo Kim <[email protected]> wrote:
>> Third, there is the problem on buddy allocator. It doesn't consider
>> migratetype when merging buddy, so pages from cma or isolate region can
>> be moved to other migratetype freelist. It makes CMA fail over and over.
>> To prevent it, the buddy allocator should consider migratetype if
>> CMA/ISOLATE is enabled.
>
> There should never be situation where a CMA page shares a pageblock (or
> a max-order page) with a non-CMA page though, so this should never be an
> issue.

Right... It never happens.
When I ported the CMA region reservation code to my own code for testing,
I made a mistake. Sorry for the noise.

Thanks.

2014-01-09 21:10:39

by Laura Abbott

Subject: Re: [PATCH 2/7] mm/cma: fix cma free page accounting

On 1/8/2014 11:04 PM, Joonsoo Kim wrote:
> Cma pages can be allocated by not only order 0 request but also high order
> request. So, we should consider to account free cma page in the both
> places.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b36aa5a..1489c301 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1091,6 +1091,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> start_migratetype,
> migratetype);
>
> + /* CMA pages cannot be stolen */
> + if (is_migrate_cma(migratetype)) {
> + __mod_zone_page_state(zone,
> + NR_FREE_CMA_PAGES, -(1 << order));
> + }
> +
> /* Remove the page from the freelists */
> list_del(&page->lru);
> rmv_page_order(page);
> @@ -1175,9 +1181,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
> }
> set_freepage_migratetype(page, mt);
> list = &page->lru;
> - if (is_migrate_cma(mt))
> - __mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
> - -(1 << order));
> }
> __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
> spin_unlock(&zone->lock);
>

Wouldn't this result in double counting? In the buffered_rmqueue()
non-zero-order request path we call __mod_zone_freepage_state(), which
already accounts for CMA pages if the migratetype is CMA, so it seems
like we would get hit twice:

buffered_rmqueue
  __rmqueue
    __rmqueue_fallback
      decrement
  __mod_zone_freepage_state
    decrement

Thanks,
Laura
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

2014-01-10 08:48:35

by Joonsoo Kim

Subject: Re: [PATCH 0/7] improve robustness on handling migratetype

On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
> On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
> > Hello,
> >
> > I found some weaknesses on handling migratetype during code review and
> > testing CMA.
> >
> > First, we don't have any synchronization method on get/set pageblock
> > migratetype. When we change migratetype, we hold the zone lock. So
> > writer-writer race doesn't exist. But while someone changes migratetype,
> > others can get migratetype. This may introduce totally unintended value
> > as migratetype. Although I haven't heard of any problem report about
> > that, it is better to protect properly.
> >
>
> This is deliberate. The migratetypes for the majority of users are advisory
> and aimed for fragmentation avoidance. It was important that the cost of
> that be kept as low as possible and the general case is that migration types
> change very rarely. In many cases, the zone lock is held. In other cases,
> such as splitting free pages, the cost is simply not justified.
>
> I doubt there is any amount of data you could add in support that would
> justify hammering the free fast paths (which call get_pageblock_migratetype).

Hello, Mel.

There is a possibility that we can get an unintended value such as 6 as
the migratetype if a reader-writer (get/set pageblock migratetype) race
happens. It is possible because we read the value without any
synchronization method. And this migratetype, 6, has no place in the
buddy freelist, so an array index overrun is possible and the system can
break, although I haven't heard that it has occurred.

I think that my solution is too expensive. However, I think that we need
a solution, don't we? Do you have any better idea?

>
> > Second, (get/set)_freepage_migratetype isn't used properly. I guess that it
> > would be introduced for per cpu page(pcp) performance, but, it is also
> > used by memory isolation, now. For that case, the information isn't
> > enough to use, so we need to fix it.
> >
> > Third, there is the problem on buddy allocator. It doesn't consider
> > migratetype when merging buddy, so pages from cma or isolate region can
> > be moved to other migratetype freelist. It makes CMA fail over and over.
> > To prevent it, the buddy allocator should consider migratetype if
> > CMA/ISOLATE is enabled.
>
> Without looking at the patches, this is likely to add some cost to the
> page free fast path -- heavy cost if it's a pageblock lookup and lighter
> cost if you are using cached page information which is potentially stale.
> Why not force CMA regions to be aligned on MAX_ORDER_NR_PAGES boundary
> instead to avoid any possibility of merging issues?
>

That was my mistake. The CMA region is aligned to MAX_ORDER_NR_PAGES, so
this can't happen. Sorry for the noise.

> > This patchset is aimed at fixing these problems and based on v3.13-rc7.
> >
> > mm/page_alloc: synchronize get/set pageblock
>
> cost with no justification.
>
> > mm/cma: fix cma free page accounting
>
> sounds like it would be a fix but unrelated to the leader and should be
> separated out on its own

Yes, it is not related to this topic, and it is a wrong patch as Laura
pointed out, so I will drop it.

> > mm/page_alloc: move set_freepage_migratetype() to better place
>
> Very vague. If this does something useful then it could do with a better
> subject.

Okay.

> > mm/isolation: remove invalid check condition
>
> Looks harmless.
>
> > mm/page_alloc: separate interface to set/get migratetype of freepage
> > mm/page_alloc: store freelist migratetype to the page on buddy
> > properly
>
> Potentially sounds useful
>

I made these two patches for the last patch, to reduce its performance
impact. If the last patch is dropped, it is better to remove the last
callsite that uses the freelist migratetype to know the buddy freelist
type. I will respin.

Thanks.

2014-01-10 08:49:46

by Joonsoo Kim

Subject: Re: [PATCH 2/7] mm/cma: fix cma free page accounting

On Thu, Jan 09, 2014 at 01:10:29PM -0800, Laura Abbott wrote:
> On 1/8/2014 11:04 PM, Joonsoo Kim wrote:
> >Cma pages can be allocated by not only order 0 request but also high order
> >request. So, we should consider to account free cma page in the both
> >places.
> >
> >Signed-off-by: Joonsoo Kim <[email protected]>
> >
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index b36aa5a..1489c301 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -1091,6 +1091,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> > start_migratetype,
> > migratetype);
> >
> >+ /* CMA pages cannot be stolen */
> >+ if (is_migrate_cma(migratetype)) {
> >+ __mod_zone_page_state(zone,
> >+ NR_FREE_CMA_PAGES, -(1 << order));
> >+ }
> >+
> > /* Remove the page from the freelists */
> > list_del(&page->lru);
> > rmv_page_order(page);
> >@@ -1175,9 +1181,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
> > }
> > set_freepage_migratetype(page, mt);
> > list = &page->lru;
> >- if (is_migrate_cma(mt))
> >- __mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
> >- -(1 << order));
> > }
> > __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
> > spin_unlock(&zone->lock);
> >
>
> Wouldn't this result in double counting? In the buffered_rmqueue()
> non-zero-order request path we call __mod_zone_freepage_state(), which
> already accounts for CMA pages if the migratetype is CMA, so it seems
> like we would get hit twice:
>
> buffered_rmqueue
>   __rmqueue
>     __rmqueue_fallback
>       decrement
>   __mod_zone_freepage_state
>     decrement
>

Hello, Laura.

You are right. I missed it. I will drop this patch.

Thanks.

2014-01-10 09:48:44

by Mel Gorman

Subject: Re: [PATCH 0/7] improve robustness on handling migratetype

On Fri, Jan 10, 2014 at 05:48:55PM +0900, Joonsoo Kim wrote:
> On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
> > On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
> > > Hello,
> > >
> > > I found some weaknesses on handling migratetype during code review and
> > > testing CMA.
> > >
> > > First, we don't have any synchronization method on get/set pageblock
> > > migratetype. When we change migratetype, we hold the zone lock. So
> > > writer-writer race doesn't exist. But while someone changes migratetype,
> > > others can get migratetype. This may introduce totally unintended value
> > > as migratetype. Although I haven't heard of any problem report about
> > > that, it is better to protect properly.
> > >
> >
> > This is deliberate. The migratetypes for the majority of users are advisory
> > and aimed for fragmentation avoidance. It was important that the cost of
> > that be kept as low as possible and the general case is that migration types
> > change very rarely. In many cases, the zone lock is held. In other cases,
> > such as splitting free pages, the cost is simply not justified.
> >
> > I doubt there is any amount of data you could add in support that would
> > justify hammering the free fast paths (which call get_pageblock_migratetype).
>
> Hello, Mel.
>
> There is a possibility that we can get unintended value such as 6 as migratetype
> if reader-writer (get/set pageblock_migratetype) race happens. It can be
> possible, because we read the value without any synchronization method. And
> this migratetype, 6, has no place in buddy freelist, so array index overrun can
> be possible and the system can break, although I haven't heard that it occurs.
>
> I think that my solution is too expensive. However, I think that we need
> a solution, don't we? Do you have any better idea?
>

It's not something I have ever heard of or seen occurring, but if you've
identified that it's a real possibility then split
get_pageblock_migratetype into locked and unlocked versions. Ensure that
calls to set_pageblock_migratetype are always under zone->lock and that
get_pageblock_migratetype is also under zone->lock, which both should be
true in the majority of cases. Use the unlocked version otherwise, but
instead of synchronizing, check if it's returning >= MIGRATE_TYPES and
return MIGRATE_MOVABLE in the unlikely event of a race. This will avoid
harming the fast paths for the majority of users and limit the damage if
a MIGRATE_CMA region is accidentally treated as MIGRATE_MOVABLE.
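
(A minimal sketch of that unlocked variant; the name is hypothetical,
not an existing kernel interface. Locked callers would keep using
get_pageblock_migratetype() under zone->lock, where the value is
stable:)

/* Hypothetical sketch, not an actual kernel API */
static int get_pageblock_migratetype_unlocked(struct page *page)
{
	int mt = get_pageblock_migratetype(page);

	/* a racing writer can produce a torn, out-of-range value */
	if (unlikely(mt >= MIGRATE_TYPES))
		return MIGRATE_MOVABLE;	/* limit the damage of the race */
	return mt;
}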

--
Mel Gorman
SUSE Labs

2014-01-13 01:56:30

by Joonsoo Kim

Subject: Re: [PATCH 0/7] improve robustness on handling migratetype

On Fri, Jan 10, 2014 at 09:48:34AM +0000, Mel Gorman wrote:
> On Fri, Jan 10, 2014 at 05:48:55PM +0900, Joonsoo Kim wrote:
> > On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
> > > On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
> > > > Hello,
> > > >
> > > > I found some weaknesses on handling migratetype during code review and
> > > > testing CMA.
> > > >
> > > > First, we don't have any synchronization method on get/set pageblock
> > > > migratetype. When we change migratetype, we hold the zone lock. So
> > > > writer-writer race doesn't exist. But while someone changes migratetype,
> > > > others can get migratetype. This may introduce totally unintended value
> > > > as migratetype. Although I haven't heard of any problem report about
> > > > that, it is better to protect properly.
> > > >
> > >
> > > This is deliberate. The migratetypes for the majority of users are advisory
> > > and aimed for fragmentation avoidance. It was important that the cost of
> > > that be kept as low as possible and the general case is that migration types
> > > change very rarely. In many cases, the zone lock is held. In other cases,
> > > such as splitting free pages, the cost is simply not justified.
> > >
> > > I doubt there is any amount of data you could add in support that would
> > > justify hammering the free fast paths (which call get_pageblock_migratetype).
> >
> > Hello, Mel.
> >
> > There is a possibility that we can get unintended value such as 6 as migratetype
> > if reader-writer (get/set pageblock_migratetype) race happens. It can be
> > possible, because we read the value without any synchronization method. And
> > this migratetype, 6, has no place in buddy freelist, so array index overrun can
> > be possible and the system can break, although I haven't heard that it occurs.
> >
> > I think that my solution is too expensive. However, I think that we need
> > a solution, don't we? Do you have any better idea?
> >
>
> It's not something I have ever heard or seen of occurring but
> if you've identified that it's a real possibility then split
> get_pageblock_migratetype into locked and unlocked versions. Ensure
> that calls to set_pageblock_migratetype is always under zone->lock and
> get_pageblock_migratetype is also under zone->lock which both should be
> true in the majority of cases. Use the unlocked version otherwise but
> instead of synchronizing, check if it's returning >= MIGRATE_TYPES and
> return MIGRATE_MOVABLE in the unlikely event of a race. This will avoid
> harming the fast paths for the majority of users and limit the damage if
> a MIGRATE_CMA region is accidentally treated as MIGRATE_MOVABLE

Okay.
I will re-investigate it, and if I have identified that it's a real
possibility, I will re-make this patch according to your advice.

Thanks for the comment!

2014-01-29 16:52:50

by Vlastimil Babka

Subject: Re: [PATCH 0/7] improve robustness on handling migratetype

On 01/10/2014 09:48 AM, Joonsoo Kim wrote:
> On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
>> On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
>>> Hello,
>>>
>>> I found some weaknesses on handling migratetype during code review and
>>> testing CMA.
>>>
>>> First, we don't have any synchronization method on get/set pageblock
>>> migratetype. When we change migratetype, we hold the zone lock. So
>>> writer-writer race doesn't exist. But while someone changes migratetype,
>>> others can get migratetype. This may introduce totally unintended value
>>> as migratetype. Although I haven't heard of any problem report about
>>> that, it is better to protect properly.
>>>
>>
>> This is deliberate. The migratetypes for the majority of users are advisory
>> and aimed for fragmentation avoidance. It was important that the cost of
>> that be kept as low as possible and the general case is that migration types
>> change very rarely. In many cases, the zone lock is held. In other cases,
>> such as splitting free pages, the cost is simply not justified.
>>
>> I doubt there is any amount of data you could add in support that would
>> justify hammering the free fast paths (which call get_pageblock_migratetype).
>
> Hello, Mel.
>
> There is a possibility that we can get unintended value such as 6 as migratetype
> if reader-writer (get/set pageblock_migratetype) race happens. It can be
> possible, because we read the value without any synchronization method. And
> this migratetype, 6, has no place in buddy freelist, so array index overrun can
> be possible and the system can break, although I haven't heard that it occurs.

Hello,

it seems this can indeed happen. I'm working on memory compaction
improvements, and in a prototype patch I'm basically adding calls to
start_isolate_page_range()/undo_isolate_page_range() in some functions
under compact_zone(). With this I've seen occurrences of NULL pointers
in move_freepages() and free_one_page(), in places where
free_list[migratetype] is manipulated by e.g. list_move(). That led me
to question the value of migratetype, and I found this thread. Adding
some debugging to get_pageblock_migratetype() and voila, I got a value
of 6 being read.

So is it just my patch adding a dangerous situation, or does it exist
in mainline as well? Looking at free_one_page(), it uses zone->lock, but
get_pageblock_migratetype() is called by its callers
(free_hot_cold_page() or __free_pages_ok()) outside of the lock. The
migratetype determined there is then used inside free_one_page() to
access a free_list.

It seems that this could race with set_pageblock_migratetype() called
from try_to_steal_freepages() (despite the latter being properly
locked). There are also other callers, but those seem to be limited to
initialization and isolation, which should be rare (?). However,
try_to_steal_freepages() can occur repeatedly. So I assume that the race
happens but never manifests as a fatal error as long as only the
MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE values are
used. Only MIGRATE_CMA and MIGRATE_ISOLATE have values with the bit of
weight 4 set, and can thus result in invalid values due to non-atomic
access.
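
(The debugging mentioned above was of this sort; a sketch, not the
exact change:)

/* Sketch of a debug check: warn when a torn read yields an
 * out-of-range migratetype. */
static inline int get_pageblock_migratetype(struct page *page)
{
	int mt = get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);

	WARN_ONCE(mt >= MIGRATE_TYPES, "racy migratetype read: %d\n", mt);
	return mt;
}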

Does that make sense to you and should we thus proceed with patching
this race?

Vlastimil

> I think that my solution is too expensive. However, I think that we need
> a solution, don't we? Do you have any better idea?
>
>>
>>> Second, (get/set)_freepage_migratetype isn't used properly. I guess that it
>>> would be introduced for per cpu page(pcp) performance, but, it is also
>>> used by memory isolation, now. For that case, the information isn't
>>> enough to use, so we need to fix it.
>>>
>>> Third, there is the problem on buddy allocator. It doesn't consider
>>> migratetype when merging buddy, so pages from cma or isolate region can
>>> be moved to other migratetype freelist. It makes CMA fail over and over.
>>> To prevent it, the buddy allocator should consider migratetype if
>>> CMA/ISOLATE is enabled.
>>
>> Without looking at the patches, this is likely to add some cost to the
>> page free fast path -- heavy cost if it's a pageblock lookup and lighter
>> cost if you are using cached page information which is potentially stale.
>> Why not force CMA regions to be aligned on MAX_ORDER_NR_PAGES boundary
>> instead to avoid any possibility of merging issues?
>>
>
> There was my mistake. CMA region is aligned on MAX_ORDER_NR_PAGES, so it
> can't happed. Sorry for noise.
>
>>> This patchset is aimed at fixing these problems and based on v3.13-rc7.
>>>
>>> mm/page_alloc: synchronize get/set pageblock
>>
>> cost with no justification.
>>
>>> mm/cma: fix cma free page accounting
>>
>> sounds like it would be a fix but unrelated to the leader and should be
>> seperated out on its own
>
> Yes, it is not related to this topic and it is wrong patch as Laura
> pointed out, so I will drop it.
>
>>> mm/page_alloc: move set_freepage_migratetype() to better place
>>
>> Very vague. If this does something useful then it could do with a better
>> subject.
>
> Okay.
>
>>> mm/isolation: remove invalid check condition
>>
>> Looks harmless.
>>
>>> mm/page_alloc: separate interface to set/get migratetype of freepage
>>> mm/page_alloc: store freelist migratetype to the page on buddy
>>> properly
>>
>> Potentially sounds useful
>>
>
> I made these two patches for last patch to reduce performance effect of it.
> In case of dropping last patch, it is better to remove the last callsite
> using freelist migratetype to know the buddy freelist type. I will do respin.
>
> Thanks.
>

2014-01-31 15:39:19

by Mel Gorman

Subject: Re: [PATCH 0/7] improve robustness on handling migratetype

On Wed, Jan 29, 2014 at 05:52:41PM +0100, Vlastimil Babka wrote:
> On 01/10/2014 09:48 AM, Joonsoo Kim wrote:
> >On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
> >>On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
> >>>Hello,
> >>>
> >>>I found some weaknesses on handling migratetype during code review and
> >>>testing CMA.
> >>>
> >>>First, we don't have any synchronization method on get/set pageblock
> >>>migratetype. When we change migratetype, we hold the zone lock. So
> >>>writer-writer race doesn't exist. But while someone changes migratetype,
> >>>others can get migratetype. This may introduce totally unintended value
> >>>as migratetype. Although I haven't heard of any problem report about
> >>>that, it is better to protect properly.
> >>>
> >>
> >>This is deliberate. The migratetypes for the majority of users are advisory
> >>and aimed for fragmentation avoidance. It was important that the cost of
> >>that be kept as low as possible and the general case is that migration types
> >>change very rarely. In many cases, the zone lock is held. In other cases,
> >>such as splitting free pages, the cost is simply not justified.
> >>
> >>I doubt there is any amount of data you could add in support that would
> >>justify hammering the free fast paths (which call get_pageblock_migratetype).
> >
> >Hello, Mel.
> >
> >There is a possibility that we can get unintended value such as 6 as migratetype
> >if reader-writer (get/set pageblock_migratetype) race happens. It can be
> >possible, because we read the value without any synchronization method. And
> >this migratetype, 6, has no place in buddy freelist, so array index overrun can
> >be possible and the system can break, although I haven't heard that it occurs.
>
> Hello,
>
> it seems this can indeed happen. I'm working on memory compaction
> improvements and in a prototype patch, I'm basically adding calls to
> start_isolate_page_range()/undo_isolate_page_range() in some functions
> under compact_zone(). With this I've seen occurrences of NULL
> pointers in move_freepages(), free_one_page() in places where
> free_list[migratetype] is manipulated by e.g. list_move(). That led
> me to question the value of migratetype and I found this thread.
> Adding some debugging in get_pageblock_migratetype() and voila, I
> get a value of 6 being read.
>
> So is it just my patch adding a dangerous situation, or does it exist in
> mainline as well? By looking at free_one_page(), it uses zone->lock, but
> get_pageblock_migratetype() is called by its callers
> (free_hot_cold_page() or __free_pages_ok()) outside of the lock.
> This determined migratetype is then used under free_one_page() to
> access a free_list.
>
> It seems that this could race with set_pageblock_migratetype()
> called from try_to_steal_freepages() (despite the latter being
> properly locked). There are also other callers but those seem to be
> either limited to initialization and isolation, which should be rare
> (?).
> However, try_to_steal_freepages can occur repeatedly.
> So I assume that the race happens but never manifests as a fatal
> error as long as MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and
> MIGRATE_MOVABLE
> values are used. Only MIGRATE_CMA and MIGRATE_ISOLATE have values
> with bit 4 enabled and can thus result in invalid values due to
> non-atomic access.
>
> Does that make sense to you and should we thus proceed with patching
> this race?
>

If you have direct evidence then it is indeed a problem. The key would
be to avoid taking the zone->lock just to stabilise this and instead
modify when get_pageblock_migratetype is called to make it safe. Looking
at the callers of get_pageblock_migratetype it would appear that

1. __free_pages_ok's call to get_pageblock_migratetype can move into
   free_one_page() under the zone lock, as long as you also move the
   set_freepage_migratetype call. The migratetype will be read twice by
   the free_hot_cold_page->free_one_page call, but that's ok because you
   have established that it is necessary.

2. rmqueue_bulk calls under zone->lock.

3. free_hot_cold_page cannot take zone->lock to stabilise the
   migratetype read, but if it gets a bad read due to a race, it enters
   the slow path. Force it to call free_one_page() there and take the
   lock in the event of a race, instead of only calling in there due to
   is_migrate_isolate (a sketch follows this list). Consider adding a
   debug patch that counts with vmstat how often this race occurs, and
   check the value with and without the compaction patches you've added.

4. It's not obvious, but __isolate_free_page should already hold the
   zone lock.

5. buffered_rmqueue: move the call to get_pageblock_migratetype under
   the zone lock. It'll just cost a local variable.

6. A race in setup_zone_migrate_reserve is relatively harmless. Check
   system_state == SYSTEM_BOOTING and take the zone->lock if the system
   is live. Release, resched and reacquire if need_resched().

7. has_unmovable_pages is harmless; the range should be isolated and
   not racing against other updates.
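
As a sketch of item 3 (hypothetical, not a posted patch; it also assumes
item 1 has been applied, so that free_one_page() re-reads the
migratetype itself under zone->lock), free_hot_cold_page() might do:

	migratetype = get_pageblock_migratetype(page);
	if (migratetype >= MIGRATE_PCPTYPES) {
		/*
		 * ISOLATE pages, and garbage from a racy read
		 * (>= MIGRATE_TYPES), take the slow path; per item 1,
		 * free_one_page() re-reads the type under zone->lock.
		 */
		if (unlikely(is_migrate_isolate(migratetype) ||
			     migratetype >= MIGRATE_TYPES)) {
			free_one_page(zone, page, 0);
			goto out;
		}
		/* the pcp lists only track the three basic types */
		migratetype = MIGRATE_MOVABLE;
	}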

--
Mel Gorman
SUSE Labs

2014-02-03 07:45:11

by Joonsoo Kim

Subject: Re: [PATCH 0/7] improve robustness on handling migratetype

On Wed, Jan 29, 2014 at 05:52:41PM +0100, Vlastimil Babka wrote:
> On 01/10/2014 09:48 AM, Joonsoo Kim wrote:
> >On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
> >>On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
> >>>Hello,
> >>>
> >>>I found some weaknesses on handling migratetype during code review and
> >>>testing CMA.
> >>>
> >>>First, we don't have any synchronization method on get/set pageblock
> >>>migratetype. When we change migratetype, we hold the zone lock. So
> >>>writer-writer race doesn't exist. But while someone changes migratetype,
> >>>others can get migratetype. This may introduce totally unintended value
> >>>as migratetype. Although I haven't heard of any problem report about
> >>>that, it is better to protect properly.
> >>>
> >>
> >>This is deliberate. The migratetypes for the majority of users are advisory
> >>and aimed for fragmentation avoidance. It was important that the cost of
> >>that be kept as low as possible and the general case is that migration types
> >>change very rarely. In many cases, the zone lock is held. In other cases,
> >>such as splitting free pages, the cost is simply not justified.
> >>
> >>I doubt there is any amount of data you could add in support that would
> >>justify hammering the free fast paths (which call get_pageblock_migratetype).
> >
> >Hello, Mel.
> >
> >There is a possibility that we can get unintended value such as 6 as migratetype
> >if reader-writer (get/set pageblock_migratetype) race happens. It can be
> >possible, because we read the value without any synchronization method. And
> >this migratetype, 6, has no place in buddy freelist, so array index overrun can
> >be possible and the system can break, although I haven't heard that it occurs.
>
> Hello,
>
> it seems this can indeed happen. I'm working on memory compaction
> improvements and in a prototype patch, I'm basically adding calls to
> start_isolate_page_range()/undo_isolate_page_range() in some functions
> under compact_zone(). With this I've seen occurrences of NULL
> pointers in move_freepages(), free_one_page() in places where
> free_list[migratetype] is manipulated by e.g. list_move(). That led
> me to question the value of migratetype and I found this thread.
> Adding some debugging in get_pageblock_migratetype() and voila, I
> get a value of 6 being read.
>
> So is it just my patch adding a dangerous situation, or does it exist in
> mainline as well? By looking at free_one_page(), it uses zone->lock, but
> get_pageblock_migratetype() is called by its callers
> (free_hot_cold_page() or __free_pages_ok()) outside of the lock.
> This determined migratetype is then used under free_one_page() to
> access a free_list.
>
> It seems that this could race with set_pageblock_migratetype()
> called from try_to_steal_freepages() (despite the latter being
> properly locked). There are also other callers but those seem to be
> either limited to initialization and isolation, which should be rare
> (?).
> However, try_to_steal_freepages can occur repeatedly.
> So I assume that the race happens but never manifests as a fatal
> error as long as MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and
> MIGRATE_MOVABLE
> values are used. Only MIGRATE_CMA and MIGRATE_ISOLATE have values
> with bit 4 enabled and can thus result in invalid values due to
> non-atomic access.
>
> Does that make sense to you and should we thus proceed with patching
> this race?
>

Hello,

This race is possible without your prototype patch, though with very
low probability. Some code related to memory failure uses
set_migratetype_isolate(), which could result in this race.

Although it may be a very rare case and not critical, it is better to
fix this race. I prefer that we don't depend on luck. :)

Mel's suggestion looks good to me. Do you have another idea?

Thanks.

2014-02-03 09:17:01

by Vlastimil Babka

Subject: Re: [PATCH 0/7] improve robustness on handling migratetype

On 02/03/2014 08:45 AM, Joonsoo Kim wrote:
> On Wed, Jan 29, 2014 at 05:52:41PM +0100, Vlastimil Babka wrote:
>> On 01/10/2014 09:48 AM, Joonsoo Kim wrote:
>>> On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
>>>> On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
>>>>> Hello,
>>>>>
>>>>> I found some weaknesses on handling migratetype during code review and
>>>>> testing CMA.
>>>>>
>>>>> First, we don't have any synchronization method on get/set pageblock
>>>>> migratetype. When we change migratetype, we hold the zone lock. So
>>>>> writer-writer race doesn't exist. But while someone changes migratetype,
>>>>> others can get migratetype. This may introduce totally unintended value
>>>>> as migratetype. Although I haven't heard of any problem report about
>>>>> that, it is better to protect properly.
>>>>>
>>>>
>>>> This is deliberate. The migratetypes for the majority of users are advisory
>>>> and aimed for fragmentation avoidance. It was important that the cost of
>>>> that be kept as low as possible and the general case is that migration types
>>>> change very rarely. In many cases, the zone lock is held. In other cases,
>>>> such as splitting free pages, the cost is simply not justified.
>>>>
>>>> I doubt there is any amount of data you could add in support that would
>>>> justify hammering the free fast paths (which call get_pageblock_migratetype).
>>>
>>> Hello, Mel.
>>>
>>> There is a possibility that we can get unintended value such as 6 as migratetype
>>> if reader-writer (get/set pageblock_migratetype) race happens. It can be
>>> possible, because we read the value without any synchronization method. And
>>> this migratetype, 6, has no place in buddy freelist, so array index overrun can
>>> be possible and the system can break, although I haven't heard that it occurs.
>>
>> Hello,
>>
>> it seems this can indeed happen. I'm working on memory compaction
>> improvements and in a prototype patch, I'm basically adding calls to
>> start_isolate_page_range()/undo_isolate_page_range() in some functions
>> under compact_zone(). With this I've seen occurrences of NULL
>> pointers in move_freepages(), free_one_page() in places where
>> free_list[migratetype] is manipulated by e.g. list_move(). That led
>> me to question the value of migratetype and I found this thread.
>> Adding some debugging in get_pageblock_migratetype() and voila, I
>> get a value of 6 being read.
>>
>> So is it just my patch adding a dangerous situation, or does it exist in
>> mainline as well? By looking at free_one_page(), it uses zone->lock, but
>> get_pageblock_migratetype() is called by its callers
>> (free_hot_cold_page() or __free_pages_ok()) outside of the lock.
>> This determined migratetype is then used under free_one_page() to
>> access a free_list.
>>
>> It seems that this could race with set_pageblock_migratetype()
>> called from try_to_steal_freepages() (despite the latter being
>> properly locked). There are also other callers but those seem to be
>> either limited to initialization and isolation, which should be rare
>> (?).
>> However, try_to_steal_freepages can occur repeatedly.
>> So I assume that the race happens but never manifests as a fatal
>> error as long as MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and
>> MIGRATE_MOVABLE
>> values are used. Only MIGRATE_CMA and MIGRATE_ISOLATE have values
>> with bit 4 enabled and can thus result in invalid values due to
>> non-atomic access.
>>
>> Does that make sense to you and should we thus proceed with patching
>> this race?
>>
>
> Hello,
>
> This race is possible without your prototype patch, though with very low
> probability. Some code related to memory failure uses set_migratetype_isolate(),
> which could result in this race.
>
> Although it may be a very rare case and not critical, it is better to fix
> this race. I prefer that we don't depend on luck. :)

I agree :) I also don't like the possibility that the non-fatal type of
race (where higher-order bits are not involved) occurs and can hurt
anti-fragmentation, or even suddenly become a problem in the future if
e.g. more migratetypes are added. I'll try to quantify that with a debug
patch.

> Mel's suggestion looks good to me. Do you have another idea?

No, it sounds good so I'm going to work on this as outlined.

> Thanks.
>