2014-04-03 15:37:00

by Vlastimil Babka

Subject: Re: [PATCH v3] mm/page_alloc: fix freeing of MIGRATE_RESERVE migratetype pages

On 03/25/2014 02:47 PM, Bartlomiej Zolnierkiewicz wrote:
>
> Hi,
>
> On Friday, March 21, 2014 03:16:31 PM Vlastimil Babka wrote:
>> On 03/06/2014 06:35 PM, Bartlomiej Zolnierkiewicz wrote:
>>> Pages allocated from MIGRATE_RESERVE migratetype pageblocks
>>> are not freed back to MIGRATE_RESERVE migratetype free
>>> lists in free_pcppages_bulk()->__free_one_page() if we got
>>> to free_pcppages_bulk() through drain_[zone_]pages().
>>> The freeing through free_hot_cold_page() is okay because
>>> the freepage migratetype is set to the pageblock migratetype
>>> before calling free_pcppages_bulk().
>>
>> I think this is somewhat misleading and it got me confused for a while.
>> It's not about the call path of free_pcppages_bulk(), but about the
>> fact that rmqueue_bulk() has been called at some point to fill up the
>> pcp lists, and had to resort to __rmqueue_fallback(). So, going through
>> free_hot_cold_page() might give you the correct migratetype for the last
>> page freed, but the pcp lists may still contain misplaced pages from an
>> earlier rmqueue_bulk().
>
> Ok, you're right. I'll fix this.
>
>>> If pages of MIGRATE_RESERVE
>>> migratetype end up on the free lists of another migratetype,
>>> the whole Reserved pageblock may later be changed to the
>>> other migratetype in __rmqueue_fallback() and it will never
>>> be changed back to a Reserved pageblock. Fix the issue by
>>> moving the freepage migratetype setting from rmqueue_bulk()
>>> to __rmqueue[_fallback]() and preserving the freepage
>>> migratetype as the original pageblock migratetype for
>>> MIGRATE_RESERVE migratetype pages.
>>
>> Actually, wouldn't the easiest solution to this particular problem be to
>> check the current pageblock migratetype in try_to_steal_freepages() and
>> disallow changing it? However, I agree that preventing the page from being
>> misplaced in the first place would be even better.
>>
>>> The problem was introduced in v2.6.31 by commit ed0ae21
>>> ("page allocator: do not call get_pageblock_migratetype()
>>> more than necessary").
>>>
>>> Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]>
>>> Reported-by: Yong-Taek Lee <[email protected]>
>>> Cc: Marek Szyprowski <[email protected]>
>>> Cc: Mel Gorman <[email protected]>
>>> Cc: Hugh Dickins <[email protected]>
>>> ---
>>> v2:
>>> - updated patch description, there is no __zone_pcp_update()
>>> in newer kernels
>>> v3:
>>> - set freepage migratetype in __rmqueue[_fallback]()
>>> instead of rmqueue_bulk() (per Mel's request)
>>>
>>> mm/page_alloc.c | 27 ++++++++++++++++++---------
>>> 1 file changed, 18 insertions(+), 9 deletions(-)
>>>
>>> Index: b/mm/page_alloc.c
>>> ===================================================================
>>> --- a/mm/page_alloc.c 2014-03-06 18:10:21.884422983 +0100
>>> +++ b/mm/page_alloc.c 2014-03-06 18:10:27.016422895 +0100
>>> @@ -1094,7 +1094,7 @@ __rmqueue_fallback(struct zone *zone, in
>>> struct free_area *area;
>>> int current_order;
>>> struct page *page;
>>> - int migratetype, new_type, i;
>>> + int migratetype, new_type, mt = start_migratetype, i;
>>
>> Better naming would help; "mt" and "migratetype" are the same thing
>> and it gets too confusing.
>
> Well, yes, though 'mt' is short and the check code is consistent with
> the corresponding code in rmqueue_bulk().
>
> Do you have a proposal for a better name for this variable?
>
>>>
>>> /* Find the largest possible block of pages in the other list */
>>> for (current_order = MAX_ORDER-1; current_order >= order;
>>> @@ -1125,6 +1125,14 @@ __rmqueue_fallback(struct zone *zone, in
>>> expand(zone, page, order, current_order, area,
>>> new_type);
>>>
>>> + if (IS_ENABLED(CONFIG_CMA)) {
>>> + mt = get_pageblock_migratetype(page);
>>> + if (!is_migrate_cma(mt) &&
>>> + !is_migrate_isolate(mt))
>>> + mt = start_migratetype;
>>> + }
>>> + set_freepage_migratetype(page, mt);
>>> +
>>> trace_mm_page_alloc_extfrag(page, order, current_order,
>>> start_migratetype, migratetype, new_type);
>>>
>>> @@ -1147,7 +1155,9 @@ static struct page *__rmqueue(struct zon
>>> retry_reserve:
>>> page = __rmqueue_smallest(zone, order, migratetype);
>>>
>>> - if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
>>> + if (likely(page)) {
>>> + set_freepage_migratetype(page, migratetype);
>>
>> Are you sure that the checking of CMA and ISOLATE is not needed here?
>
> CMA and ISOLATE migratetype pages are always put back on the correct
> free lists (since set_freepage_migratetype() sets the freepage migratetype
> to the original one for CMA and ISOLATE migratetype pages) and
> __rmqueue_smallest() can take a page only from the 'migratetype' free
> list.

Actually, this is true also for the __rmqueue_fallback() case. So we can
do without get_pageblock_migratetype() completely. In fact, Joonsoo
already posted such a patch in "[PATCH 3/7] mm/page_alloc: move
set_freepage_migratetype() to better place", see:
http://lkml.org/lkml/2014/1/9/33

I've updated and improved this and will send it shortly, along with some
DEBUG_VM checks to make it easier to verify that this is indeed the case.
Testing from the CMA people is welcome.
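
The key piece is the existing CMA handling in try_to_steal_freepages()
(current code, abridged): a CMA pageblock never changes type, so recording
the migratetype of the free_list the page actually came from is always safe:

	/*
	 * When borrowing from MIGRATE_CMA, we need to release the excess
	 * buddy pages to CMA itself.
	 */
	if (is_migrate_cma(fallback_type))
		return fallback_type;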

Vlastimil

> It was suggested to do it this way by Mel.
>
>> Did the original rmqueue_bulk() have this checking only for the
>> __rmqueue_fallback() case? Why wouldn't the check already be only in
>> __rmqueue_fallback() then?
>
> Probably for historical reasons. rmqueue_bulk() already contained the
> set_page_private() call when CMA was introduced, and that is where the
> special handling for CMA and ISOLATE migratetype pages was added; please
> see commit 47118af ("mm: mmzone: MIGRATE_CMA migration type added").
>
>>> + } else if (migratetype != MIGRATE_RESERVE) {
>>> page = __rmqueue_fallback(zone, order, migratetype);
>>>
>>> /*
>>> @@ -1174,7 +1184,7 @@ static int rmqueue_bulk(struct zone *zon
>>> unsigned long count, struct list_head *list,
>>> int migratetype, int cold)
>>> {
>>> - int mt = migratetype, i;
>>> + int i;
>>>
>>> spin_lock(&zone->lock);
>>> for (i = 0; i < count; ++i) {
>>> @@ -1195,16 +1205,15 @@ static int rmqueue_bulk(struct zone *zon
>>> list_add(&page->lru, list);
>>> else
>>> list_add_tail(&page->lru, list);
>>> + list = &page->lru;
>>> if (IS_ENABLED(CONFIG_CMA)) {
>>> - mt = get_pageblock_migratetype(page);
>>> + int mt = get_pageblock_migratetype(page);
>>> if (!is_migrate_cma(mt) && !is_migrate_isolate(mt))
>>> mt = migratetype;
>>> + if (is_migrate_cma(mt))
>>> + __mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
>>> + -(1 << order));
>>> }
>>> - set_freepage_migratetype(page, mt);
>>> - list = &page->lru;
>>> - if (is_migrate_cma(mt))
>>> - __mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
>>> - -(1 << order));
>>> }
>>> __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
>>> spin_unlock(&zone->lock);
>
> Best regards,
> --
> Bartlomiej Zolnierkiewicz
> Samsung R&D Institute Poland
> Samsung Electronics
>


2014-04-03 15:40:47

by Vlastimil Babka

Subject: [PATCH 2/2] mm/page_alloc: DEBUG_VM checks for free_list placement of CMA and RESERVE pages

For MIGRATE_RESERVE pages, it is important that they do not get misplaced
on the free_list of another migratetype, otherwise the whole MIGRATE_RESERVE
pageblock might be changed to another migratetype in try_to_steal_freepages().
For MIGRATE_CMA, the pages must also not go to a different free_list, otherwise
they could get allocated as unmovable and result in CMA failure.

This is ensured by setting the freepage_migratetype appropriately when placing
pages on the pcp lists, and using that information when releasing them back to
the free_list. It is also assumed that CMA and RESERVE pageblocks are created
only in the init phase. This patch adds DEBUG_VM checks to catch any
regressions that would violate this invariant.

Cc: Yong-Taek Lee <[email protected]>
Cc: Bartlomiej Zolnierkiewicz <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
include/linux/mm.h | 19 +++++++++++++++++++
mm/page_alloc.c | 3 +++
2 files changed, 22 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c1b7414..27a74ba 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -280,6 +280,25 @@ static inline int get_freepage_migratetype(struct page *page)
}

/*
+ * Check that a freepage cannot end up on a wrong free_list for "sensitive"
+ * migratetypes. Return false if it could. Useful for VM_BUG_ON checks.
+ */
+static inline bool check_freepage_migratetype(struct page *page)
+{
+ int pageblock_mt = get_pageblock_migratetype(page);
+ int freepage_mt = get_freepage_migratetype(page);
+
+ /*
+ * For RESERVE and CMA pageblocks, the freepage_migratetype must
+ * match their migratetype. For other pageblocks, we don't care.
+ */
+ if (pageblock_mt != MIGRATE_RESERVE && !is_migrate_cma(pageblock_mt))
+ return true;
+
+ return (freepage_mt == pageblock_mt);
+}
+
+/*
* FIXME: take this include out, include page-flags.h in
* files which need it (119 of them)
*/
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2dbaba1..0ee9f8c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -697,6 +697,8 @@ static void free_pcppages_bulk(struct zone *zone, int count,
page = list_entry(list->prev, struct page, lru);
/* must delete as __free_one_page list manipulates */
list_del(&page->lru);
+
+ VM_BUG_ON(!check_freepage_migratetype(page));
mt = get_freepage_migratetype(page);
/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
__free_one_page(page, zone, 0, mt);
@@ -1190,6 +1192,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
struct page *page = __rmqueue(zone, order, migratetype);
if (unlikely(page == NULL))
break;
+ VM_BUG_ON(!check_freepage_migratetype(page));

/*
* Split buddy pages returned by expand() are received here
--
1.8.4.5

2014-04-03 15:40:43

by Vlastimil Babka

Subject: [PATCH 1/2] mm/page_alloc: prevent MIGRATE_RESERVE pages from being misplaced

For MIGRATE_RESERVE pages, it is important that they do not get misplaced
on the free_list of another migratetype, otherwise the whole MIGRATE_RESERVE
pageblock might be changed to another migratetype in try_to_steal_freepages().

Currently, however, it is possible for this to happen when a MIGRATE_RESERVE
page is allocated on the pcplist through rmqueue_bulk() as a fallback for
another desired migratetype, and then later freed back through
free_pcppages_bulk() without actually being used. This happens because
free_pcppages_bulk() uses get_freepage_migratetype() to choose the free_list,
and rmqueue_bulk() calls set_freepage_migratetype() with the *desired*
migratetype and not the page's original MIGRATE_RESERVE migratetype.

This patch fixes the problem by moving the call to set_freepage_migratetype()
from rmqueue_bulk() down to __rmqueue_smallest() and __rmqueue_fallback(),
where the actual page's migratetype (i.e. the free_list the page is taken
from) is used. Note that this migratetype might differ from the pageblock's
migratetype due to freepage stealing decisions. This is OK, as page stealing
never uses MIGRATE_RESERVE as a fallback, and also takes care to leave all
MIGRATE_CMA pages on the correct freelist.
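
For reference, these are the fallback lists this relies on (abridged from
mm/page_alloc.c; the CMA and ISOLATE entries are omitted here). The
__rmqueue_fallback() loop breaks out as soon as it reaches MIGRATE_RESERVE,
so RESERVE free lists are never used as a stealing source:

	static int fallbacks[MIGRATE_TYPES][4] = {
		[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
		[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
		[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
		[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
	};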

Therefore, as an additional benefit, the call to get_pageblock_migratetype()
from rmqueue_bulk() when CMA is enabled can be removed completely. This relies
on the fact that MIGRATE_CMA pageblocks are created only during system init,
and on the above. The related is_migrate_isolate() check is also unnecessary,
as memory isolation has other ways to move pages between freelists, and drains
pcp lists containing pages that should be isolated.
buffered_rmqueue() can also benefit from calling get_freepage_migratetype()
instead of get_pageblock_migratetype().

A separate patch will add VM_BUG_ON checks for the invariant that for
MIGRATE_RESERVE and MIGRATE_CMA pageblocks, freepage_migratetype must equal
pageblock_migratetype, so that these pages always go to the correct free_list.

Reported-by: Yong-Taek Lee <[email protected]>
Reported-by: Bartlomiej Zolnierkiewicz <[email protected]>
Suggested-by: Joonsoo Kim <[email protected]>
Suggested-by: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/page_alloc.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3bac76a..2dbaba1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -930,6 +930,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
rmv_page_order(page);
area->nr_free--;
expand(zone, page, order, current_order, area, migratetype);
+ set_freepage_migratetype(page, migratetype);
return page;
}

@@ -1056,7 +1057,9 @@ static int try_to_steal_freepages(struct zone *zone, struct page *page,

/*
* When borrowing from MIGRATE_CMA, we need to release the excess
- * buddy pages to CMA itself.
+ * buddy pages to CMA itself. We also ensure the freepage_migratetype
+ * is set to CMA so it is returned to the correct freelist in case
+ * the page ends up not actually being allocated from the pcp lists.
*/
if (is_migrate_cma(fallback_type))
return fallback_type;
@@ -1124,6 +1127,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)

expand(zone, page, order, current_order, area,
new_type);
+ /* The freepage_migratetype may differ from pageblock's
+ * migratetype depending on the decisions in
+ * try_to_steal_freepages. This is OK as long as it does
+ * not differ for MIGRATE_CMA type.
+ */
+ set_freepage_migratetype(page, new_type);

trace_mm_page_alloc_extfrag(page, order, current_order,
start_migratetype, migratetype, new_type);
@@ -1174,7 +1183,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
unsigned long count, struct list_head *list,
int migratetype, int cold)
{
- int mt = migratetype, i;
+ int i;

spin_lock(&zone->lock);
for (i = 0; i < count; ++i) {
@@ -1195,14 +1204,8 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
list_add(&page->lru, list);
else
list_add_tail(&page->lru, list);
- if (IS_ENABLED(CONFIG_CMA)) {
- mt = get_pageblock_migratetype(page);
- if (!is_migrate_cma(mt) && !is_migrate_isolate(mt))
- mt = migratetype;
- }
- set_freepage_migratetype(page, mt);
list = &page->lru;
- if (is_migrate_cma(mt))
+ if (is_migrate_cma(get_freepage_migratetype(page)))
__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
-(1 << order));
}
@@ -1580,7 +1583,7 @@ again:
if (!page)
goto failed;
__mod_zone_freepage_state(zone, -(1 << order),
- get_pageblock_migratetype(page));
+ get_freepage_migratetype(page));
}

/*
--
1.8.4.5

2014-04-16 00:56:23

by Joonsoo Kim

Subject: Re: [PATCH 1/2] mm/page_alloc: prevent MIGRATE_RESERVE pages from being misplaced

On Thu, Apr 03, 2014 at 05:40:17PM +0200, Vlastimil Babka wrote:
> For MIGRATE_RESERVE pages, it is important that they do not get misplaced
> on the free_list of another migratetype, otherwise the whole MIGRATE_RESERVE
> pageblock might be changed to another migratetype in try_to_steal_freepages().
>
> Currently, however, it is possible for this to happen when a MIGRATE_RESERVE
> page is allocated on the pcplist through rmqueue_bulk() as a fallback for
> another desired migratetype, and then later freed back through
> free_pcppages_bulk() without actually being used. This happens because
> free_pcppages_bulk() uses get_freepage_migratetype() to choose the free_list,
> and rmqueue_bulk() calls set_freepage_migratetype() with the *desired*
> migratetype and not the page's original MIGRATE_RESERVE migratetype.
>
> This patch fixes the problem by moving the call to set_freepage_migratetype()
> from rmqueue_bulk() down to __rmqueue_smallest() and __rmqueue_fallback(),
> where the actual page's migratetype (i.e. the free_list the page is taken
> from) is used. Note that this migratetype might differ from the pageblock's
> migratetype due to freepage stealing decisions. This is OK, as page stealing
> never uses MIGRATE_RESERVE as a fallback, and also takes care to leave all
> MIGRATE_CMA pages on the correct freelist.
>
> Therefore, as an additional benefit, the call to get_pageblock_migratetype()
> from rmqueue_bulk() when CMA is enabled can be removed completely. This relies
> on the fact that MIGRATE_CMA pageblocks are created only during system init,
> and on the above. The related is_migrate_isolate() check is also unnecessary,
> as memory isolation has other ways to move pages between freelists, and drains
> pcp lists containing pages that should be isolated.
> buffered_rmqueue() can also benefit from calling get_freepage_migratetype()
> instead of get_pageblock_migratetype().
>
> A separate patch will add VM_BUG_ON checks for the invariant that for
> MIGRATE_RESERVE and MIGRATE_CMA pageblocks, freepage_migratetype must equal
> pageblock_migratetype, so that these pages always go to the correct free_list.
>
> Reported-by: Yong-Taek Lee <[email protected]>
> Reported-by: Bartlomiej Zolnierkiewicz <[email protected]>
> Suggested-by: Joonsoo Kim <[email protected]>
> Suggested-by: Mel Gorman <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Marek Szyprowski <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Michal Nazarewicz <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>

Looks good to me.

Acked-by: Joonsoo Kim <[email protected]>

2014-04-16 01:08:54

by Joonsoo Kim

Subject: Re: [PATCH 2/2] mm/page_alloc: DEBUG_VM checks for free_list placement of CMA and RESERVE pages

On Thu, Apr 03, 2014 at 05:40:18PM +0200, Vlastimil Babka wrote:
> For MIGRATE_RESERVE pages, it is important that they do not get misplaced
> on the free_list of another migratetype, otherwise the whole MIGRATE_RESERVE
> pageblock might be changed to another migratetype in try_to_steal_freepages().
> For MIGRATE_CMA, the pages must also not go to a different free_list, otherwise
> they could get allocated as unmovable and result in CMA failure.
>
> This is ensured by setting the freepage_migratetype appropriately when placing
> pages on the pcp lists, and using that information when releasing them back to
> the free_list. It is also assumed that CMA and RESERVE pageblocks are created
> only in the init phase. This patch adds DEBUG_VM checks to catch any
> regressions that would violate this invariant.

Hello, Vlastimil.

Idea looks good to me.

>
> Cc: Yong-Taek Lee <[email protected]>
> Cc: Bartlomiej Zolnierkiewicz <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Marek Szyprowski <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Michal Nazarewicz <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>

Acked-by: Joonsoo Kim <[email protected]>

2014-04-17 23:28:24

by Minchan Kim

Subject: Re: [PATCH 1/2] mm/page_alloc: prevent MIGRATE_RESERVE pages from being misplaced

On Thu, Apr 03, 2014 at 05:40:17PM +0200, Vlastimil Babka wrote:
> For MIGRATE_RESERVE pages, it is important that they do not get misplaced
> on the free_list of another migratetype, otherwise the whole MIGRATE_RESERVE
> pageblock might be changed to another migratetype in try_to_steal_freepages().
>
> Currently, however, it is possible for this to happen when a MIGRATE_RESERVE
> page is allocated on the pcplist through rmqueue_bulk() as a fallback for
> another desired migratetype, and then later freed back through
> free_pcppages_bulk() without actually being used. This happens because
> free_pcppages_bulk() uses get_freepage_migratetype() to choose the free_list,
> and rmqueue_bulk() calls set_freepage_migratetype() with the *desired*
> migratetype and not the page's original MIGRATE_RESERVE migratetype.
>
> This patch fixes the problem by moving the call to set_freepage_migratetype()
> from rmqueue_bulk() down to __rmqueue_smallest() and __rmqueue_fallback(),
> where the actual page's migratetype (i.e. the free_list the page is taken
> from) is used. Note that this migratetype might differ from the pageblock's
> migratetype due to freepage stealing decisions. This is OK, as page stealing
> never uses MIGRATE_RESERVE as a fallback, and also takes care to leave all
> MIGRATE_CMA pages on the correct freelist.
>
> Therefore, as an additional benefit, the call to get_pageblock_migratetype()
> from rmqueue_bulk() when CMA is enabled can be removed completely. This relies
> on the fact that MIGRATE_CMA pageblocks are created only during system init,
> and on the above. The related is_migrate_isolate() check is also unnecessary,
> as memory isolation has other ways to move pages between freelists, and drains
> pcp lists containing pages that should be isolated.
> buffered_rmqueue() can also benefit from calling get_freepage_migratetype()
> instead of get_pageblock_migratetype().

Nice description.

>
> A separate patch will add VM_BUG_ON checks for the invariant that for
> MIGRATE_RESERVE and MIGRATE_CMA pageblocks, freepage_migratetype must equal
> pageblock_migratetype, so that these pages always go to the correct free_list.
>
> Reported-by: Yong-Taek Lee <[email protected]>
> Reported-by: Bartlomiej Zolnierkiewicz <[email protected]>
> Suggested-by: Joonsoo Kim <[email protected]>
> Suggested-by: Mel Gorman <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Marek Szyprowski <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Michal Nazarewicz <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>

Acked-by: Minchan Kim <[email protected]>

--
Kind regards,
Minchan Kim

2014-04-30 21:47:32

by Sasha Levin

Subject: Re: [PATCH 2/2] mm/page_alloc: DEBUG_VM checks for free_list placement of CMA and RESERVE pages

On 04/03/2014 11:40 AM, Vlastimil Babka wrote:
> For MIGRATE_RESERVE pages, it is important that they do not get misplaced
> on the free_list of another migratetype, otherwise the whole MIGRATE_RESERVE
> pageblock might be changed to another migratetype in try_to_steal_freepages().
> For MIGRATE_CMA, the pages must also not go to a different free_list, otherwise
> they could get allocated as unmovable and result in CMA failure.
>
> This is ensured by setting the freepage_migratetype appropriately when placing
> pages on the pcp lists, and using that information when releasing them back to
> the free_list. It is also assumed that CMA and RESERVE pageblocks are created
> only in the init phase. This patch adds DEBUG_VM checks to catch any
> regressions that would violate this invariant.
>
> Cc: Yong-Taek Lee <[email protected]>
> Cc: Bartlomiej Zolnierkiewicz <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Marek Szyprowski <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Michal Nazarewicz <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>

Two issues with this patch.

First:

[ 3446.320082] kernel BUG at mm/page_alloc.c:1197!
[ 3446.320082] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 3446.320082] Dumping ftrace buffer:
[ 3446.320082] (ftrace buffer empty)
[ 3446.320082] Modules linked in:
[ 3446.320082] CPU: 1 PID: 8923 Comm: trinity-c42 Not tainted 3.15.0-rc3-next-20140429-sasha-00015-g7c7e0a7-dirty #427
[ 3446.320082] task: ffff88053e208000 ti: ffff88053e246000 task.ti: ffff88053e246000
[ 3446.320082] RIP: get_page_from_freelist (mm/page_alloc.c:1197 mm/page_alloc.c:1548 mm/page_alloc.c:2036)
[ 3446.320082] RSP: 0018:ffff88053e247778 EFLAGS: 00010002
[ 3446.320082] RAX: 0000000000000003 RBX: ffffea0000f40000 RCX: 0000000000000008
[ 3446.320082] RDX: 0000000000000002 RSI: 0000000000000003 RDI: 00000000000000a0
[ 3446.320082] RBP: ffff88053e247868 R08: 0000000000000007 R09: 0000000000000000
[ 3446.320082] R10: ffff88006ffcef00 R11: 0000000000000000 R12: 0000000000000014
[ 3446.335888] R13: ffffea000115ffe0 R14: ffffea000115ffe0 R15: 0000000000000000
[ 3446.335888] FS: 00007f8c9f059700(0000) GS:ffff88006ec00000(0000) knlGS:0000000000000000
[ 3446.335888] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3446.335888] CR2: 0000000002cbc048 CR3: 000000054cdb4000 CR4: 00000000000006a0
[ 3446.335888] DR0: 00000000006de000 DR1: 00000000006de000 DR2: 0000000000000000
[ 3446.335888] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000602
[ 3446.335888] Stack:
[ 3446.335888] ffff88053e247798 ffff88006eddc0b8 0000000000000016 0000000000000000
[ 3446.335888] ffff88006ffd2068 ffff88006ffdb008 0000000100000000 0000000000000000
[ 3446.335888] ffff88006ffdb000 0000000000000000 0000000000000003 0000000000000001
[ 3446.335888] Call Trace:
[ 3446.335888] __alloc_pages_nodemask (mm/page_alloc.c:2731)
[ 3446.335888] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 3446.335888] alloc_pages_vma (include/linux/mempolicy.h:76 mm/mempolicy.c:1998)
[ 3446.335888] ? shmem_alloc_page (mm/shmem.c:881)
[ 3446.335888] ? kvm_clock_read (arch/x86/include/asm/preempt.h:90 arch/x86/kernel/kvmclock.c:86)
[ 3446.335888] shmem_alloc_page (mm/shmem.c:881)
[ 3446.335888] ? __const_udelay (arch/x86/lib/delay.c:126)
[ 3446.335888] ? __rcu_read_unlock (kernel/rcu/update.c:97)
[ 3446.335888] ? find_get_entry (mm/filemap.c:979)
[ 3446.335888] ? find_get_entry (mm/filemap.c:940)
[ 3446.335888] ? find_lock_entry (mm/filemap.c:1024)
[ 3446.335888] shmem_getpage_gfp (mm/shmem.c:1130)
[ 3446.335888] ? sched_clock_local (kernel/sched/clock.c:214)
[ 3446.335888] ? do_read_fault.isra.42 (mm/memory.c:3523)
[ 3446.335888] shmem_fault (mm/shmem.c:1237)
[ 3446.335888] ? do_read_fault.isra.42 (mm/memory.c:3523)
[ 3446.335888] __do_fault (mm/memory.c:3344)
[ 3446.335888] ? _raw_spin_unlock (arch/x86/include/asm/preempt.h:98 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:183)
[ 3446.335888] do_read_fault.isra.42 (mm/memory.c:3524)
[ 3446.335888] ? get_parent_ip (kernel/sched/core.c:2485)
[ 3446.335888] ? get_parent_ip (kernel/sched/core.c:2485)
[ 3446.335888] __handle_mm_fault (mm/memory.c:3662 mm/memory.c:3823 mm/memory.c:3950)
[ 3446.335888] ? __const_udelay (arch/x86/lib/delay.c:126)
[ 3446.335888] ? __rcu_read_unlock (kernel/rcu/update.c:97)
[ 3446.335888] handle_mm_fault (mm/memory.c:3973)
[ 3446.335888] __get_user_pages (mm/memory.c:1863)
[ 3446.335888] ? preempt_count_sub (kernel/sched/core.c:2541)
[ 3446.335888] __mlock_vma_pages_range (mm/mlock.c:255)
[ 3446.335888] __mm_populate (mm/mlock.c:711)
[ 3446.335888] vm_mmap_pgoff (include/linux/mm.h:1841 mm/util.c:402)
[ 3446.335888] SyS_mmap_pgoff (mm/mmap.c:1378)
[ 3446.335888] ? syscall_trace_enter (include/linux/context_tracking.h:27 arch/x86/kernel/ptrace.c:1461)
[ 3446.335888] ia32_do_call (arch/x86/ia32/ia32entry.S:430)
[ 3446.335888] Code: 00 66 0f 1f 44 00 00 ba 02 00 00 00 31 f6 48 89 c7 e8 c1 c3 ff ff 48 8b 53 10 83 f8 03 74 08 83 f8 04 75 13 0f 1f 00 39 d0 74 0c <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 45 85 ff 75 15 49 8b 55 00
[ 3446.335888] RIP get_page_from_freelist (mm/page_alloc.c:1197 mm/page_alloc.c:1548 mm/page_alloc.c:2036)
[ 3446.335888] RSP <ffff88053e247778>

And second:

[snip]

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2dbaba1..0ee9f8c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -697,6 +697,8 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> page = list_entry(list->prev, struct page, lru);
> /* must delete as __free_one_page list manipulates */
> list_del(&page->lru);
> +
> + VM_BUG_ON(!check_freepage_migratetype(page));
> mt = get_freepage_migratetype(page);
> /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
> __free_one_page(page, zone, 0, mt);
> @@ -1190,6 +1192,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
> struct page *page = __rmqueue(zone, order, migratetype);
> if (unlikely(page == NULL))
> break;
> + VM_BUG_ON(!check_freepage_migratetype(page));
>
> /*
> * Split buddy pages returned by expand() are received here
>

Could the VM_BUG_ON()s be VM_BUG_ON_PAGE() instead?
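That is, something like:

	VM_BUG_ON_PAGE(!check_freepage_migratetype(page), page);

so a failure would also dump the page state, which would make reports like
the one above easier to debug.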


Thanks,
Sasha