2012-10-31 11:48:33

by Wen Congyang

Subject: [Patch v4 0/8] bugfix for memory hotplug

The last version is here:
https://lkml.org/lkml/2012/10/19/56

Note: patches 1-3 are in the -mm tree and I don't touch them. The other
patches, except patch 6, are also in the -mm tree. Patch 6 is untouched.

Changes from v3 to v4:
Patch 4: use dynamically allocated memory instead of a static array.
Patch 5: merge [patchv3 2-3] into a single patch, and update it since we now
use dynamically allocated memory.
Patch 7: merge [patchv3 5-6] into a single patch.
Patch 8: merge [patchv3 9] and its fix into a single patch.

Changes from v2 to v3:
Merge the bug fixes from Ishimatsu into this patchset (patches 1-3).
Patch 3: split it out into a separate patch as it fixes another bug.
Patch 4: new patch; fixes a bad-page state when hot-adding a memory
device after hot-removing it. I forgot to post this patch in v2.
Patch 6: update it according to Dave Hansen's comment.

Changes from v1 to v2:
Patch 1: updated according to KOSAKI's suggestion.

Patch 2: new patch; updates mce_bad_pages when removing memory.

Patch 4: new patch; fixes an NR_FREE_PAGES mismatch. This bug caused
an OOM in my test.

Patch 5: new patch; fixes a new bug: when repeatedly onlining/offlining
pages, the number of free pages keeps decreasing.

Wen Congyang (6):
memory-hotplug: auto offline page_cgroup when onlining memory block
failed
memory-hotplug: fix NR_FREE_PAGES mismatch
numa: convert static memory to dynamically allocated memory for per
node device
clear the memory to store struct page
memory-hotplug: current hwpoison doesn't support memory offline
memory-hotplug: allocate zone's pcp before onlining pages

Yasuaki Ishimatsu (2):
memory hotplug: suppress "Device memoryX does not have a release()
function" warning
suppress "Device nodeX does not have a release() function" warning

arch/powerpc/kernel/sysfs.c | 4 +--
drivers/base/memory.c | 9 ++++++-
drivers/base/node.c | 56 ++++++++++++++++++++++++++++++------------
include/linux/node.h | 2 +-
include/linux/page-isolation.h | 10 +++++---
mm/hugetlb.c | 4 +--
mm/memory-failure.c | 2 +-
mm/memory_hotplug.c | 13 +++++++---
mm/page_alloc.c | 37 +++++++++++++++++++++-------
mm/page_cgroup.c | 3 +++
mm/page_isolation.c | 27 ++++++++++++++------
mm/sparse.c | 25 ++++++++++++++++++-
12 files changed, 144 insertions(+), 48 deletions(-)

--
1.8.0


2012-10-31 11:26:32

by Wen Congyang

Subject: Re: [Patch v4 0/8] bugfix for memory hotplug

At 10/31/2012 07:23 PM, Wen Congyang Wrote:
> The last version is here:
> https://lkml.org/lkml/2012/10/19/56
>
> Note: patch 1-3 are in -mm tree and I don't touch them. The other patches
> except patch6 are also in mm tree. Patch 6 is not touched.
>
> Changes from v3 to v4:
> Patch4: use dynamically allocated memory instead of static array.
> Patch5: merge [patchv3 2-3] into a single patch, and update it as we use
> dynamically allocated memory
> Patch7: merge [patchv3 5-6] into a single patch
> Patch8: merge [patchv3 9] and its fix into a patch

Note:
The patch from Michal Hocko <[email protected]> is not merged into patch 8.

Thanks
Wen Congyang

>
> Changes from v2 to v3:
> Merge the bug fix from ishimatsu to this patchset(Patch 1-3)
> Patch 3: split it from patch as it fixes another bug.
> Patch 4: new patch, and fix bad-page state when hotadding a memory
> device after hotremoving it. I forgot to post this patch in v2.
> Patch 6: update it according to Dave Hansen's comment.
>
> Changes from v1 to v2:
> Patch 1: updated according to kosaki's suggestion
>
> Patch 2: new patch, and update mce_bad_pages when removing memory.
>
> Patch 4: new patch, and fix a NR_FREE_PAGES mismatch, and this bug
> cause oom in my test.
>
> Patch 5: new patch, and fix a new bug. When repeating to online/offline
> pages, the free pages will continue to decrease.
>
> Wen Congyang (6):
> memory-hotplug: auto offline page_cgroup when onlining memory block
> failed
> memory-hotplug: fix NR_FREE_PAGES mismatch
> numa: convert static memory to dynamically allocated memory for per
> node device
> clear the memory to store struct page
> memory-hotplug: current hwpoison doesn't support memory offline
> memory-hotplug: allocate zone's pcp before onlining pages
>
> Yasuaki Ishimatsu (2):
> memory hotplug: suppress "Device memoryX does not have a release()
> function" warning
> suppress "Device nodeX does not have a release() function" warning
>
> arch/powerpc/kernel/sysfs.c | 4 +--
> drivers/base/memory.c | 9 ++++++-
> drivers/base/node.c | 56 ++++++++++++++++++++++++++++++------------
> include/linux/node.h | 2 +-
> include/linux/page-isolation.h | 10 +++++---
> mm/hugetlb.c | 4 +--
> mm/memory-failure.c | 2 +-
> mm/memory_hotplug.c | 13 +++++++---
> mm/page_alloc.c | 37 +++++++++++++++++++++-------
> mm/page_cgroup.c | 3 +++
> mm/page_isolation.c | 27 ++++++++++++++------
> mm/sparse.c | 25 ++++++++++++++++++-
> 12 files changed, 144 insertions(+), 48 deletions(-)
>

2012-10-31 11:48:38

by Wen Congyang

Subject: [Patch v4 5/8] suppress "Device nodeX does not have a release() function" warning

From: Yasuaki Ishimatsu <[email protected]>

When calling unregister_node(), device_release() shows the following
message:

"Device 'node2' does not have a release() function, it is broken and must
be fixed."

The reason is that the node's device struct does not have a release()
function.

So the patch registers node_device_release() as the device's release()
function to suppress the warning message. Additionally, the patch adds a
memset() to register_node() to initialize the node struct, because the node
struct is part of the node_devices[] array and cannot be freed by
node_device_release(); if the system reuses the node struct, it would contain
garbage.

CC: David Rientjes <[email protected]>
CC: Jiang Liu <[email protected]>
Cc: Minchan Kim <[email protected]>
CC: Andrew Morton <[email protected]>
CC: KOSAKI Motohiro <[email protected]>
Signed-off-by: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
drivers/base/node.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 28216ce..4282e82 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -252,6 +252,24 @@ static inline void hugetlb_register_node(struct node *node) {}
static inline void hugetlb_unregister_node(struct node *node) {}
#endif

+static void node_device_release(struct device *dev)
+{
+ struct node *node = to_node(dev);
+
+#if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS)
+ /*
+ * We schedule the work only when a memory section is
+ * onlined/offlined on this node. When we come here,
+ * all the memory on this node has been offlined,
+ * so we won't enqueue new work to this work.
+ *
+ * The work is using node->node_work, so we should
+ * flush work before freeing the memory.
+ */
+ flush_work(&node->node_work);
+#endif
+ kfree(node);
+}

/*
* register_node - Setup a sysfs device for a node.
@@ -265,6 +283,7 @@ int register_node(struct node *node, int num, struct node *parent)

node->dev.id = num;
node->dev.bus = &node_subsys;
+ node->dev.release = node_device_release;
error = device_register(&node->dev);

if (!error){
@@ -586,7 +605,6 @@ int register_one_node(int nid)
void unregister_one_node(int nid)
{
unregister_node(node_devices[nid]);
- kfree(node_devices[nid]);
node_devices[nid] = NULL;
}

--
1.8.0

2012-10-31 11:48:40

by Wen Congyang

Subject: [Patch v4 6/8] clear the memory to store struct page

If sparse memory vmemmap is enabled, we can't free the memory used to store
struct page when a memory device is hot-removed, because that memory may hold
struct pages which manage memory that doesn't belong to this memory device.
When we hot-add this memory device again, we will reuse this memory to store
struct page, and the struct pages may contain obsolete information, so we get
a bad-page state:

[ 59.611278] init_memory_mapping: [mem 0x80000000-0x9fffffff]
[ 59.637836] Built 2 zonelists in Node order, mobility grouping on. Total pages: 547617
[ 59.638739] Policy zone: Normal
[ 59.650840] BUG: Bad page state in process bash pfn:9b6dc
[ 59.651124] page:ffffea0002200020 count:0 mapcount:0 mapping: (null) index:0xfdfdfdfdfdfdfdfd
[ 59.651494] page flags: 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock)
[ 59.653604] Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk libata virtio_pci virtio_ring virtio scsi_mod
[ 59.656998] Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12
[ 59.657172] Call Trace:
[ 59.657275] [<ffffffff810e9b30>] ? bad_page+0xb0/0x100
[ 59.657434] [<ffffffff810ea4c3>] ? free_pages_prepare+0xb3/0x100
[ 59.657610] [<ffffffff810ea668>] ? free_hot_cold_page+0x48/0x1a0
[ 59.657787] [<ffffffff8112cc08>] ? online_pages_range+0x68/0xa0
[ 59.657961] [<ffffffff8112cba0>] ? __online_page_increment_counters+0x10/0x10
[ 59.658162] [<ffffffff81045561>] ? walk_system_ram_range+0x101/0x110
[ 59.658346] [<ffffffff814c4f95>] ? online_pages+0x1a5/0x2b0
[ 59.658515] [<ffffffff8135663d>] ? __memory_block_change_state+0x20d/0x270
[ 59.658710] [<ffffffff81356756>] ? store_mem_state+0xb6/0xf0
[ 59.658878] [<ffffffff8119e482>] ? sysfs_write_file+0xd2/0x160
[ 59.659052] [<ffffffff8113769a>] ? vfs_write+0xaa/0x160
[ 59.659212] [<ffffffff81137977>] ? sys_write+0x47/0x90
[ 59.659371] [<ffffffff814e2f25>] ? async_page_fault+0x25/0x30
[ 59.659543] [<ffffffff814ea239>] ? system_call_fastpath+0x16/0x1b
[ 59.659720] Disabling lock debugging due to kernel taint

This patch clears the memory used to store struct page to avoid this
unexpected error.
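
As a toy illustration only (user-space C, not kernel code; the struct and
values are made up), reused memory keeps whatever its previous owner left in
it unless it is cleared explicitly, which is what the added memset() does for
the reused memmap:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct fake_page {
    unsigned long flags;    /* stands in for struct page state */
};

int main(void)
{
    struct fake_page *memmap = malloc(sizeof(*memmap));

    if (!memmap)
        return 1;

    /* the previous "section" left obsolete state behind */
    memmap->flags = 0xfdfdfdfdUL;

    /* what the patch does when the section is added again */
    memset(memmap, 0, sizeof(*memmap));

    printf("flags after clearing: %#lx\n", memmap->flags);    /* 0x0 */
    free(memmap);
    return 0;
}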

CC: David Rientjes <[email protected]>
CC: Jiang Liu <[email protected]>
Cc: Minchan Kim <[email protected]>
CC: Andrew Morton <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
CC: Yasuaki Ishimatsu <[email protected]>
Reported-by: Vasilis Liaskovitis <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
mm/sparse.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index fac95f2..0021265 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -638,7 +638,6 @@ static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
got_map_page:
ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
got_map_ptr:
- memset(ret, 0, memmap_size);

return ret;
}
@@ -760,6 +759,8 @@ int __meminit sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
goto out;
}

+ memset(memmap, 0, sizeof(struct page) * nr_pages);
+
ms->section_mem_map |= SECTION_MARKED_PRESENT;

ret = sparse_init_one_section(ms, section_nr, memmap, usemap);
--
1.8.0

2012-10-31 11:48:44

by Wen Congyang

Subject: [Patch v4 3/8] memory-hotplug: fix NR_FREE_PAGES mismatch

NR_FREE_PAGES is wrong after offlining pages. We currently add/dec
NR_FREE_PAGES like this:

1. move all pages in the buddy system to MIGRATE_ISOLATE, and dec NR_FREE_PAGES

2. don't add NR_FREE_PAGES when a page is freed and its migratetype is
MIGRATE_ISOLATE

3. dec NR_FREE_PAGES when offlining isolated pages.

4. add NR_FREE_PAGES when undoing page isolation.

When we reach step 3, all pages are on the MIGRATE_ISOLATE list and
NR_FREE_PAGES is already correct. When we reach step 4, the pages are no
longer in the buddy system, so we don't change NR_FREE_PAGES in that step,
but we did change it in step 3. So NR_FREE_PAGES is wrong after offlining
pages, and there is no need to change NR_FREE_PAGES in step 3.

This patch also fixes a problem in step 2: if the migratetype is
MIGRATE_ISOLATE, we should not add to NR_FREE_PAGES when we remove pages
from the pcp lists.
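
As a toy illustration of the double accounting (user-space C, not kernel
code; the 512-page count is just an assumed section size):

#include <stdio.h>

int main(void)
{
    long nr_free = 512;    /* pages of the section sitting in the buddy system  */

    nr_free -= 512;        /* step 1: move to MIGRATE_ISOLATE, dec NR_FREE_PAGES */
                           /* step 2: freed isolated pages are not added back    */
    nr_free -= 512;        /* step 3: offline isolated pages, dec again          */
                           /* step 4: pages left the buddy system, nothing added */

    /* prints -512: the counter ends up 512 pages too low */
    printf("NR_FREE_PAGES delta after offline: %ld\n", nr_free);
    return 0;
}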

Signed-off-by: Wen Congyang <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
mm/page_alloc.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5b74de6..a7cd2d1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -667,11 +667,13 @@ static void free_pcppages_bulk(struct zone *zone, int count,
/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
__free_one_page(page, zone, 0, mt);
trace_mm_page_pcpu_drain(page, 0, mt);
- if (is_migrate_cma(mt))
- __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
+ if (likely(mt != MIGRATE_ISOLATE)) {
+ __mod_zone_page_state(zone, NR_FREE_PAGES, 1);
+ if (is_migrate_cma(mt))
+ __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
+ }
} while (--to_free && --batch_free && !list_empty(list));
}
- __mod_zone_page_state(zone, NR_FREE_PAGES, count);
spin_unlock(&zone->lock);
}

@@ -5987,8 +5989,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
list_del(&page->lru);
rmv_page_order(page);
zone->free_area[order].nr_free--;
- __mod_zone_page_state(zone, NR_FREE_PAGES,
- - (1UL << order));
for (i = 0; i < (1 << order); i++)
SetPageReserved((page+i));
pfn += (1 << order);
--
1.8.0

2012-10-31 11:49:32

by Wen Congyang

Subject: [Patch v4 7/8] memory-hotplug: current hwpoison doesn't support memory offline

The hwpoisoned flag may be set when we offline a page via the sysfs
interface /sys/devices/system/memory/soft_offline_page or
/sys/devices/system/memory/hard_offline_page.

If a page is hwpoisoned, we may meet the following problems when
offlining/removing the memory:
1. The pages can't be offlined.
   If a page is hwpoisoned, it is not freed when the memory is onlined
   and is not on any free list, so we can't offline such pages again.
   We should therefore skip such pages when offlining pages.

2. mce_bad_pages is wrong after removing memory.
   When we hot-remove a memory device, we free the memory used to store
   struct page. If a page is hwpoisoned, we should decrease mce_bad_pages.
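
For reference, a minimal user-space sketch of the soft-offline interface
mentioned above (illustrative only; it assumes the sysfs file takes a
physical address, a 4 KiB page size, and the pfn is made up):

#include <stdio.h>

int main(void)
{
    unsigned long pfn = 0x9b6dc;    /* made-up pfn to soft-offline */
    FILE *f = fopen("/sys/devices/system/memory/soft_offline_page", "w");

    if (!f)
        return 1;
    /* write the physical address of the page, i.e. pfn << PAGE_SHIFT */
    fprintf(f, "%#lx\n", pfn << 12);
    fclose(f);
    return 0;
}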

Cc: David Rientjes <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
include/linux/page-isolation.h | 10 ++++++----
mm/memory-failure.c | 2 +-
mm/memory_hotplug.c | 5 +++--
mm/page_alloc.c | 27 +++++++++++++++++++++++----
mm/page_isolation.c | 27 ++++++++++++++++++++-------
mm/sparse.c | 22 ++++++++++++++++++++++
6 files changed, 75 insertions(+), 18 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 76a9539..a92061e 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -2,7 +2,8 @@
#define __LINUX_PAGEISOLATION_H


-bool has_unmovable_pages(struct zone *zone, struct page *page, int count);
+bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
+ bool skip_hwpoisoned_pages);
void set_pageblock_migratetype(struct page *page, int migratetype);
int move_freepages_block(struct zone *zone, struct page *page,
int migratetype);
@@ -21,7 +22,7 @@ int move_freepages(struct zone *zone,
*/
int
start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
- unsigned migratetype);
+ unsigned migratetype, bool skip_hwpoisoned_pages);

/*
* Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
@@ -34,12 +35,13 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
/*
* Test all pages in [start_pfn, end_pfn) are isolated or not.
*/
-int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
+ bool skip_hwpoisoned_pages);

/*
* Internal functions. Changes pageblock's migrate type.
*/
-int set_migratetype_isolate(struct page *page);
+int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages);
void unset_migratetype_isolate(struct page *page, unsigned migratetype);
struct page *alloc_migrate_target(struct page *page, unsigned long private,
int **resultp);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 6c5899b..1abffee 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1385,7 +1385,7 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
* Isolate the page, so that it doesn't get reallocated if it
* was free.
*/
- set_migratetype_isolate(p);
+ set_migratetype_isolate(p, true);
/*
* When the target page is a free hugepage, just remove it
* from free hugepage list.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 56b758a..72f4fef 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -854,7 +854,7 @@ check_pages_isolated_cb(unsigned long start_pfn, unsigned long nr_pages,
{
int ret;
long offlined = *(long *)data;
- ret = test_pages_isolated(start_pfn, start_pfn + nr_pages);
+ ret = test_pages_isolated(start_pfn, start_pfn + nr_pages, true);
offlined = nr_pages;
if (!ret)
*(long *)data += offlined;
@@ -901,7 +901,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
nr_pages = end_pfn - start_pfn;

/* set above range as isolated */
- ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+ ret = start_isolate_page_range(start_pfn, end_pfn,
+ MIGRATE_MOVABLE, true);
if (ret)
goto out;

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a7cd2d1..027afd0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5577,7 +5577,8 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags,
* MIGRATE_MOVABLE block might include unmovable pages. It means you can't
* expect this function should be exact.
*/
-bool has_unmovable_pages(struct zone *zone, struct page *page, int count)
+bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
+ bool skip_hwpoisoned_pages)
{
unsigned long pfn, iter, found;
int mt;
@@ -5612,6 +5613,13 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count)
continue;
}

+ /*
+ * The HWPoisoned page may be not in buddy system, and
+ * page_count() is not 0.
+ */
+ if (skip_hwpoisoned_pages && PageHWPoison(page))
+ continue;
+
if (!PageLRU(page))
found++;
/*
@@ -5654,7 +5662,7 @@ bool is_pageblock_removable_nolock(struct page *page)
zone->zone_start_pfn + zone->spanned_pages <= pfn)
return false;

- return !has_unmovable_pages(zone, page, 0);
+ return !has_unmovable_pages(zone, page, 0, true);
}

#ifdef CONFIG_CMA
@@ -5825,7 +5833,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
*/

ret = start_isolate_page_range(pfn_max_align_down(start),
- pfn_max_align_up(end), migratetype);
+ pfn_max_align_up(end), migratetype,
+ false);
if (ret)
return ret;

@@ -5864,7 +5873,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
}

/* Make sure the range is really isolated. */
- if (test_pages_isolated(outer_start, end)) {
+ if (test_pages_isolated(outer_start, end, false)) {
pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n",
outer_start, end);
ret = -EBUSY;
@@ -5979,6 +5988,16 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
continue;
}
page = pfn_to_page(pfn);
+ /*
+ * The HWPoisoned page may be not in buddy system, and
+ * page_count() is not 0.
+ */
+ if (unlikely(!PageBuddy(page) && PageHWPoison(page))) {
+ pfn++;
+ SetPageReserved(page);
+ continue;
+ }
+
BUG_ON(page_count(page));
BUG_ON(!PageBuddy(page));
order = page_order(page);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f2f5b48..9d2264e 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -30,7 +30,7 @@ static void restore_pageblock_isolate(struct page *page, int migratetype)
zone->nr_pageblock_isolate--;
}

-int set_migratetype_isolate(struct page *page)
+int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
{
struct zone *zone;
unsigned long flags, pfn;
@@ -66,7 +66,8 @@ int set_migratetype_isolate(struct page *page)
* FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
* We just check MOVABLE pages.
*/
- if (!has_unmovable_pages(zone, page, arg.pages_found))
+ if (!has_unmovable_pages(zone, page, arg.pages_found,
+ skip_hwpoisoned_pages))
ret = 0;

/*
@@ -134,7 +135,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
* Returns 0 on success and -EBUSY if any part of range cannot be isolated.
*/
int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
- unsigned migratetype)
+ unsigned migratetype, bool skip_hwpoisoned_pages)
{
unsigned long pfn;
unsigned long undo_pfn;
@@ -147,7 +148,8 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
pfn < end_pfn;
pfn += pageblock_nr_pages) {
page = __first_valid_page(pfn, pageblock_nr_pages);
- if (page && set_migratetype_isolate(page)) {
+ if (page &&
+ set_migratetype_isolate(page, skip_hwpoisoned_pages)) {
undo_pfn = pfn;
goto undo;
}
@@ -190,7 +192,8 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
* Returns 1 if all pages in the range are isolated.
*/
static int
-__test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
+__test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
+ bool skip_hwpoisoned_pages)
{
struct page *page;

@@ -220,6 +223,14 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
else if (page_count(page) == 0 &&
get_freepage_migratetype(page) == MIGRATE_ISOLATE)
pfn += 1;
+ else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
+ /*
+ * The HWPoisoned page may be not in buddy
+ * system, and page_count() is not 0.
+ */
+ pfn++;
+ continue;
+ }
else
break;
}
@@ -228,7 +239,8 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
return 1;
}

-int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
+ bool skip_hwpoisoned_pages)
{
unsigned long pfn, flags;
struct page *page;
@@ -251,7 +263,8 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
/* Check all pages are free or Marked as ISOLATED */
zone = page_zone(page);
spin_lock_irqsave(&zone->lock, flags);
- ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn);
+ ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
+ skip_hwpoisoned_pages);
spin_unlock_irqrestore(&zone->lock, flags);
return ret ? 0 : -EBUSY;
}
diff --git a/mm/sparse.c b/mm/sparse.c
index 0021265..b2d37c6 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -774,6 +774,27 @@ out:
return ret;
}

+#ifdef CONFIG_MEMORY_FAILURE
+static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
+{
+ int i;
+
+ if (!memmap)
+ return;
+
+ for (i = 0; i < PAGES_PER_SECTION; i++) {
+ if (PageHWPoison(&memmap[i])) {
+ atomic_long_sub(1, &mce_bad_pages);
+ ClearPageHWPoison(&memmap[i]);
+ }
+ }
+}
+#else
+static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
+{
+}
+#endif
+
void sparse_remove_one_section(struct zone *zone, struct mem_section *ms)
{
struct page *memmap = NULL;
@@ -787,6 +808,7 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms)
ms->pageblock_flags = NULL;
}

+ clear_hwpoisoned_pages(memmap, PAGES_PER_SECTION);
free_section_usemap(memmap, usemap);
}
#endif
--
1.8.0

2012-10-31 11:48:30

by Wen Congyang

Subject: [Patch v4 1/8] memory hotplug: suppress "Device memoryX does not have a release() function" warning

From: Yasuaki Ishimatsu <[email protected]>

When calling remove_memory_block(), device_release() shows the following
message:

"Device 'memory528' does not have a release() function, it is broken and
must be fixed."

The reason is that the memory_block's device struct does not have a
release() function.

So the patch registers memory_block_release() as the device's release()
function to suppress the warning message. Additionally, the patch moves
kfree(mem) into the release function, since the release function is intended
as the place where a memory_block struct is freed.

Signed-off-by: Yasuaki Ishimatsu <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Minchan Kim <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
Cc: Wen Congyang <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
drivers/base/memory.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 86c8821..7eb1211 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -70,6 +70,13 @@ void unregister_memory_isolate_notifier(struct notifier_block *nb)
}
EXPORT_SYMBOL(unregister_memory_isolate_notifier);

+static void memory_block_release(struct device *dev)
+{
+ struct memory_block *mem = container_of(dev, struct memory_block, dev);
+
+ kfree(mem);
+}
+
/*
* register_memory - Setup a sysfs device for a memory block
*/
@@ -80,6 +87,7 @@ int register_memory(struct memory_block *memory)

memory->dev.bus = &memory_subsys;
memory->dev.id = memory->start_section_nr / sections_per_block;
+ memory->dev.release = memory_block_release;

error = device_register(&memory->dev);
return error;
@@ -635,7 +643,6 @@ int remove_memory_block(unsigned long node_id, struct mem_section *section,
mem_remove_simple_file(mem, phys_device);
mem_remove_simple_file(mem, removable);
unregister_memory(mem);
- kfree(mem);
} else
kobject_put(&mem->dev.kobj);

--
1.8.0

2012-10-31 11:49:58

by Wen Congyang

Subject: [Patch v4 8/8] memory-hotplug: allocate zone's pcp before onlining pages

We use __free_page() to put pages into the buddy system when onlining pages.
__free_page() accumulates the NR_FREE_PAGES update in the zone's
pcp.vm_stat_diff, so we should allocate the zone's pcp before onlining pages;
otherwise we lose some free pages.
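
A toy illustration of the ordering problem (user-space C, not kernel code;
it assumes, purely for illustration, that setting up the zone's pcp discards
whatever diff was accumulated before it existed):

#include <stdio.h>

static long vm_stat_diff;     /* stands in for the pcp's per-cpu counter diff */
static long nr_free_pages;    /* stands in for the zone-wide counter          */

static void free_one_page(void)
{
    vm_stat_diff++;           /* what freeing a page does while onlining      */
}

static void setup_zone_pcp(void)
{
    vm_stat_diff = 0;         /* assumed: allocating the pcp drops the diff   */
}

static void fold_diff(void)
{
    nr_free_pages += vm_stat_diff;
    vm_stat_diff = 0;
}

int main(void)
{
    free_one_page();          /* page onlined before the pcp is set up ...    */
    setup_zone_pcp();         /* ... its accounting is lost here              */
    fold_diff();
    printf("free pages accounted: %ld (expected 1)\n", nr_free_pages);
    return 0;
}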

Cc: David Rientjes <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
mm/memory_hotplug.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 72f4fef..63ea7df 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -505,12 +505,16 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
* So, zonelist must be updated after online.
*/
mutex_lock(&zonelists_mutex);
- if (!populated_zone(zone))
+ if (!populated_zone(zone)) {
need_zonelists_rebuild = 1;
+ build_all_zonelists(NULL, zone);
+ }

ret = walk_system_ram_range(pfn, nr_pages, &onlined_pages,
online_pages_range);
if (ret) {
+ if (need_zonelists_rebuild)
+ zone_pcp_reset(zone);
mutex_unlock(&zonelists_mutex);
printk(KERN_DEBUG "online_pages [mem %#010llx-%#010llx] failed\n",
(unsigned long long) pfn << PAGE_SHIFT,
@@ -526,7 +530,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
if (onlined_pages) {
node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
if (need_zonelists_rebuild)
- build_all_zonelists(NULL, zone);
+ build_all_zonelists(NULL, NULL);
else
zone_pcp_update(zone);
}
--
1.8.0

2012-10-31 11:50:22

by Wen Congyang

Subject: [Patch v4 4/8] numa: convert static memory to dynamically allocated memory for per node device

We use a static array to store struct node. In many cases we don't have
many nodes, so some of that memory is wasted. Convert it to dynamically
allocated memory, allocated per node device.
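
A minimal user-space sketch of the conversion (illustrative only: the names
and padding size are made up, and calloc()/free() stand in for
kzalloc()/kfree()):

#include <stdlib.h>

#define MAX_FAKE_NODES 64

struct fake_node {
    int id;
    char pad[4096];           /* pretend struct node is fairly large */
};

/* before: struct fake_node fake_nodes[MAX_FAKE_NODES]; always allocated */
static struct fake_node *fake_nodes[MAX_FAKE_NODES];  /* after: pointers only */

static int register_fake_node(int nid)
{
    fake_nodes[nid] = calloc(1, sizeof(*fake_nodes[nid]));
    if (!fake_nodes[nid])
        return -1;            /* -ENOMEM in the real code */
    fake_nodes[nid]->id = nid;
    return 0;
}

static void unregister_fake_node(int nid)
{
    free(fake_nodes[nid]);
    fake_nodes[nid] = NULL;
}

int main(void)
{
    if (register_fake_node(0))
        return 1;
    unregister_fake_node(0);
    return 0;
}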

CC: David Rientjes <[email protected]>
CC: Jiang Liu <[email protected]>
Cc: Minchan Kim <[email protected]>
CC: Andrew Morton <[email protected]>
CC: KOSAKI Motohiro <[email protected]>
CC: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
arch/powerpc/kernel/sysfs.c | 4 ++--
drivers/base/node.c | 38 ++++++++++++++++++++++----------------
include/linux/node.h | 2 +-
mm/hugetlb.c | 4 ++--
4 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index cf357a0..3ce1f86 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -607,7 +607,7 @@ static void register_nodes(void)

int sysfs_add_device_to_node(struct device *dev, int nid)
{
- struct node *node = &node_devices[nid];
+ struct node *node = node_devices[nid];
return sysfs_create_link(&node->dev.kobj, &dev->kobj,
kobject_name(&dev->kobj));
}
@@ -615,7 +615,7 @@ EXPORT_SYMBOL_GPL(sysfs_add_device_to_node);

void sysfs_remove_device_from_node(struct device *dev, int nid)
{
- struct node *node = &node_devices[nid];
+ struct node *node = node_devices[nid];
sysfs_remove_link(&node->dev.kobj, kobject_name(&dev->kobj));
}
EXPORT_SYMBOL_GPL(sysfs_remove_device_from_node);
diff --git a/drivers/base/node.c b/drivers/base/node.c
index af1a177..28216ce 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -306,7 +306,7 @@ void unregister_node(struct node *node)
device_unregister(&node->dev);
}

-struct node node_devices[MAX_NUMNODES];
+struct node *node_devices[MAX_NUMNODES];

/*
* register cpu under node
@@ -323,15 +323,15 @@ int register_cpu_under_node(unsigned int cpu, unsigned int nid)
if (!obj)
return 0;

- ret = sysfs_create_link(&node_devices[nid].dev.kobj,
+ ret = sysfs_create_link(&node_devices[nid]->dev.kobj,
&obj->kobj,
kobject_name(&obj->kobj));
if (ret)
return ret;

return sysfs_create_link(&obj->kobj,
- &node_devices[nid].dev.kobj,
- kobject_name(&node_devices[nid].dev.kobj));
+ &node_devices[nid]->dev.kobj,
+ kobject_name(&node_devices[nid]->dev.kobj));
}

int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
@@ -345,10 +345,10 @@ int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
if (!obj)
return 0;

- sysfs_remove_link(&node_devices[nid].dev.kobj,
+ sysfs_remove_link(&node_devices[nid]->dev.kobj,
kobject_name(&obj->kobj));
sysfs_remove_link(&obj->kobj,
- kobject_name(&node_devices[nid].dev.kobj));
+ kobject_name(&node_devices[nid]->dev.kobj));

return 0;
}
@@ -390,15 +390,15 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
continue;
if (page_nid != nid)
continue;
- ret = sysfs_create_link_nowarn(&node_devices[nid].dev.kobj,
+ ret = sysfs_create_link_nowarn(&node_devices[nid]->dev.kobj,
&mem_blk->dev.kobj,
kobject_name(&mem_blk->dev.kobj));
if (ret)
return ret;

return sysfs_create_link_nowarn(&mem_blk->dev.kobj,
- &node_devices[nid].dev.kobj,
- kobject_name(&node_devices[nid].dev.kobj));
+ &node_devices[nid]->dev.kobj,
+ kobject_name(&node_devices[nid]->dev.kobj));
}
/* mem section does not span the specified node */
return 0;
@@ -431,10 +431,10 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
continue;
if (node_test_and_set(nid, *unlinked_nodes))
continue;
- sysfs_remove_link(&node_devices[nid].dev.kobj,
+ sysfs_remove_link(&node_devices[nid]->dev.kobj,
kobject_name(&mem_blk->dev.kobj));
sysfs_remove_link(&mem_blk->dev.kobj,
- kobject_name(&node_devices[nid].dev.kobj));
+ kobject_name(&node_devices[nid]->dev.kobj));
}
NODEMASK_FREE(unlinked_nodes);
return 0;
@@ -500,7 +500,7 @@ static void node_hugetlb_work(struct work_struct *work)

static void init_node_hugetlb_work(int nid)
{
- INIT_WORK(&node_devices[nid].node_work, node_hugetlb_work);
+ INIT_WORK(&node_devices[nid]->node_work, node_hugetlb_work);
}

static int node_memory_callback(struct notifier_block *self,
@@ -517,7 +517,7 @@ static int node_memory_callback(struct notifier_block *self,
* when transitioning to/from memoryless state.
*/
if (nid != NUMA_NO_NODE)
- schedule_work(&node_devices[nid].node_work);
+ schedule_work(&node_devices[nid]->node_work);
break;

case MEM_GOING_ONLINE:
@@ -558,9 +558,13 @@ int register_one_node(int nid)
struct node *parent = NULL;

if (p_node != nid)
- parent = &node_devices[p_node];
+ parent = node_devices[p_node];

- error = register_node(&node_devices[nid], nid, parent);
+ node_devices[nid] = kzalloc(sizeof(struct node), GFP_KERNEL);
+ if (!node_devices[nid])
+ return -ENOMEM;
+
+ error = register_node(node_devices[nid], nid, parent);

/* link cpu under this node */
for_each_present_cpu(cpu) {
@@ -581,7 +585,9 @@ int register_one_node(int nid)

void unregister_one_node(int nid)
{
- unregister_node(&node_devices[nid]);
+ unregister_node(node_devices[nid]);
+ kfree(node_devices[nid]);
+ node_devices[nid] = NULL;
}

/*
diff --git a/include/linux/node.h b/include/linux/node.h
index 624e53c..10316f1 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -27,7 +27,7 @@ struct node {
};

struct memory_block;
-extern struct node node_devices[];
+extern struct node *node_devices[];
typedef void (*node_registration_func_t)(struct node *);

extern int register_node(struct node *, int, struct node *);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 59a0059..1ef2cd4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1800,7 +1800,7 @@ static void hugetlb_unregister_all_nodes(void)
* remove hstate attributes from any nodes that have them.
*/
for (nid = 0; nid < nr_node_ids; nid++)
- hugetlb_unregister_node(&node_devices[nid]);
+ hugetlb_unregister_node(node_devices[nid]);
}

/*
@@ -1845,7 +1845,7 @@ static void hugetlb_register_all_nodes(void)
int nid;

for_each_node_state(nid, N_HIGH_MEMORY) {
- struct node *node = &node_devices[nid];
+ struct node *node = node_devices[nid];
if (node->dev.id == nid)
hugetlb_register_node(node);
}
--
1.8.0

2012-10-31 12:00:08

by Wen Congyang

Subject: [Patch v4 2/8] memory-hotplug: auto offline page_cgroup when onlining memory block failed

When a memory block is onlined, we try to allocate memory on that node to
store page_cgroup. If onlining the memory block fails, we don't offline the
page_cgroup, and we have no chance to offline it unless the memory block is
onlined successfully again. As a result, we can't hot-remove the memory
device on that node, because some memory is still used to store page_cgroup.
If onlining the memory block fails, there is no need to store page_cgroup
for this memory, so automatically offline the page_cgroup when onlining a
memory block fails.

Signed-off-by: Wen Congyang <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Minchan Kim <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
mm/page_cgroup.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 5ddad0c..44db00e 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -251,6 +251,9 @@ static int __meminit page_cgroup_callback(struct notifier_block *self,
mn->nr_pages, mn->status_change_nid);
break;
case MEM_CANCEL_ONLINE:
+ offline_page_cgroup(mn->start_pfn,
+ mn->nr_pages, mn->status_change_nid);
+ break;
case MEM_GOING_OFFLINE:
break;
case MEM_ONLINE:
--
1.8.0

2012-10-31 13:41:41

by Jianguo Wu

Subject: Re: [Patch v4 3/8] memory-hotplug: fix NR_FREE_PAGES mismatch

On 2012/10/31 19:23, Wen Congyang wrote:
> NR_FREE_PAGES will be wrong after offlining pages. We add/dec
> NR_FREE_PAGES like this now:
>
> 1. move all pages in buddy system to MIGRATE_ISOLATE, and dec NR_FREE_PAGES
>
> 2. don't add NR_FREE_PAGES when it is freed and the migratetype is
> MIGRATE_ISOLATE
>
> 3. dec NR_FREE_PAGES when offlining isolated pages.
>
> 4. add NR_FREE_PAGES when undoing isolate pages.
>
> When we come to step 3, all pages are in MIGRATE_ISOLATE list, and
> NR_FREE_PAGES are right. When we come to step4, all pages are not in
> buddy system, so we don't change NR_FREE_PAGES in this step, but we change
> NR_FREE_PAGES in step3. So NR_FREE_PAGES is wrong after offlining pages.
> So there is no need to change NR_FREE_PAGES in step3.
>
> This patch also fixs a problem in step2: if the migratetype is
> MIGRATE_ISOLATE, we should not add NR_FRR_PAGES when we remove pages from
> pcppages.
>
> Signed-off-by: Wen Congyang <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: Jiang Liu <[email protected]>
> Cc: Len Brown <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Yasuaki Ishimatsu <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
> mm/page_alloc.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5b74de6..a7cd2d1 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -667,11 +667,13 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
> __free_one_page(page, zone, 0, mt);
> trace_mm_page_pcpu_drain(page, 0, mt);
> - if (is_migrate_cma(mt))
> - __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
> + if (likely(mt != MIGRATE_ISOLATE)) {

Hi Congyang,
I think mt != MIGRATE_ISOLATE is always true here: a page from the pcp has
a migratetype < MIGRATE_PCPTYPES. When we isolate a page, we change the
pageblock's migratetype to MIGRATE_ISOLATE, but set_freepage_migratetype()
isn't called. Maybe we can use mt = get_pageblock_migratetype() here?

Thanks,
Jianguo Wu.

> + __mod_zone_page_state(zone, NR_FREE_PAGES, 1);
> + if (is_migrate_cma(mt))
> + __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
> + }
> } while (--to_free && --batch_free && !list_empty(list));
> }
> - __mod_zone_page_state(zone, NR_FREE_PAGES, count);
> spin_unlock(&zone->lock);
> }
>
> @@ -5987,8 +5989,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
> list_del(&page->lru);
> rmv_page_order(page);
> zone->free_area[order].nr_free--;
> - __mod_zone_page_state(zone, NR_FREE_PAGES,
> - - (1UL << order));
> for (i = 0; i < (1 << order); i++)
> SetPageReserved((page+i));
> pfn += (1 << order);
>

2012-11-01 02:49:20

by Wen Congyang

Subject: [PATCH] memory-hotplug: fix NR_FREE_PAGES mismatch's fix

When a page is freed and put onto a pcp list, get_freepage_migratetype()
doesn't return MIGRATE_ISOLATE even if the pageblock is isolated. So we
should use get_pageblock_migratetype() instead of mt to check whether it
is isolated.

Cc: David Rientjes <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Jianguo Wu <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>

---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 027afd0..e9c19d2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -667,7 +667,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
__free_one_page(page, zone, 0, mt);
trace_mm_page_pcpu_drain(page, 0, mt);
- if (likely(mt != MIGRATE_ISOLATE)) {
+ if (likely(mt != get_pageblock_migratetype(page))) {
__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
if (is_migrate_cma(mt))
__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
--
1.8.0

2012-11-01 02:54:23

by Wen Congyang

Subject: Re: [Patch v4 3/8] memory-hotplug: fix NR_FREE_PAGES mismatch

At 10/31/2012 09:41 PM, Jianguo Wu Wrote:
> On 2012/10/31 19:23, Wen Congyang wrote:
>> NR_FREE_PAGES will be wrong after offlining pages. We add/dec
>> NR_FREE_PAGES like this now:
>>
>> 1. move all pages in buddy system to MIGRATE_ISOLATE, and dec NR_FREE_PAGES
>>
>> 2. don't add NR_FREE_PAGES when it is freed and the migratetype is
>> MIGRATE_ISOLATE
>>
>> 3. dec NR_FREE_PAGES when offlining isolated pages.
>>
>> 4. add NR_FREE_PAGES when undoing isolate pages.
>>
>> When we come to step 3, all pages are in MIGRATE_ISOLATE list, and
>> NR_FREE_PAGES are right. When we come to step4, all pages are not in
>> buddy system, so we don't change NR_FREE_PAGES in this step, but we change
>> NR_FREE_PAGES in step3. So NR_FREE_PAGES is wrong after offlining pages.
>> So there is no need to change NR_FREE_PAGES in step3.
>>
>> This patch also fixs a problem in step2: if the migratetype is
>> MIGRATE_ISOLATE, we should not add NR_FRR_PAGES when we remove pages from
>> pcppages.
>>
>> Signed-off-by: Wen Congyang <[email protected]>
>> Cc: David Rientjes <[email protected]>
>> Cc: Jiang Liu <[email protected]>
>> Cc: Len Brown <[email protected]>
>> Cc: Benjamin Herrenschmidt <[email protected]>
>> Cc: Paul Mackerras <[email protected]>
>> Cc: Christoph Lameter <[email protected]>
>> Cc: Minchan Kim <[email protected]>
>> Cc: KOSAKI Motohiro <[email protected]>
>> Cc: Yasuaki Ishimatsu <[email protected]>
>> Cc: Dave Hansen <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Signed-off-by: Andrew Morton <[email protected]>
>> ---
>> mm/page_alloc.c | 10 +++++-----
>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 5b74de6..a7cd2d1 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -667,11 +667,13 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>> /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
>> __free_one_page(page, zone, 0, mt);
>> trace_mm_page_pcpu_drain(page, 0, mt);
>> - if (is_migrate_cma(mt))
>> - __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
>> + if (likely(mt != MIGRATE_ISOLATE)) {
>
> Hi Congyang,
> I think mt != MIGRATE_ISOLATE is always true here,
> page from PCP's migratetype < MIGRATE_PCPTYPES.
> When isolate page, we change pageblock's migratetype to MIGRATE_ISOLATE,
> but set_freepage_migratetype() isn't called.
> Maybe we can use mt = get_pageblock_migratetype() here ?

Yes, you are right. I have sent a fix patch.

Thanks for pointing it out.

Wen Congyang

>
> Thanks,
> Jianguo Wu.
>
>> + __mod_zone_page_state(zone, NR_FREE_PAGES, 1);
>> + if (is_migrate_cma(mt))
>> + __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
>> + }
>> } while (--to_free && --batch_free && !list_empty(list));
>> }
>> - __mod_zone_page_state(zone, NR_FREE_PAGES, count);
>> spin_unlock(&zone->lock);
>> }
>>
>> @@ -5987,8 +5989,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>> list_del(&page->lru);
>> rmv_page_order(page);
>> zone->free_area[order].nr_free--;
>> - __mod_zone_page_state(zone, NR_FREE_PAGES,
>> - - (1UL << order));
>> for (i = 0; i < (1 << order); i++)
>> SetPageReserved((page+i));
>> pfn += (1 << order);
>>
>
>