2012-10-19 06:41:14

by Wen Congyang

[permalink] [raw]
Subject: [PATCH v3 0/9] bugfix for memory hotplug

From: Wen Congyang <[email protected]>

Changes from v2 to v3:
Merge the bug fix from ishimatsu to this patchset(Patch 1-3)
Patch 3: split it from patch as it fixes another bug.
Patch 4: new patch, and fix bad-page state when hotadding a memory
device after hotremoving it. I forgot to post this patch in v2.
Patch 6: update it according to Dave Hansen's comment.

Changes from v1 to v2:
Patch 1: updated according to kosaki's suggestion

Patch 2: new patch, and update mce_bad_pages when removing memory.

Patch 4: new patch, and fix a NR_FREE_PAGES mismatch, and this bug
cause oom in my test.

Patch 5: new patch, and fix a new bug. When repeating to online/offline
pages, the free pages will continue to decrease.

Wen Congyang (6):
clear the memory to store struct page
memory-hotplug: skip HWPoisoned page when offlining pages
memory-hotplug: update mce_bad_pages when removing the memory
memory-hotplug: auto offline page_cgroup when onlining memory block
failed
memory-hotplug: fix NR_FREE_PAGES mismatch
memory-hotplug: allocate zone's pcp before onlining pages

Yasuaki Ishimatsu (3):
suppress "Device memoryX does not have a release() function" warning
suppress "Device nodeX does not have a release() function" warning
memory-hotplug: flush the work for the node when the node is offlined

drivers/base/memory.c | 9 ++++++++-
drivers/base/node.c | 11 +++++++++++
include/linux/page-isolation.h | 10 ++++++----
mm/memory-failure.c | 2 +-
mm/memory_hotplug.c | 14 ++++++++------
mm/page_alloc.c | 37 ++++++++++++++++++++++++++++---------
mm/page_cgroup.c | 3 +++
mm/page_isolation.c | 27 ++++++++++++++++++++-------
mm/sparse.c | 22 +++++++++++++++++++++-
9 files changed, 106 insertions(+), 29 deletions(-)


2012-10-19 06:41:17

by Wen Congyang

[permalink] [raw]
Subject: [PATCH v3 3/9] memory-hotplug: flush the work for the node when the node is offlined

From: Yasuaki Ishimatsu <[email protected]>

If the node is onlined after it is offlined, we will clear the memory
to store the node's information. This structure contains struct work,
so we should flush work before the work's information is cleared.

CC: David Rientjes <[email protected]>
CC: Jiang Liu <[email protected]>
Cc: Minchan Kim <[email protected]>
CC: Andrew Morton <[email protected]>
CC: KOSAKI Motohiro <[email protected]>
Signed-off-by: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
drivers/base/node.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 2baa73a..13c0ddf 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -254,6 +254,11 @@ static inline void hugetlb_unregister_node(struct node *node) {}

static void node_device_release(struct device *dev)
{
+#if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS)
+ struct node *node_dev = to_node(dev);
+
+ flush_work(&node_dev->node_work);
+#endif
}

/*
--
1.7.1

2012-10-19 06:41:15

by Wen Congyang

[permalink] [raw]
Subject: [PATCH v3 1/9] suppress "Device memoryX does not have a release() function" warning

From: Yasuaki Ishimatsu <[email protected]>

When calling remove_memory_block(), the function shows following message at
device_release().

"Device 'memory528' does not have a release() function, it is broken and must
be fixed."

The reason is memory_block's device struct does not have a release() function.

So the patch registers memory_block_release() to the device's release() function
for suppressing the warning message. Additionally, the patch moves kfree(mem)
into the release function since the release function is prepared as a means
to free a memory_block struct.

CC: Jiang Liu <[email protected]>
Cc: Minchan Kim <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Wen Congyang <[email protected]>
Signed-off-by: Yasuaki Ishimatsu <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
---
drivers/base/memory.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 86c8821..7eb1211 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -70,6 +70,13 @@ void unregister_memory_isolate_notifier(struct notifier_block *nb)
}
EXPORT_SYMBOL(unregister_memory_isolate_notifier);

+static void memory_block_release(struct device *dev)
+{
+ struct memory_block *mem = container_of(dev, struct memory_block, dev);
+
+ kfree(mem);
+}
+
/*
* register_memory - Setup a sysfs device for a memory block
*/
@@ -80,6 +87,7 @@ int register_memory(struct memory_block *memory)

memory->dev.bus = &memory_subsys;
memory->dev.id = memory->start_section_nr / sections_per_block;
+ memory->dev.release = memory_block_release;

error = device_register(&memory->dev);
return error;
@@ -635,7 +643,6 @@ int remove_memory_block(unsigned long node_id, struct mem_section *section,
mem_remove_simple_file(mem, phys_device);
mem_remove_simple_file(mem, removable);
unregister_memory(mem);
- kfree(mem);
} else
kobject_put(&mem->dev.kobj);

--
1.7.1

2012-10-19 06:41:55

by Wen Congyang

[permalink] [raw]
Subject: [PATCH v3 9/9] memory-hotplug: allocate zone's pcp before onlining pages

From: Wen Congyang <[email protected]>

We use __free_page() to put a page to buddy system when onlining pages.
__free_page() will store NR_FREE_PAGES in zone's pcp.vm_stat_diff, so we
should allocate zone's pcp before onlining pages, otherwise we will lose
some free pages.

CC: David Rientjes <[email protected]>
CC: Jiang Liu <[email protected]>
CC: Len Brown <[email protected]>
CC: Benjamin Herrenschmidt <[email protected]>
CC: Paul Mackerras <[email protected]>
CC: Christoph Lameter <[email protected]>
Cc: Minchan Kim <[email protected]>
CC: Andrew Morton <[email protected]>
CC: KOSAKI Motohiro <[email protected]>
CC: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
mm/memory_hotplug.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ec899a2..eb4c132 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -505,12 +505,16 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
* So, zonelist must be updated after online.
*/
mutex_lock(&zonelists_mutex);
- if (!populated_zone(zone))
+ if (!populated_zone(zone)) {
need_zonelists_rebuild = 1;
+ build_all_zonelists(NULL, zone);
+ }

ret = walk_system_ram_range(pfn, nr_pages, &onlined_pages,
online_pages_range);
if (ret) {
+ if (need_zonelists_rebuild)
+ zone_pcp_reset(zone);
mutex_unlock(&zonelists_mutex);
printk(KERN_DEBUG "online_pages [mem %#010llx-%#010llx] failed\n",
(unsigned long long) pfn << PAGE_SHIFT,
@@ -525,9 +529,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
zone->zone_pgdat->node_present_pages += onlined_pages;
if (onlined_pages) {
node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
- if (need_zonelists_rebuild)
- build_all_zonelists(NULL, zone);
- else
+ if (!need_zonelists_rebuild)
zone_pcp_update(zone);
}

--
1.7.1

2012-10-19 06:41:13

by Wen Congyang

[permalink] [raw]
Subject: [PATCH v3 2/9] suppress "Device nodeX does not have a release() function" warning

From: Yasuaki Ishimatsu <[email protected]>

When calling unregister_node(), the function shows following message at
device_release().

"Device 'node2' does not have a release() function, it is broken and must
be fixed."

The reason is node's device struct does not have a release() function.

So the patch registers node_device_release() to the device's release()
function for suppressing the warning message. Additionally, the patch adds
memset() to initialize a node struct into register_node(). Because the node
struct is part of node_devices[] array and it cannot be freed by
node_device_release(). So if system reuses the node struct, it has a garbage.

CC: David Rientjes <[email protected]>
CC: Jiang Liu <[email protected]>
Cc: Minchan Kim <[email protected]>
CC: Andrew Morton <[email protected]>
CC: KOSAKI Motohiro <[email protected]>
Signed-off-by: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
drivers/base/node.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index af1a177..2baa73a 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -252,6 +252,9 @@ static inline void hugetlb_register_node(struct node *node) {}
static inline void hugetlb_unregister_node(struct node *node) {}
#endif

+static void node_device_release(struct device *dev)
+{
+}

/*
* register_node - Setup a sysfs device for a node.
@@ -263,8 +266,11 @@ int register_node(struct node *node, int num, struct node *parent)
{
int error;

+ memset(node, 0, sizeof(*node));
+
node->dev.id = num;
node->dev.bus = &node_subsys;
+ node->dev.release = node_device_release;
error = device_register(&node->dev);

if (!error){
--
1.7.1

2012-10-19 06:42:16

by Wen Congyang

[permalink] [raw]
Subject: [PATCH v3 8/9] memory-hotplug: fix NR_FREE_PAGES mismatch

From: Wen Congyang <[email protected]>

NR_FREE_PAGES will be wrong after offlining pages. We add/dec NR_FREE_PAGES
like this now:
1. mova all pages in buddy system to MIGRATE_ISOLATE, and dec NR_FREE_PAGES
2. don't add NR_FREE_PAGES when it is freed and the migratetype is MIGRATE_ISOLATE
3. dec NR_FREE_PAGES when offlining isolated pages.
4. add NR_FREE_PAGES when undoing isolate pages.

When we come to step 3, all pages are in MIGRATE_ISOLATE list, and NR_FREE_PAGES
are right. When we come to step4, all pages are not in buddy system, so we don't
change NR_FREE_PAGES in this step, but we change NR_FREE_PAGES in step3. So
NR_FREE_PAGES is wrong after offlining pages. So there is no need to change
NR_FREE_PAGES in step3.

This patch also fixs a problem in step2: if the migratetype is MIGRATE_ISOLATE,
we should not add NR_FRR_PAGES when we remove pages from pcppages.

CC: David Rientjes <[email protected]>
CC: Jiang Liu <[email protected]>
CC: Len Brown <[email protected]>
CC: Benjamin Herrenschmidt <[email protected]>
CC: Paul Mackerras <[email protected]>
CC: Christoph Lameter <[email protected]>
Cc: Minchan Kim <[email protected]>
CC: Andrew Morton <[email protected]>
CC: KOSAKI Motohiro <[email protected]>
CC: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
mm/page_alloc.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e33d0fb..9aa9490 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -667,11 +667,13 @@ static void free_pcppages_bulk(struct zone *zone, int count,
/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
__free_one_page(page, zone, 0, mt);
trace_mm_page_pcpu_drain(page, 0, mt);
- if (is_migrate_cma(mt))
- __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
+ if (likely(mt != MIGRATE_ISOLATE)) {
+ __mod_zone_page_state(zone, NR_FREE_PAGES, 1);
+ if (is_migrate_cma(mt))
+ __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
+ }
} while (--to_free && --batch_free && !list_empty(list));
}
- __mod_zone_page_state(zone, NR_FREE_PAGES, count);
spin_unlock(&zone->lock);
}

@@ -6006,8 +6008,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
list_del(&page->lru);
rmv_page_order(page);
zone->free_area[order].nr_free--;
- __mod_zone_page_state(zone, NR_FREE_PAGES,
- - (1UL << order));
for (i = 0; i < (1 << order); i++)
SetPageReserved((page+i));
pfn += (1 << order);
--
1.7.1

2012-10-19 06:42:20

by Wen Congyang

[permalink] [raw]
Subject: [PATCH v3 7/9] memory-hotplug: auto offline page_cgroup when onlining memory block failed

From: Wen Congyang <[email protected]>

When a memory block is onlined, we will try allocate memory on that node
to store page_cgroup. If onlining the memory block failed, we don't
offline the page cgroup, and we have no chance to offline this page cgroup
unless the memory block is onlined successfully again. It will cause
that we can't hot-remove the memory device on that node, because some
memory is used to store page cgroup. If onlining the memory block
is failed, there is no need to stort page cgroup for this memory. So
auto offline page_cgroup when onlining memory block failed.

CC: David Rientjes <[email protected]>
CC: Jiang Liu <[email protected]>
CC: Len Brown <[email protected]>
CC: Benjamin Herrenschmidt <[email protected]>
CC: Paul Mackerras <[email protected]>
CC: Christoph Lameter <[email protected]>
Cc: Minchan Kim <[email protected]>
CC: Andrew Morton <[email protected]>
CC: KOSAKI Motohiro <[email protected]>
CC: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
---
mm/page_cgroup.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 5ddad0c..44db00e 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -251,6 +251,9 @@ static int __meminit page_cgroup_callback(struct notifier_block *self,
mn->nr_pages, mn->status_change_nid);
break;
case MEM_CANCEL_ONLINE:
+ offline_page_cgroup(mn->start_pfn,
+ mn->nr_pages, mn->status_change_nid);
+ break;
case MEM_GOING_OFFLINE:
break;
case MEM_ONLINE:
--
1.7.1

2012-10-19 06:43:01

by Wen Congyang

[permalink] [raw]
Subject: [PATCH v3 4/9] clear the memory to store struct page

From: Wen Congyang <[email protected]>

If sparse memory vmemmap is enabled, we can't free the memory to store
struct page when a memory device is hotremoved, because we may store
struct page in the memory to manage the memory which doesn't belong
to this memory device. When we hotadded this memory device again, we
will reuse this memory to store struct page, and struct page may
contain some obsolete information, and we will get bad-page state:

[ 59.611278] init_memory_mapping: [mem 0x80000000-0x9fffffff]
[ 59.637836] Built 2 zonelists in Node order, mobility grouping on. Total pages: 547617
[ 59.638739] Policy zone: Normal
[ 59.650840] BUG: Bad page state in process bash pfn:9b6dc
[ 59.651124] page:ffffea0002200020 count:0 mapcount:0 mapping: (null) index:0xfdfdfdfdfdfdfdfd
[ 59.651494] page flags: 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock)
[ 59.653604] Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk libata virtio_pci virtio_ring virtio scsi_mod
[ 59.656998] Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12
[ 59.657172] Call Trace:
[ 59.657275] [<ffffffff810e9b30>] ? bad_page+0xb0/0x100
[ 59.657434] [<ffffffff810ea4c3>] ? free_pages_prepare+0xb3/0x100
[ 59.657610] [<ffffffff810ea668>] ? free_hot_cold_page+0x48/0x1a0
[ 59.657787] [<ffffffff8112cc08>] ? online_pages_range+0x68/0xa0
[ 59.657961] [<ffffffff8112cba0>] ? __online_page_increment_counters+0x10/0x10
[ 59.658162] [<ffffffff81045561>] ? walk_system_ram_range+0x101/0x110
[ 59.658346] [<ffffffff814c4f95>] ? online_pages+0x1a5/0x2b0
[ 59.658515] [<ffffffff8135663d>] ? __memory_block_change_state+0x20d/0x270
[ 59.658710] [<ffffffff81356756>] ? store_mem_state+0xb6/0xf0
[ 59.658878] [<ffffffff8119e482>] ? sysfs_write_file+0xd2/0x160
[ 59.659052] [<ffffffff8113769a>] ? vfs_write+0xaa/0x160
[ 59.659212] [<ffffffff81137977>] ? sys_write+0x47/0x90
[ 59.659371] [<ffffffff814e2f25>] ? async_page_fault+0x25/0x30
[ 59.659543] [<ffffffff814ea239>] ? system_call_fastpath+0x16/0x1b
[ 59.659720] Disabling lock debugging due to kernel taint

This patch clears the memory to store struct page to avoid unexpected error.

CC: David Rientjes <[email protected]>
CC: Jiang Liu <[email protected]>
Cc: Minchan Kim <[email protected]>
CC: Andrew Morton <[email protected]>
CC: KOSAKI Motohiro <[email protected]>
CC: Yasuaki Ishimatsu <[email protected]>
Reported-by: Vasilis Liaskovitis <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
mm/sparse.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index fac95f2..0021265 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -638,7 +638,6 @@ static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
got_map_page:
ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
got_map_ptr:
- memset(ret, 0, memmap_size);

return ret;
}
@@ -760,6 +759,8 @@ int __meminit sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
goto out;
}

+ memset(memmap, 0, sizeof(struct page) * nr_pages);
+
ms->section_mem_map |= SECTION_MARKED_PRESENT;

ret = sparse_init_one_section(ms, section_nr, memmap, usemap);
--
1.7.1

2012-10-19 06:42:59

by Wen Congyang

[permalink] [raw]
Subject: [PATCH v3 6/9] memory-hotplug: update mce_bad_pages when removing the memory

From: Wen Congyang <[email protected]>

When we hotremove a memory device, we will free the memory to store
struct page. If the page is hwpoisoned page, we should decrease
mce_bad_pages.

CC: David Rientjes <[email protected]>
CC: Jiang Liu <[email protected]>
CC: Len Brown <[email protected]>
CC: Benjamin Herrenschmidt <[email protected]>
CC: Paul Mackerras <[email protected]>
CC: Christoph Lameter <[email protected]>
Cc: Minchan Kim <[email protected]>
CC: Andrew Morton <[email protected]>
CC: KOSAKI Motohiro <[email protected]>
CC: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
mm/sparse.c | 19 +++++++++++++++++++
1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 0021265..77d6a93 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -774,6 +774,24 @@ out:
return ret;
}

+#ifdef CONFIG_MEMORY_FAILURE
+static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
+{
+ int i;
+
+ for (i = 0; i < PAGES_PER_SECTION; i++) {
+ if (PageHWPoison(&memmap[i])) {
+ atomic_long_sub(1, &mce_bad_pages);
+ ClearPageHWPoison(&memmap[i]);
+ }
+ }
+}
+#else
+static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
+{
+}
+#endif
+
void sparse_remove_one_section(struct zone *zone, struct mem_section *ms)
{
struct page *memmap = NULL;
@@ -785,6 +803,7 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms)
__section_nr(ms));
ms->section_mem_map = 0;
ms->pageblock_flags = NULL;
+ clear_hwpoisoned_pages(memmap, PAGES_PER_SECTION);
}

free_section_usemap(memmap, usemap);
--
1.7.1

2012-10-19 06:42:57

by Wen Congyang

[permalink] [raw]
Subject: [PATCH v3 5/9] memory-hotplug: skip HWPoisoned page when offlining pages

From: Wen Congyang <[email protected]>

hwpoisoned may be set when we offline a page by the sysfs interface
/sys/devices/system/memory/soft_offline_page or
/sys/devices/system/memory/hard_offline_page. We use __free_page() to put
a page to buddy system when onlining pages. If the page is hwpoisoned page,
we can't put it to buddy system, and the page is not in free list. Such
page can't be isolated and offlined. So we should skip such pages when
offlining pages.

CC: David Rientjes <[email protected]>
CC: Jiang Liu <[email protected]>
CC: Len Brown <[email protected]>
CC: Benjamin Herrenschmidt <[email protected]>
CC: Paul Mackerras <[email protected]>
CC: Christoph Lameter <[email protected]>
Cc: Minchan Kim <[email protected]>
CC: Andrew Morton <[email protected]>
CC: KOSAKI Motohiro <[email protected]>
CC: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
include/linux/page-isolation.h | 10 ++++++----
mm/memory-failure.c | 2 +-
mm/memory_hotplug.c | 4 ++--
mm/page_alloc.c | 27 +++++++++++++++++++++++----
mm/page_isolation.c | 27 ++++++++++++++++++++-------
5 files changed, 52 insertions(+), 18 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 76a9539..a92061e 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -2,7 +2,8 @@
#define __LINUX_PAGEISOLATION_H


-bool has_unmovable_pages(struct zone *zone, struct page *page, int count);
+bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
+ bool skip_hwpoisoned_pages);
void set_pageblock_migratetype(struct page *page, int migratetype);
int move_freepages_block(struct zone *zone, struct page *page,
int migratetype);
@@ -21,7 +22,7 @@ int move_freepages(struct zone *zone,
*/
int
start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
- unsigned migratetype);
+ unsigned migratetype, bool skip_hwpoisoned_pages);

/*
* Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
@@ -34,12 +35,13 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
/*
* Test all pages in [start_pfn, end_pfn) are isolated or not.
*/
-int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
+ bool skip_hwpoisoned_pages);

/*
* Internal functions. Changes pageblock's migrate type.
*/
-int set_migratetype_isolate(struct page *page);
+int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages);
void unset_migratetype_isolate(struct page *page, unsigned migratetype);
struct page *alloc_migrate_target(struct page *page, unsigned long private,
int **resultp);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 6c5899b..1abffee 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1385,7 +1385,7 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
* Isolate the page, so that it doesn't get reallocated if it
* was free.
*/
- set_migratetype_isolate(p);
+ set_migratetype_isolate(p, true);
/*
* When the target page is a free hugepage, just remove it
* from free hugepage list.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 56b758a..ec899a2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -854,7 +854,7 @@ check_pages_isolated_cb(unsigned long start_pfn, unsigned long nr_pages,
{
int ret;
long offlined = *(long *)data;
- ret = test_pages_isolated(start_pfn, start_pfn + nr_pages);
+ ret = test_pages_isolated(start_pfn, start_pfn + nr_pages, true);
offlined = nr_pages;
if (!ret)
*(long *)data += offlined;
@@ -901,7 +901,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
nr_pages = end_pfn - start_pfn;

/* set above range as isolated */
- ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+ ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE, true);
if (ret)
goto out;

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb90971..e33d0fb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5575,7 +5575,8 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags,
* MIGRATE_MOVABLE block might include unmovable pages. It means you can't
* expect this function should be exact.
*/
-bool has_unmovable_pages(struct zone *zone, struct page *page, int count)
+bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
+ bool skip_hwpoisoned_pages)
{
unsigned long pfn, iter, found;
int mt;
@@ -5610,6 +5611,13 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count)
continue;
}

+ /*
+ * The HWPoisoned page may be not in buddy system, and
+ * page_count() is not 0.
+ */
+ if (skip_hwpoisoned_pages && PageHWPoison(page))
+ continue;
+
if (!PageLRU(page))
found++;
/*
@@ -5652,7 +5660,7 @@ bool is_pageblock_removable_nolock(struct page *page)
zone->zone_start_pfn + zone->spanned_pages <= pfn)
return false;

- return !has_unmovable_pages(zone, page, 0);
+ return !has_unmovable_pages(zone, page, 0, true);
}

#ifdef CONFIG_CMA
@@ -5823,7 +5831,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
*/

ret = start_isolate_page_range(pfn_max_align_down(start),
- pfn_max_align_up(end), migratetype);
+ pfn_max_align_up(end), migratetype,
+ false);
if (ret)
goto done;

@@ -5862,7 +5871,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
}

/* Make sure the range is really isolated. */
- if (test_pages_isolated(outer_start, end)) {
+ if (test_pages_isolated(outer_start, end, false)) {
pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n",
outer_start, end);
ret = -EBUSY;
@@ -5977,6 +5986,16 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
continue;
}
page = pfn_to_page(pfn);
+ /*
+ * The HWPoisoned page may be not in buddy system, and
+ * page_count() is not 0.
+ */
+ if (unlikely(!PageBuddy(page) && PageHWPoison(page))) {
+ pfn++;
+ SetPageReserved(page);
+ continue;
+ }
+
BUG_ON(page_count(page));
BUG_ON(!PageBuddy(page));
order = page_order(page);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f2f5b48..9d2264e 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -30,7 +30,7 @@ static void restore_pageblock_isolate(struct page *page, int migratetype)
zone->nr_pageblock_isolate--;
}

-int set_migratetype_isolate(struct page *page)
+int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
{
struct zone *zone;
unsigned long flags, pfn;
@@ -66,7 +66,8 @@ int set_migratetype_isolate(struct page *page)
* FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
* We just check MOVABLE pages.
*/
- if (!has_unmovable_pages(zone, page, arg.pages_found))
+ if (!has_unmovable_pages(zone, page, arg.pages_found,
+ skip_hwpoisoned_pages))
ret = 0;

/*
@@ -134,7 +135,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
* Returns 0 on success and -EBUSY if any part of range cannot be isolated.
*/
int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
- unsigned migratetype)
+ unsigned migratetype, bool skip_hwpoisoned_pages)
{
unsigned long pfn;
unsigned long undo_pfn;
@@ -147,7 +148,8 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
pfn < end_pfn;
pfn += pageblock_nr_pages) {
page = __first_valid_page(pfn, pageblock_nr_pages);
- if (page && set_migratetype_isolate(page)) {
+ if (page &&
+ set_migratetype_isolate(page, skip_hwpoisoned_pages)) {
undo_pfn = pfn;
goto undo;
}
@@ -190,7 +192,8 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
* Returns 1 if all pages in the range are isolated.
*/
static int
-__test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
+__test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
+ bool skip_hwpoisoned_pages)
{
struct page *page;

@@ -220,6 +223,14 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
else if (page_count(page) == 0 &&
get_freepage_migratetype(page) == MIGRATE_ISOLATE)
pfn += 1;
+ else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
+ /*
+ * The HWPoisoned page may be not in buddy
+ * system, and page_count() is not 0.
+ */
+ pfn++;
+ continue;
+ }
else
break;
}
@@ -228,7 +239,8 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
return 1;
}

-int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
+ bool skip_hwpoisoned_pages)
{
unsigned long pfn, flags;
struct page *page;
@@ -251,7 +263,8 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
/* Check all pages are free or Marked as ISOLATED */
zone = page_zone(page);
spin_lock_irqsave(&zone->lock, flags);
- ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn);
+ ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
+ skip_hwpoisoned_pages);
spin_unlock_irqrestore(&zone->lock, flags);
return ret ? 0 : -EBUSY;
}
--
1.7.1

2012-10-19 06:54:18

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v3 2/9] suppress "Device nodeX does not have a release() function" warning

On Fri, Oct 19, 2012 at 2:46 AM, <[email protected]> wrote:
> From: Yasuaki Ishimatsu <[email protected]>
>
> When calling unregister_node(), the function shows following message at
> device_release().
>
> "Device 'node2' does not have a release() function, it is broken and must
> be fixed."
>
> The reason is node's device struct does not have a release() function.
>
> So the patch registers node_device_release() to the device's release()
> function for suppressing the warning message. Additionally, the patch adds
> memset() to initialize a node struct into register_node(). Because the node
> struct is part of node_devices[] array and it cannot be freed by
> node_device_release(). So if system reuses the node struct, it has a garbage.
>
> CC: David Rientjes <[email protected]>
> CC: Jiang Liu <[email protected]>
> Cc: Minchan Kim <[email protected]>
> CC: Andrew Morton <[email protected]>
> CC: KOSAKI Motohiro <[email protected]>
> Signed-off-by: Yasuaki Ishimatsu <[email protected]>
> Signed-off-by: Wen Congyang <[email protected]>

I still think node array should be converted node pointer array and
node_device_release() free memory as other typical drivers.
However, this is acceptable as first step.

Acked-by: KOSAKI Motohiro <[email protected]>

2012-10-19 07:01:35

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v3 3/9] memory-hotplug: flush the work for the node when the node is offlined

On Fri, Oct 19, 2012 at 2:46 AM, <[email protected]> wrote:
> From: Yasuaki Ishimatsu <[email protected]>
>
> If the node is onlined after it is offlined, we will clear the memory
> to store the node's information. This structure contains struct work,
> so we should flush work before the work's information is cleared.

This explanation is incorrect. Even if you don't call memset(), you should
call flush_work() at offline event. Because of, after offlinining, we
shouldn't touch any node data. Alive workqueue violate this principle.

And, hmmm... Wait. Usually workqueue shutdowning has two phase. 1)
inhibit enqueue new work 2) flush work. Otherwise other cpus may
enqueue new work after flush_work(). Where is (1)?

2012-10-19 07:03:09

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] clear the memory to store struct page

On Fri, Oct 19, 2012 at 2:46 AM, <[email protected]> wrote:
> From: Wen Congyang <[email protected]>
>
> If sparse memory vmemmap is enabled, we can't free the memory to store
> struct page when a memory device is hotremoved, because we may store
> struct page in the memory to manage the memory which doesn't belong
> to this memory device. When we hotadded this memory device again, we
> will reuse this memory to store struct page, and struct page may
> contain some obsolete information, and we will get bad-page state:
>
> [ 59.611278] init_memory_mapping: [mem 0x80000000-0x9fffffff]
> [ 59.637836] Built 2 zonelists in Node order, mobility grouping on. Total pages: 547617
> [ 59.638739] Policy zone: Normal
> [ 59.650840] BUG: Bad page state in process bash pfn:9b6dc
> [ 59.651124] page:ffffea0002200020 count:0 mapcount:0 mapping: (null) index:0xfdfdfdfdfdfdfdfd
> [ 59.651494] page flags: 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock)
> [ 59.653604] Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk libata virtio_pci virtio_ring virtio scsi_mod
> [ 59.656998] Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12
> [ 59.657172] Call Trace:
> [ 59.657275] [<ffffffff810e9b30>] ? bad_page+0xb0/0x100
> [ 59.657434] [<ffffffff810ea4c3>] ? free_pages_prepare+0xb3/0x100
> [ 59.657610] [<ffffffff810ea668>] ? free_hot_cold_page+0x48/0x1a0
> [ 59.657787] [<ffffffff8112cc08>] ? online_pages_range+0x68/0xa0
> [ 59.657961] [<ffffffff8112cba0>] ? __online_page_increment_counters+0x10/0x10
> [ 59.658162] [<ffffffff81045561>] ? walk_system_ram_range+0x101/0x110
> [ 59.658346] [<ffffffff814c4f95>] ? online_pages+0x1a5/0x2b0
> [ 59.658515] [<ffffffff8135663d>] ? __memory_block_change_state+0x20d/0x270
> [ 59.658710] [<ffffffff81356756>] ? store_mem_state+0xb6/0xf0
> [ 59.658878] [<ffffffff8119e482>] ? sysfs_write_file+0xd2/0x160
> [ 59.659052] [<ffffffff8113769a>] ? vfs_write+0xaa/0x160
> [ 59.659212] [<ffffffff81137977>] ? sys_write+0x47/0x90
> [ 59.659371] [<ffffffff814e2f25>] ? async_page_fault+0x25/0x30
> [ 59.659543] [<ffffffff814ea239>] ? system_call_fastpath+0x16/0x1b
> [ 59.659720] Disabling lock debugging due to kernel taint
>
> This patch clears the memory to store struct page to avoid unexpected error.
>
> CC: David Rientjes <[email protected]>
> CC: Jiang Liu <[email protected]>
> Cc: Minchan Kim <[email protected]>
> CC: Andrew Morton <[email protected]>
> CC: KOSAKI Motohiro <[email protected]>
> CC: Yasuaki Ishimatsu <[email protected]>
> Reported-by: Vasilis Liaskovitis <[email protected]>
> Signed-off-by: Wen Congyang <[email protected]>

Acked-by: KOSAKI Motohiro <[email protected]>

2012-10-19 07:06:38

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v3 6/9] memory-hotplug: update mce_bad_pages when removing the memory

On Fri, Oct 19, 2012 at 2:46 AM, <[email protected]> wrote:
> From: Wen Congyang <[email protected]>
>
> When we hotremove a memory device, we will free the memory to store
> struct page. If the page is hwpoisoned page, we should decrease
> mce_bad_pages.

I think [5/9] and [6/9] should be fold. Their two patches fix one
issue. (current
hwpoison doesn't support memory offline)

2012-10-19 07:07:46

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v3 9/9] memory-hotplug: allocate zone's pcp before onlining pages

On Fri, Oct 19, 2012 at 2:46 AM, <[email protected]> wrote:
> From: Wen Congyang <[email protected]>
>
> We use __free_page() to put a page to buddy system when onlining pages.
> __free_page() will store NR_FREE_PAGES in zone's pcp.vm_stat_diff, so we
> should allocate zone's pcp before onlining pages, otherwise we will lose
> some free pages.
>
> CC: David Rientjes <[email protected]>
> CC: Jiang Liu <[email protected]>
> CC: Len Brown <[email protected]>
> CC: Benjamin Herrenschmidt <[email protected]>
> CC: Paul Mackerras <[email protected]>
> CC: Christoph Lameter <[email protected]>
> Cc: Minchan Kim <[email protected]>
> CC: Andrew Morton <[email protected]>
> CC: KOSAKI Motohiro <[email protected]>
> CC: Yasuaki Ishimatsu <[email protected]>
> Signed-off-by: Wen Congyang <[email protected]>

Looks good.

Acked-by: KOSAKI Motohiro <[email protected]>

2012-10-19 07:42:10

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v3 8/9] memory-hotplug: fix NR_FREE_PAGES mismatch

On Fri, Oct 19, 2012 at 2:46 AM, <[email protected]> wrote:
> From: Wen Congyang <[email protected]>
>
> NR_FREE_PAGES will be wrong after offlining pages. We add/dec NR_FREE_PAGES
> like this now:
> 1. mova all pages in buddy system to MIGRATE_ISOLATE, and dec NR_FREE_PAGES

move?

> 2. don't add NR_FREE_PAGES when it is freed and the migratetype is MIGRATE_ISOLATE
> 3. dec NR_FREE_PAGES when offlining isolated pages.
> 4. add NR_FREE_PAGES when undoing isolate pages.
>
> When we come to step 3, all pages are in MIGRATE_ISOLATE list, and NR_FREE_PAGES
> are right. When we come to step4, all pages are not in buddy system, so we don't
> change NR_FREE_PAGES in this step, but we change NR_FREE_PAGES in step3. So
> NR_FREE_PAGES is wrong after offlining pages. So there is no need to change
> NR_FREE_PAGES in step3.

Sorry, I don't understand this two paragraph. Can you please elaborate more?

and one more trivial question: why do we need to call
undo_isolate_page_range() from
__offline_pages()?


>
> This patch also fixs a problem in step2: if the migratetype is MIGRATE_ISOLATE,
> we should not add NR_FRR_PAGES when we remove pages from pcppages.

Why drain_all_pages doesn't work?

2012-10-19 07:47:49

by Wen Congyang

[permalink] [raw]
Subject: Re: [PATCH v3 3/9] memory-hotplug: flush the work for the node when the node is offlined

At 10/19/2012 03:01 PM, KOSAKI Motohiro Wrote:
> On Fri, Oct 19, 2012 at 2:46 AM, <[email protected]> wrote:
>> From: Yasuaki Ishimatsu <[email protected]>
>>
>> If the node is onlined after it is offlined, we will clear the memory
>> to store the node's information. This structure contains struct work,
>> so we should flush work before the work's information is cleared.
>
> This explanation is incorrect. Even if you don't call memset(), you should
> call flush_work() at offline event. Because of, after offlinining, we
> shouldn't touch any node data. Alive workqueue violate this principle.

Yes, I will update the description.

>
> And, hmmm... Wait. Usually workqueue shutdowning has two phase. 1)
> inhibit enqueue new work 2) flush work. Otherwise other cpus may
> enqueue new work after flush_work(). Where is (1)?
>


We schedule the work only when a memory section is onlined/offlined on this
node. When we come here, all the memory on this node has been offlined,
so we won't enqueue new work to this work. I will add a comment
to descript this.

Thanks
Wen Congyang

2012-10-19 07:48:10

by Wen Congyang

[permalink] [raw]
Subject: Re: [PATCH v3 6/9] memory-hotplug: update mce_bad_pages when removing the memory

At 10/19/2012 03:06 PM, KOSAKI Motohiro Wrote:
> On Fri, Oct 19, 2012 at 2:46 AM, <[email protected]> wrote:
>> From: Wen Congyang <[email protected]>
>>
>> When we hotremove a memory device, we will free the memory to store
>> struct page. If the page is hwpoisoned page, we should decrease
>> mce_bad_pages.
>
> I think [5/9] and [6/9] should be fold. Their two patches fix one
> issue. (current
> hwpoison doesn't support memory offline)
>

OK, I will fold them into one patch soon.

Thanks
Wen Congyang

2012-10-19 08:06:40

by Yasuaki Ishimatsu

[permalink] [raw]
Subject: Re: [PATCH v3 0/9] bugfix for memory hotplug

Hi Wen,

Some bug fix patches have been merged into linux-next.
So the patches confuse me.
Why did you send same patches again?

Thanks,
Yasuaki Ishimatsu

2012/10/19 15:46, [email protected] wrote:
> From: Wen Congyang <[email protected]>
>
> Changes from v2 to v3:
> Merge the bug fix from ishimatsu to this patchset(Patch 1-3)
> Patch 3: split it from patch as it fixes another bug.
> Patch 4: new patch, and fix bad-page state when hotadding a memory
> device after hotremoving it. I forgot to post this patch in v2.
> Patch 6: update it according to Dave Hansen's comment.
>
> Changes from v1 to v2:
> Patch 1: updated according to kosaki's suggestion
>
> Patch 2: new patch, and update mce_bad_pages when removing memory.
>
> Patch 4: new patch, and fix a NR_FREE_PAGES mismatch, and this bug
> cause oom in my test.
>
> Patch 5: new patch, and fix a new bug. When repeating to online/offline
> pages, the free pages will continue to decrease.
>
> Wen Congyang (6):
> clear the memory to store struct page
> memory-hotplug: skip HWPoisoned page when offlining pages
> memory-hotplug: update mce_bad_pages when removing the memory
> memory-hotplug: auto offline page_cgroup when onlining memory block
> failed
> memory-hotplug: fix NR_FREE_PAGES mismatch
> memory-hotplug: allocate zone's pcp before onlining pages
>
> Yasuaki Ishimatsu (3):
> suppress "Device memoryX does not have a release() function" warning
> suppress "Device nodeX does not have a release() function" warning
> memory-hotplug: flush the work for the node when the node is offlined
>
> drivers/base/memory.c | 9 ++++++++-
> drivers/base/node.c | 11 +++++++++++
> include/linux/page-isolation.h | 10 ++++++----
> mm/memory-failure.c | 2 +-
> mm/memory_hotplug.c | 14 ++++++++------
> mm/page_alloc.c | 37 ++++++++++++++++++++++++++++---------
> mm/page_cgroup.c | 3 +++
> mm/page_isolation.c | 27 ++++++++++++++++++++-------
> mm/sparse.c | 22 +++++++++++++++++++++-
> 9 files changed, 106 insertions(+), 29 deletions(-)
>

2012-10-19 08:20:00

by Yasuaki Ishimatsu

[permalink] [raw]
Subject: Re: [PATCH v3 0/9] bugfix for memory hotplug

2012/10/19 17:06, Yasuaki Ishimatsu wrote:
> Hi Wen,
>
> Some bug fix patches have been merged into linux-next.
> So the patches confuse me.

The following patches have been already merged into linux-next
and mm-tree as long as I know.

>> Wen Congyang (6):
>> clear the memory to store struct page


>> memory-hotplug: skip HWPoisoned page when offlining pages

mm-tree

>> memory-hotplug: update mce_bad_pages when removing the memory

>> memory-hotplug: auto offline page_cgroup when onlining memory block
>> failed

mm-tree

>> memory-hotplug: fix NR_FREE_PAGES mismatch

mm-tree

>> memory-hotplug: allocate zone's pcp before onlining pages

mm-tree

>>
>> Yasuaki Ishimatsu (3):
>> suppress "Device memoryX does not have a release() function" warning

linux-next

>> suppress "Device nodeX does not have a release() function" warning
>> memory-hotplug: flush the work for the node when the node is offlined

linux-next

Thanks,
Yasuaki Ishimatsu

> Why did you send same patches again?
>
> Thanks,
> Yasuaki Ishimatsu
>
> 2012/10/19 15:46, [email protected] wrote:
>> From: Wen Congyang <[email protected]>
>>
>> Changes from v2 to v3:
>> Merge the bug fix from ishimatsu to this patchset(Patch 1-3)
>> Patch 3: split it from patch as it fixes another bug.
>> Patch 4: new patch, and fix bad-page state when hotadding a memory
>> device after hotremoving it. I forgot to post this patch in v2.
>> Patch 6: update it according to Dave Hansen's comment.
>>
>> Changes from v1 to v2:
>> Patch 1: updated according to kosaki's suggestion
>>
>> Patch 2: new patch, and update mce_bad_pages when removing memory.
>>
>> Patch 4: new patch, and fix a NR_FREE_PAGES mismatch, and this bug
>> cause oom in my test.
>>
>> Patch 5: new patch, and fix a new bug. When repeating to online/offline
>> pages, the free pages will continue to decrease.
>>
>> Wen Congyang (6):
>> clear the memory to store struct page
>> memory-hotplug: skip HWPoisoned page when offlining pages
>> memory-hotplug: update mce_bad_pages when removing the memory
>> memory-hotplug: auto offline page_cgroup when onlining memory block
>> failed
>> memory-hotplug: fix NR_FREE_PAGES mismatch
>> memory-hotplug: allocate zone's pcp before onlining pages
>>
>> Yasuaki Ishimatsu (3):
>> suppress "Device memoryX does not have a release() function" warning
>> suppress "Device nodeX does not have a release() function" warning
>> memory-hotplug: flush the work for the node when the node is offlined
>>
>> drivers/base/memory.c | 9 ++++++++-
>> drivers/base/node.c | 11 +++++++++++
>> include/linux/page-isolation.h | 10 ++++++----
>> mm/memory-failure.c | 2 +-
>> mm/memory_hotplug.c | 14 ++++++++------
>> mm/page_alloc.c | 37 ++++++++++++++++++++++++++++---------
>> mm/page_cgroup.c | 3 +++
>> mm/page_isolation.c | 27 ++++++++++++++++++++-------
>> mm/sparse.c | 22 +++++++++++++++++++++-
>> 9 files changed, 106 insertions(+), 29 deletions(-)
>>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2012-10-19 08:36:37

by Wen Congyang

[permalink] [raw]
Subject: Re: [PATCH v3 8/9] memory-hotplug: fix NR_FREE_PAGES mismatch

At 10/19/2012 03:41 PM, KOSAKI Motohiro Wrote:
> On Fri, Oct 19, 2012 at 2:46 AM, <[email protected]> wrote:
>> From: Wen Congyang <[email protected]>
>>
>> NR_FREE_PAGES will be wrong after offlining pages. We add/dec NR_FREE_PAGES
>> like this now:
>> 1. mova all pages in buddy system to MIGRATE_ISOLATE, and dec NR_FREE_PAGES
>
> move?

Yes.
__offline_pages()
start_isolate_page_range()
set_migratetype_isolate()
move_freepages_block() // move all pages in buddy system to MIGRATE_ISOLATE
__mod_zone_freepage_state() // dec NR_FREE_PAGES

>
>> 2. don't add NR_FREE_PAGES when it is freed and the migratetype is MIGRATE_ISOLATE
>> 3. dec NR_FREE_PAGES when offlining isolated pages.
>> 4. add NR_FREE_PAGES when undoing isolate pages.
>>
>> When we come to step 3, all pages are in MIGRATE_ISOLATE list, and NR_FREE_PAGES
>> are right. When we come to step4, all pages are not in buddy system, so we don't
>> change NR_FREE_PAGES in this step, but we change NR_FREE_PAGES in step3. So
>> NR_FREE_PAGES is wrong after offlining pages. So there is no need to change
>> NR_FREE_PAGES in step3.
>
> Sorry, I don't understand this two paragraph. Can you please elaborate more?

OK.

If we don't online/offline memory, we add NR_FREE_PAGES when we free a page,
and dec it when allocate a page. If we put the page into pcp, we don't add
NR_FREE_PAGES. We will add it when the page is moved to buddy system from pcp.

When we offline a memory section, we should dec NR_FREE_PAGES(we will add it
when onlining memory section). The pages may be freed or inuse:
1. If the page is freed, and in buddy system. We move it to MIGRATE_ISOLATE,
and dec NR_FREE_PAGES
2. If the page is inuse, we will migrate them to other memory section and free
them. We don't dec NR_FREE_PAGES when it is freed because we have decreased
it when it is allocated. We just put them in MIGRATE_ISOLATE.
3. If the page is in pcp, we call drain_all_pages() to put them to MIGRATE_ISOLATE.
We have decreased NR_FREE_PAGES when we allocate a page and put it in pcp.
So we just put them in MIGRATE_ISOLATE.

Step1 deals with case1, and step2 deals with case2,3

So NR_FREE_PAGES is right after all pages are put into MIGRATE_ISOLATE list.
Now offline_isolated_pages() will be called after all pages are put in
MIGRATE_ISOLATE list. So we should not change NR_FREE_PAGES now, but
we dec NR_FREE_PAGES in offline_isolated_pages().

>
> and one more trivial question: why do we need to call
> undo_isolate_page_range() from
> __offline_pages()?

We need to restore the page's migrate type to MIGRATE_MOVABLE.

>
>
>>
>> This patch also fixs a problem in step2: if the migratetype is MIGRATE_ISOLATE,
>> we should not add NR_FRR_PAGES when we remove pages from pcppages.
>
> Why drain_all_pages doesn't work?
>

drain_all_pages() deals with case3, and it should not touch NR_FREE_PAGES if it
put a page to MIGRATE_ISOLATE list. But we touch NR_FREE_PAGES without checking
where the page is put.

Thanks
Wen Congyang

2012-10-19 08:40:09

by Wen Congyang

[permalink] [raw]
Subject: Re: [PATCH v3 0/9] bugfix for memory hotplug

At 10/19/2012 04:19 PM, Yasuaki Ishimatsu Wrote:
> 2012/10/19 17:06, Yasuaki Ishimatsu wrote:
>> Hi Wen,
>>
>> Some bug fix patches have been merged into linux-next.
>> So the patches confuse me.

Sorry, I don't check linux-next tree.

>
> The following patches have been already merged into linux-next
> and mm-tree as long as I know.
>
>>> Wen Congyang (6):
>>> clear the memory to store struct page
>
>
>>> memory-hotplug: skip HWPoisoned page when offlining pages
>
> mm-tree

Hmm, I don't find this patch in this URL:
http://www.ozlabs.org/~akpm/mmotm/broken-out/

Do I miss something?

>
>>> memory-hotplug: update mce_bad_pages when removing the memory
>
>>> memory-hotplug: auto offline page_cgroup when onlining memory block
>>> failed
>
> mm-tree
>
>>> memory-hotplug: fix NR_FREE_PAGES mismatch
>
> mm-tree
>
>>> memory-hotplug: allocate zone's pcp before onlining pages
>
> mm-tree
>
>>>
>>> Yasuaki Ishimatsu (3):
>>> suppress "Device memoryX does not have a release() function" warning
>
> linux-next
>
>>> suppress "Device nodeX does not have a release() function" warning
>>> memory-hotplug: flush the work for the node when the node is offlined
>
> linux-next

I split this patch to two patches according to kosaki's comment.

Thanks
Wen Congyang

>
> Thanks,
> Yasuaki Ishimatsu
>
>> Why did you send same patches again?
>>
>> Thanks,
>> Yasuaki Ishimatsu
>>
>> 2012/10/19 15:46, [email protected] wrote:
>>> From: Wen Congyang <[email protected]>
>>>
>>> Changes from v2 to v3:
>>> Merge the bug fix from ishimatsu to this patchset(Patch 1-3)
>>> Patch 3: split it from patch as it fixes another bug.
>>> Patch 4: new patch, and fix bad-page state when hotadding a memory
>>> device after hotremoving it. I forgot to post this patch in v2.
>>> Patch 6: update it according to Dave Hansen's comment.
>>>
>>> Changes from v1 to v2:
>>> Patch 1: updated according to kosaki's suggestion
>>>
>>> Patch 2: new patch, and update mce_bad_pages when removing memory.
>>>
>>> Patch 4: new patch, and fix a NR_FREE_PAGES mismatch, and this bug
>>> cause oom in my test.
>>>
>>> Patch 5: new patch, and fix a new bug. When repeating to online/offline
>>> pages, the free pages will continue to decrease.
>>>
>>> Wen Congyang (6):
>>> clear the memory to store struct page
>>> memory-hotplug: skip HWPoisoned page when offlining pages
>>> memory-hotplug: update mce_bad_pages when removing the memory
>>> memory-hotplug: auto offline page_cgroup when onlining memory block
>>> failed
>>> memory-hotplug: fix NR_FREE_PAGES mismatch
>>> memory-hotplug: allocate zone's pcp before onlining pages
>>>
>>> Yasuaki Ishimatsu (3):
>>> suppress "Device memoryX does not have a release() function" warning
>>> suppress "Device nodeX does not have a release() function" warning
>>> memory-hotplug: flush the work for the node when the node is offlined
>>>
>>> drivers/base/memory.c | 9 ++++++++-
>>> drivers/base/node.c | 11 +++++++++++
>>> include/linux/page-isolation.h | 10 ++++++----
>>> mm/memory-failure.c | 2 +-
>>> mm/memory_hotplug.c | 14 ++++++++------
>>> mm/page_alloc.c | 37 ++++++++++++++++++++++++++++---------
>>> mm/page_cgroup.c | 3 +++
>>> mm/page_isolation.c | 27 ++++++++++++++++++++-------
>>> mm/sparse.c | 22 +++++++++++++++++++++-
>>> 9 files changed, 106 insertions(+), 29 deletions(-)
>>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>>
>
>
>

2012-10-19 09:40:17

by Yasuaki Ishimatsu

[permalink] [raw]
Subject: Re: [PATCH v3 0/9] bugfix for memory hotplug

2012/10/19 17:45, Wen Congyang wrote:
> At 10/19/2012 04:19 PM, Yasuaki Ishimatsu Wrote:
>> 2012/10/19 17:06, Yasuaki Ishimatsu wrote:
>>> Hi Wen,
>>>
>>> Some bug fix patches have been merged into linux-next.
>>> So the patches confuse me.
>
> Sorry, I don't check linux-next tree.
>
>>
>> The following patches have been already merged into linux-next
>> and mm-tree as long as I know.
>>
>>>> Wen Congyang (6):
>>>> clear the memory to store struct page
>>
>>
>>>> memory-hotplug: skip HWPoisoned page when offlining pages
>>
>> mm-tree
>
> Hmm, I don't find this patch in this URL:
> http://www.ozlabs.org/~akpm/mmotm/broken-out/
>
> Do I miss something?

But Andrew announced that the patch was merged in mm-tree.
And you received the announcement.

>>
>>>> memory-hotplug: update mce_bad_pages when removing the memory
>>
>>>> memory-hotplug: auto offline page_cgroup when onlining memory block
>>>> failed
>>
>> mm-tree
>>
>>>> memory-hotplug: fix NR_FREE_PAGES mismatch
>>
>> mm-tree
>>
>>>> memory-hotplug: allocate zone's pcp before onlining pages
>>
>> mm-tree
>>
>>>>
>>>> Yasuaki Ishimatsu (3):
>>>> suppress "Device memoryX does not have a release() function" warning
>>
>> linux-next
>>
>>>> suppress "Device nodeX does not have a release() function" warning
>>>> memory-hotplug: flush the work for the node when the node is offlined
>>
>> linux-next
>
> I split this patch to two patches according to kosaki's comment.

Yeah, I know. But is the patch really need now?

Thanks,
Yasuaki Ishimatsu

>
> Thanks
> Wen Congyang
>
>>
>> Thanks,
>> Yasuaki Ishimatsu
>>
>>> Why did you send same patches again?
>>>
>>> Thanks,
>>> Yasuaki Ishimatsu
>>>
>>> 2012/10/19 15:46, [email protected] wrote:
>>>> From: Wen Congyang <[email protected]>
>>>>
>>>> Changes from v2 to v3:
>>>> Merge the bug fix from ishimatsu to this patchset(Patch 1-3)
>>>> Patch 3: split it from patch as it fixes another bug.
>>>> Patch 4: new patch, and fix bad-page state when hotadding a memory
>>>> device after hotremoving it. I forgot to post this patch in v2.
>>>> Patch 6: update it according to Dave Hansen's comment.
>>>>
>>>> Changes from v1 to v2:
>>>> Patch 1: updated according to kosaki's suggestion
>>>>
>>>> Patch 2: new patch, and update mce_bad_pages when removing memory.
>>>>
>>>> Patch 4: new patch, and fix a NR_FREE_PAGES mismatch, and this bug
>>>> cause oom in my test.
>>>>
>>>> Patch 5: new patch, and fix a new bug. When repeating to online/offline
>>>> pages, the free pages will continue to decrease.
>>>>
>>>> Wen Congyang (6):
>>>> clear the memory to store struct page
>>>> memory-hotplug: skip HWPoisoned page when offlining pages
>>>> memory-hotplug: update mce_bad_pages when removing the memory
>>>> memory-hotplug: auto offline page_cgroup when onlining memory block
>>>> failed
>>>> memory-hotplug: fix NR_FREE_PAGES mismatch
>>>> memory-hotplug: allocate zone's pcp before onlining pages
>>>>
>>>> Yasuaki Ishimatsu (3):
>>>> suppress "Device memoryX does not have a release() function" warning
>>>> suppress "Device nodeX does not have a release() function" warning
>>>> memory-hotplug: flush the work for the node when the node is offlined
>>>>
>>>> drivers/base/memory.c | 9 ++++++++-
>>>> drivers/base/node.c | 11 +++++++++++
>>>> include/linux/page-isolation.h | 10 ++++++----
>>>> mm/memory-failure.c | 2 +-
>>>> mm/memory_hotplug.c | 14 ++++++++------
>>>> mm/page_alloc.c | 37 ++++++++++++++++++++++++++++---------
>>>> mm/page_cgroup.c | 3 +++
>>>> mm/page_isolation.c | 27 ++++++++++++++++++++-------
>>>> mm/sparse.c | 22 +++++++++++++++++++++-
>>>> 9 files changed, 106 insertions(+), 29 deletions(-)
>>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at http://www.tux.org/lkml/
>>>
>>
>>
>>
>

2012-10-19 10:09:39

by Wen Congyang

[permalink] [raw]
Subject: Re: [PATCH v3 0/9] bugfix for memory hotplug

At 10/19/2012 05:39 PM, Yasuaki Ishimatsu Wrote:
> 2012/10/19 17:45, Wen Congyang wrote:
>> At 10/19/2012 04:19 PM, Yasuaki Ishimatsu Wrote:
>>> 2012/10/19 17:06, Yasuaki Ishimatsu wrote:
>>>> Hi Wen,
>>>>
>>>> Some bug fix patches have been merged into linux-next.
>>>> So the patches confuse me.
>>
>> Sorry, I don't check linux-next tree.
>>
>>>
>>> The following patches have been already merged into linux-next
>>> and mm-tree as long as I know.
>>>
>>>>> Wen Congyang (6):
>>>>> clear the memory to store struct page
>>>
>>>
>>>>> memory-hotplug: skip HWPoisoned page when offlining pages
>>>
>>> mm-tree
>>
>> Hmm, I don't find this patch in this URL:
>> http://www.ozlabs.org/~akpm/mmotm/broken-out/
>>
>> Do I miss something?
>
> But Andrew announced that the patch was merged in mm-tree.
> And you received the announcement.

I search my mail, and don't find such announcement. Maybe I miss
it.

Thanks
Wen Congyang

>
>>>
>>>>> memory-hotplug: update mce_bad_pages when removing the memory
>>>
>>>>> memory-hotplug: auto offline page_cgroup when onlining memory block
>>>>> failed
>>>
>>> mm-tree
>>>
>>>>> memory-hotplug: fix NR_FREE_PAGES mismatch
>>>
>>> mm-tree
>>>
>>>>> memory-hotplug: allocate zone's pcp before onlining pages
>>>
>>> mm-tree
>>>
>>>>>
>>>>> Yasuaki Ishimatsu (3):
>>>>> suppress "Device memoryX does not have a release() function" warning
>>>
>>> linux-next
>>>
>>>>> suppress "Device nodeX does not have a release() function" warning
>>>>> memory-hotplug: flush the work for the node when the node is offlined
>>>
>>> linux-next
>>
>> I split this patch to two patches according to kosaki's comment.
>
> Yeah, I know. But is the patch really need now?
>
> Thanks,
> Yasuaki Ishimatsu
>
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>> Thanks,
>>> Yasuaki Ishimatsu
>>>
>>>> Why did you send same patches again?
>>>>
>>>> Thanks,
>>>> Yasuaki Ishimatsu
>>>>
>>>> 2012/10/19 15:46, [email protected] wrote:
>>>>> From: Wen Congyang <[email protected]>
>>>>>
>>>>> Changes from v2 to v3:
>>>>> Merge the bug fix from ishimatsu to this patchset(Patch 1-3)
>>>>> Patch 3: split it from patch as it fixes another bug.
>>>>> Patch 4: new patch, and fix bad-page state when hotadding a memory
>>>>> device after hotremoving it. I forgot to post this patch in v2.
>>>>> Patch 6: update it according to Dave Hansen's comment.
>>>>>
>>>>> Changes from v1 to v2:
>>>>> Patch 1: updated according to kosaki's suggestion
>>>>>
>>>>> Patch 2: new patch, and update mce_bad_pages when removing memory.
>>>>>
>>>>> Patch 4: new patch, and fix a NR_FREE_PAGES mismatch, and this bug
>>>>> cause oom in my test.
>>>>>
>>>>> Patch 5: new patch, and fix a new bug. When repeating to online/offline
>>>>> pages, the free pages will continue to decrease.
>>>>>
>>>>> Wen Congyang (6):
>>>>> clear the memory to store struct page
>>>>> memory-hotplug: skip HWPoisoned page when offlining pages
>>>>> memory-hotplug: update mce_bad_pages when removing the memory
>>>>> memory-hotplug: auto offline page_cgroup when onlining memory block
>>>>> failed
>>>>> memory-hotplug: fix NR_FREE_PAGES mismatch
>>>>> memory-hotplug: allocate zone's pcp before onlining pages
>>>>>
>>>>> Yasuaki Ishimatsu (3):
>>>>> suppress "Device memoryX does not have a release() function" warning
>>>>> suppress "Device nodeX does not have a release() function" warning
>>>>> memory-hotplug: flush the work for the node when the node is offlined
>>>>>
>>>>> drivers/base/memory.c | 9 ++++++++-
>>>>> drivers/base/node.c | 11 +++++++++++
>>>>> include/linux/page-isolation.h | 10 ++++++----
>>>>> mm/memory-failure.c | 2 +-
>>>>> mm/memory_hotplug.c | 14 ++++++++------
>>>>> mm/page_alloc.c | 37 ++++++++++++++++++++++++++++---------
>>>>> mm/page_cgroup.c | 3 +++
>>>>> mm/page_isolation.c | 27 ++++++++++++++++++++-------
>>>>> mm/sparse.c | 22 +++++++++++++++++++++-
>>>>> 9 files changed, 106 insertions(+), 29 deletions(-)
>>>>>
>>>>
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>> Please read the FAQ at http://www.tux.org/lkml/
>>>>
>>>
>>>
>>>
>>
>
>
>

2012-10-22 22:38:47

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 0/9] bugfix for memory hotplug

On Fri, 19 Oct 2012 18:39:45 +0900
Yasuaki Ishimatsu <[email protected]> wrote:

> 2012/10/19 17:45, Wen Congyang wrote:
> > At 10/19/2012 04:19 PM, Yasuaki Ishimatsu Wrote:
> >> 2012/10/19 17:06, Yasuaki Ishimatsu wrote:
> >>> Hi Wen,
> >>>
> >>> Some bug fix patches have been merged into linux-next.
> >>> So the patches confuse me.
> >
> > Sorry, I don't check linux-next tree.
> >
> >>
> >> The following patches have been already merged into linux-next
> >> and mm-tree as long as I know.
> >>
> >>>> Wen Congyang (6):
> >>>> clear the memory to store struct page
> >>
> >>
> >>>> memory-hotplug: skip HWPoisoned page when offlining pages
> >>
> >> mm-tree
> >
> > Hmm, I don't find this patch in this URL:
> > http://www.ozlabs.org/~akpm/mmotm/broken-out/
> >
> > Do I miss something?
>
> But Andrew announced that the patch was merged in mm-tree.
> And you received the announcement.

mmotm is updated less frequently than http://ozlabs.org/~akpm/mmots/ so
it's best to check mmots to find out what's in there. Even mmots is
lagging recently, because of ongoing sched-numa damage (grump). But
that has all improved lately so we should be getting back on track.

And yes, I merged
memory-hotplug-skip-hwpoisoned-page-when-offlining-pages.patch on 18
October.

The memory hotplug patching is somewhat confusing at present, so
thanks for checking - please continue to do so!

2012-10-22 22:52:27

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 2/9] suppress "Device nodeX does not have a release() function" warning

On Fri, 19 Oct 2012 14:46:35 +0800
[email protected] wrote:

> From: Yasuaki Ishimatsu <[email protected]>
>
> When calling unregister_node(), the function shows following message at
> device_release().
>
> "Device 'node2' does not have a release() function, it is broken and must
> be fixed."
>
> The reason is node's device struct does not have a release() function.
>
> So the patch registers node_device_release() to the device's release()
> function for suppressing the warning message. Additionally, the patch adds
> memset() to initialize a node struct into register_node(). Because the node
> struct is part of node_devices[] array and it cannot be freed by
> node_device_release(). So if system reuses the node struct, it has a garbage.
>
> ...
>
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -252,6 +252,9 @@ static inline void hugetlb_register_node(struct node *node) {}
> static inline void hugetlb_unregister_node(struct node *node) {}
> #endif
>
> +static void node_device_release(struct device *dev)
> +{
> +}
>
> /*
> * register_node - Setup a sysfs device for a node.
> @@ -263,8 +266,11 @@ int register_node(struct node *node, int num, struct node *parent)
> {
> int error;
>
> + memset(node, 0, sizeof(*node));
> +
> node->dev.id = num;
> node->dev.bus = &node_subsys;
> + node->dev.release = node_device_release;
> error = device_register(&node->dev);
>
> if (!error){

Greg won't like that empty ->release function ;)

As you say, this device item does not reside in per-device dynamically
allocated memory - it is part of an externally managed array.

So a proper fix here would be to convert this storage so that it *is*
dynamically allocated on a per-device basis.

Or perhaps we should recognize that the whole kobject
get/put/release-on-last-put model is inappropriate for these objects,
and stop using it entirely.

>From Kosaki's comment, it does sound that we plan to take the first
option: convert to per-device dynamically allocated memory? If so, I
suggest that we just leave the warning as-is for now, until we fix
things proprely.

2012-10-22 23:00:38

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v3 2/9] suppress "Device nodeX does not have a release() function" warning

On Mon, Oct 22, 2012 at 03:52:24PM -0700, Andrew Morton wrote:
> On Fri, 19 Oct 2012 14:46:35 +0800
> [email protected] wrote:
>
> > From: Yasuaki Ishimatsu <[email protected]>
> >
> > When calling unregister_node(), the function shows following message at
> > device_release().
> >
> > "Device 'node2' does not have a release() function, it is broken and must
> > be fixed."
> >
> > The reason is node's device struct does not have a release() function.
> >
> > So the patch registers node_device_release() to the device's release()
> > function for suppressing the warning message. Additionally, the patch adds
> > memset() to initialize a node struct into register_node(). Because the node
> > struct is part of node_devices[] array and it cannot be freed by
> > node_device_release(). So if system reuses the node struct, it has a garbage.
> >
> > ...
> >
> > --- a/drivers/base/node.c
> > +++ b/drivers/base/node.c
> > @@ -252,6 +252,9 @@ static inline void hugetlb_register_node(struct node *node) {}
> > static inline void hugetlb_unregister_node(struct node *node) {}
> > #endif
> >
> > +static void node_device_release(struct device *dev)
> > +{
> > +}
> >
> > /*
> > * register_node - Setup a sysfs device for a node.
> > @@ -263,8 +266,11 @@ int register_node(struct node *node, int num, struct node *parent)
> > {
> > int error;
> >
> > + memset(node, 0, sizeof(*node));
> > +
> > node->dev.id = num;
> > node->dev.bus = &node_subsys;
> > + node->dev.release = node_device_release;
> > error = device_register(&node->dev);
> >
> > if (!error){
>
> Greg won't like that empty ->release function ;)
>
> As you say, this device item does not reside in per-device dynamically
> allocated memory - it is part of an externally managed array.
>
> So a proper fix here would be to convert this storage so that it *is*
> dynamically allocated on a per-device basis.

I thought we had this fixed up already?

> Or perhaps we should recognize that the whole kobject
> get/put/release-on-last-put model is inappropriate for these objects,
> and stop using it entirely.

Yes, you can do the same thing with dynamic struct devices for what you
need to do here, it should be easy to convert the code to use it.

> From Kosaki's comment, it does sound that we plan to take the first
> option: convert to per-device dynamically allocated memory? If so, I
> suggest that we just leave the warning as-is for now, until we fix
> things proprely.

Sounds good to me.

thanks,

greg k-h

2012-10-26 09:38:44

by Wen Congyang

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] clear the memory to store struct page

Hi, andrew morton:

This patch has been acked by kosaki motohiro. Is it OK to be merged
into -mm tree?

Thanks
Wen Congyang

At 10/19/2012 02:46 PM, [email protected] Wrote:
> From: Wen Congyang <[email protected]>
>
> If sparse memory vmemmap is enabled, we can't free the memory to store
> struct page when a memory device is hotremoved, because we may store
> struct page in the memory to manage the memory which doesn't belong
> to this memory device. When we hotadded this memory device again, we
> will reuse this memory to store struct page, and struct page may
> contain some obsolete information, and we will get bad-page state:
>
> [ 59.611278] init_memory_mapping: [mem 0x80000000-0x9fffffff]
> [ 59.637836] Built 2 zonelists in Node order, mobility grouping on. Total pages: 547617
> [ 59.638739] Policy zone: Normal
> [ 59.650840] BUG: Bad page state in process bash pfn:9b6dc
> [ 59.651124] page:ffffea0002200020 count:0 mapcount:0 mapping: (null) index:0xfdfdfdfdfdfdfdfd
> [ 59.651494] page flags: 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock)
> [ 59.653604] Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk libata virtio_pci virtio_ring virtio scsi_mod
> [ 59.656998] Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12
> [ 59.657172] Call Trace:
> [ 59.657275] [<ffffffff810e9b30>] ? bad_page+0xb0/0x100
> [ 59.657434] [<ffffffff810ea4c3>] ? free_pages_prepare+0xb3/0x100
> [ 59.657610] [<ffffffff810ea668>] ? free_hot_cold_page+0x48/0x1a0
> [ 59.657787] [<ffffffff8112cc08>] ? online_pages_range+0x68/0xa0
> [ 59.657961] [<ffffffff8112cba0>] ? __online_page_increment_counters+0x10/0x10
> [ 59.658162] [<ffffffff81045561>] ? walk_system_ram_range+0x101/0x110
> [ 59.658346] [<ffffffff814c4f95>] ? online_pages+0x1a5/0x2b0
> [ 59.658515] [<ffffffff8135663d>] ? __memory_block_change_state+0x20d/0x270
> [ 59.658710] [<ffffffff81356756>] ? store_mem_state+0xb6/0xf0
> [ 59.658878] [<ffffffff8119e482>] ? sysfs_write_file+0xd2/0x160
> [ 59.659052] [<ffffffff8113769a>] ? vfs_write+0xaa/0x160
> [ 59.659212] [<ffffffff81137977>] ? sys_write+0x47/0x90
> [ 59.659371] [<ffffffff814e2f25>] ? async_page_fault+0x25/0x30
> [ 59.659543] [<ffffffff814ea239>] ? system_call_fastpath+0x16/0x1b
> [ 59.659720] Disabling lock debugging due to kernel taint
>
> This patch clears the memory to store struct page to avoid unexpected error.
>
> CC: David Rientjes <[email protected]>
> CC: Jiang Liu <[email protected]>
> Cc: Minchan Kim <[email protected]>
> CC: Andrew Morton <[email protected]>
> CC: KOSAKI Motohiro <[email protected]>
> CC: Yasuaki Ishimatsu <[email protected]>
> Reported-by: Vasilis Liaskovitis <[email protected]>
> Signed-off-by: Wen Congyang <[email protected]>
> ---
> mm/sparse.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index fac95f2..0021265 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -638,7 +638,6 @@ static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
> got_map_page:
> ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
> got_map_ptr:
> - memset(ret, 0, memmap_size);
>
> return ret;
> }
> @@ -760,6 +759,8 @@ int __meminit sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
> goto out;
> }
>
> + memset(memmap, 0, sizeof(struct page) * nr_pages);
> +
> ms->section_mem_map |= SECTION_MARKED_PRESENT;
>
> ret = sparse_init_one_section(ms, section_nr, memmap, usemap);

2012-10-29 21:10:16

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] clear the memory to store struct page

On Fri, 26 Oct 2012 17:44:06 +0800
Wen Congyang <[email protected]> wrote:

> This patch has been acked by kosaki motohiro. Is it OK to be merged
> into -mm tree?

I'd already merged the v2 patchset when you later sent out the v3
patchset which contains some of the material from v2 plus more things.

I can drop all of v2 and remerge v3. But I see from the discussion
under "[PATCH v3 6/9] memory-hotplug: update mce_bad_pages when
removing the memory" that you intend to send out a v4 patchset.

This is all a bit of a mess. Piecemeal picking-and-choosing of various
patches from various iterations of the same patchset is confusing and
error-prone.

Please, take a look at the current -mm tree at
http://ozlabs.org/~akpm/mmots/ then come up with a plan for us. We can
either add new patches or we can drop old patches and replace them.

2012-10-30 02:12:34

by Wen Congyang

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] clear the memory to store struct page

At 10/30/2012 05:10 AM, Andrew Morton Wrote:
> On Fri, 26 Oct 2012 17:44:06 +0800
> Wen Congyang <[email protected]> wrote:
>
>> This patch has been acked by kosaki motohiro. Is it OK to be merged
>> into -mm tree?
>
> I'd already merged the v2 patchset when you later sent out the v3
> patchset which contains some of the material from v2 plus more things.
>
> I can drop all of v2 and remerge v3. But I see from the discussion
> under "[PATCH v3 6/9] memory-hotplug: update mce_bad_pages when
> removing the memory" that you intend to send out a v4 patchset.

OK, I will send out a v4 patchset. Do I need to resend the patch
which is in -mm tree and has no comment?

Thanks
Wen Congyang

>
> This is all a bit of a mess. Piecemeal picking-and-choosing of various
> patches from various iterations of the same patchset is confusing and
> error-prone.
>
> Please, take a look at the current -mm tree at
> http://ozlabs.org/~akpm/mmots/ then come up with a plan for us. We can
> either add new patches or we can drop old patches and replace them.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2012-10-30 02:41:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] clear the memory to store struct page

On Tue, 30 Oct 2012 10:18:17 +0800 Wen Congyang <[email protected]> wrote:

> At 10/30/2012 05:10 AM, Andrew Morton Wrote:
> > On Fri, 26 Oct 2012 17:44:06 +0800
> > Wen Congyang <[email protected]> wrote:
> >
> >> This patch has been acked by kosaki motohiro. Is it OK to be merged
> >> into -mm tree?
> >
> > I'd already merged the v2 patchset when you later sent out the v3
> > patchset which contains some of the material from v2 plus more things.
> >
> > I can drop all of v2 and remerge v3. But I see from the discussion
> > under "[PATCH v3 6/9] memory-hotplug: update mce_bad_pages when
> > removing the memory" that you intend to send out a v4 patchset.
>
> OK, I will send out a v4 patchset.

Thanks.

> Do I need to resend the patch
> which is in -mm tree and has no comment?

I wonder which patch that is!

Sure, send everything.

2012-10-30 02:42:38

by Wen Congyang

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] clear the memory to store struct page

At 10/30/2012 10:41 AM, Andrew Morton Wrote:
> On Tue, 30 Oct 2012 10:18:17 +0800 Wen Congyang <[email protected]> wrote:
>
>> At 10/30/2012 05:10 AM, Andrew Morton Wrote:
>>> On Fri, 26 Oct 2012 17:44:06 +0800
>>> Wen Congyang <[email protected]> wrote:
>>>
>>>> This patch has been acked by kosaki motohiro. Is it OK to be merged
>>>> into -mm tree?
>>>
>>> I'd already merged the v2 patchset when you later sent out the v3
>>> patchset which contains some of the material from v2 plus more things.
>>>
>>> I can drop all of v2 and remerge v3. But I see from the discussion
>>> under "[PATCH v3 6/9] memory-hotplug: update mce_bad_pages when
>>> removing the memory" that you intend to send out a v4 patchset.
>>
>> OK, I will send out a v4 patchset.
>
> Thanks.
>
>> Do I need to resend the patch
>> which is in -mm tree and has no comment?
>
> I wonder which patch that is!
>
> Sure, send everything.
>

OK, I will send everything soon.

Thanks
Wen Congyang