2007-10-18 03:21:30

by Yasunori Goto

[permalink] [raw]
Subject: [Patch 000/002](memory hotplug) Rearrange notifier of memory hotplug (take 2)

Hello.

This patch set is to rearrange event notifier for memory hotplug,
because the old notifier has some defects. For example, there is no
information like new memory's pfn and # of pages for callback functions.

Fortunately, nothing uses this notifier so far, there is no impact by
this change. (SLUB will use this after this patch set to make
kmem_cache_node structure).

In addition, descriptions of notifer is added to memory hotplug
document.

This patch was a part of patch set to make kmem_cache_node of SLUB
to avoid panic of memory online. But, I think this change becomes
not only for SLUB but also for others. So, I extracted this from it.

This patch set is for 2.6.23-mm1.
I tested this patch on my ia64 box.

Please apply.

Bye.

--
Yasunori Goto



2007-10-18 03:24:22

by Yasunori Goto

[permalink] [raw]
Subject: [Patch 001/002](memory hotplug) Make description of memory hotplug notifier in document


Add description about event notification callback routine to the document.

Signed-off-by: Yasunori Goto <[email protected]>

---
Documentation/memory-hotplug.txt | 58 ++++++++++++++++++++++++++++++++++++---
1 file changed, 55 insertions(+), 3 deletions(-)

Index: current/Documentation/memory-hotplug.txt
===================================================================
--- current.orig/Documentation/memory-hotplug.txt 2007-10-17 15:57:50.000000000 +0900
+++ current/Documentation/memory-hotplug.txt 2007-10-17 21:26:30.000000000 +0900
@@ -2,7 +2,8 @@
Memory Hotplug
==============

-Last Updated: Jul 28 2007
+Created: Jul 28 2007
+Add description of notifier of memory hotplug Oct 11 2007

This document is about memory hotplug including how-to-use and current status.
Because Memory Hotplug is still under development, contents of this text will
@@ -24,7 +25,8 @@ be changed often.
6.1 Memory offline and ZONE_MOVABLE
6.2. How to offline memory
7. Physical memory remove
-8. Future Work List
+8. Memory hotplug event notifier
+9. Future Work List

Note(1): x86_64's has special implementation for memory hotplug.
This text does not describe it.
@@ -307,8 +309,58 @@ Need more implementation yet....
- Notification completion of remove works by OS to firmware.
- Guard from remove if not yet.

+--------------------------------
+8. Memory hotplug event notifier
+--------------------------------
+Memory hotplug has event notifer. There are 6 types of notification.
+
+MEMORY_GOING_ONLINE
+ Generated before new memory becomes available in order to be able to
+ prepare subsystems to handle memory. The page allocator is still unable
+ to allocate from the new memory.
+
+MEMORY_CANCEL_ONLINE
+ Generated if MEMORY_GOING_ONLINE fails.
+
+MEMORY_ONLINE
+ Generated when memory has succesfully brought online. The callback may
+ allocate pages from the new memory.
+
+MEMORY_GOING_OFFLINE
+ Generated to begin the process of offlining memory. Allocations are no
+ longer possible from the memory but some of the memory to be offlined
+ is still in use. The callback can be used to free memory known to a
+ subsystem from the indicated memory section.
+
+MEMORY_CANCEL_OFFLINE
+ Generated if MEMORY_GOING_OFFLINE fails. Memory is available again from
+ the section that we attempted to offline.
+
+MEMORY_OFFLINE
+ Generated after offlining memory is complete.
+
+A callback routine can be registered by
+ hotplug_memory_notifier(callback_func, priority)
+
+The second argument of callback function (action) is event types of above.
+The third argument is passed by pointer of struct memory_notify.
+
+struct memory_notify {
+ unsigned long start_pfn;
+ unsigned long nr_pages;
+ int status_cahnge_nid;
+}
+
+start_pfn is start_pfn of online/offline memory.
+nr_pages is # of pages of online/offline memory.
+status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
+set/clear. It means a new(memoryless) node gets new memory by online and a
+node loses all memory. If this is -1, then nodemask status is not changed.
+If status_changed_nid >= 0, callback should create/discard structures for the
+node if necessary.
+
--------------
-8. Future Work
+9. Future Work
--------------
- allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like
sysctl or new control file.

--
Yasunori Goto


2007-10-18 03:24:40

by Yasunori Goto

[permalink] [raw]
Subject: [Patch 002/002](memory hotplug) rearrange patch for notifier of memory hotplug


Current memory notifier has some defects yet. (Fortunately, nothing uses it.)
This patch is to fix and rearrange for them.

- Add information of start_pfn, nr_pages, and node id if node status is
changes from/to memoryless node for callback functions.
Callbacks can't do anything without those information.
- Add notification going-online status.
It is necessary for creating per node structure before the node's
pages are available.
- Move GOING_OFFLINE status notification after page isolation.
It is good place for return memory like cache for callback,
because returned page is not used again.
- Make CANCEL events for rollingback when error occurs.
- Delete MEM_MAPPING_INVALID notification. It will be not used.
- Fix compile error of (un)register_memory_notifier().


Signed-off-by: Yasunori Goto <[email protected]>


---
drivers/base/memory.c | 9 +--------
include/linux/memory.h | 27 +++++++++++++++------------
mm/memory_hotplug.c | 48 +++++++++++++++++++++++++++++++++++++++++++++---
3 files changed, 61 insertions(+), 23 deletions(-)

Index: current/drivers/base/memory.c
===================================================================
--- current.orig/drivers/base/memory.c 2007-10-17 21:17:54.000000000 +0900
+++ current/drivers/base/memory.c 2007-10-17 21:21:30.000000000 +0900
@@ -137,7 +137,7 @@ static ssize_t show_mem_state(struct sys
return len;
}

-static inline int memory_notify(unsigned long val, void *v)
+int memory_notify(unsigned long val, void *v)
{
return blocking_notifier_call_chain(&memory_chain, val, v);
}
@@ -183,7 +183,6 @@ memory_block_action(struct memory_block
break;
case MEM_OFFLINE:
mem->state = MEM_GOING_OFFLINE;
- memory_notify(MEM_GOING_OFFLINE, NULL);
start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
ret = remove_memory(start_paddr,
PAGES_PER_SECTION << PAGE_SHIFT);
@@ -191,7 +190,6 @@ memory_block_action(struct memory_block
mem->state = old_state;
break;
}
- memory_notify(MEM_MAPPING_INVALID, NULL);
break;
default:
printk(KERN_WARNING "%s(%p, %ld) unknown action: %ld\n",
@@ -199,11 +197,6 @@ memory_block_action(struct memory_block
WARN_ON(1);
ret = -EINVAL;
}
- /*
- * For now, only notify on successful memory operations
- */
- if (!ret)
- memory_notify(action, NULL);

return ret;
}
Index: current/include/linux/memory.h
===================================================================
--- current.orig/include/linux/memory.h 2007-10-17 21:17:54.000000000 +0900
+++ current/include/linux/memory.h 2007-10-17 21:21:30.000000000 +0900
@@ -41,18 +41,15 @@ struct memory_block {
#define MEM_ONLINE (1<<0) /* exposed to userspace */
#define MEM_GOING_OFFLINE (1<<1) /* exposed to userspace */
#define MEM_OFFLINE (1<<2) /* exposed to userspace */
+#define MEM_GOING_ONLINE (1<<3)
+#define MEM_CANCEL_ONLINE (1<<4)
+#define MEM_CANCEL_OFFLINE (1<<5)

-/*
- * All of these states are currently kernel-internal for notifying
- * kernel components and architectures.
- *
- * For MEM_MAPPING_INVALID, all notifier chains with priority >0
- * are called before pfn_to_page() becomes invalid. The priority=0
- * entry is reserved for the function that actually makes
- * pfn_to_page() stop working. Any notifiers that want to be called
- * after that should have priority <0.
- */
-#define MEM_MAPPING_INVALID (1<<3)
+struct memory_notify {
+ unsigned long start_pfn;
+ unsigned long nr_pages;
+ int status_change_nid;
+};

struct notifier_block;
struct mem_section;
@@ -69,12 +66,18 @@ static inline int register_memory_notifi
static inline void unregister_memory_notifier(struct notifier_block *nb)
{
}
+static inline int memory_notify(unsigned long val, void *v)
+{
+ return 0;
+}
#else
+extern int register_memory_notifier(struct notifier_block *nb);
+extern void unregister_memory_notifier(struct notifier_block *nb);
extern int register_new_memory(struct mem_section *);
extern int unregister_memory_section(struct mem_section *);
extern int memory_dev_init(void);
extern int remove_memory_block(unsigned long, struct mem_section *, int);
-
+extern int memory_notify(unsigned long val, void *v);
#define CONFIG_MEM_BLOCK_SIZE (PAGES_PER_SECTION<<PAGE_SHIFT)


Index: current/mm/memory_hotplug.c
===================================================================
--- current.orig/mm/memory_hotplug.c 2007-10-17 21:17:54.000000000 +0900
+++ current/mm/memory_hotplug.c 2007-10-17 21:21:30.000000000 +0900
@@ -187,7 +187,24 @@ int online_pages(unsigned long pfn, unsi
unsigned long onlined_pages = 0;
struct zone *zone;
int need_zonelists_rebuild = 0;
+ int nid;
+ int ret;
+ struct memory_notify arg;
+
+ arg.start_pfn = pfn;
+ arg.nr_pages = nr_pages;
+ arg.status_change_nid = -1;
+
+ nid = page_to_nid(pfn_to_page(pfn));
+ if (node_present_pages(nid) == 0)
+ arg.status_change_nid = nid;

+ ret = memory_notify(MEM_GOING_ONLINE, &arg);
+ ret = notifier_to_errno(ret);
+ if (ret) {
+ memory_notify(MEM_CANCEL_ONLINE, &arg);
+ return ret;
+ }
/*
* This doesn't need a lock to do pfn_to_page().
* The section can't be removed here because of the
@@ -222,6 +239,10 @@ int online_pages(unsigned long pfn, unsi
build_all_zonelists();
vm_total_pages = nr_free_pagecache_pages();
writeback_set_ratelimit();
+
+ if (onlined_pages)
+ memory_notify(MEM_ONLINE, &arg);
+
return 0;
}
#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
@@ -467,8 +488,9 @@ int offline_pages(unsigned long start_pf
{
unsigned long pfn, nr_pages, expire;
long offlined_pages;
- int ret, drain, retry_max;
+ int ret, drain, retry_max, node;
struct zone *zone;
+ struct memory_notify arg;

BUG_ON(start_pfn >= end_pfn);
/* at least, alignment against pageblock is necessary */
@@ -480,11 +502,27 @@ int offline_pages(unsigned long start_pf
we assume this for now. .*/
if (!test_pages_in_a_zone(start_pfn, end_pfn))
return -EINVAL;
+
+ zone = page_zone(pfn_to_page(start_pfn));
+ node = zone_to_nid(zone);
+ nr_pages = end_pfn - start_pfn;
+
/* set above range as isolated */
ret = start_isolate_page_range(start_pfn, end_pfn);
if (ret)
return ret;
- nr_pages = end_pfn - start_pfn;
+
+ arg.start_pfn = start_pfn;
+ arg.nr_pages = nr_pages;
+ arg.status_change_nid = -1;
+ if (nr_pages >= node_present_pages(node))
+ arg.status_change_nid = node;
+
+ ret = memory_notify(MEM_GOING_OFFLINE, &arg);
+ ret = notifier_to_errno(ret);
+ if (ret)
+ goto failed_removal;
+
pfn = start_pfn;
expire = jiffies + timeout;
drain = 0;
@@ -539,20 +577,24 @@ repeat:
/* reset pagetype flags */
start_isolate_page_range(start_pfn, end_pfn);
/* removal success */
- zone = page_zone(pfn_to_page(start_pfn));
zone->present_pages -= offlined_pages;
zone->zone_pgdat->node_present_pages -= offlined_pages;
totalram_pages -= offlined_pages;
num_physpages -= offlined_pages;
+
vm_total_pages = nr_free_pagecache_pages();
writeback_set_ratelimit();
+
+ memory_notify(MEM_OFFLINE, &arg);
return 0;

failed_removal:
printk(KERN_INFO "memory offlining %lx to %lx failed\n",
start_pfn, end_pfn);
+ memory_notify(MEM_CANCEL_OFFLINE, &arg);
/* pushback to free area */
undo_isolate_page_range(start_pfn, end_pfn);
+
return ret;
}
#else

--
Yasunori Goto


2007-10-18 03:41:52

by Andrew Morton

[permalink] [raw]
Subject: Re: [Patch 002/002](memory hotplug) rearrange patch for notifier of memory hotplug

On Thu, 18 Oct 2007 12:23:34 +0900 Yasunori Goto <[email protected]> wrote:

> writeback_set_ratelimit();
> +
> + if (onlined_pages)
> + memory_notify(MEM_ONLINE, &arg);

perhaps that open-coded writeback_set_ratelimit() should become a
notifier callback.