2008-11-03 23:48:34

by Gary Hade

Subject: [PATCH] [REPOST #2] mm: show node to memory section relationship with symlinks in sysfs


Show node to memory section relationship with symlinks in sysfs

Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX. For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.
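
To illustrate how userspace might consume these links, here is a minimal C sketch. The helper names are hypothetical and not part of the patch. It assumes only the memoryY naming shown above and skips the other entries a node directory contains (such as meminfo or cpuN).

```c
#define _XOPEN_SOURCE 700	/* opendir(), readdir(), mkdtemp() */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse "memoryY" into Y; reject anything else (meminfo, cpu0, ...). */
int parse_memory_entry(const char *name, unsigned long *section)
{
	char *end;

	if (strncmp(name, "memory", 6) != 0 ||
	    name[6] < '0' || name[6] > '9')
		return -1;
	*section = strtoul(name + 6, &end, 10);
	return *end == '\0' ? 0 : -1;
}

static int cmp_ulong(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return (x > y) - (x < y);
}

/*
 * Collect the section numbers of the memoryY entries in node_dir, e.g.
 * "/sys/devices/system/node/node1", sorted ascending. Returns how many
 * were stored (at most max), or -1 if node_dir cannot be opened.
 */
int list_node_sections(const char *node_dir, unsigned long *out, size_t max)
{
	DIR *dir;
	struct dirent *de;
	size_t n = 0;

	assert(out != NULL || max == 0);
	dir = opendir(node_dir);
	if (!dir)
		return -1;
	while (n < max && (de = readdir(dir)) != NULL) {
		unsigned long section;

		if (parse_memory_entry(de->d_name, &section) == 0)
			out[n++] = section;
	}
	closedir(dir);
	qsort(out, n, sizeof(*out), cmp_ulong);
	return (int)n;
}
```

On the example above, list_node_sections("/sys/devices/system/node/node1", buf, n) would be expected to report 135 among the node's sections. Because the entries are links named after the section, no file needs to be opened just to learn the section number.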

Also revises the documentation to cover this change and updates
Documentation/ABI/testing/sysfs-devices-memory to describe the memory
hot-remove files 'phys_device', 'phys_index', and 'state', which were
previously not described there.

Beyond the general principle that users should get as much physical
location information as possible for resources that can be hot-added
or hot-removed, this change provides the following user benefits
(likely not an exhaustive list):
Immediate:
- Provides information needed to determine the specific node
on which a defective DIMM is located. This will reduce system
downtime when the node or defective DIMM is swapped out.
- Prevents unintended onlining of a memory section that was
previously offlined due to a defective DIMM. This could happen
during node hot-add when the user or a node hot-add assist script
onlines _all_ offlined sections because it cannot identify the
specific memory sections located on the hot-added node. The
consequences of reintroducing the defective memory could be ugly.
- Provides information needed to vary the amount and distribution
of memory on specific nodes for testing or debugging purposes.
Future:
- Will provide information needed to identify the memory
sections that need to be offlined prior to physical removal
of a specific node.
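
The second immediate benefit comes down to a hot-add script touching only the sections linked under the node that was just added. The C sketch below shows that policy. online_node_sections() and its helpers are hypothetical names; the sketch assumes the caller passes a node directory such as /sys/devices/system/node/node1 and is allowed to write the state files (normally only root is). Sections not linked under that node, including one deliberately offlined on another node because of a bad DIMM, are never written.

```c
#define _XOPEN_SOURCE 700	/* opendir(), lstat(), access(), symlink() */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read the first line of path into buf with the newline stripped. */
static int read_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fgets(buf, (int)len, f) != NULL;
	fclose(f);
	if (!ok)
		return -1;
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Write s to path. A rejected sysfs write may only surface when
 * stdio flushes the buffer, so the fclose() result is checked too. */
static int write_str(const char *path, const char *s)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(s, f) != EOF;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Online every offline section linked under node_dir; returns how many
 * sections were switched to online, or -1 if node_dir can't be opened. */
int online_node_sections(const char *node_dir)
{
	DIR *dir;
	struct dirent *de;
	int onlined = 0;

	assert(node_dir != NULL);
	dir = opendir(node_dir);
	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		char link[1024], state_path[1100], state[32];
		struct stat st;

		/* only memoryY entries: skips ".", meminfo, cpuN, ... */
		if (strncmp(de->d_name, "memory", 6) != 0 ||
		    de->d_name[6] < '0' || de->d_name[6] > '9')
			continue;
		snprintf(link, sizeof(link), "%s/%s", node_dir, de->d_name);
		/* the patch creates these entries as symlinks */
		if (lstat(link, &st) != 0 || !S_ISLNK(st.st_mode))
			continue;
		snprintf(state_path, sizeof(state_path), "%s/state", link);
		if (read_line(state_path, state, sizeof(state)) != 0 ||
		    strcmp(state, "offline") != 0)
			continue;	/* already online, or unreadable */
		if (access(state_path, W_OK) != 0)
			continue;	/* e.g. not running as root */
		if (write_str(state_path, "online") == 0)
			onlined++;
	}
	closedir(dir);
	return onlined;
}
```

Opening "<node_dir>/memoryY/state" follows the link into /sys/devices/system/memory/memoryY, so the script never has to translate a node into section numbers itself.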

Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems. Symlink creation during physical
memory hot-add was tested on a 2-node x86_64 system.

Supersedes the "mm: show memory section to node relationship in sysfs"
patch posted on 05 Sept 2008, which created 'node' files containing
the node ID in /sys/devices/system/memory/memoryX instead of symlinks.
Changed from files to symlinks based on feedback that symlinks are
more consistent with the sysfs way.

Supersedes the "mm: show node to memory section relationship with
symlinks in sysfs" patch posted on 29 Sept 2008 to address a problem
reported by Yasunori Goto, where an incorrect symlink was created
because of a range of uninitialized pages at the beginning of a
section. That problem, which produced a symlink in
/sys/devices/system/node/node0 that incorrectly referenced a mem
section located on node1, is corrected in this version. This version
also covers the case where a mem section spans multiple nodes.
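
Yasunori Goto's problem and the multi-node case can be modelled in a few lines of userspace C. The sketch is hypothetical (no such helpers exist in the patch): page_nid[] stands in for what get_nid_for_pfn() would return across one section, with -1 for a page whose struct page was never initialized. It also assumes, consistent with the node0 symptom described above, that a naive lookup on such a page reports node 0. Checking only the first pfn then picks the wrong node, while scanning every pfn and skipping uninitialized pages finds every node the section actually spans.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical model: page_nid[i] is the node of the i-th pfn of one
 * memory section, or -1 if its struct page was never initialized.
 */

/* Naive lookup: trust the first pfn only. For this model an
 * uninitialized page is assumed to read back as node 0. */
int naive_first_pfn_nid(const int *page_nid, size_t npages)
{
	assert(npages > 0);
	return page_nid[0] < 0 ? 0 : page_nid[0];
}

/* Scan every pfn, skip uninitialized pages, and set one bit per node
 * the section spans. nid is signed so that the "< 0" test can fire. */
unsigned long spanned_node_mask(const int *page_nid, size_t npages)
{
	unsigned long mask = 0;
	size_t i;

	for (i = 0; i < npages; i++) {
		int nid = page_nid[i];

		if (nid < 0)
			continue;
		assert(nid < (int)(sizeof(mask) * CHAR_BIT));
		mask |= 1UL << nid;
	}
	return mask;
}
```

With page_nid[] = {-1, -1, -1, 1, 1, 1, 1, 1}, naive_first_pfn_nid() returns 0 while spanned_node_mask() returns only bit 1, mirroring the bogus node0 link versus the correct node1 one.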

Supersedes the "mm: show node to memory section relationship with
symlinks in sysfs" patch posted on 09 Oct 2008 to add the usefulness
information requested by Andrew Morton and to update the patch to apply
cleanly to 2.6.28-rc3 and 2.6-git. Code is unchanged.

Signed-off-by: Gary Hade <[email protected]>
Signed-off-by: Badari Pulavarty <[email protected]>

---
Documentation/ABI/testing/sysfs-devices-memory | 51 +++++++
Documentation/memory-hotplug.txt | 16 +-
arch/ia64/mm/init.c | 2
arch/powerpc/mm/mem.c | 2
arch/s390/mm/init.c | 2
arch/sh/mm/init.c | 3
arch/x86/mm/init_32.c | 2
arch/x86/mm/init_64.c | 2
drivers/base/memory.c | 19 +-
drivers/base/node.c | 100 +++++++++++++++
include/linux/memory.h | 6
include/linux/memory_hotplug.h | 2
include/linux/node.h | 13 +
mm/memory_hotplug.c | 9 -
14 files changed, 205 insertions(+), 24 deletions(-)

Index: linux-2.6.28-rc3/Documentation/ABI/testing/sysfs-devices-memory
===================================================================
--- linux-2.6.28-rc3.orig/Documentation/ABI/testing/sysfs-devices-memory 2008-11-03 09:25:05.000000000 -0800
+++ linux-2.6.28-rc3/Documentation/ABI/testing/sysfs-devices-memory 2008-11-03 09:25:33.000000000 -0800
@@ -6,7 +6,6 @@ Description:
internal state of the kernel memory blocks. Files could be
added or removed dynamically to represent hot-add/remove
operations.
-
Users: hotplug memory add/remove tools
https://w3.opensource.ibm.com/projects/powerpc-utils/

@@ -19,6 +18,56 @@ Description:
This is useful for a user-level agent to determine
identify removable sections of the memory before attempting
potentially expensive hot-remove memory operation
+Users: hotplug memory remove tools
+ https://w3.opensource.ibm.com/projects/powerpc-utils/
+
+What: /sys/devices/system/memory/memoryX/phys_device
+Date: September 2008
+Contact: Badari Pulavarty <[email protected]>
+Description:
+ The file /sys/devices/system/memory/memoryX/phys_device
+ is read-only and is designed to show the name of the physical
+ memory device. Implementation is currently incomplete.

+What: /sys/devices/system/memory/memoryX/phys_index
+Date: September 2008
+Contact: Badari Pulavarty <[email protected]>
+Description:
+ The file /sys/devices/system/memory/memoryX/phys_index
+ is read-only and contains the section ID in hexadecimal
+ which is equivalent to decimal X contained in the
+ memory section directory name.
+
+What: /sys/devices/system/memory/memoryX/state
+Date: September 2008
+Contact: Badari Pulavarty <[email protected]>
+Description:
+ The file /sys/devices/system/memory/memoryX/state
+ is read-write. When read, its contents show the
+ online/offline state of the memory section. When written,
+ root can toggle the online/offline state of a removable
+ memory section (see removable file description above)
+ using the following commands.
+ # echo online > /sys/devices/system/memory/memoryX/state
+ # echo offline > /sys/devices/system/memory/memoryX/state
+
+ For example, if /sys/devices/system/memory/memory22/removable
+ contains a value of 1 and
+ /sys/devices/system/memory/memory22/state contains the
+ string "online", the following command can be executed
+ by root to offline that section.
+ # echo offline > /sys/devices/system/memory/memory22/state
Users: hotplug memory remove tools
https://w3.opensource.ibm.com/projects/powerpc-utils/
+
+What: /sys/devices/system/node/nodeX/memoryY
+Date: September 2008
+Contact: Gary Hade <[email protected]>
+Description:
+ When CONFIG_NUMA is enabled
+ /sys/devices/system/node/nodeX/memoryY is a symbolic link that
+ points to the corresponding /sys/devices/system/memory/memoryY
+ memory section directory. For example, the following symbolic
+ link is created for memory section 9 on node0.
+ /sys/devices/system/node/node0/memory9 -> ../../memory/memory9
+
Index: linux-2.6.28-rc3/Documentation/memory-hotplug.txt
===================================================================
--- linux-2.6.28-rc3.orig/Documentation/memory-hotplug.txt 2008-11-03 09:25:05.000000000 -0800
+++ linux-2.6.28-rc3/Documentation/memory-hotplug.txt 2008-11-03 09:25:33.000000000 -0800
@@ -124,7 +124,7 @@ config options.
This option can be kernel module too.

--------------------------------
-3 sysfs files for memory hotplug
+4 sysfs files for memory hotplug
--------------------------------
All sections have their device information under /sys/devices/system/memory as

@@ -138,11 +138,12 @@ For example, assume 1GiB section size. A
(0x100000000 / 1Gib = 4)
This device covers address range [0x100000000 ... 0x140000000)

-Under each section, you can see 3 files.
+Under each section, you can see 4 files.

/sys/devices/system/memory/memoryXXX/phys_index
/sys/devices/system/memory/memoryXXX/phys_device
/sys/devices/system/memory/memoryXXX/state
+/sys/devices/system/memory/memoryXXX/removable

'phys_index' : read-only and contains section id, same as XXX.
'state' : read-write
@@ -150,10 +151,20 @@ Under each section, you can see 3 files.
at write: user can specify "online", "offline" command
'phys_device': read-only: designed to show the name of physical memory device.
This is not well implemented now.
+'removable' : read-only: contains an integer value indicating
+ whether the memory section is removable or not
+ removable. A value of 1 indicates that the memory
+ section is removable and a value of 0 indicates that
+ it is not removable.

NOTE:
These directories/files appear after physical memory hotplug phase.

+If CONFIG_NUMA is enabled the
+/sys/devices/system/memory/memoryXXX memory section
+directories can also be accessed via symbolic links located in
+the /sys/devices/system/node/node* directories. For example:
+/sys/devices/system/node/node0/memory9 -> ../../memory/memory9

--------------------------------
4. Physical memory hot-add phase
@@ -365,7 +376,6 @@ node if necessary.
- allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like
sysctl or new control file.
- showing memory section and physical device relationship.
- - showing memory section and node relationship (maybe good for NUMA)
- showing memory section is under ZONE_MOVABLE or not
- test and make it better memory offlining.
- support HugeTLB page migration and offlining.
Index: linux-2.6.28-rc3/arch/ia64/mm/init.c
===================================================================
--- linux-2.6.28-rc3.orig/arch/ia64/mm/init.c 2008-11-03 09:25:05.000000000 -0800
+++ linux-2.6.28-rc3/arch/ia64/mm/init.c 2008-11-03 09:25:33.000000000 -0800
@@ -692,7 +692,7 @@ int arch_add_memory(int nid, u64 start,
pgdat = NODE_DATA(nid);

zone = pgdat->node_zones + ZONE_NORMAL;
- ret = __add_pages(zone, start_pfn, nr_pages);
+ ret = __add_pages(nid, zone, start_pfn, nr_pages);

if (ret)
printk("%s: Problem encountered in __add_pages() as ret=%d\n",
Index: linux-2.6.28-rc3/arch/powerpc/mm/mem.c
===================================================================
--- linux-2.6.28-rc3.orig/arch/powerpc/mm/mem.c 2008-11-03 09:25:05.000000000 -0800
+++ linux-2.6.28-rc3/arch/powerpc/mm/mem.c 2008-11-03 09:25:33.000000000 -0800
@@ -132,7 +132,7 @@ int arch_add_memory(int nid, u64 start,
/* this should work for most non-highmem platforms */
zone = pgdata->node_zones;

- return __add_pages(zone, start_pfn, nr_pages);
+ return __add_pages(nid, zone, start_pfn, nr_pages);
}
#endif /* CONFIG_MEMORY_HOTPLUG */

Index: linux-2.6.28-rc3/arch/s390/mm/init.c
===================================================================
--- linux-2.6.28-rc3.orig/arch/s390/mm/init.c 2008-11-03 09:25:05.000000000 -0800
+++ linux-2.6.28-rc3/arch/s390/mm/init.c 2008-11-03 09:25:33.000000000 -0800
@@ -183,7 +183,7 @@ int arch_add_memory(int nid, u64 start,
rc = vmem_add_mapping(start, size);
if (rc)
return rc;
- rc = __add_pages(zone, PFN_DOWN(start), PFN_DOWN(size));
+ rc = __add_pages(nid, zone, PFN_DOWN(start), PFN_DOWN(size));
if (rc)
vmem_remove_mapping(start, size);
return rc;
Index: linux-2.6.28-rc3/arch/sh/mm/init.c
===================================================================
--- linux-2.6.28-rc3.orig/arch/sh/mm/init.c 2008-11-03 09:25:05.000000000 -0800
+++ linux-2.6.28-rc3/arch/sh/mm/init.c 2008-11-03 09:25:33.000000000 -0800
@@ -305,7 +305,8 @@ int arch_add_memory(int nid, u64 start,
pgdat = NODE_DATA(nid);

/* We only have ZONE_NORMAL, so this is easy.. */
- ret = __add_pages(pgdat->node_zones + ZONE_NORMAL, start_pfn, nr_pages);
+ ret = __add_pages(nid, pgdat->node_zones + ZONE_NORMAL,
+ start_pfn, nr_pages);
if (unlikely(ret))
printk("%s: Failed, __add_pages() == %d\n", __func__, ret);

Index: linux-2.6.28-rc3/arch/x86/mm/init_32.c
===================================================================
--- linux-2.6.28-rc3.orig/arch/x86/mm/init_32.c 2008-11-03 09:25:05.000000000 -0800
+++ linux-2.6.28-rc3/arch/x86/mm/init_32.c 2008-11-03 09:25:33.000000000 -0800
@@ -1063,7 +1063,7 @@ int arch_add_memory(int nid, u64 start,
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;

- return __add_pages(zone, start_pfn, nr_pages);
+ return __add_pages(nid, zone, start_pfn, nr_pages);
}
#endif

Index: linux-2.6.28-rc3/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.28-rc3.orig/arch/x86/mm/init_64.c 2008-11-03 09:25:05.000000000 -0800
+++ linux-2.6.28-rc3/arch/x86/mm/init_64.c 2008-11-03 09:26:29.000000000 -0800
@@ -857,7 +857,7 @@ int arch_add_memory(int nid, u64 start,
if (last_mapped_pfn > max_pfn_mapped)
max_pfn_mapped = last_mapped_pfn;

- ret = __add_pages(zone, start_pfn, nr_pages);
+ ret = __add_pages(nid, zone, start_pfn, nr_pages);
WARN_ON_ONCE(ret);

return ret;
Index: linux-2.6.28-rc3/drivers/base/memory.c
===================================================================
--- linux-2.6.28-rc3.orig/drivers/base/memory.c 2008-11-03 09:25:05.000000000 -0800
+++ linux-2.6.28-rc3/drivers/base/memory.c 2008-11-03 09:25:33.000000000 -0800
@@ -347,8 +347,9 @@ static inline int memory_probe_init(void
* section belongs to...
*/

-static int add_memory_block(unsigned long node_id, struct mem_section *section,
- unsigned long state, int phys_device)
+static int add_memory_block(int nid, struct mem_section *section,
+ unsigned long state, int phys_device,
+ enum mem_add_context context)
{
struct memory_block *mem = kzalloc(sizeof(*mem), GFP_KERNEL);
int ret = 0;
@@ -370,6 +371,10 @@ static int add_memory_block(unsigned lon
ret = mem_create_simple_file(mem, phys_device);
if (!ret)
ret = mem_create_simple_file(mem, removable);
+ if (!ret) {
+ if (context == HOTPLUG)
+ ret = register_mem_sect_under_node(mem, nid);
+ }

return ret;
}
@@ -382,7 +387,7 @@ static int add_memory_block(unsigned lon
*
* This could be made generic for all sysdev classes.
*/
-static struct memory_block *find_memory_block(struct mem_section *section)
+struct memory_block *find_memory_block(struct mem_section *section)
{
struct kobject *kobj;
struct sys_device *sysdev;
@@ -411,6 +416,7 @@ int remove_memory_block(unsigned long no
struct memory_block *mem;

mem = find_memory_block(section);
+ unregister_mem_sect_under_nodes(mem);
mem_remove_simple_file(mem, phys_index);
mem_remove_simple_file(mem, state);
mem_remove_simple_file(mem, phys_device);
@@ -424,9 +430,9 @@ int remove_memory_block(unsigned long no
* need an interface for the VM to add new memory regions,
* but without onlining it.
*/
-int register_new_memory(struct mem_section *section)
+int register_new_memory(int nid, struct mem_section *section)
{
- return add_memory_block(0, section, MEM_OFFLINE, 0);
+ return add_memory_block(nid, section, MEM_OFFLINE, 0, HOTPLUG);
}

int unregister_memory_section(struct mem_section *section)
@@ -458,7 +464,8 @@ int __init memory_dev_init(void)
for (i = 0; i < NR_MEM_SECTIONS; i++) {
if (!present_section_nr(i))
continue;
- err = add_memory_block(0, __nr_to_section(i), MEM_ONLINE, 0);
+ err = add_memory_block(0, __nr_to_section(i), MEM_ONLINE,
+ 0, BOOT);
if (!ret)
ret = err;
}
Index: linux-2.6.28-rc3/drivers/base/node.c
===================================================================
--- linux-2.6.28-rc3.orig/drivers/base/node.c 2008-11-03 09:25:05.000000000 -0800
+++ linux-2.6.28-rc3/drivers/base/node.c 2008-11-03 09:25:33.000000000 -0800
@@ -6,6 +6,7 @@
#include <linux/module.h>
#include <linux/init.h>
#include <linux/mm.h>
+#include <linux/memory.h>
#include <linux/node.h>
#include <linux/hugetlb.h>
#include <linux/cpumask.h>
@@ -248,6 +249,102 @@ int unregister_cpu_under_node(unsigned i
return 0;
}

+#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+#define page_initialized(page) (page->lru.next)
+
+static int get_nid_for_pfn(unsigned long pfn)
+{
+ struct page *page;
+
+ if (!pfn_valid_within(pfn))
+ return -1;
+ page = pfn_to_page(pfn);
+ if (!page_initialized(page))
+ return -1;
+ return pfn_to_nid(pfn);
+}
+
+/* register memory section under specified node if it spans that node */
+int register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
+{
+ unsigned long pfn, sect_start_pfn, sect_end_pfn;
+
+ if (!mem_blk)
+ return -EFAULT;
+ if (!node_online(nid))
+ return 0;
+ sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
+ sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
+ for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
+ int page_nid;
+
+ page_nid = get_nid_for_pfn(pfn);
+ if (page_nid < 0)
+ continue;
+ if (page_nid != nid)
+ continue;
+ return sysfs_create_link_nowarn(&node_devices[nid].sysdev.kobj,
+ &mem_blk->sysdev.kobj,
+ kobject_name(&mem_blk->sysdev.kobj));
+ }
+ /* mem section does not span the specified node */
+ return 0;
+}
+
+/* unregister memory section under all nodes that it spans */
+int unregister_mem_sect_under_nodes(struct memory_block *mem_blk)
+{
+ nodemask_t unlinked_nodes;
+ unsigned long pfn, sect_start_pfn, sect_end_pfn;
+
+ if (!mem_blk)
+ return -EFAULT;
+ nodes_clear(unlinked_nodes);
+ sect_start_pfn = section_nr_to_pfn(mem_blk->phys_index);
+ sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
+ for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
+ int nid;
+
+ nid = get_nid_for_pfn(pfn);
+ if (nid < 0)
+ continue;
+ if (!node_online(nid))
+ continue;
+ if (node_test_and_set(nid, unlinked_nodes))
+ continue;
+ sysfs_remove_link(&node_devices[nid].sysdev.kobj,
+ kobject_name(&mem_blk->sysdev.kobj));
+ }
+ return 0;
+}
+
+static int link_mem_sections(int nid)
+{
+ unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
+ unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
+ unsigned long pfn;
+ int err = 0;
+
+ for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+ unsigned long section_nr = pfn_to_section_nr(pfn);
+ struct mem_section *mem_sect;
+ struct memory_block *mem_blk;
+ int ret;
+
+ if (!present_section_nr(section_nr))
+ continue;
+ mem_sect = __nr_to_section(section_nr);
+ mem_blk = find_memory_block(mem_sect);
+ ret = register_mem_sect_under_node(mem_blk, nid);
+ if (!err)
+ err = ret;
+ }
+ return err;
+}
+#else
+static int link_mem_sections(int nid) { return 0; }
+#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
+
int register_one_node(int nid)
{
int error = 0;
@@ -267,6 +364,9 @@ int register_one_node(int nid)
if (cpu_to_node(cpu) == nid)
register_cpu_under_node(cpu, nid);
}
+
+ /* link memory sections under this node */
+ error = link_mem_sections(nid);
}

return error;
Index: linux-2.6.28-rc3/include/linux/memory.h
===================================================================
--- linux-2.6.28-rc3.orig/include/linux/memory.h 2008-11-03 09:25:05.000000000 -0800
+++ linux-2.6.28-rc3/include/linux/memory.h 2008-11-03 09:25:33.000000000 -0800
@@ -79,14 +79,14 @@ static inline int memory_notify(unsigned
#else
extern int register_memory_notifier(struct notifier_block *nb);
extern void unregister_memory_notifier(struct notifier_block *nb);
-extern int register_new_memory(struct mem_section *);
+extern int register_new_memory(int, struct mem_section *);
extern int unregister_memory_section(struct mem_section *);
extern int memory_dev_init(void);
extern int remove_memory_block(unsigned long, struct mem_section *, int);
extern int memory_notify(unsigned long val, void *v);
+extern struct memory_block *find_memory_block(struct mem_section *);
#define CONFIG_MEM_BLOCK_SIZE (PAGES_PER_SECTION<<PAGE_SHIFT)
-
-
+enum mem_add_context { BOOT, HOTPLUG };
#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */

#ifdef CONFIG_MEMORY_HOTPLUG
Index: linux-2.6.28-rc3/include/linux/memory_hotplug.h
===================================================================
--- linux-2.6.28-rc3.orig/include/linux/memory_hotplug.h 2008-11-03 09:25:05.000000000 -0800
+++ linux-2.6.28-rc3/include/linux/memory_hotplug.h 2008-11-03 09:25:33.000000000 -0800
@@ -72,7 +72,7 @@ extern void __offline_isolated_pages(uns
extern int offline_pages(unsigned long, unsigned long, unsigned long);

/* reasonably generic interface to expand the physical pages in a zone */
-extern int __add_pages(struct zone *zone, unsigned long start_pfn,
+extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages);
extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages);
Index: linux-2.6.28-rc3/include/linux/node.h
===================================================================
--- linux-2.6.28-rc3.orig/include/linux/node.h 2008-11-03 09:25:05.000000000 -0800
+++ linux-2.6.28-rc3/include/linux/node.h 2008-11-03 09:25:33.000000000 -0800
@@ -26,6 +26,7 @@ struct node {
struct sys_device sysdev;
};

+struct memory_block;
extern struct node node_devices[];

extern int register_node(struct node *, int, struct node *);
@@ -35,6 +36,9 @@ extern int register_one_node(int nid);
extern void unregister_one_node(int nid);
extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
+extern int register_mem_sect_under_node(struct memory_block *mem_blk,
+ int nid);
+extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk);
#else
static inline int register_one_node(int nid)
{
@@ -52,6 +56,15 @@ static inline int unregister_cpu_under_n
{
return 0;
}
+static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
+ int nid)
+{
+ return 0;
+}
+static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk)
+{
+ return 0;
+}
#endif

#define to_node(sys_device) container_of(sys_device, struct node, sysdev)
Index: linux-2.6.28-rc3/mm/memory_hotplug.c
===================================================================
--- linux-2.6.28-rc3.orig/mm/memory_hotplug.c 2008-11-03 09:25:05.000000000 -0800
+++ linux-2.6.28-rc3/mm/memory_hotplug.c 2008-11-03 09:25:33.000000000 -0800
@@ -217,7 +217,8 @@ static int __add_zone(struct zone *zone,
return 0;
}

-static int __add_section(struct zone *zone, unsigned long phys_start_pfn)
+static int __add_section(int nid, struct zone *zone,
+ unsigned long phys_start_pfn)
{
int nr_pages = PAGES_PER_SECTION;
int ret;
@@ -235,7 +236,7 @@ static int __add_section(struct zone *zo
if (ret < 0)
return ret;

- return register_new_memory(__pfn_to_section(phys_start_pfn));
+ return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
}

#ifdef CONFIG_SPARSEMEM_VMEMMAP
@@ -274,7 +275,7 @@ static int __remove_section(struct zone
* call this function after deciding the zone to which to
* add the new pages.
*/
-int __add_pages(struct zone *zone, unsigned long phys_start_pfn,
+int __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
unsigned long nr_pages)
{
unsigned long i;
@@ -285,7 +286,7 @@ int __add_pages(struct zone *zone, unsig
end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);

for (i = start_sec; i <= end_sec; i++) {
- err = __add_section(zone, i << PFN_SECTION_SHIFT);
+ err = __add_section(nid, zone, i << PFN_SECTION_SHIFT);

/*
* EEXIST is finally dealt with by ioresource collision
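
The ABI entry added for phys_index says the file holds, in hexadecimal, the same section number that the memoryX directory name carries in decimal. A small, hypothetical userspace check of that relationship follows; the function name is illustrative, and rather than assuming an exact output format it tolerates zero padding and a trailing newline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical consistency check: "memoryX" carries X in decimal, and
 * the phys_index file in that directory should carry X in hex.
 */
bool phys_index_matches(const char *dir_name, const char *phys_index)
{
	unsigned long dec, hex;
	char *end;

	assert(dir_name != NULL && phys_index != NULL);
	if (strncmp(dir_name, "memory", 6) != 0 ||
	    dir_name[6] < '0' || dir_name[6] > '9')
		return false;
	dec = strtoul(dir_name + 6, &end, 10);
	if (*end != '\0')
		return false;

	/* accept optional zero padding and a trailing newline */
	hex = strtoul(phys_index, &end, 16);
	if (end == phys_index || (*end != '\0' && *end != '\n'))
		return false;
	return dec == hex;
}
```

For example, phys_index_matches("memory22", "00000016\n") is true because 0x16 is 22, whereas a phys_index of "22" would not match, since 0x22 is 34.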


2008-11-04 09:17:48

by Ingo Molnar

Subject: Re: [PATCH] [REPOST #2] mm: show node to memory section relationship with symlinks in sysfs


* Gary Hade <[email protected]> wrote:

> Show node to memory section relationship with symlinks in sysfs

nice change.

> arch/x86/mm/init_32.c | 2
> arch/x86/mm/init_64.c | 2

Acked-by: Ingo Molnar <[email protected]>

Ingo

2008-11-05 20:36:41

by Andrew Morton

Subject: Re: [PATCH] [REPOST #2] mm: show node to memory section relationship with symlinks in sysfs

On Mon, 3 Nov 2008 15:48:08 -0800
Gary Hade <[email protected]> wrote:

>
> Show node to memory section relationship with symlinks in sysfs
>
> ...

Dumb question: why do this with a symlink forest instead of, say, cat
/proc/sys/vm/mem-sections?

2008-11-05 21:05:21

by Dave Hansen

Subject: Re: [PATCH] [REPOST #2] mm: show node to memory section relationship with symlinks in sysfs

On Wed, 2008-11-05 at 12:36 -0800, Andrew Morton wrote:
> Dumb question: why do this with a symlink forest instead of, say, cat
> /proc/sys/vm/mem-sections?

The basic problem is that we on/offline memory based on sections and not
nodes. But, physically, people care about nodes.

So, the question we're answering is "to which sections does this node's
memory belong?". We could just put all this data in one big file and
have:

$ cat /proc/sys/vm/mem-sections
node: section numbers
0: 1 2 3 4 5
1: 5 6 7 8
2: 99 100 101 102

But, we have the nodes in sysfs and we also have the sections in sysfs
and I don't want Greg to be mean to me. He's scary. We could simply
dump the section numbers in sysfs, but the first thing userspace is
going to do is:

for section in /sys/devices/system/node/node1/memory*; do
nr=$(cat $section)
cat foo > /sys/devices/system/memory/memory$nr/bar
done

Making the symlinks makes it harder for us to screw this process up,
both in the kernel and in userspace. Plus, symlinks are easy to code up
in sysfs.

-- Dave

2008-11-05 22:50:29

by Gary Hade

Subject: Re: [PATCH] [REPOST #2] mm: show node to memory section relationship with symlinks in sysfs

On Wed, Nov 05, 2008 at 01:03:44PM -0800, Dave Hansen wrote:
> On Wed, 2008-11-05 at 12:36 -0800, Andrew Morton wrote:
> > Dumb question: why do this with a symlink forest instead of, say, cat
> > /proc/sys/vm/mem-sections?
>
> ...
>
> Making the symlinks makes it harder for us to screw this process up,
> both in the kernel and in userspace. Plus, symlinks are easy to code up
> in sysfs.

The new symlinks to the mem section directories from within
the node directories are also consistent with the precedent set
by the symlinks to the CPU directories in these same locations.

Gary

--
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503 IBM T/L: 775-4503
[email protected]
http://www.ibm.com/linux/ltc

2008-11-12 22:15:22

by Badari Pulavarty

Subject: Re: [PATCH] [REPOST #2] mm: show node to memory section relationship with symlinks in sysfs

On Mon, 2008-11-03 at 15:48 -0800, Gary Hade wrote:
> Show node to memory section relationship with symlinks in sysfs
>
> Add /sys/devices/system/node/nodeX/memoryY symlinks for all
> the memory sections located on nodeX. For example:
> /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
> indicates that memory section 135 resides on node1.
>
> Also revises documentation to cover this change as well as updating
> Documentation/ABI/testing/sysfs-devices-memory to include descriptions
> of memory hotremove files 'phys_device', 'phys_index', and 'state'
> that were previously not described there.
>
> In addition to it always being a good policy to provide users with
> the maximum possible amount of physical location information for
> resources that can be hot-added and/or hot-removed, the following
> are some (but likely not all) of the user benefits provided by
> this change.
> Immediate:
> - Provides information needed to determine the specific node
> on which a defective DIMM is located. This will reduce system
> downtime when the node or defective DIMM is swapped out.
> - Prevents unintended onlining of a memory section that was
> previously offlined due to a defective DIMM. This could happen
> during node hot-add when the user or node hot-add assist script
> onlines _all_ offlined sections due to user or script inability
> to identify the specific memory sections located on the hot-added
> node. The consequences of reintroducing the defective memory
> could be ugly.
> - Provides information needed to vary the amount and distribution
> of memory on specific nodes for testing or debugging purposes.
> Future:
> - Will provide information needed to identify the memory
> sections that need to be offlined prior to physical removal
> of a specific node.
>
> Symlink creation during boot was tested on 2-node x86_64, 2-node
> ppc64, and 2-node ia64 systems. Symlink creation during physical
> memory hot-add tested on a 2-node x86_64 system.
>
> Supersedes the "mm: show memory section to node relationship in sysfs"
> patch posted on 05 Sept 2008 which created node ID containing 'node'
> files in /sys/devices/system/memory/memoryX instead of symlinks.
> Changed from files to symlinks due to feedback that symlinks were
> more consistent with the sysfs way.
>
> Supersedes the "mm: show node to memory section relationship with
> symlinks in sysfs" patch posted on 29 Sept 2008 to address a problem,
> reported by Yasunori Goto, where an incorrect symlink was created due to
> a range of uninitialized pages at the beginning of a section. This
> problem, which produced a symlink in /sys/devices/system/node/node0
> that incorrectly referenced a mem section located on node1, is corrected
> in this version. This version also covers the case where a mem section
> could span multiple nodes.
>
> Supersedes the "mm: show node to memory section relationship with
> symlinks in sysfs" patch posted on 09 Oct 2008 to add the usefulness
> information requested by Andrew Morton and to update it to apply cleanly
> to 2.6.28-rc3 and 2.6-git. Code is unchanged.
>
> Signed-off-by: Gary Hade <[email protected]>
> Signed-off-by: Badari Pulavarty <[email protected]>
>
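The node-side layout described in the patch can be consumed from user space by listing the `memoryN` entries under a node directory. Below is a minimal, hypothetical C sketch; the helper names `parse_memory_entry()` and `list_node_sections()` are invented for illustration and are not part of the patch. It assumes only that each section appears as a `memory<N>` entry under `/sys/devices/system/node/node<X>/`, and it ignores the other entries in that directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return N if name is exactly "memory<N>" with a
 * non-empty run of decimal digits, otherwise -1.  Other node entries
 * such as "meminfo", "cpumap" or "cpu0" are rejected. */
long parse_memory_entry(const char *name)
{
    static const char prefix[] = "memory";
    const size_t plen = sizeof(prefix) - 1;

    if (strncmp(name, prefix, plen) != 0)
        return -1;
    const char *digits = name + plen;
    if (*digits == '\0')
        return -1;
    for (const char *p = digits; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return -1;
    return strtol(digits, NULL, 10);
}

/* Hypothetical helper: store in out[] (at most max entries) the section
 * numbers linked from node_dir, e.g. "/sys/devices/system/node/node1".
 * Returns the number stored, or -1 if node_dir cannot be opened.
 * Entries come back in readdir() order, which is not sorted. */
int list_node_sections(const char *node_dir, long *out, size_t max)
{
    assert(out != NULL || max == 0);

    DIR *dir = opendir(node_dir);
    if (dir == NULL)
        return -1;

    size_t n = 0;
    struct dirent *de;
    while (n < max && (de = readdir(dir)) != NULL) {
        long idx = parse_memory_entry(de->d_name);
        if (idx >= 0)
            out[n++] = idx;
    }
    closedir(dir);
    return (int)n;
}
```

Given the example in the patch description, calling `list_node_sections("/sys/devices/system/node/node1", buf, 1024)` on a patched kernel would be expected to include 135 among the returned indices.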

Hi Gary,

While testing the latest mmotm (which has this patch), I ran into an issue
with sysfs files. What I noticed was that, with this patch, "memoryXX"
directories in /sys/devices/system/memory/ are not getting cleaned up.
Backing out the patch seems to fix the problem.

When I tried to remove 64 blocks of memory, empty directories
stayed around (look at memory151 - memory215). This causes an OOPS
when trying to add the memory blocks again. I think this could be because
of the symlinks added from the node directory. Can you look?

Thanks,
Badari

.
./memory0
./memory0/phys_index
./memory0/state
./memory0/phys_device
./memory0/removable
./memory1
./memory1/phys_index
./memory1/state
./memory1/phys_device
./memory1/removable
./memory2
./memory2/phys_index
./memory2/state
./memory2/phys_device
./memory2/removable
./memory3
./memory3/phys_index
./memory3/state
./memory3/phys_device
./memory3/removable
./memory4
./memory4/phys_index
./memory4/state
./memory4/phys_device
./memory4/removable
./memory5
./memory5/phys_index
./memory5/state
./memory5/phys_device
./memory5/removable
./memory6
./memory6/phys_index
./memory6/state
./memory6/phys_device
./memory6/removable
./memory7
./memory7/phys_index
./memory7/state
./memory7/phys_device
./memory7/removable
./memory8
./memory8/phys_index
./memory8/state
./memory8/phys_device
./memory8/removable
./memory9
./memory9/phys_index
./memory9/state
./memory9/phys_device
./memory9/removable
./memory10
./memory10/phys_index
./memory10/state
./memory10/phys_device
./memory10/removable
./memory11
./memory11/phys_index
./memory11/state
./memory11/phys_device
./memory11/removable
./memory12
./memory12/phys_index
./memory12/state
./memory12/phys_device
./memory12/removable
./memory13
./memory13/phys_index
./memory13/state
./memory13/phys_device
./memory13/removable
./memory14
./memory14/phys_index
./memory14/state
./memory14/phys_device
./memory14/removable
./memory15
./memory15/phys_index
./memory15/state
./memory15/phys_device
./memory15/removable
./memory16
./memory16/phys_index
./memory16/state
./memory16/phys_device
./memory16/removable
./memory17
./memory17/phys_index
./memory17/state
./memory17/phys_device
./memory17/removable
./memory18
./memory18/phys_index
./memory18/state
./memory18/phys_device
./memory18/removable
./memory19
./memory19/phys_index
./memory19/state
./memory19/phys_device
./memory19/removable
./memory20
./memory20/phys_index
./memory20/state
./memory20/phys_device
./memory20/removable
./memory21
./memory21/phys_index
./memory21/state
./memory21/phys_device
./memory21/removable
./memory22
./memory22/phys_index
./memory22/state
./memory22/phys_device
./memory22/removable
./memory23
./memory23/phys_index
./memory23/state
./memory23/phys_device
./memory23/removable
./memory24
./memory24/phys_index
./memory24/state
./memory24/phys_device
./memory24/removable
./memory25
./memory25/phys_index
./memory25/state
./memory25/phys_device
./memory25/removable
./memory26
./memory26/phys_index
./memory26/state
./memory26/phys_device
./memory26/removable
./memory27
./memory27/phys_index
./memory27/state
./memory27/phys_device
./memory27/removable
./memory28
./memory28/phys_index
./memory28/state
./memory28/phys_device
./memory28/removable
./memory29
./memory29/phys_index
./memory29/state
./memory29/phys_device
./memory29/removable
./memory30
./memory30/phys_index
./memory30/state
./memory30/phys_device
./memory30/removable
./memory31
./memory31/phys_index
./memory31/state
./memory31/phys_device
./memory31/removable
./memory32
./memory32/phys_index
./memory32/state
./memory32/phys_device
./memory32/removable
./memory33
./memory33/phys_index
./memory33/state
./memory33/phys_device
./memory33/removable
./memory34
./memory34/phys_index
./memory34/state
./memory34/phys_device
./memory34/removable
./memory35
./memory35/phys_index
./memory35/state
./memory35/phys_device
./memory35/removable
./memory36
./memory36/phys_index
./memory36/state
./memory36/phys_device
./memory36/removable
./memory37
./memory37/phys_index
./memory37/state
./memory37/phys_device
./memory37/removable
./memory38
./memory38/phys_index
./memory38/state
./memory38/phys_device
./memory38/removable
./memory39
./memory39/phys_index
./memory39/state
./memory39/phys_device
./memory39/removable
./memory40
./memory40/phys_index
./memory40/state
./memory40/phys_device
./memory40/removable
./memory41
./memory41/phys_index
./memory41/state
./memory41/phys_device
./memory41/removable
./memory42
./memory42/phys_index
./memory42/state
./memory42/phys_device
./memory42/removable
./memory43
./memory43/phys_index
./memory43/state
./memory43/phys_device
./memory43/removable
./memory44
./memory44/phys_index
./memory44/state
./memory44/phys_device
./memory44/removable
./memory45
./memory45/phys_index
./memory45/state
./memory45/phys_device
./memory45/removable
./memory46
./memory46/phys_index
./memory46/state
./memory46/phys_device
./memory46/removable
./memory47
./memory47/phys_index
./memory47/state
./memory47/phys_device
./memory47/removable
./memory48
./memory48/phys_index
./memory48/state
./memory48/phys_device
./memory48/removable
./memory49
./memory49/phys_index
./memory49/state
./memory49/phys_device
./memory49/removable
./memory50
./memory50/phys_index
./memory50/state
./memory50/phys_device
./memory50/removable
./memory51
./memory51/phys_index
./memory51/state
./memory51/phys_device
./memory51/removable
./memory52
./memory52/phys_index
./memory52/state
./memory52/phys_device
./memory52/removable
./memory53
./memory53/phys_index
./memory53/state
./memory53/phys_device
./memory53/removable
./memory54
./memory54/phys_index
./memory54/state
./memory54/phys_device
./memory54/removable
./memory55
./memory55/phys_index
./memory55/state
./memory55/phys_device
./memory55/removable
./memory56
./memory56/phys_index
./memory56/state
./memory56/phys_device
./memory56/removable
./memory57
./memory57/phys_index
./memory57/state
./memory57/phys_device
./memory57/removable
./memory58
./memory58/phys_index
./memory58/state
./memory58/phys_device
./memory58/removable
./memory59
./memory59/phys_index
./memory59/state
./memory59/phys_device
./memory59/removable
./memory60
./memory60/phys_index
./memory60/state
./memory60/phys_device
./memory60/removable
./memory61
./memory61/phys_index
./memory61/state
./memory61/phys_device
./memory61/removable
./memory62
./memory62/phys_index
./memory62/state
./memory62/phys_device
./memory62/removable
./memory63
./memory63/phys_index
./memory63/state
./memory63/phys_device
./memory63/removable
./memory64
./memory64/phys_index
./memory64/state
./memory64/phys_device
./memory64/removable
./memory65
./memory65/phys_index
./memory65/state
./memory65/phys_device
./memory65/removable
./memory66
./memory66/phys_index
./memory66/state
./memory66/phys_device
./memory66/removable
./memory67
./memory67/phys_index
./memory67/state
./memory67/phys_device
./memory67/removable
./memory68
./memory68/phys_index
./memory68/state
./memory68/phys_device
./memory68/removable
./memory69
./memory69/phys_index
./memory69/state
./memory69/phys_device
./memory69/removable
./memory70
./memory70/phys_index
./memory70/state
./memory70/phys_device
./memory70/removable
./memory71
./memory71/phys_index
./memory71/state
./memory71/phys_device
./memory71/removable
./memory72
./memory72/phys_index
./memory72/state
./memory72/phys_device
./memory72/removable
./memory73
./memory73/phys_index
./memory73/state
./memory73/phys_device
./memory73/removable
./memory74
./memory74/phys_index
./memory74/state
./memory74/phys_device
./memory74/removable
./memory75
./memory75/phys_index
./memory75/state
./memory75/phys_device
./memory75/removable
./memory76
./memory76/phys_index
./memory76/state
./memory76/phys_device
./memory76/removable
./memory77
./memory77/phys_index
./memory77/state
./memory77/phys_device
./memory77/removable
./memory78
./memory78/phys_index
./memory78/state
./memory78/phys_device
./memory78/removable
./memory79
./memory79/phys_index
./memory79/state
./memory79/phys_device
./memory79/removable
./memory80
./memory80/phys_index
./memory80/state
./memory80/phys_device
./memory80/removable
./memory81
./memory81/phys_index
./memory81/state
./memory81/phys_device
./memory81/removable
./memory82
./memory82/phys_index
./memory82/state
./memory82/phys_device
./memory82/removable
./memory83
./memory83/phys_index
./memory83/state
./memory83/phys_device
./memory83/removable
./memory84
./memory84/phys_index
./memory84/state
./memory84/phys_device
./memory84/removable
./memory85
./memory85/phys_index
./memory85/state
./memory85/phys_device
./memory85/removable
./memory86
./memory86/phys_index
./memory86/state
./memory86/phys_device
./memory86/removable
./memory87
./memory87/phys_index
./memory87/state
./memory87/phys_device
./memory87/removable
./memory88
./memory88/phys_index
./memory88/state
./memory88/phys_device
./memory88/removable
./memory89
./memory89/phys_index
./memory89/state
./memory89/phys_device
./memory89/removable
./memory90
./memory90/phys_index
./memory90/state
./memory90/phys_device
./memory90/removable
./memory91
./memory91/phys_index
./memory91/state
./memory91/phys_device
./memory91/removable
./memory92
./memory92/phys_index
./memory92/state
./memory92/phys_device
./memory92/removable
./memory93
./memory93/phys_index
./memory93/state
./memory93/phys_device
./memory93/removable
./memory94
./memory94/phys_index
./memory94/state
./memory94/phys_device
./memory94/removable
./memory95
./memory95/phys_index
./memory95/state
./memory95/phys_device
./memory95/removable
./memory96
./memory96/phys_index
./memory96/state
./memory96/phys_device
./memory96/removable
./memory97
./memory97/phys_index
./memory97/state
./memory97/phys_device
./memory97/removable
./memory98
./memory98/phys_index
./memory98/state
./memory98/phys_device
./memory98/removable
./memory99
./memory99/phys_index
./memory99/state
./memory99/phys_device
./memory99/removable
./memory100
./memory100/phys_index
./memory100/state
./memory100/phys_device
./memory100/removable
./memory101
./memory101/phys_index
./memory101/state
./memory101/phys_device
./memory101/removable
./memory102
./memory102/phys_index
./memory102/state
./memory102/phys_device
./memory102/removable
./memory103
./memory103/phys_index
./memory103/state
./memory103/phys_device
./memory103/removable
./memory104
./memory104/phys_index
./memory104/state
./memory104/phys_device
./memory104/removable
./memory105
./memory105/phys_index
./memory105/state
./memory105/phys_device
./memory105/removable
./memory106
./memory106/phys_index
./memory106/state
./memory106/phys_device
./memory106/removable
./memory107
./memory107/phys_index
./memory107/state
./memory107/phys_device
./memory107/removable
./memory108
./memory108/phys_index
./memory108/state
./memory108/phys_device
./memory108/removable
./memory109
./memory109/phys_index
./memory109/state
./memory109/phys_device
./memory109/removable
./memory110
./memory110/phys_index
./memory110/state
./memory110/phys_device
./memory110/removable
./memory111
./memory111/phys_index
./memory111/state
./memory111/phys_device
./memory111/removable
./memory112
./memory112/phys_index
./memory112/state
./memory112/phys_device
./memory112/removable
./memory113
./memory113/phys_index
./memory113/state
./memory113/phys_device
./memory113/removable
./memory114
./memory114/phys_index
./memory114/state
./memory114/phys_device
./memory114/removable
./memory115
./memory115/phys_index
./memory115/state
./memory115/phys_device
./memory115/removable
./memory116
./memory116/phys_index
./memory116/state
./memory116/phys_device
./memory116/removable
./memory117
./memory117/phys_index
./memory117/state
./memory117/phys_device
./memory117/removable
./memory118
./memory118/phys_index
./memory118/state
./memory118/phys_device
./memory118/removable
./memory119
./memory119/phys_index
./memory119/state
./memory119/phys_device
./memory119/removable
./memory120
./memory120/phys_index
./memory120/state
./memory120/phys_device
./memory120/removable
./memory121
./memory121/phys_index
./memory121/state
./memory121/phys_device
./memory121/removable
./memory122
./memory122/phys_index
./memory122/state
./memory122/phys_device
./memory122/removable
./memory123
./memory123/phys_index
./memory123/state
./memory123/phys_device
./memory123/removable
./memory124
./memory124/phys_index
./memory124/state
./memory124/phys_device
./memory124/removable
./memory125
./memory125/phys_index
./memory125/state
./memory125/phys_device
./memory125/removable
./memory126
./memory126/phys_index
./memory126/state
./memory126/phys_device
./memory126/removable
./memory127
./memory127/phys_index
./memory127/state
./memory127/phys_device
./memory127/removable
./memory128
./memory128/phys_index
./memory128/state
./memory128/phys_device
./memory128/removable
./memory129
./memory129/phys_index
./memory129/state
./memory129/phys_device
./memory129/removable
./memory130
./memory130/phys_index
./memory130/state
./memory130/phys_device
./memory130/removable
./memory131
./memory131/phys_index
./memory131/state
./memory131/phys_device
./memory131/removable
./memory132
./memory132/phys_index
./memory132/state
./memory132/phys_device
./memory132/removable
./memory133
./memory133/phys_index
./memory133/state
./memory133/phys_device
./memory133/removable
./memory134
./memory134/phys_index
./memory134/state
./memory134/phys_device
./memory134/removable
./memory135
./memory135/phys_index
./memory135/state
./memory135/phys_device
./memory135/removable
./memory136
./memory136/phys_index
./memory136/state
./memory136/phys_device
./memory136/removable
./memory137
./memory137/phys_index
./memory137/state
./memory137/phys_device
./memory137/removable
./memory138
./memory138/phys_index
./memory138/state
./memory138/phys_device
./memory138/removable
./memory139
./memory139/phys_index
./memory139/state
./memory139/phys_device
./memory139/removable
./memory140
./memory140/phys_index
./memory140/state
./memory140/phys_device
./memory140/removable
./memory141
./memory141/phys_index
./memory141/state
./memory141/phys_device
./memory141/removable
./memory142
./memory142/phys_index
./memory142/state
./memory142/phys_device
./memory142/removable
./memory143
./memory143/phys_index
./memory143/state
./memory143/phys_device
./memory143/removable
./memory144
./memory144/phys_index
./memory144/state
./memory144/phys_device
./memory144/removable
./memory145
./memory145/phys_index
./memory145/state
./memory145/phys_device
./memory145/removable
./memory146
./memory146/phys_index
./memory146/state
./memory146/phys_device
./memory146/removable
./memory147
./memory147/phys_index
./memory147/state
./memory147/phys_device
./memory147/removable
./memory148
./memory148/phys_index
./memory148/state
./memory148/phys_device
./memory148/removable
./memory149
./memory149/phys_index
./memory149/state
./memory149/phys_device
./memory149/removable
./memory150
./memory150/phys_index
./memory150/state
./memory150/phys_device
./memory150/removable
./memory151
./memory151/phys_index
./memory151/state
./memory151/phys_device
./memory151/removable
./memory152
./memory152/phys_index
./memory152/state
./memory152/phys_device
./memory152/removable
./memory153
./memory252
./memory252/phys_index
./memory252/state
./memory252/phys_device
./memory252/removable
./memory154
./memory253
./memory253/phys_index
./memory253/state
./memory253/phys_device
./memory253/removable
./memory155
./memory156
./memory254
./memory254/phys_index
./memory254/state
./memory254/phys_device
./memory254/removable
./memory157
./memory255
./memory255/phys_index
./memory255/state
./memory255/phys_device
./memory255/removable
./memory158
./memory159
./memory160
./memory161
./memory162
./memory163
./memory164
./memory165
./memory166
./memory167
./memory168
./memory169
./memory170
./memory171
./memory172
./memory173
./memory174
./memory175
./memory176
./memory177
./memory178
./memory179
./memory180
./memory181
./memory182
./memory183
./memory184
./memory185
./memory186
./memory187
./memory188
./memory189
./memory190
./memory191
./memory192
./memory193
./memory194
./memory195
./memory196
./memory197
./memory198
./memory199
./memory200
./memory201
./memory202
./memory203
./memory204
./memory205
./memory206
./memory207
./memory208
./memory209
./memory210
./memory211
./memory212
./memory213
./memory214
./memory215
./memory216
./memory216/phys_index
./memory216/state
./memory216/phys_device
./memory216/removable
./memory217
./memory218
./memory218/phys_index
./memory218/state
./memory218/phys_device
./memory218/removable
./memory219
./memory219/phys_index
./memory219/state
./memory219/phys_device
./memory219/removable
./memory220
./memory220/phys_index
./memory220/state
./memory220/phys_device
./memory220/removable
./memory221
./memory221/phys_index
./memory221/state
./memory221/phys_device
./memory221/removable
./memory222
./memory222/phys_index
./memory222/state
./memory222/phys_device
./memory222/removable
./memory223
./memory223/phys_index
./memory223/state
./memory223/phys_device
./memory223/removable
./memory224
./memory224/phys_index
./memory224/state
./memory224/phys_device
./memory224/removable
./memory225
./memory225/phys_index
./memory225/state
./memory225/phys_device
./memory225/removable
./memory226
./memory226/phys_index
./memory226/state
./memory226/phys_device
./memory226/removable
./memory227
./memory227/phys_index
./memory227/state
./memory227/phys_device
./memory227/removable
./memory228
./memory228/phys_index
./memory228/state
./memory228/phys_device
./memory228/removable
./memory229
./memory229/phys_index
./memory229/state
./memory229/phys_device
./memory229/removable
./memory230
./memory230/phys_index
./memory230/state
./memory230/phys_device
./memory230/removable
./memory231
./memory231/phys_index
./memory231/state
./memory231/phys_device
./memory231/removable
./memory232
./memory232/phys_index
./memory232/state
./memory232/phys_device
./memory232/removable
./memory233
./memory233/phys_index
./memory233/state
./memory233/phys_device
./memory233/removable
./memory234
./memory234/phys_index
./memory234/state
./memory234/phys_device
./memory234/removable
./probe
./block_size_bytes
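In the listing above, the leftover entries are `memoryN` directories that no longer contain `phys_index`, `state`, `phys_device` or `removable`. Below is a minimal, hypothetical C sketch for flagging them; the helper names are invented for illustration, and treating "no `state` file" as the sign of a leftover directory is a simplifying assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return 1 if name looks like "memory<digits>", else 0. */
static int looks_like_memory_dir(const char *name)
{
    if (strncmp(name, "memory", 6) != 0 || name[6] == '\0')
        return 0;
    for (const char *p = name + 6; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return 0;
    return 1;
}

/* Hypothetical check for one entry under memory_root, normally
 * "/sys/devices/system/memory":
 *   -1  name is not a memory<N> entry (e.g. "probe")
 *    0  no such directory, or its "state" attribute is present
 *    1  the directory exists but "state" is gone (a leftover) */
int memory_dir_is_stale(const char *memory_root, const char *name)
{
    char path[4096];
    struct stat st;

    if (!looks_like_memory_dir(name))
        return -1;

    if (snprintf(path, sizeof(path), "%s/%s", memory_root, name) >= (int)sizeof(path))
        return 0; /* path too long: treat as "not found" */
    if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
        return 0;

    if (snprintf(path, sizeof(path), "%s/%s/state", memory_root, name) >= (int)sizeof(path))
        return 0;
    return access(path, F_OK) == 0 ? 0 : 1;
}
```

Looping `memory_dir_is_stale("/sys/devices/system/memory", name)` over the entries of that directory would be expected to flag memory153 through memory215 and memory217 in the listing above, while skipping `probe` and `block_size_bytes`.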

2008-11-13 16:54:22

by Gary Hade

Subject: Re: [PATCH] [REPOST #2] mm: show node to memory section relationship with symlinks in sysfs

On Wed, Nov 12, 2008 at 02:16:15PM -0800, Badari Pulavarty wrote:
> Hi Gary,
>
> While testing the latest mmotm (which has this patch), I ran into an issue
> with sysfs files. What I noticed was that, with this patch, "memoryXX"
> directories in /sys/devices/system/memory/ are not getting cleaned up.
> Backing out the patch seems to fix the problem.
>
> When I tried to remove 64 blocks of memory, empty directories
> stayed around (look at memory151 - memory215). This causes an OOPS
> when trying to add the memory blocks again. I think this could be because
> of the symlinks added from the node directory. Can you look?

Badari, the call to unregister_mem_sect_under_nodes() in
remove_memory_block(), which precedes the removal of the files in
the memory section directory, _should have_ removed all the
symlinks referencing the memory section directory. Did you
happen to check whether the symlinks to memory151-memory215
were still present?

Gary

--
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503 IBM T/L: 775-4503
[email protected]
http://www.ibm.com/linux/ltc
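Gary's question can be checked mechanically by looking, for each section in the suspect range, for a `memoryN` entry under any `nodeK` directory. Below is a minimal, hypothetical C sketch; both helper names are invented for illustration. It assumes node directories are named `node<K>` under `/sys/devices/system/node` and probes only `node0` through `node<max_nodes - 1>`, skipping node numbers that do not exist.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: look for node_root/node<K>/memory<section> for
 * K in [0, max_nodes).  lstat() is used so that a link is found even
 * when its target directory is empty or gone.  Returns the first K
 * that still has the link, or -1 if none does. */
int node_link_for_section(const char *node_root, int max_nodes, long section)
{
    char path[4096];
    struct stat st;

    for (int k = 0; k < max_nodes; k++) {
        int len = snprintf(path, sizeof(path), "%s/node%d/memory%ld",
                           node_root, k, section);
        if (len < 0 || (size_t)len >= sizeof(path))
            continue;
        if (lstat(path, &st) == 0 && S_ISLNK(st.st_mode))
            return k;
    }
    return -1;
}

/* Print and count every section in [first, last] whose node symlink
 * is still present. */
int report_surviving_links(const char *node_root, int max_nodes,
                           long first, long last)
{
    int found = 0;

    for (long s = first; s <= last; s++) {
        int node = node_link_for_section(node_root, max_nodes, s);
        if (node >= 0) {
            printf("node%d/memory%ld still present\n", node, s);
            found++;
        }
    }
    return found;
}
```

For the range in question, `report_surviving_links("/sys/devices/system/node", max_nodes, 151, 215)`, with `max_nodes` at least the number of node directories present, would print one line per surviving symlink; a return of 0 would mean every link was removed.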

2008-11-13 19:11:32

by Badari Pulavarty

Subject: Re: [PATCH] [REPOST #2] mm: show node to memory section relationship with symlinks in sysfs

On Thu, 2008-11-13 at 08:54 -0800, Gary Hade wrote:
> On Wed, Nov 12, 2008 at 02:16:15PM -0800, Badari Pulavarty wrote:
> > Hi Gary,
> >
> > While testing the latest mmotm (which has this patch), I ran into an issue
> > with sysfs files. What I noticed was that, with this patch, "memoryXX"
> > directories in /sys/devices/system/memory/ are not getting cleaned up.
> > Backing out the patch seems to fix the problem.
> >
> > When I tried to remove 64 blocks of memory, empty directories
> > stayed around (look at memory151 - memory215). This causes an OOPS
> > when trying to add the memory blocks again. I think this could be because
> > of the symlinks added from the node directory. Can you look?
>
> Badari, the call to unregister_mem_sect_under_nodes() in
> remove_memory_block(), which precedes the removal of the files in
> the memory section directory, _should have_ removed all the
> symlinks referencing the memory section directory. Did you
> happen to check whether the symlinks to memory151-memory215
> were still present?

Gary,

You are right. The links from the "node" directory are getting removed.
I guess somehow we still hold an extra reference on the directory.

Look at memory214-memory217 (I removed only 4 memory blocks this time).

Thanks,
Badari

/sys/devices/system/node/node0 # ls -ltr memory21*
lrwxrwxrwx 1 root root 0 2008-11-13 10:39 memory21 -> ../../memory/memory21
lrwxrwxrwx 1 root root 0 2008-11-13 10:39 memory219 -> ../../memory/memory219
lrwxrwxrwx 1 root root 0 2008-11-13 10:39 memory218 -> ../../memory/memory218
lrwxrwxrwx 1 root root 0 2008-11-13 10:39 memory213 -> ../../memory/memory213
lrwxrwxrwx 1 root root 0 2008-11-13 10:39 memory212 -> ../../memory/memory212
lrwxrwxrwx 1 root root 0 2008-11-13 10:39 memory211 -> ../../memory/memory211
lrwxrwxrwx 1 root root 0 2008-11-13 10:39 memory210 -> ../../memory/memory210

# find /sys/devices/system/memory/memory21?
/sys/devices/system/memory/memory210
/sys/devices/system/memory/memory210/phys_index
/sys/devices/system/memory/memory210/state
/sys/devices/system/memory/memory210/phys_device
/sys/devices/system/memory/memory210/removable
/sys/devices/system/memory/memory211
/sys/devices/system/memory/memory211/phys_index
/sys/devices/system/memory/memory211/state
/sys/devices/system/memory/memory211/phys_device
/sys/devices/system/memory/memory211/removable
/sys/devices/system/memory/memory212
/sys/devices/system/memory/memory212/phys_index
/sys/devices/system/memory/memory212/state
/sys/devices/system/memory/memory212/phys_device
/sys/devices/system/memory/memory212/removable
/sys/devices/system/memory/memory213
/sys/devices/system/memory/memory213/phys_index
/sys/devices/system/memory/memory213/state
/sys/devices/system/memory/memory213/phys_device
/sys/devices/system/memory/memory213/removable
/sys/devices/system/memory/memory214
/sys/devices/system/memory/memory215
/sys/devices/system/memory/memory216
/sys/devices/system/memory/memory217
/sys/devices/system/memory/memory218
/sys/devices/system/memory/memory218/phys_index
/sys/devices/system/memory/memory218/state
/sys/devices/system/memory/memory218/phys_device
/sys/devices/system/memory/memory218/removable
/sys/devices/system/memory/memory219
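Badari's guess is a reference-counting argument: removing the symlinks and attribute files does not free the directory itself while some other holder still owns a reference to it. The toy C model below illustrates only that general rule; `struct toy_obj`, `toy_init()`, `toy_get()` and `toy_put()` are invented for illustration, are not the kernel's kobject or kref API, and say nothing about where the extra reference in this patch actually comes from.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a reference-counted sysfs object (hypothetical). */
struct toy_obj {
    int refcount;
    bool released;   /* set once the last reference is dropped */
};

void toy_init(struct toy_obj *obj)
{
    obj->refcount = 1;   /* the creator's reference */
    obj->released = false;
}

void toy_get(struct toy_obj *obj)
{
    assert(!obj->released);
    obj->refcount++;
}

/* Drop one reference.  The object is released only when the count
 * reaches zero, so a get without a matching put keeps it alive. */
void toy_put(struct toy_obj *obj)
{
    assert(obj->refcount > 0);
    if (--obj->refcount == 0)
        obj->released = true;
}
```

In the unbalanced case the creator's final `toy_put()` leaves the count at 1 and the object is never released, which is consistent with directories that outlive their contents.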


2008-11-14 16:04:48

by Badari Pulavarty

Subject: Re: [PATCH] [REPOST #2] mm: show node to memory section relationship with symlinks in sysfs

On Thu, 2008-11-13 at 08:54 -0800, Gary Hade wrote:
> On Wed, Nov 12, 2008 at 02:16:15PM -0800, Badari Pulavarty wrote:
> > On Mon, 2008-11-03 at 15:48 -0800, Gary Hade wrote:
> > > Show node to memory section relationship with symlinks in sysfs
> > >
> > > Add /sys/devices/system/node/nodeX/memoryY symlinks for all
> > > the memory sections located on nodeX. For example:
> > > /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
> > > indicates that memory section 135 resides on node1.
> > >
> > > Also revises documentation to cover this change as well as updating
> > > Documentation/ABI/testing/sysfs-devices-memory to include descriptions
> > > of memory hotremove files 'phys_device', 'phys_index', and 'state'
> > > that were previously not described there.
> > >
> > > In addition to it always being a good policy to provide users with
> > > the maximum possible amount of physical location information for
> > > resources that can be hot-added and/or hot-removed, the following
> > > are some (but likely not all) of the user benefits provided by
> > > this change.
> > > Immediate:
> > > - Provides information needed to determine the specific node
> > > on which a defective DIMM is located. This will reduce system
> > > downtime when the node or defective DIMM is swapped out.
> > > - Prevents unintended onlining of a memory section that was
> > > previously offlined due to a defective DIMM. This could happen
> > > during node hot-add when the user or node hot-add assist script
> > > onlines _all_ offlined sections due to user or script inability
> > > to identify the specific memory sections located on the hot-added
> > > node. The consequences of reintroducing the defective memory
> > > could be ugly.
> > > - Provides information needed to vary the amount and distribution
> > > of memory on specific nodes for testing or debugging purposes.
> > > Future:
> > > - Will provide information needed to identify the memory
> > > sections that need to be offlined prior to physical removal
> > > of a specific node.
> > >
> > > Symlink creation during boot was tested on 2-node x86_64, 2-node
> > > ppc64, and 2-node ia64 systems. Symlink creation during physical
> > > memory hot-add was tested on a 2-node x86_64 system.
> > >
> > > Supersedes the "mm: show memory section to node relationship in sysfs"
> > > patch posted on 05 Sept 2008, which created 'node' files containing
> > > the node ID in /sys/devices/system/memory/memoryX instead of symlinks.
> > > Changed from files to symlinks due to feedback that symlinks were
> > > more consistent with the sysfs way.
> > >
> > > Supersedes the "mm: show node to memory section relationship with
> > > symlinks in sysfs" patch posted on 29 Sept 2008, to address a problem
> > > reported by Yasunori Goto where an incorrect symlink was created due to
> > > a range of uninitialized pages at the beginning of a section. This
> > > problem, which produced a symlink in /sys/devices/system/node/node0
> > > that incorrectly referenced a mem section located on node1, is corrected
> > > in this version. This version also covers the case where a mem section
> > > could span multiple nodes.
> > >
> > > Supersedes the "mm: show node to memory section relationship with
> > > symlinks in sysfs" patch posted on 09 Oct 2008, to add the usefulness
> > > information requested by Andrew Morton and to apply cleanly to
> > > 2.6.28-rc3 and 2.6-git. Code is unchanged.
> > >
> > > Signed-off-by: Gary Hade <[email protected]>
> > > Signed-off-by: Badari Pulavarty <[email protected]>
> > >
> >
> > Hi Gary,
> >
> > While testing the latest mmotm (which has this patch), I ran into an issue
> > with sysfs files. What I noticed was that, with this patch, "memoryXX"
> > directories in /sys/devices/system/memory/ are not getting cleaned up.
> > Backing out the patch seems to fix the problem.
> >
> > When I tried to remove 64 blocks of memory, empty directories
> > stayed around (look at memory151 - memory215). This causes an OOPS
> > when trying to add the memory blocks again. I think this could be because
> > of the symlink added from the node directory. Can you take a look?
>
> Badari, The call to unregister_mem_sect_under_nodes() in
> remove_memory_block() preceding the removal of the files in
> the memory section directory _should have_ removed all the
> symlinks referencing the memory section directory. Did you
> happen to check to see if the symlinks to memory151-memory215
> were still present?
>
> Gary
>

Hi Gary,

As discussed earlier, the patch leaves an extra reference on the
memoryX directory. It needs a kobject_put() to match the reference
you get in find_memory_block().

Could you update the patch and resend it?

Thanks,
Badari

2008-11-14 16:41:53

by Gary Hade

Subject: Re: [PATCH] [REPOST #2] mm: show node to memory section relationship with symlinks in sysfs

On Fri, Nov 14, 2008 at 08:05:17AM -0800, Badari Pulavarty wrote:
> On Thu, 2008-11-13 at 08:54 -0800, Gary Hade wrote:
> > Badari, The call to unregister_mem_sect_under_nodes() in
> > remove_memory_block() preceding the removal of the files in
> > the memory section directory _should have_ removed all the
> > symlinks referencing the memory section directory. Did you
> > happen to check to see if the symlinks to memory151-memory215
> > were still present?
> >
> > Gary
> >
>
> Hi Gary,
>
> As discussed earlier, the patch leaves an extra reference on the
> memoryX directory. It needs a kobject_put() to match the reference
> you get in find_memory_block().

Badari, Thanks again for finding that!

>
> Could you update the patch and resend it?

Will do.

Gary

--
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503 IBM T/L: 775-4503
[email protected]
http://www.ibm.com/linux/ltc

2008-11-15 00:09:12

by Gary Hade

Subject: Re: [PATCH] [REPOST #2] mm: show node to memory section relationship with symlinks in sysfs

On Fri, Nov 14, 2008 at 08:41:25AM -0800, Gary Hade wrote:
> On Fri, Nov 14, 2008 at 08:05:17AM -0800, Badari Pulavarty wrote:
> > Hi Gary,
> >
> > As discussed earlier, the patch leaves an extra reference on the
> > memoryX directory. It needs a kobject_put() to match the reference
> > you get in find_memory_block().
>
> Badari, Thanks again for finding that!
>
> >
> > Could you update the patch and resend it?
>
> Will do.

I just posted a replacement patch with the subject line
"[PATCH] [REPOST #3] mm: show node to memory section relationship
with symlinks in sysfs".
In addition to addressing the memory section directory removal problem,
it also contains a change to correct a 'for' loop early termination
problem that I noticed while debugging the directory removal problem.

Andrew, If it would be better from a -mm standpoint for me to give
you a small patch that applies on top of the bad one instead of
replacing it, just let me know.

Thanks,
Gary
