2006-12-08 06:53:24

by Kamezawa Hiroyuki

Subject: [RFC] [PATCH] virtual memmap on sparsemem v3 [0/4] introduction

Hi, this is version 3 of the virtual mem_map on sparsemem/generic patch set.

I myself like this patch, but someone may feel it is intrusive and
scattered. Please point out any problems.

Changes v2 -> v3
- made the map/unmap functions general purpose. (for my purpose ;)
- dropped memory hotplug support; it will be posted after this goes in.
- changed pfn_to_page()/page_to_pfn() definitions.
- added the CONFIG_SPARSEMEM_VMEMMAP_STATIC config option.
- several cleanups.
- dropped the optimized pfn_valid() patch; it will be posted after this goes in.
- added #error to check vmem_map alignment.

Changes v1 -> v2:
- supported the memory hotplug case.
- used a static address for vmem_map (ia64).
- added optimized pfn_valid() for ia64 (experimental).

Intro:
When using SPARSEMEM, pfn_to_page()/page_to_pfn() accesses a big global
table of mem_sections; with SPARSEMEM_EXTREME this is a 2-level table lookup.

If we can map mem_section->mem_map at (virtually) linear addresses, we can
expect optimized pfn <-> page translation.
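
Schematically, the two translations compare like this (a sketch only; the
helper names here are illustrative, and the real definitions are the
__pfn_to_page()/__page_to_pfn() macros in the memory_model.h hunk of patch
[2/4]):

/* sparsemem(_extreme): table lookup(s) before the final addition */
static inline struct page *pfn_to_page_tablewalk(unsigned long pfn)
{
	struct mem_section *sec = __pfn_to_section(pfn);

	return __section_mem_map_addr(sec) + pfn;
}

/* virtual mem_map: plain pointer arithmetic on one global base */
static inline struct page *pfn_to_page_linear(unsigned long pfn)
{
	return mem_map + pfn;
}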

Virtual mem_map is not useful for 32-bit archs, because it uses a huge
virtual address range.
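
To get a feel for the scale (a back-of-the-envelope computation, assuming
the ia64 values used later in this series: MAX_PHYSMEM_BITS = 50, 16KB
pages, sizeof(struct page) = 56):

	(1UL << (50 - 14)) * 56 = 2^36 * 56 bytes = 3.5TB

A 64-bit arch can spare that much virtual space; a 32-bit kernel address
space cannot.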

-Kame


2006-12-08 07:01:19

by Kamezawa Hiroyuki

Subject: [RFC] [PATCH] virtual memmap on sparsemem v3 [2/4] generic virtual mem_map on sparsemem

This patch implements virtual mem_map on sparsemem.
It includes only the arch-independent part and depends on the generic
kernel map/unmap functions from patch [1/4] of this series.

Usual sparsemem(_extreme) has to do a global table lookup in
pfn_to_page()/page_to_pfn(), which seems a bit costly.

If an arch has enough address space to map all of mem_map linearly,
it is good to map the sparse mem_map as a linear mem_map. This reduces
the cost of pfn_to_page()/page_to_pfn().
The same concept is used by ia64's VIRTUAL_MEM_MAP.

pfn_valid() works the same as with usual sparsemem.
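
For reference, this means pfn_valid() still consults the section table; a
schematic of the usual sparsemem definition (existing code, not something
added by this patch):

static inline int pfn_valid(unsigned long pfn)
{
	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
		return 0;
	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
}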

Callbacks are used when creating the vmem_map so that
alloc_bootmem_pages_node() can be used for allocating pud/pmd/pte pages.

How to use:
Set the address that struct page *mem_map points to before calling
sparse_init(). That's all.

Note:
I assume that each section's mem_map is always aligned to PAGE_SIZE.
For example, on ia64:
sizeof(struct page) = 56 and PAGES_PER_SECTION = 65536, so each section's
mem_map is aligned to 56 * 65536 bytes.
A violation of this assumption is caught by #error at build time.
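
Working that example through (assuming ia64's default 16KB page size, which
is my assumption, not stated above):

	56 * 65536 = 3670016 bytes = 224 * 16384 bytes

so each section's slice of mem_map covers exactly 224 whole pages and can
be mapped independently of its neighbours.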

Signed-Off-By: KAMEZAWA Hiroyuki <[email protected]>


Index: devel-2.6.19/mm/sparse.c
===================================================================
--- devel-2.6.19.orig/mm/sparse.c 2006-11-30 06:57:37.000000000 +0900
+++ devel-2.6.19/mm/sparse.c 2006-12-08 15:03:02.000000000 +0900
@@ -9,6 +9,8 @@
#include <linux/spinlock.h>
#include <linux/vmalloc.h>
#include <asm/dma.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>

/*
* Permanent SPARSEMEM data:
@@ -76,6 +78,106 @@
}
#endif

+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+
+struct vmemmap_create_arg {
+ int section_nr;
+ int nid;
+};
+
+/* callbacks for creating the memory map */
+static int
+__init pte_alloc_vmemmap_boot(pmd_t *pmd, unsigned long addr, void *data)
+{
+ struct vmemmap_create_arg *arg = data;
+ void *pg = alloc_bootmem_pages_node(NODE_DATA(arg->nid), PAGE_SIZE);
+ BUG_ON(!pg);
+ pmd_populate_kernel(&init_mm, pmd, pg);
+ return 0;
+}
+static int
+__init pmd_alloc_vmemmap_boot(pud_t *pud, unsigned long addr, void *data)
+{
+ struct vmemmap_create_arg *arg = data;
+ void *pg = alloc_bootmem_pages_node(NODE_DATA(arg->nid), PAGE_SIZE);
+ BUG_ON(!pg);
+ pud_populate(&init_mm, pud, pg);
+ return 0;
+}
+
+static int
+__init pud_alloc_vmemmap_boot(pgd_t *pgd, unsigned long addr, void *data)
+{
+ struct vmemmap_create_arg *arg = data;
+ void *pg = alloc_bootmem_pages_node(NODE_DATA(arg->nid), PAGE_SIZE);
+ BUG_ON(!pg);
+ pgd_populate(&init_mm, pgd, pg);
+ return 0;
+}
+
+static int
+__init pte_set_vmemmap_boot(pte_t *pte, unsigned long addr, void *data)
+{
+ struct vmemmap_create_arg *arg = data;
+ struct mem_section *ms = __nr_to_section(arg->section_nr);
+ unsigned long pmap, vmap, section_pfn, pfn;
+
+ section_pfn = section_nr_to_pfn(arg->section_nr);
+ /* we already have mem_map in linear address space. calc it */
+
+ /* decode encoded value of base address. */
+ pmap = ms->section_mem_map & SECTION_MAP_MASK;
+ pmap = (unsigned long)((struct page *)pmap + section_pfn);
+ /* section's start */
+ vmap = (unsigned long)pfn_to_page(section_pfn);
+
+ pfn = (__pa(pmap) + (addr - vmap)) >> PAGE_SHIFT;
+ set_pte(pte, pfn_pte(pfn, PAGE_KERNEL));
+ return 0;
+}
+
+static int
+__init pte_clear_vmemmap(pte_t *pte, unsigned long addr, void *data)
+{
+ /* unmap is unused until hotplug support is added */
+ BUG();
+ return 0;
+}
+
+struct gen_map_kern_ops vmemmap_boot_ops = {
+ .k_pte_set = pte_set_vmemmap_boot,
+ .k_pte_clear = pte_clear_vmemmap,
+ .k_pud_alloc = pud_alloc_vmemmap_boot,
+ .k_pmd_alloc = pmd_alloc_vmemmap_boot,
+ .k_pte_alloc = pte_alloc_vmemmap_boot,
+};
+
+static int
+__init map_virtual_mem_map(unsigned long section, int nid)
+{
+ struct vmemmap_create_arg arg;
+ unsigned long vmap_start, vmap_size;
+ vmap_start = (unsigned long)pfn_to_page(section_nr_to_pfn(section));
+ vmap_size = PAGES_PER_SECTION * sizeof(struct page);
+ arg.section_nr = section;
+ arg.nid = nid;
+
+ if (system_state == SYSTEM_BOOTING) {
+ map_generic_kernel(vmap_start, vmap_size, &vmemmap_boot_ops,
+ &arg);
+ } else {
+ BUG();
+ }
+ /* if bug, panic occurs.*/
+ return 0;
+}
+#else
+static int
+__init map_virtual_mem_map(unsigned long section, int nid)
+{
+ return 0;
+}
+#endif
+
+
/*
* Although written for the SPARSEMEM_EXTREME case, this happens
* to also work for the flat array case becase
@@ -92,7 +194,7 @@
continue;

if ((ms >= root) && (ms < (root + SECTIONS_PER_ROOT)))
- break;
+ break;
}

return (root_nr * SECTIONS_PER_ROOT) + (ms - root);
@@ -175,13 +277,14 @@
}

static int sparse_init_one_section(struct mem_section *ms,
- unsigned long pnum, struct page *mem_map)
+ unsigned long pnum, struct page *mem_map, int node)
{
if (!valid_section(ms))
return -EINVAL;

ms->section_mem_map &= ~SECTION_MAP_MASK;
ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum);
+ map_virtual_mem_map(pnum, node);

return 1;
}
@@ -261,7 +364,8 @@
map = sparse_early_mem_map_alloc(pnum);
if (!map)
continue;
- sparse_init_one_section(__nr_to_section(pnum), pnum, map);
+ sparse_init_one_section(__nr_to_section(pnum), pnum, map,
+ sparse_early_nid(__nr_to_section(pnum)));
}
}

@@ -296,7 +400,7 @@
}
ms->section_mem_map |= SECTION_MARKED_PRESENT;

- ret = sparse_init_one_section(ms, section_nr, memmap);
+ ret = sparse_init_one_section(ms, section_nr, memmap, pgdat->node_id);

out:
pgdat_resize_unlock(pgdat, &flags);
Index: devel-2.6.19/mm/Kconfig
===================================================================
--- devel-2.6.19.orig/mm/Kconfig 2006-11-30 06:57:37.000000000 +0900
+++ devel-2.6.19/mm/Kconfig 2006-12-08 15:05:10.000000000 +0900
@@ -112,12 +112,22 @@
def_bool y
depends on SPARSEMEM && !SPARSEMEM_STATIC

+config SPARSEMEM_VMEMMAP
+ bool "Virutally contiguous mem_map on sparsemem"
+ depends on SPARSEMEM && !SPARSEMEM_STATIC && ARCH_SPARSEMEM_VMEMMAP
+ help
+ This allows a micro-optimization that reduces the cost of accessing
+ the memory management infrastructure.
+ But this consumes a huge amount of virtual (not physical) memory.
+ This option is selectable only if your arch supports it.
+
# eventually, we can have this option just 'select SPARSEMEM'
config MEMORY_HOTPLUG
bool "Allow for memory hot-add"
depends on SPARSEMEM || X86_64_ACPI_NUMA
depends on HOTPLUG && !SOFTWARE_SUSPEND && ARCH_ENABLE_MEMORY_HOTPLUG
depends on (IA64 || X86 || PPC64)
+ depends on !SPARSEMEM_VMEMMAP

comment "Memory hotplug is currently incompatible with Software Suspend"
depends on SPARSEMEM && HOTPLUG && SOFTWARE_SUSPEND
Index: devel-2.6.19/include/linux/mmzone.h
===================================================================
--- devel-2.6.19.orig/include/linux/mmzone.h 2006-11-30 06:57:37.000000000 +0900
+++ devel-2.6.19/include/linux/mmzone.h 2006-12-08 15:04:30.000000000 +0900
@@ -311,7 +311,7 @@
};
#endif /* CONFIG_ARCH_POPULATES_NODE_MAP */

-#ifndef CONFIG_DISCONTIGMEM
+#if !defined(CONFIG_DISCONTIGMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
/* The array of struct pages - for discontigmem use pgdat->lmem_map */
extern struct page *mem_map;
#endif
@@ -614,6 +614,13 @@
#define SECTION_MAP_MASK (~(SECTION_MAP_LAST_BIT-1))
#define SECTION_NID_SHIFT 2

+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+#if (((BITS_PER_LONG/4) * PAGES_PER_SECTION) % PAGE_SIZE) != 0
+#error "PAGE_SIZE/SECTION_SIZE relationship is not suitable for vmem_map"
+#endif
+extern struct page* mem_map;
+#endif
+
static inline struct page *__section_mem_map_addr(struct mem_section *section)
{
unsigned long map = section->section_mem_map;
Index: devel-2.6.19/include/asm-generic/memory_model.h
===================================================================
--- devel-2.6.19.orig/include/asm-generic/memory_model.h 2006-11-30 06:57:37.000000000 +0900
+++ devel-2.6.19/include/asm-generic/memory_model.h 2006-12-08 15:03:02.000000000 +0900
@@ -47,6 +47,11 @@
})

#elif defined(CONFIG_SPARSEMEM)
+
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+#define __page_to_pfn(pg) ((pg) - mem_map)
+#define __pfn_to_page(pfn) (mem_map + (pfn))
+#else
/*
* Note: section's mem_map is encorded to reflect its start_pfn.
* section[i].section_mem_map == mem_map's address - start_pfn;
@@ -62,6 +67,7 @@
struct mem_section *__sec = __pfn_to_section(__pfn); \
__section_mem_map_addr(__sec) + __pfn; \
})
+#endif /* CONFIG_SPARSEMEM_VMEMMAP */
#endif /* CONFIG_FLATMEM/DISCONTIGMEM/SPARSEMEM */

#ifdef CONFIG_OUT_OF_LINE_PFN_TO_PAGE
Index: devel-2.6.19/mm/memory.c
===================================================================
--- devel-2.6.19.orig/mm/memory.c 2006-11-30 06:57:37.000000000 +0900
+++ devel-2.6.19/mm/memory.c 2006-12-08 15:03:02.000000000 +0900
@@ -69,6 +69,12 @@
EXPORT_SYMBOL(mem_map);
#endif

+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+/* for the virtual mem_map */
+struct page *mem_map;
+EXPORT_SYMBOL(mem_map);
+#endif
+
unsigned long num_physpages;
/*
* A number of key systems in x86 including ioremap() rely on the assumption

2006-12-08 07:04:44

by Kamezawa Hiroyuki

Subject: [RFC] [PATCH] virtual memmap on sparsemem v3 [4/4] ia64 support

ia64 support for sparsemem/vmem_map.
* defines mem_map[] and sets its value (statically).
* changes the definition of VMALLOC_START.
* adds config options.

Signed-Off-By: KAMEZAWA Hiroyuki <[email protected]>

Index: devel-2.6.19/arch/ia64/Kconfig
===================================================================
--- devel-2.6.19.orig/arch/ia64/Kconfig 2006-11-30 06:57:37.000000000 +0900
+++ devel-2.6.19/arch/ia64/Kconfig 2006-12-08 15:03:21.000000000 +0900
@@ -333,6 +333,14 @@
def_bool y
depends on ARCH_DISCONTIGMEM_ENABLE

+config ARCH_SPARSEMEM_VMEMMAP
+ def_bool y
+ depends on ARCH_SPARSEMEM_ENABLE
+
+config ARCH_SPARSEMEM_VMEMMAP_STATIC
+ def_bool y
+ depends on SPARSEMEM_VMEMMAP
+
config ARCH_DISCONTIGMEM_DEFAULT
def_bool y if (IA64_SGI_SN2 || IA64_GENERIC || IA64_HP_ZX1 || IA64_HP_ZX1_SWIOTLB)
depends on ARCH_DISCONTIGMEM_ENABLE
Index: devel-2.6.19/arch/ia64/kernel/vmlinux.lds.S
===================================================================
--- devel-2.6.19.orig/arch/ia64/kernel/vmlinux.lds.S 2006-11-30 06:57:37.000000000 +0900
+++ devel-2.6.19/arch/ia64/kernel/vmlinux.lds.S 2006-12-08 15:03:21.000000000 +0900
@@ -2,6 +2,7 @@
#include <asm/cache.h>
#include <asm/ptrace.h>
#include <asm/system.h>
+#include <asm/sparsemem.h>
#include <asm/pgtable.h>

#define LOAD_OFFSET (KERNEL_START - KERNEL_TR_PAGE_SIZE)
@@ -34,6 +35,9 @@

v = PAGE_OFFSET; /* this symbol is here to make debugging easier... */
phys_start = _start - LOAD_OFFSET;
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+ mem_map = VIRTUAL_MEM_MAP_START;
+#endif

code : { } :code
. = KERNEL_START;
Index: devel-2.6.19/include/asm-ia64/sparsemem.h
===================================================================
--- devel-2.6.19.orig/include/asm-ia64/sparsemem.h 2006-11-30 06:57:37.000000000 +0900
+++ devel-2.6.19/include/asm-ia64/sparsemem.h 2006-12-08 15:03:21.000000000 +0900
@@ -16,5 +16,14 @@
#endif
#endif

+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+#define VIRTUAL_MEM_MAP_START (RGN_BASE(RGN_GATE) + 0x200000000)
+
+#ifndef __ASSEMBLY__
+#define VIRTUAL_MEM_MAP_SIZE ((1UL << (MAX_PHYSMEM_BITS - PAGE_SHIFT)) * sizeof(struct page))
+#define VIRTUAL_MEM_MAP_END (VIRTUAL_MEM_MAP_START + VIRTUAL_MEM_MAP_SIZE)
+#endif
+#endif
+
#endif /* CONFIG_SPARSEMEM */
#endif /* _ASM_IA64_SPARSEMEM_H */
Index: devel-2.6.19/include/asm-ia64/pgtable.h
===================================================================
--- devel-2.6.19.orig/include/asm-ia64/pgtable.h 2006-11-30 06:57:37.000000000 +0900
+++ devel-2.6.19/include/asm-ia64/pgtable.h 2006-12-08 15:03:21.000000000 +0900
@@ -230,13 +230,21 @@
#define set_pte(ptep, pteval) (*(ptep) = (pteval))
#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)

+#if defined(CONFIG_SPARSEMEM_VMEMMAP)
+#define VMALLOC_START (VIRTUAL_MEM_MAP_END)
+#define VMALLOC_END (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
+
+#elif defined(CONFIG_VIRTUAL_MEM_MAP)
#define VMALLOC_START (RGN_BASE(RGN_GATE) + 0x200000000UL)
-#ifdef CONFIG_VIRTUAL_MEM_MAP
-# define VMALLOC_END_INIT (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
-# define VMALLOC_END vmalloc_end
- extern unsigned long vmalloc_end;
+
+#defineVMALLOC_END_INIT (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
+#define VMALLOC_END vmalloc_end
+extern unsigned long vmalloc_end;
#else
+
+#define VMALLOC_START (RGN_BASE(RGN_GATE) + 0x200000000UL)
# define VMALLOC_END (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
+
#endif

/* fs/proc/kcore.c */

2006-12-08 06:58:17

by Kamezawa Hiroyuki

Subject: [RFC] [PATCH] virtual memmap on sparsemem v3 [1/4] map and unmap

When we want to map pages into kernel space via vmalloc()'s routines,
we always need a 'struct page' to do so.

There are cases where there is no page struct to use (bootstrap, etc.).
These functions are designed to help map any memory anywhere, at any time.

Users should manage their virtual/physical space by themselves.
Because it is complex and dangerous for each function to manage virtual
address space with its own code, it is better to use fixed addresses.

Note: my first purpose is to support virtual mem_map at both boot and
hotplug time, sharing the same logic.
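
To illustrate the intended calling convention, here is a hedged sketch of a
caller (my_pte_set, my_ops, my_map_one_page and MY_FIXED_ADDR are
hypothetical names; only gen_map_kern_ops and map_generic_kernel come from
this patch):

static int my_pte_set(pte_t *pte, unsigned long addr, void *data)
{
	unsigned long pfn = *(unsigned long *)data;

	/* back the caller-managed virtual address with this physical page */
	set_pte(pte, pfn_pte(pfn, PAGE_KERNEL));
	return 0;
}

static struct gen_map_kern_ops my_ops = {
	.k_pte_set = my_pte_set,
	/* .k_pte_clear and the alloc callbacks filled in as needed */
};

static int my_map_one_page(unsigned long pfn)
{
	/* map one page at a fixed, caller-owned kernel virtual address */
	return map_generic_kernel(MY_FIXED_ADDR, PAGE_SIZE, &my_ops, &pfn);
}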

Signed-Off-By: KAMEZAWA Hiroyuki <[email protected]>


---
include/linux/vmalloc.h | 36 ++++++++
mm/vmalloc.c | 200 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 236 insertions(+)

Index: devel-2.6.19/include/linux/vmalloc.h
===================================================================
--- devel-2.6.19.orig/include/linux/vmalloc.h 2006-11-30 06:57:37.000000000 +0900
+++ devel-2.6.19/include/linux/vmalloc.h 2006-12-07 23:04:54.000000000 +0900
@@ -3,6 +3,7 @@

#include <linux/spinlock.h>
#include <asm/page.h> /* pgprot_t */
+#include <asm/pgtable.h> /* pud_t */

struct vm_area_struct;

@@ -74,4 +75,39 @@
extern rwlock_t vmlist_lock;
extern struct vm_struct *vmlist;

+/*
+ * Map kernel memory with callback routines. These functions are designed
+ * to assist special mappings in kernel space, in other words, mappings
+ * not managed by the standard vmap calls.
+ * The caller is responsible for managing its own virtual address space.
+ *
+ * Bootstrap consideration:
+ * You can pass pud/pmd/pte alloc functions to map_generic_kernel(),
+ * so you can use bootmem functions or similar to allocate page tables
+ * if necessary.
+ */
+
+struct gen_map_kern_ops {
+ /* must be defined */
+ int (*k_pte_set)(pte_t *pte, unsigned long addr, void *data);
+ int (*k_pte_clear)(pte_t *pte, unsigned long addr, void *data);
+ /* optional */
+ int (*k_pud_alloc)(pgd_t *pgd, unsigned long addr, void *data);
+ int (*k_pmd_alloc)(pud_t *pud, unsigned long addr, void *data);
+ int (*k_pte_alloc)(pmd_t *pmd, unsigned long addr, void *data);
+};
+
+/*
+ * call set_pte for specified address range.
+ */
+extern int map_generic_kernel(unsigned long addr, unsigned long size,
+ struct gen_map_kern_ops *ops, void *data);
+/*
+ * call clear_pte() callback against all ptes found.
+ * pgtable itself is not freed.
+ */
+extern int unmap_generic_kernel(unsigned long addr, unsigned long size,
+ struct gen_map_kern_ops *ops, void *data);
+
+
#endif /* _LINUX_VMALLOC_H */
Index: devel-2.6.19/mm/vmalloc.c
===================================================================
--- devel-2.6.19.orig/mm/vmalloc.c 2006-11-30 06:57:37.000000000 +0900
+++ devel-2.6.19/mm/vmalloc.c 2006-12-06 16:33:41.000000000 +0900
@@ -747,3 +747,203 @@
}
EXPORT_SYMBOL(remap_vmalloc_range);

+
+
+/*
+ * Generic VM mapper for kernel routines.
+ * Can be used even during bootstrap (before memory is available) if the
+ * callback functions support it.
+ * For usual use, please use vmalloc/vfree/map_vm_area/unmap_vm_area.
+ */
+
+static int map_generic_pte_range(pmd_t *pmd, unsigned long addr,
+ unsigned long end,
+ struct gen_map_kern_ops *ops, void *data)
+{
+ pte_t *pte;
+ int ret = 0;
+ unsigned long next;
+ if (!pmd_present(*pmd)) {
+ if (ops->k_pte_alloc) {
+ ret = ops->k_pte_alloc(pmd, addr, data);
+ if (ret)
+ return ret;
+ } else {
+ pte = pte_alloc_kernel(pmd, addr);
+ if (!pte)
+ return -ENOMEM;
+ }
+ }
+ pte = pte_offset_kernel(pmd, addr);
+
+ do {
+ WARN_ON(!pte_none(*pte));
+ BUG_ON(!ops->k_pte_set);
+ ret = ops->k_pte_set(pte, addr, data);
+ if (ret)
+ break;
+ next = addr + PAGE_SIZE;
+ } while (pte++, addr = next, addr != end);
+ return ret;
+}
+
+static int map_generic_pmd_range(pud_t *pud, unsigned long addr,
+ unsigned long end,
+ struct gen_map_kern_ops *ops, void *data)
+{
+ pmd_t *pmd;
+ unsigned long next;
+ int ret;
+
+ if (pud_none(*pud)) {
+ if (ops->k_pmd_alloc) {
+ ret = ops->k_pmd_alloc(pud, addr, data);
+ if (ret)
+ return ret;
+ } else {
+ pmd = pmd_alloc(&init_mm, pud, addr);
+ if (!pmd)
+ return -ENOMEM;
+ }
+ }
+ pmd = pmd_offset(pud, addr);
+ do {
+ next = pmd_addr_end(addr, end);
+ ret = map_generic_pte_range(pmd, addr, next, ops, data);
+ if (ret)
+ break;
+ } while (pmd++, addr = next, addr != end);
+ return ret;
+}
+
+static int map_generic_pud_range(pgd_t *pgd, unsigned long addr,
+ unsigned long end,
+ struct gen_map_kern_ops *ops, void *data)
+{
+ pud_t *pud;
+ unsigned long next;
+ int ret;
+ if (pgd_none(*pgd)) {
+ if (ops->k_pud_alloc) {
+ ret = ops->k_pud_alloc(pgd, addr, data);
+ if (ret)
+ return ret;
+ } else {
+ pud = pud_alloc(&init_mm, pgd, addr);
+ if (!pud)
+ return -ENOMEM;
+ }
+ }
+ pud = pud_offset(pgd, addr);
+ do {
+ next = pud_addr_end(addr, end);
+ ret = map_generic_pmd_range(pud, addr, next, ops, data);
+ if (ret)
+ break;
+
+ } while (pud++, addr = next, addr != end);
+ return ret;
+}
+
+int map_generic_kernel(unsigned long addr, unsigned long size,
+ struct gen_map_kern_ops *ops, void *data)
+{
+ pgd_t *pgd;
+ unsigned long start = addr, end = addr + size;
+ unsigned long next;
+ int ret;
+
+ do {
+ pgd = pgd_offset_k(addr);
+ next = pgd_addr_end(addr, end);
+ ret = map_generic_pud_range(pgd, addr, next, ops, data);
+ if (ret)
+ break;
+
+ } while (addr = next, addr != end);
+ flush_cache_vmap(start, end); /* addr == end here */
+ return ret;
+}
+
+static int
+unmap_generic_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+ struct gen_map_kern_ops *ops, void *data)
+{
+ pte_t *pte;
+ int err = 0;
+ pte = pte_offset_kernel(pmd, addr);
+ do {
+ if (!pte_present(*pte))
+ continue;
+ err = ops->k_pte_clear(pte, addr, data);
+ if (err)
+ break;
+ } while (pte++, addr += PAGE_SIZE, addr != end);
+ return err;
+}
+
+static int
+unmap_generic_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
+ struct gen_map_kern_ops *ops, void *data)
+{
+ pmd_t *pmd;
+ unsigned long next;
+ int err = 0;
+
+ pmd = pmd_offset(pud, addr);
+
+ do {
+ next = pmd_addr_end(addr, end);
+ if (pmd_none_or_clear_bad(pmd))
+ continue;
+ err = unmap_generic_pte_range(pmd, addr, next, ops, data);
+ if (err)
+ break;
+ } while (pmd++, addr = next, addr != end);
+ return err;
+}
+
+static int
+unmap_generic_pud_range(pgd_t *pgd, unsigned long addr, unsigned long end,
+ struct gen_map_kern_ops *ops, void *data)
+{
+ pud_t *pud;
+ unsigned long next;
+ int err = 0;
+
+ pud = pud_offset(pgd, addr);
+
+ do {
+ next = pud_addr_end(addr, end);
+ if (pud_none_or_clear_bad(pud))
+ continue;
+ err = unmap_generic_pmd_range(pud, addr, next, ops, data);
+ if (err)
+ break;
+ } while (pud++, addr = next, addr != end);
+ return err;
+}
+
+int unmap_generic_kernel(unsigned long addr, unsigned long size,
+ struct gen_map_kern_ops *ops, void *data)
+{
+ unsigned long start = addr, next, end;
+ pgd_t *pgd;
+ int err = 0;
+
+ end = addr + size;
+ flush_cache_vunmap(start, end);
+
+ pgd = pgd_offset_k(addr);
+
+ do {
+ next = pgd_addr_end(addr, end);
+ if (pgd_none_or_clear_bad(pgd))
+ continue;
+ err = unmap_generic_pud_range(pgd, addr, next, ops, data);
+ if (err)
+ break;
+ } while (pgd++, addr = next, addr != end);
+ flush_tlb_kernel_range(start, end);
+ return err;
+}

2006-12-08 07:03:47

by Kamezawa Hiroyuki

Subject: [RFC] [PATCH] virtual memmap on sparsemem v3 [3/4] static virtual mem_map

This patch adds support for a statically allocated virtual mem_map
(meaning the virtual address of the mem_map array is defined statically).
This removes the memory reference through *(&mem_map).

Signed-Off-By: KAMEZAWA Hiroyuki <[email protected]>


Index: devel-2.6.19/include/linux/mmzone.h
===================================================================
--- devel-2.6.19.orig/include/linux/mmzone.h 2006-12-08 15:04:30.000000000 +0900
+++ devel-2.6.19/include/linux/mmzone.h 2006-12-08 15:05:18.000000000 +0900
@@ -618,8 +618,13 @@
#if (((BITS_PER_LONG/4) * PAGES_PER_SECTION) % PAGE_SIZE) != 0
#error "PAGE_SIZE/SECTION_SIZE relationship is not suitable for vmem_map"
#endif
+#ifdef CONFIG_SPARSEMEM_VMEMMAP_STATIC
+#include <linux/mm_types.h>
+extern struct page mem_map[];
+#else
extern struct page* mem_map;
#endif
+#endif

static inline struct page *__section_mem_map_addr(struct mem_section *section)
{
Index: devel-2.6.19/mm/Kconfig
===================================================================
--- devel-2.6.19.orig/mm/Kconfig 2006-12-08 15:05:10.000000000 +0900
+++ devel-2.6.19/mm/Kconfig 2006-12-08 15:05:18.000000000 +0900
@@ -121,6 +121,10 @@
But this consumes a huge amount of virtual (not physical) memory.
This option is selectable only if your arch supports it.

+config SPARSEMEM_VMEMMAP_STATIC
+ def_bool y
+ depends on ARCH_SPARSEMEM_VMEMMAP_STATIC
+
# eventually, we can have this option just 'select SPARSEMEM'
config MEMORY_HOTPLUG
bool "Allow for memory hot-add"
Index: devel-2.6.19/mm/memory.c
===================================================================
--- devel-2.6.19.orig/mm/memory.c 2006-12-08 15:03:02.000000000 +0900
+++ devel-2.6.19/mm/memory.c 2006-12-08 15:09:00.000000000 +0900
@@ -71,7 +71,9 @@

#ifdef CONFIG_SPARSEMEM_VMEMMAP
/* for the virtual mem_map */
+#ifndef CONFIG_SPARSEMEM_VMEMMAP_STATIC
struct page *mem_map;
+#endif
EXPORT_SYMBOL(mem_map);
#endif


2006-12-09 00:29:01

by Andrew Morton

Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [1/4] map and unmap

On Fri, 8 Dec 2006 16:01:42 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:

> When we want to map pages into kernel space via vmalloc()'s routines,
> we always need a 'struct page' to do so.
>
> There are cases where there is no page struct to use (bootstrap, etc.).
> These functions are designed to help map any memory anywhere, at any time.
>
> Users should manage their virtual/physical space by themselves.
> Because it is complex and dangerous for each function to manage virtual
> address space with its own code, it is better to use fixed addresses.
>
> Note: my first purpose is to support virtual mem_map at both boot and
> hotplug time, sharing the same logic.
>

A little thing:


> + if (ops->k_pte_alloc) {
> + ret = ops->k_pte_alloc(pmd, addr, data);
> + if (ret)
> + return ret;
> + } else {
> + pte = pte_alloc_kernel(pmd, addr);
> + if (!pte)
> + return -ENOMEM;
> + }

> + if (ops->k_pmd_alloc) {
> + ret = ops->k_pmd_alloc(pud, addr, data);
> + if (ret)
> + return ret;
> + } else {
> + pmd = pmd_alloc(&init_mm, pud, addr);
> + if (!pmd)
> + return -ENOMEM;

> + if (ops->k_pud_alloc) {
> + ret = ops->k_pud_alloc(pgd, addr, data);
> + if (ret)
> + return ret;
> + } else {
> + pud = pud_alloc(&init_mm, pgd, addr);
> + if (!pud)
> + return -ENOMEM;
> + }

Generally we prefer to simply *require* that the function vector be filled
in appropriately. So if the caller has no special needs, the caller will
set their gen_map_kern_ops.k_pte_alloc to point at pte_alloc_kernel().

erk, pte_alloc_kernel() is a macro. As is pmd_alloc(), etc. Well, let
that be a lesson to us. What a mess.

I suppose we could go through and convert them all to inlines and then the
compiler will generate an out-of-line copy for us. Better would be to turn
these things into regular, out-of-line C functions.

What a mess.
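
A thin adapter in the caller would give the ops table a real function
pointer despite the macros; a hedged sketch (default_pte_alloc is a
hypothetical name, not part of the posted patches):

static int default_pte_alloc(pmd_t *pmd, unsigned long addr, void *data)
{
	/* pte_alloc_kernel() is a macro, but wrapping it yields an
	 * ordinary function whose address can live in the vector */
	return pte_alloc_kernel(pmd, addr) ? 0 : -ENOMEM;
}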

2006-12-09 00:30:57

by Andrew Morton

Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [3/4] static virtual mem_map

On Fri, 8 Dec 2006 16:07:08 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:

> This patch adds support for a statically allocated virtual mem_map
> (meaning the virtual address of the mem_map array is defined statically).
> This removes the memory reference through *(&mem_map).
>
> Signed-Off-By: KAMEZAWA Hiroyuki <[email protected]>
>
>
> Index: devel-2.6.19/include/linux/mmzone.h
> ===================================================================
> --- devel-2.6.19.orig/include/linux/mmzone.h 2006-12-08 15:04:30.000000000 +0900
> +++ devel-2.6.19/include/linux/mmzone.h 2006-12-08 15:05:18.000000000 +0900
> @@ -618,8 +618,13 @@
> #if (((BITS_PER_LONG/4) * PAGES_PER_SECTION) % PAGE_SIZE) != 0
> #error "PAGE_SIZE/SECTION_SIZE relationship is not suitable for vmem_map"
> #endif
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP_STATIC
> +#include <linux/mm_types.h>
> +extern struct page mem_map[];
> +#else
> extern struct page* mem_map;
> #endif
> +#endif

This looks rather unpleasant - what went wrong here?

Would prefer to unconditionally include the header file - conditional inclusions
like this can cause compile failures when someone changes a config option. They
generally raise the complexity level.

2006-12-09 02:40:31

by Kamezawa Hiroyuki

Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [1/4] map and unmap

On Fri, 8 Dec 2006 16:28:19 -0800
Andrew Morton <[email protected]> wrote:

> Generally we prefer to simply *require* that the function vector be filled
> in appropriately. So if the caller has no special needs, the caller will
> set their gen_map_kern_ops.k_pte_alloc to point at pte_alloc_kernel().
>
> erk, pte_alloc_kernel() is a macro. As is pmd_alloc(), etc. Well, let
> that be a lesson to us. What a mess.
>
> I suppose we could go through and convert them all to inlines and then the
> compiler will generate an out-of-line copy for us. Better would be to turn
> these things into regular, out-of-line C functions.
>
> What a mess.
>

Thank you for the review. I'll remove this default action.

-Kame

2006-12-09 02:46:55

by Kamezawa Hiroyuki

Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [3/4] static virtual mem_map

On Fri, 8 Dec 2006 16:30:20 -0800
Andrew Morton <[email protected]> wrote:

> > +#ifdef CONFIG_SPARSEMEM_VMEMMAP_STATIC
> > +#include <linux/mm_types.h>
> > +extern struct page mem_map[];
> > +#else
> > extern struct page* mem_map;
> > #endif
> > +#endif
>
> This looks rather unpleasant - what went wrong here?
>
The definition of 'struct page' is necessary before declaring the array[]. (for gcc-4.0)

> Would prefer to unconditionally include the header file - conditional inclusions
> like this can cause compile failures when someone changes a config option. They
> generally raise the complexity level.
>
Okay.
Now, the forward declaration of 'struct page' is in mmzone.h.
I'll remove it and include mm_types.h instead.
If someone says "don't do that", I'll look for another way.

-Kame


2006-12-09 03:34:08

by Andrew Morton

Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [3/4] static virtual mem_map

On Sat, 9 Dec 2006 11:49:50 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:

> On Fri, 8 Dec 2006 16:30:20 -0800
> Andrew Morton <[email protected]> wrote:
>
> > > +#ifdef CONFIG_SPARSEMEM_VMEMMAP_STATIC
> > > +#include <linux/mm_types.h>
> > > +extern struct page mem_map[];
> > > +#else
> > > extern struct page* mem_map;
> > > #endif
> > > +#endif
> >
> > This looks rather unpleasant - what went wrong here?
> >
> The definition of 'struct page' is necessary before declaring the array[]. (for gcc-4.0)

hm, OK, that declaration needs all of `struct page' in scope.

>
> > Would prefer to unconditionally include the header file - conditional inclusions
> > like this can cause compile failures when someone changes a config option. They
> > generally raise the complexity level.
> >
> Okay.
> Now, the forward declaration of 'struct page' is in mmzone.h.
> I'll remove it and include mm_types.h instead.
> If someone says "don't do that", I'll look for another way.
>

This header needs mm_types.h, so including it is certainly OK - there's no
choice. But I think it'd be better to include mm_types.h outside of any
ifdefs. Just stick the #include at the start of the file as usual.

2006-12-09 03:38:35

by Kamezawa Hiroyuki

Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [3/4] static virtual mem_map

On Fri, 8 Dec 2006 19:33:23 -0800
Andrew Morton <[email protected]> wrote:

> >
> > > Would prefer to unconditionally include the header file - conditional inclusions
> > > like this can cause compile failures when someone changes a config option. They
> > > generally raise the complexity level.
> > >
> > Okay.
> > Now, the forward declaration of 'struct page' is in mmzone.h.
> > I'll remove it and include mm_types.h instead.
> > If someone says "don't do that", I'll look for another way.
> >
>
> This header needs mm_types.h, so including it is certainly OK - there's no
> choice. But I think it'd be better to include mm_types.h outside of any
> ifdefs. Just stick the #include at the start of the file as usual.
>
Okay, I'll make a cleanup patch along those lines.

Thanks,
-Kame

2006-12-09 04:49:32

by Kamezawa Hiroyuki

Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [1/4] map and unmap

This removes the implicit default actions from the map_generic_kernel()
call path and also updates the comments in vmalloc.h.

Signed-Off-By: KAMEZAWA Hiroyuki <[email protected]>

Index: devel-2.6.19/include/linux/vmalloc.h
===================================================================
--- devel-2.6.19.orig/include/linux/vmalloc.h 2006-12-08 15:02:39.000000000 +0900
+++ devel-2.6.19/include/linux/vmalloc.h 2006-12-09 13:46:17.000000000 +0900
@@ -81,6 +81,9 @@
* not managed by the standard vmap calls.
* The caller is responsible for managing its own virtual address space.
*
+ * What you have to do in a pud/pmd/pte alloc callback is allocate a page
+ * and populate the entry.
+ *
* Bootstrap consideration:
* You can pass pud/pmd/pte alloc functions to map_generic_kernel(),
* so you can use bootmem functions or similar to allocate page tables
@@ -88,13 +91,12 @@
*/

struct gen_map_kern_ops {
- /* must be defined */
+ /* all pointers must be filled */
int (*k_pte_set)(pte_t *pte, unsigned long addr, void *data);
- int (*k_pte_clear)(pte_t *pte, unsigned long addr, void *data);
- /* optional */
int (*k_pud_alloc)(pgd_t *pgd, unsigned long addr, void *data);
int (*k_pmd_alloc)(pud_t *pud, unsigned long addr, void *data);
int (*k_pte_alloc)(pmd_t *pmd, unsigned long addr, void *data);
+ int (*k_pte_clear)(pte_t *pte, unsigned long addr, void *data);
};

/*
Index: devel-2.6.19/mm/vmalloc.c
===================================================================
--- devel-2.6.19.orig/mm/vmalloc.c 2006-12-08 15:02:39.000000000 +0900
+++ devel-2.6.19/mm/vmalloc.c 2006-12-09 13:44:35.000000000 +0900
@@ -764,15 +764,10 @@
int ret = 0;
unsigned long next;
if (!pmd_present(*pmd)) {
- if (ops->k_pte_alloc) {
- ret = ops->k_pte_alloc(pmd, addr, data);
- if (ret)
- return ret;
- } else {
- pte = pte_alloc_kernel(pmd, addr);
- if (!pte)
- return -ENOMEM;
- }
+ BUG_ON(!ops->k_pte_alloc);
+ ret = ops->k_pte_alloc(pmd, addr, data);
+ if (ret)
+ return ret;
}
pte = pte_offset_kernel(pmd, addr);

@@ -796,15 +791,10 @@
int ret;

if (pud_none(*pud)) {
- if (ops->k_pmd_alloc) {
- ret = ops->k_pmd_alloc(pud, addr, data);
- if (ret)
- return ret;
- } else {
- pmd = pmd_alloc(&init_mm, pud, addr);
- if (!pmd)
- return -ENOMEM;
- }
+ BUG_ON(!ops->k_pmd_alloc);
+ ret = ops->k_pmd_alloc(pud, addr, data);
+ if (ret)
+ return ret;
}
pmd = pmd_offset(pud, addr);
do {
@@ -824,15 +814,10 @@
unsigned long next;
int ret;
if (pgd_none(*pgd)) {
- if (ops->k_pud_alloc) {
- ret = ops->k_pud_alloc(pgd, addr, data);
- if (ret)
- return ret;
- } else {
- pud = pud_alloc(&init_mm, pgd, addr);
- if (!pud)
- return -ENOMEM;
- }
+ BUG_ON(!ops->k_pud_alloc);
+ ret = ops->k_pud_alloc(pgd, addr, data);
+ if (ret)
+ return ret;
}
pud = pud_offset(pgd, addr);
do {

2006-12-09 04:50:32

by Kamezawa Hiroyuki

Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [3/4] static virtual mem_map

This avoids a complex inclusion of a header file in the middle of another
header file.

Signed-Off-By: KAMEZAWA Hiroyuki <[email protected]>


Index: devel-2.6.19/include/linux/mmzone.h
===================================================================
--- devel-2.6.19.orig/include/linux/mmzone.h 2006-12-08 15:05:18.000000000 +0900
+++ devel-2.6.19/include/linux/mmzone.h 2006-12-09 13:00:32.000000000 +0900
@@ -13,6 +13,7 @@
#include <linux/init.h>
#include <linux/seqlock.h>
#include <linux/nodemask.h>
+#include <linux/mm_types.h>
#include <asm/atomic.h>
#include <asm/page.h>

@@ -562,7 +563,6 @@
#error Allocator MAX_ORDER exceeds SECTION_SIZE
#endif

-struct page;
struct mem_section {
/*
* This is, logically, a pointer to an array of struct
@@ -619,7 +619,6 @@
#error "PAGE_SIZE/SECTION_SIZE relationship is not suitable for vmem_map"
#endif
#ifdef CONFIG_SPARSEMEM_VMEMMAP_STATIC
-#include <linux/mm_types.h>
extern struct page mem_map[];
#else
extern struct page* mem_map;

2006-12-09 04:52:28

by Kamezawa Hiroyuki

Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [4/4] ia64 support


I tested ia64 with this patch under

- DISCONTIGMEM + VIRTUAL_MEM_MAP
- SPARSEMEM
- SPARSEMEM_VMEMMAP

on SMP with tiger4_defconfig.

This also fixes a typo in the DISCONTIGMEM case.

Signed-Off-By: KAMEZAWA Hiroyuki <[email protected]>

Index: devel-2.6.19/include/asm-ia64/pgtable.h
===================================================================
--- devel-2.6.19.orig/include/asm-ia64/pgtable.h 2006-12-09 13:22:47.000000000 +0900
+++ devel-2.6.19/include/asm-ia64/pgtable.h 2006-12-09 13:23:44.000000000 +0900
@@ -237,7 +237,7 @@
#elif defined(CONFIG_VIRTUAL_MEM_MAP)
#define VMALLOC_START (RGN_BASE(RGN_GATE) + 0x200000000UL)

-#defineVMALLOC_END_INIT (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
+#define VMALLOC_END_INIT (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
#define VMALLOC_END vmalloc_end
extern unsigned long vmalloc_end;
#else

2006-12-09 11:51:49

by Heiko Carstens

Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [0/4] introduction

> Virtual mem_map is not useful for 32-bit archs, because it uses a huge
> virtual address range.

Why? The s390 vmem_map implementation which I sent to linux-mm last week
has been merged in the meantime. It supports both 32 and 64 bit.
The main reason is to keep things simple and avoid #ifdef hell.

Since the maximum size of the virtual array is about 16MB, it doesn't waste
much address space. Actually, I just changed the size of the vmalloc area
so that the maximum supported amount of physical memory is still 1920MB.
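
Those numbers check out (a rough calculation, assuming 4KB pages and a
32-byte struct page on 32-bit s390; both figures are my assumptions rather
than numbers from this thread):

	1920MB / 4KB = 491520 pages
	491520 * 32 bytes = 15MB of virtual array

which is indeed about 16MB.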

2006-12-09 12:05:58

by Heiko Carstens

Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [2/4] generic virtual mem_map on sparsemem

> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> +#if (((BITS_PER_LONG/4) * PAGES_PER_SECTION) % PAGE_SIZE) != 0
> +#error "PAGE_SIZE/SECTION_SIZE relationship is not suitable for vmem_map"
> +#endif

Why the BITS_PER_LONG/4? Or to put it in other words: why not simply
PAGES_PER_SECTION % PAGE_SIZE != 0 ?

2006-12-09 13:18:10

by Kamezawa Hiroyuki

Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [2/4] generic virtual mem_map on sparsemem

On Sat, 9 Dec 2006 13:05:47 +0100
Heiko Carstens <[email protected]> wrote:

> > +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> > +#if (((BITS_PER_LONG/4) * PAGES_PER_SECTION) % PAGE_SIZE) != 0
> > +#error "PAGE_SIZE/SECTION_SIZE relationship is not suitable for vmem_map"
> > +#endif
>
> Why the BITS_PER_LONG/4? Or to put it in other words: why not simply
> PAGES_PER_SECTION % PAGE_SIZE != 0 ?
>
Sorry, my mistake. What I wanted to check was:

32-bit arch --
4 * PAGES_PER_SECTION % PAGE_SIZE
64-bit arch --
8 * PAGES_PER_SECTION % PAGE_SIZE

I'll renew this next week.

-Kame

2006-12-09 13:22:14

by Kamezawa Hiroyuki

Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [0/4] introduction

On Sat, 9 Dec 2006 12:51:37 +0100
Heiko Carstens <[email protected]> wrote:

> > Virtual mem_map is not useful for 32-bit archs, because it uses a huge
> > virtual address range.
>
> Why? The s390 vmem_map implementation which I sent last week to linux-mm
> is merged in the meantime. It supports both 32 and 64 bit.
> The main reason is to keep things simple and avoid #ifdef hell.
>
> Since the maximum size of the virtual array is about 16MB it's not much
> waste of address space. Actually I just changed the size of the vmalloc
> area, so that the maximum supported physical amount of memory is still 1920MB.

I'm sorry, I won't stop anyone who wants to use vmem_map.
(My brain is polluted by the ugly 36-bit-physical/32-bit-virtual x86 arch.)

-Kame

2006-12-10 19:47:35

by Bob Picco

[permalink] [raw]
Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [0/4] introduction

Hiroyuki KAMEZAWA wrote: [Fri Dec 08 2006, 01:56:08AM EST]
Hi Kame,
> Hi, this is version 3 of the virtual mem_map on sparsemem/generic patch set.
>
> I myself like this patch, but someone may feel it is intrusive and
> scattered. Please point out any problems.
>
> Changes v2 -> v3
> - made the map/unmap functions general purpose. (for my purpose ;)
> - dropped memory hotplug support; it will be posted after this goes in.
> - changed pfn_to_page()/page_to_pfn() definitions.
> - added the CONFIG_SPARSEMEM_VMEMMAP_STATIC config option.
> - several cleanups.
> - dropped the optimized pfn_valid() patch; it will be posted after this goes in.
> - added #error to check vmem_map alignment.
>
> Changes v1 -> v2:
> - supported the memory hotplug case.
> - used a static address for vmem_map (ia64).
> - added optimized pfn_valid() for ia64 (experimental).
>
> Intro:
> When using SPARSEMEM, pfn_to_page()/page_to_pfn() accesses a big global
> table of mem_sections; with SPARSEMEM_EXTREME this is a 2-level table lookup.
Did you gather any performance numbers comparing
VIRTUAL_MEM_MAP+SPARSEMEM to SPARSEMEM+EXTREME? I did some quick but
inconclusive (small machine) ones when you first posted. There was
perhaps a slight degradation in VIRTUAL_MEM_MAP+SPARSEMEM.

bob
>
> If we can map mem_section->mem_map at (virtually) linear addresses, we can
> expect optimized pfn <-> page translation.
>
> Virtual mem_map is not useful for 32-bit archs, because it uses a huge
> virtual address range.
>
> -Kame

2006-12-11 00:40:41

by Kamezawa Hiroyuki

Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [0/4] introduction

On Sun, 10 Dec 2006 14:47:30 -0500
"Bob Picco" <[email protected]> wrote:
> > Intro:
> > When using SPARSEMEM, pfn_to_page()/page_to_pfn() accesses a big global
> > table of mem_sections; with SPARSEMEM_EXTREME this is a 2-level table lookup.
> Did you gather any performance numbers comparing
> VIRTUAL_MEM_MAP+SPARSEMEM to SPARSEMEM+EXTREME? I did some quick but
> inconclusive (small machine) ones when you first posted. There was
> perhaps a slight degradation in VIRTUAL_MEM_MAP+SPARSEMEM.
>
No, I didn't. I'll do so when I have a chance. I hope that this won't be
merged until someone shows the benefit with data.
(I think this version is better than the first one, but...)

IIRC, DISCONTIGMEM + VIRTUAL_MEM_MAP shows a bit better performance than
SPARSEMEM_EXTREME. What I expect is to achieve VIRTUAL_MEM_MAP performance
with SPARSEMEM.

Now, we still have chances for optimization:
- optimize pfn_valid(). (I'll post this; with it, mem_section[] will never
  be accessed at runtime.)
- use large pages (no concrete idea yet; maybe modifying the map/unmap
  functions will be enough.)

Thanks,
-Kame

2006-12-11 06:41:41

by Kamezawa Hiroyuki

Subject: Re: [RFC] [PATCH] virtual memmap on sparsemem v3 [2/4] generic virtual mem_map on sparsemem

On Sat, 9 Dec 2006 22:17:00 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:

> I'll renew this next week.
>

Hi, this is a fix patch. Sorry for my carelessness.

I'll post the next add-on patches against the next -mm when it ships.
What I have now are:
- pfn_valid() optimization
- memory hotplug support

Then the performance comparison stage will come.

(*) The robust memory hot-add patch (which we are planning) may use this
vmem_map to avoid allocating a hot-added memory's mem_map from existing
memory.

Thanks,
-Kame

== patch from here ===

Fixes the #error condition.

The check's meaning is:
-- for 32 bits --
4 (struct page's alignment) * PAGES_PER_SECTION % PAGE_SIZE == 0
-- for 64 bits --
8 (struct page's alignment) * PAGES_PER_SECTION % PAGE_SIZE == 0

Then vmem_map is aligned per section.
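
As a quick sanity check with ia64's defaults (16KB pages,
PAGES_PER_SECTION = 65536, 64-bit; values taken from the earlier notes in
this thread):

	8 * 65536 = 524288 = 32 * 16384

so the corrected condition holds and the #error stays silent.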

This check may be removed if I (or someone) can write a clean patch for a
non-aligned vmem_map.

Signed-Off-By: KAMEZAWA Hiroyuki <[email protected]>

Index: devel-2.6.19/include/linux/mmzone.h
===================================================================
--- devel-2.6.19.orig/include/linux/mmzone.h 2006-12-09 13:46:35.000000000 +0900
+++ devel-2.6.19/include/linux/mmzone.h 2006-12-11 15:22:19.000000000 +0900
@@ -615,7 +615,7 @@
#define SECTION_NID_SHIFT 2

#ifdef CONFIG_SPARSEMEM_VMEMMAP
-#if (((BITS_PER_LONG/4) * PAGES_PER_SECTION) % PAGE_SIZE) != 0
+#if (((BITS_PER_LONG/8) * PAGES_PER_SECTION) % PAGE_SIZE) != 0
#error "PAGE_SIZE/SECTION_SIZE relationship is not suitable for vmem_map"
#endif
#ifdef CONFIG_SPARSEMEM_VMEMMAP_STATIC