2010-07-22 18:25:19

by Yinghai Lu

Subject: [PATCH -v26 00/31] generic changes for memblock

The new memblock can be used to replace early_res on x86.

Suggested by: David, Ben, and Thomas

-v25: updated to mainline with the kmemleak fix on nobootmem;
the lmb to memblock rename is already in mainline

-v26: per Linus and hpa, split the big patch set into smaller ones.

This series is a rebase of Ben's changesets onto current mainline/tip.

The last six patches are needed for the x86 memblock transition, but they change mm/memblock.c.

Thanks

Yinghai Lu

[PATCH 01/31] memblock: Rename memblock_region to memblock_type and memblock_property to memblock_region
[PATCH 02/31] memblock: No reason to include asm/memblock.h late
[PATCH 03/31] memblock: Introduce for_each_memblock() and new accessors, and use it
[PATCH 04/31] memblock: Remove nid_range argument, arch provides memblock_nid_range() instead
[PATCH 05/31] memblock: Factor the lowest level alloc function
[PATCH 06/31] memblock: Expose MEMBLOCK_ALLOC_ANYWHERE
[PATCH 07/31] memblock: Introduce default allocation limit and use it to replace explicit ones
[PATCH 08/31] memblock: Remove rmo_size, bury it in arch/powerpc where it belongs
[PATCH 09/31] memblock: Change u64 to phys_addr_t
[PATCH 10/31] memblock: Remove unused memblock.debug struct member
[PATCH 11/31] memblock: Remove memblock_type.size and add memblock.memory_size instead
[PATCH 12/31] memblock: Move memblock arrays to static storage in memblock.c and make their size a variable
[PATCH 13/31] memblock: Add debug markers at the end of the array
[PATCH 14/31] memblock: Make memblock_find_region() out of memblock_alloc_region()
[PATCH 15/31] memblock: Define MEMBLOCK_ERROR internally instead of using ~(phys_addr_t)0
[PATCH 16/31] memblock: Move memblock_init() to the bottom of the file
[PATCH 17/31] memblock: split memblock_find_base() out of __memblock_alloc_base()
[PATCH 18/31] memblock: Move functions around into a more sensible order
[PATCH 19/31] memblock: Add array resizing support
[PATCH 20/31] memblock: Add arch function to control coalescing of memblock memory regions
[PATCH 21/31] memblock: Add "start" argument to memblock_find_base()
[PATCH 22/31] memblock: NUMA allocate can now use early_pfn_map
[PATCH 23/31] memblock: Separate memblock_alloc_nid() and memblock_alloc_try_nid()
[PATCH 24/31] memblock: Make memblock_alloc_try_nid() fallback to MEMBLOCK_ALLOC_ANYWHERE
[PATCH 25/31] memblock: Add debugfs files to dump the arrays content
[PATCH 26/31] memblock: Prepare x86 to use memblock to replace early_res
[PATCH 27/31] memblock: Print new doubled array location info
[PATCH 28/31] memblock: Export MEMBLOCK_ERROR again
[PATCH 29/31] memblock: Prepare to include linux/memblock.h in core file
[PATCH 30/31] memblock: Add ARCH_DISCARD_MEMBLOCK to put memblock code to .init
[PATCH 31/31] memblock: Add memblock_find_in_range()

arch/microblaze/include/asm/memblock.h | 3 -
arch/microblaze/mm/init.c | 18 +-
arch/powerpc/include/asm/memblock.h | 7 -
arch/powerpc/include/asm/mmu.h | 12 +
arch/powerpc/kernel/head_40x.S | 6 +-
arch/powerpc/kernel/paca.c | 2 +-
arch/powerpc/kernel/prom.c | 15 +-
arch/powerpc/kernel/rtas.c | 2 +-
arch/powerpc/kernel/setup_32.c | 2 +-
arch/powerpc/kernel/setup_64.c | 2 +-
arch/powerpc/mm/40x_mmu.c | 17 +-
arch/powerpc/mm/44x_mmu.c | 14 +
arch/powerpc/mm/fsl_booke_mmu.c | 12 +-
arch/powerpc/mm/hash_utils_64.c | 35 ++-
arch/powerpc/mm/init_32.c | 43 +-
arch/powerpc/mm/init_64.c | 1 +
arch/powerpc/mm/mem.c | 94 ++---
arch/powerpc/mm/numa.c | 17 +-
arch/powerpc/mm/ppc_mmu_32.c | 18 +-
arch/powerpc/mm/tlb_nohash.c | 16 +
arch/powerpc/platforms/embedded6xx/wii.c | 2 +-
arch/sh/include/asm/memblock.h | 2 -
arch/sh/mm/init.c | 16 +-
arch/sparc/include/asm/memblock.h | 2 -
arch/sparc/mm/init_64.c | 46 +--
include/linux/memblock.h | 162 +++++--
mm/memblock.c | 764 +++++++++++++++++++-----------
27 files changed, 846 insertions(+), 484 deletions(-)


2010-07-22 18:22:06

by Yinghai Lu

Subject: [PATCH 06/31] memblock: Expose MEMBLOCK_ALLOC_ANYWHERE

From: Benjamin Herrenschmidt <[email protected]>

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
arch/powerpc/mm/hash_utils_64.c | 2 +-
include/linux/memblock.h | 1 +
mm/memblock.c | 2 --
3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 2b0a807..c630b4f 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -625,7 +625,7 @@ static void __init htab_initialize(void)
if (machine_is(cell))
limit = 0x80000000;
else
- limit = 0;
+ limit = MEMBLOCK_ALLOC_ANYWHERE;

table = memblock_alloc_base(htab_size_bytes, htab_size_bytes, limit);

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 3e4a52f..5853752 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -50,6 +50,7 @@ extern u64 __init memblock_alloc_nid(u64 size, u64 align, int nid);
extern u64 __init memblock_alloc(u64 size, u64 align);
extern u64 __init memblock_alloc_base(u64 size,
u64, u64 max_addr);
+#define MEMBLOCK_ALLOC_ANYWHERE 0
extern u64 __init __memblock_alloc_base(u64 size,
u64 align, u64 max_addr);
extern u64 __init memblock_phys_mem_size(void);
diff --git a/mm/memblock.c b/mm/memblock.c
index 9b71de0..0ad7626 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -15,8 +15,6 @@
#include <linux/bitops.h>
#include <linux/memblock.h>

-#define MEMBLOCK_ALLOC_ANYWHERE 0
-
struct memblock memblock;

static int memblock_debug;
--
1.6.4.2

2010-07-22 18:22:14

by Yinghai Lu

Subject: [PATCH 07/31] memblock: Introduce default allocation limit and use it to replace explicit ones

From: Benjamin Herrenschmidt <[email protected]>

This introduces memblock.current_limit, which is used to limit allocations
from memblock_alloc() or memblock_alloc_base(..., MEMBLOCK_ALLOC_ACCESSIBLE).

The old MEMBLOCK_ALLOC_ANYWHERE changes value from 0 to ~(u64)0 and can still
be used with memblock_alloc_base() to allocate really anywhere.

It is -no-longer- cropped to MEMBLOCK_REAL_LIMIT, which disappears.

Note to archs: I'm leaving the default limit at MEMBLOCK_ALLOC_ANYWHERE. I
strongly recommend that you ensure you set an appropriate limit
during boot in order to guarantee that a memblock_alloc() at any time
results in something that is accessible with a simple __va().

The reason is that a subsequent patch will introduce the ability for
the memblock arrays to resize themselves by reallocation. The MEMBLOCK
core will honor the current limit when performing those allocations.
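
To illustrate the new semantics, here is a minimal boot-time sketch (the
function and the 768MB value are made up for illustration; only the
memblock calls are from this series):

#include <linux/mm.h>
#include <linux/string.h>
#include <linux/memblock.h>

/* hypothetical arch boot code, not part of this patch */
static void __init example_early_setup(void)
{
	u64 table;

	/* pretend only the first 768MB are covered by __va() this early */
	memblock_set_current_limit(0x30000000);

	/* capped by memblock.current_limit (MEMBLOCK_ALLOC_ACCESSIBLE) */
	table = memblock_alloc(0x100000, 0x100000);
	memset(__va(table), 0, 0x100000);	/* safe: below the limit */

	/* an explicit MEMBLOCK_ALLOC_ANYWHERE max_addr bypasses the limit */
	table = memblock_alloc_base(0x100000, 0x100000,
				    MEMBLOCK_ALLOC_ANYWHERE);
	/* __va(table) would not necessarily be safe here */
}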

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
arch/microblaze/include/asm/memblock.h | 3 ---
arch/powerpc/include/asm/memblock.h | 7 -------
arch/powerpc/kernel/prom.c | 20 +++++++++++++++++++-
arch/powerpc/kernel/setup_32.c | 2 +-
arch/powerpc/mm/40x_mmu.c | 5 +++--
arch/powerpc/mm/fsl_booke_mmu.c | 3 ++-
arch/powerpc/mm/hash_utils_64.c | 3 ++-
arch/powerpc/mm/init_32.c | 29 +++++++----------------------
arch/powerpc/mm/ppc_mmu_32.c | 3 +--
arch/powerpc/mm/tlb_nohash.c | 2 ++
arch/sh/include/asm/memblock.h | 2 --
arch/sparc/include/asm/memblock.h | 2 --
include/linux/memblock.h | 16 +++++++++++++++-
mm/memblock.c | 19 +++++++++++--------
14 files changed, 63 insertions(+), 53 deletions(-)

diff --git a/arch/microblaze/include/asm/memblock.h b/arch/microblaze/include/asm/memblock.h
index f9c2fa3..20a8e25 100644
--- a/arch/microblaze/include/asm/memblock.h
+++ b/arch/microblaze/include/asm/memblock.h
@@ -9,9 +9,6 @@
#ifndef _ASM_MICROBLAZE_MEMBLOCK_H
#define _ASM_MICROBLAZE_MEMBLOCK_H

-/* MEMBLOCK limit is OFF */
-#define MEMBLOCK_REAL_LIMIT 0xFFFFFFFF
-
#endif /* _ASM_MICROBLAZE_MEMBLOCK_H */


diff --git a/arch/powerpc/include/asm/memblock.h b/arch/powerpc/include/asm/memblock.h
index 3c29728..43efc34 100644
--- a/arch/powerpc/include/asm/memblock.h
+++ b/arch/powerpc/include/asm/memblock.h
@@ -5,11 +5,4 @@

#define MEMBLOCK_DBG(fmt...) udbg_printf(fmt)

-#ifdef CONFIG_PPC32
-extern phys_addr_t lowmem_end_addr;
-#define MEMBLOCK_REAL_LIMIT lowmem_end_addr
-#else
-#define MEMBLOCK_REAL_LIMIT 0
-#endif
-
#endif /* _ASM_POWERPC_MEMBLOCK_H */
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 9d39539..f665d1b 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -98,7 +98,7 @@ static void __init move_device_tree(void)

if ((memory_limit && (start + size) > memory_limit) ||
overlaps_crashkernel(start, size)) {
- p = __va(memblock_alloc_base(size, PAGE_SIZE, memblock.rmo_size));
+ p = __va(memblock_alloc(size, PAGE_SIZE));
memcpy(p, initial_boot_params, size);
initial_boot_params = (struct boot_param_header *)p;
DBG("Moved device tree to 0x%p\n", p);
@@ -655,6 +655,21 @@ static void __init phyp_dump_reserve_mem(void)
static inline void __init phyp_dump_reserve_mem(void) {}
#endif /* CONFIG_PHYP_DUMP && CONFIG_PPC_RTAS */

+static void set_boot_memory_limit(void)
+{
+#ifdef CONFIG_PPC32
+ /* 601 can only access 16MB at the moment */
+ if (PVR_VER(mfspr(SPRN_PVR)) == 1)
+ memblock_set_current_limit(0x01000000);
+ /* 8xx can only access 8MB at the moment */
+ else if (PVR_VER(mfspr(SPRN_PVR)) == 0x50)
+ memblock_set_current_limit(0x00800000);
+ else
+ memblock_set_current_limit(0x10000000);
+#else
+ memblock_set_current_limit(memblock.rmo_size);
+#endif
+}

void __init early_init_devtree(void *params)
{
@@ -683,6 +698,7 @@ void __init early_init_devtree(void *params)

/* Scan memory nodes and rebuild MEMBLOCKs */
memblock_init();
+
of_scan_flat_dt(early_init_dt_scan_root, NULL);
of_scan_flat_dt(early_init_dt_scan_memory_ppc, NULL);

@@ -718,6 +734,8 @@ void __init early_init_devtree(void *params)

DBG("Phys. mem: %llx\n", memblock_phys_mem_size());

+ set_boot_memory_limit();
+
/* We may need to relocate the flat tree, do it now.
* FIXME .. and the initrd too? */
move_device_tree();
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index a10ffc8..b7eb1de 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -246,7 +246,7 @@ static void __init irqstack_early_init(void)
unsigned int i;

/* interrupt stacks must be in lowmem, we get that for free on ppc32
- * as the memblock is limited to lowmem by MEMBLOCK_REAL_LIMIT */
+ * as the memblock is limited to lowmem by default */
for_each_possible_cpu(i) {
softirq_ctx[i] = (struct thread_info *)
__va(memblock_alloc(THREAD_SIZE, THREAD_SIZE));
diff --git a/arch/powerpc/mm/40x_mmu.c b/arch/powerpc/mm/40x_mmu.c
index 1dc2fa5..58969b5 100644
--- a/arch/powerpc/mm/40x_mmu.c
+++ b/arch/powerpc/mm/40x_mmu.c
@@ -35,6 +35,7 @@
#include <linux/init.h>
#include <linux/delay.h>
#include <linux/highmem.h>
+#include <linux/memblock.h>

#include <asm/pgalloc.h>
#include <asm/prom.h>
@@ -47,6 +48,7 @@
#include <asm/bootx.h>
#include <asm/machdep.h>
#include <asm/setup.h>
+
#include "mmu_decl.h"

extern int __map_without_ltlbs;
@@ -139,8 +141,7 @@ unsigned long __init mmu_mapin_ram(unsigned long top)
* coverage with normal-sized pages (or other reasons) do not
* attempt to allocate outside the allowed range.
*/
-
- __initial_memory_limit_addr = memstart_addr + mapped;
+ memblock_set_current_limit(memstart_addr + mapped);

return mapped;
}
diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index cdc7526..e525f86 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -40,6 +40,7 @@
#include <linux/init.h>
#include <linux/delay.h>
#include <linux/highmem.h>
+#include <linux/memblock.h>

#include <asm/pgalloc.h>
#include <asm/prom.h>
@@ -212,5 +213,5 @@ void __init adjust_total_lowmem(void)
pr_cont("%lu Mb, residual: %dMb\n", tlbcam_sz(tlbcam_index - 1) >> 20,
(unsigned int)((total_lowmem - __max_low_memory) >> 20));

- __initial_memory_limit_addr = memstart_addr + __max_low_memory;
+ memblock_set_current_limit(memstart_addr + __max_low_memory);
}
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index c630b4f..79f9445 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -696,7 +696,8 @@ static void __init htab_initialize(void)
#endif /* CONFIG_U3_DART */
BUG_ON(htab_bolt_mapping(base, base + size, __pa(base),
prot, mmu_linear_psize, mmu_kernel_ssize));
- }
+ }
+ memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);

/*
* If we have a memory_limit and we've allocated TCEs then we need to
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 6a6975d..59b208b 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -92,12 +92,6 @@ int __allow_ioremap_reserved;
unsigned long __max_low_memory = MAX_LOW_MEM;

/*
- * address of the limit of what is accessible with initial MMU setup -
- * 256MB usually, but only 16MB on 601.
- */
-phys_addr_t __initial_memory_limit_addr = (phys_addr_t)0x10000000;
-
-/*
* Check for command-line options that affect what MMU_init will do.
*/
void MMU_setup(void)
@@ -126,13 +120,6 @@ void __init MMU_init(void)
if (ppc_md.progress)
ppc_md.progress("MMU:enter", 0x111);

- /* 601 can only access 16MB at the moment */
- if (PVR_VER(mfspr(SPRN_PVR)) == 1)
- __initial_memory_limit_addr = 0x01000000;
- /* 8xx can only access 8MB at the moment */
- if (PVR_VER(mfspr(SPRN_PVR)) == 0x50)
- __initial_memory_limit_addr = 0x00800000;
-
/* parse args from command line */
MMU_setup();

@@ -190,20 +177,18 @@ void __init MMU_init(void)
#ifdef CONFIG_BOOTX_TEXT
btext_unmap();
#endif
+
+ /* Shortly after that, the entire linear mapping will be available */
+ memblock_set_current_limit(lowmem_end_addr);
}

/* This is only called until mem_init is done. */
void __init *early_get_page(void)
{
- void *p;
-
- if (init_bootmem_done) {
- p = alloc_bootmem_pages(PAGE_SIZE);
- } else {
- p = __va(memblock_alloc_base(PAGE_SIZE, PAGE_SIZE,
- __initial_memory_limit_addr));
- }
- return p;
+ if (init_bootmem_done)
+ return alloc_bootmem_pages(PAGE_SIZE);
+ else
+ return __va(memblock_alloc(PAGE_SIZE, PAGE_SIZE));
}

/* Free up now-unused memory */
diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index f8a0182..7d34e17 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -223,8 +223,7 @@ void __init MMU_init_hw(void)
* Find some memory for the hash table.
*/
if ( ppc_md.progress ) ppc_md.progress("hash:find piece", 0x322);
- Hash = __va(memblock_alloc_base(Hash_size, Hash_size,
- __initial_memory_limit_addr));
+ Hash = __va(memblock_alloc(Hash_size, Hash_size));
cacheable_memzero(Hash, Hash_size);
_SDR1 = __pa(Hash) | SDR1_LOW_BITS;

diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index d8695b0..7ba32e7 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -432,6 +432,8 @@ static void __early_init_mmu(int boot_cpu)
* the MMU configuration
*/
mb();
+
+ memblock_set_current_limit(linear_map_top);
}

void __init early_init_mmu(void)
diff --git a/arch/sh/include/asm/memblock.h b/arch/sh/include/asm/memblock.h
index dfe683b..e87063f 100644
--- a/arch/sh/include/asm/memblock.h
+++ b/arch/sh/include/asm/memblock.h
@@ -1,6 +1,4 @@
#ifndef __ASM_SH_MEMBLOCK_H
#define __ASM_SH_MEMBLOCK_H

-#define MEMBLOCK_REAL_LIMIT 0
-
#endif /* __ASM_SH_MEMBLOCK_H */
diff --git a/arch/sparc/include/asm/memblock.h b/arch/sparc/include/asm/memblock.h
index f12af88..c67b047 100644
--- a/arch/sparc/include/asm/memblock.h
+++ b/arch/sparc/include/asm/memblock.h
@@ -5,6 +5,4 @@

#define MEMBLOCK_DBG(fmt...) prom_printf(fmt)

-#define MEMBLOCK_REAL_LIMIT 0
-
#endif /* !(_SPARC64_MEMBLOCK_H) */
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 5853752..aabdcdd 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -34,6 +34,7 @@ struct memblock_type {
struct memblock {
unsigned long debug;
u64 rmo_size;
+ u64 current_limit;
struct memblock_type memory;
struct memblock_type reserved;
};
@@ -46,11 +47,16 @@ extern long memblock_add(u64 base, u64 size);
extern long memblock_remove(u64 base, u64 size);
extern long __init memblock_free(u64 base, u64 size);
extern long __init memblock_reserve(u64 base, u64 size);
+
extern u64 __init memblock_alloc_nid(u64 size, u64 align, int nid);
extern u64 __init memblock_alloc(u64 size, u64 align);
+
+/* Flags for memblock_alloc_base() and __memblock_alloc_base() */
+#define MEMBLOCK_ALLOC_ANYWHERE (~(u64)0)
+#define MEMBLOCK_ALLOC_ACCESSIBLE 0
+
extern u64 __init memblock_alloc_base(u64 size,
u64, u64 max_addr);
-#define MEMBLOCK_ALLOC_ANYWHERE 0
extern u64 __init __memblock_alloc_base(u64 size,
u64 align, u64 max_addr);
extern u64 __init memblock_phys_mem_size(void);
@@ -64,6 +70,14 @@ extern void memblock_dump_all(void);
/* Provided by the architecture */
extern u64 memblock_nid_range(u64 start, u64 end, int *nid);

+/**
+ * memblock_set_current_limit - Set the current allocation limit to allow
+ * limiting allocations to what is currently
+ * accessible during boot
+ * @limit: New limit value (physical address)
+ */
+extern void memblock_set_current_limit(u64 limit);
+

/*
* pfn conversion functions
diff --git a/mm/memblock.c b/mm/memblock.c
index 0ad7626..cdb35ba 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -115,6 +115,8 @@ void __init memblock_init(void)
memblock.reserved.regions[0].base = 0;
memblock.reserved.regions[0].size = 0;
memblock.reserved.cnt = 1;
+
+ memblock.current_limit = MEMBLOCK_ALLOC_ANYWHERE;
}

void __init memblock_analyze(void)
@@ -373,7 +375,7 @@ u64 __init memblock_alloc_nid(u64 size, u64 align, int nid)

u64 __init memblock_alloc(u64 size, u64 align)
{
- return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ANYWHERE);
+ return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
}

u64 __init memblock_alloc_base(u64 size, u64 align, u64 max_addr)
@@ -399,14 +401,9 @@ u64 __init __memblock_alloc_base(u64 size, u64 align, u64 max_addr)

size = memblock_align_up(size, align);

- /* On some platforms, make sure we allocate lowmem */
- /* Note that MEMBLOCK_REAL_LIMIT may be MEMBLOCK_ALLOC_ANYWHERE */
- if (max_addr == MEMBLOCK_ALLOC_ANYWHERE)
- max_addr = MEMBLOCK_REAL_LIMIT;
-
/* Pump up max_addr */
- if (max_addr == MEMBLOCK_ALLOC_ANYWHERE)
- max_addr = ~(u64)0;
+ if (max_addr == MEMBLOCK_ALLOC_ACCESSIBLE)
+ max_addr = memblock.current_limit;

/* We do a top-down search, this tends to limit memory
* fragmentation by keeping early boot allocs near the
@@ -501,3 +498,9 @@ int memblock_is_region_reserved(u64 base, u64 size)
return memblock_overlaps_region(&memblock.reserved, base, size);
}

+
+void __init memblock_set_current_limit(u64 limit)
+{
+ memblock.current_limit = limit;
+}
+
--
1.6.4.2

2010-07-22 18:22:39

by Yinghai Lu

Subject: [PATCH 21/31] memblock: Add "start" argument to memblock_find_base()

From: Benjamin Herrenschmidt <[email protected]>

This adds a "start" argument to constrain the search of a region between
two boundaries; it will be used by the new NUMA-aware allocator, among others.
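
For illustration, a range-constrained lookup inside mm/memblock.c then
looks like this (a sketch; the size and boundary values are made up, and
memblock_find_base() stays static, so only memblock code can call it):

	/* find a free 1MB block in [16MB, 4GB), searching top-down */
	phys_addr_t found = memblock_find_base(0x100000, 0x100000,
					       0x01000000, 0x100000000ULL);
	if (found == MEMBLOCK_ERROR)
		panic("no free 1MB range in [16MB, 4GB)");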

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
mm/memblock.c | 27 ++++++++++++++++-----------
1 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index f793d9f..b4870cf 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -117,19 +117,18 @@ static phys_addr_t __init memblock_find_region(phys_addr_t start, phys_addr_t en
return MEMBLOCK_ERROR;
}

-static phys_addr_t __init memblock_find_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
+static phys_addr_t __init memblock_find_base(phys_addr_t size, phys_addr_t align,
+ phys_addr_t start, phys_addr_t end)
{
long i;
- phys_addr_t base = 0;
- phys_addr_t res_base;

BUG_ON(0 == size);

size = memblock_align_up(size, align);

/* Pump up max_addr */
- if (max_addr == MEMBLOCK_ALLOC_ACCESSIBLE)
- max_addr = memblock.current_limit;
+ if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
+ end = memblock.current_limit;

/* We do a top-down search, this tends to limit memory
* fragmentation by keeping early boot allocs near the
@@ -138,13 +137,19 @@ static phys_addr_t __init memblock_find_base(phys_addr_t size, phys_addr_t align
for (i = memblock.memory.cnt - 1; i >= 0; i--) {
phys_addr_t memblockbase = memblock.memory.regions[i].base;
phys_addr_t memblocksize = memblock.memory.regions[i].size;
+ phys_addr_t bottom, top, found;

if (memblocksize < size)
continue;
- base = min(memblockbase + memblocksize, max_addr);
- res_base = memblock_find_region(memblockbase, base, size, align);
- if (res_base != MEMBLOCK_ERROR)
- return res_base;
+ if ((memblockbase + memblocksize) <= start)
+ break;
+ bottom = max(memblockbase, start);
+ top = min(memblockbase + memblocksize, end);
+ if (bottom >= top)
+ continue;
+ found = memblock_find_region(bottom, top, size, align);
+ if (found != MEMBLOCK_ERROR)
+ return found;
}
return MEMBLOCK_ERROR;
}
@@ -204,7 +209,7 @@ static int memblock_double_array(struct memblock_type *type)
new_array = kmalloc(new_size, GFP_KERNEL);
addr = new_array == NULL ? MEMBLOCK_ERROR : __pa(new_array);
} else
- addr = memblock_find_base(new_size, sizeof(phys_addr_t), MEMBLOCK_ALLOC_ACCESSIBLE);
+ addr = memblock_find_base(new_size, sizeof(phys_addr_t), 0, MEMBLOCK_ALLOC_ACCESSIBLE);
if (addr == MEMBLOCK_ERROR) {
pr_err("memblock: Failed to double %s array from %ld to %ld entries !\n",
memblock_type_name(type), type->max, type->max * 2);
@@ -416,7 +421,7 @@ phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t align, ph
*/
size = memblock_align_up(size, align);

- found = memblock_find_base(size, align, max_addr);
+ found = memblock_find_base(size, align, 0, max_addr);
if (found != MEMBLOCK_ERROR &&
memblock_add_region(&memblock.reserved, found, size) >= 0)
return found;
--
1.6.4.2

2010-07-22 18:22:53

by Yinghai Lu

Subject: [PATCH 30/31] memblock: Add ARCH_DISCARD_MEMBLOCK to put memblock code to .init

So those memblock bits can be released after the kernel has booted.

Arch code can define ARCH_DISCARD_MEMBLOCK in asm/memblock.h;
__init_memblock then becomes __init and __initdata_memblock becomes __initdata.

x86 code will use that.

If ARCH_DISCARD_MEMBLOCK is defined, the debugfs dump files are not built.

-v2: use ARCH_DISCARD_MEMBLOCK according to Michael Ellerman
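
A minimal sketch of what an opting-in arch adds (the path is a
placeholder; x86 wires this up in the follow-up series):

/* arch/<arch>/include/asm/memblock.h */
#define ARCH_DISCARD_MEMBLOCK

With that defined, __init_memblock expands to __init and
__initdata_memblock to __initdata, so the annotated memblock functions
and arrays are placed in the init sections and freed along with the rest
of the init code by free_initmem().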

Signed-off-by: Yinghai Lu <[email protected]>
---
include/linux/memblock.h | 8 +++++++
mm/memblock.c | 48 +++++++++++++++++++++++-----------------------
2 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 4aaaf0d..751a4eb 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -148,6 +148,14 @@ static inline unsigned long memblock_region_pages(const struct memblock_region *
region++)


+#ifdef ARCH_DISCARD_MEMBLOCK
+#define __init_memblock __init
+#define __initdata_memblock __initdata
+#else
+#define __init_memblock
+#define __initdata_memblock
+#endif
+
#endif /* CONFIG_HAVE_MEMBLOCK */

#endif /* __KERNEL__ */
diff --git a/mm/memblock.c b/mm/memblock.c
index 3d0a754..7471dac 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -20,12 +20,12 @@
#include <linux/seq_file.h>
#include <linux/memblock.h>

-struct memblock memblock;
+struct memblock memblock __initdata_memblock;

-int memblock_debug;
-int memblock_can_resize;
-static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS + 1];
-struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS + 1];
+int memblock_debug __initdata_memblock;
+int memblock_can_resize __initdata_memblock;
+static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS + 1] __initdata_memblock;
+struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS + 1] __initdata_memblock;

/* inline so we don't get a warning when pr_debug is compiled out */
static inline const char *memblock_type_name(struct memblock_type *type)
@@ -42,23 +42,23 @@ static inline const char *memblock_type_name(struct memblock_type *type)
* Address comparison utilities
*/

-static phys_addr_t memblock_align_down(phys_addr_t addr, phys_addr_t size)
+static phys_addr_t __init_memblock memblock_align_down(phys_addr_t addr, phys_addr_t size)
{
return addr & ~(size - 1);
}

-static phys_addr_t memblock_align_up(phys_addr_t addr, phys_addr_t size)
+static phys_addr_t __init_memblock memblock_align_up(phys_addr_t addr, phys_addr_t size)
{
return (addr + (size - 1)) & ~(size - 1);
}

-static unsigned long memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
+static unsigned long __init_memblock memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
phys_addr_t base2, phys_addr_t size2)
{
return ((base1 < (base2 + size2)) && (base2 < (base1 + size1)));
}

-static long memblock_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
+static long __init_memblock memblock_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
phys_addr_t base2, phys_addr_t size2)
{
if (base2 == base1 + size1)
@@ -69,7 +69,7 @@ static long memblock_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
return 0;
}

-static long memblock_regions_adjacent(struct memblock_type *type,
+static long __init_memblock memblock_regions_adjacent(struct memblock_type *type,
unsigned long r1, unsigned long r2)
{
phys_addr_t base1 = type->regions[r1].base;
@@ -80,7 +80,7 @@ static long memblock_regions_adjacent(struct memblock_type *type,
return memblock_addrs_adjacent(base1, size1, base2, size2);
}

-long memblock_overlaps_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
+long __init_memblock memblock_overlaps_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
{
unsigned long i;

@@ -156,7 +156,7 @@ static phys_addr_t __init memblock_find_base(phys_addr_t size, phys_addr_t align
return MEMBLOCK_ERROR;
}

-static void memblock_remove_region(struct memblock_type *type, unsigned long r)
+static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
{
unsigned long i;

@@ -168,7 +168,7 @@ static void memblock_remove_region(struct memblock_type *type, unsigned long r)
}

/* Assumption: base addr of region 1 < base addr of region 2 */
-static void memblock_coalesce_regions(struct memblock_type *type,
+static void __init_memblock memblock_coalesce_regions(struct memblock_type *type,
unsigned long r1, unsigned long r2)
{
type->regions[r1].size += type->regions[r2].size;
@@ -178,7 +178,7 @@ static void memblock_coalesce_regions(struct memblock_type *type,
/* Defined below but needed now */
static long memblock_add_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size);

-static int memblock_double_array(struct memblock_type *type)
+static int __init_memblock memblock_double_array(struct memblock_type *type)
{
struct memblock_region *new_array, *old_array;
phys_addr_t old_size, new_size, addr;
@@ -249,13 +249,13 @@ static int memblock_double_array(struct memblock_type *type)
return 0;
}

-extern int __weak memblock_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
+extern int __init_memblock __weak memblock_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
phys_addr_t addr2, phys_addr_t size2)
{
return 1;
}

-static long memblock_add_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
+static long __init_memblock memblock_add_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
{
unsigned long coalesced = 0;
long adjacent, i;
@@ -342,13 +342,13 @@ static long memblock_add_region(struct memblock_type *type, phys_addr_t base, ph
return 0;
}

-long memblock_add(phys_addr_t base, phys_addr_t size)
+long __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
{
return memblock_add_region(&memblock.memory, base, size);

}

-static long __memblock_remove(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
+static long __init_memblock __memblock_remove(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
{
phys_addr_t rgnbegin, rgnend;
phys_addr_t end = base + size;
@@ -396,7 +396,7 @@ static long __memblock_remove(struct memblock_type *type, phys_addr_t base, phys
return memblock_add_region(type, end, rgnend - end);
}

-long memblock_remove(phys_addr_t base, phys_addr_t size)
+long __init_memblock memblock_remove(phys_addr_t base, phys_addr_t size)
{
return __memblock_remove(&memblock.memory, base, size);
}
@@ -562,7 +562,7 @@ phys_addr_t __init memblock_phys_mem_size(void)
return memblock.memory_size;
}

-phys_addr_t memblock_end_of_DRAM(void)
+phys_addr_t __init_memblock memblock_end_of_DRAM(void)
{
int idx = memblock.memory.cnt - 1;

@@ -623,7 +623,7 @@ int __init memblock_is_reserved(phys_addr_t addr)
return 0;
}

-int memblock_is_region_reserved(phys_addr_t base, phys_addr_t size)
+int __init_memblock memblock_is_region_reserved(phys_addr_t base, phys_addr_t size)
{
return memblock_overlaps_region(&memblock.reserved, base, size);
}
@@ -634,7 +634,7 @@ void __init memblock_set_current_limit(phys_addr_t limit)
memblock.current_limit = limit;
}

-static void memblock_dump(struct memblock_type *region, char *name)
+static void __init_memblock memblock_dump(struct memblock_type *region, char *name)
{
unsigned long long base, size;
int i;
@@ -650,7 +650,7 @@ static void memblock_dump(struct memblock_type *region, char *name)
}
}

-void memblock_dump_all(void)
+void __init_memblock memblock_dump_all(void)
{
if (!memblock_debug)
return;
@@ -716,7 +716,7 @@ static int __init early_memblock(char *p)
}
early_param("memblock", early_memblock);

-#ifdef CONFIG_DEBUG_FS
+#if defined(CONFIG_DEBUG_FS) && !defined(ARCH_DISCARD_MEMBLOCK)

static int memblock_debug_show(struct seq_file *m, void *private)
{
--
1.6.4.2

2010-07-22 18:22:56

by Yinghai Lu

Subject: [PATCH 31/31] memblock: Add memblock_find_in_range()

It is a wrapper for memblock_find_base().

This makes it easier for x86 to use memblock. (rebased)
x86's early_res uses a find/reserve pattern instead of alloc.
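
The find/reserve pattern then looks roughly like this (a sketch; the
range and size values are made up):

	/* find a free 4KB block below 1MB, then explicitly reserve it */
	u64 addr = memblock_find_in_range(0, 0x100000, 0x1000, 0x1000);

	if (addr != MEMBLOCK_ERROR)
		memblock_reserve(addr, 0x1000);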

Keep it as a weak function, so we can later substitute x86's own version if needed.
We also need it in mm/memblock.c so that its caller in mm/page_alloc.c can be compiled.

-v2: Change name to memblock_find_in_range() according to Michael Ellerman
-v3: Add a generic weak version, __memblock_find_in_range(),
to keep a fallback path to the x86 version, which searches from low addresses
-v4: use 0 for failing path
-v5: use MEMBLOCK_ERROR again
-v6: remove __memblock_find_in_range()

Signed-off-by: Yinghai Lu <[email protected]>
---
include/linux/memblock.h | 2 ++
mm/memblock.c | 8 ++++++++
2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 751a4eb..61b22eb 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -48,6 +48,8 @@ extern struct memblock_region memblock_reserved_init_regions[];
#define memblock_dbg(fmt, ...) \
if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)

+u64 memblock_find_in_range(u64 start, u64 end, u64 size, u64 align);
+
extern void __init memblock_init(void);
extern void __init memblock_analyze(void);
extern long memblock_add(phys_addr_t base, phys_addr_t size);
diff --git a/mm/memblock.c b/mm/memblock.c
index 7471dac..ca7de91 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -156,6 +156,14 @@ static phys_addr_t __init memblock_find_base(phys_addr_t size, phys_addr_t align
return MEMBLOCK_ERROR;
}

+/*
+ * Find a free area with specified alignment in a specific range.
+ */
+u64 __init __weak memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
+{
+ return memblock_find_base(size, align, start, end);
+}
+
static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
{
unsigned long i;
--
1.6.4.2

2010-07-22 18:22:59

by Yinghai Lu

Subject: [PATCH 05/31] memblock: Factor the lowest level alloc function

From: Benjamin Herrenschmidt <[email protected]>

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
mm/memblock.c | 59 ++++++++++++++++++++++++++------------------------------
1 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 83643f3..9b71de0 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -294,8 +294,8 @@ static u64 memblock_align_up(u64 addr, u64 size)
return (addr + (size - 1)) & ~(size - 1);
}

-static u64 __init memblock_alloc_nid_unreserved(u64 start, u64 end,
- u64 size, u64 align)
+static u64 __init memblock_alloc_region(u64 start, u64 end,
+ u64 size, u64 align)
{
u64 base, res_base;
long j;
@@ -318,6 +318,13 @@ static u64 __init memblock_alloc_nid_unreserved(u64 start, u64 end,
return ~(u64)0;
}

+u64 __weak __init memblock_nid_range(u64 start, u64 end, int *nid)
+{
+ *nid = 0;
+
+ return end;
+}
+
static u64 __init memblock_alloc_nid_region(struct memblock_region *mp,
u64 size, u64 align, int nid)
{
@@ -333,8 +340,7 @@ static u64 __init memblock_alloc_nid_region(struct memblock_region *mp,

this_end = memblock_nid_range(start, end, &this_nid);
if (this_nid == nid) {
- u64 ret = memblock_alloc_nid_unreserved(start, this_end,
- size, align);
+ u64 ret = memblock_alloc_region(start, this_end, size, align);
if (ret != ~(u64)0)
return ret;
}
@@ -351,6 +357,10 @@ u64 __init memblock_alloc_nid(u64 size, u64 align, int nid)

BUG_ON(0 == size);

+ /* We do a bottom-up search for a region with the right
+ * nid since that's easier considering how memblock_nid_range()
+ * works
+ */
size = memblock_align_up(size, align);

for (i = 0; i < mem->cnt; i++) {
@@ -383,7 +393,7 @@ u64 __init memblock_alloc_base(u64 size, u64 align, u64 max_addr)

u64 __init __memblock_alloc_base(u64 size, u64 align, u64 max_addr)
{
- long i, j;
+ long i;
u64 base = 0;
u64 res_base;

@@ -396,33 +406,24 @@ u64 __init __memblock_alloc_base(u64 size, u64 align, u64 max_addr)
if (max_addr == MEMBLOCK_ALLOC_ANYWHERE)
max_addr = MEMBLOCK_REAL_LIMIT;

+ /* Pump up max_addr */
+ if (max_addr == MEMBLOCK_ALLOC_ANYWHERE)
+ max_addr = ~(u64)0;
+
+ /* We do a top-down search, this tends to limit memory
+ * fragmentation by keeping early boot allocs near the
+ * top of memory
+ */
for (i = memblock.memory.cnt - 1; i >= 0; i--) {
u64 memblockbase = memblock.memory.regions[i].base;
u64 memblocksize = memblock.memory.regions[i].size;

if (memblocksize < size)
continue;
- if (max_addr == MEMBLOCK_ALLOC_ANYWHERE)
- base = memblock_align_down(memblockbase + memblocksize - size, align);
- else if (memblockbase < max_addr) {
- base = min(memblockbase + memblocksize, max_addr);
- base = memblock_align_down(base - size, align);
- } else
- continue;
-
- while (base && memblockbase <= base) {
- j = memblock_overlaps_region(&memblock.reserved, base, size);
- if (j < 0) {
- /* this area isn't reserved, take it */
- if (memblock_add_region(&memblock.reserved, base, size) < 0)
- return 0;
- return base;
- }
- res_base = memblock.reserved.regions[j].base;
- if (res_base < size)
- break;
- base = memblock_align_down(res_base - size, align);
- }
+ base = min(memblockbase + memblocksize, max_addr);
+ res_base = memblock_alloc_region(memblockbase, base, size, align);
+ if (res_base != ~(u64)0)
+ return res_base;
}
return 0;
}
@@ -502,9 +503,3 @@ int memblock_is_region_reserved(u64 base, u64 size)
return memblock_overlaps_region(&memblock.reserved, base, size);
}

-u64 __weak memblock_nid_range(u64 start, u64 end, int *nid)
-{
- *nid = 0;
-
- return end;
-}
--
1.6.4.2

2010-07-22 18:22:48

by Yinghai Lu

Subject: [PATCH 25/31] memblock: Add debugfs files to dump the arrays content

From: Benjamin Herrenschmidt <[email protected]>

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
mm/memblock.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 424ca11..b5eb901 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -16,6 +16,8 @@
#include <linux/bitops.h>
#include <linux/poison.h>
#include <linux/pfn.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
#include <linux/memblock.h>

struct memblock memblock;
@@ -714,3 +716,52 @@ static int __init early_memblock(char *p)
}
early_param("memblock", early_memblock);

+#ifdef CONFIG_DEBUG_FS
+
+static int memblock_debug_show(struct seq_file *m, void *private)
+{
+ struct memblock_type *type = m->private;
+ struct memblock_region *reg;
+ int i;
+
+ for (i = 0; i < type->cnt; i++) {
+ reg = &type->regions[i];
+ seq_printf(m, "%4d: ", i);
+ if (sizeof(phys_addr_t) == 4)
+ seq_printf(m, "0x%08lx..0x%08lx\n",
+ (unsigned long)reg->base,
+ (unsigned long)(reg->base + reg->size - 1));
+ else
+ seq_printf(m, "0x%016llx..0x%016llx\n",
+ (unsigned long long)reg->base,
+ (unsigned long long)(reg->base + reg->size - 1));
+
+ }
+ return 0;
+}
+
+static int memblock_debug_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, memblock_debug_show, inode->i_private);
+}
+
+static const struct file_operations memblock_debug_fops = {
+ .open = memblock_debug_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int __init memblock_init_debugfs(void)
+{
+ struct dentry *root = debugfs_create_dir("memblock", NULL);
+ if (!root)
+ return -ENXIO;
+ debugfs_create_file("memory", S_IRUGO, root, &memblock.memory, &memblock_debug_fops);
+ debugfs_create_file("reserved", S_IRUGO, root, &memblock.reserved, &memblock_debug_fops);
+
+ return 0;
+}
+__initcall(memblock_init_debugfs);
+
+#endif /* CONFIG_DEBUG_FS */
--
1.6.4.2

2010-07-22 18:22:44

by Yinghai Lu

Subject: [PATCH 02/31] memblock: No reason to include asm/memblock.h late

From: Benjamin Herrenschmidt <[email protected]>

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
include/linux/memblock.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 86e7daf..4b69313 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -16,6 +16,8 @@
#include <linux/init.h>
#include <linux/mm.h>

+#include <asm/memblock.h>
+
#define MAX_MEMBLOCK_REGIONS 128

struct memblock_region {
@@ -82,8 +84,6 @@ memblock_end_pfn(struct memblock_type *type, unsigned long region_nr)
memblock_size_pages(type, region_nr);
}

-#include <asm/memblock.h>
-
#endif /* __KERNEL__ */

#endif /* _LINUX_MEMBLOCK_H */
--
1.6.4.2

2010-07-22 18:23:35

by Yinghai Lu

Subject: [PATCH 08/31] memblock: Remove rmo_size, bury it in arch/powerpc where it belongs

From: Benjamin Herrenschmidt <[email protected]>

The RMA (RMO is a misnomer) is a concept specific to ppc64 (in fact
server ppc64, though I hijack it on embedded ppc64 for similar purposes)
and represents the area of memory that can be accessed in real mode
(aka with the MMU off), or, on embedded, from the exception vectors
(which are bolted in the TLB), which pretty much boils down to the same
thing.

We take that out of the generic MEMBLOCK data structure and move it into
arch/powerpc where it belongs, renaming it to "RMA" while at it.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
arch/powerpc/include/asm/mmu.h | 12 ++++++++++++
arch/powerpc/kernel/head_40x.S | 6 +-----
arch/powerpc/kernel/paca.c | 2 +-
arch/powerpc/kernel/prom.c | 29 ++++++++---------------------
arch/powerpc/kernel/rtas.c | 2 +-
arch/powerpc/kernel/setup_64.c | 2 +-
arch/powerpc/mm/40x_mmu.c | 14 +++++++++++++-
arch/powerpc/mm/44x_mmu.c | 14 ++++++++++++++
arch/powerpc/mm/fsl_booke_mmu.c | 9 +++++++++
arch/powerpc/mm/hash_utils_64.c | 22 +++++++++++++++++++++-
arch/powerpc/mm/init_32.c | 14 ++++++++++++++
arch/powerpc/mm/init_64.c | 1 +
arch/powerpc/mm/ppc_mmu_32.c | 15 +++++++++++++++
arch/powerpc/mm/tlb_nohash.c | 14 ++++++++++++++
include/linux/memblock.h | 1 -
mm/memblock.c | 8 --------
16 files changed, 125 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 7ebf42e..bb40a06 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -2,6 +2,8 @@
#define _ASM_POWERPC_MMU_H_
#ifdef __KERNEL__

+#include <linux/types.h>
+
#include <asm/asm-compat.h>
#include <asm/feature-fixups.h>

@@ -82,6 +84,16 @@ extern unsigned int __start___mmu_ftr_fixup, __stop___mmu_ftr_fixup;
extern void early_init_mmu(void);
extern void early_init_mmu_secondary(void);

+extern void setup_initial_memory_limit(phys_addr_t first_memblock_base,
+ phys_addr_t first_memblock_size);
+
+#ifdef CONFIG_PPC64
+/* This is our real memory area size on ppc64 server, on embedded, we
+ * make it match the size of our bolted TLB area
+ */
+extern u64 ppc64_rma_size;
+#endif /* CONFIG_PPC64 */
+
#endif /* !__ASSEMBLY__ */

/* The kernel use the constants below to index in the page sizes array.
diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index a90625f..8278e8b 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -923,11 +923,7 @@ initial_mmu:
mtspr SPRN_PID,r0
sync

- /* Configure and load two entries into TLB slots 62 and 63.
- * In case we are pinning TLBs, these are reserved in by the
- * other TLB functions. If not reserving, then it doesn't
- * matter where they are loaded.
- */
+ /* Configure and load one entry into TLB slot 63 */
clrrwi r4,r4,10 /* Mask off the real page number */
ori r4,r4,(TLB_WR | TLB_EX) /* Set the write and execute bits */

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 139a773..b9ffd7d 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -117,7 +117,7 @@ void __init allocate_pacas(void)
* the first segment. On iSeries they must be within the area mapped
* by the HV, which is HvPagesToMap * HVPAGESIZE bytes.
*/
- limit = min(0x10000000ULL, memblock.rmo_size);
+ limit = min(0x10000000ULL, ppc64_rma_size);
if (firmware_has_feature(FW_FEATURE_ISERIES))
limit = min(limit, HvPagesToMap * HVPAGESIZE);

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index f665d1b..f12b193 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -66,6 +66,7 @@
int __initdata iommu_is_off;
int __initdata iommu_force_on;
unsigned long tce_alloc_start, tce_alloc_end;
+u64 ppc64_rma_size;
#endif

static int __init early_parse_mem(char *p)
@@ -492,7 +493,7 @@ static int __init early_init_dt_scan_memory_ppc(unsigned long node,

void __init early_init_dt_add_memory_arch(u64 base, u64 size)
{
-#if defined(CONFIG_PPC64)
+#ifdef CONFIG_PPC64
if (iommu_is_off) {
if (base >= 0x80000000ul)
return;
@@ -501,9 +502,13 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
}
#endif

- memblock_add(base, size);
-
+ /* First MEMBLOCK added, do some special initializations */
+ if (memstart_addr == ~(phys_addr_t)0)
+ setup_initial_memory_limit(base, size);
memstart_addr = min((u64)memstart_addr, base);
+
+ /* Add the chunk to the MEMBLOCK list */
+ memblock_add(base, size);
}

u64 __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
@@ -655,22 +660,6 @@ static void __init phyp_dump_reserve_mem(void)
static inline void __init phyp_dump_reserve_mem(void) {}
#endif /* CONFIG_PHYP_DUMP && CONFIG_PPC_RTAS */

-static void set_boot_memory_limit(void)
-{
-#ifdef CONFIG_PPC32
- /* 601 can only access 16MB at the moment */
- if (PVR_VER(mfspr(SPRN_PVR)) == 1)
- memblock_set_current_limit(0x01000000);
- /* 8xx can only access 8MB at the moment */
- else if (PVR_VER(mfspr(SPRN_PVR)) == 0x50)
- memblock_set_current_limit(0x00800000);
- else
- memblock_set_current_limit(0x10000000);
-#else
- memblock_set_current_limit(memblock.rmo_size);
-#endif
-}
-
void __init early_init_devtree(void *params)
{
phys_addr_t limit;
@@ -734,8 +723,6 @@ void __init early_init_devtree(void *params)

DBG("Phys. mem: %llx\n", memblock_phys_mem_size());

- set_boot_memory_limit();
-
/* We may need to relocate the flat tree, do it now.
* FIXME .. and the initrd too? */
move_device_tree();
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index d0516db..1662777 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -934,7 +934,7 @@ void __init rtas_initialize(void)
*/
#ifdef CONFIG_PPC64
if (machine_is(pseries) && firmware_has_feature(FW_FEATURE_LPAR)) {
- rtas_region = min(memblock.rmo_size, RTAS_INSTANTIATE_MAX);
+ rtas_region = min(ppc64_rma_size, RTAS_INSTANTIATE_MAX);
ibm_suspend_me_token = rtas_token("ibm,suspend-me");
}
#endif
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index d135f93..4360944 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -487,7 +487,7 @@ static void __init emergency_stack_init(void)
* bringup, we need to get at them in real mode. This means they
* must also be within the RMO region.
*/
- limit = min(slb0_limit(), memblock.rmo_size);
+ limit = min(slb0_limit(), ppc64_rma_size);

for_each_possible_cpu(i) {
unsigned long sp;
diff --git a/arch/powerpc/mm/40x_mmu.c b/arch/powerpc/mm/40x_mmu.c
index 58969b5..5810967 100644
--- a/arch/powerpc/mm/40x_mmu.c
+++ b/arch/powerpc/mm/40x_mmu.c
@@ -141,7 +141,19 @@ unsigned long __init mmu_mapin_ram(unsigned long top)
* coverage with normal-sized pages (or other reasons) do not
* attempt to allocate outside the allowed range.
*/
- memblock_set_current_limit(memstart_addr + mapped);
+ memblock_set_current_limit(mapped);

return mapped;
}
+
+void setup_initial_memory_limit(phys_addr_t first_memblock_base,
+ phys_addr_t first_memblock_size)
+{
+ /* We don't currently support the first MEMBLOCK not mapping 0
+ * physical on those processors
+ */
+ BUG_ON(first_memblock_base != 0);
+
+ /* 40x can only access 16MB at the moment (see head_40x.S) */
+ memblock_set_current_limit(min_t(u64, first_memblock_size, 0x00800000));
+}
diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
index d8c6efb..024acab 100644
--- a/arch/powerpc/mm/44x_mmu.c
+++ b/arch/powerpc/mm/44x_mmu.c
@@ -24,6 +24,8 @@
*/

#include <linux/init.h>
+#include <linux/memblock.h>
+
#include <asm/mmu.h>
#include <asm/system.h>
#include <asm/page.h>
@@ -213,6 +215,18 @@ unsigned long __init mmu_mapin_ram(unsigned long top)
return total_lowmem;
}

+void setup_initial_memory_limit(phys_addr_t first_memblock_base,
+ phys_addr_t first_memblock_size)
+{
+ /* We don't currently support the first MEMBLOCK not mapping 0
+ * physical on those processors
+ */
+ BUG_ON(first_memblock_base != 0);
+
+ /* 44x has a 256M TLB entry pinned at boot */
+ memblock_set_current_limit(min_t(u64, first_memblock_size, PPC_PIN_SIZE));
+}
+
#ifdef CONFIG_SMP
void __cpuinit mmu_init_secondary(int cpu)
{
diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index e525f86..0be8fe2 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -215,3 +215,12 @@ void __init adjust_total_lowmem(void)

memblock_set_current_limit(memstart_addr + __max_low_memory);
}
+
+void setup_initial_memory_limit(phys_addr_t first_memblock_base,
+ phys_addr_t first_memblock_size)
+{
+ phys_addr_t limit = first_memblock_base + first_memblock_size;
+
+ /* 64M mapped initially according to head_fsl_booke.S */
+ memblock_set_current_limit(min_t(u64, limit, 0x04000000));
+}
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 79f9445..327a3cd 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -649,7 +649,7 @@ static void __init htab_initialize(void)
#ifdef CONFIG_DEBUG_PAGEALLOC
linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
linear_map_hash_slots = __va(memblock_alloc_base(linear_map_hash_count,
- 1, memblock.rmo_size));
+ 1, ppc64_rma_size));
memset(linear_map_hash_slots, 0, linear_map_hash_count);
#endif /* CONFIG_DEBUG_PAGEALLOC */

@@ -1221,3 +1221,23 @@ void kernel_map_pages(struct page *page, int numpages, int enable)
local_irq_restore(flags);
}
#endif /* CONFIG_DEBUG_PAGEALLOC */
+
+void setup_initial_memory_limit(phys_addr_t first_memblock_base,
+ phys_addr_t first_memblock_size)
+{
+ /* We don't currently support the first MEMBLOCK not mapping 0
+ * physical on those processors
+ */
+ BUG_ON(first_memblock_base != 0);
+
+ /* On LPAR systems, the first entry is our RMA region,
+ * non-LPAR 64-bit hash MMU systems don't have a limitation
+ * on real mode access, but using the first entry works well
+ * enough. We also clamp it to 1G to avoid some funky things
+ * such as RTAS bugs etc...
+ */
+ ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
+
+ /* Finally limit subsequent allocations */
+ memblock_set_current_limit(ppc64_rma_size);
+}
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 59b208b..742da43 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -237,3 +237,17 @@ void free_initrd_mem(unsigned long start, unsigned long end)
}
#endif

+
+#ifdef CONFIG_8xx /* No 8xx specific .c file to put that in ... */
+void setup_initial_memory_limit(phys_addr_t first_memblock_base,
+ phys_addr_t first_memblock_size)
+{
+ /* We don't currently support the first MEMBLOCK not mapping 0
+ * physical on those processors
+ */
+ BUG_ON(first_memblock_base != 0);
+
+ /* 8xx can only access 8MB at the moment */
+ memblock_set_current_limit(min_t(u64, first_memblock_size, 0x00800000));
+}
+#endif /* CONFIG_8xx */
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 71f1415..9e081ff 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -328,3 +328,4 @@ int __meminit vmemmap_populate(struct page *start_page,
return 0;
}
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
+
diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index 7d34e17..11571e1 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -271,3 +271,18 @@ void __init MMU_init_hw(void)

if ( ppc_md.progress ) ppc_md.progress("hash:done", 0x205);
}
+
+void setup_initial_memory_limit(phys_addr_t first_memblock_base,
+ phys_addr_t first_memblock_size)
+{
+ /* We don't currently support the first MEMBLOCK not mapping 0
+ * physical on those processors
+ */
+ BUG_ON(first_memblock_base != 0);
+
+ /* 601 can only access 16MB at the moment */
+ if (PVR_VER(mfspr(SPRN_PVR)) == 1)
+ memblock_set_current_limit(min_t(u64, first_memblock_size, 0x01000000));
+ else /* Anything else has 256M mapped */
+ memblock_set_current_limit(min_t(u64, first_memblock_size, 0x10000000));
+}
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 7ba32e7..a086ed5 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -446,4 +446,18 @@ void __cpuinit early_init_mmu_secondary(void)
__early_init_mmu(0);
}

+void setup_initial_memory_limit(phys_addr_t first_memblock_base,
+ phys_addr_t first_memblock_size)
+{
+ /* On Embedded 64-bit, we adjust the RMA size to match
+ * the bolted TLB entry. We know for now that only 1G
+ * entries are supported though that may eventually
+ * change. We crop it to the size of the first MEMBLOCK to
+ * avoid going over total available memory just in case...
+ */
+ ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
+
+ /* Finally limit subsequent allocations */
+ memblock_set_current_limit(first_memblock_base + ppc64_rma_size);
+}
#endif /* CONFIG_PPC64 */
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index aabdcdd..767c198 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -33,7 +33,6 @@ struct memblock_type {

struct memblock {
unsigned long debug;
- u64 rmo_size;
u64 current_limit;
struct memblock_type memory;
struct memblock_type reserved;
diff --git a/mm/memblock.c b/mm/memblock.c
index cdb35ba..43fa162 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -49,7 +49,6 @@ void memblock_dump_all(void)
return;

pr_info("MEMBLOCK configuration:\n");
- pr_info(" rmo_size = 0x%llx\n", (unsigned long long)memblock.rmo_size);
pr_info(" memory.size = 0x%llx\n", (unsigned long long)memblock.memory.size);

memblock_dump(&memblock.memory, "memory");
@@ -195,10 +194,6 @@ static long memblock_add_region(struct memblock_type *type, u64 base, u64 size)

long memblock_add(u64 base, u64 size)
{
- /* On pSeries LPAR systems, the first MEMBLOCK is our RMO region. */
- if (base == 0)
- memblock.rmo_size = size;
-
return memblock_add_region(&memblock.memory, base, size);

}
@@ -459,9 +454,6 @@ void __init memblock_enforce_memory_limit(u64 memory_limit)
break;
}

- if (memblock.memory.regions[0].size < memblock.rmo_size)
- memblock.rmo_size = memblock.memory.regions[0].size;
-
memory_limit = memblock_end_of_DRAM();

/* And truncate any reserves above the limit also. */
--
1.6.4.2

2010-07-22 18:23:41

by Yinghai Lu

Subject: [PATCH 01/31] memblock: Rename memblock_region to memblock_type and memblock_property to memblock_region

From: Benjamin Herrenschmidt <[email protected]>

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
arch/microblaze/mm/init.c | 4 +-
arch/powerpc/mm/hash_utils_64.c | 2 +-
arch/powerpc/mm/mem.c | 26 +++---
arch/powerpc/platforms/embedded6xx/wii.c | 2 +-
arch/sparc/mm/init_64.c | 6 +-
include/linux/memblock.h | 24 ++--
mm/memblock.c | 168 +++++++++++++++---------------
7 files changed, 115 insertions(+), 117 deletions(-)

diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c
index db59349..afd6494 100644
--- a/arch/microblaze/mm/init.c
+++ b/arch/microblaze/mm/init.c
@@ -77,8 +77,8 @@ void __init setup_memory(void)

/* Find main memory where is the kernel */
for (i = 0; i < memblock.memory.cnt; i++) {
- memory_start = (u32) memblock.memory.region[i].base;
- memory_end = (u32) memblock.memory.region[i].base
- + (u32) memblock.memory.region[i].size;
+ memory_start = (u32) memblock.memory.regions[i].base;
+ memory_end = (u32) memblock.memory.regions[i].base
+ + (u32) memblock.memory.regions[i].size;
if ((memory_start <= (u32)_text) &&
((u32)_text <= memory_end)) {
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 98f262d..dbaacb7 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -660,7 +660,7 @@ static void __init htab_initialize(void)

/* create bolted the linear mapping in the hash table */
for (i=0; i < memblock.memory.cnt; i++) {
- base = (unsigned long)__va(memblock.memory.region[i].base);
- size = memblock.memory.region[i].size;
+ base = (unsigned long)__va(memblock.memory.regions[i].base);
+ size = memblock.memory.regions[i].size;

DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 1a84a8d..a33f5c1 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -86,10 +86,10 @@ int page_is_ram(unsigned long pfn)
for (i=0; i < memblock.memory.cnt; i++) {
unsigned long base;

- base = memblock.memory.region[i].base;
+ base = memblock.memory.regions[i].base;

if ((paddr >= base) &&
- (paddr < (base + memblock.memory.region[i].size))) {
+ (paddr < (base + memblock.memory.regions[i].size))) {
return 1;
}
}
@@ -149,7 +149,7 @@ int
walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
void *arg, int (*func)(unsigned long, unsigned long, void *))
{
- struct memblock_property res;
+ struct memblock_region res;
unsigned long pfn, len;
u64 end;
int ret = -1;
@@ -206,7 +206,7 @@ void __init do_init_bootmem(void)
/* Add active regions with valid PFNs */
for (i = 0; i < memblock.memory.cnt; i++) {
unsigned long start_pfn, end_pfn;
- start_pfn = memblock.memory.region[i].base >> PAGE_SHIFT;
+ start_pfn = memblock.memory.regions[i].base >> PAGE_SHIFT;
end_pfn = start_pfn + memblock_size_pages(&memblock.memory, i);
add_active_range(0, start_pfn, end_pfn);
}
@@ -219,16 +219,16 @@ void __init do_init_bootmem(void)

/* reserve the sections we're already using */
for (i = 0; i < memblock.reserved.cnt; i++) {
- unsigned long addr = memblock.reserved.region[i].base +
+ unsigned long addr = memblock.reserved.regions[i].base +
memblock_size_bytes(&memblock.reserved, i) - 1;
if (addr < lowmem_end_addr)
- reserve_bootmem(memblock.reserved.region[i].base,
+ reserve_bootmem(memblock.reserved.regions[i].base,
memblock_size_bytes(&memblock.reserved, i),
BOOTMEM_DEFAULT);
- else if (memblock.reserved.region[i].base < lowmem_end_addr) {
+ else if (memblock.reserved.regions[i].base < lowmem_end_addr) {
unsigned long adjusted_size = lowmem_end_addr -
- memblock.reserved.region[i].base;
- reserve_bootmem(memblock.reserved.region[i].base,
+ memblock.reserved.regions[i].base;
+ reserve_bootmem(memblock.reserved.regions[i].base,
adjusted_size, BOOTMEM_DEFAULT);
}
}
@@ -237,7 +237,7 @@ void __init do_init_bootmem(void)

/* reserve the sections we're already using */
for (i = 0; i < memblock.reserved.cnt; i++)
- reserve_bootmem(memblock.reserved.region[i].base,
+ reserve_bootmem(memblock.reserved.regions[i].base,
memblock_size_bytes(&memblock.reserved, i),
BOOTMEM_DEFAULT);

@@ -257,10 +257,10 @@ static int __init mark_nonram_nosave(void)

for (i = 0; i < memblock.memory.cnt - 1; i++) {
memblock_region_max_pfn =
- (memblock.memory.region[i].base >> PAGE_SHIFT) +
- (memblock.memory.region[i].size >> PAGE_SHIFT);
+ (memblock.memory.regions[i].base >> PAGE_SHIFT) +
+ (memblock.memory.regions[i].size >> PAGE_SHIFT);
memblock_next_region_start_pfn =
- memblock.memory.region[i+1].base >> PAGE_SHIFT;
+ memblock.memory.regions[i+1].base >> PAGE_SHIFT;

if (memblock_region_max_pfn < memblock_next_region_start_pfn)
register_nosave_region(memblock_region_max_pfn,
diff --git a/arch/powerpc/platforms/embedded6xx/wii.c b/arch/powerpc/platforms/embedded6xx/wii.c
index 5cdcc7c..8450c29 100644
--- a/arch/powerpc/platforms/embedded6xx/wii.c
+++ b/arch/powerpc/platforms/embedded6xx/wii.c
@@ -65,7 +65,7 @@ static int __init page_aligned(unsigned long x)

void __init wii_memory_fixups(void)
{
- struct memblock_property *p = memblock.memory.region;
+ struct memblock_region *p = memblock.memory.regions;

/*
* This is part of a workaround to allow the use of two
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index f043451..16d8bee 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -978,7 +978,7 @@ static void __init add_node_ranges(void)
unsigned long size = memblock_size_bytes(&memblock.memory, i);
unsigned long start, end;

- start = memblock.memory.region[i].base;
+ start = memblock.memory.regions[i].base;
end = start + size;
while (start < end) {
unsigned long this_end;
@@ -1299,7 +1299,7 @@ static void __init bootmem_init_nonnuma(void)
if (!size)
continue;

- start_pfn = memblock.memory.region[i].base >> PAGE_SHIFT;
+ start_pfn = memblock.memory.regions[i].base >> PAGE_SHIFT;
end_pfn = start_pfn + memblock_size_pages(&memblock.memory, i);
add_active_range(0, start_pfn, end_pfn);
}
@@ -1339,7 +1339,7 @@ static void __init trim_reserved_in_node(int nid)
numadbg(" trim_reserved_in_node(%d)\n", nid);

for (i = 0; i < memblock.reserved.cnt; i++) {
- unsigned long start = memblock.reserved.region[i].base;
+ unsigned long start = memblock.reserved.regions[i].base;
unsigned long size = memblock_size_bytes(&memblock.reserved, i);
unsigned long end = start + size;

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index a59faf2..86e7daf 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -18,22 +18,22 @@

#define MAX_MEMBLOCK_REGIONS 128

-struct memblock_property {
+struct memblock_region {
u64 base;
u64 size;
};

-struct memblock_region {
+struct memblock_type {
unsigned long cnt;
u64 size;
- struct memblock_property region[MAX_MEMBLOCK_REGIONS+1];
+ struct memblock_region regions[MAX_MEMBLOCK_REGIONS+1];
};

struct memblock {
unsigned long debug;
u64 rmo_size;
- struct memblock_region memory;
- struct memblock_region reserved;
+ struct memblock_type memory;
+ struct memblock_type reserved;
};

extern struct memblock memblock;
@@ -56,27 +56,27 @@ extern u64 memblock_end_of_DRAM(void);
extern void __init memblock_enforce_memory_limit(u64 memory_limit);
extern int __init memblock_is_reserved(u64 addr);
extern int memblock_is_region_reserved(u64 base, u64 size);
-extern int memblock_find(struct memblock_property *res);
+extern int memblock_find(struct memblock_region *res);

extern void memblock_dump_all(void);

static inline u64
-memblock_size_bytes(struct memblock_region *type, unsigned long region_nr)
+memblock_size_bytes(struct memblock_type *type, unsigned long region_nr)
{
- return type->region[region_nr].size;
+ return type->regions[region_nr].size;
}
static inline u64
-memblock_size_pages(struct memblock_region *type, unsigned long region_nr)
+memblock_size_pages(struct memblock_type *type, unsigned long region_nr)
{
return memblock_size_bytes(type, region_nr) >> PAGE_SHIFT;
}
static inline u64
-memblock_start_pfn(struct memblock_region *type, unsigned long region_nr)
+memblock_start_pfn(struct memblock_type *type, unsigned long region_nr)
{
- return type->region[region_nr].base >> PAGE_SHIFT;
+ return type->regions[region_nr].base >> PAGE_SHIFT;
}
static inline u64
-memblock_end_pfn(struct memblock_region *type, unsigned long region_nr)
+memblock_end_pfn(struct memblock_type *type, unsigned long region_nr)
{
return memblock_start_pfn(type, region_nr) +
memblock_size_pages(type, region_nr);
diff --git a/mm/memblock.c b/mm/memblock.c
index 3024eb3..13d4a57 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -29,7 +29,7 @@ static int __init early_memblock(char *p)
}
early_param("memblock", early_memblock);

-static void memblock_dump(struct memblock_region *region, char *name)
+static void memblock_dump(struct memblock_type *region, char *name)
{
unsigned long long base, size;
int i;
@@ -37,8 +37,8 @@ static void memblock_dump(struct memblock_region *region, char *name)
pr_info(" %s.cnt = 0x%lx\n", name, region->cnt);

for (i = 0; i < region->cnt; i++) {
- base = region->region[i].base;
- size = region->region[i].size;
+ base = region->regions[i].base;
+ size = region->regions[i].size;

pr_info(" %s[0x%x]\t0x%016llx - 0x%016llx, 0x%llx bytes\n",
name, i, base, base + size - 1, size);
@@ -74,34 +74,34 @@ static long memblock_addrs_adjacent(u64 base1, u64 size1, u64 base2, u64 size2)
return 0;
}

-static long memblock_regions_adjacent(struct memblock_region *rgn,
+static long memblock_regions_adjacent(struct memblock_type *type,
unsigned long r1, unsigned long r2)
{
- u64 base1 = rgn->region[r1].base;
- u64 size1 = rgn->region[r1].size;
- u64 base2 = rgn->region[r2].base;
- u64 size2 = rgn->region[r2].size;
+ u64 base1 = type->regions[r1].base;
+ u64 size1 = type->regions[r1].size;
+ u64 base2 = type->regions[r2].base;
+ u64 size2 = type->regions[r2].size;

return memblock_addrs_adjacent(base1, size1, base2, size2);
}

-static void memblock_remove_region(struct memblock_region *rgn, unsigned long r)
+static void memblock_remove_region(struct memblock_type *type, unsigned long r)
{
unsigned long i;

- for (i = r; i < rgn->cnt - 1; i++) {
- rgn->region[i].base = rgn->region[i + 1].base;
- rgn->region[i].size = rgn->region[i + 1].size;
+ for (i = r; i < type->cnt - 1; i++) {
+ type->regions[i].base = type->regions[i + 1].base;
+ type->regions[i].size = type->regions[i + 1].size;
}
- rgn->cnt--;
+ type->cnt--;
}

/* Assumption: base addr of region 1 < base addr of region 2 */
-static void memblock_coalesce_regions(struct memblock_region *rgn,
+static void memblock_coalesce_regions(struct memblock_type *type,
unsigned long r1, unsigned long r2)
{
- rgn->region[r1].size += rgn->region[r2].size;
- memblock_remove_region(rgn, r2);
+ type->regions[r1].size += type->regions[r2].size;
+ memblock_remove_region(type, r2);
}

void __init memblock_init(void)
@@ -109,13 +109,13 @@ void __init memblock_init(void)
/* Create a dummy zero size MEMBLOCK which will get coalesced away later.
* This simplifies the memblock_add() code below...
*/
- memblock.memory.region[0].base = 0;
- memblock.memory.region[0].size = 0;
+ memblock.memory.regions[0].base = 0;
+ memblock.memory.regions[0].size = 0;
memblock.memory.cnt = 1;

/* Ditto. */
- memblock.reserved.region[0].base = 0;
- memblock.reserved.region[0].size = 0;
+ memblock.reserved.regions[0].base = 0;
+ memblock.reserved.regions[0].size = 0;
memblock.reserved.cnt = 1;
}

@@ -126,24 +126,24 @@ void __init memblock_analyze(void)
memblock.memory.size = 0;

for (i = 0; i < memblock.memory.cnt; i++)
- memblock.memory.size += memblock.memory.region[i].size;
+ memblock.memory.size += memblock.memory.regions[i].size;
}

-static long memblock_add_region(struct memblock_region *rgn, u64 base, u64 size)
+static long memblock_add_region(struct memblock_type *type, u64 base, u64 size)
{
unsigned long coalesced = 0;
long adjacent, i;

- if ((rgn->cnt == 1) && (rgn->region[0].size == 0)) {
- rgn->region[0].base = base;
- rgn->region[0].size = size;
+ if ((type->cnt == 1) && (type->regions[0].size == 0)) {
+ type->regions[0].base = base;
+ type->regions[0].size = size;
return 0;
}

/* First try and coalesce this MEMBLOCK with another. */
- for (i = 0; i < rgn->cnt; i++) {
- u64 rgnbase = rgn->region[i].base;
- u64 rgnsize = rgn->region[i].size;
+ for (i = 0; i < type->cnt; i++) {
+ u64 rgnbase = type->regions[i].base;
+ u64 rgnsize = type->regions[i].size;

if ((rgnbase == base) && (rgnsize == size))
/* Already have this region, so we're done */
@@ -151,61 +151,59 @@ static long memblock_add_region(struct memblock_region *rgn, u64 base, u64 size)

adjacent = memblock_addrs_adjacent(base, size, rgnbase, rgnsize);
if (adjacent > 0) {
- rgn->region[i].base -= size;
- rgn->region[i].size += size;
+ type->regions[i].base -= size;
+ type->regions[i].size += size;
coalesced++;
break;
} else if (adjacent < 0) {
- rgn->region[i].size += size;
+ type->regions[i].size += size;
coalesced++;
break;
}
}

- if ((i < rgn->cnt - 1) && memblock_regions_adjacent(rgn, i, i+1)) {
- memblock_coalesce_regions(rgn, i, i+1);
+ if ((i < type->cnt - 1) && memblock_regions_adjacent(type, i, i+1)) {
+ memblock_coalesce_regions(type, i, i+1);
coalesced++;
}

if (coalesced)
return coalesced;
- if (rgn->cnt >= MAX_MEMBLOCK_REGIONS)
+ if (type->cnt >= MAX_MEMBLOCK_REGIONS)
return -1;

/* Couldn't coalesce the MEMBLOCK, so add it to the sorted table. */
- for (i = rgn->cnt - 1; i >= 0; i--) {
- if (base < rgn->region[i].base) {
- rgn->region[i+1].base = rgn->region[i].base;
- rgn->region[i+1].size = rgn->region[i].size;
+ for (i = type->cnt - 1; i >= 0; i--) {
+ if (base < type->regions[i].base) {
+ type->regions[i+1].base = type->regions[i].base;
+ type->regions[i+1].size = type->regions[i].size;
} else {
- rgn->region[i+1].base = base;
- rgn->region[i+1].size = size;
+ type->regions[i+1].base = base;
+ type->regions[i+1].size = size;
break;
}
}

- if (base < rgn->region[0].base) {
- rgn->region[0].base = base;
- rgn->region[0].size = size;
+ if (base < type->regions[0].base) {
+ type->regions[0].base = base;
+ type->regions[0].size = size;
}
- rgn->cnt++;
+ type->cnt++;

return 0;
}

long memblock_add(u64 base, u64 size)
{
- struct memblock_region *_rgn = &memblock.memory;
-
/* On pSeries LPAR systems, the first MEMBLOCK is our RMO region. */
if (base == 0)
memblock.rmo_size = size;

- return memblock_add_region(_rgn, base, size);
+ return memblock_add_region(&memblock.memory, base, size);

}

-static long __memblock_remove(struct memblock_region *rgn, u64 base, u64 size)
+static long __memblock_remove(struct memblock_type *type, u64 base, u64 size)
{
u64 rgnbegin, rgnend;
u64 end = base + size;
@@ -214,34 +212,34 @@ static long __memblock_remove(struct memblock_region *rgn, u64 base, u64 size)
rgnbegin = rgnend = 0; /* suppress gcc warnings */

/* Find the region where (base, size) belongs to */
- for (i=0; i < rgn->cnt; i++) {
- rgnbegin = rgn->region[i].base;
- rgnend = rgnbegin + rgn->region[i].size;
+ for (i=0; i < type->cnt; i++) {
+ rgnbegin = type->regions[i].base;
+ rgnend = rgnbegin + type->regions[i].size;

if ((rgnbegin <= base) && (end <= rgnend))
break;
}

/* Didn't find the region */
- if (i == rgn->cnt)
+ if (i == type->cnt)
return -1;

/* Check to see if we are removing entire region */
if ((rgnbegin == base) && (rgnend == end)) {
- memblock_remove_region(rgn, i);
+ memblock_remove_region(type, i);
return 0;
}

/* Check to see if region is matching at the front */
if (rgnbegin == base) {
- rgn->region[i].base = end;
- rgn->region[i].size -= size;
+ type->regions[i].base = end;
+ type->regions[i].size -= size;
return 0;
}

/* Check to see if the region is matching at the end */
if (rgnend == end) {
- rgn->region[i].size -= size;
+ type->regions[i].size -= size;
return 0;
}

@@ -249,8 +247,8 @@ static long __memblock_remove(struct memblock_region *rgn, u64 base, u64 size)
* We need to split the entry - adjust the current one to the
* beginning of the hole and add the region after the hole.
*/
- rgn->region[i].size = base - rgn->region[i].base;
- return memblock_add_region(rgn, end, rgnend - end);
+ type->regions[i].size = base - type->regions[i].base;
+ return memblock_add_region(type, end, rgnend - end);
}

long memblock_remove(u64 base, u64 size)
@@ -265,25 +263,25 @@ long __init memblock_free(u64 base, u64 size)

long __init memblock_reserve(u64 base, u64 size)
{
- struct memblock_region *_rgn = &memblock.reserved;
+ struct memblock_type *_rgn = &memblock.reserved;

BUG_ON(0 == size);

return memblock_add_region(_rgn, base, size);
}

-long memblock_overlaps_region(struct memblock_region *rgn, u64 base, u64 size)
+long memblock_overlaps_region(struct memblock_type *type, u64 base, u64 size)
{
unsigned long i;

- for (i = 0; i < rgn->cnt; i++) {
- u64 rgnbase = rgn->region[i].base;
- u64 rgnsize = rgn->region[i].size;
+ for (i = 0; i < type->cnt; i++) {
+ u64 rgnbase = type->regions[i].base;
+ u64 rgnsize = type->regions[i].size;
if (memblock_addrs_overlap(base, size, rgnbase, rgnsize))
break;
}

- return (i < rgn->cnt) ? i : -1;
+ return (i < type->cnt) ? i : -1;
}

static u64 memblock_align_down(u64 addr, u64 size)
@@ -311,7 +309,7 @@ static u64 __init memblock_alloc_nid_unreserved(u64 start, u64 end,
base = ~(u64)0;
return base;
}
- res_base = memblock.reserved.region[j].base;
+ res_base = memblock.reserved.regions[j].base;
if (res_base < size)
break;
base = memblock_align_down(res_base - size, align);
@@ -320,7 +318,7 @@ static u64 __init memblock_alloc_nid_unreserved(u64 start, u64 end,
return ~(u64)0;
}

-static u64 __init memblock_alloc_nid_region(struct memblock_property *mp,
+static u64 __init memblock_alloc_nid_region(struct memblock_region *mp,
u64 (*nid_range)(u64, u64, int *),
u64 size, u64 align, int nid)
{
@@ -350,7 +348,7 @@ static u64 __init memblock_alloc_nid_region(struct memblock_property *mp,
u64 __init memblock_alloc_nid(u64 size, u64 align, int nid,
u64 (*nid_range)(u64 start, u64 end, int *nid))
{
- struct memblock_region *mem = &memblock.memory;
+ struct memblock_type *mem = &memblock.memory;
int i;

BUG_ON(0 == size);
@@ -358,7 +356,7 @@ u64 __init memblock_alloc_nid(u64 size, u64 align, int nid,
size = memblock_align_up(size, align);

for (i = 0; i < mem->cnt; i++) {
- u64 ret = memblock_alloc_nid_region(&mem->region[i],
+ u64 ret = memblock_alloc_nid_region(&mem->regions[i],
nid_range,
size, align, nid);
if (ret != ~(u64)0)
@@ -402,8 +400,8 @@ u64 __init __memblock_alloc_base(u64 size, u64 align, u64 max_addr)
max_addr = MEMBLOCK_REAL_LIMIT;

for (i = memblock.memory.cnt - 1; i >= 0; i--) {
- u64 memblockbase = memblock.memory.region[i].base;
- u64 memblocksize = memblock.memory.region[i].size;
+ u64 memblockbase = memblock.memory.regions[i].base;
+ u64 memblocksize = memblock.memory.regions[i].size;

if (memblocksize < size)
continue;
@@ -423,7 +421,7 @@ u64 __init __memblock_alloc_base(u64 size, u64 align, u64 max_addr)
return 0;
return base;
}
- res_base = memblock.reserved.region[j].base;
+ res_base = memblock.reserved.regions[j].base;
if (res_base < size)
break;
base = memblock_align_down(res_base - size, align);
@@ -442,7 +440,7 @@ u64 memblock_end_of_DRAM(void)
{
int idx = memblock.memory.cnt - 1;

- return (memblock.memory.region[idx].base + memblock.memory.region[idx].size);
+ return (memblock.memory.regions[idx].base + memblock.memory.regions[idx].size);
}

/* You must call memblock_analyze() after this. */
@@ -450,7 +448,7 @@ void __init memblock_enforce_memory_limit(u64 memory_limit)
{
unsigned long i;
u64 limit;
- struct memblock_property *p;
+ struct memblock_region *p;

if (!memory_limit)
return;
@@ -458,24 +456,24 @@ void __init memblock_enforce_memory_limit(u64 memory_limit)
/* Truncate the memblock regions to satisfy the memory limit. */
limit = memory_limit;
for (i = 0; i < memblock.memory.cnt; i++) {
- if (limit > memblock.memory.region[i].size) {
- limit -= memblock.memory.region[i].size;
+ if (limit > memblock.memory.regions[i].size) {
+ limit -= memblock.memory.regions[i].size;
continue;
}

- memblock.memory.region[i].size = limit;
+ memblock.memory.regions[i].size = limit;
memblock.memory.cnt = i + 1;
break;
}

- if (memblock.memory.region[0].size < memblock.rmo_size)
- memblock.rmo_size = memblock.memory.region[0].size;
+ if (memblock.memory.regions[0].size < memblock.rmo_size)
+ memblock.rmo_size = memblock.memory.regions[0].size;

memory_limit = memblock_end_of_DRAM();

/* And truncate any reserves above the limit also. */
for (i = 0; i < memblock.reserved.cnt; i++) {
- p = &memblock.reserved.region[i];
+ p = &memblock.reserved.regions[i];

if (p->base > memory_limit)
p->size = 0;
@@ -494,9 +492,9 @@ int __init memblock_is_reserved(u64 addr)
int i;

for (i = 0; i < memblock.reserved.cnt; i++) {
- u64 upper = memblock.reserved.region[i].base +
- memblock.reserved.region[i].size - 1;
- if ((addr >= memblock.reserved.region[i].base) && (addr <= upper))
+ u64 upper = memblock.reserved.regions[i].base +
+ memblock.reserved.regions[i].size - 1;
+ if ((addr >= memblock.reserved.regions[i].base) && (addr <= upper))
return 1;
}
return 0;
@@ -511,7 +509,7 @@ int memblock_is_region_reserved(u64 base, u64 size)
* Given a <base, len>, find which memory regions belong to this range.
* Adjust the request and return a contiguous chunk.
*/
-int memblock_find(struct memblock_property *res)
+int memblock_find(struct memblock_region *res)
{
int i;
u64 rstart, rend;
@@ -520,8 +518,8 @@ int memblock_find(struct memblock_property *res)
rend = rstart + res->size - 1;

for (i = 0; i < memblock.memory.cnt; i++) {
- u64 start = memblock.memory.region[i].base;
- u64 end = start + memblock.memory.region[i].size - 1;
+ u64 start = memblock.memory.regions[i].base;
+ u64 end = start + memblock.memory.regions[i].size - 1;

if (start > rend)
return -1;
--
1.6.4.2

2010-07-22 18:24:13

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 13/31] memblock: Add debug markers at the end of the array

From: Benjamin Herrenschmidt <[email protected]>

Since we allocate one more than needed, why not do a bit of sanity checking
here to ensure we don't walk past the end of the array?
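
The spare slot effectively acts as a red zone: poison it once at init
time, then check it whenever we get the chance. A minimal sketch of the
pattern, using the names from the diff below:

	/* at init: poison the one-past-the-end slot */
	memblock.memory.regions[INIT_MEMBLOCK_REGIONS].base =
		(phys_addr_t)RED_INACTIVE;

	/* later, in memblock_analyze(): warn if anything wrote past the end */
	WARN_ON(memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS].base
		!= (phys_addr_t)RED_INACTIVE);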

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
mm/memblock.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index e1c5ce3..21ba089 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -13,6 +13,7 @@
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/bitops.h>
+#include <linux/poison.h>
#include <linux/memblock.h>

struct memblock memblock;
@@ -112,6 +113,10 @@ void __init memblock_init(void)
memblock.reserved.regions = memblock_reserved_init_regions;
memblock.reserved.max = INIT_MEMBLOCK_REGIONS;

+ /* Write a marker in the unused last array entry */
+ memblock.memory.regions[INIT_MEMBLOCK_REGIONS].base = (phys_addr_t)RED_INACTIVE;
+ memblock.reserved.regions[INIT_MEMBLOCK_REGIONS].base = (phys_addr_t)RED_INACTIVE;
+
/* Create a dummy zero size MEMBLOCK which will get coalesced away later.
* This simplifies the memblock_add() code below...
*/
@@ -131,6 +136,12 @@ void __init memblock_analyze(void)
{
int i;

+ /* Check marker in the unused last array entry */
+ WARN_ON(memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS].base
+ != (phys_addr_t)RED_INACTIVE);
+ WARN_ON(memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS].base
+ != (phys_addr_t)RED_INACTIVE);
+
memblock.memory_size = 0;

for (i = 0; i < memblock.memory.cnt; i++)
--
1.6.4.2

2010-07-22 18:24:27

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 19/31] memblock: Add array resizing support

From: Benjamin Herrenschmidt <[email protected]>

When one of the arrays gets full, we resize it. After much thinking and
a few iterations of that code, I went back to on-demand resizing using
the (new) internal memblock_find_base() function, which is pretty much what
Yinghai initially proposed, though there are some differences in the details.

To work, this relies on the default alloc limit being set sensibly by
the architecture.
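
The resize path boils down to the following (a sketch of
memblock_double_array() from the diff below, with error handling elided):

	new_size = old_size * 2;
	if (slab_is_available())	/* late, e.g. memory hotplug */
		new_array = kmalloc(new_size, GFP_KERNEL);
	else				/* boot: carve it out of memblock itself */
		new_array = __va(memblock_find_base(new_size, sizeof(phys_addr_t),
						    MEMBLOCK_ALLOC_ACCESSIBLE));
	memcpy(new_array, type->regions, old_size);	/* move the old entries */
	type->regions = new_array;
	type->max *= 2;
	/* if carved out of memblock, the new array's range gets reserved too */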

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
mm/memblock.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 102 insertions(+), 2 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 80d8b85..8197f37 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -11,6 +11,7 @@
*/

#include <linux/kernel.h>
+#include <linux/slab.h>
#include <linux/init.h>
#include <linux/bitops.h>
#include <linux/poison.h>
@@ -18,12 +19,23 @@

struct memblock memblock;

-static int memblock_debug;
+static int memblock_debug, memblock_can_resize;
static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS + 1];
static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS + 1];

#define MEMBLOCK_ERROR (~(phys_addr_t)0)

+/* inline so we don't get a warning when pr_debug is compiled out */
+static inline const char *memblock_type_name(struct memblock_type *type)
+{
+ if (type == &memblock.memory)
+ return "memory";
+ else if (type == &memblock.reserved)
+ return "reserved";
+ else
+ return "unknown";
+}
+
/*
* Address comparison utilities
*/
@@ -156,6 +168,79 @@ static void memblock_coalesce_regions(struct memblock_type *type,
memblock_remove_region(type, r2);
}

+/* Defined below but needed now */
+static long memblock_add_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size);
+
+static int memblock_double_array(struct memblock_type *type)
+{
+ struct memblock_region *new_array, *old_array;
+ phys_addr_t old_size, new_size, addr;
+ int use_slab = slab_is_available();
+
+ /* We don't allow resizing until we know about the reserved regions
+ * of memory that aren't suitable for allocation
+ */
+ if (!memblock_can_resize)
+ return -1;
+
+ pr_debug("memblock: %s array full, doubling...", memblock_type_name(type));
+
+ /* Calculate new doubled size */
+ old_size = type->max * sizeof(struct memblock_region);
+ new_size = old_size << 1;
+
+ /* Try to find some space for it.
+ *
+ * WARNING: We assume that either slab_is_available() and we use it or
+ * we use MEMBLOCK for allocations. That means that this is unsafe to use
+ * when bootmem is currently active (unless bootmem itself is implemented
+ * on top of MEMBLOCK which isn't the case yet)
+ *
+ * This should however not be an issue for now, as we currently only
+ * call into MEMBLOCK while it's still active, or much later when slab is
+ * active for memory hotplug operations
+ */
+ if (use_slab) {
+ new_array = kmalloc(new_size, GFP_KERNEL);
+ addr = new_array == NULL ? MEMBLOCK_ERROR : __pa(new_array);
+ } else
+ addr = memblock_find_base(new_size, sizeof(phys_addr_t), MEMBLOCK_ALLOC_ACCESSIBLE);
+ if (addr == MEMBLOCK_ERROR) {
+ pr_err("memblock: Failed to double %s array from %ld to %ld entries !\n",
+ memblock_type_name(type), type->max, type->max * 2);
+ return -1;
+ }
+ new_array = __va(addr);
+
+ /* Found space, we now need to move the array over before
+ * we add the reserved region since it may be our reserved
+ * array itself that is full.
+ */
+ memcpy(new_array, type->regions, old_size);
+ memset(new_array + type->max, 0, old_size);
+ old_array = type->regions;
+ type->regions = new_array;
+ type->max <<= 1;
+
+ /* If we use SLAB that's it, we are done */
+ if (use_slab)
+ return 0;
+
+ /* Add the new reserved region now. Should not fail ! */
+ BUG_ON(memblock_add_region(&memblock.reserved, addr, new_size) < 0);
+
+ /* If the array wasn't our static init one, then free it. We only do
+ * that before SLAB is available as later on, we don't know whether
+ * to use kfree or free_bootmem_pages(). Shouldn't be a big deal
+ * anyways
+ */
+ if (old_array != memblock_memory_init_regions &&
+ old_array != memblock_reserved_init_regions)
+ memblock_free(__pa(old_array), old_size);
+
+ return 0;
+}
+
static long memblock_add_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
{
unsigned long coalesced = 0;
@@ -196,7 +281,11 @@ static long memblock_add_region(struct memblock_type *type, phys_addr_t base, ph

if (coalesced)
return coalesced;
- if (type->cnt >= type->max)
+
+ /* If we are out of space, we fail. It's too late to resize the array
+ * but then this shouldn't have happened in the first place.
+ */
+ if (WARN_ON(type->cnt >= type->max))
return -1;

/* Couldn't coalesce the MEMBLOCK, so add it to the sorted table. */
@@ -217,6 +306,14 @@ static long memblock_add_region(struct memblock_type *type, phys_addr_t base, ph
}
type->cnt++;

+ /* The array is full ? Try to resize it. If that fails, we undo
+ * our allocation and return an error
+ */
+ if (type->cnt == type->max && memblock_double_array(type)) {
+ type->cnt--;
+ return -1;
+ }
+
return 0;
}

@@ -515,6 +612,9 @@ void __init memblock_analyze(void)

for (i = 0; i < memblock.memory.cnt; i++)
memblock.memory_size += memblock.memory.regions[i].size;
+
+ /* We allow resizing from there */
+ memblock_can_resize = 1;
}

void __init memblock_init(void)
--
1.6.4.2

2010-07-22 18:24:39

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 14/31] memblock: Make memblock_find_region() out of memblock_alloc_region()

From: Benjamin Herrenschmidt <[email protected]>

This function will be used to locate a free area to put the new memblock
arrays when attempting to resize them. memblock_alloc_region() is gone;
its two callsites now call memblock_add_region() themselves, as sketched below.
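
Roughly, the callsites go from one combined find+reserve call to an
explicit two-step:

	base = memblock_find_region(start, end, size, align);	/* locate only */
	if (base != ~(phys_addr_t)0 &&
	    memblock_add_region(&memblock.reserved, base, size) >= 0)
		return base;					/* then reserve */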

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
mm/memblock.c | 20 +++++++++-----------
1 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 21ba089..eae3dc0 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -309,8 +309,8 @@ static phys_addr_t memblock_align_up(phys_addr_t addr, phys_addr_t size)
return (addr + (size - 1)) & ~(size - 1);
}

-static phys_addr_t __init memblock_alloc_region(phys_addr_t start, phys_addr_t end,
- phys_addr_t size, phys_addr_t align)
+static phys_addr_t __init memblock_find_region(phys_addr_t start, phys_addr_t end,
+ phys_addr_t size, phys_addr_t align)
{
phys_addr_t base, res_base;
long j;
@@ -318,12 +318,8 @@ static phys_addr_t __init memblock_alloc_region(phys_addr_t start, phys_addr_t e
base = memblock_align_down((end - size), align);
while (start <= base) {
j = memblock_overlaps_region(&memblock.reserved, base, size);
- if (j < 0) {
- /* this area isn't reserved, take it */
- if (memblock_add_region(&memblock.reserved, base, size) < 0)
- base = ~(phys_addr_t)0;
+ if (j < 0)
return base;
- }
res_base = memblock.reserved.regions[j].base;
if (res_base < size)
break;
@@ -356,8 +352,9 @@ static phys_addr_t __init memblock_alloc_nid_region(struct memblock_region *mp,

this_end = memblock_nid_range(start, end, &this_nid);
if (this_nid == nid) {
- phys_addr_t ret = memblock_alloc_region(start, this_end, size, align);
- if (ret != ~(phys_addr_t)0)
+ phys_addr_t ret = memblock_find_region(start, this_end, size, align);
+ if (ret != ~(phys_addr_t)0 &&
+ memblock_add_region(&memblock.reserved, ret, size) >= 0)
return ret;
}
start = this_end;
@@ -432,8 +429,9 @@ phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t align, ph
if (memblocksize < size)
continue;
base = min(memblockbase + memblocksize, max_addr);
- res_base = memblock_alloc_region(memblockbase, base, size, align);
- if (res_base != ~(phys_addr_t)0)
+ res_base = memblock_find_region(memblockbase, base, size, align);
+ if (res_base != ~(phys_addr_t)0 &&
+ memblock_add_region(&memblock.reserved, res_base, size) >= 0)
return res_base;
}
return 0;
--
1.6.4.2

2010-07-22 18:24:50

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 23/31] memblock: Separate memblock_alloc_nid() and memblock_alloc_try_nid()

From: Benjamin Herrenschmidt <[email protected]>

The former is now strict: it will fail if it cannot honor the allocation
within the node. The latter implements the previous semantics, which fall
back to allocating anywhere.
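
Callers that are happy with off-node memory just switch to the _try_
variant, which per the diff below is simply:

	phys_addr_t res = memblock_alloc_nid(size, align, nid);	/* strict */

	if (res)
		return res;
	return memblock_alloc(size, align);	/* fall back: any node */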

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
arch/sparc/mm/init_64.c | 4 ++--
include/linux/memblock.h | 6 +++++-
mm/memblock.c | 14 ++++++++++++++
3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 0883113..dc584d2 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -820,7 +820,7 @@ static void __init allocate_node_data(int nid)
struct pglist_data *p;

#ifdef CONFIG_NEED_MULTIPLE_NODES
- paddr = memblock_alloc_nid(sizeof(struct pglist_data), SMP_CACHE_BYTES, nid);
+ paddr = memblock_alloc_try_nid(sizeof(struct pglist_data), SMP_CACHE_BYTES, nid);
if (!paddr) {
prom_printf("Cannot allocate pglist_data for nid[%d]\n", nid);
prom_halt();
@@ -840,7 +840,7 @@ static void __init allocate_node_data(int nid)
if (p->node_spanned_pages) {
num_pages = bootmem_bootmap_pages(p->node_spanned_pages);

- paddr = memblock_alloc_nid(num_pages << PAGE_SHIFT, PAGE_SIZE, nid);
+ paddr = memblock_alloc_try_nid(num_pages << PAGE_SHIFT, PAGE_SIZE, nid);
if (!paddr) {
prom_printf("Cannot allocate bootmap for nid[%d]\n",
nid);
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index b69c243..08a12cf 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -50,7 +50,11 @@ extern long __init memblock_reserve(phys_addr_t base, phys_addr_t size);
/* The numa aware allocator is only available if
* CONFIG_ARCH_POPULATES_NODE_MAP is set
*/
-extern phys_addr_t __init memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int nid);
+extern phys_addr_t __init memblock_alloc_nid(phys_addr_t size, phys_addr_t align,
+ int nid);
+extern phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align,
+ int nid);
+
extern phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align);

/* Flags for memblock_alloc_base() amd __memblock_alloc_base() */
diff --git a/mm/memblock.c b/mm/memblock.c
index a9c15a5..c3c499e 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -537,9 +537,23 @@ phys_addr_t __init memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int n
return ret;
}

+ return 0;
+}
+
+phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid)
+{
+ phys_addr_t res = memblock_alloc_nid(size, align, nid);
+
+ if (res)
+ return res;
return memblock_alloc(size, align);
}

+
+/*
+ * Remaining API functions
+ */
+
/* You must call memblock_analyze() before this. */
phys_addr_t __init memblock_phys_mem_size(void)
{
--
1.6.4.2

2010-07-22 18:25:00

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 03/31] memblock: Introduce for_each_memblock() and new accessors, and use it

From: Benjamin Herrenschmidt <[email protected]>

Walk memblocks using for_each_memblock() and use memblock_region_base/end_pfn() for
getting to PFNs. Update sparc, powerpc, microblaze and sh.

Note: This is -almost- a direct conversion. It doesn't fix some existing
crap when/if memblocks aren't page aligned in the first place. This will be
sorted out separately.

This removes memblock_find() as well, which isn't used anymore.
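
A typical conversion thus looks like this (sketch matching the powerpc
hunks below):

	struct memblock_region *reg;

	for_each_memblock(memory, reg) {
		unsigned long start_pfn = memblock_region_base_pfn(reg);
		unsigned long end_pfn = memblock_region_end_pfn(reg);

		add_active_range(0, start_pfn, end_pfn);
	}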

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
arch/microblaze/mm/init.c | 18 +++----
arch/powerpc/mm/hash_utils_64.c | 8 ++--
arch/powerpc/mm/mem.c | 92 ++++++++++++++-------------------------
arch/powerpc/mm/numa.c | 17 ++++---
arch/sh/mm/init.c | 16 ++++---
arch/sparc/mm/init_64.c | 30 +++++--------
include/linux/memblock.h | 56 ++++++++++++++++++------
mm/memblock.c | 32 -------------
8 files changed, 117 insertions(+), 152 deletions(-)

diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c
index afd6494..8f45b41 100644
--- a/arch/microblaze/mm/init.c
+++ b/arch/microblaze/mm/init.c
@@ -70,16 +70,16 @@ static void __init paging_init(void)

void __init setup_memory(void)
{
- int i;
unsigned long map_size;
+ struct memblock_region *reg;
+
#ifndef CONFIG_MMU
u32 kernel_align_start, kernel_align_size;

/* Find main memory where is the kernel */
- for (i = 0; i < memblock.memory.cnt; i++) {
- memory_start = (u32) memblock.memory.regions[i].base;
- memory_end = (u32) memblock.memory.regions[i].base
- + (u32) memblock.memory.region[i].size;
+ for_each_memblock(memory, reg) {
+ memory_start = (u32)reg->base;
+ memory_end = (u32) reg->base + reg->size;
if ((memory_start <= (u32)_text) &&
((u32)_text <= memory_end)) {
memory_size = memory_end - memory_start;
@@ -147,12 +147,10 @@ void __init setup_memory(void)
free_bootmem(memory_start, memory_size);

/* reserve allocate blocks */
- for (i = 0; i < memblock.reserved.cnt; i++) {
+ for_each_memblock(reserved, reg) {
pr_debug("reserved %d - 0x%08x-0x%08x\n", i,
- (u32) memblock.reserved.region[i].base,
- (u32) memblock_size_bytes(&memblock.reserved, i));
- reserve_bootmem(memblock.reserved.region[i].base,
- memblock_size_bytes(&memblock.reserved, i) - 1, BOOTMEM_DEFAULT);
+ (u32) reg->base, (u32) reg->size);
+ reserve_bootmem(reg->base, reg->size, BOOTMEM_DEFAULT);
}
#ifdef CONFIG_MMU
init_bootmem_done = 1;
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index dbaacb7..2b0a807 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -588,7 +588,7 @@ static void __init htab_initialize(void)
unsigned long pteg_count;
unsigned long prot;
unsigned long base = 0, size = 0, limit;
- int i;
+ struct memblock_region *reg;

DBG(" -> htab_initialize()\n");

@@ -659,9 +659,9 @@ static void __init htab_initialize(void)
*/

/* create bolted the linear mapping in the hash table */
- for (i=0; i < memblock.memory.cnt; i++) {
- base = (unsigned long)__va(memblock.memory.regions[i].base);
- size = memblock.memory.region[i].size;
+ for_each_memblock(memory, reg) {
+ base = (unsigned long)__va(reg->base);
+ size = reg->size;

DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
base, size, prot);
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index a33f5c1..52df542 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -82,18 +82,11 @@ int page_is_ram(unsigned long pfn)
return pfn < max_pfn;
#else
unsigned long paddr = (pfn << PAGE_SHIFT);
- int i;
- for (i=0; i < memblock.memory.cnt; i++) {
- unsigned long base;
+ struct memblock_region *reg;

- base = memblock.memory.regions[i].base;
-
- if ((paddr >= base) &&
- (paddr < (base + memblock.memory.regions[i].size))) {
+ for_each_memblock(memory, reg)
+ if (paddr >= reg->base && paddr < (reg->base + reg->size))
return 1;
- }
- }
-
return 0;
#endif
}
@@ -149,23 +142,19 @@ int
walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
void *arg, int (*func)(unsigned long, unsigned long, void *))
{
- struct memblock_region res;
- unsigned long pfn, len;
- u64 end;
+ struct memblock_region *reg;
+ unsigned long end_pfn = start_pfn + nr_pages;
+ unsigned long tstart, tend;
int ret = -1;

- res.base = (u64) start_pfn << PAGE_SHIFT;
- res.size = (u64) nr_pages << PAGE_SHIFT;
-
- end = res.base + res.size - 1;
- while ((res.base < end) && (memblock_find(&res) >= 0)) {
- pfn = (unsigned long)(res.base >> PAGE_SHIFT);
- len = (unsigned long)(res.size >> PAGE_SHIFT);
- ret = (*func)(pfn, len, arg);
+ for_each_memblock(memory, reg) {
+ tstart = max(start_pfn, memblock_region_base_pfn(reg));
+ tend = min(end_pfn, memblock_region_end_pfn(reg));
+ if (tstart >= tend)
+ continue;
+ ret = (*func)(tstart, tend - tstart, arg);
if (ret)
break;
- res.base += (res.size + 1);
- res.size = (end - res.base + 1);
}
return ret;
}
@@ -179,9 +168,9 @@ EXPORT_SYMBOL_GPL(walk_system_ram_range);
#ifndef CONFIG_NEED_MULTIPLE_NODES
void __init do_init_bootmem(void)
{
- unsigned long i;
unsigned long start, bootmap_pages;
unsigned long total_pages;
+ struct memblock_region *reg;
int boot_mapsize;

max_low_pfn = max_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
@@ -204,10 +193,10 @@ void __init do_init_bootmem(void)
boot_mapsize = init_bootmem_node(NODE_DATA(0), start >> PAGE_SHIFT, min_low_pfn, max_low_pfn);

/* Add active regions with valid PFNs */
- for (i = 0; i < memblock.memory.cnt; i++) {
+ for_each_memblock(memory, reg) {
unsigned long start_pfn, end_pfn;
- start_pfn = memblock.memory.regions[i].base >> PAGE_SHIFT;
- end_pfn = start_pfn + memblock_size_pages(&memblock.memory, i);
+ start_pfn = memblock_region_base_pfn(reg);
+ end_pfn = memblock_region_end_pfn(reg);
add_active_range(0, start_pfn, end_pfn);
}

@@ -218,29 +207,21 @@ void __init do_init_bootmem(void)
free_bootmem_with_active_regions(0, lowmem_end_addr >> PAGE_SHIFT);

/* reserve the sections we're already using */
- for (i = 0; i < memblock.reserved.cnt; i++) {
- unsigned long addr = memblock.reserved.regions[i].base +
- memblock_size_bytes(&memblock.reserved, i) - 1;
- if (addr < lowmem_end_addr)
- reserve_bootmem(memblock.reserved.regions[i].base,
- memblock_size_bytes(&memblock.reserved, i),
- BOOTMEM_DEFAULT);
- else if (memblock.reserved.regions[i].base < lowmem_end_addr) {
- unsigned long adjusted_size = lowmem_end_addr -
- memblock.reserved.regions[i].base;
- reserve_bootmem(memblock.reserved.regions[i].base,
- adjusted_size, BOOTMEM_DEFAULT);
+ for_each_memblock(reserved, reg) {
+ unsigned long top = reg->base + reg->size - 1;
+ if (top < lowmem_end_addr)
+ reserve_bootmem(reg->base, reg->size, BOOTMEM_DEFAULT);
+ else if (reg->base < lowmem_end_addr) {
+ unsigned long trunc_size = lowmem_end_addr - reg->base;
+ reserve_bootmem(reg->base, trunc_size, BOOTMEM_DEFAULT);
}
}
#else
free_bootmem_with_active_regions(0, max_pfn);

/* reserve the sections we're already using */
- for (i = 0; i < memblock.reserved.cnt; i++)
- reserve_bootmem(memblock.reserved.regions[i].base,
- memblock_size_bytes(&memblock.reserved, i),
- BOOTMEM_DEFAULT);
-
+ for_each_memblock(reserved, reg)
+ reserve_bootmem(reg->base, reg->size, BOOTMEM_DEFAULT);
#endif
/* XXX need to clip this if using highmem? */
sparse_memory_present_with_active_regions(0);
@@ -251,22 +232,15 @@ void __init do_init_bootmem(void)
/* mark pages that don't exist as nosave */
static int __init mark_nonram_nosave(void)
{
- unsigned long memblock_next_region_start_pfn,
- memblock_region_max_pfn;
- int i;
-
- for (i = 0; i < memblock.memory.cnt - 1; i++) {
- memblock_region_max_pfn =
- (memblock.memory.regions[i].base >> PAGE_SHIFT) +
- (memblock.memory.regions[i].size >> PAGE_SHIFT);
- memblock_next_region_start_pfn =
- memblock.memory.regions[i+1].base >> PAGE_SHIFT;
-
- if (memblock_region_max_pfn < memblock_next_region_start_pfn)
- register_nosave_region(memblock_region_max_pfn,
- memblock_next_region_start_pfn);
+ struct memblock_region *reg, *prev = NULL;
+
+ for_each_memblock(memory, reg) {
+ if (prev &&
+ memblock_region_end_pfn(prev) < memblock_region_base_pfn(reg))
+ register_nosave_region(memblock_region_end_pfn(prev),
+ memblock_region_base_pfn(reg));
+ prev = reg;
}
-
return 0;
}

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index f473645..9eaaf22 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -746,16 +746,17 @@ static void __init setup_nonnuma(void)
unsigned long top_of_ram = memblock_end_of_DRAM();
unsigned long total_ram = memblock_phys_mem_size();
unsigned long start_pfn, end_pfn;
- unsigned int i, nid = 0;
+ unsigned int nid = 0;
+ struct memblock_region *reg;

printk(KERN_DEBUG "Top of RAM: 0x%lx, Total RAM: 0x%lx\n",
top_of_ram, total_ram);
printk(KERN_DEBUG "Memory hole size: %ldMB\n",
(top_of_ram - total_ram) >> 20);

- for (i = 0; i < memblock.memory.cnt; ++i) {
- start_pfn = memblock.memory.region[i].base >> PAGE_SHIFT;
- end_pfn = start_pfn + memblock_size_pages(&memblock.memory, i);
+ for_each_memblock(memory, reg) {
+ start_pfn = memblock_region_base_pfn(reg);
+ end_pfn = memblock_region_end_pfn(reg);

fake_numa_create_new_node(end_pfn, &nid);
add_active_range(nid, start_pfn, end_pfn);
@@ -891,11 +892,11 @@ static struct notifier_block __cpuinitdata ppc64_numa_nb = {
static void mark_reserved_regions_for_nid(int nid)
{
struct pglist_data *node = NODE_DATA(nid);
- int i;
+ struct memblock_region *reg;

- for (i = 0; i < memblock.reserved.cnt; i++) {
- unsigned long physbase = memblock.reserved.region[i].base;
- unsigned long size = memblock.reserved.region[i].size;
+ for_each_memblock(reserved, reg) {
+ unsigned long physbase = reg->base;
+ unsigned long size = reg->size;
unsigned long start_pfn = physbase >> PAGE_SHIFT;
unsigned long end_pfn = PFN_UP(physbase + size);
struct node_active_region node_ar;
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index d0e2491..da1605a 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -226,11 +226,12 @@ static void __init bootmem_init_one_node(unsigned int nid)
* reservations in other nodes.
*/
if (nid == 0) {
+ struct memblock_region *reg;
+
/* Reserve the sections we're already using. */
- for (i = 0; i < memblock.reserved.cnt; i++)
- reserve_bootmem(memblock.reserved.region[i].base,
- memblock_size_bytes(&memblock.reserved, i),
- BOOTMEM_DEFAULT);
+ for_each_memblock(reserved, reg) {
+ reserve_bootmem(reg->base, reg->size, BOOTMEM_DEFAULT);
+ }
}

sparse_memory_present_with_active_regions(nid);
@@ -238,13 +239,14 @@ static void __init bootmem_init_one_node(unsigned int nid)

static void __init do_init_bootmem(void)
{
+ struct memblock_region *reg;
int i;

/* Add active regions with valid PFNs. */
- for (i = 0; i < memblock.memory.cnt; i++) {
+ for_each_memblock(memory, reg) {
unsigned long start_pfn, end_pfn;
- start_pfn = memblock.memory.region[i].base >> PAGE_SHIFT;
- end_pfn = start_pfn + memblock_size_pages(&memblock.memory, i);
+ start_pfn = memblock_region_base_pfn(reg);
+ end_pfn = memblock_region_end_pfn(reg);
__add_active_range(0, start_pfn, end_pfn);
}

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 16d8bee..dd68025 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -972,13 +972,13 @@ int of_node_to_nid(struct device_node *dp)

static void __init add_node_ranges(void)
{
- int i;
+ struct memblock_region *reg;

- for (i = 0; i < memblock.memory.cnt; i++) {
- unsigned long size = memblock_size_bytes(&memblock.memory, i);
+ for_each_memblock(memory, reg) {
+ unsigned long size = reg->size;
unsigned long start, end;

- start = memblock.memory.regions[i].base;
+ start = reg->base;
end = start + size;
while (start < end) {
unsigned long this_end;
@@ -1281,7 +1281,7 @@ static void __init bootmem_init_nonnuma(void)
{
unsigned long top_of_ram = memblock_end_of_DRAM();
unsigned long total_ram = memblock_phys_mem_size();
- unsigned int i;
+ struct memblock_region *reg;

numadbg("bootmem_init_nonnuma()\n");

@@ -1292,15 +1292,14 @@ static void __init bootmem_init_nonnuma(void)

init_node_masks_nonnuma();

- for (i = 0; i < memblock.memory.cnt; i++) {
- unsigned long size = memblock_size_bytes(&memblock.memory, i);
+ for_each_memblock(memory, reg) {
unsigned long start_pfn, end_pfn;

- if (!size)
+ if (!reg->size)
continue;

- start_pfn = memblock.memory.regions[i].base >> PAGE_SHIFT;
- end_pfn = start_pfn + memblock_size_pages(&memblock.memory, i);
+ start_pfn = memblock_region_base_pfn(reg);
+ end_pfn = memblock_region_end_pfn(reg);
add_active_range(0, start_pfn, end_pfn);
}

@@ -1334,17 +1333,12 @@ static void __init reserve_range_in_node(int nid, unsigned long start,

static void __init trim_reserved_in_node(int nid)
{
- int i;
+ struct memblock_region *reg;

numadbg(" trim_reserved_in_node(%d)\n", nid);

- for (i = 0; i < memblock.reserved.cnt; i++) {
- unsigned long start = memblock.reserved.regions[i].base;
- unsigned long size = memblock_size_bytes(&memblock.reserved, i);
- unsigned long end = start + size;
-
- reserve_range_in_node(nid, start, end);
- }
+ for_each_memblock(reserved, reg)
+ reserve_range_in_node(nid, reg->base, reg->base + reg->size);
}

static void __init bootmem_init_one_node(int nid)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 4b69313..d948857 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -58,32 +58,60 @@ extern u64 memblock_end_of_DRAM(void);
extern void __init memblock_enforce_memory_limit(u64 memory_limit);
extern int __init memblock_is_reserved(u64 addr);
extern int memblock_is_region_reserved(u64 base, u64 size);
-extern int memblock_find(struct memblock_region *res);

extern void memblock_dump_all(void);

-static inline u64
-memblock_size_bytes(struct memblock_type *type, unsigned long region_nr)
+/*
+ * pfn conversion functions
+ *
+ * While the memory MEMBLOCKs should always be page aligned, the reserved
+ * MEMBLOCKs may not be. These accessors attempt to provide a very clear
+ * idea of what they return for such non-aligned MEMBLOCKs.
+ */
+
+/**
+ * memblock_region_base_pfn - Return the lowest pfn intersecting with the region
+ * @reg: memblock_region structure
+ */
+static inline unsigned long memblock_region_base_pfn(const struct memblock_region *reg)
{
- return type->regions[region_nr].size;
+ return reg->base >> PAGE_SHIFT;
}
-static inline u64
-memblock_size_pages(struct memblock_type *type, unsigned long region_nr)
+
+/**
+ * memblock_region_last_pfn - Return the highest pfn intersecting with the region
+ * @reg: memblock_region structure
+ */
+static inline unsigned long memblock_region_last_pfn(const struct memblock_region *reg)
{
- return memblock_size_bytes(type, region_nr) >> PAGE_SHIFT;
+ return (reg->base + reg->size - 1) >> PAGE_SHIFT;
}
-static inline u64
-memblock_start_pfn(struct memblock_type *type, unsigned long region_nr)
+
+/**
+ * memblock_region_end_pfn - Return the pfn of the first page following the region
+ * but not intersecting it
+ * @reg: memblock_region structure
+ */
+static inline unsigned long memblock_region_end_pfn(const struct memblock_region *reg)
{
- return type->regions[region_nr].base >> PAGE_SHIFT;
+ return memblock_region_last_pfn(reg) + 1;
}
-static inline u64
-memblock_end_pfn(struct memblock_type *type, unsigned long region_nr)
+
+/**
+ * memblock_region_pages - Return the number of pages covering a region
+ * @reg: memblock_region structure
+ */
+static inline unsigned long memblock_region_pages(const struct memblock_region *reg)
{
- return memblock_start_pfn(type, region_nr) +
- memblock_size_pages(type, region_nr);
+ return memblock_region_end_pfn(reg) - memblock_region_base_pfn(reg);
}

+#define for_each_memblock(memblock_type, region) \
+ for (region = memblock.memblock_type.regions; \
+ region < (memblock.memblock_type.regions + memblock.memblock_type.cnt); \
+ region++)
+
+
#endif /* __KERNEL__ */

#endif /* _LINUX_MEMBLOCK_H */
diff --git a/mm/memblock.c b/mm/memblock.c
index 13d4a57..5df1400 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -505,35 +505,3 @@ int memblock_is_region_reserved(u64 base, u64 size)
return memblock_overlaps_region(&memblock.reserved, base, size);
}

-/*
- * Given a <base, len>, find which memory regions belong to this range.
- * Adjust the request and return a contiguous chunk.
- */
-int memblock_find(struct memblock_region *res)
-{
- int i;
- u64 rstart, rend;
-
- rstart = res->base;
- rend = rstart + res->size - 1;
-
- for (i = 0; i < memblock.memory.cnt; i++) {
- u64 start = memblock.memory.regions[i].base;
- u64 end = start + memblock.memory.regions[i].size - 1;
-
- if (start > rend)
- return -1;
-
- if ((end >= rstart) && (start < rend)) {
- /* adjust the request */
- if (rstart < start)
- rstart = start;
- if (rend > end)
- rend = end;
- res->base = rstart;
- res->size = rend - rstart + 1;
- return 0;
- }
- }
- return -1;
-}
--
1.6.4.2

2010-07-22 18:24:54

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 10/31] memblock: Remove unused memblock.debug struct member

From: Benjamin Herrenschmidt <[email protected]>

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
include/linux/memblock.h | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 3b41f61..5abb06b 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -32,7 +32,6 @@ struct memblock_type {
};

struct memblock {
- unsigned long debug;
phys_addr_t current_limit;
struct memblock_type memory;
struct memblock_type reserved;
@@ -55,9 +54,11 @@ extern phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align);
#define MEMBLOCK_ALLOC_ACCESSIBLE 0

extern phys_addr_t __init memblock_alloc_base(phys_addr_t size,
- phys_addr_t, phys_addr_t max_addr);
+ phys_addr_t align,
+ phys_addr_t max_addr);
extern phys_addr_t __init __memblock_alloc_base(phys_addr_t size,
- phys_addr_t align, phys_addr_t max_addr);
+ phys_addr_t align,
+ phys_addr_t max_addr);
extern phys_addr_t __init memblock_phys_mem_size(void);
extern phys_addr_t memblock_end_of_DRAM(void);
extern void __init memblock_enforce_memory_limit(phys_addr_t memory_limit);
--
1.6.4.2

2010-07-22 18:25:53

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 24/31] memblock: Make memblock_alloc_try_nid() fallback to MEMBLOCK_ALLOC_ANYWHERE

From: Benjamin Herrenschmidt <[email protected]>

memblock_alloc_nid() used to fall back to allocating anywhere by calling
memblock_alloc().

However, some of my previous patches limit memblock_alloc() to the region
covered by MEMBLOCK_ALLOC_ACCESSIBLE, which is not quite what we want
for memblock_alloc_try_nid().

So we fix it by explicitly using MEMBLOCK_ALLOC_ANYWHERE.

Note that so far only sparc uses memblock_alloc_nid() and it hasn't been updated
to clamp the accessible zone yet. Thus the temporary "breakage" should have
no effect.
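
For reference, the difference between the two limits (sketch, per the
definitions in memblock.h):

	/* clamped to memblock.current_limit: */
	memblock_alloc(size, align);	/* == memblock_alloc_base(size, align,
					 *    MEMBLOCK_ALLOC_ACCESSIBLE) */

	/* no clamping at all, any RAM is fair game: */
	memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ANYWHERE);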

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
mm/memblock.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index c3c499e..424ca11 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -546,7 +546,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i

if (res)
return res;
- return memblock_alloc(size, align);
+ return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ANYWHERE);
}


--
1.6.4.2

2010-07-22 18:26:10

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 20/31] memblock: Add arch function to control coalescing of memblock memory regions

From: Benjamin Herrenschmidt <[email protected]>

Some archs such as ARM want to avoid coalescing across things such
as the lowmem/highmem boundary or similar. This provides the option
to control it via an arch callback for which a weak default is provided
which always allows coalescing.
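
An arch that wants to keep regions on either side of such a boundary
separate would then provide its own strong version, along these lines (a
hypothetical example; the boundary variable is made up):

	/* e.g. in arch code: never merge across the lowmem/highmem boundary */
	int memblock_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
					 phys_addr_t addr2, phys_addr_t size2)
	{
		phys_addr_t boundary = arch_lowmem_limit;	/* hypothetical */

		if (addr1 + size1 == boundary || addr2 + size2 == boundary)
			return 0;	/* one region ends at the boundary */
		return 1;
	}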

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
include/linux/memblock.h | 2 ++
mm/memblock.c | 19 ++++++++++++++++++-
2 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index b839053..15da7d9 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -70,6 +70,8 @@ extern void memblock_dump_all(void);

/* Provided by the architecture */
extern phys_addr_t memblock_nid_range(phys_addr_t start, phys_addr_t end, int *nid);
+extern int memblock_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
+ phys_addr_t addr2, phys_addr_t size2);

/**
* memblock_set_current_limit - Set the current allocation limit to allow
diff --git a/mm/memblock.c b/mm/memblock.c
index 8197f37..f793d9f 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -241,6 +241,12 @@ static int memblock_double_array(struct memblock_type *type)
return 0;
}

+extern int __weak memblock_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
+ phys_addr_t addr2, phys_addr_t size2)
+{
+ return 1;
+}
+
static long memblock_add_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
{
unsigned long coalesced = 0;
@@ -262,6 +268,10 @@ static long memblock_add_region(struct memblock_type *type, phys_addr_t base, ph
return 0;

adjacent = memblock_addrs_adjacent(base, size, rgnbase, rgnsize);
+ /* Check if arch allows coalescing */
+ if (adjacent != 0 && type == &memblock.memory &&
+ !memblock_memory_can_coalesce(base, size, rgnbase, rgnsize))
+ break;
if (adjacent > 0) {
type->regions[i].base -= size;
type->regions[i].size += size;
@@ -274,7 +284,14 @@ static long memblock_add_region(struct memblock_type *type, phys_addr_t base, ph
}
}

- if ((i < type->cnt - 1) && memblock_regions_adjacent(type, i, i+1)) {
+ /* If we plugged a hole, we may want to also coalesce with the
+ * next region
+ */
+ if ((i < type->cnt - 1) && memblock_regions_adjacent(type, i, i+1) &&
+ ((type != &memblock.memory || memblock_memory_can_coalesce(type->regions[i].base,
+ type->regions[i].size,
+ type->regions[i+1].base,
+ type->regions[i+1].size)))) {
memblock_coalesce_regions(type, i, i+1);
coalesced++;
}
--
1.6.4.2

2010-07-22 18:26:31

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 27/31] memblock: Print new doubled array location info

So we will have a better idea of where it is; use memblock_debug to control it.

-v2: use memblock_dbg instead of " if (memblock_debug) "
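
memblock_dbg() here is just a conditional printk, something like:

	#define memblock_dbg(fmt, ...) \
		if (memblock_debug) printk(KERN_INFO fmt, ##__VA_ARGS__)

so the new message only shows up when booting with "memblock=debug".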

Signed-off-by: Yinghai Lu <[email protected]>
---
mm/memblock.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index c16d4c6..796ef8c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -192,8 +192,6 @@ static int memblock_double_array(struct memblock_type *type)
if (!memblock_can_resize)
return -1;

- pr_debug("memblock: %s array full, doubling...", memblock_type_name(type));
-
/* Calculate new doubled size */
old_size = type->max * sizeof(struct memblock_region);
new_size = old_size << 1;
@@ -221,6 +219,9 @@ static int memblock_double_array(struct memblock_type *type)
}
new_array = __va(addr);

+ memblock_dbg("memblock: %s array is doubled to %ld at [%#010llx-%#010llx]",
+ memblock_type_name(type), type->max * 2, (u64)addr, (u64)addr + new_size - 1);
+
/* Found space, we now need to move the array over before
* we add the reserved region since it may be our reserved
* array itself that is full.
@@ -646,7 +647,7 @@ static void memblock_dump(struct memblock_type *region, char *name)
base = region->regions[i].base;
size = region->regions[i].size;

- pr_info(" %s[0x%x]\t0x%016llx - 0x%016llx, 0x%llx bytes\n",
+ pr_info(" %s[%#x]\t[%#016llx-%#016llx], %#llx bytes\n",
name, i, base, base + size - 1, size);
}
}
--
1.6.4.2

2010-07-22 18:26:35

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 17/31] memblock: split memblock_find_base() out of __memblock_alloc_base()

From: Benjamin Herrenschmidt <[email protected]>

This will be used by the array resize code and might prove useful
to some arch code as well, at which point it can be made non-static.

Also add a comment as to why aligning the size is important.

[ Yinghai Lu ]: it should return MEMBLOCK_ERROR when it fails to find a range
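
The alignment comment is about fragmentation: if the requested size isn't
rounded up to the alignment, each allocation can strand a sub-align-sized
free sliver below it, and every sliver costs an entry in the reserved
array. A worked case, using memblock_align_up() as defined in this file:

	/* align = 16, raw size = 12 */
	size = memblock_align_up(12, 16);	/* (12 + 15) & ~15 == 16 */
	/* reserving 16 instead of 12 leaves no 4-byte sliver behind */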

Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
mm/memblock.c | 58 +++++++++++++++++++++++++++++++++++++-------------------
1 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 1283893..17403e6 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -345,12 +345,15 @@ phys_addr_t __init memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int n

BUG_ON(0 == size);

+ /* We align the size to limit fragmentation. Without this, a lot of
+ * small allocs quickly eat up the whole reserve array on sparc
+ */
+ size = memblock_align_up(size, align);
+
/* We do a bottom-up search for a region with the right
* nid since that's easier considering how memblock_nid_range()
* works
*/
- size = memblock_align_up(size, align);
-
for (i = 0; i < mem->cnt; i++) {
phys_addr_t ret = memblock_alloc_nid_region(&mem->regions[i],
size, align, nid);
@@ -366,20 +369,7 @@ phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align)
return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
}

-phys_addr_t __init memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
-{
- phys_addr_t alloc;
-
- alloc = __memblock_alloc_base(size, align, max_addr);
-
- if (alloc == 0)
- panic("ERROR: Failed to allocate 0x%llx bytes below 0x%llx.\n",
- (unsigned long long) size, (unsigned long long) max_addr);
-
- return alloc;
-}
-
-phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
+static phys_addr_t __init memblock_find_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
{
long i;
phys_addr_t base = 0;
@@ -387,8 +377,6 @@ phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t align, ph

BUG_ON(0 == size);

- size = memblock_align_up(size, align);
-
/* Pump up max_addr */
if (max_addr == MEMBLOCK_ALLOC_ACCESSIBLE)
max_addr = memblock.current_limit;
@@ -405,13 +393,43 @@ phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t align, ph
continue;
base = min(memblockbase + memblocksize, max_addr);
res_base = memblock_find_region(memblockbase, base, size, align);
- if (res_base != MEMBLOCK_ERROR &&
- memblock_add_region(&memblock.reserved, res_base, size) >= 0)
+ if (res_base != MEMBLOCK_ERROR)
return res_base;
}
+ return MEMBLOCK_ERROR;
+}
+
+phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
+{
+ phys_addr_t found;
+
+ /* We align the size to limit fragmentation. Without this, a lot of
+ * small allocs quickly eat up the whole reserve array on sparc
+ */
+ size = memblock_align_up(size, align);
+
+ found = memblock_find_base(size, align, max_addr);
+ if (found != MEMBLOCK_ERROR &&
+ memblock_add_region(&memblock.reserved, found, size) >= 0)
+ return found;
+
return 0;
}

+phys_addr_t __init memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
+{
+ phys_addr_t alloc;
+
+ alloc = __memblock_alloc_base(size, align, max_addr);
+
+ if (alloc == 0)
+ panic("ERROR: Failed to allocate 0x%llx bytes below 0x%llx.\n",
+ (unsigned long long) size, (unsigned long long) max_addr);
+
+ return alloc;
+}
+
+
/* You must call memblock_analyze() before this. */
phys_addr_t __init memblock_phys_mem_size(void)
{
--
1.6.4.2

2010-07-22 18:26:27

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 09/31] memblock: Change u64 to phys_addr_t

From: Benjamin Herrenschmidt <[email protected]>

Let's not waste space and cycles on archs that don't support >32-bit
physical address space.
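
For reference, phys_addr_t is sized by config (roughly, as defined in
include/linux/types.h):

    #ifdef CONFIG_PHYS_ADDR_T_64BIT
    typedef u64 phys_addr_t;
    #else
    typedef u32 phys_addr_t;
    #endif

so 32-bit-only architectures get native 32-bit arithmetic and half the
region array footprint.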

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
include/linux/memblock.h | 44 +++++++++---------
mm/memblock.c | 114 +++++++++++++++++++++++----------------------
2 files changed, 80 insertions(+), 78 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 767c198..3b41f61 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -21,19 +21,19 @@
#define MAX_MEMBLOCK_REGIONS 128

struct memblock_region {
- u64 base;
- u64 size;
+ phys_addr_t base;
+ phys_addr_t size;
};

struct memblock_type {
unsigned long cnt;
- u64 size;
+ phys_addr_t size;
struct memblock_region regions[MAX_MEMBLOCK_REGIONS+1];
};

struct memblock {
unsigned long debug;
- u64 current_limit;
+ phys_addr_t current_limit;
struct memblock_type memory;
struct memblock_type reserved;
};
@@ -42,32 +42,32 @@ extern struct memblock memblock;

extern void __init memblock_init(void);
extern void __init memblock_analyze(void);
-extern long memblock_add(u64 base, u64 size);
-extern long memblock_remove(u64 base, u64 size);
-extern long __init memblock_free(u64 base, u64 size);
-extern long __init memblock_reserve(u64 base, u64 size);
+extern long memblock_add(phys_addr_t base, phys_addr_t size);
+extern long memblock_remove(phys_addr_t base, phys_addr_t size);
+extern long __init memblock_free(phys_addr_t base, phys_addr_t size);
+extern long __init memblock_reserve(phys_addr_t base, phys_addr_t size);

-extern u64 __init memblock_alloc_nid(u64 size, u64 align, int nid);
-extern u64 __init memblock_alloc(u64 size, u64 align);
+extern phys_addr_t __init memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int nid);
+extern phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align);

/* Flags for memblock_alloc_base() amd __memblock_alloc_base() */
-#define MEMBLOCK_ALLOC_ANYWHERE (~(u64)0)
+#define MEMBLOCK_ALLOC_ANYWHERE (~(phys_addr_t)0)
#define MEMBLOCK_ALLOC_ACCESSIBLE 0

-extern u64 __init memblock_alloc_base(u64 size,
- u64, u64 max_addr);
-extern u64 __init __memblock_alloc_base(u64 size,
- u64 align, u64 max_addr);
-extern u64 __init memblock_phys_mem_size(void);
-extern u64 memblock_end_of_DRAM(void);
-extern void __init memblock_enforce_memory_limit(u64 memory_limit);
-extern int __init memblock_is_reserved(u64 addr);
-extern int memblock_is_region_reserved(u64 base, u64 size);
+extern phys_addr_t __init memblock_alloc_base(phys_addr_t size,
+ phys_addr_t, phys_addr_t max_addr);
+extern phys_addr_t __init __memblock_alloc_base(phys_addr_t size,
+ phys_addr_t align, phys_addr_t max_addr);
+extern phys_addr_t __init memblock_phys_mem_size(void);
+extern phys_addr_t memblock_end_of_DRAM(void);
+extern void __init memblock_enforce_memory_limit(phys_addr_t memory_limit);
+extern int __init memblock_is_reserved(phys_addr_t addr);
+extern int memblock_is_region_reserved(phys_addr_t base, phys_addr_t size);

extern void memblock_dump_all(void);

/* Provided by the architecture */
-extern u64 memblock_nid_range(u64 start, u64 end, int *nid);
+extern phys_addr_t memblock_nid_range(phys_addr_t start, phys_addr_t end, int *nid);

/**
* memblock_set_current_limit - Set the current allocation limit to allow
@@ -75,7 +75,7 @@ extern u64 memblock_nid_range(u64 start, u64 end, int *nid);
* accessible during boot
* @limit: New limit value (physical address)
*/
-extern void memblock_set_current_limit(u64 limit);
+extern void memblock_set_current_limit(phys_addr_t limit);


/*
diff --git a/mm/memblock.c b/mm/memblock.c
index 43fa162..0c0f787 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -55,13 +55,14 @@ void memblock_dump_all(void)
memblock_dump(&memblock.reserved, "reserved");
}

-static unsigned long memblock_addrs_overlap(u64 base1, u64 size1, u64 base2,
- u64 size2)
+static unsigned long memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
+ phys_addr_t base2, phys_addr_t size2)
{
return ((base1 < (base2 + size2)) && (base2 < (base1 + size1)));
}

-static long memblock_addrs_adjacent(u64 base1, u64 size1, u64 base2, u64 size2)
+static long memblock_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
+ phys_addr_t base2, phys_addr_t size2)
{
if (base2 == base1 + size1)
return 1;
@@ -72,12 +73,12 @@ static long memblock_addrs_adjacent(u64 base1, u64 size1, u64 base2, u64 size2)
}

static long memblock_regions_adjacent(struct memblock_type *type,
- unsigned long r1, unsigned long r2)
+ unsigned long r1, unsigned long r2)
{
- u64 base1 = type->regions[r1].base;
- u64 size1 = type->regions[r1].size;
- u64 base2 = type->regions[r2].base;
- u64 size2 = type->regions[r2].size;
+ phys_addr_t base1 = type->regions[r1].base;
+ phys_addr_t size1 = type->regions[r1].size;
+ phys_addr_t base2 = type->regions[r2].base;
+ phys_addr_t size2 = type->regions[r2].size;

return memblock_addrs_adjacent(base1, size1, base2, size2);
}
@@ -128,7 +129,7 @@ void __init memblock_analyze(void)
memblock.memory.size += memblock.memory.regions[i].size;
}

-static long memblock_add_region(struct memblock_type *type, u64 base, u64 size)
+static long memblock_add_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
{
unsigned long coalesced = 0;
long adjacent, i;
@@ -141,8 +142,8 @@ static long memblock_add_region(struct memblock_type *type, u64 base, u64 size)

/* First try and coalesce this MEMBLOCK with another. */
for (i = 0; i < type->cnt; i++) {
- u64 rgnbase = type->regions[i].base;
- u64 rgnsize = type->regions[i].size;
+ phys_addr_t rgnbase = type->regions[i].base;
+ phys_addr_t rgnsize = type->regions[i].size;

if ((rgnbase == base) && (rgnsize == size))
/* Already have this region, so we're done */
@@ -192,16 +193,16 @@ static long memblock_add_region(struct memblock_type *type, u64 base, u64 size)
return 0;
}

-long memblock_add(u64 base, u64 size)
+long memblock_add(phys_addr_t base, phys_addr_t size)
{
return memblock_add_region(&memblock.memory, base, size);

}

-static long __memblock_remove(struct memblock_type *type, u64 base, u64 size)
+static long __memblock_remove(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
{
- u64 rgnbegin, rgnend;
- u64 end = base + size;
+ phys_addr_t rgnbegin, rgnend;
+ phys_addr_t end = base + size;
int i;

rgnbegin = rgnend = 0; /* supress gcc warnings */
@@ -246,17 +247,17 @@ static long __memblock_remove(struct memblock_type *type, u64 base, u64 size)
return memblock_add_region(type, end, rgnend - end);
}

-long memblock_remove(u64 base, u64 size)
+long memblock_remove(phys_addr_t base, phys_addr_t size)
{
return __memblock_remove(&memblock.memory, base, size);
}

-long __init memblock_free(u64 base, u64 size)
+long __init memblock_free(phys_addr_t base, phys_addr_t size)
{
return __memblock_remove(&memblock.reserved, base, size);
}

-long __init memblock_reserve(u64 base, u64 size)
+long __init memblock_reserve(phys_addr_t base, phys_addr_t size)
{
struct memblock_type *_rgn = &memblock.reserved;

@@ -265,13 +266,13 @@ long __init memblock_reserve(u64 base, u64 size)
return memblock_add_region(_rgn, base, size);
}

-long memblock_overlaps_region(struct memblock_type *type, u64 base, u64 size)
+long memblock_overlaps_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
{
unsigned long i;

for (i = 0; i < type->cnt; i++) {
- u64 rgnbase = type->regions[i].base;
- u64 rgnsize = type->regions[i].size;
+ phys_addr_t rgnbase = type->regions[i].base;
+ phys_addr_t rgnsize = type->regions[i].size;
if (memblock_addrs_overlap(base, size, rgnbase, rgnsize))
break;
}
@@ -279,20 +280,20 @@ long memblock_overlaps_region(struct memblock_type *type, u64 base, u64 size)
return (i < type->cnt) ? i : -1;
}

-static u64 memblock_align_down(u64 addr, u64 size)
+static phys_addr_t memblock_align_down(phys_addr_t addr, phys_addr_t size)
{
return addr & ~(size - 1);
}

-static u64 memblock_align_up(u64 addr, u64 size)
+static phys_addr_t memblock_align_up(phys_addr_t addr, phys_addr_t size)
{
return (addr + (size - 1)) & ~(size - 1);
}

-static u64 __init memblock_alloc_region(u64 start, u64 end,
- u64 size, u64 align)
+static phys_addr_t __init memblock_alloc_region(phys_addr_t start, phys_addr_t end,
+ phys_addr_t size, phys_addr_t align)
{
- u64 base, res_base;
+ phys_addr_t base, res_base;
long j;

base = memblock_align_down((end - size), align);
@@ -301,7 +302,7 @@ static u64 __init memblock_alloc_region(u64 start, u64 end,
if (j < 0) {
/* this area isn't reserved, take it */
if (memblock_add_region(&memblock.reserved, base, size) < 0)
- base = ~(u64)0;
+ base = ~(phys_addr_t)0;
return base;
}
res_base = memblock.reserved.regions[j].base;
@@ -310,42 +311,43 @@ static u64 __init memblock_alloc_region(u64 start, u64 end,
base = memblock_align_down(res_base - size, align);
}

- return ~(u64)0;
+ return ~(phys_addr_t)0;
}

-u64 __weak __init memblock_nid_range(u64 start, u64 end, int *nid)
+phys_addr_t __weak __init memblock_nid_range(phys_addr_t start, phys_addr_t end, int *nid)
{
*nid = 0;

return end;
}

-static u64 __init memblock_alloc_nid_region(struct memblock_region *mp,
- u64 size, u64 align, int nid)
+static phys_addr_t __init memblock_alloc_nid_region(struct memblock_region *mp,
+ phys_addr_t size,
+ phys_addr_t align, int nid)
{
- u64 start, end;
+ phys_addr_t start, end;

start = mp->base;
end = start + mp->size;

start = memblock_align_up(start, align);
while (start < end) {
- u64 this_end;
+ phys_addr_t this_end;
int this_nid;

this_end = memblock_nid_range(start, end, &this_nid);
if (this_nid == nid) {
- u64 ret = memblock_alloc_region(start, this_end, size, align);
- if (ret != ~(u64)0)
+ phys_addr_t ret = memblock_alloc_region(start, this_end, size, align);
+ if (ret != ~(phys_addr_t)0)
return ret;
}
start = this_end;
}

- return ~(u64)0;
+ return ~(phys_addr_t)0;
}

-u64 __init memblock_alloc_nid(u64 size, u64 align, int nid)
+phys_addr_t __init memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int nid)
{
struct memblock_type *mem = &memblock.memory;
int i;
@@ -359,23 +361,23 @@ u64 __init memblock_alloc_nid(u64 size, u64 align, int nid)
size = memblock_align_up(size, align);

for (i = 0; i < mem->cnt; i++) {
- u64 ret = memblock_alloc_nid_region(&mem->regions[i],
+ phys_addr_t ret = memblock_alloc_nid_region(&mem->regions[i],
size, align, nid);
- if (ret != ~(u64)0)
+ if (ret != ~(phys_addr_t)0)
return ret;
}

return memblock_alloc(size, align);
}

-u64 __init memblock_alloc(u64 size, u64 align)
+phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align)
{
return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
}

-u64 __init memblock_alloc_base(u64 size, u64 align, u64 max_addr)
+phys_addr_t __init memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
{
- u64 alloc;
+ phys_addr_t alloc;

alloc = __memblock_alloc_base(size, align, max_addr);

@@ -386,11 +388,11 @@ u64 __init memblock_alloc_base(u64 size, u64 align, u64 max_addr)
return alloc;
}

-u64 __init __memblock_alloc_base(u64 size, u64 align, u64 max_addr)
+phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
{
long i;
- u64 base = 0;
- u64 res_base;
+ phys_addr_t base = 0;
+ phys_addr_t res_base;

BUG_ON(0 == size);

@@ -405,26 +407,26 @@ u64 __init __memblock_alloc_base(u64 size, u64 align, u64 max_addr)
* top of memory
*/
for (i = memblock.memory.cnt - 1; i >= 0; i--) {
- u64 memblockbase = memblock.memory.regions[i].base;
- u64 memblocksize = memblock.memory.regions[i].size;
+ phys_addr_t memblockbase = memblock.memory.regions[i].base;
+ phys_addr_t memblocksize = memblock.memory.regions[i].size;

if (memblocksize < size)
continue;
base = min(memblockbase + memblocksize, max_addr);
res_base = memblock_alloc_region(memblockbase, base, size, align);
- if (res_base != ~(u64)0)
+ if (res_base != ~(phys_addr_t)0)
return res_base;
}
return 0;
}

/* You must call memblock_analyze() before this. */
-u64 __init memblock_phys_mem_size(void)
+phys_addr_t __init memblock_phys_mem_size(void)
{
return memblock.memory.size;
}

-u64 memblock_end_of_DRAM(void)
+phys_addr_t memblock_end_of_DRAM(void)
{
int idx = memblock.memory.cnt - 1;

@@ -432,10 +434,10 @@ u64 memblock_end_of_DRAM(void)
}

/* You must call memblock_analyze() after this. */
-void __init memblock_enforce_memory_limit(u64 memory_limit)
+void __init memblock_enforce_memory_limit(phys_addr_t memory_limit)
{
unsigned long i;
- u64 limit;
+ phys_addr_t limit;
struct memblock_region *p;

if (!memory_limit)
@@ -472,12 +474,12 @@ void __init memblock_enforce_memory_limit(u64 memory_limit)
}
}

-int __init memblock_is_reserved(u64 addr)
+int __init memblock_is_reserved(phys_addr_t addr)
{
int i;

for (i = 0; i < memblock.reserved.cnt; i++) {
- u64 upper = memblock.reserved.regions[i].base +
+ phys_addr_t upper = memblock.reserved.regions[i].base +
memblock.reserved.regions[i].size - 1;
if ((addr >= memblock.reserved.regions[i].base) && (addr <= upper))
return 1;
@@ -485,13 +487,13 @@ int __init memblock_is_reserved(u64 addr)
return 0;
}

-int memblock_is_region_reserved(u64 base, u64 size)
+int memblock_is_region_reserved(phys_addr_t base, phys_addr_t size)
{
return memblock_overlaps_region(&memblock.reserved, base, size);
}


-void __init memblock_set_current_limit(u64 limit)
+void __init memblock_set_current_limit(phys_addr_t limit)
{
memblock.current_limit = limit;
}
--
1.6.4.2

2010-07-22 18:27:22

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 16/31] memblock: Move memblock_init() to the bottom of the file

From: Benjamin Herrenschmidt <[email protected]>

It's a real PITA to have to search for it in the middle

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
mm/memblock.c | 54 +++++++++++++++++++++++++++---------------------------
1 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 5743417..1283893 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -107,33 +107,6 @@ static void memblock_coalesce_regions(struct memblock_type *type,
memblock_remove_region(type, r2);
}

-void __init memblock_init(void)
-{
- /* Hookup the initial arrays */
- memblock.memory.regions = memblock_memory_init_regions;
- memblock.memory.max = INIT_MEMBLOCK_REGIONS;
- memblock.reserved.regions = memblock_reserved_init_regions;
- memblock.reserved.max = INIT_MEMBLOCK_REGIONS;
-
- /* Write a marker in the unused last array entry */
- memblock.memory.regions[INIT_MEMBLOCK_REGIONS].base = (phys_addr_t)RED_INACTIVE;
- memblock.reserved.regions[INIT_MEMBLOCK_REGIONS].base = (phys_addr_t)RED_INACTIVE;
-
- /* Create a dummy zero size MEMBLOCK which will get coalesced away later.
- * This simplifies the memblock_add() code below...
- */
- memblock.memory.regions[0].base = 0;
- memblock.memory.regions[0].size = 0;
- memblock.memory.cnt = 1;
-
- /* Ditto. */
- memblock.reserved.regions[0].base = 0;
- memblock.reserved.regions[0].size = 0;
- memblock.reserved.cnt = 1;
-
- memblock.current_limit = MEMBLOCK_ALLOC_ANYWHERE;
-}
-
void __init memblock_analyze(void)
{
int i;
@@ -517,3 +490,30 @@ void __init memblock_set_current_limit(phys_addr_t limit)
memblock.current_limit = limit;
}

+void __init memblock_init(void)
+{
+ /* Hookup the initial arrays */
+ memblock.memory.regions = memblock_memory_init_regions;
+ memblock.memory.max = INIT_MEMBLOCK_REGIONS;
+ memblock.reserved.regions = memblock_reserved_init_regions;
+ memblock.reserved.max = INIT_MEMBLOCK_REGIONS;
+
+ /* Write a marker in the unused last array entry */
+ memblock.memory.regions[INIT_MEMBLOCK_REGIONS].base = (phys_addr_t)RED_INACTIVE;
+ memblock.reserved.regions[INIT_MEMBLOCK_REGIONS].base = (phys_addr_t)RED_INACTIVE;
+
+ /* Create a dummy zero size MEMBLOCK which will get coalesced away later.
+ * This simplifies the memblock_add() code below...
+ */
+ memblock.memory.regions[0].base = 0;
+ memblock.memory.regions[0].size = 0;
+ memblock.memory.cnt = 1;
+
+ /* Ditto. */
+ memblock.reserved.regions[0].base = 0;
+ memblock.reserved.regions[0].size = 0;
+ memblock.reserved.cnt = 1;
+
+ memblock.current_limit = MEMBLOCK_ALLOC_ANYWHERE;
+}
+
--
1.6.4.2

2010-07-22 18:27:39

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 22/31] memblock: NUMA allocate can now use early_pfn_map

From: Benjamin Herrenschmidt <[email protected]>

We now provide a default (weak) implementation of memblock_nid_range()
which uses the early_pfn_map[] if CONFIG_ARCH_POPULATES_NODE_MAP
is set. Sparc still needs to use its own method due to the way
the pages can be scattered between nodes.

This implementation is inefficient due to our main algorithm and
callback construct wanting to work on an ascending address basis,
while early_pfn_map[] would rather work with nids (it's unsorted
at that stage). But it should work, and we can look into improving
it subsequently, possibly using arch compile options to choose a
different algorithm altogether.
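
The memblock_nid_range() contract itself is unchanged: callers walk a
range in ascending order and get back the end of the chunk that lies on
a single node, as memblock_alloc_nid_region() already does:

    while (start < end) {
        phys_addr_t this_end;
        int this_nid;

        this_end = memblock_nid_range(start, end, &this_nid);
        if (this_nid == nid) {
            /* [start, this_end) lies entirely on node nid,
             * so try to place the allocation in there */
        }
        start = this_end;
    }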

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
include/linux/memblock.h | 3 +++
mm/memblock.c | 28 +++++++++++++++++++++++++++-
2 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 15da7d9..b69c243 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -47,6 +47,9 @@ extern long memblock_remove(phys_addr_t base, phys_addr_t size);
extern long __init memblock_free(phys_addr_t base, phys_addr_t size);
extern long __init memblock_reserve(phys_addr_t base, phys_addr_t size);

+/* The numa aware allocator is only available if
+ * CONFIG_ARCH_POPULATES_NODE_MAP is set
+ */
extern phys_addr_t __init memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int nid);
extern phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align);

diff --git a/mm/memblock.c b/mm/memblock.c
index b4870cf..a9c15a5 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -15,6 +15,7 @@
#include <linux/init.h>
#include <linux/bitops.h>
#include <linux/poison.h>
+#include <linux/pfn.h>
#include <linux/memblock.h>

struct memblock memblock;
@@ -451,11 +452,36 @@ phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align)
/*
* Additional node-local allocators. Search for node memory is bottom up
* and walks memblock regions within that node bottom-up as well, but allocation
- * within an memblock region is top-down.
+ * within an memblock region is top-down. XXX I plan to fix that at some stage
+ *
+ * WARNING: Only available after early_node_map[] has been populated,
+ * on some architectures, that is after all the calls to add_active_range()
+ * have been done to populate it.
*/

phys_addr_t __weak __init memblock_nid_range(phys_addr_t start, phys_addr_t end, int *nid)
{
+#ifdef CONFIG_ARCH_POPULATES_NODE_MAP
+ /*
+ * This code originates from sparc which really wants us to walk by addresses
+ * and returns the nid. This is not very convenient for early_pfn_map[] users
+ * as the map isn't sorted yet, and it really wants to be walked by nid.
+ *
+ * For now, I implement the inefficient method below which walks the early
+ * map multiple times. Eventually we may want to use an ARCH config option
+ * to implement a completely different method for both cases.
+ */
+ unsigned long start_pfn, end_pfn;
+ int i;
+
+ for (i = 0; i < MAX_NUMNODES; i++) {
+ get_pfn_range_for_nid(i, &start_pfn, &end_pfn);
+ if (start < PFN_PHYS(start_pfn) || start >= PFN_PHYS(end_pfn))
+ continue;
+ *nid = i;
+ return min(end, PFN_PHYS(end_pfn));
+ }
+#endif
*nid = 0;

return end;
--
1.6.4.2

2010-07-22 18:22:36

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 29/31] memblock: Prepare to include linux/memblock.h in core file

We need to add protection in linux/memblock.h to prepare for including it
in mm/page_alloc.c, mm/bootmem.c, etc.
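
With the whole body wrapped, a core file can then include the header
unconditionally:

    #include <linux/memblock.h>    /* effectively empty when !CONFIG_HAVE_MEMBLOCK */

and only memblock-capable configs see the declarations.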

Signed-off-by: Yinghai Lu <[email protected]>
---
include/linux/memblock.h | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 89749c4..4aaaf0d 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -2,6 +2,7 @@
#define _LINUX_MEMBLOCK_H
#ifdef __KERNEL__

+#ifdef CONFIG_HAVE_MEMBLOCK
/*
* Logical memory blocks.
*
@@ -147,6 +148,8 @@ static inline unsigned long memblock_region_pages(const struct memblock_region *
region++)


+#endif /* CONFIG_HAVE_MEMBLOCK */
+
#endif /* __KERNEL__ */

#endif /* _LINUX_MEMBLOCK_H */
--
1.6.4.2

2010-07-22 18:27:36

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 15/31] memblock: Define MEMBLOCK_ERROR internally instead of using ~(phys_addr_t)0

From: Benjamin Herrenschmidt <[email protected]>

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
mm/memblock.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index eae3dc0..5743417 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -22,6 +22,8 @@ static int memblock_debug;
static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS + 1];
static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS + 1];

+#define MEMBLOCK_ERROR (~(phys_addr_t)0)
+
static int __init early_memblock(char *p)
{
if (p && strstr(p, "debug"))
@@ -326,7 +328,7 @@ static phys_addr_t __init memblock_find_region(phys_addr_t start, phys_addr_t en
base = memblock_align_down(res_base - size, align);
}

- return ~(phys_addr_t)0;
+ return MEMBLOCK_ERROR;
}

phys_addr_t __weak __init memblock_nid_range(phys_addr_t start, phys_addr_t end, int *nid)
@@ -353,14 +355,14 @@ static phys_addr_t __init memblock_alloc_nid_region(struct memblock_region *mp,
this_end = memblock_nid_range(start, end, &this_nid);
if (this_nid == nid) {
phys_addr_t ret = memblock_find_region(start, this_end, size, align);
- if (ret != ~(phys_addr_t)0 &&
+ if (ret != MEMBLOCK_ERROR &&
memblock_add_region(&memblock.reserved, ret, size) >= 0)
return ret;
}
start = this_end;
}

- return ~(phys_addr_t)0;
+ return MEMBLOCK_ERROR;
}

phys_addr_t __init memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int nid)
@@ -379,7 +381,7 @@ phys_addr_t __init memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int n
for (i = 0; i < mem->cnt; i++) {
phys_addr_t ret = memblock_alloc_nid_region(&mem->regions[i],
size, align, nid);
- if (ret != ~(phys_addr_t)0)
+ if (ret != MEMBLOCK_ERROR)
return ret;
}

@@ -430,7 +432,7 @@ phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t align, ph
continue;
base = min(memblockbase + memblocksize, max_addr);
res_base = memblock_find_region(memblockbase, base, size, align);
- if (res_base != ~(phys_addr_t)0 &&
+ if (res_base != MEMBLOCK_ERROR &&
memblock_add_region(&memblock.reserved, res_base, size) >= 0)
return res_base;
}
--
1.6.4.2

2010-07-22 18:22:32

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 04/31] memblock: Remove nid_range argument, arch provides memblock_nid_range() instead

From: Benjamin Herrenschmidt <[email protected]>

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
arch/sparc/mm/init_64.c | 16 ++++++----------
include/linux/memblock.h | 7 +++++--
mm/memblock.c | 13 ++++++++-----
3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index dd68025..0883113 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -785,8 +785,7 @@ static int find_node(unsigned long addr)
return -1;
}

-static unsigned long long nid_range(unsigned long long start,
- unsigned long long end, int *nid)
+u64 memblock_nid_range(u64 start, u64 end, int *nid)
{
*nid = find_node(start);
start += PAGE_SIZE;
@@ -804,8 +803,7 @@ static unsigned long long nid_range(unsigned long long start,
return start;
}
#else
-static unsigned long long nid_range(unsigned long long start,
- unsigned long long end, int *nid)
+u64 memblock_nid_range(u64 start, u64 end, int *nid)
{
*nid = 0;
return end;
@@ -822,8 +820,7 @@ static void __init allocate_node_data(int nid)
struct pglist_data *p;

#ifdef CONFIG_NEED_MULTIPLE_NODES
- paddr = memblock_alloc_nid(sizeof(struct pglist_data),
- SMP_CACHE_BYTES, nid, nid_range);
+ paddr = memblock_alloc_nid(sizeof(struct pglist_data), SMP_CACHE_BYTES, nid);
if (!paddr) {
prom_printf("Cannot allocate pglist_data for nid[%d]\n", nid);
prom_halt();
@@ -843,8 +840,7 @@ static void __init allocate_node_data(int nid)
if (p->node_spanned_pages) {
num_pages = bootmem_bootmap_pages(p->node_spanned_pages);

- paddr = memblock_alloc_nid(num_pages << PAGE_SHIFT, PAGE_SIZE, nid,
- nid_range);
+ paddr = memblock_alloc_nid(num_pages << PAGE_SHIFT, PAGE_SIZE, nid);
if (!paddr) {
prom_printf("Cannot allocate bootmap for nid[%d]\n",
nid);
@@ -984,7 +980,7 @@ static void __init add_node_ranges(void)
unsigned long this_end;
int nid;

- this_end = nid_range(start, end, &nid);
+ this_end = memblock_nid_range(start, end, &nid);

numadbg("Adding active range nid[%d] "
"start[%lx] end[%lx]\n",
@@ -1317,7 +1313,7 @@ static void __init reserve_range_in_node(int nid, unsigned long start,
unsigned long this_end;
int n;

- this_end = nid_range(start, end, &n);
+ this_end = memblock_nid_range(start, end, &n);
if (n == nid) {
numadbg(" MATCH reserving range [%lx:%lx]\n",
start, this_end);
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index d948857..3e4a52f 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -46,8 +46,7 @@ extern long memblock_add(u64 base, u64 size);
extern long memblock_remove(u64 base, u64 size);
extern long __init memblock_free(u64 base, u64 size);
extern long __init memblock_reserve(u64 base, u64 size);
-extern u64 __init memblock_alloc_nid(u64 size, u64 align, int nid,
- u64 (*nid_range)(u64, u64, int *));
+extern u64 __init memblock_alloc_nid(u64 size, u64 align, int nid);
extern u64 __init memblock_alloc(u64 size, u64 align);
extern u64 __init memblock_alloc_base(u64 size,
u64, u64 max_addr);
@@ -61,6 +60,10 @@ extern int memblock_is_region_reserved(u64 base, u64 size);

extern void memblock_dump_all(void);

+/* Provided by the architecture */
+extern u64 memblock_nid_range(u64 start, u64 end, int *nid);
+
+
/*
* pfn conversion functions
*
diff --git a/mm/memblock.c b/mm/memblock.c
index 5df1400..83643f3 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -319,7 +319,6 @@ static u64 __init memblock_alloc_nid_unreserved(u64 start, u64 end,
}

static u64 __init memblock_alloc_nid_region(struct memblock_region *mp,
- u64 (*nid_range)(u64, u64, int *),
u64 size, u64 align, int nid)
{
u64 start, end;
@@ -332,7 +331,7 @@ static u64 __init memblock_alloc_nid_region(struct memblock_region *mp,
u64 this_end;
int this_nid;

- this_end = nid_range(start, end, &this_nid);
+ this_end = memblock_nid_range(start, end, &this_nid);
if (this_nid == nid) {
u64 ret = memblock_alloc_nid_unreserved(start, this_end,
size, align);
@@ -345,8 +344,7 @@ static u64 __init memblock_alloc_nid_region(struct memblock_region *mp,
return ~(u64)0;
}

-u64 __init memblock_alloc_nid(u64 size, u64 align, int nid,
- u64 (*nid_range)(u64 start, u64 end, int *nid))
+u64 __init memblock_alloc_nid(u64 size, u64 align, int nid)
{
struct memblock_type *mem = &memblock.memory;
int i;
@@ -357,7 +355,6 @@ u64 __init memblock_alloc_nid(u64 size, u64 align, int nid,

for (i = 0; i < mem->cnt; i++) {
u64 ret = memblock_alloc_nid_region(&mem->regions[i],
- nid_range,
size, align, nid);
if (ret != ~(u64)0)
return ret;
@@ -505,3 +502,9 @@ int memblock_is_region_reserved(u64 base, u64 size)
return memblock_overlaps_region(&memblock.reserved, base, size);
}

+u64 __weak memblock_nid_range(u64 start, u64 end, int *nid)
+{
+ *nid = 0;
+
+ return end;
+}
--
1.6.4.2

2010-07-22 18:28:17

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 28/31] memblock: Export MEMBLOCK_ERROR again

Will be used by x86 memblock_x86_find_in_range_node() and the nobootmem replacement.

-v2: use 0 instead of -1ULL, suggested by Linus, so we don't need to cast it later to unsigned long
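
With the error value at 0, code that keeps results in unsigned long can
test for failure without a cast; a hypothetical x86 call site:

    unsigned long addr;

    addr = memblock_find_in_range(start, end, size, align);
    if (addr == MEMBLOCK_ERROR)    /* i.e. 0 */
        panic("cannot find a free range");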

Signed-off-by: Yinghai Lu <[email protected]>
---
include/linux/memblock.h | 1 +
mm/memblock.c | 2 --
2 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 70bc467..89749c4 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -19,6 +19,7 @@
#include <asm/memblock.h>

#define INIT_MEMBLOCK_REGIONS 128
+#define MEMBLOCK_ERROR 0

struct memblock_region {
phys_addr_t base;
diff --git a/mm/memblock.c b/mm/memblock.c
index 796ef8c..3d0a754 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -27,8 +27,6 @@ int memblock_can_resize;
static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS + 1];
struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS + 1];

-#define MEMBLOCK_ERROR (~(phys_addr_t)0)
-
/* inline so we don't get a warning when pr_debug is compiled out */
static inline const char *memblock_type_name(struct memblock_type *type)
{
--
1.6.4.2

2010-07-22 18:28:44

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 26/31] memblock: Prepare x86 to use memblock to replace early_res

1. expose memblock_debug
2. expose memblock_reserved_init_regions

-v2: drop memblock_add_region() and MEMBLOCK_ERROR export
-v3: separate wrong return of memblock_find_base into another patch
-v4: expose memblock_can_resize to handle x86 EFI that could have more than
128 entries
-v5: add memblock_dbg() so we can spare some if (memblock_debug) tests, suggested by Ingo.
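
With memblock_dbg() the debug test stays out of the callers; a
hypothetical x86 call site:

    memblock_dbg("    memblock_x86_reserve_range: [%#010llx-%#010llx] %s\n",
                 (u64)start, (u64)end - 1, name);

instead of an open-coded if (memblock_debug) printk(...) at every site.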

Signed-off-by: Yinghai Lu <[email protected]>
---
include/linux/memblock.h | 6 ++++++
mm/memblock.c | 5 +++--
2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 08a12cf..70bc467 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -39,6 +39,12 @@ struct memblock {
};

extern struct memblock memblock;
+extern int memblock_debug;
+extern int memblock_can_resize;
+extern struct memblock_region memblock_reserved_init_regions[];
+
+#define memblock_dbg(fmt, ...) \
+ if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)

extern void __init memblock_init(void);
extern void __init memblock_analyze(void);
diff --git a/mm/memblock.c b/mm/memblock.c
index b5eb901..c16d4c6 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -22,9 +22,10 @@

struct memblock memblock;

-static int memblock_debug, memblock_can_resize;
+int memblock_debug;
+int memblock_can_resize;
static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS + 1];
-static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS + 1];
+struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS + 1];

#define MEMBLOCK_ERROR (~(phys_addr_t)0)

--
1.6.4.2

2010-07-22 18:29:08

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 18/31] memblock: Move functions around into a more sensible order

From: Benjamin Herrenschmidt <[email protected]>

Some shuffling is needed for doing array resize, so we may as well
put some sense into the ordering of the functions in the whole memblock.c
file. No code change. Added some comments.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
mm/memblock.c | 301 ++++++++++++++++++++++++++++++---------------------------
1 files changed, 159 insertions(+), 142 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 17403e6..80d8b85 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -24,40 +24,18 @@ static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIO

#define MEMBLOCK_ERROR (~(phys_addr_t)0)

-static int __init early_memblock(char *p)
-{
- if (p && strstr(p, "debug"))
- memblock_debug = 1;
- return 0;
-}
-early_param("memblock", early_memblock);
+/*
+ * Address comparison utilities
+ */

-static void memblock_dump(struct memblock_type *region, char *name)
+static phys_addr_t memblock_align_down(phys_addr_t addr, phys_addr_t size)
{
- unsigned long long base, size;
- int i;
-
- pr_info(" %s.cnt = 0x%lx\n", name, region->cnt);
-
- for (i = 0; i < region->cnt; i++) {
- base = region->regions[i].base;
- size = region->regions[i].size;
-
- pr_info(" %s[0x%x]\t0x%016llx - 0x%016llx, 0x%llx bytes\n",
- name, i, base, base + size - 1, size);
- }
+ return addr & ~(size - 1);
}

-void memblock_dump_all(void)
+static phys_addr_t memblock_align_up(phys_addr_t addr, phys_addr_t size)
{
- if (!memblock_debug)
- return;
-
- pr_info("MEMBLOCK configuration:\n");
- pr_info(" memory size = 0x%llx\n", (unsigned long long)memblock.memory_size);
-
- memblock_dump(&memblock.memory, "memory");
- memblock_dump(&memblock.reserved, "reserved");
+ return (addr + (size - 1)) & ~(size - 1);
}

static unsigned long memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
@@ -88,6 +66,77 @@ static long memblock_regions_adjacent(struct memblock_type *type,
return memblock_addrs_adjacent(base1, size1, base2, size2);
}

+long memblock_overlaps_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
+{
+ unsigned long i;
+
+ for (i = 0; i < type->cnt; i++) {
+ phys_addr_t rgnbase = type->regions[i].base;
+ phys_addr_t rgnsize = type->regions[i].size;
+ if (memblock_addrs_overlap(base, size, rgnbase, rgnsize))
+ break;
+ }
+
+ return (i < type->cnt) ? i : -1;
+}
+
+/*
+ * Find, allocate, deallocate or reserve unreserved regions. All allocations
+ * are top-down.
+ */
+
+static phys_addr_t __init memblock_find_region(phys_addr_t start, phys_addr_t end,
+ phys_addr_t size, phys_addr_t align)
+{
+ phys_addr_t base, res_base;
+ long j;
+
+ base = memblock_align_down((end - size), align);
+ while (start <= base) {
+ j = memblock_overlaps_region(&memblock.reserved, base, size);
+ if (j < 0)
+ return base;
+ res_base = memblock.reserved.regions[j].base;
+ if (res_base < size)
+ break;
+ base = memblock_align_down(res_base - size, align);
+ }
+
+ return MEMBLOCK_ERROR;
+}
+
+static phys_addr_t __init memblock_find_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
+{
+ long i;
+ phys_addr_t base = 0;
+ phys_addr_t res_base;
+
+ BUG_ON(0 == size);
+
+ size = memblock_align_up(size, align);
+
+ /* Pump up max_addr */
+ if (max_addr == MEMBLOCK_ALLOC_ACCESSIBLE)
+ max_addr = memblock.current_limit;
+
+ /* We do a top-down search, this tends to limit memory
+ * fragmentation by keeping early boot allocs near the
+ * top of memory
+ */
+ for (i = memblock.memory.cnt - 1; i >= 0; i--) {
+ phys_addr_t memblockbase = memblock.memory.regions[i].base;
+ phys_addr_t memblocksize = memblock.memory.regions[i].size;
+
+ if (memblocksize < size)
+ continue;
+ base = min(memblockbase + memblocksize, max_addr);
+ res_base = memblock_find_region(memblockbase, base, size, align);
+ if (res_base != MEMBLOCK_ERROR)
+ return res_base;
+ }
+ return MEMBLOCK_ERROR;
+}
+
static void memblock_remove_region(struct memblock_type *type, unsigned long r)
{
unsigned long i;
@@ -107,22 +156,6 @@ static void memblock_coalesce_regions(struct memblock_type *type,
memblock_remove_region(type, r2);
}

-void __init memblock_analyze(void)
-{
- int i;
-
- /* Check marker in the unused last array entry */
- WARN_ON(memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS].base
- != (phys_addr_t)RED_INACTIVE);
- WARN_ON(memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS].base
- != (phys_addr_t)RED_INACTIVE);
-
- memblock.memory_size = 0;
-
- for (i = 0; i < memblock.memory.cnt; i++)
- memblock.memory_size += memblock.memory.regions[i].size;
-}
-
static long memblock_add_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
{
unsigned long coalesced = 0;
@@ -260,49 +293,47 @@ long __init memblock_reserve(phys_addr_t base, phys_addr_t size)
return memblock_add_region(_rgn, base, size);
}

-long memblock_overlaps_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
+phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
{
- unsigned long i;
+ phys_addr_t found;

- for (i = 0; i < type->cnt; i++) {
- phys_addr_t rgnbase = type->regions[i].base;
- phys_addr_t rgnsize = type->regions[i].size;
- if (memblock_addrs_overlap(base, size, rgnbase, rgnsize))
- break;
- }
+ /* We align the size to limit fragmentation. Without this, a lot of
+ * small allocs quickly eat up the whole reserve array on sparc
+ */
+ size = memblock_align_up(size, align);

- return (i < type->cnt) ? i : -1;
-}
+ found = memblock_find_base(size, align, max_addr);
+ if (found != MEMBLOCK_ERROR &&
+ memblock_add_region(&memblock.reserved, found, size) >= 0)
+ return found;

-static phys_addr_t memblock_align_down(phys_addr_t addr, phys_addr_t size)
-{
- return addr & ~(size - 1);
+ return 0;
}

-static phys_addr_t memblock_align_up(phys_addr_t addr, phys_addr_t size)
+phys_addr_t __init memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
{
- return (addr + (size - 1)) & ~(size - 1);
+ phys_addr_t alloc;
+
+ alloc = __memblock_alloc_base(size, align, max_addr);
+
+ if (alloc == 0)
+ panic("ERROR: Failed to allocate 0x%llx bytes below 0x%llx.\n",
+ (unsigned long long) size, (unsigned long long) max_addr);
+
+ return alloc;
}

-static phys_addr_t __init memblock_find_region(phys_addr_t start, phys_addr_t end,
- phys_addr_t size, phys_addr_t align)
+phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align)
{
- phys_addr_t base, res_base;
- long j;
+ return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
+}

- base = memblock_align_down((end - size), align);
- while (start <= base) {
- j = memblock_overlaps_region(&memblock.reserved, base, size);
- if (j < 0)
- return base;
- res_base = memblock.reserved.regions[j].base;
- if (res_base < size)
- break;
- base = memblock_align_down(res_base - size, align);
- }

- return MEMBLOCK_ERROR;
-}
+/*
+ * Additional node-local allocators. Search for node memory is bottom up
+ * and walks memblock regions within that node bottom-up as well, but allocation
+ * within an memblock region is top-down.
+ */

phys_addr_t __weak __init memblock_nid_range(phys_addr_t start, phys_addr_t end, int *nid)
{
@@ -364,72 +395,6 @@ phys_addr_t __init memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int n
return memblock_alloc(size, align);
}

-phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align)
-{
- return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
-}
-
-static phys_addr_t __init memblock_find_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
-{
- long i;
- phys_addr_t base = 0;
- phys_addr_t res_base;
-
- BUG_ON(0 == size);
-
- /* Pump up max_addr */
- if (max_addr == MEMBLOCK_ALLOC_ACCESSIBLE)
- max_addr = memblock.current_limit;
-
- /* We do a top-down search, this tends to limit memory
- * fragmentation by keeping early boot allocs near the
- * top of memory
- */
- for (i = memblock.memory.cnt - 1; i >= 0; i--) {
- phys_addr_t memblockbase = memblock.memory.regions[i].base;
- phys_addr_t memblocksize = memblock.memory.regions[i].size;
-
- if (memblocksize < size)
- continue;
- base = min(memblockbase + memblocksize, max_addr);
- res_base = memblock_find_region(memblockbase, base, size, align);
- if (res_base != MEMBLOCK_ERROR)
- return res_base;
- }
- return MEMBLOCK_ERROR;
-}
-
-phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
-{
- phys_addr_t found;
-
- /* We align the size to limit fragmentation. Without this, a lot of
- * small allocs quickly eat up the whole reserve array on sparc
- */
- size = memblock_align_up(size, align);
-
- found = memblock_find_base(size, align, max_addr);
- if (found != MEMBLOCK_ERROR &&
- memblock_add_region(&memblock.reserved, found, size) >= 0)
- return found;
-
- return 0;
-}
-
-phys_addr_t __init memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
-{
- phys_addr_t alloc;
-
- alloc = __memblock_alloc_base(size, align, max_addr);
-
- if (alloc == 0)
- panic("ERROR: Failed to allocate 0x%llx bytes below 0x%llx.\n",
- (unsigned long long) size, (unsigned long long) max_addr);
-
- return alloc;
-}
-
-
/* You must call memblock_analyze() before this. */
phys_addr_t __init memblock_phys_mem_size(void)
{
@@ -508,6 +473,50 @@ void __init memblock_set_current_limit(phys_addr_t limit)
memblock.current_limit = limit;
}

+static void memblock_dump(struct memblock_type *region, char *name)
+{
+ unsigned long long base, size;
+ int i;
+
+ pr_info(" %s.cnt = 0x%lx\n", name, region->cnt);
+
+ for (i = 0; i < region->cnt; i++) {
+ base = region->regions[i].base;
+ size = region->regions[i].size;
+
+ pr_info(" %s[0x%x]\t0x%016llx - 0x%016llx, 0x%llx bytes\n",
+ name, i, base, base + size - 1, size);
+ }
+}
+
+void memblock_dump_all(void)
+{
+ if (!memblock_debug)
+ return;
+
+ pr_info("MEMBLOCK configuration:\n");
+ pr_info(" memory size = 0x%llx\n", (unsigned long long)memblock.memory_size);
+
+ memblock_dump(&memblock.memory, "memory");
+ memblock_dump(&memblock.reserved, "reserved");
+}
+
+void __init memblock_analyze(void)
+{
+ int i;
+
+ /* Check marker in the unused last array entry */
+ WARN_ON(memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS].base
+ != (phys_addr_t)RED_INACTIVE);
+ WARN_ON(memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS].base
+ != (phys_addr_t)RED_INACTIVE);
+
+ memblock.memory_size = 0;
+
+ for (i = 0; i < memblock.memory.cnt; i++)
+ memblock.memory_size += memblock.memory.regions[i].size;
+}
+
void __init memblock_init(void)
{
/* Hookup the initial arrays */
@@ -535,3 +544,11 @@ void __init memblock_init(void)
memblock.current_limit = MEMBLOCK_ALLOC_ANYWHERE;
}

+static int __init early_memblock(char *p)
+{
+ if (p && strstr(p, "debug"))
+ memblock_debug = 1;
+ return 0;
+}
+early_param("memblock", early_memblock);
+
--
1.6.4.2

2010-07-22 18:33:07

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 12/31] memblock: Move memblock arrays to static storage in memblock.c and make their size a variable

From: Benjamin Herrenschmidt <[email protected]>

This is in preparation for having resizable arrays.

Note that we still allocate one more than needed; this is unchanged from
the previous implementation.
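
With cnt and max separated, the overflow check in memblock_add_region()
can later grow the array instead of failing. Roughly what the resize
patch (19 in this series) turns it into:

    if (type->cnt >= type->max)
        if (memblock_double_array(type) < 0)
            return -1;

For now it simply returns -1 when the array is full, as before.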

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
include/linux/memblock.h | 7 ++++---
mm/memblock.c | 10 +++++++++-
2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index e849d31..b839053 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -18,7 +18,7 @@

#include <asm/memblock.h>

-#define MAX_MEMBLOCK_REGIONS 128
+#define INIT_MEMBLOCK_REGIONS 128

struct memblock_region {
phys_addr_t base;
@@ -26,8 +26,9 @@ struct memblock_region {
};

struct memblock_type {
- unsigned long cnt;
- struct memblock_region regions[MAX_MEMBLOCK_REGIONS+1];
+ unsigned long cnt; /* number of regions */
+ unsigned long max; /* size of the allocated array */
+ struct memblock_region *regions;
};

struct memblock {
diff --git a/mm/memblock.c b/mm/memblock.c
index 78c2394..e1c5ce3 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -18,6 +18,8 @@
struct memblock memblock;

static int memblock_debug;
+static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS + 1];
+static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS + 1];

static int __init early_memblock(char *p)
{
@@ -104,6 +106,12 @@ static void memblock_coalesce_regions(struct memblock_type *type,

void __init memblock_init(void)
{
+ /* Hookup the initial arrays */
+ memblock.memory.regions = memblock_memory_init_regions;
+ memblock.memory.max = INIT_MEMBLOCK_REGIONS;
+ memblock.reserved.regions = memblock_reserved_init_regions;
+ memblock.reserved.max = INIT_MEMBLOCK_REGIONS;
+
/* Create a dummy zero size MEMBLOCK which will get coalesced away later.
* This simplifies the memblock_add() code below...
*/
@@ -169,7 +177,7 @@ static long memblock_add_region(struct memblock_type *type, phys_addr_t base, ph

if (coalesced)
return coalesced;
- if (type->cnt >= MAX_MEMBLOCK_REGIONS)
+ if (type->cnt >= type->max)
return -1;

/* Couldn't coalesce the MEMBLOCK, so add it to the sorted table. */
--
1.6.4.2

2010-07-22 18:39:27

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 11/31] memblock: Remove memblock_type.size and add memblock.memory_size instead

From: Benjamin Herrenschmidt <[email protected]>

Right now, both the "memory" and "reserved" memblock_type structures have
a "size" member. It's unused in the later case, and represent the
calculated memory size in the later case.

This moves it out to the main memblock structure instead.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---
arch/powerpc/mm/mem.c | 2 +-
include/linux/memblock.h | 2 +-
mm/memblock.c | 8 ++++----
3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 52df542..f661f6c 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -301,7 +301,7 @@ void __init mem_init(void)
swiotlb_init(1);
#endif

- num_physpages = memblock.memory.size >> PAGE_SHIFT;
+ num_physpages = memblock_phys_mem_size() >> PAGE_SHIFT;
high_memory = (void *) __va(max_low_pfn * PAGE_SIZE);

#ifdef CONFIG_NEED_MULTIPLE_NODES
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 5abb06b..e849d31 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -27,12 +27,12 @@ struct memblock_region {

struct memblock_type {
unsigned long cnt;
- phys_addr_t size;
struct memblock_region regions[MAX_MEMBLOCK_REGIONS+1];
};

struct memblock {
phys_addr_t current_limit;
+ phys_addr_t memory_size; /* Updated by memblock_analyze() */
struct memblock_type memory;
struct memblock_type reserved;
};
diff --git a/mm/memblock.c b/mm/memblock.c
index 0c0f787..78c2394 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -49,7 +49,7 @@ void memblock_dump_all(void)
return;

pr_info("MEMBLOCK configuration:\n");
- pr_info(" memory.size = 0x%llx\n", (unsigned long long)memblock.memory.size);
+ pr_info(" memory size = 0x%llx\n", (unsigned long long)memblock.memory_size);

memblock_dump(&memblock.memory, "memory");
memblock_dump(&memblock.reserved, "reserved");
@@ -123,10 +123,10 @@ void __init memblock_analyze(void)
{
int i;

- memblock.memory.size = 0;
+ memblock.memory_size = 0;

for (i = 0; i < memblock.memory.cnt; i++)
- memblock.memory.size += memblock.memory.regions[i].size;
+ memblock.memory_size += memblock.memory.regions[i].size;
}

static long memblock_add_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
@@ -423,7 +423,7 @@ phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t align, ph
/* You must call memblock_analyze() before this. */
phys_addr_t __init memblock_phys_mem_size(void)
{
- return memblock.memory.size;
+ return memblock.memory_size;
}

phys_addr_t memblock_end_of_DRAM(void)
--
1.6.4.2

2010-07-22 21:37:36

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH -v26 00/31] generic changes for memblock

On Thu, 2010-07-22 at 11:20 -0700, Yinghai Lu wrote:
> New memblock could be used to replace early_res in x86.
>
> Suggested by: David, Ben, and Thomas
>
> -v25: update to mainline with kmemleak fix on nobootmem
> also rename lmb to memblock alread in mainline
>
> -v26: according to Linus and hpa, seperate the big patchset to small ones.
>
> This one is rebase of Ben's changeset to current mainline/tip
>
> Last 6 are needed for x86 memblock transistion, but change mm/memblock.c

Are there any changes from my original series other than an automated
rebase? If yes, let me know, as I do plan to do that rebase myself; I
just haven't got to it yet.

Ben.


> Thanks
>
> Yinghai Lu
>
> [PATCH 01/31] memblock: Rename memblock_region to memblock_type and memblock_property to memblock_region
> [PATCH 02/31] memblock: No reason to include asm/memblock.h late
> [PATCH 03/31] memblock: Introduce for_each_memblock() and new accessors, and use it
> [PATCH 04/31] memblock: Remove nid_range argument, arch provides memblock_nid_range() instead
> [PATCH 05/31] memblock: Factor the lowest level alloc function
> [PATCH 06/31] memblock: Expose MEMBLOCK_ALLOC_ANYWHERE
> [PATCH 07/31] memblock: Introduce default allocation limit and use it to replace explicit ones
> [PATCH 08/31] memblock: Remove rmo_size, burry it in arch/powerpc where it belongs
> [PATCH 09/31] memblock: Change u64 to phys_addr_t
> [PATCH 10/31] memblock: Remove unused memblock.debug struct member
> [PATCH 11/31] memblock: Remove memblock_type.size and add memblock.memory_size instead
> [PATCH 12/31] memblock: Move memblock arrays to static storage in memblock.c and make their size a variable
> [PATCH 13/31] memblock: Add debug markers at the end of the array
> [PATCH 14/31] memblock: Make memblock_find_region() out of memblock_alloc_region()
> [PATCH 15/31] memblock: Define MEMBLOCK_ERROR internally instead of using ~(phys_addr_t)0
> [PATCH 16/31] memblock: Move memblock_init() to the bottom of the file
> [PATCH 17/31] memblock: split memblock_find_base() out of __memblock_alloc_base()
> [PATCH 18/31] memblock: Move functions around into a more sensible order
> [PATCH 19/31] memblock: Add array resizing support
> [PATCH 20/31] memblock: Add arch function to control coalescing of memblock memory regions
> [PATCH 21/31] memblock: Add "start" argument to memblock_find_base()
> [PATCH 22/31] memblock: NUMA allocate can now use early_pfn_map
> [PATCH 23/31] memblock: Separate memblock_alloc_nid() and memblock_alloc_try_nid()
> [PATCH 24/31] memblock: Make memblock_alloc_try_nid() fallback to MEMBLOCK_ALLOC_ANYWHERE
> [PATCH 25/31] memblock: Add debugfs files to dump the arrays content
> [PATCH 26/31] memblock: Prepare x86 to use memblock to replace early_res
> [PATCH 27/31] memblock: Print new doubled array location info
> [PATCH 28/31] memblock: Export MEMBLOCK_ERROR again
> [PATCH 29/31] memblock: Prepare to include linux/memblock.h in core file
> [PATCH 30/31] memblock: Add ARCH_DISCARD_MEMBLOCK to put memblock code to .init
> [PATCH 31/31] memblock: Add memblock_find_in_range()
>
> arch/microblaze/include/asm/memblock.h | 3 -
> arch/microblaze/mm/init.c | 18 +-
> arch/powerpc/include/asm/memblock.h | 7 -
> arch/powerpc/include/asm/mmu.h | 12 +
> arch/powerpc/kernel/head_40x.S | 6 +-
> arch/powerpc/kernel/paca.c | 2 +-
> arch/powerpc/kernel/prom.c | 15 +-
> arch/powerpc/kernel/rtas.c | 2 +-
> arch/powerpc/kernel/setup_32.c | 2 +-
> arch/powerpc/kernel/setup_64.c | 2 +-
> arch/powerpc/mm/40x_mmu.c | 17 +-
> arch/powerpc/mm/44x_mmu.c | 14 +
> arch/powerpc/mm/fsl_booke_mmu.c | 12 +-
> arch/powerpc/mm/hash_utils_64.c | 35 ++-
> arch/powerpc/mm/init_32.c | 43 +-
> arch/powerpc/mm/init_64.c | 1 +
> arch/powerpc/mm/mem.c | 94 ++---
> arch/powerpc/mm/numa.c | 17 +-
> arch/powerpc/mm/ppc_mmu_32.c | 18 +-
> arch/powerpc/mm/tlb_nohash.c | 16 +
> arch/powerpc/platforms/embedded6xx/wii.c | 2 +-
> arch/sh/include/asm/memblock.h | 2 -
> arch/sh/mm/init.c | 16 +-
> arch/sparc/include/asm/memblock.h | 2 -
> arch/sparc/mm/init_64.c | 46 +--
> include/linux/memblock.h | 162 +++++--
> mm/memblock.c | 764 +++++++++++++++++++-----------
> 27 files changed, 846 insertions(+), 484 deletions(-)

2010-07-22 22:01:48

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH -v26 00/31] generic changes for memblock

On 07/22/2010 02:35 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-07-22 at 11:20 -0700, Yinghai Lu wrote:
>> New memblock could be used to replace early_res in x86.
>>
>> Suggested by: David, Ben, and Thomas
>>
>> -v25: update to mainline with kmemleak fix on nobootmem
>> also rename lmb to memblock alread in mainline
>>
>> -v26: according to Linus and hpa, seperate the big patchset to small ones.
>>
>> This one is rebase of Ben's changeset to current mainline/tip
>>
>> Last 6 are needed for x86 memblock transistion, but change mm/memblock.c
>
> Are there any changes from my original series other than an automated
> rebase? If yes, let me know, as I do plan to do that rebase myself; I
> just haven't got to it yet.
>
>>
>> [PATCH 01/31] memblock: Rename memblock_region to memblock_type and memblock_property to memblock_region
>> [PATCH 02/31] memblock: No reason to include asm/memblock.h late
>> [PATCH 03/31] memblock: Introduce for_each_memblock() and new accessors, and use it
>> [PATCH 04/31] memblock: Remove nid_range argument, arch provides memblock_nid_range() instead
>> [PATCH 05/31] memblock: Factor the lowest level alloc function
>> [PATCH 06/31] memblock: Expose MEMBLOCK_ALLOC_ANYWHERE
>> [PATCH 07/31] memblock: Introduce default allocation limit and use it to replace explicit ones
>> [PATCH 08/31] memblock: Remove rmo_size, burry it in arch/powerpc where it belongs
>> [PATCH 09/31] memblock: Change u64 to phys_addr_t
>> [PATCH 10/31] memblock: Remove unused memblock.debug struct member
>> [PATCH 11/31] memblock: Remove memblock_type.size and add memblock.memory_size instead
>> [PATCH 12/31] memblock: Move memblock arrays to static storage in memblock.c and make their size a variable
>> [PATCH 13/31] memblock: Add debug markers at the end of the array
>> [PATCH 14/31] memblock: Make memblock_find_region() out of memblock_alloc_region()
>> [PATCH 15/31] memblock: Define MEMBLOCK_ERROR internally instead of using ~(phys_addr_t)0
>> [PATCH 16/31] memblock: Move memblock_init() to the bottom of the file
>> [PATCH 17/31] memblock: split memblock_find_base() out of __memblock_alloc_base()

I folded the patch that makes memblock_find_base() return MEMBLOCK_ERROR
into patch 17, and the following one or two patches had to change
accordingly.

>> [PATCH 18/31] memblock: Move functions around into a more sensible order
>> [PATCH 19/31] memblock: Add array resizing support
>> [PATCH 20/31] memblock: Add arch function to control coalescing of memblock memory regions
>> [PATCH 21/31] memblock: Add "start" argument to memblock_find_base()
>> [PATCH 22/31] memblock: NUMA allocate can now use early_pfn_map
>> [PATCH 23/31] memblock: Separate memblock_alloc_nid() and memblock_alloc_try_nid()
>> [PATCH 24/31] memblock: Make memblock_alloc_try_nid() fallback to MEMBLOCK_ALLOC_ANYWHERE
>> [PATCH 25/31] memblock: Add debugfs files to dump the arrays content



>> [PATCH 26/31] memblock: Prepare x86 to use memblock to replace early_res
>> [PATCH 27/31] memblock: Print new doubled array location info
>> [PATCH 28/31] memblock: Export MEMBLOCK_ERROR again
>> [PATCH 29/31] memblock: Prepare to include linux/memblock.h in core file
>> [PATCH 30/31] memblock: Add ARCH_DISCARD_MEMBLOCK to put memblock code to .init
>> [PATCH 31/31] memblock: Add memblock_find_in_range()

Please check if you can take those 6 into your new branch.

Thanks

Yinghai

2010-07-28 05:16:06

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 28/31] memblock: Export MEMBLOCK_ERROR again

On Thu, 2010-07-22 at 11:21 -0700, Yinghai Lu wrote:
> Will be used by x86 memblock_x86_find_in_range_node() and the nobootmem replacement.
>
> -v2: use 0 instead of -1ULL, suggested by Linus, so we don't need to cast it later to unsigned long

The patch in its current form is a NAK.

You can't just do those two things in one commit.

If we're going to switch LMB errors to always be 0, we need to ensure we
cannot realistically hand out 0 as a result of lmb_alloc().

I'll cook up a patch to do that.

Ben.

> Signed-off-by: Yinghai Lu <[email protected]>
> ---
> include/linux/memblock.h | 1 +
> mm/memblock.c | 2 --
> 2 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 70bc467..89749c4 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -19,6 +19,7 @@
> #include <asm/memblock.h>
>
> #define INIT_MEMBLOCK_REGIONS 128
> +#define MEMBLOCK_ERROR 0
>
> struct memblock_region {
> phys_addr_t base;
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 796ef8c..3d0a754 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -27,8 +27,6 @@ int memblock_can_resize;
> static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS + 1];
> struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS + 1];
>
> -#define MEMBLOCK_ERROR (~(phys_addr_t)0)
> -
> /* inline so we don't get a warning when pr_debug is compiled out */
> static inline const char *memblock_type_name(struct memblock_type *type)
> {

2010-07-28 05:20:09

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 28/31] memblock: Export MEMBLOCK_ERROR again

On Wed, 2010-07-28 at 15:15 +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2010-07-22 at 11:21 -0700, Yinghai Lu wrote:
> > will used by x86 memblock_x86_find_in_range_node and nobootmem replacement
> >
> > -v2: use 0 instead -1ULL, Suggested by Linus, so we don't need cast them later to unsigned long
>
> The patch in its current form is a NAK.
>
> You can't just do those two things in one commit.
>
> If we're going to switch LMB errors to always be 0, we need to ensure we
> cannot realistically hand out 0 as a result of lmb_alloc().
>
> I'll cook up a patch to do that.

Screw it, I don't like it but I'll just split your patch in two for now
and keep 0. It's a bit fishy but memblock does mostly top-down
allocations and so shouldn't hit 0, and in practice the region at 0 is,
I believe, reserved, but we need to be extra careful and might need to
revisit that a bit.

That's an area where I don't completely agree with Linus, ie, 0 is a
perfectly valid physical address for memblock to return :-)

Cheers,
Ben.

> Ben.
>
> > Signed-off-by: Yinghai Lu <[email protected]>
> > ---
> > include/linux/memblock.h | 1 +
> > mm/memblock.c | 2 --
> > 2 files changed, 1 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> > index 70bc467..89749c4 100644
> > --- a/include/linux/memblock.h
> > +++ b/include/linux/memblock.h
> > @@ -19,6 +19,7 @@
> > #include <asm/memblock.h>
> >
> > #define INIT_MEMBLOCK_REGIONS 128
> > +#define MEMBLOCK_ERROR 0
> >
> > struct memblock_region {
> > phys_addr_t base;
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index 796ef8c..3d0a754 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -27,8 +27,6 @@ int memblock_can_resize;
> > static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS + 1];
> > struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS + 1];
> >
> > -#define MEMBLOCK_ERROR (~(phys_addr_t)0)
> > -
> > /* inline so we don't get a warning when pr_debug is compiled out */
> > static inline const char *memblock_type_name(struct memblock_type *type)
> > {
>

2010-07-28 05:24:52

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 28/31] memblock: Export MEMBLOCK_ERROR again

On Wed, 2010-07-28 at 15:15 +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2010-07-22 at 11:21 -0700, Yinghai Lu wrote:
> > will used by x86 memblock_x86_find_in_range_node and nobootmem replacement
> >
> > -v2: use 0 instead -1ULL, Suggested by Linus, so we don't need cast them later to unsigned long
>
> The patch in its current form is a NAK.
>
> You can't just do those two things in one commit.
>
> If we're going to switch LMB errors to always be 0, we need to ensure we
> cannot realistically hand out 0 as a result of lmb_alloc().
>
> I'll cook up a patch to do that.

BTW, after that, maybe send a patch completely removing
MEMBLOCK_ERROR?

I find 0 to be self-explanatory enough.

Cheers,
Ben.

> Ben.
>
> > Signed-off-by: Yinghai Lu <[email protected]>
> > ---
> > include/linux/memblock.h | 1 +
> > mm/memblock.c | 2 --
> > 2 files changed, 1 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> > index 70bc467..89749c4 100644
> > --- a/include/linux/memblock.h
> > +++ b/include/linux/memblock.h
> > @@ -19,6 +19,7 @@
> > #include <asm/memblock.h>
> >
> > #define INIT_MEMBLOCK_REGIONS 128
> > +#define MEMBLOCK_ERROR 0
> >
> > struct memblock_region {
> > phys_addr_t base;
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index 796ef8c..3d0a754 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -27,8 +27,6 @@ int memblock_can_resize;
> > static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS + 1];
> > struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS + 1];
> >
> > -#define MEMBLOCK_ERROR (~(phys_addr_t)0)
> > -
> > /* inline so we don't get a warning when pr_debug is compiled out */
> > static inline const char *memblock_type_name(struct memblock_type *type)
> > {
>

2010-07-28 05:28:00

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 29/31] memblock: Prepare to include linux/memblock.h in core file

On Thu, 2010-07-22 at 11:21 -0700, Yinghai Lu wrote:
> Need to add protection in linux/memblock.h, to prepare to include it in
> mm/page_alloc.c and mm/bootmem.c etc.

Not really no ... having the definitions shouldn't hurt whether memblock
is used or not, no ?

I'm going to keep that patch in the queue because it has one advantage
that you don't mention, which is to more easily catch attempts to
use those functions when CONFIG_HAVE_MEMBLOCK isn't set, but I'll update
the commit message.

Ben.

> Signed-off-by: Yinghai Lu <[email protected]>
> ---
> include/linux/memblock.h | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 89749c4..4aaaf0d 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -2,6 +2,7 @@
> #define _LINUX_MEMBLOCK_H
> #ifdef __KERNEL__
>
> +#ifdef CONFIG_HAVE_MEMBLOCK
> /*
> * Logical memory blocks.
> *
> @@ -147,6 +148,8 @@ static inline unsigned long memblock_region_pages(const struct memblock_region *
> region++)
>
>
> +#endif /* CONFIG_HAVE_MEMBLOCK */
> +
> #endif /* __KERNEL__ */
>
> #endif /* _LINUX_MEMBLOCK_H */

2010-07-28 05:37:31

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 31/31] memblock: Add memblock_find_in_range()

On Thu, 2010-07-22 at 11:21 -0700, Yinghai Lu wrote:
> it is a wrapper for memblock_find_base
>
> make it more easy for x86 to use memblock. ( rebase )
> x86 early_res is using find/reserve pattern instead of alloc.
>
> keep it in weak version, so later We can use x86 own version if needed.
> also We need it in mm/memblock.c, so one caller mm/page_alloc.c could get compiled
>
> -v2: Change name to memblock_find_in_range() according to Michael Ellerman
> -v3: Add generic weak version __memblock_find_in_range()
> so keep the path for fallback to x86 version that handle from low
> -v4: use 0 for failing path
> -v5: use MEMBLOCK_ERROR again
> -v6: remove __memblock_find_in_range()

It's very gross to have this weak and not memblock_find_base()... IE.

You create a new function defined as a wrapper on an existing one to
provide an easier set of arguments ... but also make it weak so the
arch can completely change its semantics without actually changing
the semantics of the function it wraps.

This is going to cause confusion and bugs. I'm adding the patch without
the weak bit to my branch for now, we need to discuss what is the best
approach for x86 here. Might be to use a different function. I don't
understand yet -why- x86 needs to override it, maybe the right way is
to reserve things more intelligently on x86 ?

In any case, you can always use your own wrapper there if needed.

Cheers,
Ben.

> Signed-off-by: Yinghai Lu <[email protected]>
> ---
> include/linux/memblock.h | 2 ++
> mm/memblock.c | 8 ++++++++
> 2 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 751a4eb..61b22eb 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -48,6 +48,8 @@ extern struct memblock_region memblock_reserved_init_regions[];
> #define memblock_dbg(fmt, ...) \
> if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
>
> +u64 memblock_find_in_range(u64 start, u64 end, u64 size, u64 align);
> +
> extern void __init memblock_init(void);
> extern void __init memblock_analyze(void);
> extern long memblock_add(phys_addr_t base, phys_addr_t size);
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 7471dac..ca7de91 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -156,6 +156,14 @@ static phys_addr_t __init memblock_find_base(phys_addr_t size, phys_addr_t align
> return MEMBLOCK_ERROR;
> }
>
> +/*
> + * Find a free area with specified alignment in a specific range.
> + */
> +u64 __init __weak memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
> +{
> + return memblock_find_base(size, align, start, end);
> +}
> +
> static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
> {
> unsigned long i;

2010-07-28 05:45:38

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 30/31] memblock: Add ARCH_DISCARD_MEMBLOCK to put memblock code to .init

On Thu, 2010-07-22 at 11:21 -0700, Yinghai Lu wrote:
> So those memblock bits could be released after kernel is booted up.
>
> Arch code could define ARCH_DISCARD_MEMBLOCK in asm/memblock.h,
> __init_memblock will become __init, __initdata_memblock will becom __initdata

Actually, that sucks.

A bunch of memblock is already __init. The bits that aren't fall mostly
into two categories:

- Stuff used by powerpc memory hotplug. Most of this could be covered
by __meminit

- The arrays themselves. We really only care about keeping the memory
array and a couple of things to walk it.

I'm merging the patch now to avoid further delays but we need to
revisit that too.

Ben.


> x86 code will use that.
>
> if ARCH_DISCARD_MEMBLOCK is defined, debugfs is not used
>
> -v2: use ARCH_DISCARD_MEMBLOCK according to Michael Ellerman
>
> Signed-off-by: Yinghai Lu <[email protected]>
> ---
> include/linux/memblock.h | 8 +++++++
> mm/memblock.c | 48 +++++++++++++++++++++++-----------------------
> 2 files changed, 32 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 4aaaf0d..751a4eb 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -148,6 +148,14 @@ static inline unsigned long memblock_region_pages(const struct memblock_region *
> region++)
>
>
> +#ifdef ARCH_DISCARD_MEMBLOCK
> +#define __init_memblock __init
> +#define __initdata_memblock __initdata
> +#else
> +#define __init_memblock
> +#define __initdata_memblock
> +#endif
> +
> #endif /* CONFIG_HAVE_MEMBLOCK */
>
> #endif /* __KERNEL__ */
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 3d0a754..7471dac 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -20,12 +20,12 @@
> #include <linux/seq_file.h>
> #include <linux/memblock.h>
>
> -struct memblock memblock;
> +struct memblock memblock __initdata_memblock;
>
> -int memblock_debug;
> -int memblock_can_resize;
> -static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS + 1];
> -struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS + 1];
> +int memblock_debug __initdata_memblock;
> +int memblock_can_resize __initdata_memblock;
> +static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS + 1] __initdata_memblock;
> +struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS + 1] __initdata_memblock;
>
> /* inline so we don't get a warning when pr_debug is compiled out */
> static inline const char *memblock_type_name(struct memblock_type *type)
> @@ -42,23 +42,23 @@ static inline const char *memblock_type_name(struct memblock_type *type)
> * Address comparison utilities
> */
>
> -static phys_addr_t memblock_align_down(phys_addr_t addr, phys_addr_t size)
> +static phys_addr_t __init_memblock memblock_align_down(phys_addr_t addr, phys_addr_t size)
> {
> return addr & ~(size - 1);
> }
>
> -static phys_addr_t memblock_align_up(phys_addr_t addr, phys_addr_t size)
> +static phys_addr_t __init_memblock memblock_align_up(phys_addr_t addr, phys_addr_t size)
> {
> return (addr + (size - 1)) & ~(size - 1);
> }
>
> -static unsigned long memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
> +static unsigned long __init_memblock memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
> phys_addr_t base2, phys_addr_t size2)
> {
> return ((base1 < (base2 + size2)) && (base2 < (base1 + size1)));
> }
>
> -static long memblock_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
> +static long __init_memblock memblock_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
> phys_addr_t base2, phys_addr_t size2)
> {
> if (base2 == base1 + size1)
> @@ -69,7 +69,7 @@ static long memblock_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
> return 0;
> }
>
> -static long memblock_regions_adjacent(struct memblock_type *type,
> +static long __init_memblock memblock_regions_adjacent(struct memblock_type *type,
> unsigned long r1, unsigned long r2)
> {
> phys_addr_t base1 = type->regions[r1].base;
> @@ -80,7 +80,7 @@ static long memblock_regions_adjacent(struct memblock_type *type,
> return memblock_addrs_adjacent(base1, size1, base2, size2);
> }
>
> -long memblock_overlaps_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
> +long __init_memblock memblock_overlaps_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
> {
> unsigned long i;
>
> @@ -156,7 +156,7 @@ static phys_addr_t __init memblock_find_base(phys_addr_t size, phys_addr_t align
> return MEMBLOCK_ERROR;
> }
>
> -static void memblock_remove_region(struct memblock_type *type, unsigned long r)
> +static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
> {
> unsigned long i;
>
> @@ -168,7 +168,7 @@ static void memblock_remove_region(struct memblock_type *type, unsigned long r)
> }
>
> /* Assumption: base addr of region 1 < base addr of region 2 */
> -static void memblock_coalesce_regions(struct memblock_type *type,
> +static void __init_memblock memblock_coalesce_regions(struct memblock_type *type,
> unsigned long r1, unsigned long r2)
> {
> type->regions[r1].size += type->regions[r2].size;
> @@ -178,7 +178,7 @@ static void memblock_coalesce_regions(struct memblock_type *type,
> /* Defined below but needed now */
> static long memblock_add_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size);
>
> -static int memblock_double_array(struct memblock_type *type)
> +static int __init_memblock memblock_double_array(struct memblock_type *type)
> {
> struct memblock_region *new_array, *old_array;
> phys_addr_t old_size, new_size, addr;
> @@ -249,13 +249,13 @@ static int memblock_double_array(struct memblock_type *type)
> return 0;
> }
>
> -extern int __weak memblock_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
> +extern int __init_memblock __weak memblock_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
> phys_addr_t addr2, phys_addr_t size2)
> {
> return 1;
> }
>
> -static long memblock_add_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
> +static long __init_memblock memblock_add_region(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
> {
> unsigned long coalesced = 0;
> long adjacent, i;
> @@ -342,13 +342,13 @@ static long memblock_add_region(struct memblock_type *type, phys_addr_t base, ph
> return 0;
> }
>
> -long memblock_add(phys_addr_t base, phys_addr_t size)
> +long __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
> {
> return memblock_add_region(&memblock.memory, base, size);
>
> }
>
> -static long __memblock_remove(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
> +static long __init_memblock __memblock_remove(struct memblock_type *type, phys_addr_t base, phys_addr_t size)
> {
> phys_addr_t rgnbegin, rgnend;
> phys_addr_t end = base + size;
> @@ -396,7 +396,7 @@ static long __memblock_remove(struct memblock_type *type, phys_addr_t base, phys
> return memblock_add_region(type, end, rgnend - end);
> }
>
> -long memblock_remove(phys_addr_t base, phys_addr_t size)
> +long __init_memblock memblock_remove(phys_addr_t base, phys_addr_t size)
> {
> return __memblock_remove(&memblock.memory, base, size);
> }
> @@ -562,7 +562,7 @@ phys_addr_t __init memblock_phys_mem_size(void)
> return memblock.memory_size;
> }
>
> -phys_addr_t memblock_end_of_DRAM(void)
> +phys_addr_t __init_memblock memblock_end_of_DRAM(void)
> {
> int idx = memblock.memory.cnt - 1;
>
> @@ -623,7 +623,7 @@ int __init memblock_is_reserved(phys_addr_t addr)
> return 0;
> }
>
> -int memblock_is_region_reserved(phys_addr_t base, phys_addr_t size)
> +int __init_memblock memblock_is_region_reserved(phys_addr_t base, phys_addr_t size)
> {
> return memblock_overlaps_region(&memblock.reserved, base, size);
> }
> @@ -634,7 +634,7 @@ void __init memblock_set_current_limit(phys_addr_t limit)
> memblock.current_limit = limit;
> }
>
> -static void memblock_dump(struct memblock_type *region, char *name)
> +static void __init_memblock memblock_dump(struct memblock_type *region, char *name)
> {
> unsigned long long base, size;
> int i;
> @@ -650,7 +650,7 @@ static void memblock_dump(struct memblock_type *region, char *name)
> }
> }
>
> -void memblock_dump_all(void)
> +void __init_memblock memblock_dump_all(void)
> {
> if (!memblock_debug)
> return;
> @@ -716,7 +716,7 @@ static int __init early_memblock(char *p)
> }
> early_param("memblock", early_memblock);
>
> -#ifdef CONFIG_DEBUG_FS
> +#if defined(CONFIG_DEBUG_FS) && !defined(ARCH_DISCARD_MEMBLOCK)
>
> static int memblock_debug_show(struct seq_file *m, void *private)
> {

2010-07-28 05:54:14

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 28/31] memblock: Export MEMBLOCK_ERROR again

On 07/27/2010 10:19 PM, Benjamin Herrenschmidt wrote:
>
> Screw it, I don't like it but I'll just split your patch in two for now
> and keep 0. It's a bit fishy but memblock does mostly top-down
> allocations and so shouldn't hit 0, and in practice the region at 0 is,
> I beleive, reserved, but we need to be extra careful and might need to
> revisit that a bit.
>
> That's an area where I don't completely agree with Linus, ie, 0 is a
> perfectly valid physical address for memblock to return :-)
>

On x86, physical address 0 contains the real-mode IVT and will thus be
reserved, at least for the foreseeable future. Other architectures may
very well have non-special RAM there.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-07-28 06:01:06

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 28/31] memblock: Export MEMBLOCK_ERROR again

From: "H. Peter Anvin" <[email protected]>
Date: Tue, 27 Jul 2010 22:53:21 -0700

> On 07/27/2010 10:19 PM, Benjamin Herrenschmidt wrote:
>>
>> Screw it, I don't like it but I'll just split your patch in two for now
>> and keep 0. It's a bit fishy but memblock does mostly top-down
>> allocations and so shouldn't hit 0, and in practice the region at 0 is,
>> I beleive, reserved, but we need to be extra careful and might need to
>> revisit that a bit.
>>
>> That's an area where I don't completely agree with Linus, ie, 0 is a
>> perfectly valid physical address for memblock to return :-)
>>
>
> On x86, physical address 0 contains the real-mode IVT and will thus be
> reserved, at least for the forseeable future. Other architectures may
> very well have non-special RAM there.

0 is very much possible on sparc64

2010-07-28 06:09:27

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 31/31] memblock: Add memblock_find_in_range()

On 07/27/2010 10:36 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2010-07-22 at 11:21 -0700, Yinghai Lu wrote:
>> it is a wrapper for memblock_find_base
>>
>> make it more easy for x86 to use memblock. ( rebase )
>> x86 early_res is using find/reserve pattern instead of alloc.
>>
>> keep it in weak version, so later We can use x86 own version if needed.
>> also We need it in mm/memblock.c, so one caller mm/page_alloc.c could get compiled
>>
>> -v2: Change name to memblock_find_in_range() according to Michael Ellerman
>> -v3: Add generic weak version __memblock_find_in_range()
>> so keep the path for fallback to x86 version that handle from low
>> -v4: use 0 for failing path
>> -v5: use MEMBLOCK_ERROR again
>> -v6: remove __memblock_find_in_range()
>
> It's very gross to have this weak and not memblock_find_base()... IE.
>
> You create a new function defined as a wrapper on an existing one to
> provide an easier set of arguments ... but also make it weak so the
> arch can completely change its semantics without actually changing
> the semantics of the function it wraps.
>
> This is going to cause confusion and bugs. I'm adding the patch without
> the weak bit to my branch for now, we need to discuss what is the best
> approach for x86 here. Might be to use a different function. I don't
> understand yet -why- x86 needs to override it, maybe the right way is
> to reserve things more intelligently on x86 ?

Again, there is a difference between low-to-high and high-to-low allocation.

For example: with high-to-low allocation, kexec'ing from the first kernel
into a second kernel always works fine, except on systems with a QLogic
card, because the QLogic card uses main RAM for EFT etc., for the card's
firmware log trace. The second kernel has no idea that this RAM is being
used by the first kernel for that purpose, and the card keeps using it
across the two kernels. The second kernel can crash if it tries to use
that RAM.

Low-to-high allocation seems to be safe; the second kernel can slip
through and boot fine.

Thanks

Yinghai

2010-07-28 06:15:23

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 28/31] memblock: Export MEMBLOCK_ERROR again

On 07/27/2010 11:01 PM, David Miller wrote:
> From: "H. Peter Anvin" <[email protected]>
> Date: Tue, 27 Jul 2010 22:53:21 -0700
>
>> On 07/27/2010 10:19 PM, Benjamin Herrenschmidt wrote:
>>>
>>> Screw it, I don't like it but I'll just split your patch in two for now
>>> and keep 0. It's a bit fishy but memblock does mostly top-down
>>> allocations and so shouldn't hit 0, and in practice the region at 0 is,
>>> I beleive, reserved, but we need to be extra careful and might need to
>>> revisit that a bit.
>>>
>>> That's an area where I don't completely agree with Linus, ie, 0 is a
>>> perfectly valid physical address for memblock to return :-)
>>>
>>
>> On x86, physical address 0 contains the real-mode IVT and will thus be
>> reserved, at least for the forseeable future. Other architectures may
>> very well have non-special RAM there.
>
> 0 is very much possible on sparc64

So should we still keep MEMBLOCK_ERROR as (~(phys_addr_t)0)?

We can change some variables from unsigned long to phys_addr_t where they
are assigned from memblock_find_base().

That would avoid the casting too.

Thanks

Yinghai
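
A minimal sketch of the cleanup Yinghai suggests, as a hypothetical diff
(the call site and variable name are illustrative, not real code):

-	unsigned long addr = (unsigned long)memblock_find_base(size, align, 0, limit);
+	phys_addr_t addr = memblock_find_base(size, align, 0, limit);

With the variable carrying phys_addr_t all the way through, no cast is
needed and the error check can compare against MEMBLOCK_ERROR directly.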

2010-07-28 06:23:39

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 31/31] memblock: Add memblock_find_in_range()

On 07/27/2010 10:36 PM, Benjamin Herrenschmidt wrote:
>
> It's very gross to have this weak and not memblock_find_base()... IE.
>
> You create a new function defined as a wrapper on an existing one to
> provide an easier set of arguments ... but also make it weak so the
> arch can completely change its semantics without actually changing
> the semantics of the function it wraps.
>
> This is going to cause confusion and bugs. I'm adding the patch without
> the weak bit to my branch for now, we need to discuss what is the best
> approach for x86 here. Might be to use a different function. I don't
> understand yet -why- x86 needs to override it, maybe the right way is
> to reserve things more intelligently on x86 ?
>
> In any case, you can always use your own wrapper there if needed
>

I'm really confused by this as well. First of all, this looks like nothing
more than a wrapper that swizzles the argument order, which is a good
recipe for problems.

The second thing is that the proposed x86 code seems to do something I
would consider to be a core service, which is find an allocation outside
any reserved region, but it does so by looking at two different data
structures. Instead the logical thing would be to knock a reserved
block out of the available set, so that there is always a data structure
which contains the currently free and available memory. I think the
best thing would be if the same data structure could also handle
reserved memory types (by carrying an attribute), but if that is not
possible, there can be a reserved memblock structure and an available
memblock structure, alternatively (and equivalently) an "all memory"
memblock structure and an available memblock structure, but unavailable
memory should not be in the available structure.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
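
As an illustration of the bookkeeping hpa proposes -- knocking a reserved
block out of the available set -- here is a small, self-contained sketch in
plain C. The struct and function are hypothetical, not the real memblock
types:

struct region {
	unsigned long long base, size;
};

/* Subtract one reserved region from one free region, emitting up to
 * two remaining pieces.  Returns the number of pieces written. */
static int subtract_region(struct region free, struct region resv,
			   struct region out[2])
{
	unsigned long long free_end = free.base + free.size;
	unsigned long long resv_end = resv.base + resv.size;
	int n = 0;

	if (resv_end <= free.base || resv.base >= free_end) {
		out[n++] = free;		/* no overlap: keep as-is */
		return n;
	}
	if (resv.base > free.base) {		/* piece below the hole */
		out[n].base = free.base;
		out[n].size = resv.base - free.base;
		n++;
	}
	if (resv_end < free_end) {		/* piece above the hole */
		out[n].base = resv_end;
		out[n].size = free_end - resv_end;
		n++;
	}
	return n;
}

Applying this over every (free, reserved) pair leaves a structure that
contains only currently available memory, which is what hpa is asking for.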

2010-07-28 06:30:28

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 28/31] memblock: Export MEMBLOCK_ERROR again

On Tue, 2010-07-27 at 22:53 -0700, H. Peter Anvin wrote:
> On 07/27/2010 10:19 PM, Benjamin Herrenschmidt wrote:
> >
> > Screw it, I don't like it but I'll just split your patch in two for now
> > and keep 0. It's a bit fishy but memblock does mostly top-down
> > allocations and so shouldn't hit 0, and in practice the region at 0 is,
> > I beleive, reserved, but we need to be extra careful and might need to
> > revisit that a bit.
> >
> > That's an area where I don't completely agree with Linus, ie, 0 is a
> > perfectly valid physical address for memblock to return :-)
> >
>
> On x86, physical address 0 contains the real-mode IVT and will thus be
> reserved, at least for the forseeable future. Other architectures may
> very well have non-special RAM there.

Right, that's my point. Anyways, I'm making 0 special for now and adding
a wart to prevent the allocator from returning something below
PAGE_SIZE. If we want to revisit that later we can.

Cheers,
Ben.
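
A minimal sketch of the wart Ben describes, assuming a clamp at the bottom
of the search window inside memblock_find_base(); this is illustrative,
not his actual patch:

	/* With MEMBLOCK_ERROR == 0, a successful allocation at physical
	 * address 0 would be indistinguishable from failure, so never
	 * let the search window start below the first page. */
	if (start < PAGE_SIZE)
		start = PAGE_SIZE;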

2010-07-28 06:38:55

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 31/31] memblock: Add memblock_find_in_range()

On 07/27/2010 11:08 PM, Yinghai Lu wrote:
>
> for example:
> high/low allocation, from first kernel to kexec second kernel, always work fine except system with Qlogic card.
> because Qlogic card is using main RAM as EFT etc for card's FW log trace. second kernel have not idea that those RAM
> is used by first kernel for that purpose. that the CARD still use that between two kernels.
> second kernel could have crash it try to use those ram.
>

Uhm, no. That's a bug in the Qlogic driver not shutting the card down
cleanly. Hacking around that in memory allocation order is braindamaged
in the extreme. kexec *cannot* be safe in any way if we don't shut down
pending DMA, and what you describe above is DMA.

> low/high allocation seems to be safe, second kernel can slip to boot fine.

Low to high is just broken. Low memory is a special, desirable
resource, and we should minimize the use of it.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-07-28 07:14:14

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 31/31] memblock: Add memblock_find_in_range()

On 07/27/2010 11:38 PM, H. Peter Anvin wrote:
> On 07/27/2010 11:08 PM, Yinghai Lu wrote:
>>
>> for example:
>> high/low allocation, from first kernel to kexec second kernel, always work fine except system with Qlogic card.
>> because Qlogic card is using main RAM as EFT etc for card's FW log trace. second kernel have not idea that those RAM
>> is used by first kernel for that purpose. that the CARD still use that between two kernels.
>> second kernel could have crash it try to use those ram.
>>
>
> Uhm, no. That's a bug in the Qlogic driver not shutting the card down
> cleanly. Hacking around that in memory allocation order is braindamaged
> in the extreme. kexec *cannot* be safe in any way if we don't shut down
> pending DMA, and what you describe above is DMA.
>

The problem is that if a user hits this later, it will be called a
"regression" after bisecting to the memblock/x86 changes, because the
low-to-high order did work before.

BTW, QLogic's design of saving the log in main RAM is not a good one;
they may have saved a few cents on RAM for the card.

Other vendors seem to put the log/trace in RAM on the card.

Thanks

Yinghai

2010-07-28 09:26:46

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 28/31] memblock: Export MEMBLOCK_ERROR again

On Tue, 2010-07-27 at 23:01 -0700, David Miller wrote:
> > On x86, physical address 0 contains the real-mode IVT and will thus be
> > reserved, at least for the forseeable future. Other architectures may
> > very well have non-special RAM there.
>
> 0 is very much possible on sparc64

Yup, we need to fix that.

For now I made MEMBLOCK_ERROR 0 and added a blurb to prevent allocating
the first page, since it would cause other problems with the current code
(0 is, after all, the normal error result from memblock_alloc(), i.e. our
API wasn't quite consistent there).

So I don't think I'm introducing a regression here, on the contrary. But
if we are going to allow lmb_alloc() to return 0, we need to fix all
callers first, and then we can look into turning MEMBLOCK_ERROR back to
~0.

Cheers,
Ben.
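
An illustration of the caller audit Ben describes, at a hypothetical call
site (not real code). While the error value is 0 the two tests agree; with
MEMBLOCK_ERROR back at ~0, the first test would silently accept the error
value as a valid address:

	phys_addr_t addr = memblock_alloc(size, align);

	if (!addr)			/* only correct while error == 0 */
		panic("out of early memory");

	if (addr == MEMBLOCK_ERROR)	/* correct for either convention */
		panic("out of early memory");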

2010-07-28 09:30:06

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 28/31] memblock: Export MEMBLOCK_ERROR again

On Tue, 2010-07-27 at 23:13 -0700, Yinghai Lu wrote:
> So still keep MEMBLOCK_ERROR to (~(phys_addr_t)0) ?

No point for now. lmb_alloc() etc... are defined as returning 0 for
failure so we would need to fix that too.

As I said in a previous email we do need to revisit the memblock error
handling.

> We can change some variable from unsigned long to phys_addr_t that
> will be assigned by memblock_find_base().
>
> that could avoid casting too.

That would be a good idea anyways if those are indeed carrying a
physical address.

Ben.

2010-07-28 10:02:46

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 31/31] memblock: Add memblock_find_in_range()

On Tue, 2010-07-27 at 23:08 -0700, Yinghai Lu wrote:
>
> for example:
> high/low allocation, from first kernel to kexec second kernel, always
> work fine except system with Qlogic card.
> because Qlogic card is using main RAM as EFT etc for card's FW log
> trace. second kernel have not idea that those RAM
> is used by first kernel for that purpose. that the CARD still use
> that between two kernels.
> second kernel could have crash it try to use those ram.
>
> low/high allocation seems to be safe, second kernel can slip to boot
> fine.

No, it works 'by chance'. You need kexec to somehow mark those regions
as reserved. I don't know how x86 does those things; on architectures
using the flat device tree, we have added a concept of a "reserve map" to
the flat device tree blob to mark that kind of region.

Also, because you mark your new function as weak but not the one that's
actually used by memblock_alloc(), it will still end up being top-down,
so if you want to switch to bottom up, make the internal function weak,
not the wrapper.

Cheers,
Ben.
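
A sketch of what Ben suggests -- weak on the internal function, not the
wrapper -- with the real search body elided; illustrative only, not
mm/memblock.c as merged:

/* Weak default: an arch that overrides this one function changes the
 * policy for every caller, memblock_alloc() and the wrapper alike. */
phys_addr_t __init __weak memblock_find_base(phys_addr_t size, phys_addr_t align,
					     phys_addr_t start, phys_addr_t end)
{
	/* default policy: top-down search over memblock.memory
	 * (body elided in this sketch) */
	return MEMBLOCK_ERROR;
}

/* The wrapper stays a plain, non-weak argument shuffle, so its
 * semantics can never diverge from the function it wraps. */
u64 __init memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
{
	return memblock_find_base(size, align, start, end);
}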

2010-07-28 16:06:56

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 31/31] memblock: Add memblock_find_in_range()

On 07/28/2010 12:12 AM, Yinghai Lu wrote:
>
> the problem is later if the user hit the problem, it will be called "Regression" after bisecting to the memblock/x86 changes.
> because low/high does work before.
>
> BTW, that design from qlogic to save log in RAM is not good one, they may save some cents for the ram in card.
>
> other vendors seems put log/trace in the ram on card.
>

It's broken NOW. The only reason it's not exploding is by accident.
The fact that you knew about the problem and had the notion of working
around it instead of fixing the root cause by either fixing or
blacklisting the broken driver is disgusting beyond belief.

Either the driver needs to mark the memory off-limits in the e820 map
passed to kexec, or, better yet, it should have its DMA disabled across
kexec. I'm truly appalled.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
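
A hedged sketch of the kind of fix hpa is asking for -- not the actual
qla2xxx code -- giving the driver a PCI .shutdown hook so an "overwrite"
kexec quiesces the card's DMA before the new kernel takes over:

#include <linux/pci.h>

static void qla_example_shutdown(struct pci_dev *pdev)
{
	/* A real driver would first stop the firmware trace (EFT/FCE)
	 * and free its DMA buffers; then stop the device from
	 * initiating DMA by clearing bus mastering. */
	pci_clear_master(pdev);
	pci_disable_device(pdev);
}

static struct pci_driver qla_example_driver = {
	.name		= "qla_example",
	.shutdown	= qla_example_shutdown,
	/* .probe, .remove, .id_table elided in this sketch */
};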

2010-07-28 17:02:36

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH 31/31] memblock: Add memblock_find_in_range()

On Tue, 2010-07-27 at 23:38 -0700, H. Peter Anvin wrote:
> On 07/27/2010 11:08 PM, Yinghai Lu wrote:
> >
> > for example:
> > high/low allocation, from first kernel to kexec second kernel, always work fine except system with Qlogic card.
> > because Qlogic card is using main RAM as EFT etc for card's FW log trace. second kernel have not idea that those RAM
> > is used by first kernel for that purpose. that the CARD still use that between two kernels.
> > second kernel could have crash it try to use those ram.
> >
>
> Uhm, no. That's a bug in the Qlogic driver not shutting the card down
> cleanly. Hacking around that in memory allocation order is braindamaged
> in the extreme. kexec *cannot* be safe in any way if we don't shut down
> pending DMA, and what you describe above is DMA.

That's not the kexec for crash dump requirement as it was communicated
to us. We were specifically told that the shutdown routines *may* not
be called before booting the kexec kernel and thus we have to take
action to stop the DMA engines in the init routines so the kexec kernel
can halt all in-progress DMA as it boots. This implies that kexec must
be able to cope with in-progress DMA.

James

2010-07-28 17:54:48

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 31/31] memblock: Add memblock_find_in_range()

On 07/28/2010 10:02 AM, James Bottomley wrote:
> On Tue, 2010-07-27 at 23:38 -0700, H. Peter Anvin wrote:
>> On 07/27/2010 11:08 PM, Yinghai Lu wrote:
>>>
>>> for example:
>>> high/low allocation, from first kernel to kexec second kernel, always work fine except system with Qlogic card.
>>> because Qlogic card is using main RAM as EFT etc for card's FW log trace. second kernel have not idea that those RAM
>>> is used by first kernel for that purpose. that the CARD still use that between two kernels.
>>> second kernel could have crash it try to use those ram.
>>>
>>
>> Uhm, no. That's a bug in the Qlogic driver not shutting the card down
>> cleanly. Hacking around that in memory allocation order is braindamaged
>> in the extreme. kexec *cannot* be safe in any way if we don't shut down
>> pending DMA, and what you describe above is DMA.
>
> That's not the kexec for crash dump requirement as it was communicated
> to us. We were specifically told that the shutdown routines *may* not
> be called before booting the kexec kernel and thus we have to take
> action to stop the DMA engines in the init routines so the kexec kernel
> can halt all in-progress DMA as it boots. This implies that kexec must
> be able to cope with in-progress DMA.
>

kexec for crash dump is a special case: for crash dump, there is a chunk
of memory pre-reserved for the crash kernel, and that is the *only*
memory that the crash kernel will use. In other words, everything else
is reserved memory as far as the crash kernel is concerned. As such, it
should not be affected; there may be DMA still pending to the main
kernel's memory area, of course, but as far as the crash kernel is
concerned, that should just be input data.

If allocation order somehow matters for the *crash kernel*, then we have
even more fundamental problems...

Obviously, if there is DMA going on to the crash kernel reserved region
then all bets are off, but at that point the system is so screwed anyway
that it shouldn't matter.

-hpa
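
For reference, the pre-reserved chunk hpa describes is what the
crashkernel= boot parameter sets up on the first kernel's command line,
e.g.:

	crashkernel=128M@16M

which reserves 128 MB at physical address 16 MB; the crash kernel is then
loaded into, and confined to, that region.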

2010-07-28 18:10:40

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH 31/31] memblock: Add memblock_find_in_range()

On Wed, 2010-07-28 at 10:53 -0700, H. Peter Anvin wrote:
> On 07/28/2010 10:02 AM, James Bottomley wrote:
> > On Tue, 2010-07-27 at 23:38 -0700, H. Peter Anvin wrote:
> >> On 07/27/2010 11:08 PM, Yinghai Lu wrote:
> >>>
> >>> for example:
> >>> high/low allocation, from first kernel to kexec second kernel, always work fine except system with Qlogic card.
> >>> because Qlogic card is using main RAM as EFT etc for card's FW log trace. second kernel have not idea that those RAM
> >>> is used by first kernel for that purpose. that the CARD still use that between two kernels.
> >>> second kernel could have crash it try to use those ram.
> >>>
> >>
> >> Uhm, no. That's a bug in the Qlogic driver not shutting the card down
> >> cleanly. Hacking around that in memory allocation order is braindamaged
> >> in the extreme. kexec *cannot* be safe in any way if we don't shut down
> >> pending DMA, and what you describe above is DMA.
> >
> > That's not the kexec for crash dump requirement as it was communicated
> > to us. We were specifically told that the shutdown routines *may* not
> > be called before booting the kexec kernel and thus we have to take
> > action to stop the DMA engines in the init routines so the kexec kernel
> > can halt all in-progress DMA as it boots. This implies that kexec must
> > be able to cope with in-progress DMA.
> >
>
> kexec for crash dump is a special case: for crash dump, there is a chunk
> of memory pre-reserved for the crash kernel, and that is the *only*
> memory that the crash kernel will use. In other words, everything else
> is reserved memory as far as the crash kernel is concerned. As such, it
> should not be affected; there may be DMA still pending to the main
> kernel's memory area, of course, but as far as the crash kernel is
> concerned, that should just be input data.
>
> If allocation order somehow matters for the *crash kernel*, then we have
> even more fundamental problems...
>
> Obviously, if there is DMA going on to the crash kernel reserved region
> then all bets are off, but at that point the system is so screwed anyway
> that it shouldn't matter.

So I don't understand the problem. Proper shutdown of the old kernel
will halt all the DMA engines (by design ... we can't have DMA ongoing
if the next action might be power off). The only case I know where DMA
engines may be active is the crash kernel case.

James

2010-07-28 18:30:55

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 31/31] memblock: Add memblock_find_in_range()

On 07/28/2010 11:10 AM, James Bottomley wrote:
>
> So I don't understand the problem. Proper shutdown of the old kernel
> will halt all the DMA engines (by design ... we can't have DMA ongoing
> if the next action might be power off). The only case I know where DMA
> engines may be active is the crash kernel case.
>

I'm not sure I fully understand the exact problem, either; not being
familiar with this putative "logging" facility of the Qlogic devices.
My point was largely that if a device causes failures because of the
choice of the allocation order, then we have a much bigger problem and
papering over it by trying to muck with the allocation order is just wrong.

This logging facility of Qlogic is DMA, no more, no less. It needs to
be shut down on an "overwrite" kexec, where we replace one kernel with
another, as opposed to a crash dump kexec, where we use a reserved chunk
of virgin memory. What I don't know/understand at the moment is whether
there is something "special" about this particular logging facility,
e.g. whether the Qlogic card ignores the bus-mastering control bit -- which
would be reckless, but I can see someone having the bright idea to do that.

Yinghai, do you have any more detail, or know who would? Also copying
the Qlogic Infinipath maintainer email...

-hpa

2010-07-28 19:29:09

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 31/31] memblock: Add memblock_find_in_range()

On 07/28/2010 11:30 AM, H. Peter Anvin wrote:
> On 07/28/2010 11:10 AM, James Bottomley wrote:
>>
>> So I don't understand the problem. Proper shutdown of the old kernel
>> will halt all the DMA engines (by design ... we can't have DMA ongoing
>> if the next action might be power off). The only case I know where DMA
>> engines may be active is the crash kernel case.
>>
>
> I'm not sure I fully understand the exact problem, either; not being
> familiar with this putative "logging" facility of the Qlogic devices.
> My point was largely that if a device causes failures because of the
> choice of the allocation order, then we have a much bigger problem and
> papering over it by trying to muck with the allocation order is just wrong.
>
> This logging facility of Qlogic is DMA, no more, no less. It needs to
> be shut down on a "overwrite" kexec, where we replace one kernel with
> another, as opposed to a crash dump kexec, where we use a reserved chunk
> of virgin memory. What I don't know/understand at the moment is if
> there is something "special" about this particular logging facility,
> e.g. if the Qlogic card ignore the bus mastering control bit -- which
> would be reckless but I can see someone having the bright idea to do that.
>
> Yinghai, do you have any more detail, or know who would? Also copying
> the Qlogic Infinipath maintainer email...

When I was debugging memblock with x86, I found the strange crash with
high-to-low allocation.

I then ran kexec with "memtest" on the command line, and the early memtest
did find some bad memory.

Then I added more printing of the EFT physical addresses in the first
kernel, and it showed that the range was being used by the qla driver in
the first kernel.

I build all needed drivers into the kernel so I can PXE-boot it on all
test platforms easily.

Thanks

Yinghai


---
drivers/scsi/qla2xxx/qla_init.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux-2.6/drivers/scsi/qla2xxx/qla_init.c
===================================================================
--- linux-2.6.orig/drivers/scsi/qla2xxx/qla_init.c
+++ linux-2.6/drivers/scsi/qla2xxx/qla_init.c
@@ -1327,8 +1327,8 @@ qla2x00_alloc_fw_dump(scsi_qla_host_t *v
goto try_eft;
}

- qla_printk(KERN_INFO, ha, "Allocated (%d KB) for FCE...\n",
- FCE_SIZE / 1024);
+ qla_printk(KERN_INFO, ha, "Allocated (%d KB) at %p for FCE...\n",
+ FCE_SIZE / 1024, tc);

fce_size = sizeof(struct qla2xxx_fce_chain) + FCE_SIZE;
ha->flags.fce_enabled = 1;
@@ -1354,8 +1354,8 @@ try_eft:
goto cont_alloc;
}

- qla_printk(KERN_INFO, ha, "Allocated (%d KB) for EFT...\n",
- EFT_SIZE / 1024);
+ qla_printk(KERN_INFO, ha, "Allocated (%d KB) at %p for EFT...\n",
+ EFT_SIZE / 1024, tc);

eft_size = EFT_SIZE;
ha->eft_dma = tc_dma;
@@ -1383,8 +1383,8 @@ cont_alloc:
}
return;
}
- qla_printk(KERN_INFO, ha, "Allocated (%d KB) for firmware dump...\n",
- dump_size / 1024);
+ qla_printk(KERN_INFO, ha, "Allocated (%d KB) at %p for firmware dump...\n",
+ dump_size / 1024, ha->fw_dump);

ha->fw_dump_len = dump_size;
ha->fw_dump->signature[0] = 'Q';

2010-07-28 20:00:10

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 31/31] memblock: Add memblock_find_in_range()

On 07/28/2010 12:27 PM, Yinghai Lu wrote:
>>
>> Yinghai, do you have any more detail, or know who would? Also copying
>> the Qlogic Infinipath maintainer email...
>
> when I was debug memblock with x86, found the strange crash when high/low.
> then use kexec with "memtest" in command line, and the early memtest does find
> some bad memory.
>
> then I add more print about EPT physical address for first kernel,
> it does show that range is used by qla driver in first kernel.
> I built all needed drivers in kernel so can pxeboot the kernel on all test platforms easily.
>

[Cc: Andrew Vasquez, who seems to have written the offending code,
checkin df613b96077cee826b14089ae6e75eeabf71faa3.]

The question is still open as to why this particular DMA activity was not
shut down before the kexec. I'm not familiar with how non-crashdump kexec
idles the hardware, but it obviously had better do so.

-hpa

2010-07-28 22:58:33

by Ralph Campbell

[permalink] [raw]
Subject: Re: [PATCH 31/31] memblock: Add memblock_find_in_range()

On Wed, 2010-07-28 at 11:30 -0700, H. Peter Anvin wrote:
> On 07/28/2010 11:10 AM, James Bottomley wrote:
> >
> > So I don't understand the problem. Proper shutdown of the old kernel
> > will halt all the DMA engines (by design ... we can't have DMA ongoing
> > if the next action might be power off). The only case I know where DMA
> > engines may be active is the crash kernel case.
> >
>
> I'm not sure I fully understand the exact problem, either; not being
> familiar with this putative "logging" facility of the Qlogic devices.
> My point was largely that if a device causes failures because of the
> choice of the allocation order, then we have a much bigger problem and
> papering over it by trying to muck with the allocation order is just wrong.
>
> This logging facility of Qlogic is DMA, no more, no less. It needs to
> be shut down on a "overwrite" kexec, where we replace one kernel with
> another, as opposed to a crash dump kexec, where we use a reserved chunk
> of virgin memory. What I don't know/understand at the moment is if
> there is something "special" about this particular logging facility,
> e.g. if the Qlogic card ignore the bus mastering control bit -- which
> would be reckless but I can see someone having the bright idea to do that.
>
> Yinghai, do you have any more detail, or know who would? Also copying
> the Qlogic Infinipath maintainer email...
>
> -hpa

I read the messages in this thread, but I don't understand what the
problem is. Something to do with logging, DMA, and crash dumps, but it
also sounds like the original discussion may be confused about how the
InfiniBand HCA cards work.

Can someone summarize what is going on...

2010-07-28 23:42:53

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 31/31] memblock: Add memblock_find_in_range()

On 07/28/2010 03:58 PM, Ralph Campbell wrote:
>
> I read the messages in this thread but I don't understand what the
> problem is. Something to do with logging, DMA and crash dumps but
> it also sounds like the original discussion may be confused about
> how the Infiniband HCA cards work.
>
> Can someone summarize what is going on...
>

Sorry, I was confused... this had to do with the qla driver, not Infinipath.

-hpa