2010-06-22 17:28:17

by Yinghai Lu

Subject: [PATCH -v19 00/25] Use lmb with x86

The new lmb can be used to replace early_res on x86.

Suggested by: David, Ben, and Thomas

-v6: change sequence as requested by Thomas
-v7: separate them into more patches
-v8: add boundary checking to make sure a partial page is not freed.
-v9: use lmb_debug to control the printout of reserve_lmb.
add e820 cleanup, and e820 becomes __initdata
-v10:use lmb.rmo_size and ARCH_DISCARD_LMB according to Michael
change names to lmb_find_area/reserve_lmb_area/free_lmb_area,
according to Michael
update find_lmb_area to use __lmb_alloc_base according to Ben
-v11:move find_lmb_area_size back to x86.
x86 has its own find_lmb_area, which could be disabled by ARCH_LMB_FIND_AREA,
because _lmb_find_base behaves differently from x86's old one:
one searches from high to low and the other from low to high.
needs more testing
tested for x86 32bit/64bit, numa/nonuma, nobootmem/bootmem.
-v12:refresh the series with current tip
separate nobootmem.c, so some #ifdefs could be removed
still keep CONFIG_NO_BOOTMEM in the x86 .c files, and could use them as tags
so other lmb users could refer to them to use NO_BOOTMEM.

-v14:refresh to current tip

-v15:remove x86 version lmb_find_area
remove other nobootmem and x86 e820 from this patchset

-v16: rebase to Ben's cleanup powerpc/lmb
move most functions back to arch/x86/mm/lmb.c

-v17: stop exposing lmb_add_region
separate the first lmb core related patch into several smaller patches.

-v18: change lmb_find_area to lmb_find_in_range
kill __lmb_find_area and use lmb_find_area directly
remove lmb_add_memory
change lmb_reserve_area to lmb_reserve_range
change lmb_free_area to lmb_free_range
don't clear lmb.reserved after converting
use for_each_lmb to replace the for loop
rebase to 06/15/2010 powerpc/lmb

-v19: make the patchset focus only on lmb-related changes.
will submit the patches for separating bootmem/nobootmem and the other
e820-related ones later, after this one.

This patchset is based on tip/master+powerpc/lmb

todo:
replace range handling (subtracting) with lmb.

Thanks

Yinghai Lu

arch/x86/Kconfig | 9 +-
arch/x86/include/asm/e820.h | 21 +-
arch/x86/include/asm/efi.h | 2 +-
arch/x86/include/asm/lmb.h | 21 ++
arch/x86/kernel/acpi/sleep.c | 7 +-
arch/x86/kernel/apic/numaq_32.c | 3 +-
arch/x86/kernel/check.c | 16 +-
arch/x86/kernel/e820.c | 197 ++++----------
arch/x86/kernel/efi.c | 5 +-
arch/x86/kernel/head.c | 3 +-
arch/x86/kernel/head32.c | 10 +-
arch/x86/kernel/head64.c | 7 +-
arch/x86/kernel/mpparse.c | 5 +-
arch/x86/kernel/setup.c | 76 ++++--
arch/x86/kernel/setup_percpu.c | 6 -
arch/x86/kernel/trampoline.c | 8 +-
arch/x86/mm/Makefile | 2 +
arch/x86/mm/init.c | 7 +-
arch/x86/mm/init_32.c | 31 +-
arch/x86/mm/init_64.c | 38 +--
arch/x86/mm/k8topology_64.c | 4 +-
arch/x86/mm/lmb.c | 398 ++++++++++++++++++++++++++
arch/x86/mm/memtest.c | 7 +-
arch/x86/mm/numa_32.c | 25 +-
arch/x86/mm/numa_64.c | 45 ++--
arch/x86/mm/srat_32.c | 3 +-
arch/x86/mm/srat_64.c | 13 +-
arch/x86/xen/mmu.c | 5 +-
arch/x86/xen/setup.c | 3 +-
include/linux/early_res.h | 23 --
include/linux/lmb.h | 17 ++
include/linux/mm.h | 2 +
kernel/Makefile | 1 -
kernel/early_res.c | 584 ---------------------------------------
lib/lmb.c | 65 +++--
lib/swiotlb.c | 16 +-
mm/bootmem.c | 11 +-
mm/page_alloc.c | 72 +++--
mm/sparse-vmemmap.c | 11 -
39 files changed, 777 insertions(+), 1002 deletions(-)
create mode 100644 arch/x86/include/asm/lmb.h
create mode 100644 arch/x86/mm/lmb.c
delete mode 100644 include/linux/early_res.h
delete mode 100644 kernel/early_res.c


2010-06-22 17:28:20

by Yinghai Lu

Subject: [PATCH 01/25] lmb: lmb_find_base() should return LMB_ERROR on failing path

All callers assume it returns LMB_ERROR when it fails to find a range.
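
To illustrate why the failing path matters, here is a minimal user-space sketch, not kernel code. It assumes the failure sentinel is an all-ones value (patch 20 in this series replaces -1ULL checks with LMB_ERROR, which suggests as much); every sketch_* name is hypothetical and exists only for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sketch_phys_addr_t;

/* Assumption: the failure sentinel is all-ones, never a usable address. */
#define SKETCH_LMB_ERROR (~(sketch_phys_addr_t)0)

/* Hypothetical finder with the old failing path: returns 0 on failure. */
sketch_phys_addr_t sketch_find_base_old(int have_room)
{
	return have_room ? 0x100000 : 0;
}

/* Hypothetical finder with the fixed failing path. */
sketch_phys_addr_t sketch_find_base_new(int have_room)
{
	return have_room ? 0x100000 : SKETCH_LMB_ERROR;
}

/* What a caller does: anything but the sentinel is taken as a valid base. */
int sketch_caller_sees_failure(sketch_phys_addr_t found)
{
	return found == SKETCH_LMB_ERROR;
}

/* Self-check: only the fixed version reports the failure to the caller. */
void sketch_selfcheck(void)
{
	assert(!sketch_caller_sees_failure(sketch_find_base_old(0))); /* missed */
	assert(sketch_caller_sees_failure(sketch_find_base_new(0)));  /* caught */
}
```

With the old return value, a caller that checks only for the sentinel would go on to use physical address 0 as if the search had succeeded.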

Signed-off-by: Yinghai Lu <[email protected]>
---
lib/lmb.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/lmb.c b/lib/lmb.c
index 2421b2a..13d1a04 100644
--- a/lib/lmb.c
+++ b/lib/lmb.c
@@ -154,7 +154,7 @@ static phys_addr_t __init lmb_find_base(phys_addr_t size, phys_addr_t align,
if (found != LMB_ERROR)
return found;
}
- return 0;
+ return LMB_ERROR;
}

static void lmb_remove_region(struct lmb_type *type, unsigned long r)
--
1.6.4.2

2010-06-22 17:28:28

by Yinghai Lu

Subject: [PATCH 07/25] lmb: Add lmb_find_in_range()

It is a wrapper for lmb_find_base().

It makes it easier for x86 to use lmb. ( rebase )
x86 early_res uses a find/reserve pattern instead of alloc.

Keep it as a weak version, so that later we can use an x86-specific version if needed.
We also need it in lib/lmb.c, so that one caller, mm/page_alloc.c, can get compiled.
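
The find/reserve pattern mentioned above can be sketched in user space as follows. This is a toy model under stated assumptions, not the lmb implementation: the reserved table, the top-down search, the all-ones error value, and every sketch_* name are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

#define SKETCH_ERROR        (~(u64)0)	/* assumed all-ones failure value */
#define SKETCH_MAX_RESERVED 8

struct sketch_range { u64 start, end; };	/* [start, end) */

static struct sketch_range sketch_reserved[SKETCH_MAX_RESERVED];
static int sketch_nr_reserved;

/* Does [start, end) intersect any reserved range? */
static int sketch_overlaps(u64 start, u64 end)
{
	for (int i = 0; i < sketch_nr_reserved; i++)
		if (start < sketch_reserved[i].end &&
		    sketch_reserved[i].start < end)
			return 1;
	return 0;
}

/* Step 1: find only. Top-down search; align must be a power of two. */
u64 sketch_find_in_range(u64 start, u64 end, u64 size, u64 align)
{
	u64 base;

	assert(align && (align & (align - 1)) == 0);
	if (size == 0 || end < start || end - start < size)
		return SKETCH_ERROR;

	base = (end - size) & ~(align - 1);
	for (;;) {
		if (base < start)
			return SKETCH_ERROR;
		if (!sketch_overlaps(base, base + size))
			return base;
		if (base < align)
			return SKETCH_ERROR;
		base -= align;
	}
}

/* Step 2: the caller records the range it decided to keep. */
int sketch_reserve_range(u64 start, u64 end)
{
	if (start >= end || sketch_nr_reserved == SKETCH_MAX_RESERVED)
		return -1;
	sketch_reserved[sketch_nr_reserved++] =
		(struct sketch_range){ start, end };
	return 0;
}
```

A caller first asks for a candidate base, checks it against the error value, and only then reserves it. Until the reserve call, a second find may return the same address, which is the main difference from an alloc-style interface.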

-v2: Change name to lmb_find_in_range() according to Michael Ellerman
-v3: Add generic weak version __lmb_find_in_range()
to keep a fallback path to the x86 version that searches from low
-v4: use 0 for failing path
-v5: use LMB_ERROR again
-v6: remove __lmb_find_in_range()

Signed-off-by: Yinghai Lu <[email protected]>
---
include/linux/lmb.h | 2 ++
lib/lmb.c | 8 ++++++++
2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/include/linux/lmb.h b/include/linux/lmb.h
index 5310c7b..6ca5659 100644
--- a/include/linux/lmb.h
+++ b/include/linux/lmb.h
@@ -45,6 +45,8 @@ extern int lmb_debug;
extern int lmb_can_resize;
extern struct lmb_region lmb_reserved_init_regions[];

+u64 lmb_find_in_range(u64 start, u64 end, u64 size, u64 align);
+
extern void __init lmb_init(void);
extern void __init lmb_analyze(void);
extern long lmb_add(phys_addr_t base, phys_addr_t size);
diff --git a/lib/lmb.c b/lib/lmb.c
index e45e967..2e00159 100644
--- a/lib/lmb.c
+++ b/lib/lmb.c
@@ -156,6 +156,14 @@ static phys_addr_t __init lmb_find_base(phys_addr_t size, phys_addr_t align,
return LMB_ERROR;
}

+/*
+ * Find a free area with specified alignment in a specific range.
+ */
+u64 __init __weak lmb_find_in_range(u64 start, u64 end, u64 size, u64 align)
+{
+ return lmb_find_base(size, align, start, end);
+}
+
static void __init_lmb lmb_remove_region(struct lmb_type *type, unsigned long r)
{
unsigned long i;
--
1.6.4.2

2010-06-22 17:29:00

by Yinghai Lu

Subject: [PATCH 17/25] x86, lmb: Add lmb_memory_in_range()

It returns the memory size in the specified range according to lmb.memory.region

Try to share some code with lmb_free_memory_in_range() by passing get_free to
__lmb_memory_in_range().
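
The sharing described above can be sketched in user space like this. It is a simplification: the patch itself works on page frame ranges with subtract_range(), while this toy counts bytes and assumes the reserved ranges do not overlap each other and lie inside memory. The tables and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

struct sketch_range { u64 start, end; };	/* [start, end) in bytes */

/* Toy tables standing in for lmb.memory and lmb.reserved. */
static const struct sketch_range sketch_memory[]   = { { 0x0, 0x10000 } };
static const struct sketch_range sketch_reserved[] = { { 0x1000, 0x3000 } };

/* Bytes of r that fall inside [addr, limit). */
static u64 sketch_clip(struct sketch_range r, u64 addr, u64 limit)
{
	u64 s = r.start > addr ? r.start : addr;
	u64 e = r.end < limit ? r.end : limit;

	return e > s ? e - s : 0;
}

/* Shared helper: get_free selects whether reserved ranges are subtracted. */
static u64 sketch_memory_in_range_common(u64 addr, u64 limit, bool get_free)
{
	u64 total = 0;
	size_t i;

	assert(addr <= limit);
	for (i = 0; i < sizeof(sketch_memory) / sizeof(sketch_memory[0]); i++)
		total += sketch_clip(sketch_memory[i], addr, limit);

	/* Only the "free" variant takes the reserved regions out. */
	if (get_free)
		for (i = 0; i < sizeof(sketch_reserved) / sizeof(sketch_reserved[0]); i++)
			total -= sketch_clip(sketch_reserved[i], addr, limit);

	return total;
}

u64 sketch_free_memory_in_range(u64 addr, u64 limit)
{
	return sketch_memory_in_range_common(addr, limit, true);
}

u64 sketch_total_memory_in_range(u64 addr, u64 limit)
{
	return sketch_memory_in_range_common(addr, limit, false);
}
```

Both public wrappers stay one line long, so the range walk exists in one place only.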

-v2: Ben wants _in_range in the name instead of size

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/lmb.h | 1 +
arch/x86/mm/lmb.c | 18 +++++++++++++++++-
2 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
index 3a304f8..749c224 100644
--- a/arch/x86/include/asm/lmb.h
+++ b/arch/x86/include/asm/lmb.h
@@ -16,5 +16,6 @@ void lmb_register_active_regions(int nid, unsigned long start_pfn,
u64 lmb_hole_size(u64 start, u64 end);
u64 lmb_find_in_range_node(int nid, u64 start, u64 end, u64 size, u64 align);
u64 lmb_free_memory_in_range(u64 addr, u64 limit);
+u64 lmb_memory_in_range(u64 addr, u64 limit);

#endif
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index 991dd55..bd2f60b 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -217,7 +217,7 @@ void __init lmb_to_bootmem(u64 start, u64 end)
}
#endif

-u64 __init lmb_free_memory_in_range(u64 addr, u64 limit)
+static u64 __init __lmb_memory_in_range(u64 addr, u64 limit, bool get_free)
{
int i, count;
struct range *range;
@@ -246,6 +246,10 @@ u64 __init lmb_free_memory_in_range(u64 addr, u64 limit)
}
subtract_range(range, count, 0, addr);
subtract_range(range, count, limit, -1ULL);
+
+ /* Subtract lmb.reserved.region in range ? */
+ if (!get_free)
+ goto sort_and_count_them;
for_each_lmb(reserved, r) {
final_start = PFN_DOWN(r->base);
final_end = PFN_UP(r->base + r->size);
@@ -256,6 +260,8 @@ u64 __init lmb_free_memory_in_range(u64 addr, u64 limit)

subtract_range(range, count, final_start, final_end);
}
+
+sort_and_count_them:
nr_range = clean_sort_range(range, count);

free_size = 0;
@@ -265,6 +271,16 @@ u64 __init lmb_free_memory_in_range(u64 addr, u64 limit)
return free_size << PAGE_SHIFT;
}

+u64 __init lmb_free_memory_in_range(u64 addr, u64 limit)
+{
+ return __lmb_memory_in_range(addr, limit, true);
+}
+
+u64 __init lmb_memory_in_range(u64 addr, u64 limit)
+{
+ return __lmb_memory_in_range(addr, limit, false);
+}
+
void __init lmb_reserve_range(u64 start, u64 end, char *name)
{
if (start == end)
--
1.6.4.2

2010-06-22 17:28:56

by Yinghai Lu

Subject: [PATCH 10/25] x86, lmb: Add lmb_to_bootmem()

lmb_to_bootmem() will reserve lmb.reserved.region in bootmem after bootmem is
set up.

We can use it with all arches that support lmb later.
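
Before a reserved region is handed to bootmem, the patch clips it to the [start, end) window and skips it if nothing is left. A minimal user-space sketch of that clipping step, with a hypothetical helper name and plain integer types, could look like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/*
 * Clip the region [base, base + size) to the window [start, end).
 * Returns false when the region lies entirely outside the window,
 * in which case the outputs must not be used.
 */
bool sketch_clip_to_window(u64 base, u64 size, u64 start, u64 end,
			   u64 *final_start, u64 *final_end)
{
	assert(final_start && final_end);

	*final_start = base > start ? base : start;		/* max() */
	*final_end = base + size < end ? base + size : end;	/* min() */

	return *final_start < *final_end;
}
```

Only the clipped part of each region would then be passed on, so a reservation that straddles the window boundary is never registered outside the memory bootmem manages.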

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/lmb.h | 1 +
arch/x86/mm/lmb.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
index 9d26895..e8207f8 100644
--- a/arch/x86/include/asm/lmb.h
+++ b/arch/x86/include/asm/lmb.h
@@ -4,5 +4,6 @@
#define ARCH_DISCARD_LMB

u64 lmb_find_in_range_size(u64 start, u64 *sizep, u64 align);
+void lmb_to_bootmem(u64 start, u64 end);

#endif
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index 4335f48..a959699 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -86,3 +86,33 @@ u64 __init lmb_find_in_range_size(u64 start, u64 *sizep, u64 align)
return LMB_ERROR;
}

+#ifndef CONFIG_NO_BOOTMEM
+void __init lmb_to_bootmem(u64 start, u64 end)
+{
+ int count;
+ u64 final_start, final_end;
+ struct lmb_region *r;
+
+ /* Take out region array itself */
+ if (lmb.reserved.regions != lmb_reserved_init_regions)
+ lmb_free(__pa(lmb.reserved.regions), sizeof(struct lmb_region) * lmb.reserved.max);
+
+ count = lmb.reserved.cnt;
+ pr_info("(%d early reservations) ==> bootmem [%010llx - %010llx]\n", count, start, end);
+ for_each_lmb(reserved, r) {
+ pr_info(" [%010llx - %010llx] ", (u64)r->base, (u64)r->base + r->size);
+ final_start = max(start, r->base);
+ final_end = min(end, r->base + r->size);
+ if (final_start >= final_end) {
+ pr_cont("\n");
+ continue;
+ }
+ pr_cont(" ==> [%010llx - %010llx]\n", final_start, final_end);
+ reserve_bootmem_generic(final_start, final_end - final_start, BOOTMEM_DEFAULT);
+ }
+
+ /* Put region array back ? */
+ if (lmb.reserved.regions != lmb_reserved_init_regions)
+ lmb_reserve(__pa(lmb.reserved.regions), sizeof(struct lmb_region) * lmb.reserved.max);
+}
+#endif
--
1.6.4.2

2010-06-22 17:29:22

by Yinghai Lu

Subject: [PATCH 20/25] x86: Replace e820_/_early string with lmb_

1. include linux/lmb.h directly, so that references to e820.h can be reduced later.
2. this patch was mostly generated by sed scripts

-v2: use LMB_ERROR instead of -1ULL or -1UL
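
Several call sites in this patch compare against (unsigned long)LMB_ERROR rather than LMB_ERROR itself. A small user-space sketch of why such a cast can matter, assuming the sentinel is a 64-bit all-ones value and the result was stored in a 32-bit variable; the sketch_* names are hypothetical and uint32_t stands in for a 32-bit unsigned long:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t sketch_ulong32;	/* stands in for a 32-bit unsigned long */

/* Assumption: the sentinel is 64 bits wide and all-ones. */
#define SKETCH_LMB_ERROR (~(uint64_t)0)

/* Without the cast, mem is widened to 64 bits and can never match. */
int sketch_failed_no_cast(sketch_ulong32 mem)
{
	return mem == SKETCH_LMB_ERROR;
}

/* With the cast, the sentinel is truncated the same way the result was. */
int sketch_failed_with_cast(sketch_ulong32 mem)
{
	return mem == (sketch_ulong32)SKETCH_LMB_ERROR;
}

/* Self-check: a truncated failure value is only caught with the cast. */
void sketch_selfcheck(void)
{
	sketch_ulong32 mem = (sketch_ulong32)SKETCH_LMB_ERROR;	/* 0xffffffff */

	assert(!sketch_failed_no_cast(mem));
	assert(sketch_failed_with_cast(mem));
}
```

Where the variable is already 64 bits wide, as in the u64 and unsigned long long call sites, the plain comparison against the sentinel is sufficient.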

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/efi.h | 2 +-
arch/x86/kernel/acpi/sleep.c | 7 ++++---
arch/x86/kernel/apic/numaq_32.c | 3 ++-
arch/x86/kernel/efi.c | 5 +++--
arch/x86/kernel/head32.c | 4 ++--
arch/x86/kernel/head64.c | 4 ++--
arch/x86/kernel/setup.c | 29 ++++++++++++++---------------
arch/x86/kernel/trampoline.c | 8 ++++----
arch/x86/mm/init.c | 7 ++++---
arch/x86/mm/init_32.c | 12 +++++++-----
arch/x86/mm/init_64.c | 11 ++++++-----
arch/x86/mm/k8topology_64.c | 4 +++-
arch/x86/mm/memtest.c | 7 +++----
arch/x86/mm/numa_32.c | 25 +++++++++++++------------
arch/x86/mm/numa_64.c | 34 +++++++++++++++++-----------------
arch/x86/mm/srat_32.c | 3 ++-
arch/x86/mm/srat_64.c | 11 ++++++-----
arch/x86/xen/mmu.c | 5 +++--
arch/x86/xen/setup.c | 3 ++-
mm/bootmem.c | 4 ++--
20 files changed, 100 insertions(+), 88 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 8406ed7..5f28c36 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -90,7 +90,7 @@ extern void __iomem *efi_ioremap(unsigned long addr, unsigned long size,
#endif /* CONFIG_X86_32 */

extern int add_efi_memmap;
-extern void efi_reserve_early(void);
+extern void efi_lmb_reserve_range(void);
extern void efi_call_phys_prelog(void);
extern void efi_call_phys_epilog(void);

diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index 82e5086..c552767 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -7,6 +7,7 @@

#include <linux/acpi.h>
#include <linux/bootmem.h>
+#include <linux/lmb.h>
#include <linux/dmi.h>
#include <linux/cpumask.h>
#include <asm/segment.h>
@@ -133,15 +134,15 @@ void __init acpi_reserve_wakeup_memory(void)
return;
}

- mem = find_e820_area(0, 1<<20, WAKEUP_SIZE, PAGE_SIZE);
+ mem = lmb_find_in_range(0, 1<<20, WAKEUP_SIZE, PAGE_SIZE);

- if (mem == -1L) {
+ if (mem == (unsigned long)LMB_ERROR) {
printk(KERN_ERR "ACPI: Cannot allocate lowmem, S3 disabled.\n");
return;
}
acpi_realmode = (unsigned long) phys_to_virt(mem);
acpi_wakeup_address = mem;
- reserve_early(mem, mem + WAKEUP_SIZE, "ACPI WAKEUP");
+ lmb_reserve_range(mem, mem + WAKEUP_SIZE, "ACPI WAKEUP");
}


diff --git a/arch/x86/kernel/apic/numaq_32.c b/arch/x86/kernel/apic/numaq_32.c
index 3e28401..c71e494 100644
--- a/arch/x86/kernel/apic/numaq_32.c
+++ b/arch/x86/kernel/apic/numaq_32.c
@@ -26,6 +26,7 @@
#include <linux/nodemask.h>
#include <linux/topology.h>
#include <linux/bootmem.h>
+#include <linux/lmb.h>
#include <linux/threads.h>
#include <linux/cpumask.h>
#include <linux/kernel.h>
@@ -88,7 +89,7 @@ static inline void numaq_register_node(int node, struct sys_cfg_data *scd)
node_end_pfn[node] =
MB_TO_PAGES(eq->hi_shrd_mem_start + eq->hi_shrd_mem_size);

- e820_register_active_regions(node, node_start_pfn[node],
+ lmb_register_active_regions(node, node_start_pfn[node],
node_end_pfn[node]);

memory_present(node, node_start_pfn[node], node_end_pfn[node]);
diff --git a/arch/x86/kernel/efi.c b/arch/x86/kernel/efi.c
index c2fa9b8..00cc672 100644
--- a/arch/x86/kernel/efi.c
+++ b/arch/x86/kernel/efi.c
@@ -30,6 +30,7 @@
#include <linux/init.h>
#include <linux/efi.h>
#include <linux/bootmem.h>
+#include <linux/lmb.h>
#include <linux/spinlock.h>
#include <linux/uaccess.h>
#include <linux/time.h>
@@ -275,7 +276,7 @@ static void __init do_add_efi_memmap(void)
sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
}

-void __init efi_reserve_early(void)
+void __init efi_lmb_reserve_range(void)
{
unsigned long pmap;

@@ -290,7 +291,7 @@ void __init efi_reserve_early(void)
boot_params.efi_info.efi_memdesc_size;
memmap.desc_version = boot_params.efi_info.efi_memdesc_version;
memmap.desc_size = boot_params.efi_info.efi_memdesc_size;
- reserve_early(pmap, pmap + memmap.nr_map * memmap.desc_size,
+ lmb_reserve_range(pmap, pmap + memmap.nr_map * memmap.desc_size,
"EFI memmap");
}

diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index d5258e4..3a58c71 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -42,7 +42,7 @@ void __init i386_start_kernel(void)
lmb_reserve_range(PAGE_SIZE, PAGE_SIZE + PAGE_SIZE, "EX TRAMPOLINE");
#endif

- reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
+ lmb_reserve_range(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");

#ifdef CONFIG_BLK_DEV_INITRD
/* Reserve INITRD */
@@ -51,7 +51,7 @@ void __init i386_start_kernel(void)
u64 ramdisk_image = boot_params.hdr.ramdisk_image;
u64 ramdisk_size = boot_params.hdr.ramdisk_size;
u64 ramdisk_end = PAGE_ALIGN(ramdisk_image + ramdisk_size);
- reserve_early(ramdisk_image, ramdisk_end, "RAMDISK");
+ lmb_reserve_range(ramdisk_image, ramdisk_end, "RAMDISK");
}
#endif

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 89dd2de..5f16fbf 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -101,7 +101,7 @@ void __init x86_64_start_reservations(char *real_mode_data)

copy_bootdata(__va(real_mode_data));

- reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
+ lmb_reserve_range(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");

#ifdef CONFIG_BLK_DEV_INITRD
/* Reserve INITRD */
@@ -110,7 +110,7 @@ void __init x86_64_start_reservations(char *real_mode_data)
unsigned long ramdisk_image = boot_params.hdr.ramdisk_image;
unsigned long ramdisk_size = boot_params.hdr.ramdisk_size;
unsigned long ramdisk_end = PAGE_ALIGN(ramdisk_image + ramdisk_size);
- reserve_early(ramdisk_image, ramdisk_end, "RAMDISK");
+ lmb_reserve_range(ramdisk_image, ramdisk_end, "RAMDISK");
}
#endif

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d68c3e..ba3f94c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -303,7 +303,7 @@ static inline void init_gbpages(void)
static void __init reserve_brk(void)
{
if (_brk_end > _brk_start)
- reserve_early(__pa(_brk_start), __pa(_brk_end), "BRK");
+ lmb_reserve_range(__pa(_brk_start), __pa(_brk_end), "BRK");

/* Mark brk area as locked down and no longer taking any
new allocations */
@@ -325,17 +325,16 @@ static void __init relocate_initrd(void)
char *p, *q;

/* We need to move the initrd down into lowmem */
- ramdisk_here = find_e820_area(0, end_of_lowmem, area_size,
+ ramdisk_here = lmb_find_in_range(0, end_of_lowmem, area_size,
PAGE_SIZE);

- if (ramdisk_here == -1ULL)
+ if (ramdisk_here == LMB_ERROR)
panic("Cannot find place for new RAMDISK of size %lld\n",
ramdisk_size);

/* Note: this includes all the lowmem currently occupied by
the initrd, we rely on that fact to keep the data intact. */
- reserve_early(ramdisk_here, ramdisk_here + area_size,
- "NEW RAMDISK");
+ lmb_reserve_range(ramdisk_here, ramdisk_here + area_size, "NEW RAMDISK");
initrd_start = ramdisk_here + PAGE_OFFSET;
initrd_end = initrd_start + ramdisk_size;
printk(KERN_INFO "Allocated new RAMDISK: %08llx - %08llx\n",
@@ -391,7 +390,7 @@ static void __init reserve_initrd(void)
initrd_start = 0;

if (ramdisk_size >= (end_of_lowmem>>1)) {
- free_early(ramdisk_image, ramdisk_end);
+ lmb_free_range(ramdisk_image, ramdisk_end);
printk(KERN_ERR "initrd too large to handle, "
"disabling initrd\n");
return;
@@ -414,7 +413,7 @@ static void __init reserve_initrd(void)

relocate_initrd();

- free_early(ramdisk_image, ramdisk_end);
+ lmb_free_range(ramdisk_image, ramdisk_end);
}
#else
static void __init reserve_initrd(void)
@@ -470,7 +469,7 @@ static void __init e820_reserve_setup_data(void)
e820_print_map("reserve setup_data");
}

-static void __init reserve_early_setup_data(void)
+static void __init lmb_reserve_range_setup_data(void)
{
struct setup_data *data;
u64 pa_data;
@@ -482,7 +481,7 @@ static void __init reserve_early_setup_data(void)
while (pa_data) {
data = early_memremap(pa_data, sizeof(*data));
sprintf(buf, "setup data %x", data->type);
- reserve_early(pa_data, pa_data+sizeof(*data)+data->len, buf);
+ lmb_reserve_range(pa_data, pa_data+sizeof(*data)+data->len, buf);
pa_data = data->next;
early_iounmap(data, sizeof(*data));
}
@@ -520,23 +519,23 @@ static void __init reserve_crashkernel(void)
if (crash_base <= 0) {
const unsigned long long alignment = 16<<20; /* 16M */

- crash_base = find_e820_area(alignment, ULONG_MAX, crash_size,
+ crash_base = lmb_find_in_range(alignment, ULONG_MAX, crash_size,
alignment);
- if (crash_base == -1ULL) {
+ if (crash_base == LMB_ERROR) {
pr_info("crashkernel reservation failed - No suitable area found.\n");
return;
}
} else {
unsigned long long start;

- start = find_e820_area(crash_base, ULONG_MAX, crash_size,
+ start = lmb_find_in_range(crash_base, ULONG_MAX, crash_size,
1<<20);
if (start != crash_base) {
pr_info("crashkernel reservation failed - memory is in use.\n");
return;
}
}
- reserve_early(crash_base, crash_base + crash_size, "CRASH KERNEL");
+ lmb_reserve_range(crash_base, crash_base + crash_size, "CRASH KERNEL");

printk(KERN_INFO "Reserving %ldMB of memory at %ldMB "
"for crashkernel (System RAM: %ldMB)\n",
@@ -792,7 +791,7 @@ void __init setup_arch(char **cmdline_p)
#endif
4)) {
efi_enabled = 1;
- efi_reserve_early();
+ efi_lmb_reserve_range();
}
#endif

@@ -852,7 +851,7 @@ void __init setup_arch(char **cmdline_p)
vmi_activate();

/* after early param, so could get panic from serial */
- reserve_early_setup_data();
+ lmb_reserve_range_setup_data();

if (acpi_mps_check()) {
#ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/kernel/trampoline.c b/arch/x86/kernel/trampoline.c
index c652ef6..725477e 100644
--- a/arch/x86/kernel/trampoline.c
+++ b/arch/x86/kernel/trampoline.c
@@ -1,7 +1,7 @@
#include <linux/io.h>
+#include <linux/lmb.h>

#include <asm/trampoline.h>
-#include <asm/e820.h>

#if defined(CONFIG_X86_64) && defined(CONFIG_ACPI_SLEEP)
#define __trampinit
@@ -19,12 +19,12 @@ void __init reserve_trampoline_memory(void)
unsigned long mem;

/* Has to be in very low memory so we can execute real-mode AP code. */
- mem = find_e820_area(0, 1<<20, TRAMPOLINE_SIZE, PAGE_SIZE);
- if (mem == -1L)
+ mem = lmb_find_in_range(0, 1<<20, TRAMPOLINE_SIZE, PAGE_SIZE);
+ if (mem == (unsigned long)LMB_ERROR)
panic("Cannot allocate trampoline\n");

trampoline_base = __va(mem);
- reserve_early(mem, mem + TRAMPOLINE_SIZE, "TRAMPOLINE");
+ lmb_reserve_range(mem, mem + TRAMPOLINE_SIZE, "TRAMPOLINE");
}

/*
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index b278535..0bfbd02 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -2,6 +2,7 @@
#include <linux/initrd.h>
#include <linux/ioport.h>
#include <linux/swap.h>
+#include <linux/lmb.h>

#include <asm/cacheflush.h>
#include <asm/e820.h>
@@ -75,9 +76,9 @@ static void __init find_early_table_space(unsigned long end, int use_pse,
#else
start = 0x8000;
#endif
- e820_table_start = find_e820_area(start, max_pfn_mapped<<PAGE_SHIFT,
+ e820_table_start = lmb_find_in_range(start, max_pfn_mapped<<PAGE_SHIFT,
tables, PAGE_SIZE);
- if (e820_table_start == -1UL)
+ if (e820_table_start == (unsigned long)LMB_ERROR)
panic("Cannot find space for the kernel page tables");

e820_table_start >>= PAGE_SHIFT;
@@ -299,7 +300,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
__flush_tlb_all();

if (!after_bootmem && e820_table_end > e820_table_start)
- reserve_early(e820_table_start << PAGE_SHIFT,
+ lmb_reserve_range(e820_table_start << PAGE_SHIFT,
e820_table_end << PAGE_SHIFT, "PGTABLE");

if (!after_bootmem)
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 90e0545..e3ae067 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -25,6 +25,7 @@
#include <linux/pfn.h>
#include <linux/poison.h>
#include <linux/bootmem.h>
+#include <linux/lmb.h>
#include <linux/proc_fs.h>
#include <linux/memory_hotplug.h>
#include <linux/initrd.h>
@@ -712,14 +713,14 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
highstart_pfn = highend_pfn = max_pfn;
if (max_pfn > max_low_pfn)
highstart_pfn = max_low_pfn;
- e820_register_active_regions(0, 0, highend_pfn);
+ lmb_register_active_regions(0, 0, highend_pfn);
sparse_memory_present_with_active_regions(0);
printk(KERN_NOTICE "%ldMB HIGHMEM available.\n",
pages_to_mb(highend_pfn - highstart_pfn));
num_physpages = highend_pfn;
high_memory = (void *) __va(highstart_pfn * PAGE_SIZE - 1) + 1;
#else
- e820_register_active_regions(0, 0, max_low_pfn);
+ lmb_register_active_regions(0, 0, max_low_pfn);
sparse_memory_present_with_active_regions(0);
num_physpages = max_low_pfn;
high_memory = (void *) __va(max_low_pfn * PAGE_SIZE - 1) + 1;
@@ -781,11 +782,11 @@ void __init setup_bootmem_allocator(void)
* Initialize the boot-time allocator (with low memory only):
*/
bootmap_size = bootmem_bootmap_pages(max_low_pfn)<<PAGE_SHIFT;
- bootmap = find_e820_area(0, max_pfn_mapped<<PAGE_SHIFT, bootmap_size,
+ bootmap = lmb_find_in_range(0, max_pfn_mapped<<PAGE_SHIFT, bootmap_size,
PAGE_SIZE);
- if (bootmap == -1L)
+ if (bootmap == (unsigned long)LMB_ERROR)
panic("Cannot find bootmem map of size %ld\n", bootmap_size);
- reserve_early(bootmap, bootmap + bootmap_size, "BOOTMAP");
+ lmb_reserve_range(bootmap, bootmap + bootmap_size, "BOOTMAP");
#endif

printk(KERN_INFO " mapped low ram: 0 - %08lx\n",
@@ -1069,3 +1070,4 @@ void mark_rodata_ro(void)
#endif
}
#endif
+
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 634fa08..bb9315f 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -21,6 +21,7 @@
#include <linux/initrd.h>
#include <linux/pagemap.h>
#include <linux/bootmem.h>
+#include <linux/lmb.h>
#include <linux/proc_fs.h>
#include <linux/pci.h>
#include <linux/pfn.h>
@@ -577,18 +578,18 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
unsigned long bootmap_size, bootmap;

bootmap_size = bootmem_bootmap_pages(end_pfn)<<PAGE_SHIFT;
- bootmap = find_e820_area(0, end_pfn<<PAGE_SHIFT, bootmap_size,
+ bootmap = lmb_find_in_range(0, end_pfn<<PAGE_SHIFT, bootmap_size,
PAGE_SIZE);
- if (bootmap == -1L)
+ if (bootmap == LMB_ERROR)
panic("Cannot find bootmem map of size %ld\n", bootmap_size);
- reserve_early(bootmap, bootmap + bootmap_size, "BOOTMAP");
+ lmb_reserve_range(bootmap, bootmap + bootmap_size, "BOOTMAP");
/* don't touch min_low_pfn */
bootmap_size = init_bootmem_node(NODE_DATA(0), bootmap >> PAGE_SHIFT,
0, end_pfn);
- e820_register_active_regions(0, start_pfn, end_pfn);
+ lmb_register_active_regions(0, start_pfn, end_pfn);
free_bootmem_with_active_regions(0, end_pfn);
#else
- e820_register_active_regions(0, start_pfn, end_pfn);
+ lmb_register_active_regions(0, start_pfn, end_pfn);
#endif
}
#endif
diff --git a/arch/x86/mm/k8topology_64.c b/arch/x86/mm/k8topology_64.c
index 970ed57..d7d031b 100644
--- a/arch/x86/mm/k8topology_64.c
+++ b/arch/x86/mm/k8topology_64.c
@@ -11,6 +11,8 @@
#include <linux/string.h>
#include <linux/module.h>
#include <linux/nodemask.h>
+#include <linux/lmb.h>
+
#include <asm/io.h>
#include <linux/pci_ids.h>
#include <linux/acpi.h>
@@ -222,7 +224,7 @@ int __init k8_scan_nodes(void)
for_each_node_mask(i, node_possible_map) {
int j;

- e820_register_active_regions(i,
+ lmb_register_active_regions(i,
nodes[i].start >> PAGE_SHIFT,
nodes[i].end >> PAGE_SHIFT);
for (j = apicid_base; j < cores + apicid_base; j++)
diff --git a/arch/x86/mm/memtest.c b/arch/x86/mm/memtest.c
index 18d244f..9911122 100644
--- a/arch/x86/mm/memtest.c
+++ b/arch/x86/mm/memtest.c
@@ -6,8 +6,7 @@
#include <linux/smp.h>
#include <linux/init.h>
#include <linux/pfn.h>
-
-#include <asm/e820.h>
+#include <linux/lmb.h>

static u64 patterns[] __initdata = {
0,
@@ -35,7 +34,7 @@ static void __init reserve_bad_mem(u64 pattern, u64 start_bad, u64 end_bad)
(unsigned long long) pattern,
(unsigned long long) start_bad,
(unsigned long long) end_bad);
- reserve_early(start_bad, end_bad, "BAD RAM");
+ lmb_reserve_range(start_bad, end_bad, "BAD RAM");
}

static void __init memtest(u64 pattern, u64 start_phys, u64 size)
@@ -74,7 +73,7 @@ static void __init do_one_pass(u64 pattern, u64 start, u64 end)
u64 size = 0;

while (start < end) {
- start = find_e820_area_size(start, &size, 1);
+ start = lmb_find_in_range_size(start, &size, 1);

/* done ? */
if (start >= end)
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index 809baaa..303cc3b 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -24,6 +24,7 @@

#include <linux/mm.h>
#include <linux/bootmem.h>
+#include <linux/lmb.h>
#include <linux/mmzone.h>
#include <linux/highmem.h>
#include <linux/initrd.h>
@@ -120,7 +121,7 @@ int __init get_memcfg_numa_flat(void)

node_start_pfn[0] = 0;
node_end_pfn[0] = max_pfn;
- e820_register_active_regions(0, 0, max_pfn);
+ lmb_register_active_regions(0, 0, max_pfn);
memory_present(0, 0, max_pfn);
node_remap_size[0] = node_memmap_size_bytes(0, 0, max_pfn);

@@ -161,14 +162,14 @@ static void __init allocate_pgdat(int nid)
NODE_DATA(nid) = (pg_data_t *)node_remap_start_vaddr[nid];
else {
unsigned long pgdat_phys;
- pgdat_phys = find_e820_area(min_low_pfn<<PAGE_SHIFT,
+ pgdat_phys = lmb_find_in_range(min_low_pfn<<PAGE_SHIFT,
max_pfn_mapped<<PAGE_SHIFT,
sizeof(pg_data_t),
PAGE_SIZE);
NODE_DATA(nid) = (pg_data_t *)(pfn_to_kaddr(pgdat_phys>>PAGE_SHIFT));
memset(buf, 0, sizeof(buf));
sprintf(buf, "NODE_DATA %d", nid);
- reserve_early(pgdat_phys, pgdat_phys + sizeof(pg_data_t), buf);
+ lmb_reserve_range(pgdat_phys, pgdat_phys + sizeof(pg_data_t), buf);
}
printk(KERN_DEBUG "allocate_pgdat: node %d NODE_DATA %08lx\n",
nid, (unsigned long)NODE_DATA(nid));
@@ -291,15 +292,15 @@ static __init unsigned long calculate_numa_remap_pages(void)
PTRS_PER_PTE);
node_kva_target <<= PAGE_SHIFT;
do {
- node_kva_final = find_e820_area(node_kva_target,
+ node_kva_final = lmb_find_in_range(node_kva_target,
((u64)node_end_pfn[nid])<<PAGE_SHIFT,
((u64)size)<<PAGE_SHIFT,
LARGE_PAGE_BYTES);
node_kva_target -= LARGE_PAGE_BYTES;
- } while (node_kva_final == -1ULL &&
+ } while (node_kva_final == LMB_ERROR &&
(node_kva_target>>PAGE_SHIFT) > (node_start_pfn[nid]));

- if (node_kva_final == -1ULL)
+ if (node_kva_final == LMB_ERROR)
panic("Can not get kva ram\n");

node_remap_size[nid] = size;
@@ -318,9 +319,9 @@ static __init unsigned long calculate_numa_remap_pages(void)
* but we could have some hole in high memory, and it will only
* check page_is_ram(pfn) && !page_is_reserved_early(pfn) to decide
* to use it as free.
- * So reserve_early here, hope we don't run out of that array
+ * So lmb_reserve_range here, hope we don't run out of that array
*/
- reserve_early(node_kva_final,
+ lmb_reserve_range(node_kva_final,
node_kva_final+(((u64)size)<<PAGE_SHIFT),
"KVA RAM");

@@ -367,14 +368,14 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,

kva_target_pfn = round_down(max_low_pfn - kva_pages, PTRS_PER_PTE);
do {
- kva_start_pfn = find_e820_area(kva_target_pfn<<PAGE_SHIFT,
+ kva_start_pfn = lmb_find_in_range(kva_target_pfn<<PAGE_SHIFT,
max_low_pfn<<PAGE_SHIFT,
kva_pages<<PAGE_SHIFT,
PTRS_PER_PTE<<PAGE_SHIFT) >> PAGE_SHIFT;
kva_target_pfn -= PTRS_PER_PTE;
- } while (kva_start_pfn == -1UL && kva_target_pfn > min_low_pfn);
+ } while (kva_start_pfn == (unsigned long)LMB_ERROR && kva_target_pfn > min_low_pfn);

- if (kva_start_pfn == -1UL)
+ if (kva_start_pfn == (unsigned long)LMB_ERROR)
panic("Can not get kva space\n");

printk(KERN_INFO "kva_start_pfn ~ %lx max_low_pfn ~ %lx\n",
@@ -382,7 +383,7 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
printk(KERN_INFO "max_pfn = %lx\n", max_pfn);

/* avoid clash with initrd */
- reserve_early(kva_start_pfn<<PAGE_SHIFT,
+ lmb_reserve_range(kva_start_pfn<<PAGE_SHIFT,
(kva_start_pfn + kva_pages)<<PAGE_SHIFT,
"KVA PG");
#ifdef CONFIG_HIGHMEM
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index ffd987f..92c01a5 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -87,16 +87,16 @@ static int __init allocate_cachealigned_memnodemap(void)

addr = 0x8000;
nodemap_size = roundup(sizeof(s16) * memnodemapsize, L1_CACHE_BYTES);
- nodemap_addr = find_e820_area(addr, max_pfn<<PAGE_SHIFT,
+ nodemap_addr = lmb_find_in_range(addr, max_pfn<<PAGE_SHIFT,
nodemap_size, L1_CACHE_BYTES);
- if (nodemap_addr == -1UL) {
+ if (nodemap_addr == LMB_ERROR) {
printk(KERN_ERR
"NUMA: Unable to allocate Memory to Node hash map\n");
nodemap_addr = nodemap_size = 0;
return -1;
}
memnodemap = phys_to_virt(nodemap_addr);
- reserve_early(nodemap_addr, nodemap_addr + nodemap_size, "MEMNODEMAP");
+ lmb_reserve_range(nodemap_addr, nodemap_addr + nodemap_size, "MEMNODEMAP");

printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
nodemap_addr, nodemap_addr + nodemap_size);
@@ -227,7 +227,7 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
if (node_data[nodeid] == NULL)
return;
nodedata_phys = __pa(node_data[nodeid]);
- reserve_early(nodedata_phys, nodedata_phys + pgdat_size, "NODE_DATA");
+ lmb_reserve_range(nodedata_phys, nodedata_phys + pgdat_size, "NODE_DATA");
printk(KERN_INFO " NODE_DATA [%016lx - %016lx]\n", nodedata_phys,
nodedata_phys + pgdat_size - 1);
nid = phys_to_nid(nodedata_phys);
@@ -246,7 +246,7 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
* Find a place for the bootmem map
* nodedata_phys could be on other nodes by alloc_bootmem,
* so need to sure bootmap_start not to be small, otherwise
- * early_node_mem will get that with find_e820_area instead
+ * early_node_mem will get that with lmb_find_in_range instead
* of alloc_bootmem, that could clash with reserved range
*/
bootmap_pages = bootmem_bootmap_pages(last_pfn - start_pfn);
@@ -258,12 +258,12 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
bootmap = early_node_mem(nodeid, bootmap_start, end,
bootmap_pages<<PAGE_SHIFT, PAGE_SIZE);
if (bootmap == NULL) {
- free_early(nodedata_phys, nodedata_phys + pgdat_size);
+ lmb_free_range(nodedata_phys, nodedata_phys + pgdat_size);
node_data[nodeid] = NULL;
return;
}
bootmap_start = __pa(bootmap);
- reserve_early(bootmap_start, bootmap_start+(bootmap_pages<<PAGE_SHIFT),
+ lmb_reserve_range(bootmap_start, bootmap_start+(bootmap_pages<<PAGE_SHIFT),
"BOOTMAP");

bootmap_size = init_bootmem_node(NODE_DATA(nodeid),
@@ -417,7 +417,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
nr_nodes = MAX_NUMNODES;
}

- size = (max_addr - addr - e820_hole_size(addr, max_addr)) / nr_nodes;
+ size = (max_addr - addr - lmb_hole_size(addr, max_addr)) / nr_nodes;
/*
* Calculate the number of big nodes that can be allocated as a result
* of consolidating the remainder.
@@ -453,7 +453,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
* non-reserved memory is less than the per-node size.
*/
while (end - physnodes[i].start -
- e820_hole_size(physnodes[i].start, end) < size) {
+ lmb_hole_size(physnodes[i].start, end) < size) {
end += FAKE_NODE_MIN_SIZE;
if (end > physnodes[i].end) {
end = physnodes[i].end;
@@ -467,7 +467,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
* this one must extend to the boundary.
*/
if (end < dma32_end && dma32_end - end -
- e820_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
+ lmb_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
end = dma32_end;

/*
@@ -476,7 +476,7 @@ static int __init split_nodes_interleave(u64 addr, u64 max_addr,
* physical node.
*/
if (physnodes[i].end - end -
- e820_hole_size(end, physnodes[i].end) < size)
+ lmb_hole_size(end, physnodes[i].end) < size)
end = physnodes[i].end;

/*
@@ -504,7 +504,7 @@ static u64 __init find_end_of_node(u64 start, u64 max_addr, u64 size)
{
u64 end = start + size;

- while (end - start - e820_hole_size(start, end) < size) {
+ while (end - start - lmb_hole_size(start, end) < size) {
end += FAKE_NODE_MIN_SIZE;
if (end > max_addr) {
end = max_addr;
@@ -533,7 +533,7 @@ static int __init split_nodes_size_interleave(u64 addr, u64 max_addr, u64 size)
* creates a uniform distribution of node sizes across the entire
* machine (but not necessarily over physical nodes).
*/
- min_size = (max_addr - addr - e820_hole_size(addr, max_addr)) /
+ min_size = (max_addr - addr - lmb_hole_size(addr, max_addr)) /
MAX_NUMNODES;
min_size = max(min_size, FAKE_NODE_MIN_SIZE);
if ((min_size & FAKE_NODE_MIN_HASH_MASK) < min_size)
@@ -566,7 +566,7 @@ static int __init split_nodes_size_interleave(u64 addr, u64 max_addr, u64 size)
* this one must extend to the boundary.
*/
if (end < dma32_end && dma32_end - end -
- e820_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
+ lmb_hole_size(end, dma32_end) < FAKE_NODE_MIN_SIZE)
end = dma32_end;

/*
@@ -575,7 +575,7 @@ static int __init split_nodes_size_interleave(u64 addr, u64 max_addr, u64 size)
* physical node.
*/
if (physnodes[i].end - end -
- e820_hole_size(end, physnodes[i].end) < size)
+ lmb_hole_size(end, physnodes[i].end) < size)
end = physnodes[i].end;

/*
@@ -639,7 +639,7 @@ static int __init numa_emulation(unsigned long start_pfn,
*/
remove_all_active_ranges();
for_each_node_mask(i, node_possible_map) {
- e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
+ lmb_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
nodes[i].end >> PAGE_SHIFT);
setup_node_bootmem(i, nodes[i].start, nodes[i].end);
}
@@ -692,7 +692,7 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
node_set(0, node_possible_map);
for (i = 0; i < nr_cpu_ids; i++)
numa_set_node(i, 0);
- e820_register_active_regions(0, start_pfn, last_pfn);
+ lmb_register_active_regions(0, start_pfn, last_pfn);
setup_node_bootmem(0, start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT);
}

diff --git a/arch/x86/mm/srat_32.c b/arch/x86/mm/srat_32.c
index 9324f13..68dd606 100644
--- a/arch/x86/mm/srat_32.c
+++ b/arch/x86/mm/srat_32.c
@@ -25,6 +25,7 @@
*/
#include <linux/mm.h>
#include <linux/bootmem.h>
+#include <linux/lmb.h>
#include <linux/mmzone.h>
#include <linux/acpi.h>
#include <linux/nodemask.h>
@@ -264,7 +265,7 @@ int __init get_memcfg_from_srat(void)
if (node_read_chunk(chunk->nid, chunk))
continue;

- e820_register_active_regions(chunk->nid, chunk->start_pfn,
+ lmb_register_active_regions(chunk->nid, chunk->start_pfn,
min(chunk->end_pfn, max_pfn));
}
/* for out of order entries in SRAT */
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index f9897f7..8f82c80 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -16,6 +16,7 @@
#include <linux/module.h>
#include <linux/topology.h>
#include <linux/bootmem.h>
+#include <linux/lmb.h>
#include <linux/mm.h>
#include <asm/proto.h>
#include <asm/numa.h>
@@ -98,15 +99,15 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
unsigned long phys;

length = slit->header.length;
- phys = find_e820_area(0, max_pfn_mapped<<PAGE_SHIFT, length,
+ phys = lmb_find_in_range(0, max_pfn_mapped<<PAGE_SHIFT, length,
PAGE_SIZE);

- if (phys == -1L)
+ if (phys == (unsigned long)LMB_ERROR)
panic(" Can not save slit!\n");

acpi_slit = __va(phys);
memcpy(acpi_slit, slit, length);
- reserve_early(phys, phys + length, "ACPI SLIT");
+ lmb_reserve_range(phys, phys + length, "ACPI SLIT");
}

/* Callback for Proximity Domain -> x2APIC mapping */
@@ -324,7 +325,7 @@ static int __init nodes_cover_memory(const struct bootnode *nodes)
pxmram = 0;
}

- e820ram = max_pfn - (e820_hole_size(0, max_pfn<<PAGE_SHIFT)>>PAGE_SHIFT);
+ e820ram = max_pfn - (lmb_hole_size(0, max_pfn<<PAGE_SHIFT)>>PAGE_SHIFT);
/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
if ((long)(e820ram - pxmram) >= (1<<(20 - PAGE_SHIFT))) {
printk(KERN_ERR
@@ -421,7 +422,7 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
}

for_each_node_mask(i, nodes_parsed)
- e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
+ lmb_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
nodes[i].end >> PAGE_SHIFT);
/* for out of order entries in SRAT */
sort_node_map();
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 914f046..2c8c862 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -44,6 +44,7 @@
#include <linux/bug.h>
#include <linux/module.h>
#include <linux/gfp.h>
+#include <linux/lmb.h>

#include <asm/pgtable.h>
#include <asm/tlbflush.h>
@@ -1735,7 +1736,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
__xen_write_cr3(true, __pa(pgd));
xen_mc_issue(PARAVIRT_LAZY_CPU);

- reserve_early(__pa(xen_start_info->pt_base),
+ lmb_reserve_range(__pa(xen_start_info->pt_base),
__pa(xen_start_info->pt_base +
xen_start_info->nr_pt_frames * PAGE_SIZE),
"XEN PAGETABLES");
@@ -1773,7 +1774,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,

pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(__pa(swapper_pg_dir)));

- reserve_early(__pa(xen_start_info->pt_base),
+ lmb_reserve_range(__pa(xen_start_info->pt_base),
__pa(xen_start_info->pt_base +
xen_start_info->nr_pt_frames * PAGE_SIZE),
"XEN PAGETABLES");
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index ad0047f..a156e16 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -8,6 +8,7 @@
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/pm.h>
+#include <linux/lmb.h>

#include <asm/elf.h>
#include <asm/vdso.h>
@@ -61,7 +62,7 @@ char * __init xen_memory_setup(void)
* - xen_start_info
* See comment above "struct start_info" in <xen/interface/xen.h>
*/
- reserve_early(__pa(xen_start_info->mfn_list),
+ lmb_reserve_range(__pa(xen_start_info->mfn_list),
__pa(xen_start_info->pt_base),
"XEN START INFO");

diff --git a/mm/bootmem.c b/mm/bootmem.c
index dac3f56..f332b70 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -435,7 +435,7 @@ void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
unsigned long size)
{
#ifdef CONFIG_NO_BOOTMEM
- free_early(physaddr, physaddr + size);
+ lmb_free_range(physaddr, physaddr + size);
#else
unsigned long start, end;

@@ -460,7 +460,7 @@ void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
void __init free_bootmem(unsigned long addr, unsigned long size)
{
#ifdef CONFIG_NO_BOOTMEM
- free_early(addr, addr + size);
+ lmb_free_range(addr, addr + size);
#else
unsigned long start, end;

--
1.6.4.2

2010-06-22 17:29:35

by Yinghai Lu

Subject: [PATCH 23/25] x86: Have nobootmem version setup_bootmem_allocator()

We can reduce the number of #ifdefs in init_32.c from three to one.

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/mm/init_32.c | 15 ++++++++++-----
1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index e3ae067..f172aa3 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -771,11 +771,9 @@ static unsigned long __init setup_node_bootmem(int nodeid,

return bootmap + bootmap_size;
}
-#endif

void __init setup_bootmem_allocator(void)
{
-#ifndef CONFIG_NO_BOOTMEM
int nodeid;
unsigned long bootmap_size, bootmap;
/*
@@ -787,13 +785,11 @@ void __init setup_bootmem_allocator(void)
if (bootmap == (unsigned long)LMB_ERROR)
panic("Cannot find bootmem map of size %ld\n", bootmap_size);
lmb_reserve_range(bootmap, bootmap + bootmap_size, "BOOTMAP");
-#endif

printk(KERN_INFO " mapped low ram: 0 - %08lx\n",
max_pfn_mapped<<PAGE_SHIFT);
printk(KERN_INFO " low ram: 0 - %08lx\n", max_low_pfn<<PAGE_SHIFT);

-#ifndef CONFIG_NO_BOOTMEM
for_each_online_node(nodeid) {
unsigned long start_pfn, end_pfn;

@@ -811,10 +807,19 @@ void __init setup_bootmem_allocator(void)
bootmap = setup_node_bootmem(nodeid, start_pfn, end_pfn,
bootmap);
}
-#endif

after_bootmem = 1;
}
+#else
+void __init setup_bootmem_allocator(void)
+{
+ printk(KERN_INFO " mapped low ram: 0 - %08lx\n",
+ max_pfn_mapped<<PAGE_SHIFT);
+ printk(KERN_INFO " low ram: 0 - %08lx\n", max_low_pfn<<PAGE_SHIFT);
+
+ after_bootmem = 1;
+}
+#endif

/*
* paging_init() sets up the page tables - note that the first 8MB are
--
1.6.4.2

2010-06-22 17:29:46

by Yinghai Lu

Subject: [PATCH 19/25] x86: Use lmb to replace early_res

1. Replace find_e820_area() with lmb_find_in_range().
2. Replace reserve_early() with lmb_reserve_range().
3. Replace free_early() with lmb_free_range().
4. NO_BOOTMEM switches to lmb as well.
5. Keep the _e820 and _early functions as wrappers in this patch; a following
patch will replace them all.
6. Because lmb_free_range() supports partial frees, some special-case handling
can be removed.
7. lmb_find_in_range() must be called after fill_lmb_memory(), so some calls
in setup.c::setup_arch() are moved later: corruption_check and mptable_update.

-v2: Move reserve_brk() earlier, before fill_lmb_memory(), to avoid overlap
between brk and lmb_find_in_range(). That can happen when the E820 table has
more than 128 RAM entries and fill_lmb_memory() uses lmb_find_in_range() to
find a new place for the lmb.memory.region array.
We also don't need to use extend_brk() after fill_lmb_memory().
-v3: Move find_smp_config earlier, so that lmb_find_in_range() does not pick
the wrong place if the BIOS doesn't put the mptable in the right place.
-v4: Treat RESERVED_KERN as RAM in lmb.memory; those ranges are already in
lmb.reserved.
Use __NOT_KEEP_LMB to make sure lmb related code can be freed later.
-v5: The generic __lmb_find_in_range() searches from high to low, and on 32bit
the active_region does include high pages, so the limit needs to be replaced
with lmb.default_alloc_limit, aka get_max_mapped().
-v6: Use current_limit instead.
-v7: Check against LMB_ERROR instead of -1ULL or -1L.
-v8: Set lmb_can_resize early to handle EFI with more RAM entries.
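
The find / check / reserve pattern that this patch converts callers to can be
sketched in user space. This is only a toy model under stated assumptions, not
the kernel's lmb code: every toy_* name, the single 1 GiB memory region and the
512 MiB limit are hypothetical and chosen for illustration. It shows a
high-to-low search whose upper bound is clamped to a current limit, and a
caller that compares the result against an error value before reserving.

```c
#include <assert.h>
#include <stdint.h>

/* Toy user-space model for illustration only; not the kernel's lmb code. */
#define TOY_LMB_ERROR    (~(uint64_t)0)
#define TOY_MAX_RESERVED 8

struct toy_range {
    uint64_t start;
    uint64_t end;   /* exclusive */
};

/* One 1 GiB RAM region, of which only the first 512 MiB is mapped so far. */
static struct toy_range toy_memory = { 0x0, 0x40000000 };
static uint64_t toy_current_limit = 0x20000000;

static struct toy_range toy_reserved[TOY_MAX_RESERVED];
static int toy_nr_reserved;

/* Record [start, end) as reserved; 0 on success, -1 if the table is full. */
static int toy_reserve_range(uint64_t start, uint64_t end)
{
    if (toy_nr_reserved >= TOY_MAX_RESERVED)
        return -1;
    toy_reserved[toy_nr_reserved].start = start;
    toy_reserved[toy_nr_reserved].end = end;
    toy_nr_reserved++;
    return 0;
}

/*
 * Search [start, end) from high to low for a free block of 'size' bytes.
 * 'align' must be a power of two. The upper bound is clamped to
 * toy_current_limit, so unmapped memory is never handed out.
 */
static uint64_t toy_find_in_range(uint64_t start, uint64_t end,
                                  uint64_t size, uint64_t align)
{
    uint64_t lo = start > toy_memory.start ? start : toy_memory.start;
    uint64_t hi = end < toy_memory.end ? end : toy_memory.end;
    int i;

    if (hi > toy_current_limit)
        hi = toy_current_limit;

    while (hi >= size && hi - size >= lo) {
        uint64_t addr = (hi - size) & ~(align - 1);
        int clash = 0;

        if (addr < lo)
            break;
        for (i = 0; i < toy_nr_reserved; i++) {
            if (addr < toy_reserved[i].end &&
                addr + size > toy_reserved[i].start) {
                /* Retry just below the clashing reservation. */
                hi = toy_reserved[i].start;
                clash = 1;
                break;
            }
        }
        if (!clash)
            return addr;
    }
    return TOY_LMB_ERROR;
}

/* Find, compare against the error value, then reserve. */
static uint64_t toy_alloc(uint64_t size, uint64_t align)
{
    uint64_t addr = toy_find_in_range(0, toy_memory.end, size, align);

    if (addr == TOY_LMB_ERROR)
        return TOY_LMB_ERROR;
    if (toy_reserve_range(addr, addr + size) < 0)
        return TOY_LMB_ERROR;
    return addr;
}

/* Example run: the top of the mapped window is taken, so the search steps down. */
static void toy_demo(void)
{
    toy_nr_reserved = 0;
    assert(toy_reserve_range(0x1ff00000, 0x20000000) == 0);
    assert(toy_alloc(0x1000, 0x1000) == 0x1feff000);
    assert(toy_alloc(0x1000, 0x1000) == 0x1fefe000);
    /* Larger than the mapped window: the caller must see the error value. */
    assert(toy_alloc(0x30000000, 0x1000) == TOY_LMB_ERROR);
}
```

Calling toy_demo() from a main() exercises all three cases. The point of the
sketch is that a named error value is checked in one place, rather than each
caller comparing against its own spelling of -1.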

Suggested-by: David S. Miller <[email protected]>
Suggested-by: Benjamin Herrenschmidt <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/Kconfig | 9 +--
arch/x86/include/asm/e820.h | 15 ++--
arch/x86/kernel/check.c | 16 ++--
arch/x86/kernel/e820.c | 164 ++++++++++++++--------------------------
arch/x86/kernel/head.c | 3 +-
arch/x86/kernel/head32.c | 6 +-
arch/x86/kernel/head64.c | 3 +
arch/x86/kernel/mpparse.c | 5 +-
arch/x86/kernel/setup.c | 46 ++++++++---
arch/x86/kernel/setup_percpu.c | 6 --
arch/x86/mm/numa_64.c | 9 +-
kernel/Makefile | 1 -
mm/bootmem.c | 1 +
mm/page_alloc.c | 36 +++------
mm/sparse-vmemmap.c | 11 ---
15 files changed, 140 insertions(+), 191 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3069a6d..600ce1a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -27,6 +27,7 @@ config X86
select HAVE_PERF_EVENTS if (!M386 && !M486)
select HAVE_IOREMAP_PROT
select HAVE_KPROBES
+ select HAVE_LMB
select ARCH_WANT_OPTIONAL_GPIOLIB
select ARCH_WANT_FRAME_POINTERS
select HAVE_DMA_ATTRS
@@ -196,9 +197,6 @@ config ARCH_SUPPORTS_OPTIMIZED_INLINING
config ARCH_SUPPORTS_DEBUG_PAGEALLOC
def_bool y

-config HAVE_EARLY_RES
- def_bool y
-
config HAVE_INTEL_TXT
def_bool y
depends on EXPERIMENTAL && DMAR && ACPI
@@ -591,14 +589,13 @@ config NO_BOOTMEM
default y
bool "Disable Bootmem code"
---help---
- Use early_res directly instead of bootmem before slab is ready.
+ Use lmb directly instead of bootmem before slab is ready.
- allocator (buddy) [generic]
- early allocator (bootmem) [generic]
- - very early allocator (reserve_early*()) [x86]
+ - very early allocator (lmb) [some generic]
- very very early allocator (early brk model) [x86]
So reduce one layer between early allocator to final allocator

-
config MEMTEST
bool "Memtest"
---help---
diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index ec8a52d..38adac8 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -117,24 +117,27 @@ extern unsigned long end_user_pfn;
extern u64 find_e820_area(u64 start, u64 end, u64 size, u64 align);
extern u64 find_e820_area_size(u64 start, u64 *sizep, u64 align);
extern u64 early_reserve_e820(u64 startt, u64 sizet, u64 align);
-#include <linux/early_res.h>

extern unsigned long e820_end_of_ram_pfn(void);
extern unsigned long e820_end_of_low_ram_pfn(void);
-extern int e820_find_active_region(const struct e820entry *ei,
- unsigned long start_pfn,
- unsigned long last_pfn,
- unsigned long *ei_startpfn,
- unsigned long *ei_endpfn);
extern void e820_register_active_regions(int nid, unsigned long start_pfn,
unsigned long end_pfn);
extern u64 e820_hole_size(u64 start, u64 end);
+
+extern u64 early_reserve_e820(u64 startt, u64 sizet, u64 align);
+
+void init_lmb_memory(void);
+void fill_lmb_memory(void);
+
extern void finish_e820_parsing(void);
extern void e820_reserve_resources(void);
extern void e820_reserve_resources_late(void);
extern void setup_memory_map(void);
extern char *default_machine_specific_memory_setup(void);

+void reserve_early(u64 start, u64 end, char *name);
+void free_early(u64 start, u64 end);
+
/*
* Returns true iff the specified range [s,e) is completely contained inside
* the ISA region.
diff --git a/arch/x86/kernel/check.c b/arch/x86/kernel/check.c
index fc999e6..27843ed 100644
--- a/arch/x86/kernel/check.c
+++ b/arch/x86/kernel/check.c
@@ -2,7 +2,8 @@
#include <linux/sched.h>
#include <linux/kthread.h>
#include <linux/workqueue.h>
-#include <asm/e820.h>
+#include <linux/lmb.h>
+
#include <asm/proto.h>

/*
@@ -18,10 +19,12 @@ static int __read_mostly memory_corruption_check = -1;
static unsigned __read_mostly corruption_check_size = 64*1024;
static unsigned __read_mostly corruption_check_period = 60; /* seconds */

-static struct e820entry scan_areas[MAX_SCAN_AREAS];
+static struct scan_area {
+ u64 addr;
+ u64 size;
+} scan_areas[MAX_SCAN_AREAS];
static int num_scan_areas;

-
static __init int set_corruption_check(char *arg)
{
char *end;
@@ -81,9 +84,9 @@ void __init setup_bios_corruption_check(void)

while (addr < corruption_check_size && num_scan_areas < MAX_SCAN_AREAS) {
u64 size;
- addr = find_e820_area_size(addr, &size, PAGE_SIZE);
+ addr = lmb_find_in_range_size(addr, &size, PAGE_SIZE);

- if (!(addr + 1))
+ if (addr == LMB_ERROR)
break;

if (addr >= corruption_check_size)
@@ -92,7 +95,7 @@ void __init setup_bios_corruption_check(void)
if ((addr + size) > corruption_check_size)
size = corruption_check_size - addr;

- e820_update_range(addr, size, E820_RAM, E820_RESERVED);
+ lmb_reserve_range(addr, addr + size, "SCAN RAM");
scan_areas[num_scan_areas].addr = addr;
scan_areas[num_scan_areas].size = size;
num_scan_areas++;
@@ -105,7 +108,6 @@ void __init setup_bios_corruption_check(void)

printk(KERN_INFO "Scanning %d areas for low memory corruption\n",
num_scan_areas);
- update_e820();
}


diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 0d6fc71..b4a2301 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -15,6 +15,7 @@
#include <linux/pfn.h>
#include <linux/suspend.h>
#include <linux/firmware-map.h>
+#include <linux/lmb.h>

#include <asm/e820.h>
#include <asm/proto.h>
@@ -742,69 +743,29 @@ core_initcall(e820_mark_nvs_memory);
*/
u64 __init find_e820_area(u64 start, u64 end, u64 size, u64 align)
{
- int i;
-
- for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *ei = &e820.map[i];
- u64 addr;
- u64 ei_start, ei_last;
+ u64 mem = lmb_find_in_range(start, end, size, align);

- if (ei->type != E820_RAM)
- continue;
-
- ei_last = ei->addr + ei->size;
- ei_start = ei->addr;
- addr = find_early_area(ei_start, ei_last, start, end,
- size, align);
-
- if (addr != -1ULL)
- return addr;
- }
- return -1ULL;
-}
+ if (mem == LMB_ERROR)
+ return -1ULL;

-u64 __init find_fw_memmap_area(u64 start, u64 end, u64 size, u64 align)
-{
- return find_e820_area(start, end, size, align);
+ return mem;
}

-u64 __init get_max_mapped(void)
-{
- u64 end = max_pfn_mapped;
-
- end <<= PAGE_SHIFT;
-
- return end;
-}
/*
* Find next free range after *start
*/
u64 __init find_e820_area_size(u64 start, u64 *sizep, u64 align)
{
- int i;
-
- for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *ei = &e820.map[i];
- u64 addr;
- u64 ei_start, ei_last;
-
- if (ei->type != E820_RAM)
- continue;
+ u64 mem = lmb_find_in_range_size(start, sizep, align);

- ei_last = ei->addr + ei->size;
- ei_start = ei->addr;
- addr = find_early_area_size(ei_start, ei_last, start,
- sizep, align);
+ if (mem == LMB_ERROR)
+ return -1ULL;

- if (addr != -1ULL)
- return addr;
- }
-
- return -1ULL;
+ return mem;
}

/*
- * pre allocated 4k and reserved it in e820
+ * pre allocated 4k and reserved it in lmb and e820_saved
*/
u64 __init early_reserve_e820(u64 startt, u64 sizet, u64 align)
{
@@ -813,8 +774,8 @@ u64 __init early_reserve_e820(u64 startt, u64 sizet, u64 align)
u64 start;

for (start = startt; ; start += size) {
- start = find_e820_area_size(start, &size, align);
- if (!(start + 1))
+ start = lmb_find_in_range_size(start, &size, align);
+ if (start == LMB_ERROR)
return 0;
if (size >= sizet)
break;
@@ -830,10 +791,9 @@ u64 __init early_reserve_e820(u64 startt, u64 sizet, u64 align)
addr = round_down(start + size - sizet, align);
if (addr < start)
return 0;
- e820_update_range(addr, sizet, E820_RAM, E820_RESERVED);
+ lmb_reserve_range(addr, addr + sizet, "new next");
e820_update_range_saved(addr, sizet, E820_RAM, E820_RESERVED);
- printk(KERN_INFO "update e820 for early_reserve_e820\n");
- update_e820();
+ printk(KERN_INFO "update e820_saved for early_reserve_e820\n");
update_e820_saved();

return addr;
@@ -895,52 +855,12 @@ unsigned long __init e820_end_of_low_ram_pfn(void)
{
return e820_end_pfn(1UL<<(32 - PAGE_SHIFT), E820_RAM);
}
-/*
- * Finds an active region in the address range from start_pfn to last_pfn and
- * returns its range in ei_startpfn and ei_endpfn for the e820 entry.
- */
-int __init e820_find_active_region(const struct e820entry *ei,
- unsigned long start_pfn,
- unsigned long last_pfn,
- unsigned long *ei_startpfn,
- unsigned long *ei_endpfn)
-{
- u64 align = PAGE_SIZE;
-
- *ei_startpfn = round_up(ei->addr, align) >> PAGE_SHIFT;
- *ei_endpfn = round_down(ei->addr + ei->size, align) >> PAGE_SHIFT;
-
- /* Skip map entries smaller than a page */
- if (*ei_startpfn >= *ei_endpfn)
- return 0;
-
- /* Skip if map is outside the node */
- if (ei->type != E820_RAM || *ei_endpfn <= start_pfn ||
- *ei_startpfn >= last_pfn)
- return 0;
-
- /* Check for overlaps */
- if (*ei_startpfn < start_pfn)
- *ei_startpfn = start_pfn;
- if (*ei_endpfn > last_pfn)
- *ei_endpfn = last_pfn;
-
- return 1;
-}

/* Walk the e820 map and register active regions within a node */
void __init e820_register_active_regions(int nid, unsigned long start_pfn,
unsigned long last_pfn)
{
- unsigned long ei_startpfn;
- unsigned long ei_endpfn;
- int i;
-
- for (i = 0; i < e820.nr_map; i++)
- if (e820_find_active_region(&e820.map[i],
- start_pfn, last_pfn,
- &ei_startpfn, &ei_endpfn))
- add_active_range(nid, ei_startpfn, ei_endpfn);
+ lmb_register_active_regions(nid, start_pfn, last_pfn);
}

/*
@@ -950,18 +870,16 @@ void __init e820_register_active_regions(int nid, unsigned long start_pfn,
*/
u64 __init e820_hole_size(u64 start, u64 end)
{
- unsigned long start_pfn = start >> PAGE_SHIFT;
- unsigned long last_pfn = end >> PAGE_SHIFT;
- unsigned long ei_startpfn, ei_endpfn, ram = 0;
- int i;
+ return lmb_hole_size(start, end);
+}

- for (i = 0; i < e820.nr_map; i++) {
- if (e820_find_active_region(&e820.map[i],
- start_pfn, last_pfn,
- &ei_startpfn, &ei_endpfn))
- ram += ei_endpfn - ei_startpfn;
- }
- return end - start - ((u64)ram << PAGE_SHIFT);
+void reserve_early(u64 start, u64 end, char *name)
+{
+ lmb_reserve_range(start, end, name);
+}
+void free_early(u64 start, u64 end)
+{
+ lmb_free_range(start, end);
}

static void early_panic(char *msg)
@@ -1210,3 +1128,37 @@ void __init setup_memory_map(void)
printk(KERN_INFO "BIOS-provided physical RAM map:\n");
e820_print_map(who);
}
+
+void __init init_lmb_memory(void)
+{
+ lmb_init();
+}
+
+void __init fill_lmb_memory(void)
+{
+ int i;
+ u64 end;
+
+ /*
+ * EFI may have more than 128 entries
+ * We are safe to enable resizing, because fill_lmb_memory()
+ * is called rather late on x86
+ */
+ lmb_can_resize = 1;
+
+ for (i = 0; i < e820.nr_map; i++) {
+ struct e820entry *ei = &e820.map[i];
+
+ end = ei->addr + ei->size;
+ if (end != (resource_size_t)end)
+ continue;
+
+ if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
+ continue;
+
+ lmb_add(ei->addr, ei->size);
+ }
+
+ lmb_analyze();
+ lmb_dump_all();
+}
diff --git a/arch/x86/kernel/head.c b/arch/x86/kernel/head.c
index 3e66bd3..1239c9a 100644
--- a/arch/x86/kernel/head.c
+++ b/arch/x86/kernel/head.c
@@ -1,5 +1,6 @@
#include <linux/kernel.h>
#include <linux/init.h>
+#include <linux/lmb.h>

#include <asm/setup.h>
#include <asm/bios_ebda.h>
@@ -51,5 +52,5 @@ void __init reserve_ebda_region(void)
lowmem = 0x9f000;

/* reserve all memory between lowmem and the 1MB mark */
- reserve_early_overlap_ok(lowmem, 0x100000, "BIOS reserved");
+ lmb_reserve_range(lowmem, 0x100000, "* BIOS reserved");
}
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index b2e2460..d5258e4 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -8,6 +8,7 @@
#include <linux/init.h>
#include <linux/start_kernel.h>
#include <linux/mm.h>
+#include <linux/lmb.h>

#include <asm/setup.h>
#include <asm/sections.h>
@@ -30,14 +31,15 @@ static void __init i386_default_early_setup(void)

void __init i386_start_kernel(void)
{
+ init_lmb_memory();
+
#ifdef CONFIG_X86_TRAMPOLINE
/*
* But first pinch a few for the stack/trampoline stuff
* FIXME: Don't need the extra page at 4K, but need to fix
* trampoline before removing it. (see the GDT stuff)
*/
- reserve_early_overlap_ok(PAGE_SIZE, PAGE_SIZE + PAGE_SIZE,
- "EX TRAMPOLINE");
+ lmb_reserve_range(PAGE_SIZE, PAGE_SIZE + PAGE_SIZE, "EX TRAMPOLINE");
#endif

reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 7147143..89dd2de 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -12,6 +12,7 @@
#include <linux/percpu.h>
#include <linux/start_kernel.h>
#include <linux/io.h>
+#include <linux/lmb.h>

#include <asm/processor.h>
#include <asm/proto.h>
@@ -96,6 +97,8 @@ void __init x86_64_start_kernel(char * real_mode_data)

void __init x86_64_start_reservations(char *real_mode_data)
{
+ init_lmb_memory();
+
copy_bootdata(__va(real_mode_data));

reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index d86dbf7..57ef3bd 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -11,6 +11,7 @@
#include <linux/init.h>
#include <linux/delay.h>
#include <linux/bootmem.h>
+#include <linux/lmb.h>
#include <linux/kernel_stat.h>
#include <linux/mc146818rtc.h>
#include <linux/bitops.h>
@@ -641,7 +642,7 @@ static void __init smp_reserve_memory(struct mpf_intel *mpf)
{
unsigned long size = get_mpc_size(mpf->physptr);

- reserve_early_overlap_ok(mpf->physptr, mpf->physptr+size, "MP-table mpc");
+ lmb_reserve_range(mpf->physptr, mpf->physptr+size, "* MP-table mpc");
}

static int __init smp_scan_config(unsigned long base, unsigned long length)
@@ -670,7 +671,7 @@ static int __init smp_scan_config(unsigned long base, unsigned long length)
mpf, (u64)virt_to_phys(mpf));

mem = virt_to_phys(mpf);
- reserve_early_overlap_ok(mem, mem + sizeof(*mpf), "MP-table mpf");
+ lmb_reserve_range(mem, mem + sizeof(*mpf), "* MP-table mpf");
if (mpf->physptr)
smp_reserve_memory(mpf);

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b008e78..3d68c3e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -31,6 +31,7 @@
#include <linux/apm_bios.h>
#include <linux/initrd.h>
#include <linux/bootmem.h>
+#include <linux/lmb.h>
#include <linux/seq_file.h>
#include <linux/console.h>
#include <linux/mca.h>
@@ -615,7 +616,7 @@ static __init void reserve_ibft_region(void)
addr = find_ibft_region(&size);

if (size)
- reserve_early_overlap_ok(addr, addr + size, "ibft");
+ lmb_reserve_range(addr, addr + size, "* ibft");
}

#ifdef CONFIG_X86_RESERVE_LOW_64K
@@ -709,6 +710,15 @@ static void __init trim_bios_range(void)
sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
}

+static u64 __init get_max_mapped(void)
+{
+ u64 end = max_pfn_mapped;
+
+ end <<= PAGE_SHIFT;
+
+ return end;
+}
+
/*
* Determine if we were loaded by an EFI loader. If so, then we have also been
* passed the efi memmap, systab, etc., so we should use these data structures
@@ -897,8 +907,6 @@ void __init setup_arch(char **cmdline_p)
*/
max_pfn = e820_end_of_ram_pfn();

- /* preallocate 4k for mptable mpc */
- early_reserve_e820_mpc_new();
/* update e820 for memory not covered by WB MTRRs */
mtrr_bp_init();
if (mtrr_trim_uncached_memory(max_pfn))
@@ -923,15 +931,6 @@ void __init setup_arch(char **cmdline_p)
max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
#endif

-#ifdef CONFIG_X86_CHECK_BIOS_CORRUPTION
- setup_bios_corruption_check();
-#endif
-
- printk(KERN_DEBUG "initial memory mapped : 0 - %08lx\n",
- max_pfn_mapped<<PAGE_SHIFT);
-
- reserve_brk();
-
/*
* Find and reserve possible boot-time SMP configuration:
*/
@@ -939,6 +938,26 @@ void __init setup_arch(char **cmdline_p)

reserve_ibft_region();

+ /*
+ * Need to conclude brk before fill_lmb_memory():
+ * it could use lmb_find_in_range(), which could overlap
+ * with the brk area.
+ */
+ reserve_brk();
+
+ lmb.current_limit = get_max_mapped();
+ fill_lmb_memory();
+
+ /* preallocate 4k for mptable mpc */
+ early_reserve_e820_mpc_new();
+
+#ifdef CONFIG_X86_CHECK_BIOS_CORRUPTION
+ setup_bios_corruption_check();
+#endif
+
+ printk(KERN_DEBUG "initial memory mapped : 0 - %08lx\n",
+ max_pfn_mapped<<PAGE_SHIFT);
+
reserve_trampoline_memory();

#ifdef CONFIG_ACPI_SLEEP
@@ -962,6 +981,7 @@ void __init setup_arch(char **cmdline_p)
max_low_pfn = max_pfn;
}
#endif
+ lmb.current_limit = get_max_mapped();

/*
* NOTE: On x86-32, only from this point on, fixmaps are ready for use.
@@ -1001,7 +1021,7 @@ void __init setup_arch(char **cmdline_p)

initmem_init(0, max_pfn, acpi, k8);
#ifndef CONFIG_NO_BOOTMEM
- early_res_to_bootmem(0, max_low_pfn<<PAGE_SHIFT);
+ lmb_to_bootmem(0, max_low_pfn<<PAGE_SHIFT);
#endif

dma32_reserve_bootmem();
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index de3b63a..2a5cc9d 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -131,13 +131,7 @@ static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align)

static void __init pcpu_fc_free(void *ptr, size_t size)
{
-#ifdef CONFIG_NO_BOOTMEM
- u64 start = __pa(ptr);
- u64 end = start + size;
- free_early_partial(start, end);
-#else
free_bootmem(__pa(ptr), size);
-#endif
}

static int __init pcpu_cpu_distance(unsigned int from, unsigned int to)
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index a7bcc23..ffd987f 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -7,6 +7,7 @@
#include <linux/string.h>
#include <linux/init.h>
#include <linux/bootmem.h>
+#include <linux/lmb.h>
#include <linux/mmzone.h>
#include <linux/ctype.h>
#include <linux/module.h>
@@ -171,8 +172,8 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
if (start < (MAX_DMA32_PFN<<PAGE_SHIFT) &&
end > (MAX_DMA32_PFN<<PAGE_SHIFT))
start = MAX_DMA32_PFN<<PAGE_SHIFT;
- mem = find_e820_area(start, end, size, align);
- if (mem != -1L)
+ mem = lmb_find_in_range_node(nodeid, start, end, size, align);
+ if (mem != LMB_ERROR)
return __va(mem);

/* extend the search scope */
@@ -181,8 +182,8 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
start = MAX_DMA32_PFN<<PAGE_SHIFT;
else
start = MAX_DMA_PFN<<PAGE_SHIFT;
- mem = find_e820_area(start, end, size, align);
- if (mem != -1L)
+ mem = lmb_find_in_range_node(nodeid, start, end, size, align);
+ if (mem != LMB_ERROR)
return __va(mem);

printk(KERN_ERR "Cannot find %lu bytes in node %d\n",
diff --git a/kernel/Makefile b/kernel/Makefile
index ce53fb2..b263d03 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -11,7 +11,6 @@ obj-y = sched.o fork.o exec_domain.o panic.o printk.o \
hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
notifier.o ksysfs.o pm_qos_params.o sched_clock.o cred.o \
async.o range.o
-obj-$(CONFIG_HAVE_EARLY_RES) += early_res.o
obj-y += groups.o

ifdef CONFIG_FUNCTION_TRACER
diff --git a/mm/bootmem.c b/mm/bootmem.c
index ee31b95..dac3f56 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -15,6 +15,7 @@
#include <linux/module.h>
#include <linux/kmemleak.h>
#include <linux/range.h>
+#include <linux/lmb.h>

#include <asm/bug.h>
#include <asm/io.h>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 37f30fc..f67a09d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3667,38 +3667,22 @@ int __init add_from_early_node_map(struct range *range, int az,
void * __init __alloc_memory_core_early(int nid, u64 size, u64 align,
u64 goal, u64 limit)
{
- int i;
void *ptr;

- /* need to go over early_node_map to find out good range for node */
- for_each_active_range_index_in_nid(i, nid) {
- u64 addr;
- u64 ei_start, ei_last;
+ u64 addr;

- ei_last = early_node_map[i].end_pfn;
- ei_last <<= PAGE_SHIFT;
- ei_start = early_node_map[i].start_pfn;
- ei_start <<= PAGE_SHIFT;
- addr = find_early_area(ei_start, ei_last,
- goal, limit, size, align);
-
- if (addr == -1ULL)
- continue;
+ if (limit > lmb.current_limit)
+ limit = lmb.current_limit;

-#if 0
- printk(KERN_DEBUG "alloc (nid=%d %llx - %llx) (%llx - %llx) %llx %llx => %llx\n",
- nid,
- ei_start, ei_last, goal, limit, size,
- align, addr);
-#endif
+ addr = find_memory_core_early(nid, size, align, goal, limit);

- ptr = phys_to_virt(addr);
- memset(ptr, 0, size);
- reserve_early_without_check(addr, addr + size, "BOOTMEM");
- return ptr;
- }
+ if (addr == LMB_ERROR)
+ return NULL;

- return NULL;
+ ptr = phys_to_virt(addr);
+ memset(ptr, 0, size);
+ lmb_reserve_range(addr, addr + size, "BOOTMEM");
+ return ptr;
}
#endif

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index aa33fd6..29d6cbf 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -220,18 +220,7 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,

if (vmemmap_buf_start) {
/* need to free left buf */
-#ifdef CONFIG_NO_BOOTMEM
- free_early(__pa(vmemmap_buf_start), __pa(vmemmap_buf_end));
- if (vmemmap_buf_start < vmemmap_buf) {
- char name[15];
-
- snprintf(name, sizeof(name), "MEMMAP %d", nodeid);
- reserve_early_without_check(__pa(vmemmap_buf_start),
- __pa(vmemmap_buf), name);
- }
-#else
free_bootmem(__pa(vmemmap_buf), vmemmap_buf_end - vmemmap_buf);
-#endif
vmemmap_buf = NULL;
vmemmap_buf_end = NULL;
}
--
1.6.4.2

2010-06-22 17:29:52

by Yinghai Lu

Subject: [PATCH 24/25] x86: Put 64 bit numa node memmap above 16M

Do not use hard-coded low addresses (0x8000 and 0) as the search start anymore;
start at __pa(MAX_DMA_ADDRESS) instead.

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/mm/numa_64.c | 2 +-
arch/x86/mm/srat_64.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 92c01a5..93a1e07 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -85,7 +85,7 @@ static int __init allocate_cachealigned_memnodemap(void)
if (memnodemapsize <= ARRAY_SIZE(memnode.embedded_map))
return 0;

- addr = 0x8000;
+ addr = __pa(MAX_DMA_ADDRESS);
nodemap_size = roundup(sizeof(s16) * memnodemapsize, L1_CACHE_BYTES);
nodemap_addr = lmb_find_in_range(addr, max_pfn<<PAGE_SHIFT,
nodemap_size, L1_CACHE_BYTES);
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 8f82c80..1f9b7b7 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -99,8 +99,8 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
unsigned long phys;

length = slit->header.length;
- phys = lmb_find_in_range(0, max_pfn_mapped<<PAGE_SHIFT, length,
- PAGE_SIZE);
+ phys = lmb_find_in_range(__pa(MAX_DMA_ADDRESS), max_pfn_mapped<<PAGE_SHIFT,
+ length, PAGE_SIZE);

if (phys == (unsigned long)LMB_ERROR)
panic(" Can not save slit!\n");
--
1.6.4.2

2010-06-22 17:28:53

by Yinghai Lu

Subject: [PATCH 11/25] x86,lmb: Add lmb_reserve_range/lmb_free_range

These are wrappers for the core lmb_reserve()/lmb_free().
They take start/end/name instead of base/size,
which makes the x86 conversion easier.

More debug printout could be added later.
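
For illustration only, the start/end to base/size conversion these wrappers
perform can be sketched in userspace C. The names struct span and
range_to_span() are hypothetical and not part of this patch; the empty and
inverted range checks mirror the ones in the diff below.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the base/size pair the core lmb API takes. */
struct span {
	uint64_t base;
	uint64_t size;
};

/*
 * Convert [start, end) to base/size.
 * Returns 1 if *out was filled, 0 if the range is empty or inverted.
 */
static int range_to_span(uint64_t start, uint64_t end, struct span *out)
{
	assert(out != NULL);

	if (start == end)
		return 0;	/* nothing to reserve or free */

	if (start > end) {
		fprintf(stderr, "wrong range [%#llx, %#llx]\n",
			(unsigned long long)start, (unsigned long long)end);
		return 0;
	}

	out->base = start;
	out->size = end - start;
	return 1;
}
```

A caller would pass out->base and out->size to the base/size style function
only when range_to_span() returns 1.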

-v2: change get_max_mapped() to lmb.default_alloc_limit according to Michael
Ellerman and Ben
change to lmb_reserve_range and lmb_free_range according to Michael Ellerman
-v3: call check_and_double after reserve/free, which avoids using
find_lmb_area. Suggested by Michael Ellerman

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/lmb.h | 3 +++
arch/x86/mm/lmb.c | 22 ++++++++++++++++++++++
2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
index e8207f8..70df84f 100644
--- a/arch/x86/include/asm/lmb.h
+++ b/arch/x86/include/asm/lmb.h
@@ -6,4 +6,7 @@
u64 lmb_find_in_range_size(u64 start, u64 *sizep, u64 align);
void lmb_to_bootmem(u64 start, u64 end);

+void lmb_reserve_range(u64 start, u64 end, char *name);
+void lmb_free_range(u64 start, u64 end);
+
#endif
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index a959699..6b6cc58 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -116,3 +116,25 @@ void __init lmb_to_bootmem(u64 start, u64 end)
lmb_reserve(__pa(lmb.reserved.regions), sizeof(struct lmb_region) * lmb.reserved.max);
}
#endif
+
+void __init lmb_reserve_range(u64 start, u64 end, char *name)
+{
+ if (start == end)
+ return;
+
+ if (WARN_ONCE(start > end, "lmb_reserve_range: wrong range [%#llx, %#llx]\n", start, end))
+ return;
+
+ lmb_reserve(start, end - start);
+}
+
+void __init lmb_free_range(u64 start, u64 end)
+{
+ if (start == end)
+ return;
+
+ if (WARN_ONCE(start > end, "lmb_free_range: wrong range [%#llx, %#llx]\n", start, end))
+ return;
+
+ lmb_free(start, end - start);
+}
--
1.6.4.2

2010-06-22 17:30:19

by Yinghai Lu

Subject: [PATCH 18/25] x86, lmb: Use lmb_debug to control debug message print out

Print the early reservation messages only if lmb=debug is specified.

With lmb=debug, lmb_reserve_range() also prints the range and its name, and
lmb_free_range() prints the range, when they are called.
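
As a rough userspace sketch of the gating, assuming a plain int flag in place
of lmb_debug: format_reserve_msg() is a hypothetical helper, and only the
format string is taken from the diff below.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format the reserve message only when debug is enabled.
 * Returns the number of characters written, or 0 when debug is off.
 */
static int format_reserve_msg(char *buf, size_t len, int debug,
			      unsigned long long start,
			      unsigned long long end, const char *name)
{
	assert(buf != NULL && len > 0);

	buf[0] = '\0';
	if (!debug)
		return 0;	/* quiet unless debugging was requested */

	return snprintf(buf, len,
			" lmb_reserve_range: [%010llx, %010llx] %16s\n",
			start, end, name);
}
```

With debug off the buffer stays empty, which corresponds to the boot log
staying quiet by default.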

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/mm/lmb.c | 24 ++++++++++++++++++------
1 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index bd2f60b..209f25b 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -122,10 +122,12 @@ static void __init subtract_lmb_reserved(struct range *range, int az)

count = lmb.reserved.cnt;

- pr_info("Subtract (%d early reservations)\n", count);
+ if (lmb_debug)
+ pr_info("Subtract (%d early reservations)\n", count);

for_each_lmb(reserved, r) {
- pr_info(" [%010llx - %010llx]\n", (u64)r->base, (u64)r->base + r->size);
+ if (lmb_debug)
+ pr_info(" [%010llx - %010llx]\n", (u64)r->base, (u64)r->base + r->size);
final_start = PFN_DOWN(r->base);
final_end = PFN_UP(r->base + r->size);
if (final_start >= final_end)
@@ -198,16 +200,20 @@ void __init lmb_to_bootmem(u64 start, u64 end)
lmb_free(__pa(lmb.reserved.regions), sizeof(struct lmb_region) * lmb.reserved.max);

count = lmb.reserved.cnt;
- pr_info("(%d early reservations) ==> bootmem [%010llx - %010llx]\n", count, start, end);
+ if (lmb_debug)
+ pr_info("(%d early reservations) ==> bootmem [%010llx - %010llx]\n", count, start, end);
for_each_lmb(reserved, r) {
- pr_info(" [%010llx - %010llx] ", (u64)r->base, (u64)r->base + r->size);
+ if (lmb_debug)
+ pr_info(" [%010llx - %010llx] ", (u64)r->base, (u64)r->base + r->size);
final_start = max(start, r->base);
final_end = min(end, r->base + r->size);
if (final_start >= final_end) {
- pr_cont("\n");
+ if (lmb_debug)
+ pr_cont("\n");
continue;
}
- pr_cont(" ==> [%010llx - %010llx]\n", final_start, final_end);
+ if (lmb_debug)
+ pr_cont(" ==> [%010llx - %010llx]\n", final_start, final_end);
reserve_bootmem_generic(final_start, final_end - final_start, BOOTMEM_DEFAULT);
}

@@ -289,6 +295,9 @@ void __init lmb_reserve_range(u64 start, u64 end, char *name)
if (WARN_ONCE(start > end, "lmb_reserve_range: wrong range [%#llx, %#llx]\n", start, end))
return;

+ if (lmb_debug)
+ pr_info(" lmb_reserve_range: [%010llx, %010llx] %16s\n", start, end, name);
+
lmb_reserve(start, end - start);
}

@@ -300,6 +309,9 @@ void __init lmb_free_range(u64 start, u64 end)
if (WARN_ONCE(start > end, "lmb_free_range: wrong range [%#llx, %#llx]\n", start, end))
return;

+ if (lmb_debug)
+ pr_info(" lmb_free_range: [%010llx, %010llx]\n", start, end);
+
lmb_free(start, end - start);
}

--
1.6.4.2

2010-06-22 17:28:51

by Yinghai Lu

Subject: [PATCH 08/25] x86, lmb: Add lmb_find_in_range_size()

Find the next free range after start; its size is returned in *sizep.
It will be used to find free ranges for early_memtest and the memory
corruption check.

Keep it in arch/x86 for now instead of adding it to lib/lmb.c.
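
The idea can be sketched in userspace C as below. This is a simplified
illustration, not the code in this patch: it handles a single memory region,
assumes the reserved array is sorted by base and non-overlapping, and the
names struct region, FIND_ERROR and find_free_range_size() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIND_ERROR (~0ULL)	/* hypothetical stand-in for LMB_ERROR */

struct region {
	uint64_t base;
	uint64_t size;
};

static uint64_t round_up_u64(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

/*
 * Find the first free range at or after start inside [mem_start, mem_end).
 * reserved[] must be sorted by base and non-overlapping (assumption).
 * On success the range size is stored in *sizep.
 */
static uint64_t find_free_range_size(uint64_t mem_start, uint64_t mem_end,
				     const struct region *reserved, size_t n,
				     uint64_t start, uint64_t *sizep,
				     uint64_t align)
{
	uint64_t addr;
	size_t i;

	assert(align != 0 && sizep != NULL);

	addr = round_up_u64(mem_start > start ? mem_start : start, align);

	for (i = 0; i < n && addr < mem_end; i++) {
		uint64_t r_start = reserved[i].base;
		uint64_t r_end = r_start + reserved[i].size;

		if (r_end <= addr)
			continue;	/* reservation is below the candidate */

		if (r_start > addr) {
			/* free gap in front of this reservation */
			uint64_t end = r_start < mem_end ? r_start : mem_end;

			*sizep = end - addr;
			return addr;
		}

		/* candidate is inside the reservation: skip past it */
		addr = round_up_u64(r_end, align);
	}

	if (addr >= mem_end)
		return FIND_ERROR;

	*sizep = mem_end - addr;
	return addr;
}
```

A caller such as a memory tester could loop, passing the end of the previous
result as the next start, until FIND_ERROR is returned.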

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/lmb.h | 8 ++++
arch/x86/mm/Makefile | 2 +
arch/x86/mm/lmb.c | 88 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 98 insertions(+), 0 deletions(-)
create mode 100644 arch/x86/include/asm/lmb.h
create mode 100644 arch/x86/mm/lmb.c

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
new file mode 100644
index 0000000..9d26895
--- /dev/null
+++ b/arch/x86/include/asm/lmb.h
@@ -0,0 +1,8 @@
+#ifndef _X86_LMB_H
+#define _X86_LMB_H
+
+#define ARCH_DISCARD_LMB
+
+u64 lmb_find_in_range_size(u64 start, u64 *sizep, u64 align);
+
+#endif
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index a4c7683..8ab0505 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -26,4 +26,6 @@ obj-$(CONFIG_NUMA) += numa.o numa_$(BITS).o
obj-$(CONFIG_K8_NUMA) += k8topology_64.o
obj-$(CONFIG_ACPI_NUMA) += srat_$(BITS).o

+obj-$(CONFIG_HAVE_LMB) += lmb.o
+
obj-$(CONFIG_MEMTEST) += memtest.o
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
new file mode 100644
index 0000000..4335f48
--- /dev/null
+++ b/arch/x86/mm/lmb.c
@@ -0,0 +1,88 @@
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/bitops.h>
+#include <linux/lmb.h>
+#include <linux/bootmem.h>
+#include <linux/mm.h>
+#include <linux/range.h>
+
+/* Check for already reserved areas */
+static inline bool __init bad_addr_size(u64 *addrp, u64 *sizep, u64 align)
+{
+ struct lmb_region *r;
+ u64 addr = *addrp, last;
+ u64 size = *sizep;
+ bool changed = false;
+
+again:
+ last = addr + size;
+ for_each_lmb(reserved, r) {
+ if (last > r->base && addr < r->base) {
+ size = r->base - addr;
+ changed = true;
+ goto again;
+ }
+ if (last > (r->base + r->size) && addr < (r->base + r->size)) {
+ addr = round_up(r->base + r->size, align);
+ size = last - addr;
+ changed = true;
+ goto again;
+ }
+ if (last <= (r->base + r->size) && addr >= r->base) {
+ (*sizep)++;
+ return false;
+ }
+ }
+ if (changed) {
+ *addrp = addr;
+ *sizep = size;
+ }
+ return changed;
+}
+
+static u64 __init __lmb_find_in_range_size(u64 ei_start, u64 ei_last, u64 start,
+ u64 *sizep, u64 align)
+{
+ u64 addr, last;
+
+ addr = round_up(ei_start, align);
+ if (addr < start)
+ addr = round_up(start, align);
+ if (addr >= ei_last)
+ goto out;
+ *sizep = ei_last - addr;
+ while (bad_addr_size(&addr, sizep, align) && addr + *sizep <= ei_last)
+ ;
+ last = addr + *sizep;
+ if (last > ei_last)
+ goto out;
+
+ return addr;
+
+out:
+ return LMB_ERROR;
+}
+
+/*
+ * Find next free range after start, and size is returned in *sizep
+ */
+u64 __init lmb_find_in_range_size(u64 start, u64 *sizep, u64 align)
+{
+ struct lmb_region *r;
+
+ for_each_lmb(memory, r) {
+ u64 ei_start = r->base;
+ u64 ei_last = ei_start + r->size;
+ u64 addr;
+
+ addr = __lmb_find_in_range_size(ei_start, ei_last, start,
+ sizep, align);
+
+ if (addr != LMB_ERROR)
+ return addr;
+ }
+
+ return LMB_ERROR;
+}
+
--
1.6.4.2

2010-06-22 17:30:49

by Yinghai Lu

Subject: [PATCH 25/25] swiotlb: Use page alignment for early buffer allocation

for 2.6.34

We could call free_bootmem_late() if swiotlb is not used, and
it will shrink the freed range to page alignment.

So allocate them with page alignment in the first place, to avoid losing two pages.

before patch:
[ 0.000000] lmb_reserve_range: [00d3600000, 00d7600000] swiotlb buffer
[ 0.000000] lmb_reserve_range: [00d7e7ef40, 00d7e9ef40] swiotlb list
[ 0.000000] lmb_reserve_range: [00d7e3ef40, 00d7e7ef40] swiotlb orig_ad
[ 0.000000] lmb_reserve_range: [000008a000, 0000092000] swiotlb overflo

after patch will get
[ 0.000000] lmb_reserve_range: [00d3600000, 00d7600000] swiotlb buffer
[ 0.000000] lmb_reserve_range: [00d7e7e000, 00d7e9e000] swiotlb list
[ 0.000000] lmb_reserve_range: [00d7e3e000, 00d7e7e000] swiotlb orig_ad
[ 0.000000] lmb_reserve_range: [000008a000, 0000092000] swiotlb overflo
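
The page accounting behind the logs above can be checked with a small
userspace sketch. It assumes 4K pages and assumes that a late free only hands
back whole pages inside the range; PAGE_SIZE_BYTES and whole_pages() are
hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL	/* assumption: 4K pages */

static uint64_t page_align_up(uint64_t x)
{
	return (x + PAGE_SIZE_BYTES - 1) & ~(PAGE_SIZE_BYTES - 1);
}

static uint64_t page_align_down(uint64_t x)
{
	return x & ~(PAGE_SIZE_BYTES - 1);
}

/* Number of complete pages that lie fully inside [start, end). */
static uint64_t whole_pages(uint64_t start, uint64_t end)
{
	assert(start <= end);

	uint64_t first = page_align_up(start);
	uint64_t last = page_align_down(end);

	return last > first ? (last - first) / PAGE_SIZE_BYTES : 0;
}
```

For the "swiotlb list" range, whole_pages(0xd7e7ef40, 0xd7e9ef40) gives 31
while the aligned range after the patch gives 32; the "swiotlb orig_ad" range
loses one page the same way, which accounts for the two pages.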

Signed-off-by: Yinghai Lu <[email protected]>
Cc: FUJITA Tomonori <[email protected]>
Cc: Becky Bruce <[email protected]>
---
lib/swiotlb.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index a009055..ecfab7f 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -159,7 +159,7 @@ swiotlb_init_with_default_size(size_t default_size, int verbose)
/*
* Get IO TLB memory from the low pages
*/
- io_tlb_start = alloc_bootmem_low_pages(bytes);
+ io_tlb_start = alloc_bootmem_low_pages(PAGE_ALIGN(bytes));
if (!io_tlb_start)
panic("Cannot allocate SWIOTLB buffer");
io_tlb_end = io_tlb_start + bytes;
@@ -169,16 +169,16 @@ swiotlb_init_with_default_size(size_t default_size, int verbose)
* to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
* between io_tlb_start and io_tlb_end.
*/
- io_tlb_list = alloc_bootmem(io_tlb_nslabs * sizeof(int));
+ io_tlb_list = alloc_bootmem_pages(PAGE_ALIGN(io_tlb_nslabs * sizeof(int)));
for (i = 0; i < io_tlb_nslabs; i++)
io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
io_tlb_index = 0;
- io_tlb_orig_addr = alloc_bootmem(io_tlb_nslabs * sizeof(phys_addr_t));
+ io_tlb_orig_addr = alloc_bootmem_pages(PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t)));

/*
* Get the overflow emergency buffer
*/
- io_tlb_overflow_buffer = alloc_bootmem_low(io_tlb_overflow);
+ io_tlb_overflow_buffer = alloc_bootmem_low_pages(PAGE_ALIGN(io_tlb_overflow));
if (!io_tlb_overflow_buffer)
panic("Cannot allocate SWIOTLB overflow buffer!\n");
if (verbose)
@@ -304,13 +304,13 @@ void __init swiotlb_free(void)
get_order(io_tlb_nslabs << IO_TLB_SHIFT));
} else {
free_bootmem_late(__pa(io_tlb_overflow_buffer),
- io_tlb_overflow);
+ PAGE_ALIGN(io_tlb_overflow));
free_bootmem_late(__pa(io_tlb_orig_addr),
- io_tlb_nslabs * sizeof(phys_addr_t));
+ PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t)));
free_bootmem_late(__pa(io_tlb_list),
- io_tlb_nslabs * sizeof(int));
+ PAGE_ALIGN(io_tlb_nslabs * sizeof(int)));
free_bootmem_late(__pa(io_tlb_start),
- io_tlb_nslabs << IO_TLB_SHIFT);
+ PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
}
}

--
1.6.4.2

2010-06-22 17:31:06

by Yinghai Lu

Subject: [PATCH 21/25] x86: Remove not used early_res code

Also remove some functions in e820.c that are no longer used.

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/e820.h | 14 -
arch/x86/kernel/e820.c | 52 ----
include/linux/early_res.h | 23 --
kernel/early_res.c | 584 -------------------------------------------
4 files changed, 0 insertions(+), 673 deletions(-)
delete mode 100644 include/linux/early_res.h
delete mode 100644 kernel/early_res.c

diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index 38adac8..6fbd8cd 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -112,32 +112,18 @@ static inline void early_memtest(unsigned long start, unsigned long end)
}
#endif

-extern unsigned long end_user_pfn;
-
-extern u64 find_e820_area(u64 start, u64 end, u64 size, u64 align);
-extern u64 find_e820_area_size(u64 start, u64 *sizep, u64 align);
-extern u64 early_reserve_e820(u64 startt, u64 sizet, u64 align);
-
extern unsigned long e820_end_of_ram_pfn(void);
extern unsigned long e820_end_of_low_ram_pfn(void);
-extern void e820_register_active_regions(int nid, unsigned long start_pfn,
- unsigned long end_pfn);
-extern u64 e820_hole_size(u64 start, u64 end);
-
extern u64 early_reserve_e820(u64 startt, u64 sizet, u64 align);

void init_lmb_memory(void);
void fill_lmb_memory(void);
-
extern void finish_e820_parsing(void);
extern void e820_reserve_resources(void);
extern void e820_reserve_resources_late(void);
extern void setup_memory_map(void);
extern char *default_machine_specific_memory_setup(void);

-void reserve_early(u64 start, u64 end, char *name);
-void free_early(u64 start, u64 end);
-
/*
* Returns true iff the specified range [s,e) is completely contained inside
* the ISA region.
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index b4a2301..c8edb78 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -739,32 +739,6 @@ core_initcall(e820_mark_nvs_memory);
#endif

/*
- * Find a free area with specified alignment in a specific range.
- */
-u64 __init find_e820_area(u64 start, u64 end, u64 size, u64 align)
-{
- u64 mem = lmb_find_in_range(start, end, size, align);
-
- if (mem == LMB_ERROR)
- return -1ULL;
-
- return mem;
-}
-
-/*
- * Find next free range after *start
- */
-u64 __init find_e820_area_size(u64 start, u64 *sizep, u64 align)
-{
- u64 mem = lmb_find_in_range_size(start, sizep, align);
-
- if (mem == LMB_ERROR)
- return -1ULL
-
- return mem;
-}
-
-/*
* pre allocated 4k and reserved it in lmb and e820_saved
*/
u64 __init early_reserve_e820(u64 startt, u64 sizet, u64 align)
@@ -856,32 +830,6 @@ unsigned long __init e820_end_of_low_ram_pfn(void)
return e820_end_pfn(1UL<<(32 - PAGE_SHIFT), E820_RAM);
}

-/* Walk the e820 map and register active regions within a node */
-void __init e820_register_active_regions(int nid, unsigned long start_pfn,
- unsigned long last_pfn)
-{
- lmb_register_active_regions(nid, start_pfn, last_pfn);
-}
-
-/*
- * Find the hole size (in bytes) in the memory range.
- * @start: starting address of the memory range to scan
- * @end: ending address of the memory range to scan
- */
-u64 __init e820_hole_size(u64 start, u64 end)
-{
- return lmb_hole_size(start, end);
-}
-
-void reserve_early(u64 start, u64 end, char *name)
-{
- lmb_reserve_range(start, end, name);
-}
-void free_early(u64 start, u64 end)
-{
- lmb_free_range(start, end);
-}
-
static void early_panic(char *msg)
{
early_printk(msg);
diff --git a/include/linux/early_res.h b/include/linux/early_res.h
deleted file mode 100644
index 29c09f5..0000000
--- a/include/linux/early_res.h
+++ /dev/null
@@ -1,23 +0,0 @@
-#ifndef _LINUX_EARLY_RES_H
-#define _LINUX_EARLY_RES_H
-#ifdef __KERNEL__
-
-extern void reserve_early(u64 start, u64 end, char *name);
-extern void reserve_early_overlap_ok(u64 start, u64 end, char *name);
-extern void free_early(u64 start, u64 end);
-void free_early_partial(u64 start, u64 end);
-extern void early_res_to_bootmem(u64 start, u64 end);
-
-void reserve_early_without_check(u64 start, u64 end, char *name);
-u64 find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
- u64 size, u64 align);
-u64 find_early_area_size(u64 ei_start, u64 ei_last, u64 start,
- u64 *sizep, u64 align);
-u64 find_fw_memmap_area(u64 start, u64 end, u64 size, u64 align);
-u64 get_max_mapped(void);
-#include <linux/range.h>
-int get_free_all_memory_range(struct range **rangep, int nodeid);
-
-#endif /* __KERNEL__ */
-
-#endif /* _LINUX_EARLY_RES_H */
diff --git a/kernel/early_res.c b/kernel/early_res.c
deleted file mode 100644
index 31aa933..0000000
--- a/kernel/early_res.c
+++ /dev/null
@@ -1,584 +0,0 @@
-/*
- * early_res, could be used to replace bootmem
- */
-#include <linux/kernel.h>
-#include <linux/types.h>
-#include <linux/init.h>
-#include <linux/bootmem.h>
-#include <linux/mm.h>
-#include <linux/early_res.h>
-
-/*
- * Early reserved memory areas.
- */
-/*
- * need to make sure this one is bigger enough before
- * find_fw_memmap_area could be used
- */
-#define MAX_EARLY_RES_X 32
-
-struct early_res {
- u64 start, end;
- char name[15];
- char overlap_ok;
-};
-static struct early_res early_res_x[MAX_EARLY_RES_X] __initdata;
-
-static int max_early_res __initdata = MAX_EARLY_RES_X;
-static struct early_res *early_res __initdata = &early_res_x[0];
-static int early_res_count __initdata;
-
-static int __init find_overlapped_early(u64 start, u64 end)
-{
- int i;
- struct early_res *r;
-
- for (i = 0; i < max_early_res && early_res[i].end; i++) {
- r = &early_res[i];
- if (end > r->start && start < r->end)
- break;
- }
-
- return i;
-}
-
-/*
- * Drop the i-th range from the early reservation map,
- * by copying any higher ranges down one over it, and
- * clearing what had been the last slot.
- */
-static void __init drop_range(int i)
-{
- int j;
-
- for (j = i + 1; j < max_early_res && early_res[j].end; j++)
- ;
-
- memmove(&early_res[i], &early_res[i + 1],
- (j - 1 - i) * sizeof(struct early_res));
-
- early_res[j - 1].end = 0;
- early_res_count--;
-}
-
-static void __init drop_range_partial(int i, u64 start, u64 end)
-{
- u64 common_start, common_end;
- u64 old_start, old_end;
-
- old_start = early_res[i].start;
- old_end = early_res[i].end;
- common_start = max(old_start, start);
- common_end = min(old_end, end);
-
- /* no overlap ? */
- if (common_start >= common_end)
- return;
-
- if (old_start < common_start) {
- /* make head segment */
- early_res[i].end = common_start;
- if (old_end > common_end) {
- char name[15];
-
- /*
- * Save a local copy of the name, since the
- * early_res array could get resized inside
- * reserve_early_without_check() ->
- * __check_and_double_early_res(), which would
- * make the current name pointer invalid.
- */
- strncpy(name, early_res[i].name,
- sizeof(early_res[i].name) - 1);
- /* add another for left over on tail */
- reserve_early_without_check(common_end, old_end, name);
- }
- return;
- } else {
- if (old_end > common_end) {
- /* reuse the entry for tail left */
- early_res[i].start = common_end;
- return;
- }
- /* all covered */
- drop_range(i);
- }
-}
-
-/*
- * Split any existing ranges that:
- * 1) are marked 'overlap_ok', and
- * 2) overlap with the stated range [start, end)
- * into whatever portion (if any) of the existing range is entirely
- * below or entirely above the stated range. Drop the portion
- * of the existing range that overlaps with the stated range,
- * which will allow the caller of this routine to then add that
- * stated range without conflicting with any existing range.
- */
-static void __init drop_overlaps_that_are_ok(u64 start, u64 end)
-{
- int i;
- struct early_res *r;
- u64 lower_start, lower_end;
- u64 upper_start, upper_end;
- char name[15];
-
- for (i = 0; i < max_early_res && early_res[i].end; i++) {
- r = &early_res[i];
-
- /* Continue past non-overlapping ranges */
- if (end <= r->start || start >= r->end)
- continue;
-
- /*
- * Leave non-ok overlaps as is; let caller
- * panic "Overlapping early reservations"
- * when it hits this overlap.
- */
- if (!r->overlap_ok)
- return;
-
- /*
- * We have an ok overlap. We will drop it from the early
- * reservation map, and add back in any non-overlapping
- * portions (lower or upper) as separate, overlap_ok,
- * non-overlapping ranges.
- */
-
- /* 1. Note any non-overlapping (lower or upper) ranges. */
- strncpy(name, r->name, sizeof(name) - 1);
-
- lower_start = lower_end = 0;
- upper_start = upper_end = 0;
- if (r->start < start) {
- lower_start = r->start;
- lower_end = start;
- }
- if (r->end > end) {
- upper_start = end;
- upper_end = r->end;
- }
-
- /* 2. Drop the original ok overlapping range */
- drop_range(i);
-
- i--; /* resume for-loop on copied down entry */
-
- /* 3. Add back in any non-overlapping ranges. */
- if (lower_end)
- reserve_early_overlap_ok(lower_start, lower_end, name);
- if (upper_end)
- reserve_early_overlap_ok(upper_start, upper_end, name);
- }
-}
-
-static void __init __reserve_early(u64 start, u64 end, char *name,
- int overlap_ok)
-{
- int i;
- struct early_res *r;
-
- i = find_overlapped_early(start, end);
- if (i >= max_early_res)
- panic("Too many early reservations");
- r = &early_res[i];
- if (r->end)
- panic("Overlapping early reservations "
- "%llx-%llx %s to %llx-%llx %s\n",
- start, end - 1, name ? name : "", r->start,
- r->end - 1, r->name);
- r->start = start;
- r->end = end;
- r->overlap_ok = overlap_ok;
- if (name)
- strncpy(r->name, name, sizeof(r->name) - 1);
- early_res_count++;
-}
-
-/*
- * A few early reservtations come here.
- *
- * The 'overlap_ok' in the name of this routine does -not- mean it
- * is ok for these reservations to overlap an earlier reservation.
- * Rather it means that it is ok for subsequent reservations to
- * overlap this one.
- *
- * Use this entry point to reserve early ranges when you are doing
- * so out of "Paranoia", reserving perhaps more memory than you need,
- * just in case, and don't mind a subsequent overlapping reservation
- * that is known to be needed.
- *
- * The drop_overlaps_that_are_ok() call here isn't really needed.
- * It would be needed if we had two colliding 'overlap_ok'
- * reservations, so that the second such would not panic on the
- * overlap with the first. We don't have any such as of this
- * writing, but might as well tolerate such if it happens in
- * the future.
- */
-void __init reserve_early_overlap_ok(u64 start, u64 end, char *name)
-{
- drop_overlaps_that_are_ok(start, end);
- __reserve_early(start, end, name, 1);
-}
-
-static void __init __check_and_double_early_res(u64 ex_start, u64 ex_end)
-{
- u64 start, end, size, mem;
- struct early_res *new;
-
- /* do we have enough slots left ? */
- if ((max_early_res - early_res_count) > max(max_early_res/8, 2))
- return;
-
- /* double it */
- mem = -1ULL;
- size = sizeof(struct early_res) * max_early_res * 2;
- if (early_res == early_res_x)
- start = 0;
- else
- start = early_res[0].end;
- end = ex_start;
- if (start + size < end)
- mem = find_fw_memmap_area(start, end, size,
- sizeof(struct early_res));
- if (mem == -1ULL) {
- start = ex_end;
- end = get_max_mapped();
- if (start + size < end)
- mem = find_fw_memmap_area(start, end, size,
- sizeof(struct early_res));
- }
- if (mem == -1ULL)
- panic("can not find more space for early_res array");
-
- new = __va(mem);
- /* save the first one for own */
- new[0].start = mem;
- new[0].end = mem + size;
- new[0].overlap_ok = 0;
- /* copy old to new */
- if (early_res == early_res_x) {
- memcpy(&new[1], &early_res[0],
- sizeof(struct early_res) * max_early_res);
- memset(&new[max_early_res+1], 0,
- sizeof(struct early_res) * (max_early_res - 1));
- early_res_count++;
- } else {
- memcpy(&new[1], &early_res[1],
- sizeof(struct early_res) * (max_early_res - 1));
- memset(&new[max_early_res], 0,
- sizeof(struct early_res) * max_early_res);
- }
- memset(&early_res[0], 0, sizeof(struct early_res) * max_early_res);
- early_res = new;
- max_early_res *= 2;
- printk(KERN_DEBUG "early_res array is doubled to %d at [%llx - %llx]\n",
- max_early_res, mem, mem + size - 1);
-}
-
-/*
- * Most early reservations come here.
- *
- * We first have drop_overlaps_that_are_ok() drop any pre-existing
- * 'overlap_ok' ranges, so that we can then reserve this memory
- * range without risk of panic'ing on an overlapping overlap_ok
- * early reservation.
- */
-void __init reserve_early(u64 start, u64 end, char *name)
-{
- if (start >= end)
- return;
-
- __check_and_double_early_res(start, end);
-
- drop_overlaps_that_are_ok(start, end);
- __reserve_early(start, end, name, 0);
-}
-
-void __init reserve_early_without_check(u64 start, u64 end, char *name)
-{
- struct early_res *r;
-
- if (start >= end)
- return;
-
- __check_and_double_early_res(start, end);
-
- r = &early_res[early_res_count];
-
- r->start = start;
- r->end = end;
- r->overlap_ok = 0;
- if (name)
- strncpy(r->name, name, sizeof(r->name) - 1);
- early_res_count++;
-}
-
-void __init free_early(u64 start, u64 end)
-{
- struct early_res *r;
- int i;
-
- i = find_overlapped_early(start, end);
- r = &early_res[i];
- if (i >= max_early_res || r->end != end || r->start != start)
- panic("free_early on not reserved area: %llx-%llx!",
- start, end - 1);
-
- drop_range(i);
-}
-
-void __init free_early_partial(u64 start, u64 end)
-{
- struct early_res *r;
- int i;
-
- if (start == end)
- return;
-
- if (WARN_ONCE(start > end, " wrong range [%#llx, %#llx]\n", start, end))
- return;
-
-try_next:
- i = find_overlapped_early(start, end);
- if (i >= max_early_res)
- return;
-
- r = &early_res[i];
- /* hole ? */
- if (r->end >= end && r->start <= start) {
- drop_range_partial(i, start, end);
- return;
- }
-
- drop_range_partial(i, start, end);
- goto try_next;
-}
-
-#ifdef CONFIG_NO_BOOTMEM
-static void __init subtract_early_res(struct range *range, int az)
-{
- int i, count;
- u64 final_start, final_end;
- int idx = 0;
-
- count = 0;
- for (i = 0; i < max_early_res && early_res[i].end; i++)
- count++;
-
- /* need to skip first one ?*/
- if (early_res != early_res_x)
- idx = 1;
-
-#define DEBUG_PRINT_EARLY_RES 1
-
-#if DEBUG_PRINT_EARLY_RES
- printk(KERN_INFO "Subtract (%d early reservations)\n", count);
-#endif
- for (i = idx; i < count; i++) {
- struct early_res *r = &early_res[i];
-#if DEBUG_PRINT_EARLY_RES
- printk(KERN_INFO " #%d [%010llx - %010llx] %15s\n", i,
- r->start, r->end, r->name);
-#endif
- final_start = PFN_DOWN(r->start);
- final_end = PFN_UP(r->end);
- if (final_start >= final_end)
- continue;
- subtract_range(range, az, final_start, final_end);
- }
-
-}
-
-int __init get_free_all_memory_range(struct range **rangep, int nodeid)
-{
- int i, count;
- u64 start = 0, end;
- u64 size;
- u64 mem;
- struct range *range;
- int nr_range;
-
- count = 0;
- for (i = 0; i < max_early_res && early_res[i].end; i++)
- count++;
-
- count *= 2;
-
- size = sizeof(struct range) * count;
- end = get_max_mapped();
-#ifdef MAX_DMA32_PFN
- if (end > (MAX_DMA32_PFN << PAGE_SHIFT))
- start = MAX_DMA32_PFN << PAGE_SHIFT;
-#endif
- mem = find_fw_memmap_area(start, end, size, sizeof(struct range));
- if (mem == -1ULL)
- panic("can not find more space for range free");
-
- range = __va(mem);
- /* use early_node_map[] and early_res to get range array at first */
- memset(range, 0, size);
- nr_range = 0;
-
- /* need to go over early_node_map to find out good range for node */
- nr_range = add_from_early_node_map(range, count, nr_range, nodeid);
-#ifdef CONFIG_X86_32
- subtract_range(range, count, max_low_pfn, -1ULL);
-#endif
- subtract_early_res(range, count);
- nr_range = clean_sort_range(range, count);
-
- /* need to clear it ? */
- if (nodeid == MAX_NUMNODES) {
- memset(&early_res[0], 0,
- sizeof(struct early_res) * max_early_res);
- early_res = NULL;
- max_early_res = 0;
- }
-
- *rangep = range;
- return nr_range;
-}
-#else
-void __init early_res_to_bootmem(u64 start, u64 end)
-{
- int i, count;
- u64 final_start, final_end;
- int idx = 0;
-
- count = 0;
- for (i = 0; i < max_early_res && early_res[i].end; i++)
- count++;
-
- /* need to skip first one ?*/
- if (early_res != early_res_x)
- idx = 1;
-
- printk(KERN_INFO "(%d/%d early reservations) ==> bootmem [%010llx - %010llx]\n",
- count - idx, max_early_res, start, end);
- for (i = idx; i < count; i++) {
- struct early_res *r = &early_res[i];
- printk(KERN_INFO " #%d [%010llx - %010llx] %16s", i,
- r->start, r->end, r->name);
- final_start = max(start, r->start);
- final_end = min(end, r->end);
- if (final_start >= final_end) {
- printk(KERN_CONT "\n");
- continue;
- }
- printk(KERN_CONT " ==> [%010llx - %010llx]\n",
- final_start, final_end);
- reserve_bootmem_generic(final_start, final_end - final_start,
- BOOTMEM_DEFAULT);
- }
- /* clear them */
- memset(&early_res[0], 0, sizeof(struct early_res) * max_early_res);
- early_res = NULL;
- max_early_res = 0;
- early_res_count = 0;
-}
-#endif
-
-/* Check for already reserved areas */
-static inline int __init bad_addr(u64 *addrp, u64 size, u64 align)
-{
- int i;
- u64 addr = *addrp;
- int changed = 0;
- struct early_res *r;
-again:
- i = find_overlapped_early(addr, addr + size);
- r = &early_res[i];
- if (i < max_early_res && r->end) {
- *addrp = addr = round_up(r->end, align);
- changed = 1;
- goto again;
- }
- return changed;
-}
-
-/* Check for already reserved areas */
-static inline int __init bad_addr_size(u64 *addrp, u64 *sizep, u64 align)
-{
- int i;
- u64 addr = *addrp, last;
- u64 size = *sizep;
- int changed = 0;
-again:
- last = addr + size;
- for (i = 0; i < max_early_res && early_res[i].end; i++) {
- struct early_res *r = &early_res[i];
- if (last > r->start && addr < r->start) {
- size = r->start - addr;
- changed = 1;
- goto again;
- }
- if (last > r->end && addr < r->end) {
- addr = round_up(r->end, align);
- size = last - addr;
- changed = 1;
- goto again;
- }
- if (last <= r->end && addr >= r->start) {
- (*sizep)++;
- return 0;
- }
- }
- if (changed) {
- *addrp = addr;
- *sizep = size;
- }
- return changed;
-}
-
-/*
- * Find a free area with specified alignment in a specific range.
- * only with the area.between start to end is active range from early_node_map
- * so they are good as RAM
- */
-u64 __init find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
- u64 size, u64 align)
-{
- u64 addr, last;
-
- addr = round_up(ei_start, align);
- if (addr < start)
- addr = round_up(start, align);
- if (addr >= ei_last)
- goto out;
- while (bad_addr(&addr, size, align) && addr+size <= ei_last)
- ;
- last = addr + size;
- if (last > ei_last)
- goto out;
- if (last > end)
- goto out;
-
- return addr;
-
-out:
- return -1ULL;
-}
-
-u64 __init find_early_area_size(u64 ei_start, u64 ei_last, u64 start,
- u64 *sizep, u64 align)
-{
- u64 addr, last;
-
- addr = round_up(ei_start, align);
- if (addr < start)
- addr = round_up(start, align);
- if (addr >= ei_last)
- goto out;
- *sizep = ei_last - addr;
- while (bad_addr_size(&addr, sizep, align) && addr + *sizep <= ei_last)
- ;
- last = addr + *sizep;
- if (last > ei_last)
- goto out;
-
- return addr;
-
-out:
- return -1ULL;
-}
--
1.6.4.2

2010-06-22 17:31:03

by Yinghai Lu

Subject: [PATCH 22/25] x86, lmb: Use lmb_memory_size()/lmb_free_memory_size() to get correct dma_reserve

lmb_memory_size() returns the memory size in lmb.memory.region.
lmb_free_memory_size() returns the free memory size in lmb.memory.region.

So we can get the exact reserved size in a specified range.

Set the size right after initmem_init(), because the bootmem API will later
allocate from the area above 16M (except for some fallback cases).

Later, after we remove bootmem, we could call this just before paging_init().

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/e820.h | 2 ++
arch/x86/kernel/e820.c | 17 +++++++++++++++++
arch/x86/kernel/setup.c | 1 +
arch/x86/mm/init_64.c | 7 -------
4 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index 6fbd8cd..f59db16 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -118,6 +118,8 @@ extern u64 early_reserve_e820(u64 startt, u64 sizet, u64 align);

void init_lmb_memory(void);
void fill_lmb_memory(void);
+void find_lmb_dma_reserve(void);
+
extern void finish_e820_parsing(void);
extern void e820_reserve_resources(void);
extern void e820_reserve_resources_late(void);
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index c8edb78..c09b84b 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1110,3 +1110,20 @@ void __init fill_lmb_memory(void)
lmb_analyze();
lmb_dump_all();
}
+
+void __init find_lmb_dma_reserve(void)
+{
+#ifdef CONFIG_X86_64
+ u64 free_size_pfn;
+ u64 mem_size_pfn;
+ /*
+ * need to find out used area below MAX_DMA_PFN
+ * need to use lmb to get free size in [0, MAX_DMA_PFN]
+ * at first, and assume boot_mem will not take below MAX_DMA_PFN
+ */
+ mem_size_pfn = lmb_memory_in_range(0, MAX_DMA_PFN << PAGE_SHIFT) >> PAGE_SHIFT;
+ free_size_pfn = lmb_free_memory_in_range(0, MAX_DMA_PFN << PAGE_SHIFT) >> PAGE_SHIFT;
+ set_dma_reserve(mem_size_pfn - free_size_pfn);
+#endif
+}
+
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index ba3f94c..426b217 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1019,6 +1019,7 @@ void __init setup_arch(char **cmdline_p)
#endif

initmem_init(0, max_pfn, acpi, k8);
+ find_lmb_dma_reserve();
#ifndef CONFIG_NO_BOOTMEM
lmb_to_bootmem(0, max_low_pfn<<PAGE_SHIFT);
#endif
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index bb9315f..abb79de 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -53,8 +53,6 @@
#include <asm/init.h>
#include <linux/bootmem.h>

-static unsigned long dma_reserve __initdata;
-
static int __init parse_direct_gbpages_off(char *arg)
{
direct_gbpages = 0;
@@ -821,11 +819,6 @@ int __init reserve_bootmem_generic(unsigned long phys, unsigned long len,

reserve_bootmem(phys, len, flags);

- if (phys+len <= MAX_DMA_PFN*PAGE_SIZE) {
- dma_reserve += len / PAGE_SIZE;
- set_dma_reserve(dma_reserve);
- }
-
return 0;
}
#endif
--
1.6.4.2

2010-06-22 17:28:49

by Yinghai Lu

Subject: [PATCH 16/25] x86, lmb: Add lmb_free_memory_in_range()

It returns the free memory size in the specified range.

We cannot use memory_size - reserved_size here, because some reserved areas
may lie outside lmb.memory.region.

Subtract lmb.reserved.region from lmb.memory.region to get the free range
array, then sum the sizes of all free ranges.

-v2: Ben insists on using _in_range

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/lmb.h | 1 +
arch/x86/mm/lmb.c | 48 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 49 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
index 02fe25d..3a304f8 100644
--- a/arch/x86/include/asm/lmb.h
+++ b/arch/x86/include/asm/lmb.h
@@ -15,5 +15,6 @@ void lmb_register_active_regions(int nid, unsigned long start_pfn,
unsigned long last_pfn);
u64 lmb_hole_size(u64 start, u64 end);
u64 lmb_find_in_range_node(int nid, u64 start, u64 end, u64 size, u64 align);
+u64 lmb_free_memory_in_range(u64 addr, u64 limit);

#endif
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index 399d223..991dd55 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -217,6 +217,54 @@ void __init lmb_to_bootmem(u64 start, u64 end)
}
#endif

+u64 __init lmb_free_memory_in_range(u64 addr, u64 limit)
+{
+ int i, count;
+ struct range *range;
+ int nr_range;
+ u64 final_start, final_end;
+ u64 free_size;
+ struct lmb_region *r;
+
+ count = (lmb.reserved.cnt + lmb.memory.cnt) * 2;
+
+ range = find_range_array(count);
+ nr_range = 0;
+
+ addr = PFN_UP(addr);
+ limit = PFN_DOWN(limit);
+
+ for_each_lmb(memory, r) {
+ final_start = PFN_UP(r->base);
+ final_end = PFN_DOWN(r->base + r->size);
+ if (final_start >= final_end)
+ continue;
+ if (final_start >= limit || final_end <= addr)
+ continue;
+
+ nr_range = add_range(range, count, nr_range, final_start, final_end);
+ }
+ subtract_range(range, count, 0, addr);
+ subtract_range(range, count, limit, -1ULL);
+ for_each_lmb(reserved, r) {
+ final_start = PFN_DOWN(r->base);
+ final_end = PFN_UP(r->base + r->size);
+ if (final_start >= final_end)
+ continue;
+ if (final_start >= limit || final_end <= addr)
+ continue;
+
+ subtract_range(range, count, final_start, final_end);
+ }
+ nr_range = clean_sort_range(range, count);
+
+ free_size = 0;
+ for (i = 0; i < nr_range; i++)
+ free_size += range[i].end - range[i].start;
+
+ return free_size << PAGE_SHIFT;
+}
+
void __init lmb_reserve_range(u64 start, u64 end, char *name)
{
if (start == end)
--
1.6.4.2
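
For readers following patch 16/25 outside the kernel tree, the subtraction it
describes can be sketched in userspace C. This is a minimal sketch, not the
patch's code: struct region, struct pfn_range, subtract() and
free_memory_in_range() are hypothetical stand-ins for struct lmb_region,
struct range, subtract_range() and lmb_free_memory_in_range(); the range array
is fixed-size; and memory regions are clipped to the window directly instead
of subtracting [0, addr) and [limit, -1ULL). As in the patch, memory regions
are rounded inward to whole pages and reserved regions outward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define PAGE_SIZE   (1ULL << PAGE_SHIFT)
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define MAX_RANGES  64

struct region    { uint64_t base, size; };   /* stand-in for struct lmb_region */
struct pfn_range { uint64_t start, end; };   /* [start, end) in page frames */

/* Remove [start, end) from each of the first nr ranges; may split one in two. */
static int subtract(struct pfn_range *r, int nr, uint64_t start, uint64_t end)
{
	int i, n = nr;

	for (i = 0; i < nr; i++) {
		if (r[i].end <= start || r[i].start >= end)
			continue;			/* no overlap */
		if (r[i].start >= start && r[i].end <= end) {
			r[i].start = r[i].end = 0;	/* fully covered: empty it */
		} else if (r[i].start < start && r[i].end > end) {
			assert(n < MAX_RANGES);
			r[n].start = end;		/* append the tail piece */
			r[n].end = r[i].end;
			n++;
			r[i].end = start;		/* keep the head piece */
		} else if (r[i].start < start) {
			r[i].end = start;		/* trim the tail */
		} else {
			r[i].start = end;		/* trim the head */
		}
	}
	return n;
}

/* Free bytes in [addr, limit): memory regions minus reserved regions. */
static uint64_t free_memory_in_range(const struct region *mem, size_t nmem,
				     const struct region *rsv, size_t nrsv,
				     uint64_t addr, uint64_t limit)
{
	struct pfn_range r[MAX_RANGES];
	uint64_t lo = PFN_UP(addr), hi = PFN_DOWN(limit), pages = 0;
	size_t k;
	int i, nr = 0;

	for (k = 0; k < nmem; k++) {
		uint64_t s = PFN_UP(mem[k].base);
		uint64_t e = PFN_DOWN(mem[k].base + mem[k].size);

		if (s < lo)
			s = lo;
		if (e > hi)
			e = hi;
		if (s >= e)
			continue;	/* smaller than a page, or outside the window */
		assert(nr < MAX_RANGES);
		r[nr].start = s;
		r[nr].end = e;
		nr++;
	}
	for (k = 0; k < nrsv; k++) {
		uint64_t s = PFN_DOWN(rsv[k].base);
		uint64_t e = PFN_UP(rsv[k].base + rsv[k].size);

		if (s < e)
			nr = subtract(r, nr, s, e);
	}
	for (i = 0; i < nr; i++)
		pages += r[i].end - r[i].start;

	return pages << PAGE_SHIFT;
}
```

A partially reserved page counts as fully reserved here, which is why the
result can be smaller than memory size minus the sum of the reserved sizes.
The reserved page count that patch 22/25 hands to set_dma_reserve() is the
memory page count minus this free page count over the same window.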

2010-06-22 17:31:40

by Yinghai Lu

Subject: [PATCH 15/25] x86, lmb: Add lmb_find_in_range_node()

It can be used to find a buffer for NODE_DATA on NUMA.

Need to make sure early_node_map[] is filled before it is called; otherwise
it will fall back to lmb_find_in_range() with the node range.

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/lmb.h | 1 +
arch/x86/mm/lmb.c | 15 +++++++++++++++
2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
index 763fb52..02fe25d 100644
--- a/arch/x86/include/asm/lmb.h
+++ b/arch/x86/include/asm/lmb.h
@@ -14,5 +14,6 @@ int get_free_all_memory_range(struct range **rangep, int nodeid);
void lmb_register_active_regions(int nid, unsigned long start_pfn,
unsigned long last_pfn);
u64 lmb_hole_size(u64 start, u64 end);
+u64 lmb_find_in_range_node(int nid, u64 start, u64 end, u64 size, u64 align);

#endif
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index b7e42bc..399d223 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -240,6 +240,21 @@ void __init lmb_free_range(u64 start, u64 end)
}

/*
+ * Need to call this function after lmb_register_active_regions,
+ * so early_node_map[] is filled already.
+ */
+u64 __init lmb_find_in_range_node(int nid, u64 start, u64 end, u64 size, u64 align)
+{
+ u64 addr;
+ addr = find_memory_core_early(nid, size, align, start, end);
+ if (addr != LMB_ERROR)
+ return addr;
+
+ /* Fallback, should already have start end within node range */
+ return lmb_find_in_range(start, end, size, align);
+}
+
+/*
* Finds an active region in the address range from start_pfn to last_pfn and
* returns its range in ei_startpfn and ei_endpfn for the lmb entry.
*/
--
1.6.4.2

2010-06-22 17:32:12

by Yinghai Lu

Subject: [PATCH 04/25] lmb: Export LMB_ERROR again

Will be used by x86 lmb_find_in_range_node() and the nobootmem replacement.

Signed-off-by: Yinghai Lu <[email protected]>
---
include/linux/lmb.h | 1 +
lib/lmb.c | 2 --
2 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/include/linux/lmb.h b/include/linux/lmb.h
index 1e96224..eb8a5a3 100644
--- a/include/linux/lmb.h
+++ b/include/linux/lmb.h
@@ -19,6 +19,7 @@
#include <asm/lmb.h>

#define INIT_LMB_REGIONS 128
+#define LMB_ERROR (~(phys_addr_t)0)

struct lmb_region {
phys_addr_t base;
diff --git a/lib/lmb.c b/lib/lmb.c
index 7a6e11a..2aaeeec 100644
--- a/lib/lmb.c
+++ b/lib/lmb.c
@@ -27,8 +27,6 @@ int lmb_can_resize;
static struct lmb_region lmb_memory_init_regions[INIT_LMB_REGIONS + 1];
struct lmb_region lmb_reserved_init_regions[INIT_LMB_REGIONS + 1];

-#define LMB_ERROR (~(phys_addr_t)0)
-
/* inline so we don't get a warning when pr_debug is compiled out */
static inline const char *lmb_type_name(struct lmb_type *type)
{
--
1.6.4.2

2010-06-22 17:31:42

by Yinghai Lu

Subject: [PATCH 13/25] x86, lmb: Add lmb_register_active_regions() and lmb_hole_size()

lmb_register_active_regions() will be used to fill early_node_map[];
the result is the intersection of lmb.memory.region and the NUMA data.

lmb_hole_size() will be used to find the hole size in lmb.memory.region
within a specified range.

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/lmb.h | 4 ++
arch/x86/mm/lmb.c | 67 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 71 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
index 63bb597..763fb52 100644
--- a/arch/x86/include/asm/lmb.h
+++ b/arch/x86/include/asm/lmb.h
@@ -11,4 +11,8 @@ void lmb_free_range(u64 start, u64 end);
struct range;
int get_free_all_memory_range(struct range **rangep, int nodeid);

+void lmb_register_active_regions(int nid, unsigned long start_pfn,
+ unsigned long last_pfn);
+u64 lmb_hole_size(u64 start, u64 end);
+
#endif
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index 3954103..b7e42bc 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -238,3 +238,70 @@ void __init lmb_free_range(u64 start, u64 end)

lmb_free(start, end - start);
}
+
+/*
+ * Finds an active region in the address range from start_pfn to last_pfn and
+ * returns its range in ei_startpfn and ei_endpfn for the lmb entry.
+ */
+static int __init lmb_find_active_region(const struct lmb_region *ei,
+ unsigned long start_pfn,
+ unsigned long last_pfn,
+ unsigned long *ei_startpfn,
+ unsigned long *ei_endpfn)
+{
+ u64 align = PAGE_SIZE;
+
+ *ei_startpfn = round_up(ei->base, align) >> PAGE_SHIFT;
+ *ei_endpfn = round_down(ei->base + ei->size, align) >> PAGE_SHIFT;
+
+ /* Skip map entries smaller than a page */
+ if (*ei_startpfn >= *ei_endpfn)
+ return 0;
+
+ /* Skip if map is outside the node */
+ if (*ei_endpfn <= start_pfn || *ei_startpfn >= last_pfn)
+ return 0;
+
+ /* Check for overlaps */
+ if (*ei_startpfn < start_pfn)
+ *ei_startpfn = start_pfn;
+ if (*ei_endpfn > last_pfn)
+ *ei_endpfn = last_pfn;
+
+ return 1;
+}
+
+/* Walk the lmb.memory map and register active regions within a node */
+void __init lmb_register_active_regions(int nid, unsigned long start_pfn,
+ unsigned long last_pfn)
+{
+ unsigned long ei_startpfn;
+ unsigned long ei_endpfn;
+ struct lmb_region *r;
+
+ for_each_lmb(memory, r)
+ if (lmb_find_active_region(r, start_pfn, last_pfn,
+ &ei_startpfn, &ei_endpfn))
+ add_active_range(nid, ei_startpfn, ei_endpfn);
+}
+
+/*
+ * Find the hole size (in bytes) in the memory range.
+ * @start: starting address of the memory range to scan
+ * @end: ending address of the memory range to scan
+ */
+u64 __init lmb_hole_size(u64 start, u64 end)
+{
+ unsigned long start_pfn = start >> PAGE_SHIFT;
+ unsigned long last_pfn = end >> PAGE_SHIFT;
+ unsigned long ei_startpfn, ei_endpfn, ram = 0;
+ struct lmb_region *r;
+
+ for_each_lmb(memory, r)
+ if (lmb_find_active_region(r, start_pfn, last_pfn,
+ &ei_startpfn, &ei_endpfn))
+ ram += ei_endpfn - ei_startpfn;
+
+ return end - start - ((u64)ram << PAGE_SHIFT);
+}
+
--
1.6.4.2
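
The hole-size arithmetic in patch 13/25 can be checked with a small userspace
sketch. The names below (struct region, hole_size()) are hypothetical
stand-ins for struct lmb_region and lmb_hole_size(), and a 4K page size is
assumed. Each memory region is rounded inward to whole pages, clipped to the
scanned window, and its page count is subtracted from the window size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

struct region { uint64_t base, size; };	/* stand-in for struct lmb_region */

/* Bytes in [start, end) that are not covered by whole pages of RAM. */
static uint64_t hole_size(const struct region *mem, size_t n,
			  uint64_t start, uint64_t end)
{
	uint64_t start_pfn = start >> PAGE_SHIFT;
	uint64_t last_pfn = end >> PAGE_SHIFT;
	uint64_t ram = 0;
	size_t i;

	assert(start <= end);

	for (i = 0; i < n; i++) {
		uint64_t s = (mem[i].base + PAGE_SIZE - 1) >> PAGE_SHIFT;
		uint64_t e = (mem[i].base + mem[i].size) >> PAGE_SHIFT;

		if (s >= e)
			continue;	/* region smaller than a page */
		if (e <= start_pfn || s >= last_pfn)
			continue;	/* region outside the window */
		if (s < start_pfn)
			s = start_pfn;
		if (e > last_pfn)
			e = last_pfn;
		ram += e - s;
	}
	return end - start - (ram << PAGE_SHIFT);
}
```

For a typical PC layout with RAM at [0, 0xA0000) and [0x100000, 0x1000000),
the hole reported for [0, 0x1000000) is the 384K legacy gap between 640K and
1M.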

2010-06-22 17:28:41

by Yinghai Lu

Subject: [PATCH 12/25] x86, lmb: Add get_free_all_memory_range()

get_free_all_memory_range() is for CONFIG_NO_BOOTMEM=y, and will be called by
free_all_memory_core_early().

It takes early_node_map[] (the active ranges) and subtracts lmb.reserved to
get all free ranges; those ranges will then be converted to slab pages.

-v4: increase range size

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Jan Beulich <[email protected]>
---
arch/x86/include/asm/lmb.h | 2 +
arch/x86/mm/lmb.c | 102 +++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 103 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/lmb.h b/arch/x86/include/asm/lmb.h
index 70df84f..63bb597 100644
--- a/arch/x86/include/asm/lmb.h
+++ b/arch/x86/include/asm/lmb.h
@@ -8,5 +8,7 @@ void lmb_to_bootmem(u64 start, u64 end);

void lmb_reserve_range(u64 start, u64 end, char *name);
void lmb_free_range(u64 start, u64 end);
+struct range;
+int get_free_all_memory_range(struct range **rangep, int nodeid);

#endif
diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
index 6b6cc58..3954103 100644
--- a/arch/x86/mm/lmb.c
+++ b/arch/x86/mm/lmb.c
@@ -86,7 +86,107 @@ u64 __init lmb_find_in_range_size(u64 start, u64 *sizep, u64 align)
return LMB_ERROR;
}

-#ifndef CONFIG_NO_BOOTMEM
+static __init struct range *find_range_array(int count)
+{
+ u64 end, size, mem;
+ struct range *range;
+
+ size = sizeof(struct range) * count;
+ end = lmb.current_limit;
+
+ mem = lmb_find_in_range(0, end, size, sizeof(struct range));
+ if (mem == LMB_ERROR)
+ panic("can not find more space for range array");
+
+ /*
+ * This range is temporary, so don't reserve it; it will not be
+ * overlapped because we will not allocate a new buffer before
+ * we discard this one
+ */
+ range = __va(mem);
+ memset(range, 0, size);
+
+ return range;
+}
+
+#ifdef CONFIG_NO_BOOTMEM
+static void __init subtract_lmb_reserved(struct range *range, int az)
+{
+ int count;
+ u64 final_start, final_end;
+ struct lmb_region *r;
+
+ /* Take out region array itself at first*/
+ if (lmb.reserved.regions != lmb_reserved_init_regions)
+ lmb_free(__pa(lmb.reserved.regions), sizeof(struct lmb_region) * lmb.reserved.max);
+
+ count = lmb.reserved.cnt;
+
+ pr_info("Subtract (%d early reservations)\n", count);
+
+ for_each_lmb(reserved, r) {
+ pr_info(" [%010llx - %010llx]\n", (u64)r->base, (u64)r->base + r->size);
+ final_start = PFN_DOWN(r->base);
+ final_end = PFN_UP(r->base + r->size);
+ if (final_start >= final_end)
+ continue;
+ subtract_range(range, az, final_start, final_end);
+ }
+ /* Put region array back ? */
+ if (lmb.reserved.regions != lmb_reserved_init_regions)
+ lmb_reserve(__pa(lmb.reserved.regions), sizeof(struct lmb_region) * lmb.reserved.max);
+}
+
+struct count_data {
+ int nr;
+};
+
+static int __init count_work_fn(unsigned long start_pfn,
+ unsigned long end_pfn, void *datax)
+{
+ struct count_data *data = datax;
+
+ data->nr++;
+
+ return 0;
+}
+
+static int __init count_early_node_map(int nodeid)
+{
+ struct count_data data;
+
+ data.nr = 0;
+ work_with_active_regions(nodeid, count_work_fn, &data);
+
+ return data.nr;
+}
+
+int __init get_free_all_memory_range(struct range **rangep, int nodeid)
+{
+ int count;
+ struct range *range;
+ int nr_range;
+
+ count = (lmb.reserved.cnt + count_early_node_map(nodeid)) * 2;
+
+ range = find_range_array(count);
+ nr_range = 0;
+
+ /*
+ * Use early_node_map[] and lmb.reserved.region to get range array
+ * at first
+ */
+ nr_range = add_from_early_node_map(range, count, nr_range, nodeid);
+#ifdef CONFIG_X86_32
+ subtract_range(range, count, max_low_pfn, -1ULL);
+#endif
+ subtract_lmb_reserved(range, count);
+ nr_range = clean_sort_range(range, count);
+
+ *rangep = range;
+ return nr_range;
+}
+#else
void __init lmb_to_bootmem(u64 start, u64 end)
{
int count;
--
1.6.4.2

2010-06-22 17:28:44

by Yinghai Lu

Subject: [PATCH 06/25] lmb: Add ARCH_DISCARD_LMB to put lmb code to .init

So that the lmb bits can be released after the kernel has booted.

Arch code can define ARCH_DISCARD_LMB in asm/lmb.h; then
__init_lmb becomes __init and __initdata_lmb becomes __initdata.

x86 code will use that.

If ARCH_DISCARD_LMB is defined, debugfs is not used.

-v2: use ARCH_DISCARD_LMB according to Michael Ellerman

Signed-off-by: Yinghai Lu <[email protected]>
---
include/linux/lmb.h | 8 ++++++++
lib/lmb.c | 48 ++++++++++++++++++++++++------------------------
2 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/include/linux/lmb.h b/include/linux/lmb.h
index d529232..5310c7b 100644
--- a/include/linux/lmb.h
+++ b/include/linux/lmb.h
@@ -145,6 +145,14 @@ static inline unsigned long lmb_region_pages(const struct lmb_region *reg)
region++)


+#ifdef ARCH_DISCARD_LMB
+#define __init_lmb __init
+#define __initdata_lmb __initdata
+#else
+#define __init_lmb
+#define __initdata_lmb
+#endif
+
#endif /* CONFIG_HAVE_LMB */

#endif /* __KERNEL__ */
diff --git a/lib/lmb.c b/lib/lmb.c
index 2aaeeec..e45e967 100644
--- a/lib/lmb.c
+++ b/lib/lmb.c
@@ -20,12 +20,12 @@
#include <linux/seq_file.h>
#include <linux/lmb.h>

-struct lmb lmb;
+struct lmb lmb __initdata_lmb;

-int lmb_debug;
-int lmb_can_resize;
-static struct lmb_region lmb_memory_init_regions[INIT_LMB_REGIONS + 1];
-struct lmb_region lmb_reserved_init_regions[INIT_LMB_REGIONS + 1];
+int lmb_debug __initdata_lmb;
+int lmb_can_resize __initdata_lmb;
+static struct lmb_region lmb_memory_init_regions[INIT_LMB_REGIONS + 1] __initdata_lmb;
+struct lmb_region lmb_reserved_init_regions[INIT_LMB_REGIONS + 1] __initdata_lmb;

/* inline so we don't get a warning when pr_debug is compiled out */
static inline const char *lmb_type_name(struct lmb_type *type)
@@ -42,23 +42,23 @@ static inline const char *lmb_type_name(struct lmb_type *type)
* Address comparison utilities
*/

-static phys_addr_t lmb_align_down(phys_addr_t addr, phys_addr_t size)
+static phys_addr_t __init_lmb lmb_align_down(phys_addr_t addr, phys_addr_t size)
{
return addr & ~(size - 1);
}

-static phys_addr_t lmb_align_up(phys_addr_t addr, phys_addr_t size)
+static phys_addr_t __init_lmb lmb_align_up(phys_addr_t addr, phys_addr_t size)
{
return (addr + (size - 1)) & ~(size - 1);
}

-static unsigned long lmb_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
+static unsigned long __init_lmb lmb_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
phys_addr_t base2, phys_addr_t size2)
{
return ((base1 < (base2 + size2)) && (base2 < (base1 + size1)));
}

-static long lmb_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
+static long __init_lmb lmb_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
phys_addr_t base2, phys_addr_t size2)
{
if (base2 == base1 + size1)
@@ -69,7 +69,7 @@ static long lmb_addrs_adjacent(phys_addr_t base1, phys_addr_t size1,
return 0;
}

-static long lmb_regions_adjacent(struct lmb_type *type,
+static long __init_lmb lmb_regions_adjacent(struct lmb_type *type,
unsigned long r1, unsigned long r2)
{
phys_addr_t base1 = type->regions[r1].base;
@@ -80,7 +80,7 @@ static long lmb_regions_adjacent(struct lmb_type *type,
return lmb_addrs_adjacent(base1, size1, base2, size2);
}

-long lmb_overlaps_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
+long __init_lmb lmb_overlaps_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
{
unsigned long i;

@@ -156,7 +156,7 @@ static phys_addr_t __init lmb_find_base(phys_addr_t size, phys_addr_t align,
return LMB_ERROR;
}

-static void lmb_remove_region(struct lmb_type *type, unsigned long r)
+static void __init_lmb lmb_remove_region(struct lmb_type *type, unsigned long r)
{
unsigned long i;

@@ -168,7 +168,7 @@ static void lmb_remove_region(struct lmb_type *type, unsigned long r)
}

/* Assumption: base addr of region 1 < base addr of region 2 */
-static void lmb_coalesce_regions(struct lmb_type *type,
+static void __init_lmb lmb_coalesce_regions(struct lmb_type *type,
unsigned long r1, unsigned long r2)
{
type->regions[r1].size += type->regions[r2].size;
@@ -178,7 +178,7 @@ static void lmb_coalesce_regions(struct lmb_type *type,
/* Defined below but needed now */
static long lmb_add_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size);

-static int lmb_double_array(struct lmb_type *type)
+static int __init_lmb lmb_double_array(struct lmb_type *type)
{
struct lmb_region *new_array, *old_array;
phys_addr_t old_size, new_size, addr;
@@ -250,13 +250,13 @@ static int lmb_double_array(struct lmb_type *type)
return 0;
}

-extern int __weak lmb_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
+extern int __init_lmb __weak lmb_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
phys_addr_t addr2, phys_addr_t size2)
{
return 1;
}

-static long lmb_add_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
+static long __init_lmb lmb_add_region(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
{
unsigned long coalesced = 0;
long adjacent, i;
@@ -343,13 +343,13 @@ static long lmb_add_region(struct lmb_type *type, phys_addr_t base, phys_addr_t
return 0;
}

-long lmb_add(phys_addr_t base, phys_addr_t size)
+long __init_lmb lmb_add(phys_addr_t base, phys_addr_t size)
{
return lmb_add_region(&lmb.memory, base, size);

}

-static long __lmb_remove(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
+static long __init_lmb __lmb_remove(struct lmb_type *type, phys_addr_t base, phys_addr_t size)
{
phys_addr_t rgnbegin, rgnend;
phys_addr_t end = base + size;
@@ -397,7 +397,7 @@ static long __lmb_remove(struct lmb_type *type, phys_addr_t base, phys_addr_t si
return lmb_add_region(type, end, rgnend - end);
}

-long lmb_remove(phys_addr_t base, phys_addr_t size)
+long __init_lmb lmb_remove(phys_addr_t base, phys_addr_t size)
{
return __lmb_remove(&lmb.memory, base, size);
}
@@ -554,7 +554,7 @@ phys_addr_t __init lmb_phys_mem_size(void)
return lmb.memory_size;
}

-phys_addr_t lmb_end_of_DRAM(void)
+phys_addr_t __init_lmb lmb_end_of_DRAM(void)
{
int idx = lmb.memory.cnt - 1;

@@ -615,7 +615,7 @@ int __init lmb_is_reserved(phys_addr_t addr)
return 0;
}

-int lmb_is_region_reserved(phys_addr_t base, phys_addr_t size)
+int __init_lmb lmb_is_region_reserved(phys_addr_t base, phys_addr_t size)
{
return lmb_overlaps_region(&lmb.reserved, base, size);
}
@@ -626,7 +626,7 @@ void __init lmb_set_current_limit(phys_addr_t limit)
lmb.current_limit = limit;
}

-static void lmb_dump(struct lmb_type *region, char *name)
+static void __init_lmb lmb_dump(struct lmb_type *region, char *name)
{
unsigned long long base, size;
int i;
@@ -642,7 +642,7 @@ static void lmb_dump(struct lmb_type *region, char *name)
}
}

-void lmb_dump_all(void)
+void __init_lmb lmb_dump_all(void)
{
if (!lmb_debug)
return;
@@ -708,7 +708,7 @@ static int __init early_lmb(char *p)
}
early_param("lmb", early_lmb);

-#ifdef CONFIG_DEBUG_FS
+#if defined(CONFIG_DEBUG_FS) && !defined(ARCH_DISCARD_LMB)

static int lmb_debug_show(struct seq_file *m, void *private)
{
--
1.6.4.2

2010-06-22 17:32:42

by Yinghai Lu

Subject: [PATCH 14/25] lmb: Add find_memory_core_early()

Walk the node ranges in early_node_map[] and call lmb_find_in_range() on
each one to find a free range.

Will be used by lmb_find_in_range_node().

lmb_find_in_range_node() will be used to find the right buffer for NODE_DATA.

Signed-off-by: Yinghai Lu <[email protected]>
---
include/linux/mm.h | 2 ++
mm/page_alloc.c | 36 ++++++++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4238a9c..a85bb08 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1164,6 +1164,8 @@ extern void free_bootmem_with_active_regions(int nid,
unsigned long max_low_pfn);
int add_from_early_node_map(struct range *range, int az,
int nr_range, int nid);
+u64 __init find_memory_core_early(int nid, u64 size, u64 align,
+ u64 goal, u64 limit);
void *__alloc_memory_core_early(int nodeid, u64 size, u64 align,
u64 goal, u64 limit);
typedef int (*work_fn_t)(unsigned long, unsigned long, void *);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 431214b..37f30fc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -21,6 +21,7 @@
#include <linux/pagemap.h>
#include <linux/jiffies.h>
#include <linux/bootmem.h>
+#include <linux/lmb.h>
#include <linux/compiler.h>
#include <linux/kernel.h>
#include <linux/kmemcheck.h>
@@ -3612,6 +3613,41 @@ void __init free_bootmem_with_active_regions(int nid,
}
}

+#ifdef CONFIG_HAVE_LMB
+u64 __init find_memory_core_early(int nid, u64 size, u64 align,
+ u64 goal, u64 limit)
+{
+ int i;
+
+ /* Need to go over early_node_map to find out good range for node */
+ for_each_active_range_index_in_nid(i, nid) {
+ u64 addr;
+ u64 ei_start, ei_last;
+ u64 final_start, final_end;
+
+ ei_last = early_node_map[i].end_pfn;
+ ei_last <<= PAGE_SHIFT;
+ ei_start = early_node_map[i].start_pfn;
+ ei_start <<= PAGE_SHIFT;
+
+ final_start = max(ei_start, goal);
+ final_end = min(ei_last, limit);
+
+ if (final_start >= final_end)
+ continue;
+
+ addr = lmb_find_in_range(final_start, final_end, size, align);
+
+ if (addr == LMB_ERROR)
+ continue;
+
+ return addr;
+ }
+
+ return LMB_ERROR;
+}
+#endif
+
int __init add_from_early_node_map(struct range *range, int az,
int nr_range, int nid)
{
--
1.6.4.2
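
The clamp-and-search loop of patch 14/25, together with the fallback that
patch 15/25 adds on top of it, can be sketched in userspace as follows. All
names are hypothetical stand-ins: struct node_range for an early_node_map[]
entry (in bytes rather than pfns), FIND_ERROR for LMB_ERROR, and
find_in_range() for lmb_find_in_range(). The stand-in finder is an assumption
made only to keep the sketch runnable: it returns the lowest aligned address
that fits and ignores reservations, which the real lmb code does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIND_ERROR (~(uint64_t)0)	/* plays the role of LMB_ERROR */

struct node_range { int nid; uint64_t start, end; };	/* [start, end) in bytes */

/* Hypothetical finder: lowest aligned fit in [start, end); align is a power of 2. */
static uint64_t find_in_range(uint64_t start, uint64_t end,
			      uint64_t size, uint64_t align)
{
	uint64_t addr = (start + align - 1) & ~(align - 1);

	if (addr < start || addr >= end || end - addr < size)
		return FIND_ERROR;
	return addr;
}

/* Try each range of node nid, clamped to [goal, limit). */
static uint64_t find_in_node(const struct node_range *map, size_t n, int nid,
			     uint64_t size, uint64_t align,
			     uint64_t goal, uint64_t limit)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t final_start, final_end, addr;

		if (map[i].nid != nid)
			continue;
		final_start = map[i].start > goal ? map[i].start : goal;
		final_end = map[i].end < limit ? map[i].end : limit;
		if (final_start >= final_end)
			continue;	/* node range does not reach into the window */

		addr = find_in_range(final_start, final_end, size, align);
		if (addr != FIND_ERROR)
			return addr;
	}
	return FIND_ERROR;
}

/* Node-local search first, then fall back to the plain window. */
static uint64_t find_in_range_node(const struct node_range *map, size_t n,
				   int nid, uint64_t start, uint64_t end,
				   uint64_t size, uint64_t align)
{
	uint64_t addr = find_in_node(map, n, nid, size, align, start, end);

	if (addr != FIND_ERROR)
		return addr;
	return find_in_range(start, end, size, align);
}
```

The fallback is what lets a caller still get memory when the node map is
empty or the window misses the node, at the cost of the result possibly
living on another node.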

2010-06-22 17:32:55

by Yinghai Lu

Subject: [PATCH 09/25] bootmem, x86: Add weak version of reserve_bootmem_generic

It will be used by the lmb_to_bootmem() conversion.

It is a wrapper for reserve_bootmem(); x86 64bit uses a special one.

Also clean up that version for x86_64. We don't need to take care of the numa
path there, because bootmem can handle it now.

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/mm/init_32.c | 6 ------
arch/x86/mm/init_64.c | 20 ++------------------
mm/bootmem.c | 6 ++++++
3 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index bca7909..90e0545 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -1069,9 +1069,3 @@ void mark_rodata_ro(void)
#endif
}
#endif
-
-int __init reserve_bootmem_generic(unsigned long phys, unsigned long len,
- int flags)
-{
- return reserve_bootmem(phys, len, flags);
-}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index ee41bba..634fa08 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -799,13 +799,10 @@ void mark_rodata_ro(void)

#endif

+#ifndef CONFIG_NO_BOOTMEM
int __init reserve_bootmem_generic(unsigned long phys, unsigned long len,
int flags)
{
-#ifdef CONFIG_NUMA
- int nid, next_nid;
- int ret;
-#endif
unsigned long pfn = phys >> PAGE_SHIFT;

if (pfn >= max_pfn) {
@@ -821,21 +818,7 @@ int __init reserve_bootmem_generic(unsigned long phys, unsigned long len,
return -EFAULT;
}

- /* Should check here against the e820 map to avoid double free */
-#ifdef CONFIG_NUMA
- nid = phys_to_nid(phys);
- next_nid = phys_to_nid(phys + len - 1);
- if (nid == next_nid)
- ret = reserve_bootmem_node(NODE_DATA(nid), phys, len, flags);
- else
- ret = reserve_bootmem(phys, len, flags);
-
- if (ret != 0)
- return ret;
-
-#else
reserve_bootmem(phys, len, flags);
-#endif

if (phys+len <= MAX_DMA_PFN*PAGE_SIZE) {
dma_reserve += len / PAGE_SIZE;
@@ -844,6 +827,7 @@ int __init reserve_bootmem_generic(unsigned long phys, unsigned long len,

return 0;
}
+#endif

int kern_addr_valid(unsigned long addr)
{
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 58c66cc..ee31b95 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -526,6 +526,12 @@ int __init reserve_bootmem(unsigned long addr, unsigned long size,
}

#ifndef CONFIG_NO_BOOTMEM
+int __weak __init reserve_bootmem_generic(unsigned long phys, unsigned long len,
+ int flags)
+{
+ return reserve_bootmem(phys, len, flags);
+}
+
static unsigned long __init align_idx(struct bootmem_data *bdata,
unsigned long idx, unsigned long step)
{
--
1.6.4.2

2010-06-22 17:28:38

by Yinghai Lu

Subject: [PATCH 05/25] lmb: Prepare to include linux/lmb.h in core file

Need to add protection in linux/lmb.h, to prepare to include it in
mm/page_alloc.c and mm/bootmem.c etc.

Signed-off-by: Yinghai Lu <[email protected]>
---
include/linux/lmb.h | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/linux/lmb.h b/include/linux/lmb.h
index eb8a5a3..d529232 100644
--- a/include/linux/lmb.h
+++ b/include/linux/lmb.h
@@ -2,6 +2,7 @@
#define _LINUX_LMB_H
#ifdef __KERNEL__

+#ifdef CONFIG_HAVE_LMB
/*
* Logical memory blocks.
*
@@ -144,6 +145,8 @@ static inline unsigned long lmb_region_pages(const struct lmb_region *reg)
region++)


+#endif /* CONFIG_HAVE_LMB */
+
#endif /* __KERNEL__ */

#endif /* _LINUX_LMB_H */
--
1.6.4.2

2010-06-22 17:33:25

by Yinghai Lu

Subject: [PATCH 03/25] lmb: Print new doubled array location info

So we have a better idea of where the new array is. Use lmb_debug to control it.

Signed-off-by: Yinghai Lu <[email protected]>
---
lib/lmb.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/lmb.c b/lib/lmb.c
index bbdb1ec..7a6e11a 100644
--- a/lib/lmb.c
+++ b/lib/lmb.c
@@ -192,8 +192,6 @@ static int lmb_double_array(struct lmb_type *type)
if (!lmb_can_resize)
return -1;

- pr_debug("lmb: %s array full, doubling...", lmb_type_name(type));
-
/* Calculate new doubled size */
old_size = type->max * sizeof(struct lmb_region);
new_size = old_size << 1;
@@ -221,6 +219,10 @@ static int lmb_double_array(struct lmb_type *type)
}
new_array = __va(addr);

+ if (lmb_debug)
+ pr_info("lmb: %s array is doubled to %ld at %llx - %llx",
+ lmb_type_name(type), type->max * 2, (u64)addr, (u64)addr + new_size);
+
/* Found space, we now need to move the array over before
* we add the reserved region since it may be our reserved
* array itself that is full.
--
1.6.4.2

2010-06-22 17:33:40

by Yinghai Lu

Subject: [PATCH 02/25] lmb: Prepare x86 to use lmb to replace early_res

1. expose lmb_debug
2. expose lmb_reserved_init_regions

-v2: drop lmb_add_region() and LMB_ERROR export
-v3: separate the wrong return of lmb_find_base into another patch
-v4: expose lmb_can_resize to handle x86 EFI that could have more than
128 entries

Signed-off-by: Yinghai Lu <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
---
include/linux/lmb.h | 3 +++
lib/lmb.c | 5 +++--
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/lmb.h b/include/linux/lmb.h
index 6f8c4bd..1e96224 100644
--- a/include/linux/lmb.h
+++ b/include/linux/lmb.h
@@ -39,6 +39,9 @@ struct lmb {
};

extern struct lmb lmb;
+extern int lmb_debug;
+extern int lmb_can_resize;
+extern struct lmb_region lmb_reserved_init_regions[];

extern void __init lmb_init(void);
extern void __init lmb_analyze(void);
diff --git a/lib/lmb.c b/lib/lmb.c
index 13d1a04..bbdb1ec 100644
--- a/lib/lmb.c
+++ b/lib/lmb.c
@@ -22,9 +22,10 @@

struct lmb lmb;

-static int lmb_debug, lmb_can_resize;
+int lmb_debug;
+int lmb_can_resize;
static struct lmb_region lmb_memory_init_regions[INIT_LMB_REGIONS + 1];
-static struct lmb_region lmb_reserved_init_regions[INIT_LMB_REGIONS + 1];
+struct lmb_region lmb_reserved_init_regions[INIT_LMB_REGIONS + 1];

#define LMB_ERROR (~(phys_addr_t)0)

--
1.6.4.2

2010-06-22 18:45:55

by Sam Ravnborg

Subject: Re: [PATCH 05/25] lmb: Prepare to include linux/lmb.h in core file

On Tue, Jun 22, 2010 at 10:26:34AM -0700, Yinghai Lu wrote:
> Need to add protection in linux/lmb.h, to prepare to include it in
> mm/page_alloc.c and mm/bootmem.c etc.
>
> Signed-off-by: Yinghai Lu <[email protected]>
> ---
> include/linux/lmb.h | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/lmb.h b/include/linux/lmb.h
> index eb8a5a3..d529232 100644
> --- a/include/linux/lmb.h
> +++ b/include/linux/lmb.h
> @@ -2,6 +2,7 @@
> #define _LINUX_LMB_H
> #ifdef __KERNEL__
>
> +#ifdef CONFIG_HAVE_LMB

The file could lose the "#ifdef __KERNEL__"
as lmb.h is not exported to userspace.

Sam

2010-06-29 17:43:34

by Bjorn Helgaas

Subject: Re: [PATCH 03/25] lmb: Print new doubled array location info

On Tuesday, June 22, 2010 11:26:32 am Yinghai Lu wrote:
> + if (lmb_debug)
> + pr_info("lmb: %s array is doubled to %ld at %llx - %llx",
> + lmb_type_name(type), type->max * 2, (u64)addr, (u64)addr + new_size);

Please print this memory range the same way we print resources, e.g.,
"%#010llx-%#010llx", with "addr" and "addr + new_size - 1".
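
For illustration only, a minimal userspace sketch of the requested layout, using plain C snprintf rather than printk. The helper name format_range is hypothetical and not part of the patch; it assumes size is nonzero so that the inclusive end does not wrap.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical userspace helper (not kernel code): format the region
 * [addr, addr + size) resource-style, i.e. with an inclusive end.
 * Assumes size > 0, otherwise addr + size - 1 would wrap below addr.
 */
static int format_range(char *buf, size_t len,
			unsigned long long addr, unsigned long long size)
{
	return snprintf(buf, len, "%#010llx-%#010llx",
			addr, addr + size - 1);
}
```

With addr = 0x100000 and size = 0x1000 this yields 0x00100000-0x00100fff, so the last byte of the region is what gets printed, not the first byte after it.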

2010-06-29 17:46:48

by Bjorn Helgaas

Subject: Re: [PATCH 10/25] x86, lmb: Add lmb_to_bootmem()

On Tuesday, June 22, 2010 11:26:39 am Yinghai Lu wrote:
> + pr_info("(%d early reservations) ==> bootmem [%010llx - %010llx]\n", count, start, end);

Please make all these address range messages match the %pR format.

> + for_each_lmb(reserved, r) {
> + pr_info(" [%010llx - %010llx] ", (u64)r->base, (u64)r->base + r->size);
> + final_start = max(start, r->base);
> + final_end = min(end, r->base + r->size);
> + if (final_start >= final_end) {
> + pr_cont("\n");
> + continue;
> + }
> + pr_cont(" ==> [%010llx - %010llx]\n", final_start, final_end);
> + reserve_bootmem_generic(final_start, final_end - final_start, BOOTMEM_DEFAULT);
> + }
> +
> + /* Put region array back ? */
> + if (lmb.reserved.regions != lmb_reserved_init_regions)
> + lmb_reserve(__pa(lmb.reserved.regions), sizeof(struct lmb_region) * lmb.reserved.max);
> +}
> +#endif
>

2010-06-29 17:51:46

by Bjorn Helgaas

Subject: Re: [PATCH 11/25] x86,lmb: Add lmb_reserve_range/lmb_free_range

On Tuesday, June 22, 2010 11:26:40 am Yinghai Lu wrote:

> +void __init lmb_reserve_range(u64 start, u64 end, char *name)
> +{
> + if (start == end)
> + return;
> +
> + if (WARN_ONCE(start > end, "lmb_reserve_range: wrong range [%#llx, %#llx]\n", start, end))

Please use %pR format.

> + return;
> +
> + lmb_reserve(start, end - start);
> +}
> +
> +void __init lmb_free_range(u64 start, u64 end)
> +{
> + if (start == end)
> + return;
> +
> + if (WARN_ONCE(start > end, "lmb_free_range: wrong range [%#llx, %#llx]\n", start, end))
> + return;
> +
> + lmb_free(start, end - start);
> +}
>

2010-06-29 17:55:57

by Bjorn Helgaas

Subject: Re: [PATCH 12/25] x86, lmb: Add get_free_all_memory_range()

On Tuesday, June 22, 2010 11:26:41 am Yinghai Lu wrote:

> + pr_info("Subtract (%d early reservations)\n", count);
> +
> + for_each_lmb(reserved, r) {
> + pr_info(" [%010llx - %010llx]\n", (u64)r->base, (u64)r->base + r->size);

Use %pR format and consider adding text to the line, e.g.,
"lmb: early reservation [%#010llx-%#010llx] removed" so that
grep output is useful all by itself, without requiring the
context of the preceding lines.

> + final_start = PFN_DOWN(r->base);
> + final_end = PFN_UP(r->base + r->size);
> + if (final_start >= final_end)
> + continue;
> + subtract_range(range, az, final_start, final_end);
> + }
> + /* Put region array back ? */
> + if (lmb.reserved.regions != lmb_reserved_init_regions)
> + lmb_reserve(__pa(lmb.reserved.regions), sizeof(struct lmb_region) * lmb.reserved.max);
> +}
> +
> +struct count_data {
> + int nr;
> +};
> +
> +static int __init count_work_fn(unsigned long start_pfn,
> + unsigned long end_pfn, void *datax)
> +{
> + struct count_data *data = datax;
> +
> + data->nr++;
> +
> + return 0;
> +}
> +
> +static int __init count_early_node_map(int nodeid)
> +{
> + struct count_data data;
> +
> + data.nr = 0;
> + work_with_active_regions(nodeid, count_work_fn, &data);
> +
> + return data.nr;
> +}
> +
> +int __init get_free_all_memory_range(struct range **rangep, int nodeid)
> +{
> + int count;
> + struct range *range;
> + int nr_range;
> +
> + count = (lmb.reserved.cnt + count_early_node_map(nodeid)) * 2;
> +
> + range = find_range_array(count);
> + nr_range = 0;
> +
> + /*
> + * Use early_node_map[] and lmb.reserved.region to get range array
> + * at first
> + */
> + nr_range = add_from_early_node_map(range, count, nr_range, nodeid);
> +#ifdef CONFIG_X86_32
> + subtract_range(range, count, max_low_pfn, -1ULL);
> +#endif
> + subtract_lmb_reserved(range, count);
> + nr_range = clean_sort_range(range, count);
> +
> + *rangep = range;
> + return nr_range;
> +}
> +#else
> void __init lmb_to_bootmem(u64 start, u64 end)
> {
> int count;
>

2010-06-29 17:58:13

by Bjorn Helgaas

Subject: Re: [PATCH 18/25] x86, lmb: Use lmb_debug to control debug message print out

On Tuesday, June 22, 2010 11:26:47 am Yinghai Lu wrote:
> Also let lmb_reserve_range/lmb_free_range print out the name if lmb=debug is
> specified.
>
> Will also print the name when reserve_lmb_area/free_lmb_area are called.
>
> Signed-off-by: Yinghai Lu <[email protected]>
> ---
> arch/x86/mm/lmb.c | 24 ++++++++++++++++++------
> 1 files changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/mm/lmb.c b/arch/x86/mm/lmb.c
> index bd2f60b..209f25b 100644
> --- a/arch/x86/mm/lmb.c
> +++ b/arch/x86/mm/lmb.c
> @@ -122,10 +122,12 @@ static void __init subtract_lmb_reserved(struct range *range, int az)
>
> count = lmb.reserved.cnt;
>
> - pr_info("Subtract (%d early reservations)\n", count);
> + if (lmb_debug)
> + pr_info("Subtract (%d early reservations)\n", count);
>
> for_each_lmb(reserved, r) {
> - pr_info(" [%010llx - %010llx]\n", (u64)r->base, (u64)r->base + r->size);

Use %pR format (looks like this is rework on top of a previous patch,
so this comment is probably redundant).

> + if (lmb_debug)
> + pr_info(" [%010llx - %010llx]\n", (u64)r->base, (u64)r->base + r->size);
> final_start = PFN_DOWN(r->base);
> final_end = PFN_UP(r->base + r->size);
> if (final_start >= final_end)
> @@ -198,16 +200,20 @@ void __init lmb_to_bootmem(u64 start, u64 end)
> lmb_free(__pa(lmb.reserved.regions), sizeof(struct lmb_region) * lmb.reserved.max);
>
> count = lmb.reserved.cnt;
> - pr_info("(%d early reservations) ==> bootmem [%010llx - %010llx]\n", count, start, end);
> + if (lmb_debug)
> + pr_info("(%d early reservations) ==> bootmem [%010llx - %010llx]\n", count, start, end);
> for_each_lmb(reserved, r) {
> - pr_info(" [%010llx - %010llx] ", (u64)r->base, (u64)r->base + r->size);
> + if (lmb_debug)
> + pr_info(" [%010llx - %010llx] ", (u64)r->base, (u64)r->base + r->size);
> final_start = max(start, r->base);
> final_end = min(end, r->base + r->size);
> if (final_start >= final_end) {
> - pr_cont("\n");
> + if (lmb_debug)
> + pr_cont("\n");
> continue;
> }
> - pr_cont(" ==> [%010llx - %010llx]\n", final_start, final_end);
> + if (lmb_debug)
> + pr_cont(" ==> [%010llx - %010llx]\n", final_start, final_end);
> reserve_bootmem_generic(final_start, final_end - final_start, BOOTMEM_DEFAULT);
> }
>
> @@ -289,6 +295,9 @@ void __init lmb_reserve_range(u64 start, u64 end, char *name)
> if (WARN_ONCE(start > end, "lmb_reserve_range: wrong range [%#llx, %#llx]\n", start, end))
> return;
>
> + if (lmb_debug)
> + pr_info(" lmb_reserve_range: [%010llx, %010llx] %16s\n", start, end, name);
> +
> lmb_reserve(start, end - start);
> }
>
> @@ -300,6 +309,9 @@ void __init lmb_free_range(u64 start, u64 end)
> if (WARN_ONCE(start > end, "lmb_free_range: wrong range [%#llx, %#llx]\n", start, end))
> return;
>
> + if (lmb_debug)
> + pr_info(" lmb_free_range: [%010llx, %010llx]\n", start, end);
> +
> lmb_free(start, end - start);
> }
>
>

2010-06-29 18:02:18

by Bjorn Helgaas

Subject: Re: [PATCH 23/25] x86: Have nobootmem version setup_bootmem_allocator()

On Tuesday, June 22, 2010 11:26:52 am Yinghai Lu wrote:
> We can reduce the number of #ifdefs in init_32.c from three to one.
>
> Signed-off-by: Yinghai Lu <[email protected]>
> ---
> arch/x86/mm/init_32.c | 15 ++++++++++-----
> 1 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index e3ae067..f172aa3 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -771,11 +771,9 @@ static unsigned long __init setup_node_bootmem(int nodeid,
>
> return bootmap + bootmap_size;
> }
> -#endif
>
> void __init setup_bootmem_allocator(void)
> {
> -#ifndef CONFIG_NO_BOOTMEM
> int nodeid;
> unsigned long bootmap_size, bootmap;
> /*
> @@ -787,13 +785,11 @@ void __init setup_bootmem_allocator(void)
> if (bootmap == (unsigned long)LMB_ERROR)
> panic("Cannot find bootmem map of size %ld\n", bootmap_size);
> lmb_reserve_range(bootmap, bootmap + bootmap_size, "BOOTMAP");
> -#endif
>
> printk(KERN_INFO " mapped low ram: 0 - %08lx\n",
> max_pfn_mapped<<PAGE_SHIFT);
> printk(KERN_INFO " low ram: 0 - %08lx\n", max_low_pfn<<PAGE_SHIFT);
>
> -#ifndef CONFIG_NO_BOOTMEM
> for_each_online_node(nodeid) {
> unsigned long start_pfn, end_pfn;
>
> @@ -811,10 +807,19 @@ void __init setup_bootmem_allocator(void)
> bootmap = setup_node_bootmem(nodeid, start_pfn, end_pfn,
> bootmap);
> }
> -#endif
>
> after_bootmem = 1;
> }
> +#else
> +void __init setup_bootmem_allocator(void)
> +{
> + printk(KERN_INFO " mapped low ram: 0 - %08lx\n",
> + max_pfn_mapped<<PAGE_SHIFT);
> + printk(KERN_INFO " low ram: 0 - %08lx\n", max_low_pfn<<PAGE_SHIFT);

Please use %pR format here and fix up the other printks above in the
other #ifdef branch.
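
A rough userspace sketch of what a resource-style "low ram" line could look like, assuming 4 KiB pages. The names SKETCH_PAGE_SHIFT and format_low_ram are made up for this example; real kernel code would use PAGE_SHIFT and printk. The "[mem ...]" wrapper only imitates the general look of %pR output and is not produced by %pR here. An explicit "0x%08llx" is used instead of "%#010llx" because, in userspace C, the # flag adds no prefix to a zero value.

```c
#include <stdio.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Hypothetical helper: describe RAM from address 0 up to (but not
 * including) page frame end_pfn, printing an inclusive end address.
 * Assumes end_pfn > 0.
 */
static int format_low_ram(char *buf, size_t len, const char *label,
			  unsigned long long end_pfn)
{
	/* First byte past the range (exclusive end). */
	unsigned long long end = end_pfn << SKETCH_PAGE_SHIFT;

	return snprintf(buf, len, "%s: [mem 0x%08llx-0x%08llx]",
			label, 0ULL, end - 1);
}
```

For example, end_pfn = 0x38000 gives "low ram: [mem 0x00000000-0x37ffffff]".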

> +
> + after_bootmem = 1;
> +}
> +#endif
>
> /*
> * paging_init() sets up the page tables - note that the first 8MB are
>

2010-06-29 18:05:33

by Bjorn Helgaas

Subject: Re: [PATCH -v19 00/25] Use lmb with x86

On Tuesday, June 22, 2010 11:26:29 am Yinghai Lu wrote:
> New lmb could be used to replace early_res in x86.

Since you're fiddling in e820 anyway, can you fix places like
e820_print_map() so they use the %pR format? I think it'd be
cleaner if e820_print_type() just returned a string pointer
rather than calling printk directly. Then we'd only need a
single printk per region and wouldn't need the KERN_CONT junk.
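
To make the suggestion concrete, here is a small userspace sketch, not the actual e820 code. The enum values, the type strings, and the function names sketch_type_name and sketch_format_region are all stand-ins chosen for this example. The point is only that a lookup returning a string lets each region be emitted with one format call, with no continuation prints.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-in type codes for this sketch; the kernel defines its own. */
enum sketch_type {
	SK_RAM = 1,
	SK_RESERVED = 2,
	SK_ACPI = 3,
	SK_NVS = 4,
	SK_UNUSABLE = 5,
};

/* Return a name instead of printing it, so the caller owns the output. */
static const char *sketch_type_name(unsigned int type)
{
	switch (type) {
	case SK_RAM:		return "usable";
	case SK_RESERVED:	return "reserved";
	case SK_ACPI:		return "ACPI data";
	case SK_NVS:		return "ACPI NVS";
	case SK_UNUSABLE:	return "unusable";
	default:		return "unknown";
	}
}

/* One format call per region; assumes size > 0 (inclusive end). */
static int sketch_format_region(char *buf, size_t len,
				unsigned long long addr,
				unsigned long long size, unsigned int type)
{
	return snprintf(buf, len, "[mem 0x%08llx-0x%08llx] %s",
			addr, addr + size - 1, sketch_type_name(type));
}
```

A caller would loop over the map and print one such line per entry, which also keeps each line meaningful when grepped on its own.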

2010-06-29 18:39:37

by H. Peter Anvin

Subject: Re: [PATCH 03/25] lmb: Print new doubled array location info

On 06/29/2010 10:44 AM, Bjorn Helgaas wrote:
> On Tuesday, June 22, 2010 11:26:32 am Yinghai Lu wrote:
>> + if (lmb_debug)
>> + pr_info("lmb: %s array is doubled to %ld at %llx - %llx",
>> + lmb_type_name(type), type->max * 2, (u64)addr, (u64)addr + new_size);
>
> Please print this memory range the same way we print resources, e.g.,
> "%#010llx-%#010llx", with "addr" and "addr + new_size - 1".

*Ten* hex digits?

-hpa

2010-06-29 18:52:59

by Bjorn Helgaas

Subject: Re: [PATCH 03/25] lmb: Print new doubled array location info

On Tuesday, June 29, 2010 12:38:44 pm H. Peter Anvin wrote:
> On 06/29/2010 10:44 AM, Bjorn Helgaas wrote:
> > On Tuesday, June 22, 2010 11:26:32 am Yinghai Lu wrote:
> >> + if (lmb_debug)
> >> + pr_info("lmb: %s array is doubled to %ld at %llx - %llx",
> >> + lmb_type_name(type), type->max * 2, (u64)addr, (u64)addr + new_size);
> >
> > Please print this memory range the same way we print resources, e.g.,
> > "%#010llx-%#010llx", with "addr" and "addr + new_size - 1".
>
> *Ten* hex digits?

I think that width includes the "0x" prefix, so it's really eight digits.
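
A quick userspace check of that reading, using standard C snprintf (this says nothing definitive about printk, which has its own implementation). The helper name formatted_width is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: count the characters that "%#010llx"
 * produces for v. The field width of 10 is a minimum and covers
 * the "0x" prefix as well as the digits.
 */
static size_t formatted_width(unsigned long long v)
{
	char buf[32];

	snprintf(buf, sizeof(buf), "%#010llx", v);
	return strlen(buf);
}
```

For 0x1000 the result is ten characters, "0x" plus eight zero-padded digits. A value needing nine digits, such as 0x123456789, simply widens the field to eleven. One userspace caveat: the # flag adds no prefix to a zero value, so 0 comes out as ten zeros; whether printk behaves the same way would need checking.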

2010-06-29 20:06:49

by Yinghai Lu

Subject: Re: [PATCH 03/25] lmb: Print new doubled array location info

On 06/29/2010 10:44 AM, Bjorn Helgaas wrote:
> On Tuesday, June 22, 2010 11:26:32 am Yinghai Lu wrote:
>> + if (lmb_debug)
>> + pr_info("lmb: %s array is doubled to %ld at %llx - %llx",
>> + lmb_type_name(type), type->max * 2, (u64)addr, (u64)addr + new_size);
>
> Please print this memory range the same way we print resources, e.g.,
> "%#010llx-%#010llx", with "addr" and "addr + new_size - 1".

OK, I will add # to get the 0x prefix.

But I would like to have

xxx - yyy : to include end
[xxx, yyy - 1]

just like the current e820 printout.
It would be more readable without too many ffff.

Thanks

Yinghai

2010-06-29 20:57:48

by Bjorn Helgaas

Subject: Re: [PATCH 03/25] lmb: Print new doubled array location info

On Tuesday, June 29, 2010 02:03:21 pm Yinghai Lu wrote:
> On 06/29/2010 10:44 AM, Bjorn Helgaas wrote:
> > On Tuesday, June 22, 2010 11:26:32 am Yinghai Lu wrote:
> >> + if (lmb_debug)
> >> + pr_info("lmb: %s array is doubled to %ld at %llx - %llx",
> >> + lmb_type_name(type), type->max * 2, (u64)addr, (u64)addr + new_size);
> >
> > Please print this memory range the same way we print resources, e.g.,
> > "%#010llx-%#010llx", with "addr" and "addr + new_size - 1".
>
> OK, I will add # to get the 0x prefix.
>
> But I would like to have
>
> xxx - yyy : to include end
> [xxx, yyy - 1]
>
> just like the current e820 printout.
> It would be more readable without too many ffff.

I think it's stupid to use two different conventions for printing
address ranges. That just makes extra mental work for people
comparing e820 ranges with %pR resources.

I don't personally care that much whether we pick the convention of
excluding the end (like the current e820 output) or the convention of
including it (like %pR and /proc/iomem do), but whatever we pick, we
should use it consistently.

To me, the fact that /proc/iomem is user-visible and includes the end
is a pretty strong argument for adopting that convention.

And I think you should remove the extra spaces in "xxx - yyy".
There's no reason to be different when we could be consistent.

Bjorn