2010-02-09 19:37:14

by Yinghai Lu

Subject: [PATCH -v6 0/35] x86: do not use bootmem for x86

Please check the following patches regarding early_res and bootmem.

They make x86 use early_res instead of bootmem. CONFIG_NO_BOOTMEM
can still be used to choose whether bootmem is used, so the
transition can be made more smoothly.

-v2: allocate vmemmap on one node together, and also separate early_res
-v3: make 32-bit x86 support early_res/NO_BOOTMEM too
     move related early_res code to kernel/
     put sparse vmemmap together: addresses Ingo's comments.
-v4: some patches (radix, logical flat, etc.) could go via tip with
     Acked-by from Jesse
-v5: put two patches back into this series to keep it consistent, as
     Linus pointed out that some places should replace size_t with
     resource_size_t; actually that is already done in those
     patches in pci/linux-next.
-v6: updated for the intel_bus.c removal

------------------------
http://lkml.indiana.edu/hypermail/linux/kernel/0910.3/01432.html
Ingo said:
" I think we could remove the bootmem allocator middle man altogether.

This can be done by initializing the page allocator sooner and by
extending (already existing) 'reserve memory early on' mechanisms in
architecture code. (the reserve_early*() APIs in x86 for example)

Right now we have 5 memory allocation models on x86, initialized
gradually:

- allocator (buddy) [generic]
- early allocator (bootmem) [generic]
- very early allocator (reserve_early*()) [x86]
- very very early allocator (early brk model) [x86]
- very very very early allocator (build time .data/.bss) [generic]

Seems excessive.

The reserve_early() method is list/range based and can handle vast
amounts of not very fragmented memory - perfect for basically all the
real bootmem purposes (which is to bootstrap the buddy).

reserve_early() allocated memory could be freed into the buddy later on
as well. The main reason why bootmem is 'destroyed' during free-to-buddy
is because it has excessive internal bitmaps we want to free. With a
list/range based reserve_early() mechanism there's no such problem -
they can linger indefinitely and there's near zero allocation management
overhead. "
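
To make the range-based idea concrete, here is a minimal userspace model
of a list/range reservation scheme (illustrative only — the kernel's
actual early_res implementation is in the patches below; the names and
sizes here are made up). The point is that reservations are just
[start, end) pairs in a small array, so there is no bitmap to tear down
when the memory is later handed to the buddy allocator:

#include <stdio.h>
#include <stdint.h>

#define MAX_EARLY_RES 32

struct early_range {
	uint64_t start, end;		/* [start, end), like reserve_early() */
	const char *name;
};

static struct early_range early_res[MAX_EARLY_RES];
static int early_res_count;

/* Record a reservation; overlap handling omitted for brevity. */
static int reserve_early(uint64_t start, uint64_t end, const char *name)
{
	if (start >= end || early_res_count >= MAX_EARLY_RES)
		return -1;
	early_res[early_res_count++] = (struct early_range){ start, end, name };
	return 0;
}

/* Is any byte of [start, end) reserved? */
static int is_reserved(uint64_t start, uint64_t end)
{
	for (int i = 0; i < early_res_count; i++)
		if (end > early_res[i].start && start < early_res[i].end)
			return 1;
	return 0;
}

int main(void)
{
	reserve_early(0x100000, 0x200000, "KERNEL");
	reserve_early(0x9f000, 0xa0000, "EBDA");
	printf("0x150000 reserved? %d\n", is_reserved(0x150000, 0x151000)); /* 1 */
	printf("0x300000 reserved? %d\n", is_reserved(0x300000, 0x301000)); /* 0 */
	return 0;
}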





a06bb22: x86: fix sci on ioapic 1

-----------------early_res to replace bootmem---
aa0a11b: x86: move range related operation to one file
c4b3117: x86/pci: use resource_size_t in update_res
e058bd0: x86/pci: amd one chain system to use pci read out res
d26795f: x86/pci: use u64 instead of size_t in amd_bus.c
5f8e15f: x86/pci: add cap_resource
3d35ae5: x86/pci: enable pci root res read out for 32bit too
81d88a0: x86: change range end to start+size
6579bec: x86: print out for RAM buffer
728cbde: x86: call early_res_to_bootmem one time
628a00a: x86: introduce max_early_res and early_res_count
d3b8276: x86: dynamic increase early_res array size
5bd2f97: x86: make early_node_mem get mem > 4g if possible
67cb2e2: x86: only call dma32_reserve_bootmem 64bit !CONFIG_NUMA
b103b00: x86: make 64 bit use early_res instead of bootmem before slab
ce3fcbd: sparsemem: put usemap for one node together
8b114f3: sparsemem: put mem map for one node together.
34ab079: x86: move bios page reserve early to head32/64.c
7488456: x86: separate early_res related code from e820.c
bdd2487: x86: add find_early_area_size
678f2b2: x86: move back find_e820_area to e820.c
a1139c4: early_res: enhance check_and_double_early_res
139a7ce: x86: make 32bit support NO_BOOTMEM
2307d29: move round_up/down to kernel.h
4f26f0a: x86: add find_fw_memmap_area
efcffa6: core: move early_res

---------sparseirq radix tree related ----------------
8311fab: irq: remove unneeded bootmem code
2216c85: radix: move radix init early
17abde7: sparseirq: change irq_desc_ptrs to static
3a86f81: sparseirq: use radix_tree instead of ptrs array
30afe2e: x86: remove arch_probe_nr_irqs

---------------x86 logical flat related -----------
b0be8be: use nr_cpus= to set nr_cpu_ids early
9676ccf: x86: according to nr_cpu_ids to decide if need to leave logical flat
f4f0989: x86: make 32bit apic flat to physflat switch like 64bit
0fa2815: x86: use num_processors for possible cpus




Yinghai


2010-02-09 19:33:59

by Yinghai Lu

Subject: [PATCH 30/35] sparseirq: use radix_tree instead of ptrs array

Use a radix tree (irq_desc_tree) instead of the irq_desc_ptrs array.

-v2: per Eric and Cyrill, use radix_tree_lookup_slot()/radix_tree_replace_slot()
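
A note on why lookup_slot + replace_slot rather than delete + insert
(my reading of the suggestion, not text from the patch):
radix_tree_insert() may need to allocate tree nodes and can fail, which
would leave the irq temporarily unmapped while sparse_irq_lock is held;
overwriting an existing slot cannot fail. The idiom, in isolation:

	void **slot;

	slot = radix_tree_lookup_slot(&irq_desc_tree, irq);
	if (slot)				/* irq already has a slot */
		radix_tree_replace_slot(slot, new_desc);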

Signed-off-by: Yinghai Lu <[email protected]>
---
kernel/irq/handle.c | 49 +++++++++++++++++++++++++------------------------
1 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index 266f798..76d5a67 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -19,6 +19,7 @@
#include <linux/kernel_stat.h>
#include <linux/rculist.h>
#include <linux/hash.h>
+#include <linux/radix-tree.h>
#include <trace/events/irq.h>

#include "internals.h"
@@ -127,7 +128,26 @@ static void init_one_irq_desc(int irq, struct irq_desc *desc, int node)
*/
DEFINE_RAW_SPINLOCK(sparse_irq_lock);

-static struct irq_desc **irq_desc_ptrs __read_mostly;
+static RADIX_TREE(irq_desc_tree, GFP_ATOMIC);
+
+static void set_irq_desc(unsigned int irq, struct irq_desc *desc)
+{
+ radix_tree_insert(&irq_desc_tree, irq, desc);
+}
+
+struct irq_desc *irq_to_desc(unsigned int irq)
+{
+ return radix_tree_lookup(&irq_desc_tree, irq);
+}
+
+void replace_irq_desc(unsigned int irq, struct irq_desc *desc)
+{
+ void **ptr;
+
+ ptr = radix_tree_lookup_slot(&irq_desc_tree, irq);
+ if (ptr)
+ radix_tree_replace_slot(ptr, desc);
+}

static struct irq_desc irq_desc_legacy[NR_IRQS_LEGACY] __cacheline_aligned_in_smp = {
[0 ... NR_IRQS_LEGACY-1] = {
@@ -159,9 +179,6 @@ int __init early_irq_init(void)
legacy_count = ARRAY_SIZE(irq_desc_legacy);
node = first_online_node;

- /* allocate irq_desc_ptrs array based on nr_irqs */
- irq_desc_ptrs = kcalloc(nr_irqs, sizeof(void *), GFP_NOWAIT);
-
/* allocate based on nr_cpu_ids */
kstat_irqs_legacy = kzalloc_node(NR_IRQS_LEGACY * nr_cpu_ids *
sizeof(int), GFP_NOWAIT, node);
@@ -175,28 +192,12 @@ int __init early_irq_init(void)
lockdep_set_class(&desc[i].lock, &irq_desc_lock_class);
alloc_desc_masks(&desc[i], node, true);
init_desc_masks(&desc[i]);
- irq_desc_ptrs[i] = desc + i;
+ set_irq_desc(i, &desc[i]);
}

- for (i = legacy_count; i < nr_irqs; i++)
- irq_desc_ptrs[i] = NULL;
-
return arch_early_irq_init();
}

-struct irq_desc *irq_to_desc(unsigned int irq)
-{
- if (irq_desc_ptrs && irq < nr_irqs)
- return irq_desc_ptrs[irq];
-
- return NULL;
-}
-
-void replace_irq_desc(unsigned int irq, struct irq_desc *desc)
-{
- irq_desc_ptrs[irq] = desc;
-}
-
struct irq_desc * __ref irq_to_desc_alloc_node(unsigned int irq, int node)
{
struct irq_desc *desc;
@@ -208,14 +209,14 @@ struct irq_desc * __ref irq_to_desc_alloc_node(unsigned int irq, int node)
return NULL;
}

- desc = irq_desc_ptrs[irq];
+ desc = irq_to_desc(irq);
if (desc)
return desc;

raw_spin_lock_irqsave(&sparse_irq_lock, flags);

/* We have to check it to avoid races with another CPU */
- desc = irq_desc_ptrs[irq];
+ desc = irq_to_desc(irq);
if (desc)
goto out_unlock;

@@ -228,7 +229,7 @@ struct irq_desc * __ref irq_to_desc_alloc_node(unsigned int irq, int node)
}
init_one_irq_desc(irq, desc, node);

- irq_desc_ptrs[irq] = desc;
+ set_irq_desc(irq, desc);

out_unlock:
raw_spin_unlock_irqrestore(&sparse_irq_lock, flags);
--
1.6.4.2

2010-02-09 19:34:12

by Yinghai Lu

Subject: [PATCH 32/35] use nr_cpus= to set nr_cpu_ids early

On x86, before prefill_possible_map() runs, nr_cpu_ids is NR_CPUS, aka
CONFIG_NR_CPUS.

Add an nr_cpus= boot parameter to set nr_cpu_ids, so we can simulate a
machine with <= 8 CPUs installed on a normal config.

-v2: per Christoph, acpi_numa_init should use nr_cpu_ids instead of
NR_CPUS.
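
For example (hypothetical command line), a kernel built with a large
CONFIG_NR_CPUS can be made to behave like a 2-CPU machine:

	linux ... nr_cpus=2

Since nrcpus() only ever lowers nr_cpu_ids (it checks
nr_cpus < nr_cpu_ids), the parameter acts as a hard upper limit; it
cannot raise the limit above CONFIG_NR_CPUS.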

Signed-off-by: Yinghai Lu <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
---
arch/ia64/kernel/acpi.c | 4 ++--
arch/x86/kernel/smpboot.c | 7 ++++---
drivers/acpi/numa.c | 4 ++--
init/main.c | 14 ++++++++++++++
4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index 40574ae..605a08b 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -881,8 +881,8 @@ __init void prefill_possible_map(void)

possible = available_cpus + additional_cpus;

- if (possible > NR_CPUS)
- possible = NR_CPUS;
+ if (possible > nr_cpu_ids)
+ possible = nr_cpu_ids;

printk(KERN_INFO "SMP: Allowing %d CPUs, %d hotplug CPUs\n",
possible, max((possible - available_cpus), 0));
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index b2ebcba..6c69693 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1217,11 +1217,12 @@ __init void prefill_possible_map(void)

total_cpus = max_t(int, possible, num_processors + disabled_cpus);

- if (possible > CONFIG_NR_CPUS) {
+ /* nr_cpu_ids could be reduced via nr_cpus= */
+ if (possible > nr_cpu_ids) {
printk(KERN_WARNING
"%d Processors exceeds NR_CPUS limit of %d\n",
- possible, CONFIG_NR_CPUS);
- possible = CONFIG_NR_CPUS;
+ possible, nr_cpu_ids);
+ possible = nr_cpu_ids;
}

printk(KERN_INFO "SMP: Allowing %d CPUs, %d hotplug CPUs\n",
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index 7ad48df..b872546 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -279,9 +279,9 @@ int __init acpi_numa_init(void)
/* SRAT: Static Resource Affinity Table */
if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
- acpi_parse_x2apic_affinity, NR_CPUS);
+ acpi_parse_x2apic_affinity, nr_cpu_ids);
acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
- acpi_parse_processor_affinity, NR_CPUS);
+ acpi_parse_processor_affinity, nr_cpu_ids);
ret = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
acpi_parse_memory_affinity,
NR_NODE_MEMBLKS);
diff --git a/init/main.c b/init/main.c
index 8451878..05b5283 100644
--- a/init/main.c
+++ b/init/main.c
@@ -149,6 +149,20 @@ static int __init nosmp(char *str)

early_param("nosmp", nosmp);

+/* this is hard limit */
+static int __init nrcpus(char *str)
+{
+ int nr_cpus;
+
+ get_option(&str, &nr_cpus);
+ if (nr_cpus > 0 && nr_cpus < nr_cpu_ids)
+ nr_cpu_ids = nr_cpus;
+
+ return 0;
+}
+
+early_param("nr_cpus", nrcpus);
+
static int __init maxcpus(char *str)
{
get_option(&str, &setup_max_cpus);
--
1.6.4.2

2010-02-09 19:34:08

by Yinghai Lu

Subject: [PATCH 23/35] x86: make 32bit support NO_BOOTMEM

Let's make 32-bit consistent with 64-bit.
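
The one real difference on 32-bit is highmem: pages above max_low_pfn
must not be handed back through the early_res free path (they are freed
via the highmem init code instead), hence the subtract_range() hunk in
get_free_all_memory_range() below. With illustrative numbers: given
max_low_pfn = 0x38000 (896 MB) and a node covering pfns
[0x1000, 0x40000), subtract_range(range, count, 0x38000, -1UL) clips
the free range to [0x1000, 0x38000).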

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/Kconfig | 1 -
arch/x86/kernel/early_res.c | 3 +++
arch/x86/mm/init_32.c | 6 ++++++
arch/x86/mm/numa_32.c | 3 +++
4 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a6fedb8..24c2908 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -571,7 +571,6 @@ config PARAVIRT_DEBUG
config NO_BOOTMEM
default y
bool "Disable Bootmem code"
- depends on X86_64
---help---
use early_res directly instead of bootmem before slab is ready.

diff --git a/arch/x86/kernel/early_res.c b/arch/x86/kernel/early_res.c
index 52804e9..38a0c61 100644
--- a/arch/x86/kernel/early_res.c
+++ b/arch/x86/kernel/early_res.c
@@ -354,6 +354,9 @@ int __init get_free_all_memory_range(struct range **rangep, int nodeid)

/* need to go over early_node_map to find out good range for node */
nr_range = add_from_early_node_map(range, count, nr_range, nodeid);
+#ifdef CONFIG_X86_32
+ subtract_range(range, count, max_low_pfn, -1UL);
+#endif
subtract_early_res(range, count);
nr_range = clean_sort_range(range, count);

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 2dccde0..262867a 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -748,6 +748,7 @@ static void __init zone_sizes_init(void)
free_area_init_nodes(max_zone_pfns);
}

+#ifndef CONFIG_NO_BOOTMEM
static unsigned long __init setup_node_bootmem(int nodeid,
unsigned long start_pfn,
unsigned long end_pfn,
@@ -767,9 +768,11 @@ static unsigned long __init setup_node_bootmem(int nodeid,

return bootmap + bootmap_size;
}
+#endif

void __init setup_bootmem_allocator(void)
{
+#ifndef CONFIG_NO_BOOTMEM
int nodeid;
unsigned long bootmap_size, bootmap;
/*
@@ -781,11 +784,13 @@ void __init setup_bootmem_allocator(void)
if (bootmap == -1L)
panic("Cannot find bootmem map of size %ld\n", bootmap_size);
reserve_early(bootmap, bootmap + bootmap_size, "BOOTMAP");
+#endif

printk(KERN_INFO " mapped low ram: 0 - %08lx\n",
max_pfn_mapped<<PAGE_SHIFT);
printk(KERN_INFO " low ram: 0 - %08lx\n", max_low_pfn<<PAGE_SHIFT);

+#ifndef CONFIG_NO_BOOTMEM
for_each_online_node(nodeid) {
unsigned long start_pfn, end_pfn;

@@ -803,6 +808,7 @@ void __init setup_bootmem_allocator(void)
bootmap = setup_node_bootmem(nodeid, start_pfn, end_pfn,
bootmap);
}
+#endif

after_bootmem = 1;
}
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index b20760c..809baaa 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -418,7 +418,10 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,

for_each_online_node(nid) {
memset(NODE_DATA(nid), 0, sizeof(struct pglist_data));
+ NODE_DATA(nid)->node_id = nid;
+#ifndef CONFIG_NO_BOOTMEM
NODE_DATA(nid)->bdata = &bootmem_node_data[nid];
+#endif
}

setup_bootmem_allocator();
--
1.6.4.2

2010-02-09 19:34:06

by Yinghai Lu

Subject: [PATCH 24/35] move round_up/down to kernel.h

Preparation for moving early_res up into kernel/.
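
A quick standalone illustration (userspace sketch, not part of the
patch) of what the macros do and why the __typeof__ cast matters:

#include <stdio.h>
#include <stdint.h>

#define __round_mask(x, y) ((__typeof__(x))((y)-1))
#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
#define round_down(x, y) ((x) & ~__round_mask(x, y))

int main(void)
{
	uint64_t addr = 0x1234;
	uint32_t align = 0x1000;	/* must be a power of two */
	uint64_t big = 0x500001234ULL;

	printf("%#llx\n", (unsigned long long)round_up(addr, align));   /* 0x2000 */
	printf("%#llx\n", (unsigned long long)round_down(addr, align)); /* 0x1000 */

	/*
	 * The cast makes the mask as wide as x. Without it,
	 * ~(align - 1) would be a 32-bit value; for round_down on a
	 * 64-bit x it would zero-extend and wipe out the upper bits:
	 * 0x500001234 & 0x00000000fffff000 == 0x1000, not 0x500001000.
	 */
	printf("%#llx\n", (unsigned long long)round_down(big, align));  /* 0x500001000 */
	return 0;
}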

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/proto.h | 10 ----------
include/linux/kernel.h | 10 ++++++++++
2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index 4009f65..6f414ed 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -23,14 +23,4 @@ extern int reboot_force;

long do_arch_prctl(struct task_struct *task, int code, unsigned long addr);

-/*
- * This looks more complex than it should be. But we need to
- * get the type for the ~ right in round_down (it needs to be
- * as wide as the result!), and we want to evaluate the macro
- * arguments just once each.
- */
-#define __round_mask(x,y) ((__typeof__(x))((y)-1))
-#define round_up(x,y) ((((x)-1) | __round_mask(x,y))+1)
-#define round_down(x,y) ((x) & ~__round_mask(x,y))
-
#endif /* _ASM_X86_PROTO_H */
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 1221d23..8a6e583 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -44,6 +44,16 @@ extern const char linux_proc_banner[];

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))

+/*
+ * This looks more complex than it should be. But we need to
+ * get the type for the ~ right in round_down (it needs to be
+ * as wide as the result!), and we want to evaluate the macro
+ * arguments just once each.
+ */
+#define __round_mask(x,y) ((__typeof__(x))((y)-1))
+#define round_up(x,y) ((((x)-1) | __round_mask(x,y))+1)
+#define round_down(x,y) ((x) & ~__round_mask(x,y))
+
#define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
--
1.6.4.2

2010-02-09 19:33:55

by Yinghai Lu

Subject: [PATCH 27/35] irq: remove unneeded bootmem code

mem_init() has already been moved early, so slab is available by the
time these allocations run and the bootmem fallback paths are unneeded.

Signed-off-by: Yinghai Lu <[email protected]>
---
kernel/irq/handle.c | 14 +++-----------
1 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index 814940e..0e823c0 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -19,7 +19,6 @@
#include <linux/kernel_stat.h>
#include <linux/rculist.h>
#include <linux/hash.h>
-#include <linux/bootmem.h>
#include <trace/events/irq.h>

#include "internals.h"
@@ -87,12 +86,8 @@ void __ref init_kstat_irqs(struct irq_desc *desc, int node, int nr)
{
void *ptr;

- if (slab_is_available())
- ptr = kzalloc_node(nr * sizeof(*desc->kstat_irqs),
- GFP_ATOMIC, node);
- else
- ptr = alloc_bootmem_node(NODE_DATA(node),
- nr * sizeof(*desc->kstat_irqs));
+ ptr = kzalloc_node(nr * sizeof(*desc->kstat_irqs),
+ GFP_ATOMIC, node);

/*
* don't overwite if can not get new one
@@ -219,10 +214,7 @@ struct irq_desc * __ref irq_to_desc_alloc_node(unsigned int irq, int node)
if (desc)
goto out_unlock;

- if (slab_is_available())
- desc = kzalloc_node(sizeof(*desc), GFP_ATOMIC, node);
- else
- desc = alloc_bootmem_node(NODE_DATA(node), sizeof(*desc));
+ desc = kzalloc_node(sizeof(*desc), GFP_ATOMIC, node);

printk(KERN_DEBUG " alloc irq_desc for %d on node %d\n", irq, node);
if (!desc) {
--
1.6.4.2

2010-02-09 19:35:52

by Yinghai Lu

Subject: [PATCH 28/35] radix: move radix init early

prepare to use it in early_irq_init()

Signed-off-by: Yinghai Lu <[email protected]>
---
init/main.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/init/main.c b/init/main.c
index 4cb47a1..8451878 100644
--- a/init/main.c
+++ b/init/main.c
@@ -584,6 +584,7 @@ asmlinkage void __init start_kernel(void)
local_irq_disable();
}
rcu_init();
+ radix_tree_init();
/* init some links before init_ISA_irqs() */
early_irq_init();
init_IRQ();
@@ -657,7 +658,6 @@ asmlinkage void __init start_kernel(void)
proc_caches_init();
buffer_init();
key_init();
- radix_tree_init();
security_init();
vfs_caches_init(totalram_pages);
signals_init();
--
1.6.4.2

2010-02-09 19:33:57

by Yinghai Lu

Subject: [PATCH 29/35] sparseirq: change irq_desc_ptrs to static

add replace_irq_desc()

-v2: remove unneeded boundary check in replace_irq_desc

Signed-off-by: Yinghai Lu <[email protected]>
---
kernel/irq/handle.c | 7 ++++++-
kernel/irq/internals.h | 6 +-----
kernel/irq/numa_migrate.c | 4 ++--
3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index 0e823c0..266f798 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -127,7 +127,7 @@ static void init_one_irq_desc(int irq, struct irq_desc *desc, int node)
*/
DEFINE_RAW_SPINLOCK(sparse_irq_lock);

-struct irq_desc **irq_desc_ptrs __read_mostly;
+static struct irq_desc **irq_desc_ptrs __read_mostly;

static struct irq_desc irq_desc_legacy[NR_IRQS_LEGACY] __cacheline_aligned_in_smp = {
[0 ... NR_IRQS_LEGACY-1] = {
@@ -192,6 +192,11 @@ struct irq_desc *irq_to_desc(unsigned int irq)
return NULL;
}

+void replace_irq_desc(unsigned int irq, struct irq_desc *desc)
+{
+ irq_desc_ptrs[irq] = desc;
+}
+
struct irq_desc * __ref irq_to_desc_alloc_node(unsigned int irq, int node)
{
struct irq_desc *desc;
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index b2821f0..c63f3bc 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -21,11 +21,7 @@ extern void clear_kstat_irqs(struct irq_desc *desc);
extern raw_spinlock_t sparse_irq_lock;

#ifdef CONFIG_SPARSE_IRQ
-/* irq_desc_ptrs allocated at boot time */
-extern struct irq_desc **irq_desc_ptrs;
-#else
-/* irq_desc_ptrs is a fixed size array */
-extern struct irq_desc *irq_desc_ptrs[NR_IRQS];
+void replace_irq_desc(unsigned int irq, struct irq_desc *desc);
#endif

#ifdef CONFIG_PROC_FS
diff --git a/kernel/irq/numa_migrate.c b/kernel/irq/numa_migrate.c
index 26bac9d..963559d 100644
--- a/kernel/irq/numa_migrate.c
+++ b/kernel/irq/numa_migrate.c
@@ -70,7 +70,7 @@ static struct irq_desc *__real_move_irq_desc(struct irq_desc *old_desc,
raw_spin_lock_irqsave(&sparse_irq_lock, flags);

/* We have to check it to avoid races with another CPU */
- desc = irq_desc_ptrs[irq];
+ desc = irq_to_desc(irq);

if (desc && old_desc != desc)
goto out_unlock;
@@ -90,7 +90,7 @@ static struct irq_desc *__real_move_irq_desc(struct irq_desc *old_desc,
goto out_unlock;
}

- irq_desc_ptrs[irq] = desc;
+ replace_irq_desc(irq, desc);
raw_spin_unlock_irqrestore(&sparse_irq_lock, flags);

/* free the old one */
--
1.6.4.2

2010-02-09 19:35:10

by Yinghai Lu

Subject: [PATCH 10/35] x86: call early_res_to_bootmem one time

Simplify setup_node_bootmem(): don't use bootmem from another node;
instead just use find_e820_area() in early_node_mem().

This keeps the boundary between early_res and bootmem clearer,
and calls the conversion only once instead of once per node.
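
In short, early_node_mem()'s fallback becomes a second find_e820_area()
pass over a wider window instead of a bootmem allocation; roughly
(condensed from the numa_64.c hunk below):

mem = find_e820_area(start, end, size, align);	/* node-local window first */
if (mem == -1L) {
	/* retry anywhere in the already-mapped low range */
	mem = find_e820_area(__pa(MAX_DMA_ADDRESS),
			     max_low_pfn_mapped << PAGE_SHIFT, size, align);
}
return mem != -1L ? __va(mem) : NULL;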

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/kernel/setup.c | 1 +
arch/x86/mm/init_32.c | 1 -
arch/x86/mm/init_64.c | 3 +-
arch/x86/mm/numa_64.c | 62 +++++++++++++++-------------------------------
4 files changed, 22 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3499b4f..48cadbb 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -967,6 +967,7 @@ void __init setup_arch(char **cmdline_p)
#endif

initmem_init(0, max_pfn, acpi, k8);
+ early_res_to_bootmem(0, max_low_pfn<<PAGE_SHIFT);

#ifdef CONFIG_X86_64
/*
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 9a0c258..2dccde0 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -764,7 +764,6 @@ static unsigned long __init setup_node_bootmem(int nodeid,
printk(KERN_INFO " node %d bootmap %08lx - %08lx\n",
nodeid, bootmap, bootmap + bootmap_size);
free_bootmem_with_active_regions(nodeid, end_pfn);
- early_res_to_bootmem(start_pfn<<PAGE_SHIFT, end_pfn<<PAGE_SHIFT);

return bootmap + bootmap_size;
}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 69ddfbd..a15abaa 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -579,13 +579,12 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
PAGE_SIZE);
if (bootmap == -1L)
panic("Cannot find bootmem map of size %ld\n", bootmap_size);
+ reserve_early(bootmap, bootmap + bootmap_size, "BOOTMAP");
/* don't touch min_low_pfn */
bootmap_size = init_bootmem_node(NODE_DATA(0), bootmap >> PAGE_SHIFT,
0, end_pfn);
e820_register_active_regions(0, start_pfn, end_pfn);
free_bootmem_with_active_regions(0, end_pfn);
- early_res_to_bootmem(0, end_pfn<<PAGE_SHIFT);
- reserve_bootmem(bootmap, bootmap_size, BOOTMEM_DEFAULT);
}
#endif

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 83bbc70..3232148 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -164,18 +164,21 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
unsigned long align)
{
unsigned long mem = find_e820_area(start, end, size, align);
- void *ptr;

if (mem != -1L)
return __va(mem);

- ptr = __alloc_bootmem_nopanic(size, align, __pa(MAX_DMA_ADDRESS));
- if (ptr == NULL) {
- printk(KERN_ERR "Cannot find %lu bytes in node %d\n",
+
+ start = __pa(MAX_DMA_ADDRESS);
+ end = max_low_pfn_mapped << PAGE_SHIFT;
+ mem = find_e820_area(start, end, size, align);
+ if (mem != -1L)
+ return __va(mem);
+
+ printk(KERN_ERR "Cannot find %lu bytes in node %d\n",
size, nodeid);
- return NULL;
- }
- return ptr;
+
+ return NULL;
}

/* Initialize bootmem allocator for a node */
@@ -211,8 +214,12 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
if (node_data[nodeid] == NULL)
return;
nodedata_phys = __pa(node_data[nodeid]);
+ reserve_early(nodedata_phys, nodedata_phys + pgdat_size, "NODE_DATA");
printk(KERN_INFO " NODE_DATA [%016lx - %016lx]\n", nodedata_phys,
nodedata_phys + pgdat_size - 1);
+ nid = phys_to_nid(nodedata_phys);
+ if (nid != nodeid)
+ printk(KERN_INFO " NODE_DATA(%d) on node %d\n", nodeid, nid);

memset(NODE_DATA(nodeid), 0, sizeof(pg_data_t));
NODE_DATA(nodeid)->bdata = &bootmem_node_data[nodeid];
@@ -227,11 +234,7 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
* of alloc_bootmem, that could clash with reserved range
*/
bootmap_pages = bootmem_bootmap_pages(last_pfn - start_pfn);
- nid = phys_to_nid(nodedata_phys);
- if (nid == nodeid)
- bootmap_start = roundup(nodedata_phys + pgdat_size, PAGE_SIZE);
- else
- bootmap_start = roundup(start, PAGE_SIZE);
+ bootmap_start = roundup(nodedata_phys + pgdat_size, PAGE_SIZE);
/*
* SMP_CACHE_BYTES could be enough, but init_bootmem_node like
* to use that to align to PAGE_SIZE
@@ -239,18 +242,13 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
bootmap = early_node_mem(nodeid, bootmap_start, end,
bootmap_pages<<PAGE_SHIFT, PAGE_SIZE);
if (bootmap == NULL) {
- if (nodedata_phys < start || nodedata_phys >= end) {
- /*
- * only need to free it if it is from other node
- * bootmem
- */
- if (nid != nodeid)
- free_bootmem(nodedata_phys, pgdat_size);
- }
+ free_early(nodedata_phys, nodedata_phys + pgdat_size);
node_data[nodeid] = NULL;
return;
}
bootmap_start = __pa(bootmap);
+ reserve_early(bootmap_start, bootmap_start+(bootmap_pages<<PAGE_SHIFT),
+ "BOOTMAP");

bootmap_size = init_bootmem_node(NODE_DATA(nodeid),
bootmap_start >> PAGE_SHIFT,
@@ -259,31 +257,11 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
printk(KERN_INFO " bootmap [%016lx - %016lx] pages %lx\n",
bootmap_start, bootmap_start + bootmap_size - 1,
bootmap_pages);
-
- free_bootmem_with_active_regions(nodeid, end);
-
- /*
- * convert early reserve to bootmem reserve earlier
- * otherwise early_node_mem could use early reserved mem
- * on previous node
- */
- early_res_to_bootmem(start, end);
-
- /*
- * in some case early_node_mem could use alloc_bootmem
- * to get range on other node, don't reserve that again
- */
- if (nid != nodeid)
- printk(KERN_INFO " NODE_DATA(%d) on node %d\n", nodeid, nid);
- else
- reserve_bootmem_node(NODE_DATA(nodeid), nodedata_phys,
- pgdat_size, BOOTMEM_DEFAULT);
nid = phys_to_nid(bootmap_start);
if (nid != nodeid)
printk(KERN_INFO " bootmap(%d) on node %d\n", nodeid, nid);
- else
- reserve_bootmem_node(NODE_DATA(nodeid), bootmap_start,
- bootmap_pages<<PAGE_SHIFT, BOOTMEM_DEFAULT);
+
+ free_bootmem_with_active_regions(nodeid, end);

node_set_online(nodeid);
}
--
1.6.4.2

2010-02-09 19:35:15

by Yinghai Lu

Subject: [PATCH 15/35] x86: make 64 bit use early_res instead of bootmem before slab

Finally we can use early_res to replace bootmem for x86_64 now.

CONFIG_NO_BOOTMEM can still be used to enable or disable it.

-v2: fix 32-bit compile breakage around MAX_DMA32_PFN
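
The core of the new free path is __free_pages_memory() (in the
mm/bootmem.c hunk below): BITS_PER_LONG-aligned chunks go to the buddy
as high-order pages, and the unaligned edges go as single pages. A
worked example, assuming BITS_PER_LONG = 64 (so order = ilog2(64) = 6),
freeing pfns [5, 133):

	start_aligned = (5 + 63) & ~63 = 64, end_aligned = 133 & ~63 = 128
	[5, 64)    -> 59 order-0 frees
	[64, 128)  -> one order-6 free (64 pages at once)
	[128, 133) -> 5 order-0 frees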

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/Kconfig | 7 ++
arch/x86/include/asm/e820.h | 6 ++
arch/x86/kernel/e820.c | 159 ++++++++++++++++++++++++++++++++---
arch/x86/kernel/setup.c | 2 +
arch/x86/mm/init_64.c | 5 +-
arch/x86/mm/numa_64.c | 20 ++++-
include/linux/bootmem.h | 7 ++
include/linux/mm.h | 5 +
include/linux/mmzone.h | 2 +
mm/bootmem.c | 195 ++++++++++++++++++++++++++++++++++++++++++-
mm/page_alloc.c | 59 +++++++++++++-
mm/percpu.c | 3 +
mm/sparse-vmemmap.c | 2 +-
13 files changed, 448 insertions(+), 24 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eb40925..a6fedb8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -568,6 +568,13 @@ config PARAVIRT_DEBUG
Enable to debug paravirt_ops internals. Specifically, BUG if
a paravirt_op is missing when it is called.

+config NO_BOOTMEM
+ default y
+ bool "Disable Bootmem code"
+ depends on X86_64
+ ---help---
+ use early_res directly instead of bootmem before slab is ready.
+
config MEMTEST
bool "Memtest"
---help---
diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index 761249e..7d72e5f 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -117,6 +117,12 @@ extern void free_early(u64 start, u64 end);
extern void early_res_to_bootmem(u64 start, u64 end);
extern u64 early_reserve_e820(u64 startt, u64 sizet, u64 align);

+void reserve_early_without_check(u64 start, u64 end, char *name);
+u64 find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
+ u64 size, u64 align);
+#include <linux/range.h>
+int get_free_all_memory_range(struct range **rangep, int nodeid);
+
extern unsigned long e820_end_of_ram_pfn(void);
extern unsigned long e820_end_of_low_ram_pfn(void);
extern int e820_find_active_region(const struct e820entry *ei,
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index e13bbd0..fd14052 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -977,6 +977,25 @@ void __init reserve_early(u64 start, u64 end, char *name)
__reserve_early(start, end, name, 0);
}

+void __init reserve_early_without_check(u64 start, u64 end, char *name)
+{
+ struct early_res *r;
+
+ if (start >= end)
+ return;
+
+ __check_and_double_early_res(end);
+
+ r = &early_res[early_res_count];
+
+ r->start = start;
+ r->end = end;
+ r->overlap_ok = 0;
+ if (name)
+ strncpy(r->name, name, sizeof(r->name) - 1);
+ early_res_count++;
+}
+
void __init free_early(u64 start, u64 end)
{
struct early_res *r;
@@ -991,6 +1010,94 @@ void __init free_early(u64 start, u64 end)
drop_range(i);
}

+#ifdef CONFIG_NO_BOOTMEM
+static void __init subtract_early_res(struct range *range, int az)
+{
+ int i, count;
+ u64 final_start, final_end;
+ int idx = 0;
+
+ count = 0;
+ for (i = 0; i < max_early_res && early_res[i].end; i++)
+ count++;
+
+ /* need to skip first one ?*/
+ if (early_res != early_res_x)
+ idx = 1;
+
+#if 1
+ printk(KERN_INFO "Subtract (%d early reservations)\n", count);
+#endif
+ for (i = idx; i < count; i++) {
+ struct early_res *r = &early_res[i];
+#if 0
+ printk(KERN_INFO " #%d [%010llx - %010llx] %15s", i,
+ r->start, r->end, r->name);
+#endif
+ final_start = PFN_DOWN(r->start);
+ final_end = PFN_UP(r->end);
+ if (final_start >= final_end) {
+#if 0
+ printk(KERN_CONT "\n");
+#endif
+ continue;
+ }
+#if 0
+ printk(KERN_CONT " subtract pfn [%010llx - %010llx]\n",
+ final_start, final_end);
+#endif
+ subtract_range(range, az, final_start, final_end);
+ }
+
+}
+
+int __init get_free_all_memory_range(struct range **rangep, int nodeid)
+{
+ int i, count;
+ u64 start = 0, end;
+ u64 size;
+ u64 mem;
+ struct range *range;
+ int nr_range;
+
+ count = 0;
+ for (i = 0; i < max_early_res && early_res[i].end; i++)
+ count++;
+
+ count *= 2;
+
+ size = sizeof(struct range) * count;
+#ifdef MAX_DMA32_PFN
+ if (max_pfn_mapped > MAX_DMA32_PFN)
+ start = MAX_DMA32_PFN << PAGE_SHIFT;
+#endif
+ end = max_pfn_mapped << PAGE_SHIFT;
+ mem = find_e820_area(start, end, size, sizeof(struct range));
+ if (mem == -1ULL)
+ panic("can not find more space for range free");
+
+ range = __va(mem);
+ /* use early_node_map[] and early_res to get range array at first */
+ memset(range, 0, size);
+ nr_range = 0;
+
+ /* need to go over early_node_map to find out good range for node */
+ nr_range = add_from_early_node_map(range, count, nr_range, nodeid);
+ subtract_early_res(range, count);
+ nr_range = clean_sort_range(range, count);
+
+ /* need to clear it ? */
+ if (nodeid == MAX_NUMNODES) {
+ memset(&early_res[0], 0,
+ sizeof(struct early_res) * max_early_res);
+ early_res = NULL;
+ max_early_res = 0;
+ }
+
+ *rangep = range;
+ return nr_range;
+}
+#else
void __init early_res_to_bootmem(u64 start, u64 end)
{
int i, count;
@@ -1028,6 +1135,7 @@ void __init early_res_to_bootmem(u64 start, u64 end)
max_early_res = 0;
early_res_count = 0;
}
+#endif

/* Check for already reserved areas */
static inline int __init bad_addr(u64 *addrp, u64 size, u64 align)
@@ -1083,6 +1191,35 @@ again:

/*
* Find a free area with specified alignment in a specific range.
+ * only with the area.between start to end is active range from early_node_map
+ * so they are good as RAM
+ */
+u64 __init find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
+ u64 size, u64 align)
+{
+ u64 addr, last;
+
+ addr = round_up(ei_start, align);
+ if (addr < start)
+ addr = round_up(start, align);
+ if (addr >= ei_last)
+ goto out;
+ while (bad_addr(&addr, size, align) && addr+size <= ei_last)
+ ;
+ last = addr + size;
+ if (last > ei_last)
+ goto out;
+ if (last > end)
+ goto out;
+
+ return addr;
+
+out:
+ return -1ULL;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range.
*/
u64 __init find_e820_area(u64 start, u64 end, u64 size, u64 align)
{
@@ -1090,24 +1227,20 @@ u64 __init find_e820_area(u64 start, u64 end, u64 size, u64 align)

for (i = 0; i < e820.nr_map; i++) {
struct e820entry *ei = &e820.map[i];
- u64 addr, last;
- u64 ei_last;
+ u64 addr;
+ u64 ei_start, ei_last;

if (ei->type != E820_RAM)
continue;
- addr = round_up(ei->addr, align);
+
ei_last = ei->addr + ei->size;
- if (addr < start)
- addr = round_up(start, align);
- if (addr >= ei_last)
- continue;
- while (bad_addr(&addr, size, align) && addr+size <= ei_last)
- ;
- last = addr + size;
- if (last > ei_last)
- continue;
- if (last > end)
+ ei_start = ei->addr;
+ addr = find_early_area(ei_start, ei_last, start, end,
+ size, align);
+
+ if (addr == -1ULL)
continue;
+
return addr;
}
return -1ULL;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index ea4141b..d49e168 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -967,7 +967,9 @@ void __init setup_arch(char **cmdline_p)
#endif

initmem_init(0, max_pfn, acpi, k8);
+#ifndef CONFIG_NO_BOOTMEM
early_res_to_bootmem(0, max_low_pfn<<PAGE_SHIFT);
+#endif

dma32_reserve_bootmem();

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a15abaa..fd9a004 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -572,6 +572,7 @@ kernel_physical_mapping_init(unsigned long start,
void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
int acpi, int k8)
{
+#ifndef CONFIG_NO_BOOTMEM
unsigned long bootmap_size, bootmap;

bootmap_size = bootmem_bootmap_pages(end_pfn)<<PAGE_SHIFT;
@@ -583,8 +584,10 @@ void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
/* don't touch min_low_pfn */
bootmap_size = init_bootmem_node(NODE_DATA(0), bootmap >> PAGE_SHIFT,
0, end_pfn);
- e820_register_active_regions(0, start_pfn, end_pfn);
free_bootmem_with_active_regions(0, end_pfn);
+#else
+ e820_register_active_regions(0, start_pfn, end_pfn);
+#endif
}
#endif

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 02f13cb..a20e170 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -198,11 +198,13 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
void __init
setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
{
- unsigned long start_pfn, last_pfn, bootmap_pages, bootmap_size;
+ unsigned long start_pfn, last_pfn, nodedata_phys;
const int pgdat_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
- unsigned long bootmap_start, nodedata_phys;
- void *bootmap;
int nid;
+#ifndef CONFIG_NO_BOOTMEM
+ unsigned long bootmap_start, bootmap_pages, bootmap_size;
+ void *bootmap;
+#endif

if (!end)
return;
@@ -216,7 +218,7 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)

start = roundup(start, ZONE_ALIGN);

- printk(KERN_INFO "Bootmem setup node %d %016lx-%016lx\n", nodeid,
+ printk(KERN_INFO "Initmem setup node %d %016lx-%016lx\n", nodeid,
start, end);

start_pfn = start >> PAGE_SHIFT;
@@ -235,10 +237,13 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
printk(KERN_INFO " NODE_DATA(%d) on node %d\n", nodeid, nid);

memset(NODE_DATA(nodeid), 0, sizeof(pg_data_t));
- NODE_DATA(nodeid)->bdata = &bootmem_node_data[nodeid];
+ NODE_DATA(nodeid)->node_id = nodeid;
NODE_DATA(nodeid)->node_start_pfn = start_pfn;
NODE_DATA(nodeid)->node_spanned_pages = last_pfn - start_pfn;

+#ifndef CONFIG_NO_BOOTMEM
+ NODE_DATA(nodeid)->bdata = &bootmem_node_data[nodeid];
+
/*
* Find a place for the bootmem map
* nodedata_phys could be on other nodes by alloc_bootmem,
@@ -275,6 +280,7 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
printk(KERN_INFO " bootmap(%d) on node %d\n", nodeid, nid);

free_bootmem_with_active_regions(nodeid, end);
+#endif

node_set_online(nodeid);
}
@@ -733,6 +739,10 @@ unsigned long __init numa_free_all_bootmem(void)
for_each_online_node(i)
pages += free_all_bootmem_node(NODE_DATA(i));

+#ifdef CONFIG_NO_BOOTMEM
+ pages += free_all_memory_core_early(MAX_NUMNODES);
+#endif
+
return pages;
}

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index b10ec49..266ab92 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -23,6 +23,7 @@ extern unsigned long max_pfn;
extern unsigned long saved_max_pfn;
#endif

+#ifndef CONFIG_NO_BOOTMEM
/*
* node_bootmem_map is a map pointer - the bits represent all physical
* memory pages (including holes) on the node.
@@ -37,6 +38,7 @@ typedef struct bootmem_data {
} bootmem_data_t;

extern bootmem_data_t bootmem_node_data[];
+#endif

extern unsigned long bootmem_bootmap_pages(unsigned long);

@@ -46,6 +48,7 @@ extern unsigned long init_bootmem_node(pg_data_t *pgdat,
unsigned long endpfn);
extern unsigned long init_bootmem(unsigned long addr, unsigned long memend);

+unsigned long free_all_memory_core_early(int nodeid);
extern unsigned long free_all_bootmem_node(pg_data_t *pgdat);
extern unsigned long free_all_bootmem(void);

@@ -84,6 +87,10 @@ extern void *__alloc_bootmem_node(pg_data_t *pgdat,
unsigned long size,
unsigned long align,
unsigned long goal);
+void *__alloc_bootmem_node_high(pg_data_t *pgdat,
+ unsigned long size,
+ unsigned long align,
+ unsigned long goal);
extern void *__alloc_bootmem_node_nopanic(pg_data_t *pgdat,
unsigned long size,
unsigned long align,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8b2fa85..f2c5b3c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -12,6 +12,7 @@
#include <linux/prio_tree.h>
#include <linux/debug_locks.h>
#include <linux/mm_types.h>
+#include <linux/range.h>

struct mempolicy;
struct anon_vma;
@@ -1049,6 +1050,10 @@ extern void get_pfn_range_for_nid(unsigned int nid,
extern unsigned long find_min_pfn_with_active_regions(void);
extern void free_bootmem_with_active_regions(int nid,
unsigned long max_low_pfn);
+int add_from_early_node_map(struct range *range, int az,
+ int nr_range, int nid);
+void *__alloc_memory_core_early(int nodeid, u64 size, u64 align,
+ u64 goal, u64 limit);
typedef int (*work_fn_t)(unsigned long, unsigned long, void *);
extern void work_with_active_regions(int nid, work_fn_t work_fn, void *data);
extern void sparse_memory_present_with_active_regions(int nid);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 30fe668..eae8387 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -620,7 +620,9 @@ typedef struct pglist_data {
struct page_cgroup *node_page_cgroup;
#endif
#endif
+#ifndef CONFIG_NO_BOOTMEM
struct bootmem_data *bdata;
+#endif
#ifdef CONFIG_MEMORY_HOTPLUG
/*
* Must be held any time you expect node_start_pfn, node_present_pages
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 7d14868..d7c791e 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -13,6 +13,7 @@
#include <linux/bootmem.h>
#include <linux/module.h>
#include <linux/kmemleak.h>
+#include <linux/range.h>

#include <asm/bug.h>
#include <asm/io.h>
@@ -32,6 +33,7 @@ unsigned long max_pfn;
unsigned long saved_max_pfn;
#endif

+#ifndef CONFIG_NO_BOOTMEM
bootmem_data_t bootmem_node_data[MAX_NUMNODES] __initdata;

static struct list_head bdata_list __initdata = LIST_HEAD_INIT(bdata_list);
@@ -142,7 +144,7 @@ unsigned long __init init_bootmem(unsigned long start, unsigned long pages)
min_low_pfn = start;
return init_bootmem_core(NODE_DATA(0)->bdata, start, 0, pages);
}
-
+#endif
/*
* free_bootmem_late - free bootmem pages directly to page allocator
* @addr: starting address of the range
@@ -167,6 +169,60 @@ void __init free_bootmem_late(unsigned long addr, unsigned long size)
}
}

+#ifdef CONFIG_NO_BOOTMEM
+static void __init __free_pages_memory(unsigned long start, unsigned long end)
+{
+ int i;
+ unsigned long start_aligned, end_aligned;
+ int order = ilog2(BITS_PER_LONG);
+
+ start_aligned = (start + (BITS_PER_LONG - 1)) & ~(BITS_PER_LONG - 1);
+ end_aligned = end & ~(BITS_PER_LONG - 1);
+
+ if (end_aligned <= start_aligned) {
+#if 1
+ printk(KERN_DEBUG " %lx - %lx\n", start, end);
+#endif
+ for (i = start; i < end; i++)
+ __free_pages_bootmem(pfn_to_page(i), 0);
+
+ return;
+ }
+
+#if 1
+ printk(KERN_DEBUG " %lx %lx - %lx %lx\n",
+ start, start_aligned, end_aligned, end);
+#endif
+ for (i = start; i < start_aligned; i++)
+ __free_pages_bootmem(pfn_to_page(i), 0);
+
+ for (i = start_aligned; i < end_aligned; i += BITS_PER_LONG)
+ __free_pages_bootmem(pfn_to_page(i), order);
+
+ for (i = end_aligned; i < end; i++)
+ __free_pages_bootmem(pfn_to_page(i), 0);
+}
+
+unsigned long __init free_all_memory_core_early(int nodeid)
+{
+ int i;
+ u64 start, end;
+ unsigned long count = 0;
+ struct range *range = NULL;
+ int nr_range;
+
+ nr_range = get_free_all_memory_range(&range, nodeid);
+
+ for (i = 0; i < nr_range; i++) {
+ start = range[i].start;
+ end = range[i].end;
+ count += end - start;
+ __free_pages_memory(start, end);
+ }
+
+ return count;
+}
+#else
static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
{
int aligned;
@@ -227,6 +283,7 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)

return count;
}
+#endif

/**
* free_all_bootmem_node - release a node's free pages to the buddy allocator
@@ -237,7 +294,12 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata)
unsigned long __init free_all_bootmem_node(pg_data_t *pgdat)
{
register_page_bootmem_info_node(pgdat);
+#ifdef CONFIG_NO_BOOTMEM
+ /* free_all_memory_core_early(MAX_NUMNODES) will be called later */
+ return 0;
+#else
return free_all_bootmem_core(pgdat->bdata);
+#endif
}

/**
@@ -247,9 +309,14 @@ unsigned long __init free_all_bootmem_node(pg_data_t *pgdat)
*/
unsigned long __init free_all_bootmem(void)
{
+#ifdef CONFIG_NO_BOOTMEM
+ return free_all_memory_core_early(NODE_DATA(0)->node_id);
+#else
return free_all_bootmem_core(NODE_DATA(0)->bdata);
+#endif
}

+#ifndef CONFIG_NO_BOOTMEM
static void __init __free(bootmem_data_t *bdata,
unsigned long sidx, unsigned long eidx)
{
@@ -344,6 +411,7 @@ static int __init mark_bootmem(unsigned long start, unsigned long end,
}
BUG();
}
+#endif

/**
* free_bootmem_node - mark a page range as usable
@@ -358,6 +426,12 @@ static int __init mark_bootmem(unsigned long start, unsigned long end,
void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
unsigned long size)
{
+#ifdef CONFIG_NO_BOOTMEM
+ free_early(physaddr, physaddr + size);
+#if 0
+ printk(KERN_DEBUG "free %lx %lx\n", physaddr, size);
+#endif
+#else
unsigned long start, end;

kmemleak_free_part(__va(physaddr), size);
@@ -366,6 +440,7 @@ void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
end = PFN_DOWN(physaddr + size);

mark_bootmem_node(pgdat->bdata, start, end, 0, 0);
+#endif
}

/**
@@ -379,6 +454,12 @@ void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
*/
void __init free_bootmem(unsigned long addr, unsigned long size)
{
+#ifdef CONFIG_NO_BOOTMEM
+ free_early(addr, addr + size);
+#if 0
+ printk(KERN_DEBUG "free %lx %lx\n", addr, size);
+#endif
+#else
unsigned long start, end;

kmemleak_free_part(__va(addr), size);
@@ -387,6 +468,7 @@ void __init free_bootmem(unsigned long addr, unsigned long size)
end = PFN_DOWN(addr + size);

mark_bootmem(start, end, 0, 0);
+#endif
}

/**
@@ -403,12 +485,17 @@ void __init free_bootmem(unsigned long addr, unsigned long size)
int __init reserve_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
unsigned long size, int flags)
{
+#ifdef CONFIG_NO_BOOTMEM
+ panic("no bootmem");
+ return 0;
+#else
unsigned long start, end;

start = PFN_DOWN(physaddr);
end = PFN_UP(physaddr + size);

return mark_bootmem_node(pgdat->bdata, start, end, 1, flags);
+#endif
}

/**
@@ -424,14 +511,20 @@ int __init reserve_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
int __init reserve_bootmem(unsigned long addr, unsigned long size,
int flags)
{
+#ifdef CONFIG_NO_BOOTMEM
+ panic("no bootmem");
+ return 0;
+#else
unsigned long start, end;

start = PFN_DOWN(addr);
end = PFN_UP(addr + size);

return mark_bootmem(start, end, 1, flags);
+#endif
}

+#ifndef CONFIG_NO_BOOTMEM
static unsigned long __init align_idx(struct bootmem_data *bdata,
unsigned long idx, unsigned long step)
{
@@ -582,12 +675,33 @@ static void * __init alloc_arch_preferred_bootmem(bootmem_data_t *bdata,
#endif
return NULL;
}
+#endif

static void * __init ___alloc_bootmem_nopanic(unsigned long size,
unsigned long align,
unsigned long goal,
unsigned long limit)
{
+#ifdef CONFIG_NO_BOOTMEM
+ void *ptr;
+
+ if (WARN_ON_ONCE(slab_is_available()))
+ return kzalloc(size, GFP_NOWAIT);
+
+restart:
+
+ ptr = __alloc_memory_core_early(MAX_NUMNODES, size, align, goal, limit);
+
+ if (ptr)
+ return ptr;
+
+ if (goal != 0) {
+ goal = 0;
+ goto restart;
+ }
+
+ return NULL;
+#else
bootmem_data_t *bdata;
void *region;

@@ -613,6 +727,7 @@ restart:
}

return NULL;
+#endif
}

/**
@@ -631,7 +746,13 @@ restart:
void * __init __alloc_bootmem_nopanic(unsigned long size, unsigned long align,
unsigned long goal)
{
- return ___alloc_bootmem_nopanic(size, align, goal, 0);
+ unsigned long limit = 0;
+
+#ifdef CONFIG_NO_BOOTMEM
+ limit = -1UL;
+#endif
+
+ return ___alloc_bootmem_nopanic(size, align, goal, limit);
}

static void * __init ___alloc_bootmem(unsigned long size, unsigned long align,
@@ -665,9 +786,16 @@ static void * __init ___alloc_bootmem(unsigned long size, unsigned long align,
void * __init __alloc_bootmem(unsigned long size, unsigned long align,
unsigned long goal)
{
- return ___alloc_bootmem(size, align, goal, 0);
+ unsigned long limit = 0;
+
+#ifdef CONFIG_NO_BOOTMEM
+ limit = -1UL;
+#endif
+
+ return ___alloc_bootmem(size, align, goal, limit);
}

+#ifndef CONFIG_NO_BOOTMEM
static void * __init ___alloc_bootmem_node(bootmem_data_t *bdata,
unsigned long size, unsigned long align,
unsigned long goal, unsigned long limit)
@@ -684,6 +812,7 @@ static void * __init ___alloc_bootmem_node(bootmem_data_t *bdata,

return ___alloc_bootmem(size, align, goal, limit);
}
+#endif

/**
* __alloc_bootmem_node - allocate boot memory from a specific node
@@ -706,7 +835,46 @@ void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
if (WARN_ON_ONCE(slab_is_available()))
return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);

+#ifdef CONFIG_NO_BOOTMEM
+ return __alloc_memory_core_early(pgdat->node_id, size, align,
+ goal, -1ULL);
+#else
return ___alloc_bootmem_node(pgdat->bdata, size, align, goal, 0);
+#endif
+}
+
+void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
+ unsigned long align, unsigned long goal)
+{
+#ifdef MAX_DMA32_PFN
+ unsigned long end_pfn;
+
+ if (WARN_ON_ONCE(slab_is_available()))
+ return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
+
+ /* update goal according ...MAX_DMA32_PFN */
+ end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;
+
+ if (end_pfn > MAX_DMA32_PFN + (128 >> (20 - PAGE_SHIFT)) &&
+ (goal >> PAGE_SHIFT) < MAX_DMA32_PFN) {
+ void *ptr;
+ unsigned long new_goal;
+
+ new_goal = MAX_DMA32_PFN << PAGE_SHIFT;
+#ifdef CONFIG_NO_BOOTMEM
+ ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
+ new_goal, -1ULL);
+#else
+ ptr = alloc_bootmem_core(pgdat->bdata, size, align,
+ new_goal, 0);
+#endif
+ if (ptr)
+ return ptr;
+ }
+#endif
+
+ return __alloc_bootmem_node(pgdat, size, align, goal);
+
}

#ifdef CONFIG_SPARSEMEM
@@ -720,6 +888,16 @@ void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
void * __init alloc_bootmem_section(unsigned long size,
unsigned long section_nr)
{
+#ifdef CONFIG_NO_BOOTMEM
+ unsigned long pfn, goal, limit;
+
+ pfn = section_nr_to_pfn(section_nr);
+ goal = pfn << PAGE_SHIFT;
+ limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
+
+ return __alloc_memory_core_early(early_pfn_to_nid(pfn), size,
+ SMP_CACHE_BYTES, goal, limit);
+#else
bootmem_data_t *bdata;
unsigned long pfn, goal, limit;

@@ -729,6 +907,7 @@ void * __init alloc_bootmem_section(unsigned long size,
bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];

return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
+#endif
}
#endif

@@ -740,11 +919,16 @@ void * __init __alloc_bootmem_node_nopanic(pg_data_t *pgdat, unsigned long size,
if (WARN_ON_ONCE(slab_is_available()))
return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);

+#ifdef CONFIG_NO_BOOTMEM
+ ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
+ goal, -1ULL);
+#else
ptr = alloc_arch_preferred_bootmem(pgdat->bdata, size, align, goal, 0);
if (ptr)
return ptr;

ptr = alloc_bootmem_core(pgdat->bdata, size, align, goal, 0);
+#endif
if (ptr)
return ptr;

@@ -795,6 +979,11 @@ void * __init __alloc_bootmem_low_node(pg_data_t *pgdat, unsigned long size,
if (WARN_ON_ONCE(slab_is_available()))
return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);

+#ifdef CONFIG_NO_BOOTMEM
+ return __alloc_memory_core_early(pgdat->node_id, size, align,
+ goal, ARCH_LOW_ADDRESS_LIMIT);
+#else
return ___alloc_bootmem_node(pgdat->bdata, size, align,
goal, ARCH_LOW_ADDRESS_LIMIT);
+#endif
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8deb9d0..78821a2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3435,6 +3435,59 @@ void __init free_bootmem_with_active_regions(int nid,
}
}

+int __init add_from_early_node_map(struct range *range, int az,
+ int nr_range, int nid)
+{
+ int i;
+ u64 start, end;
+
+ /* need to go over early_node_map to find out good range for node */
+ for_each_active_range_index_in_nid(i, nid) {
+ start = early_node_map[i].start_pfn;
+ end = early_node_map[i].end_pfn;
+ nr_range = add_range(range, az, nr_range, start, end);
+ }
+ return nr_range;
+}
+
+void * __init __alloc_memory_core_early(int nid, u64 size, u64 align,
+ u64 goal, u64 limit)
+{
+ int i;
+ void *ptr;
+
+ /* need to go over early_node_map to find out good range for node */
+ for_each_active_range_index_in_nid(i, nid) {
+ u64 addr;
+ u64 ei_start, ei_last;
+
+ ei_last = early_node_map[i].end_pfn;
+ ei_last <<= PAGE_SHIFT;
+ ei_start = early_node_map[i].start_pfn;
+ ei_start <<= PAGE_SHIFT;
+ addr = find_early_area(ei_start, ei_last,
+ goal, limit, size, align);
+
+ if (addr == -1ULL)
+ continue;
+
+#if 0
+ printk(KERN_DEBUG "alloc (nid=%d %llx - %llx) (%llx - %llx) %llx %llx => %llx\n",
+ nid,
+ ei_start, ei_last, goal, limit, size,
+ align, addr);
+#endif
+
+ ptr = phys_to_virt(addr);
+ memset(ptr, 0, size);
+ reserve_early_without_check(addr, addr + size, "BOOTMEM");
+ return ptr;
+ }
+
+ return NULL;
+}
+
+
void __init work_with_active_regions(int nid, work_fn_t work_fn, void *data)
{
int i;
@@ -4467,7 +4520,11 @@ void __init set_dma_reserve(unsigned long new_dma_reserve)
}

#ifndef CONFIG_NEED_MULTIPLE_NODES
-struct pglist_data __refdata contig_page_data = { .bdata = &bootmem_node_data[0] };
+struct pglist_data __refdata contig_page_data = {
+#ifndef CONFIG_NO_BOOTMEM
+ .bdata = &bootmem_node_data[0]
+#endif
+ };
EXPORT_SYMBOL(contig_page_data);
#endif

diff --git a/mm/percpu.c b/mm/percpu.c
index 083e7c9..841defe 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1929,7 +1929,10 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, ssize_t dyn_size,
}
/* copy and return the unused part */
memcpy(ptr, __per_cpu_load, ai->static_size);
+#ifndef CONFIG_NO_BOOTMEM
+ /* fix partial free ! */
free_fn(ptr + size_sum, ai->unit_size - size_sum);
+#endif
}
}

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index d9714bd..9506c39 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -40,7 +40,7 @@ static void * __init_refok __earlyonly_bootmem_alloc(int node,
unsigned long align,
unsigned long goal)
{
- return __alloc_bootmem_node(NODE_DATA(node), size, align, goal);
+ return __alloc_bootmem_node_high(NODE_DATA(node), size, align, goal);
}


--
1.6.4.2

2010-02-09 19:33:53

by Yinghai Lu

Subject: [PATCH 01/35] x86: fix sci on ioapic 1

Thomas Renninger <[email protected]> reported that on an IBM x3330,

booting a recent kernel results in:

PCI: PCI BIOS revision 2.10 entry at 0xfd61c, last bus=1
PCI: Using configuration type 1 for base access bio: create slab <bio-0> at 0
ACPI: SCI (IRQ30) allocation failed
ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control Interrupt handler (20090903/evevent-161)
ACPI: Unable to start the ACPI Interpreter

Later, all kinds of devices fail...

and he bisected it down to this commit:
commit b9c61b70075c87a8612624736faf4a2de5b1ed30

x86/pci: update pirq_enable_irq() to setup io apic routing

It turns out we need to set up IRQ routing for the SCI on ioapic 1 early.

-v2: make it work without sparseirq too.
-v3: fix checkpatch.pl warning, and cc to stable

Reported-by: Thomas Renninger <[email protected]>
Bisected-by: Thomas Renninger <[email protected]>
Tested-by: Thomas Renninger <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Cc: [email protected]
---
arch/x86/include/asm/io_apic.h | 1 +
arch/x86/kernel/acpi/boot.c | 9 ++++++-
arch/x86/kernel/apic/io_apic.c | 50 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 59 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 7c7c16c..5f61f6e 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -160,6 +160,7 @@ extern int io_apic_get_redir_entries(int ioapic);
struct io_apic_irq_attr;
extern int io_apic_set_pci_routing(struct device *dev, int irq,
struct io_apic_irq_attr *irq_attr);
+void setup_IO_APIC_irq_extra(u32 gsi);
extern int (*ioapic_renumber_irq)(int ioapic, int irq);
extern void ioapic_init_mappings(void);
extern void ioapic_insert_resources(void);
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 036d28a..6083bfa 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -446,6 +446,12 @@ void __init acpi_pic_sci_set_trigger(unsigned int irq, u16 trigger)
int acpi_gsi_to_irq(u32 gsi, unsigned int *irq)
{
*irq = gsi;
+
+#ifdef CONFIG_X86_IO_APIC
+ if (acpi_irq_model == ACPI_IRQ_MODEL_IOAPIC)
+ setup_IO_APIC_irq_extra(gsi);
+#endif
+
return 0;
}

@@ -473,7 +479,8 @@ int acpi_register_gsi(struct device *dev, u32 gsi, int trigger, int polarity)
plat_gsi = mp_register_gsi(dev, gsi, trigger, polarity);
}
#endif
- acpi_gsi_to_irq(plat_gsi, &irq);
+ irq = plat_gsi;
+
return irq;
}

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 75ba3d0..ee2fba1 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1541,6 +1541,56 @@ static void __init setup_IO_APIC_irqs(void)
}

/*
+ * for the gsit that is not in first ioapic
+ * but could not use acpi_register_gsi()
+ * like some special sci in IBM x3330
+ */
+void setup_IO_APIC_irq_extra(u32 gsi)
+{
+ int apic_id = 0, pin, idx, irq;
+ int node = cpu_to_node(boot_cpu_id);
+ struct irq_desc *desc;
+ struct irq_cfg *cfg;
+
+ /*
+ * Convert 'gsi' to 'ioapic.pin'.
+ */
+ apic_id = mp_find_ioapic(gsi);
+ if (apic_id < 0)
+ return;
+
+ pin = mp_find_ioapic_pin(apic_id, gsi);
+ idx = find_irq_entry(apic_id, pin, mp_INT);
+ if (idx == -1)
+ return;
+
+ irq = pin_2_irq(idx, apic_id, pin);
+#ifdef CONFIG_SPARSE_IRQ
+ desc = irq_to_desc(irq);
+ if (desc)
+ return;
+#endif
+ desc = irq_to_desc_alloc_node(irq, node);
+ if (!desc) {
+ printk(KERN_INFO "can not get irq_desc for %d\n", irq);
+ return;
+ }
+
+ cfg = desc->chip_data;
+ add_pin_to_irq_node(cfg, node, apic_id, pin);
+
+ if (test_bit(pin, mp_ioapic_routing[apic_id].pin_programmed)) {
+ pr_debug("Pin %d-%d already programmed\n",
+ mp_ioapics[apic_id].apicid, pin);
+ return;
+ }
+ set_bit(pin, mp_ioapic_routing[apic_id].pin_programmed);
+
+ setup_IO_APIC_irq(apic_id, pin, irq, desc,
+ irq_trigger(idx), irq_polarity(idx));
+}
+
+/*
* Set up the timer pin, possibly with the 8259A-master behind.
*/
static void __init setup_timer_IRQ0_pin(unsigned int apic_id, unsigned int pin,
--
1.6.4.2

2010-02-09 19:36:06

by Yinghai Lu

Subject: [PATCH 26/35] core: move early_res

from arch/x86 to kernel/

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/e820.h | 2 +-
arch/x86/include/asm/early_res.h | 21 --
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/e820.c | 1 -
arch/x86/kernel/early_res.c | 523 --------------------------------------
include/linux/early_res.h | 21 ++
kernel/Makefile | 2 +-
kernel/early_res.c | 522 +++++++++++++++++++++++++++++++++++++
8 files changed, 546 insertions(+), 548 deletions(-)
delete mode 100644 arch/x86/include/asm/early_res.h
delete mode 100644 arch/x86/kernel/early_res.c
create mode 100644 include/linux/early_res.h
create mode 100644 kernel/early_res.c

diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index a8299e1..0e22296 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -112,7 +112,7 @@ extern unsigned long end_user_pfn;
extern u64 find_e820_area(u64 start, u64 end, u64 size, u64 align);
extern u64 find_e820_area_size(u64 start, u64 *sizep, u64 align);
extern u64 early_reserve_e820(u64 startt, u64 sizet, u64 align);
-#include <asm/early_res.h>
+#include <linux/early_res.h>

extern unsigned long e820_end_of_ram_pfn(void);
extern unsigned long e820_end_of_low_ram_pfn(void);
diff --git a/arch/x86/include/asm/early_res.h b/arch/x86/include/asm/early_res.h
deleted file mode 100644
index 9758f3d..0000000
--- a/arch/x86/include/asm/early_res.h
+++ /dev/null
@@ -1,21 +0,0 @@
-#ifndef _ASM_X86_EARLY_RES_H
-#define _ASM_X86_EARLY_RES_H
-#ifdef __KERNEL__
-
-extern void reserve_early(u64 start, u64 end, char *name);
-extern void reserve_early_overlap_ok(u64 start, u64 end, char *name);
-extern void free_early(u64 start, u64 end);
-extern void early_res_to_bootmem(u64 start, u64 end);
-
-void reserve_early_without_check(u64 start, u64 end, char *name);
-u64 find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
- u64 size, u64 align);
-u64 find_early_area_size(u64 ei_start, u64 ei_last, u64 start,
- u64 *sizep, u64 align);
-u64 find_fw_memmap_area(u64 start, u64 end, u64 size, u64 align);
-#include <linux/range.h>
-int get_free_all_memory_range(struct range **rangep, int nodeid);
-
-#endif /* __KERNEL__ */
-
-#endif /* _ASM_X86_EARLY_RES_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index f5fb9f0..d87f09b 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -38,7 +38,7 @@ obj-$(CONFIG_X86_32) += probe_roms_32.o
obj-$(CONFIG_X86_32) += sys_i386_32.o i386_ksyms_32.o
obj-$(CONFIG_X86_64) += sys_x86_64.o x8664_ksyms_64.o
obj-$(CONFIG_X86_64) += syscall_64.o vsyscall_64.o
-obj-y += bootflag.o e820.o early_res.o
+obj-y += bootflag.o e820.o
obj-y += pci-dma.o quirks.o i8237.o topology.o kdebugfs.o
obj-y += alternative.o i8253.o pci-nommu.o hw_breakpoint.o
obj-y += tsc.o io_delay.o rtc.o
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 2c3b090..09ca6e8 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -17,7 +17,6 @@
#include <linux/firmware-map.h>

#include <asm/e820.h>
-#include <asm/early_res.h>
#include <asm/proto.h>
#include <asm/setup.h>

diff --git a/arch/x86/kernel/early_res.c b/arch/x86/kernel/early_res.c
deleted file mode 100644
index e209fc4..0000000
--- a/arch/x86/kernel/early_res.c
+++ /dev/null
@@ -1,523 +0,0 @@
-/*
- * early_res, which can be used to replace bootmem
- */
-#include <linux/kernel.h>
-#include <linux/types.h>
-#include <linux/init.h>
-#include <linux/bootmem.h>
-#include <linux/mm.h>
-
-#include <asm/early_res.h>
-
-/*
- * Early reserved memory areas.
- */
-/*
- * need to make sure this one is big enough before
- * find_fw_memmap_area can be used
- */
-#define MAX_EARLY_RES_X 32
-
-struct early_res {
- u64 start, end;
- char name[15];
- char overlap_ok;
-};
-static struct early_res early_res_x[MAX_EARLY_RES_X] __initdata;
-
-static int max_early_res __initdata = MAX_EARLY_RES_X;
-static struct early_res *early_res __initdata = &early_res_x[0];
-static int early_res_count __initdata;
-
-static int __init find_overlapped_early(u64 start, u64 end)
-{
- int i;
- struct early_res *r;
-
- for (i = 0; i < max_early_res && early_res[i].end; i++) {
- r = &early_res[i];
- if (end > r->start && start < r->end)
- break;
- }
-
- return i;
-}
-
-/*
- * Drop the i-th range from the early reservation map,
- * by copying any higher ranges down one over it, and
- * clearing what had been the last slot.
- */
-static void __init drop_range(int i)
-{
- int j;
-
- for (j = i + 1; j < max_early_res && early_res[j].end; j++)
- ;
-
- memmove(&early_res[i], &early_res[i + 1],
- (j - 1 - i) * sizeof(struct early_res));
-
- early_res[j - 1].end = 0;
- early_res_count--;
-}
-
-/*
- * Split any existing ranges that:
- * 1) are marked 'overlap_ok', and
- * 2) overlap with the stated range [start, end)
- * into whatever portion (if any) of the existing range is entirely
- * below or entirely above the stated range. Drop the portion
- * of the existing range that overlaps with the stated range,
- * which will allow the caller of this routine to then add that
- * stated range without conflicting with any existing range.
- */
-static void __init drop_overlaps_that_are_ok(u64 start, u64 end)
-{
- int i;
- struct early_res *r;
- u64 lower_start, lower_end;
- u64 upper_start, upper_end;
- char name[15];
-
- for (i = 0; i < max_early_res && early_res[i].end; i++) {
- r = &early_res[i];
-
- /* Continue past non-overlapping ranges */
- if (end <= r->start || start >= r->end)
- continue;
-
- /*
- * Leave non-ok overlaps as is; let caller
- * panic "Overlapping early reservations"
- * when it hits this overlap.
- */
- if (!r->overlap_ok)
- return;
-
- /*
- * We have an ok overlap. We will drop it from the early
- * reservation map, and add back in any non-overlapping
- * portions (lower or upper) as separate, overlap_ok,
- * non-overlapping ranges.
- */
-
- /* 1. Note any non-overlapping (lower or upper) ranges. */
- strncpy(name, r->name, sizeof(name) - 1);
-
- lower_start = lower_end = 0;
- upper_start = upper_end = 0;
- if (r->start < start) {
- lower_start = r->start;
- lower_end = start;
- }
- if (r->end > end) {
- upper_start = end;
- upper_end = r->end;
- }
-
- /* 2. Drop the original ok overlapping range */
- drop_range(i);
-
- i--; /* resume for-loop on copied down entry */
-
- /* 3. Add back in any non-overlapping ranges. */
- if (lower_end)
- reserve_early_overlap_ok(lower_start, lower_end, name);
- if (upper_end)
- reserve_early_overlap_ok(upper_start, upper_end, name);
- }
-}
-
-static void __init __reserve_early(u64 start, u64 end, char *name,
- int overlap_ok)
-{
- int i;
- struct early_res *r;
-
- i = find_overlapped_early(start, end);
- if (i >= max_early_res)
- panic("Too many early reservations");
- r = &early_res[i];
- if (r->end)
- panic("Overlapping early reservations "
- "%llx-%llx %s to %llx-%llx %s\n",
- start, end - 1, name ? name : "", r->start,
- r->end - 1, r->name);
- r->start = start;
- r->end = end;
- r->overlap_ok = overlap_ok;
- if (name)
- strncpy(r->name, name, sizeof(r->name) - 1);
- early_res_count++;
-}
-
-/*
- * A few early reservations come here.
- *
- * The 'overlap_ok' in the name of this routine does -not- mean it
- * is ok for these reservations to overlap an earlier reservation.
- * Rather it means that it is ok for subsequent reservations to
- * overlap this one.
- *
- * Use this entry point to reserve early ranges when you are doing
- * so out of "Paranoia", reserving perhaps more memory than you need,
- * just in case, and don't mind a subsequent overlapping reservation
- * that is known to be needed.
- *
- * The drop_overlaps_that_are_ok() call here isn't really needed.
- * It would be needed if we had two colliding 'overlap_ok'
- * reservations, so that the second such would not panic on the
- * overlap with the first. We don't have any such as of this
- * writing, but might as well tolerate such if it happens in
- * the future.
- */
-void __init reserve_early_overlap_ok(u64 start, u64 end, char *name)
-{
- drop_overlaps_that_are_ok(start, end);
- __reserve_early(start, end, name, 1);
-}
-
-u64 __init __weak find_fw_memmap_area(u64 start, u64 end, u64 size, u64 align)
-{
- panic("should have find_fw_memmap_area defined with arch");
-
- return -1ULL;
-}
-
-static void __init __check_and_double_early_res(u64 ex_start, u64 ex_end)
-{
- u64 start, end, size, mem;
- struct early_res *new;
-
- /* do we have enough slots left ? */
- if ((max_early_res - early_res_count) > max(max_early_res/8, 2))
- return;
-
- /* double it */
- mem = -1ULL;
- size = sizeof(struct early_res) * max_early_res * 2;
- if (early_res == early_res_x)
- start = 0;
- else
- start = early_res[0].end;
- end = ex_start;
- if (start + size < end)
- mem = find_fw_memmap_area(start, end, size,
- sizeof(struct early_res));
- if (mem == -1ULL) {
- start = ex_end;
- end = max_pfn_mapped << PAGE_SHIFT;
- if (start + size < end)
- mem = find_fw_memmap_area(start, end, size,
- sizeof(struct early_res));
- }
- if (mem == -1ULL)
- panic("can not find more space for early_res array");
-
- new = __va(mem);
-	/* the first slot holds the new array's own memory */
- new[0].start = mem;
- new[0].end = mem + size;
- new[0].overlap_ok = 0;
- /* copy old to new */
- if (early_res == early_res_x) {
- memcpy(&new[1], &early_res[0],
- sizeof(struct early_res) * max_early_res);
- memset(&new[max_early_res+1], 0,
- sizeof(struct early_res) * (max_early_res - 1));
- early_res_count++;
- } else {
- memcpy(&new[1], &early_res[1],
- sizeof(struct early_res) * (max_early_res - 1));
- memset(&new[max_early_res], 0,
- sizeof(struct early_res) * max_early_res);
- }
- memset(&early_res[0], 0, sizeof(struct early_res) * max_early_res);
- early_res = new;
- max_early_res *= 2;
- printk(KERN_DEBUG "early_res array is doubled to %d at [%llx - %llx]\n",
- max_early_res, mem, mem + size - 1);
-}
-
-/*
- * Most early reservations come here.
- *
- * We first have drop_overlaps_that_are_ok() drop any pre-existing
- * 'overlap_ok' ranges, so that we can then reserve this memory
- * range without risk of panic'ing on an overlapping overlap_ok
- * early reservation.
- */
-void __init reserve_early(u64 start, u64 end, char *name)
-{
- if (start >= end)
- return;
-
- __check_and_double_early_res(start, end);
-
- drop_overlaps_that_are_ok(start, end);
- __reserve_early(start, end, name, 0);
-}
-
-void __init reserve_early_without_check(u64 start, u64 end, char *name)
-{
- struct early_res *r;
-
- if (start >= end)
- return;
-
- __check_and_double_early_res(start, end);
-
- r = &early_res[early_res_count];
-
- r->start = start;
- r->end = end;
- r->overlap_ok = 0;
- if (name)
- strncpy(r->name, name, sizeof(r->name) - 1);
- early_res_count++;
-}
-
-void __init free_early(u64 start, u64 end)
-{
- struct early_res *r;
- int i;
-
- i = find_overlapped_early(start, end);
- r = &early_res[i];
- if (i >= max_early_res || r->end != end || r->start != start)
- panic("free_early on not reserved area: %llx-%llx!",
- start, end - 1);
-
- drop_range(i);
-}
-
-#ifdef CONFIG_NO_BOOTMEM
-static void __init subtract_early_res(struct range *range, int az)
-{
- int i, count;
- u64 final_start, final_end;
- int idx = 0;
-
- count = 0;
- for (i = 0; i < max_early_res && early_res[i].end; i++)
- count++;
-
-	/* need to skip the first one? */
- if (early_res != early_res_x)
- idx = 1;
-
-#define DEBUG_PRINT_EARLY_RES 1
-
-#if DEBUG_PRINT_EARLY_RES
- printk(KERN_INFO "Subtract (%d early reservations)\n", count);
-#endif
- for (i = idx; i < count; i++) {
- struct early_res *r = &early_res[i];
-#if DEBUG_PRINT_EARLY_RES
- printk(KERN_INFO " #%d [%010llx - %010llx] %15s\n", i,
- r->start, r->end, r->name);
-#endif
- final_start = PFN_DOWN(r->start);
- final_end = PFN_UP(r->end);
- if (final_start >= final_end)
- continue;
- subtract_range(range, az, final_start, final_end);
- }
-
-}
-
-int __init get_free_all_memory_range(struct range **rangep, int nodeid)
-{
- int i, count;
- u64 start = 0, end;
- u64 size;
- u64 mem;
- struct range *range;
- int nr_range;
-
- count = 0;
- for (i = 0; i < max_early_res && early_res[i].end; i++)
- count++;
-
- count *= 2;
-
- size = sizeof(struct range) * count;
-#ifdef MAX_DMA32_PFN
- if (max_pfn_mapped > MAX_DMA32_PFN)
- start = MAX_DMA32_PFN << PAGE_SHIFT;
-#endif
- end = max_pfn_mapped << PAGE_SHIFT;
- mem = find_fw_memmap_area(start, end, size, sizeof(struct range));
- if (mem == -1ULL)
- panic("can not find more space for range free");
-
- range = __va(mem);
- /* use early_node_map[] and early_res to get range array at first */
- memset(range, 0, size);
- nr_range = 0;
-
- /* need to go over early_node_map to find out good range for node */
- nr_range = add_from_early_node_map(range, count, nr_range, nodeid);
-#ifdef CONFIG_X86_32
- subtract_range(range, count, max_low_pfn, -1UL);
-#endif
- subtract_early_res(range, count);
- nr_range = clean_sort_range(range, count);
-
- /* need to clear it ? */
- if (nodeid == MAX_NUMNODES) {
- memset(&early_res[0], 0,
- sizeof(struct early_res) * max_early_res);
- early_res = NULL;
- max_early_res = 0;
- }
-
- *rangep = range;
- return nr_range;
-}
-#else
-void __init early_res_to_bootmem(u64 start, u64 end)
-{
- int i, count;
- u64 final_start, final_end;
- int idx = 0;
-
- count = 0;
- for (i = 0; i < max_early_res && early_res[i].end; i++)
- count++;
-
-	/* need to skip the first one? */
- if (early_res != early_res_x)
- idx = 1;
-
- printk(KERN_INFO "(%d/%d early reservations) ==> bootmem [%010llx - %010llx]\n",
- count - idx, max_early_res, start, end);
- for (i = idx; i < count; i++) {
- struct early_res *r = &early_res[i];
- printk(KERN_INFO " #%d [%010llx - %010llx] %16s", i,
- r->start, r->end, r->name);
- final_start = max(start, r->start);
- final_end = min(end, r->end);
- if (final_start >= final_end) {
- printk(KERN_CONT "\n");
- continue;
- }
- printk(KERN_CONT " ==> [%010llx - %010llx]\n",
- final_start, final_end);
- reserve_bootmem_generic(final_start, final_end - final_start,
- BOOTMEM_DEFAULT);
- }
- /* clear them */
- memset(&early_res[0], 0, sizeof(struct early_res) * max_early_res);
- early_res = NULL;
- max_early_res = 0;
- early_res_count = 0;
-}
-#endif
-
-/* Check for already reserved areas */
-static inline int __init bad_addr(u64 *addrp, u64 size, u64 align)
-{
- int i;
- u64 addr = *addrp;
- int changed = 0;
- struct early_res *r;
-again:
- i = find_overlapped_early(addr, addr + size);
- r = &early_res[i];
- if (i < max_early_res && r->end) {
- *addrp = addr = round_up(r->end, align);
- changed = 1;
- goto again;
- }
- return changed;
-}
-
-/* Check for already reserved areas */
-static inline int __init bad_addr_size(u64 *addrp, u64 *sizep, u64 align)
-{
- int i;
- u64 addr = *addrp, last;
- u64 size = *sizep;
- int changed = 0;
-again:
- last = addr + size;
- for (i = 0; i < max_early_res && early_res[i].end; i++) {
- struct early_res *r = &early_res[i];
- if (last > r->start && addr < r->start) {
- size = r->start - addr;
- changed = 1;
- goto again;
- }
- if (last > r->end && addr < r->end) {
- addr = round_up(r->end, align);
- size = last - addr;
- changed = 1;
- goto again;
- }
- if (last <= r->end && addr >= r->start) {
- (*sizep)++;
- return 0;
- }
- }
- if (changed) {
- *addrp = addr;
- *sizep = size;
- }
- return changed;
-}
-
-/*
- * Find a free area with specified alignment in a specific range.
- * Only the area between start and end is an active range from
- * early_node_map, so it is known to be usable RAM.
- */
-u64 __init find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
- u64 size, u64 align)
-{
- u64 addr, last;
-
- addr = round_up(ei_start, align);
- if (addr < start)
- addr = round_up(start, align);
- if (addr >= ei_last)
- goto out;
- while (bad_addr(&addr, size, align) && addr+size <= ei_last)
- ;
- last = addr + size;
- if (last > ei_last)
- goto out;
- if (last > end)
- goto out;
-
- return addr;
-
-out:
- return -1ULL;
-}
-
-u64 __init find_early_area_size(u64 ei_start, u64 ei_last, u64 start,
- u64 *sizep, u64 align)
-{
- u64 addr, last;
-
- addr = round_up(ei_start, align);
- if (addr < start)
- addr = round_up(start, align);
- if (addr >= ei_last)
- goto out;
- *sizep = ei_last - addr;
- while (bad_addr_size(&addr, sizep, align) && addr + *sizep <= ei_last)
- ;
- last = addr + *sizep;
- if (last > ei_last)
- goto out;
-
- return addr;
-
-out:
- return -1ULL;
-}
-
-
diff --git a/include/linux/early_res.h b/include/linux/early_res.h
new file mode 100644
index 0000000..d3dd9ea
--- /dev/null
+++ b/include/linux/early_res.h
@@ -0,0 +1,21 @@
+#ifndef _LINUX_EARLY_RES_H
+#define _LINUX_EARLY_RES_H
+#ifdef __KERNEL__
+
+extern void reserve_early(u64 start, u64 end, char *name);
+extern void reserve_early_overlap_ok(u64 start, u64 end, char *name);
+extern void free_early(u64 start, u64 end);
+extern void early_res_to_bootmem(u64 start, u64 end);
+
+void reserve_early_without_check(u64 start, u64 end, char *name);
+u64 find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
+ u64 size, u64 align);
+u64 find_early_area_size(u64 ei_start, u64 ei_last, u64 start,
+ u64 *sizep, u64 align);
+u64 find_fw_memmap_area(u64 start, u64 end, u64 size, u64 align);
+#include <linux/range.h>
+int get_free_all_memory_range(struct range **rangep, int nodeid);
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_EARLY_RES_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 847a2a2..73784e6 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -10,7 +10,7 @@ obj-y = sched.o fork.o exec_domain.o panic.o printk.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
notifier.o ksysfs.o pm_qos_params.o sched_clock.o cred.o \
- async.o range.o
+ async.o range.o early_res.o
obj-y += groups.o

ifdef CONFIG_FUNCTION_TRACER
diff --git a/kernel/early_res.c b/kernel/early_res.c
new file mode 100644
index 0000000..4fd02b7
--- /dev/null
+++ b/kernel/early_res.c
@@ -0,0 +1,522 @@
+/*
+ * early_res, which can be used to replace bootmem
+ */
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/mm.h>
+#include <linux/early_res.h>
+
+/*
+ * Early reserved memory areas.
+ */
+/*
+ * need to make sure this one is big enough before
+ * find_fw_memmap_area can be used
+ */
+#define MAX_EARLY_RES_X 32
+
+struct early_res {
+ u64 start, end;
+ char name[15];
+ char overlap_ok;
+};
+static struct early_res early_res_x[MAX_EARLY_RES_X] __initdata;
+
+static int max_early_res __initdata = MAX_EARLY_RES_X;
+static struct early_res *early_res __initdata = &early_res_x[0];
+static int early_res_count __initdata;
+
+static int __init find_overlapped_early(u64 start, u64 end)
+{
+ int i;
+ struct early_res *r;
+
+ for (i = 0; i < max_early_res && early_res[i].end; i++) {
+ r = &early_res[i];
+ if (end > r->start && start < r->end)
+ break;
+ }
+
+ return i;
+}
+
+/*
+ * Drop the i-th range from the early reservation map,
+ * by copying any higher ranges down one over it, and
+ * clearing what had been the last slot.
+ */
+static void __init drop_range(int i)
+{
+ int j;
+
+ for (j = i + 1; j < max_early_res && early_res[j].end; j++)
+ ;
+
+ memmove(&early_res[i], &early_res[i + 1],
+ (j - 1 - i) * sizeof(struct early_res));
+
+ early_res[j - 1].end = 0;
+ early_res_count--;
+}
+
+/*
+ * Split any existing ranges that:
+ * 1) are marked 'overlap_ok', and
+ * 2) overlap with the stated range [start, end)
+ * into whatever portion (if any) of the existing range is entirely
+ * below or entirely above the stated range. Drop the portion
+ * of the existing range that overlaps with the stated range,
+ * which will allow the caller of this routine to then add that
+ * stated range without conflicting with any existing range.
+ */
+static void __init drop_overlaps_that_are_ok(u64 start, u64 end)
+{
+ int i;
+ struct early_res *r;
+ u64 lower_start, lower_end;
+ u64 upper_start, upper_end;
+ char name[15];
+
+ for (i = 0; i < max_early_res && early_res[i].end; i++) {
+ r = &early_res[i];
+
+ /* Continue past non-overlapping ranges */
+ if (end <= r->start || start >= r->end)
+ continue;
+
+ /*
+ * Leave non-ok overlaps as is; let caller
+ * panic "Overlapping early reservations"
+ * when it hits this overlap.
+ */
+ if (!r->overlap_ok)
+ return;
+
+ /*
+ * We have an ok overlap. We will drop it from the early
+ * reservation map, and add back in any non-overlapping
+ * portions (lower or upper) as separate, overlap_ok,
+ * non-overlapping ranges.
+ */
+
+ /* 1. Note any non-overlapping (lower or upper) ranges. */
+ strncpy(name, r->name, sizeof(name) - 1);
+
+ lower_start = lower_end = 0;
+ upper_start = upper_end = 0;
+ if (r->start < start) {
+ lower_start = r->start;
+ lower_end = start;
+ }
+ if (r->end > end) {
+ upper_start = end;
+ upper_end = r->end;
+ }
+
+ /* 2. Drop the original ok overlapping range */
+ drop_range(i);
+
+ i--; /* resume for-loop on copied down entry */
+
+ /* 3. Add back in any non-overlapping ranges. */
+ if (lower_end)
+ reserve_early_overlap_ok(lower_start, lower_end, name);
+ if (upper_end)
+ reserve_early_overlap_ok(upper_start, upper_end, name);
+ }
+}
+
+static void __init __reserve_early(u64 start, u64 end, char *name,
+ int overlap_ok)
+{
+ int i;
+ struct early_res *r;
+
+ i = find_overlapped_early(start, end);
+ if (i >= max_early_res)
+ panic("Too many early reservations");
+ r = &early_res[i];
+ if (r->end)
+ panic("Overlapping early reservations "
+ "%llx-%llx %s to %llx-%llx %s\n",
+ start, end - 1, name ? name : "", r->start,
+ r->end - 1, r->name);
+ r->start = start;
+ r->end = end;
+ r->overlap_ok = overlap_ok;
+ if (name)
+ strncpy(r->name, name, sizeof(r->name) - 1);
+ early_res_count++;
+}
+
+/*
+ * A few early reservations come here.
+ *
+ * The 'overlap_ok' in the name of this routine does -not- mean it
+ * is ok for these reservations to overlap an earlier reservation.
+ * Rather it means that it is ok for subsequent reservations to
+ * overlap this one.
+ *
+ * Use this entry point to reserve early ranges when you are doing
+ * so out of "Paranoia", reserving perhaps more memory than you need,
+ * just in case, and don't mind a subsequent overlapping reservation
+ * that is known to be needed.
+ *
+ * The drop_overlaps_that_are_ok() call here isn't really needed.
+ * It would be needed if we had two colliding 'overlap_ok'
+ * reservations, so that the second such would not panic on the
+ * overlap with the first. We don't have any such as of this
+ * writing, but might as well tolerate such if it happens in
+ * the future.
+ */
+void __init reserve_early_overlap_ok(u64 start, u64 end, char *name)
+{
+ drop_overlaps_that_are_ok(start, end);
+ __reserve_early(start, end, name, 1);
+}
+
+u64 __init __weak find_fw_memmap_area(u64 start, u64 end, u64 size, u64 align)
+{
+ panic("should have find_fw_memmap_area defined with arch");
+
+ return -1ULL;
+}
+
+static void __init __check_and_double_early_res(u64 ex_start, u64 ex_end)
+{
+ u64 start, end, size, mem;
+ struct early_res *new;
+
+ /* do we have enough slots left ? */
+ if ((max_early_res - early_res_count) > max(max_early_res/8, 2))
+ return;
+
+ /* double it */
+ mem = -1ULL;
+ size = sizeof(struct early_res) * max_early_res * 2;
+ if (early_res == early_res_x)
+ start = 0;
+ else
+ start = early_res[0].end;
+ end = ex_start;
+ if (start + size < end)
+ mem = find_fw_memmap_area(start, end, size,
+ sizeof(struct early_res));
+ if (mem == -1ULL) {
+ start = ex_end;
+ end = max_pfn_mapped << PAGE_SHIFT;
+ if (start + size < end)
+ mem = find_fw_memmap_area(start, end, size,
+ sizeof(struct early_res));
+ }
+ if (mem == -1ULL)
+ panic("can not find more space for early_res array");
+
+ new = __va(mem);
+	/* the first slot holds the new array's own memory */
+ new[0].start = mem;
+ new[0].end = mem + size;
+ new[0].overlap_ok = 0;
+ /* copy old to new */
+ if (early_res == early_res_x) {
+ memcpy(&new[1], &early_res[0],
+ sizeof(struct early_res) * max_early_res);
+ memset(&new[max_early_res+1], 0,
+ sizeof(struct early_res) * (max_early_res - 1));
+ early_res_count++;
+ } else {
+ memcpy(&new[1], &early_res[1],
+ sizeof(struct early_res) * (max_early_res - 1));
+ memset(&new[max_early_res], 0,
+ sizeof(struct early_res) * max_early_res);
+ }
+ memset(&early_res[0], 0, sizeof(struct early_res) * max_early_res);
+ early_res = new;
+ max_early_res *= 2;
+ printk(KERN_DEBUG "early_res array is doubled to %d at [%llx - %llx]\n",
+ max_early_res, mem, mem + size - 1);
+}
+
+/*
+ * Most early reservations come here.
+ *
+ * We first have drop_overlaps_that_are_ok() drop any pre-existing
+ * 'overlap_ok' ranges, so that we can then reserve this memory
+ * range without risk of panic'ing on an overlapping overlap_ok
+ * early reservation.
+ */
+void __init reserve_early(u64 start, u64 end, char *name)
+{
+ if (start >= end)
+ return;
+
+ __check_and_double_early_res(start, end);
+
+ drop_overlaps_that_are_ok(start, end);
+ __reserve_early(start, end, name, 0);
+}
+
+void __init reserve_early_without_check(u64 start, u64 end, char *name)
+{
+ struct early_res *r;
+
+ if (start >= end)
+ return;
+
+ __check_and_double_early_res(start, end);
+
+ r = &early_res[early_res_count];
+
+ r->start = start;
+ r->end = end;
+ r->overlap_ok = 0;
+ if (name)
+ strncpy(r->name, name, sizeof(r->name) - 1);
+ early_res_count++;
+}
+
+void __init free_early(u64 start, u64 end)
+{
+ struct early_res *r;
+ int i;
+
+ i = find_overlapped_early(start, end);
+ r = &early_res[i];
+ if (i >= max_early_res || r->end != end || r->start != start)
+ panic("free_early on not reserved area: %llx-%llx!",
+ start, end - 1);
+
+ drop_range(i);
+}
+
+#ifdef CONFIG_NO_BOOTMEM
+static void __init subtract_early_res(struct range *range, int az)
+{
+ int i, count;
+ u64 final_start, final_end;
+ int idx = 0;
+
+ count = 0;
+ for (i = 0; i < max_early_res && early_res[i].end; i++)
+ count++;
+
+	/* need to skip the first one? */
+ if (early_res != early_res_x)
+ idx = 1;
+
+#define DEBUG_PRINT_EARLY_RES 1
+
+#if DEBUG_PRINT_EARLY_RES
+ printk(KERN_INFO "Subtract (%d early reservations)\n", count);
+#endif
+ for (i = idx; i < count; i++) {
+ struct early_res *r = &early_res[i];
+#if DEBUG_PRINT_EARLY_RES
+ printk(KERN_INFO " #%d [%010llx - %010llx] %15s\n", i,
+ r->start, r->end, r->name);
+#endif
+ final_start = PFN_DOWN(r->start);
+ final_end = PFN_UP(r->end);
+ if (final_start >= final_end)
+ continue;
+ subtract_range(range, az, final_start, final_end);
+ }
+
+}
+
+int __init get_free_all_memory_range(struct range **rangep, int nodeid)
+{
+ int i, count;
+ u64 start = 0, end;
+ u64 size;
+ u64 mem;
+ struct range *range;
+ int nr_range;
+
+ count = 0;
+ for (i = 0; i < max_early_res && early_res[i].end; i++)
+ count++;
+
+ count *= 2;
+
+ size = sizeof(struct range) * count;
+#ifdef MAX_DMA32_PFN
+ if (max_pfn_mapped > MAX_DMA32_PFN)
+ start = MAX_DMA32_PFN << PAGE_SHIFT;
+#endif
+ end = max_pfn_mapped << PAGE_SHIFT;
+ mem = find_fw_memmap_area(start, end, size, sizeof(struct range));
+ if (mem == -1ULL)
+ panic("can not find more space for range free");
+
+ range = __va(mem);
+ /* use early_node_map[] and early_res to get range array at first */
+ memset(range, 0, size);
+ nr_range = 0;
+
+ /* need to go over early_node_map to find out good range for node */
+ nr_range = add_from_early_node_map(range, count, nr_range, nodeid);
+#ifdef CONFIG_X86_32
+ subtract_range(range, count, max_low_pfn, -1UL);
+#endif
+ subtract_early_res(range, count);
+ nr_range = clean_sort_range(range, count);
+
+ /* need to clear it ? */
+ if (nodeid == MAX_NUMNODES) {
+ memset(&early_res[0], 0,
+ sizeof(struct early_res) * max_early_res);
+ early_res = NULL;
+ max_early_res = 0;
+ }
+
+ *rangep = range;
+ return nr_range;
+}
+#else
+void __init early_res_to_bootmem(u64 start, u64 end)
+{
+ int i, count;
+ u64 final_start, final_end;
+ int idx = 0;
+
+ count = 0;
+ for (i = 0; i < max_early_res && early_res[i].end; i++)
+ count++;
+
+	/* need to skip the first one? */
+ if (early_res != early_res_x)
+ idx = 1;
+
+ printk(KERN_INFO "(%d/%d early reservations) ==> bootmem [%010llx - %010llx]\n",
+ count - idx, max_early_res, start, end);
+ for (i = idx; i < count; i++) {
+ struct early_res *r = &early_res[i];
+ printk(KERN_INFO " #%d [%010llx - %010llx] %16s", i,
+ r->start, r->end, r->name);
+ final_start = max(start, r->start);
+ final_end = min(end, r->end);
+ if (final_start >= final_end) {
+ printk(KERN_CONT "\n");
+ continue;
+ }
+ printk(KERN_CONT " ==> [%010llx - %010llx]\n",
+ final_start, final_end);
+ reserve_bootmem_generic(final_start, final_end - final_start,
+ BOOTMEM_DEFAULT);
+ }
+ /* clear them */
+ memset(&early_res[0], 0, sizeof(struct early_res) * max_early_res);
+ early_res = NULL;
+ max_early_res = 0;
+ early_res_count = 0;
+}
+#endif
+
+/* Check for already reserved areas */
+static inline int __init bad_addr(u64 *addrp, u64 size, u64 align)
+{
+ int i;
+ u64 addr = *addrp;
+ int changed = 0;
+ struct early_res *r;
+again:
+ i = find_overlapped_early(addr, addr + size);
+ r = &early_res[i];
+ if (i < max_early_res && r->end) {
+ *addrp = addr = round_up(r->end, align);
+ changed = 1;
+ goto again;
+ }
+ return changed;
+}
+
+/* Check for already reserved areas */
+static inline int __init bad_addr_size(u64 *addrp, u64 *sizep, u64 align)
+{
+ int i;
+ u64 addr = *addrp, last;
+ u64 size = *sizep;
+ int changed = 0;
+again:
+ last = addr + size;
+ for (i = 0; i < max_early_res && early_res[i].end; i++) {
+ struct early_res *r = &early_res[i];
+ if (last > r->start && addr < r->start) {
+ size = r->start - addr;
+ changed = 1;
+ goto again;
+ }
+ if (last > r->end && addr < r->end) {
+ addr = round_up(r->end, align);
+ size = last - addr;
+ changed = 1;
+ goto again;
+ }
+ if (last <= r->end && addr >= r->start) {
+ (*sizep)++;
+ return 0;
+ }
+ }
+ if (changed) {
+ *addrp = addr;
+ *sizep = size;
+ }
+ return changed;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range.
+ * Only the area between start and end is an active range from
+ * early_node_map, so it is known to be usable RAM.
+ */
+u64 __init find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
+ u64 size, u64 align)
+{
+ u64 addr, last;
+
+ addr = round_up(ei_start, align);
+ if (addr < start)
+ addr = round_up(start, align);
+ if (addr >= ei_last)
+ goto out;
+ while (bad_addr(&addr, size, align) && addr+size <= ei_last)
+ ;
+ last = addr + size;
+ if (last > ei_last)
+ goto out;
+ if (last > end)
+ goto out;
+
+ return addr;
+
+out:
+ return -1ULL;
+}
+
+u64 __init find_early_area_size(u64 ei_start, u64 ei_last, u64 start,
+ u64 *sizep, u64 align)
+{
+ u64 addr, last;
+
+ addr = round_up(ei_start, align);
+ if (addr < start)
+ addr = round_up(start, align);
+ if (addr >= ei_last)
+ goto out;
+ *sizep = ei_last - addr;
+ while (bad_addr_size(&addr, sizep, align) && addr + *sizep <= ei_last)
+ ;
+ last = addr + *sizep;
+ if (last > ei_last)
+ goto out;
+
+ return addr;
+
+out:
+ return -1ULL;
+}
+
+
--
1.6.4.2
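
To see the idea of early_res in isolation: reservations live in a flat
array of [start, end) ranges, overlap detection is plain interval
intersection, and freeing compacts the array, so there are no bitmaps
to tear down later. A self-contained toy version of that bookkeeping
(simplified and fixed-size; not the kernel code itself):

#include <stdio.h>
#include <string.h>

typedef unsigned long long u64;

#define MAX_RES 32

struct res { u64 start, end; };		/* end == 0 means slot unused */
static struct res res[MAX_RES];

/* First slot intersecting [start, end), else the first free slot. */
static int find_overlapped(u64 start, u64 end)
{
	int i;

	for (i = 0; i < MAX_RES && res[i].end; i++)
		if (end > res[i].start && start < res[i].end)
			break;
	return i;
}

static int reserve(u64 start, u64 end)
{
	int i = find_overlapped(start, end);

	if (i >= MAX_RES || res[i].end)
		return -1;	/* table full, or overlapping reservation */
	res[i].start = start;
	res[i].end = end;
	return 0;
}

/* Free an exact earlier reservation: copy higher slots down over it. */
static void release(u64 start, u64 end)
{
	int i = find_overlapped(start, end), j;

	if (i >= MAX_RES || res[i].start != start || res[i].end != end)
		return;
	for (j = i + 1; j < MAX_RES && res[j].end; j++)
		;
	memmove(&res[i], &res[i + 1], (j - 1 - i) * sizeof(res[0]));
	res[j - 1].end = 0;
}

int main(void)
{
	reserve(0x1000, 0x2000);
	printf("overlapping reserve: %d\n", reserve(0x1800, 0x3000)); /* -1 */
	release(0x1000, 0x2000);
	printf("after free:          %d\n", reserve(0x1800, 0x3000)); /*  0 */
	return 0;
}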

2010-02-09 19:37:10

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 22/35] early_res: enhance check_and_double_early_res

to make it always try to start from low addresses first.

This makes it safer for early_memtest reserving bad ranges: the new
early_res array is placed in a range that has already been tested.

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/kernel/early_res.c | 27 ++++++++++++++++++++-------
1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/early_res.c b/arch/x86/kernel/early_res.c
index d0a70cc..52804e9 100644
--- a/arch/x86/kernel/early_res.c
+++ b/arch/x86/kernel/early_res.c
@@ -180,9 +180,9 @@ void __init reserve_early_overlap_ok(u64 start, u64 end, char *name)
__reserve_early(start, end, name, 1);
}

-static void __init __check_and_double_early_res(u64 start)
+static void __init __check_and_double_early_res(u64 ex_start, u64 ex_end)
{
- u64 end, size, mem;
+ u64 start, end, size, mem;
struct early_res *new;

/* do we have enough slots left ? */
@@ -190,10 +190,23 @@ static void __init __check_and_double_early_res(u64 start)
return;

/* double it */
- end = max_pfn_mapped << PAGE_SHIFT;
+ mem = -1ULL;
size = sizeof(struct early_res) * max_early_res * 2;
- mem = find_e820_area(start, end, size, sizeof(struct early_res));
-
+ if (early_res == early_res_x)
+ start = 0;
+ else
+ start = early_res[0].end;
+ end = ex_start;
+ if (start + size < end)
+ mem = find_e820_area(start, end, size,
+ sizeof(struct early_res));
+ if (mem == -1ULL) {
+ start = ex_end;
+ end = max_pfn_mapped << PAGE_SHIFT;
+ if (start + size < end)
+ mem = find_e820_area(start, end, size,
+ sizeof(struct early_res));
+ }
if (mem == -1ULL)
panic("can not find more space for early_res array");

@@ -235,7 +248,7 @@ void __init reserve_early(u64 start, u64 end, char *name)
if (start >= end)
return;

- __check_and_double_early_res(end);
+ __check_and_double_early_res(start, end);

drop_overlaps_that_are_ok(start, end);
__reserve_early(start, end, name, 0);
@@ -248,7 +261,7 @@ void __init reserve_early_without_check(u64 start, u64 end, char *name)
if (start >= end)
return;

- __check_and_double_early_res(end);
+ __check_and_double_early_res(start, end);

r = &early_res[early_res_count];

--
1.6.4.2
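
The policy change is only about which search window is tried first: the
doubled array is placed below the range being reserved when possible,
and only then above it. A standalone sketch of that two-window order
(find_area() is a stand-in for find_e820_area(); all numbers invented):

#include <stdio.h>

typedef unsigned long long u64;

#define INVALID ((u64)-1)

/* Stand-in for find_e820_area(): pretend free RAM starts at 1M. */
static u64 find_area(u64 start, u64 end, u64 size)
{
	u64 base = start < 0x100000 ? 0x100000 : start;

	return (base + size <= end) ? base : INVALID;
}

/* Try below the excluded range [ex_start, ex_end) first, then above. */
static u64 place_new_array(u64 ex_start, u64 ex_end, u64 size, u64 limit)
{
	u64 mem = INVALID;

	if (size < ex_start)
		mem = find_area(0, ex_start, size);
	if (mem == INVALID)
		mem = find_area(ex_end, limit, size);
	return mem;
}

int main(void)
{
	/* Doubling while reserving [16M, 32M): the array lands low, at 1M. */
	printf("new array at %#llx\n",
	       place_new_array(0x1000000, 0x2000000, 4096, 0x80000000ULL));
	return 0;
}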

2010-02-09 19:36:09

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 02/35] x86: move range related operation to one file

We have almost identical copies of this range code in the mtrr cleanup
and amd_bus probing paths, and it will also be used when replacing
bootmem with early_res, so move the copies together into one file and
reuse them from the different call sites.

Also rename update_range to subtract_range, since that is what the
function actually does.

-v2: update comments as Christoph requested

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/kernel/cpu/mtrr/cleanup.c | 180 +++---------------------------------
arch/x86/kernel/mmconf-fam10h_64.c | 7 +-
arch/x86/pci/amd_bus.c | 70 ++------------
include/linux/range.h | 22 +++++
kernel/Makefile | 2 +-
kernel/range.c | 163 ++++++++++++++++++++++++++++++++
6 files changed, 214 insertions(+), 230 deletions(-)
create mode 100644 include/linux/range.h
create mode 100644 kernel/range.c

diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
index 09b1698..669da09 100644
--- a/arch/x86/kernel/cpu/mtrr/cleanup.c
+++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
@@ -22,10 +22,10 @@
#include <linux/pci.h>
#include <linux/smp.h>
#include <linux/cpu.h>
-#include <linux/sort.h>
#include <linux/mutex.h>
#include <linux/uaccess.h>
#include <linux/kvm_para.h>
+#include <linux/range.h>

#include <asm/processor.h>
#include <asm/e820.h>
@@ -34,11 +34,6 @@

#include "mtrr.h"

-struct res_range {
- unsigned long start;
- unsigned long end;
-};
-
struct var_mtrr_range_state {
unsigned long base_pfn;
unsigned long size_pfn;
@@ -56,7 +51,7 @@ struct var_mtrr_state {
/* Should be related to MTRR_VAR_RANGES nums */
#define RANGE_NUM 256

-static struct res_range __initdata range[RANGE_NUM];
+static struct range __initdata range[RANGE_NUM];
static int __initdata nr_range;

static struct var_mtrr_range_state __initdata range_state[RANGE_NUM];
@@ -64,152 +59,11 @@ static struct var_mtrr_range_state __initdata range_state[RANGE_NUM];
static int __initdata debug_print;
#define Dprintk(x...) do { if (debug_print) printk(KERN_DEBUG x); } while (0)

-
-static int __init
-add_range(struct res_range *range, int nr_range,
- unsigned long start, unsigned long end)
-{
- /* Out of slots: */
- if (nr_range >= RANGE_NUM)
- return nr_range;
-
- range[nr_range].start = start;
- range[nr_range].end = end;
-
- nr_range++;
-
- return nr_range;
-}
-
-static int __init
-add_range_with_merge(struct res_range *range, int nr_range,
- unsigned long start, unsigned long end)
-{
- int i;
-
- /* Try to merge it with old one: */
- for (i = 0; i < nr_range; i++) {
- unsigned long final_start, final_end;
- unsigned long common_start, common_end;
-
- if (!range[i].end)
- continue;
-
- common_start = max(range[i].start, start);
- common_end = min(range[i].end, end);
- if (common_start > common_end + 1)
- continue;
-
- final_start = min(range[i].start, start);
- final_end = max(range[i].end, end);
-
- range[i].start = final_start;
- range[i].end = final_end;
- return nr_range;
- }
-
- /* Need to add it: */
- return add_range(range, nr_range, start, end);
-}
-
-static void __init
-subtract_range(struct res_range *range, unsigned long start, unsigned long end)
-{
- int i, j;
-
- for (j = 0; j < RANGE_NUM; j++) {
- if (!range[j].end)
- continue;
-
- if (start <= range[j].start && end >= range[j].end) {
- range[j].start = 0;
- range[j].end = 0;
- continue;
- }
-
- if (start <= range[j].start && end < range[j].end &&
- range[j].start < end + 1) {
- range[j].start = end + 1;
- continue;
- }
-
-
- if (start > range[j].start && end >= range[j].end &&
- range[j].end > start - 1) {
- range[j].end = start - 1;
- continue;
- }
-
- if (start > range[j].start && end < range[j].end) {
- /* Find the new spare: */
- for (i = 0; i < RANGE_NUM; i++) {
- if (range[i].end == 0)
- break;
- }
- if (i < RANGE_NUM) {
- range[i].end = range[j].end;
- range[i].start = end + 1;
- } else {
- printk(KERN_ERR "run of slot in ranges\n");
- }
- range[j].end = start - 1;
- continue;
- }
- }
-}
-
-static int __init cmp_range(const void *x1, const void *x2)
-{
- const struct res_range *r1 = x1;
- const struct res_range *r2 = x2;
- long start1, start2;
-
- start1 = r1->start;
- start2 = r2->start;
-
- return start1 - start2;
-}
-
-static int __init clean_sort_range(struct res_range *range, int az)
-{
- int i, j, k = az - 1, nr_range = 0;
-
- for (i = 0; i < k; i++) {
- if (range[i].end)
- continue;
- for (j = k; j > i; j--) {
- if (range[j].end) {
- k = j;
- break;
- }
- }
- if (j == i)
- break;
- range[i].start = range[k].start;
- range[i].end = range[k].end;
- range[k].start = 0;
- range[k].end = 0;
- k--;
- }
- /* count it */
- for (i = 0; i < az; i++) {
- if (!range[i].end) {
- nr_range = i;
- break;
- }
- }
-
- /* sort them */
- sort(range, nr_range, sizeof(struct res_range), cmp_range, NULL);
-
- return nr_range;
-}
-
#define BIOS_BUG_MSG KERN_WARNING \
"WARNING: BIOS bug: VAR MTRR %d contains strange UC entry under 1M, check with your system vendor!\n"

static int __init
-x86_get_mtrr_mem_range(struct res_range *range, int nr_range,
+x86_get_mtrr_mem_range(struct range *range, int nr_range,
unsigned long extra_remove_base,
unsigned long extra_remove_size)
{
@@ -223,13 +77,13 @@ x86_get_mtrr_mem_range(struct res_range *range, int nr_range,
continue;
base = range_state[i].base_pfn;
size = range_state[i].size_pfn;
- nr_range = add_range_with_merge(range, nr_range, base,
- base + size - 1);
+ nr_range = add_range_with_merge(range, RANGE_NUM, nr_range,
+ base, base + size - 1);
}
if (debug_print) {
printk(KERN_DEBUG "After WB checking\n");
for (i = 0; i < nr_range; i++)
- printk(KERN_DEBUG "MTRR MAP PFN: %016lx - %016lx\n",
+ printk(KERN_DEBUG "MTRR MAP PFN: %016llx - %016llx\n",
range[i].start, range[i].end + 1);
}

@@ -252,10 +106,10 @@ x86_get_mtrr_mem_range(struct res_range *range, int nr_range,
size -= (1<<(20-PAGE_SHIFT)) - base;
base = 1<<(20-PAGE_SHIFT);
}
- subtract_range(range, base, base + size - 1);
+ subtract_range(range, RANGE_NUM, base, base + size - 1);
}
if (extra_remove_size)
- subtract_range(range, extra_remove_base,
+ subtract_range(range, RANGE_NUM, extra_remove_base,
extra_remove_base + extra_remove_size - 1);

if (debug_print) {
@@ -263,7 +117,7 @@ x86_get_mtrr_mem_range(struct res_range *range, int nr_range,
for (i = 0; i < RANGE_NUM; i++) {
if (!range[i].end)
continue;
- printk(KERN_DEBUG "MTRR MAP PFN: %016lx - %016lx\n",
+ printk(KERN_DEBUG "MTRR MAP PFN: %016llx - %016llx\n",
range[i].start, range[i].end + 1);
}
}
@@ -273,20 +127,16 @@ x86_get_mtrr_mem_range(struct res_range *range, int nr_range,
if (debug_print) {
printk(KERN_DEBUG "After sorting\n");
for (i = 0; i < nr_range; i++)
- printk(KERN_DEBUG "MTRR MAP PFN: %016lx - %016lx\n",
+ printk(KERN_DEBUG "MTRR MAP PFN: %016llx - %016llx\n",
range[i].start, range[i].end + 1);
}

- /* clear those is not used */
- for (i = nr_range; i < RANGE_NUM; i++)
- memset(&range[i], 0, sizeof(range[i]));
-
return nr_range;
}

#ifdef CONFIG_MTRR_SANITIZER

-static unsigned long __init sum_ranges(struct res_range *range, int nr_range)
+static unsigned long __init sum_ranges(struct range *range, int nr_range)
{
unsigned long sum = 0;
int i;
@@ -621,7 +471,7 @@ static int __init parse_mtrr_spare_reg(char *arg)
early_param("mtrr_spare_reg_nr", parse_mtrr_spare_reg);

static int __init
-x86_setup_var_mtrrs(struct res_range *range, int nr_range,
+x86_setup_var_mtrrs(struct range *range, int nr_range,
u64 chunk_size, u64 gran_size)
{
struct var_mtrr_state var_state;
@@ -742,7 +592,7 @@ mtrr_calc_range_state(u64 chunk_size, u64 gran_size,
unsigned long x_remove_base,
unsigned long x_remove_size, int i)
{
- static struct res_range range_new[RANGE_NUM];
+ static struct range range_new[RANGE_NUM];
unsigned long range_sums_new;
static int nr_range_new;
int num_reg;
@@ -869,10 +719,10 @@ int __init mtrr_cleanup(unsigned address_bits)
* [0, 1M) should always be covered by var mtrr with WB
* and fixed mtrrs should take effect before var mtrr for it:
*/
- nr_range = add_range_with_merge(range, nr_range, 0,
+ nr_range = add_range_with_merge(range, RANGE_NUM, nr_range, 0,
(1ULL<<(20 - PAGE_SHIFT)) - 1);
/* Sort the ranges: */
- sort(range, nr_range, sizeof(struct res_range), cmp_range, NULL);
+ sort_range(range, nr_range);

range_sums = sum_ranges(range, nr_range);
printk(KERN_INFO "total RAM covered: %ldM\n",
diff --git a/arch/x86/kernel/mmconf-fam10h_64.c b/arch/x86/kernel/mmconf-fam10h_64.c
index 712d15f..7182580 100644
--- a/arch/x86/kernel/mmconf-fam10h_64.c
+++ b/arch/x86/kernel/mmconf-fam10h_64.c
@@ -7,6 +7,8 @@
#include <linux/string.h>
#include <linux/pci.h>
#include <linux/dmi.h>
+#include <linux/range.h>
+
#include <asm/pci-direct.h>
#include <linux/sort.h>
#include <asm/io.h>
@@ -30,11 +32,6 @@ static struct pci_hostbridge_probe pci_probes[] __cpuinitdata = {
{ 0xff, 0, PCI_VENDOR_ID_AMD, 0x1200 },
};

-struct range {
- u64 start;
- u64 end;
-};
-
static int __cpuinit cmp_range(const void *x1, const void *x2)
{
const struct range *r1 = x1;
diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index 95ecbd4..2356ea1 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -2,6 +2,8 @@
#include <linux/pci.h>
#include <linux/topology.h>
#include <linux/cpu.h>
+#include <linux/range.h>
+
#include <asm/pci_x86.h>

#ifdef CONFIG_X86_64
@@ -17,58 +19,6 @@

#ifdef CONFIG_X86_64

-#define RANGE_NUM 16
-
-struct res_range {
- size_t start;
- size_t end;
-};
-
-static void __init update_range(struct res_range *range, size_t start,
- size_t end)
-{
- int i;
- int j;
-
- for (j = 0; j < RANGE_NUM; j++) {
- if (!range[j].end)
- continue;
-
- if (start <= range[j].start && end >= range[j].end) {
- range[j].start = 0;
- range[j].end = 0;
- continue;
- }
-
- if (start <= range[j].start && end < range[j].end && range[j].start < end + 1) {
- range[j].start = end + 1;
- continue;
- }
-
-
- if (start > range[j].start && end >= range[j].end && range[j].end > start - 1) {
- range[j].end = start - 1;
- continue;
- }
-
- if (start > range[j].start && end < range[j].end) {
- /* find the new spare */
- for (i = 0; i < RANGE_NUM; i++) {
- if (range[i].end == 0)
- break;
- }
- if (i < RANGE_NUM) {
- range[i].end = range[j].end;
- range[i].start = end + 1;
- } else {
- printk(KERN_ERR "run of slot in ranges\n");
- }
- range[j].end = start - 1;
- continue;
- }
- }
-}
-
struct pci_hostbridge_probe {
u32 bus;
u32 slot;
@@ -111,6 +61,8 @@ static void __init get_pci_mmcfg_amd_fam10h_range(void)
fam10h_mmconf_end = base + (1ULL<<(segn_busn_bits + 20)) - 1;
}

+#define RANGE_NUM 16
+
/**
* early_fill_mp_bus_to_node()
* called before pcibios_scan_root and pci_scan_bus
@@ -132,7 +84,7 @@ static int __init early_fill_mp_bus_info(void)
struct resource *res;
size_t start;
size_t end;
- struct res_range range[RANGE_NUM];
+ struct range range[RANGE_NUM];
u64 val;
u32 address;

@@ -226,7 +178,7 @@ static int __init early_fill_mp_bus_info(void)
if (end > 0xffff)
end = 0xffff;
update_res(info, start, end, IORESOURCE_IO, 1);
- update_range(range, start, end);
+ subtract_range(range, RANGE_NUM, start, end);
}
/* add left over io port range to def node/link, [0, 0xffff] */
/* find the position */
@@ -256,14 +208,14 @@ static int __init early_fill_mp_bus_info(void)
end = (val & 0xffffff800000ULL);
printk(KERN_INFO "TOM: %016lx aka %ldM\n", end, end>>20);
if (end < (1ULL<<32))
- update_range(range, 0, end - 1);
+ subtract_range(range, RANGE_NUM, 0, end - 1);

/* get mmconfig */
get_pci_mmcfg_amd_fam10h_range();
/* need to take out mmconf range */
if (fam10h_mmconf_end) {
printk(KERN_DEBUG "Fam 10h mmconf [%llx, %llx]\n", fam10h_mmconf_start, fam10h_mmconf_end);
- update_range(range, fam10h_mmconf_start, fam10h_mmconf_end);
+ subtract_range(range, RANGE_NUM, fam10h_mmconf_start, fam10h_mmconf_end);
}

/* mmio resource */
@@ -318,7 +270,7 @@ static int __init early_fill_mp_bus_info(void)
/* we got a hole */
endx = fam10h_mmconf_start - 1;
update_res(info, start, endx, IORESOURCE_MEM, 0);
- update_range(range, start, endx);
+ subtract_range(range, RANGE_NUM, start, endx);
printk(KERN_CONT " ==> [%llx, %llx]", (u64)start, endx);
start = fam10h_mmconf_end + 1;
changed = 1;
@@ -334,7 +286,7 @@ static int __init early_fill_mp_bus_info(void)
}

update_res(info, start, end, IORESOURCE_MEM, 1);
- update_range(range, start, end);
+ subtract_range(range, RANGE_NUM, start, end);
printk(KERN_CONT "\n");
}

@@ -349,7 +301,7 @@ static int __init early_fill_mp_bus_info(void)
rdmsrl(address, val);
end = (val & 0xffffff800000ULL);
printk(KERN_INFO "TOM2: %016lx aka %ldM\n", end, end>>20);
- update_range(range, 1ULL<<32, end - 1);
+ subtract_range(range, RANGE_NUM, 1ULL<<32, end - 1);
}

/*
diff --git a/include/linux/range.h b/include/linux/range.h
new file mode 100644
index 0000000..0789f14
--- /dev/null
+++ b/include/linux/range.h
@@ -0,0 +1,22 @@
+#ifndef _LINUX_RANGE_H
+#define _LINUX_RANGE_H
+
+struct range {
+ u64 start;
+ u64 end;
+};
+
+int add_range(struct range *range, int az, int nr_range,
+ u64 start, u64 end);
+
+
+int add_range_with_merge(struct range *range, int az, int nr_range,
+ u64 start, u64 end);
+
+void subtract_range(struct range *range, int az, u64 start, u64 end);
+
+int clean_sort_range(struct range *range, int az);
+
+void sort_range(struct range *range, int nr_range);
+
+#endif
diff --git a/kernel/Makefile b/kernel/Makefile
index 8a5abe5..847a2a2 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -10,7 +10,7 @@ obj-y = sched.o fork.o exec_domain.o panic.o printk.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
notifier.o ksysfs.o pm_qos_params.o sched_clock.o cred.o \
- async.o
+ async.o range.o
obj-y += groups.o

ifdef CONFIG_FUNCTION_TRACER
diff --git a/kernel/range.c b/kernel/range.c
new file mode 100644
index 0000000..71e0021
--- /dev/null
+++ b/kernel/range.c
@@ -0,0 +1,163 @@
+/*
+ * Range add and subtract
+ */
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sort.h>
+
+#include <linux/range.h>
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
+int add_range(struct range *range, int az, int nr_range, u64 start, u64 end)
+{
+ if (start > end)
+ return nr_range;
+
+ /* Out of slots: */
+ if (nr_range >= az)
+ return nr_range;
+
+ range[nr_range].start = start;
+ range[nr_range].end = end;
+
+ nr_range++;
+
+ return nr_range;
+}
+
+int add_range_with_merge(struct range *range, int az, int nr_range,
+ u64 start, u64 end)
+{
+ int i;
+
+ if (start > end)
+ return nr_range;
+
+ /* Try to merge it with old one: */
+ for (i = 0; i < nr_range; i++) {
+ u64 final_start, final_end;
+ u64 common_start, common_end;
+
+ if (!range[i].end)
+ continue;
+
+ common_start = max(range[i].start, start);
+ common_end = min(range[i].end, end);
+ if (common_start > common_end + 1)
+ continue;
+
+ final_start = min(range[i].start, start);
+ final_end = max(range[i].end, end);
+
+ range[i].start = final_start;
+ range[i].end = final_end;
+ return nr_range;
+ }
+
+ /* Need to add it: */
+ return add_range(range, az, nr_range, start, end);
+}
+
+void subtract_range(struct range *range, int az, u64 start, u64 end)
+{
+ int i, j;
+
+ if (start > end)
+ return;
+
+ for (j = 0; j < az; j++) {
+ if (!range[j].end)
+ continue;
+
+ if (start <= range[j].start && end >= range[j].end) {
+ range[j].start = 0;
+ range[j].end = 0;
+ continue;
+ }
+
+ if (start <= range[j].start && end < range[j].end &&
+ range[j].start < end + 1) {
+ range[j].start = end + 1;
+ continue;
+ }
+
+
+ if (start > range[j].start && end >= range[j].end &&
+ range[j].end > start - 1) {
+ range[j].end = start - 1;
+ continue;
+ }
+
+ if (start > range[j].start && end < range[j].end) {
+ /* Find the new spare: */
+ for (i = 0; i < az; i++) {
+ if (range[i].end == 0)
+ break;
+ }
+ if (i < az) {
+ range[i].end = range[j].end;
+ range[i].start = end + 1;
+ } else {
+ printk(KERN_ERR "run of slot in ranges\n");
+ }
+ range[j].end = start - 1;
+ continue;
+ }
+ }
+}
+
+static int cmp_range(const void *x1, const void *x2)
+{
+ const struct range *r1 = x1;
+ const struct range *r2 = x2;
+ s64 start1, start2;
+
+ start1 = r1->start;
+ start2 = r2->start;
+
+ return start1 - start2;
+}
+
+int clean_sort_range(struct range *range, int az)
+{
+ int i, j, k = az - 1, nr_range = 0;
+
+ for (i = 0; i < k; i++) {
+ if (range[i].end)
+ continue;
+ for (j = k; j > i; j--) {
+ if (range[j].end) {
+ k = j;
+ break;
+ }
+ }
+ if (j == i)
+ break;
+ range[i].start = range[k].start;
+ range[i].end = range[k].end;
+ range[k].start = 0;
+ range[k].end = 0;
+ k--;
+ }
+ /* count it */
+ for (i = 0; i < az; i++) {
+ if (!range[i].end) {
+ nr_range = i;
+ break;
+ }
+ }
+
+ /* sort them */
+ sort(range, nr_range, sizeof(struct range), cmp_range, NULL);
+
+ return nr_range;
+}
+
+void sort_range(struct range *range, int nr_range)
+{
+ /* sort them */
+ sort(range, nr_range, sizeof(struct range), cmp_range, NULL);
+}
--
1.6.4.2
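
For reference, the conventions of the shared range API introduced here:
range ends are inclusive, end == 0 marks an unused slot, and callers now
pass the array size (az) explicitly. A hypothetical userspace harness
exercising those semantics (it assumes kernel/range.c above is built
alongside it with the kernel includes stubbed out):

#include <stdio.h>
#include <string.h>

typedef unsigned long long u64;

struct range {
	u64 start;
	u64 end;
};

/* Prototypes from include/linux/range.h above. */
int add_range_with_merge(struct range *range, int az, int nr_range,
			 u64 start, u64 end);
void subtract_range(struct range *range, int az, u64 start, u64 end);
int clean_sort_range(struct range *range, int az);

int main(void)
{
	struct range r[16];
	int nr = 0, i;

	memset(r, 0, sizeof(r));
	nr = add_range_with_merge(r, 16, nr, 0x100, 0x1ff);
	nr = add_range_with_merge(r, 16, nr, 0x200, 0x2ff); /* adjacent: merged */
	subtract_range(r, 16, 0x180, 0x27f);		/* punches a hole */
	nr = clean_sort_range(r, 16);			/* compact and sort */

	for (i = 0; i < nr; i++)	/* expect [0x100,0x17f] [0x280,0x2ff] */
		printf("[%#llx, %#llx]\n", r[i].start, r[i].end);
	return 0;
}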

2010-02-09 19:36:59

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 09/35] x86: print out for RAM buffer

so we can check it early in the boot log

Acked-by: Yinghai Lu <[email protected]>
Reviewed-by: Christoph Lameter <[email protected]>
---
arch/x86/kernel/e820.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index a966b75..dfb1689 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1429,6 +1429,9 @@ void __init e820_reserve_resources_late(void)
end = MAX_RESOURCE_SIZE;
if (start >= end)
continue;
+ printk(KERN_DEBUG "reserve RAM buffer: %016Lx - %016Lx ",
+ (unsigned long long) start,
+ (unsigned long long) end);
reserve_region_with_split(&iomem_resource, start, end,
"RAM buffer");
}
--
1.6.4.2

2010-02-09 19:36:15

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 18/35] x86: move bios page reserve early to head32/64.c

to prepare for making a cleaner standalone early_res.c

-v2: no need to reserve the first page in early_res,
because e820 already marks it as reserved.

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/kernel/e820.c | 22 ++--------------------
arch/x86/kernel/head32.c | 10 ++++++++++
2 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index fd14052..42dc8f1 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -743,29 +743,11 @@ struct early_res {
char name[15];
char overlap_ok;
};
-static struct early_res early_res_x[MAX_EARLY_RES_X] __initdata = {
- { 0, PAGE_SIZE, "BIOS data page", 1 }, /* BIOS data page */
-#if defined(CONFIG_X86_32) && defined(CONFIG_X86_TRAMPOLINE)
- /*
- * But first pinch a few for the stack/trampoline stuff
- * FIXME: Don't need the extra page at 4K, but need to fix
- * trampoline before removing it. (see the GDT stuff)
- */
- { PAGE_SIZE, PAGE_SIZE + PAGE_SIZE, "EX TRAMPOLINE", 1 },
-#endif
-
- {}
-};
+static struct early_res early_res_x[MAX_EARLY_RES_X] __initdata;

static int max_early_res __initdata = MAX_EARLY_RES_X;
static struct early_res *early_res __initdata = &early_res_x[0];
-static int early_res_count __initdata =
-#ifdef CONFIG_X86_32
- 2
-#else
- 1
-#endif
- ;
+static int early_res_count __initdata;

static int __init find_overlapped_early(u64 start, u64 end)
{
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index 5051b94..adedeef 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -29,6 +29,16 @@ static void __init i386_default_early_setup(void)

void __init i386_start_kernel(void)
{
+#ifdef CONFIG_X86_TRAMPOLINE
+ /*
+ * But first pinch a few for the stack/trampoline stuff
+ * FIXME: Don't need the extra page at 4K, but need to fix
+ * trampoline before removing it. (see the GDT stuff)
+ */
+ reserve_early_overlap_ok(PAGE_SIZE, PAGE_SIZE + PAGE_SIZE,
+ "EX TRAMPOLINE");
+#endif
+
reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");

#ifdef CONFIG_BLK_DEV_INITRD
--
1.6.4.2
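
The overlap_ok flag used above is subtle: it does not let this range
overlap earlier ones, it lets later reservations overlap this one, in
which case only the non-overlapping remainders of the ok range survive.
A toy model of that split, with everything local to the sketch:

#include <stdio.h>

typedef unsigned long long u64;

/* An overlap_ok range [r_start, r_end) is hit by a new hard
 * reservation [start, end): keep only the remainders. */
static void split_ok_range(u64 r_start, u64 r_end, u64 start, u64 end)
{
	if (r_start < start)
		printf("keep lower  [%#llx, %#llx)\n", r_start, start);
	if (r_end > end)
		printf("keep upper  [%#llx, %#llx)\n", end, r_end);
	printf("new reserve [%#llx, %#llx)\n", start, end);
}

int main(void)
{
	/* e.g. an ok range [0x1000, 0x3000) overlapped by [0x1800, 0x2800) */
	split_ok_range(0x1000, 0x3000, 0x1800, 0x2800);
	return 0;
}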

2010-02-09 19:36:53

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 33/35] x86: according to nr_cpu_ids to decide if need to leave logical flat

should use nr_cpu_ids instead of num_processors, in case we have hotplug cpus.

Even if only 8 cpus are up at the moment, more cpus may be hot-added
later, so we should use physical flat mode from the start.

nr_cpu_ids is the total number of cpus that could be supported.

-v2: per Linus, change default_setup_apic_routing to __init, and call it
for the uniprocessor path too

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/kernel/apic/apic.c | 2 --
arch/x86/kernel/apic/probe_32.c | 20 ++++++++++++++++++--
arch/x86/kernel/apic/probe_64.c | 6 +++++-
arch/x86/kernel/smpboot.c | 2 --
4 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index a85f216..e97f18b 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1641,9 +1641,7 @@ int __init APIC_init_uniprocessor(void)
#endif

enable_IR_x2apic();
-#ifdef CONFIG_X86_64
default_setup_apic_routing();
-#endif

verify_local_APIC();
connect_bsp_APIC();
diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index 1a6559f..d657558 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -52,7 +52,7 @@ static int __init print_ipi_mode(void)
}
late_initcall(print_ipi_mode);

-void default_setup_apic_routing(void)
+static void local_default_setup_apic_routing(void)
{
#ifdef CONFIG_X86_IO_APIC
printk(KERN_INFO
@@ -103,7 +103,7 @@ struct apic apic_default = {
.init_apic_ldr = default_init_apic_ldr,

.ioapic_phys_id_map = default_ioapic_phys_id_map,
- .setup_apic_routing = default_setup_apic_routing,
+ .setup_apic_routing = local_default_setup_apic_routing,
.multi_timer_check = NULL,
.apicid_to_node = default_apicid_to_node,
.cpu_to_logical_apicid = default_cpu_to_logical_apicid,
@@ -212,6 +212,22 @@ void __init generic_bigsmp_probe(void)
#endif
}

+void __init default_setup_apic_routing(void)
+{
+#ifdef CONFIG_X86_BIGSMP
+ /*
+ * make sure we go to bigsmp according to real nr_cpu_ids
+ */
+ if (!cmdline_apic && apic == &apic_default) {
+ if (nr_cpu_ids > 8) {
+ apic = &apic_bigsmp;
+ printk(KERN_INFO "Overriding APIC driver with %s\n",
+ apic->name);
+ }
+ }
+#endif
+}
+
void __init generic_apic_probe(void)
{
if (!cmdline_apic) {
diff --git a/arch/x86/kernel/apic/probe_64.c b/arch/x86/kernel/apic/probe_64.c
index 450fe20..7dce6f9 100644
--- a/arch/x86/kernel/apic/probe_64.c
+++ b/arch/x86/kernel/apic/probe_64.c
@@ -67,7 +67,11 @@ void __init default_setup_apic_routing(void)
}
#endif

- if (apic == &apic_flat && num_processors > 8)
+ /*
+ * not just num_processors, we could have hotplug cpus plugged
+ * in later
+ */
+ if (apic == &apic_flat && nr_cpu_ids > 8)
apic = &apic_physflat;

printk(KERN_INFO "Setting APIC routing to %s\n", apic->name);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6c69693..da99eef 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1087,9 +1087,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
set_cpu_sibling_map(0);

enable_IR_x2apic();
-#ifdef CONFIG_X86_64
default_setup_apic_routing();
-#endif

if (smp_sanity_check(max_cpus) < 0) {
printk(KERN_INFO "SMP disabled\n");
--
1.6.4.2
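
The rule being changed is small enough to model directly: the driver
choice must key off the number of possible cpus, not the number online
at boot. A standalone illustration (names local to the sketch):

#include <stdio.h>

/* Logical flat mode only addresses 8 cpus; beyond that use physical flat. */
static const char *pick_apic_driver(int cpus)
{
	return (cpus > 8) ? "physical flat" : "logical flat";
}

int main(void)
{
	/* 4 cpus online at boot, but the platform can hot-add up to 16. */
	int num_processors = 4, nr_cpu_ids = 16;

	printf("keyed on online cpus:   %s (wrong)\n",
	       pick_apic_driver(num_processors));
	printf("keyed on possible cpus: %s\n",
	       pick_apic_driver(nr_cpu_ids));
	return 0;
}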

2010-02-09 19:37:01

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 13/35] x86: make early_node_mem get mem > 4g if possible

so we can put the node's pgdat high, and the sparse vmemmap setup
later gets the section numbers it needs.

With this patch, RAM below 4G will not be used up by the sparse
vmemmap allocations.

Before this patch, before swiotlb tries to get bootmem, we get:
[ 0.000000] nid=1 start=0 end=2080000 aligned=1
[ 0.000000] free [10 - 96]
[ 0.000000] free [b12 - 1000]
[ 0.000000] free [359f - 38a3]
[ 0.000000] free [38b5 - 3a00]
[ 0.000000] free [41e01 - 42000]
[ 0.000000] free [73dde - 73e00]
[ 0.000000] free [73fdd - 74000]
[ 0.000000] free [741dd - 74200]
[ 0.000000] free [743dd - 74400]
[ 0.000000] free [745dd - 74600]
[ 0.000000] free [747dd - 74800]
[ 0.000000] free [749dd - 74a00]
[ 0.000000] free [74bdd - 74c00]
[ 0.000000] free [74ddd - 74e00]
[ 0.000000] free [74fdd - 75000]
[ 0.000000] free [751dd - 75200]
[ 0.000000] free [753dd - 75400]
[ 0.000000] free [755dd - 75600]
[ 0.000000] free [757dd - 75800]
[ 0.000000] free [759dd - 75a00]
[ 0.000000] free [75bdd - 7bf5f]
[ 0.000000] free [7f730 - 7f750]
[ 0.000000] free [100000 - 2080000]
[ 0.000000] total free 1f87170
[ 93.301474] Placing 64MB software IO TLB between ffff880075bdd000 - ffff880079bdd000
[ 93.311814] software IO TLB at phys 0x75bdd000 - 0x79bdd000

With this patch, before swiotlb tries to get bootmem, we get:
[ 0.000000] nid=1 start=0 end=2080000 aligned=1
[ 0.000000] free [a - 96]
[ 0.000000] free [702 - 1000]
[ 0.000000] free [359f - 3600]
[ 0.000000] free [37de - 3800]
[ 0.000000] free [39dd - 3a00]
[ 0.000000] free [3bdd - 3c00]
[ 0.000000] free [3ddd - 3e00]
[ 0.000000] free [3fdd - 4000]
[ 0.000000] free [41dd - 4200]
[ 0.000000] free [43dd - 4400]
[ 0.000000] free [45dd - 4600]
[ 0.000000] free [47dd - 4800]
[ 0.000000] free [49dd - 4a00]
[ 0.000000] free [4bdd - 4c00]
[ 0.000000] free [4ddd - 4e00]
[ 0.000000] free [4fdd - 5000]
[ 0.000000] free [51dd - 5200]
[ 0.000000] free [53dd - 5400]
[ 0.000000] free [55dd - 7bf5f]
[ 0.000000] free [7f730 - 7f750]
[ 0.000000] free [100428 - 100600]
[ 0.000000] free [13ea01 - 13ec00]
[ 0.000000] free [170800 - 2080000]
[ 0.000000] total free 1f87170

[ 92.689485] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 92.699799] Placing 64MB software IO TLB between ffff8800055dd000 - ffff8800095dd000
[ 92.710916] software IO TLB at phys 0x55dd000 - 0x95dd000

So swiotlb gets enough space below 4G, i.e. below pfn 0x100000.
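
A minimal userspace sketch of the new search order; find_area() and
the RAM layout inside it are invented stand-ins for find_e820_area()
and the e820 map:

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define MAX_DMA_PFN	(16ULL << (20 - PAGE_SHIFT))	/* 16MB */
#define MAX_DMA32_PFN	(4ULL << (30 - PAGE_SHIFT))	/* 4GB */

/* stand-in for find_e820_area(): two free RAM windows, one below
 * and one above 4G, first fit wins */
static uint64_t find_area(uint64_t start, uint64_t end, uint64_t size)
{
	static const uint64_t ram[][2] = {
		{ 0x00100000ULL,  0x80000000ULL },	/* 1MB - 2GB */
		{ 0x100000000ULL, 0x200000000ULL },	/* 4GB - 8GB */
	};
	for (int i = 0; i < 2; i++) {
		uint64_t s = ram[i][0] > start ? ram[i][0] : start;
		if (s + size <= ram[i][1] && s + size <= end)
			return s;
	}
	return (uint64_t)-1;
}

/* mirrors the new early_node_mem() ordering: push the search start
 * above 4G first, so low memory stays free for swiotlb and friends
 * (the real code retries with a widened window on failure) */
static uint64_t node_mem(uint64_t start, uint64_t end, uint64_t size)
{
	if (start < (MAX_DMA_PFN << PAGE_SHIFT))
		start = MAX_DMA_PFN << PAGE_SHIFT;
	if (start < (MAX_DMA32_PFN << PAGE_SHIFT) &&
	    end > (MAX_DMA32_PFN << PAGE_SHIFT))
		start = MAX_DMA32_PFN << PAGE_SHIFT;
	return find_area(start, end, size);
}

int main(void)
{
	/* node spans 0 - 8GB: a 64MB allocation lands at 4GB, not 16MB */
	printf("got %#llx\n", (unsigned long long)
	       node_mem(0, 0x200000000ULL, 64 << 20));
	return 0;
}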

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/mm/numa_64.c | 23 ++++++++++++++++++-----
1 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 3232148..02f13cb 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -163,14 +163,27 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
unsigned long end, unsigned long size,
unsigned long align)
{
- unsigned long mem = find_e820_area(start, end, size, align);
+ unsigned long mem;

+ /*
+ * put it as high as possible;
+ * other allocations will go together with NODE_DATA
+ */
+ if (start < (MAX_DMA_PFN<<PAGE_SHIFT))
+ start = MAX_DMA_PFN<<PAGE_SHIFT;
+ if (start < (MAX_DMA32_PFN<<PAGE_SHIFT) &&
+ end > (MAX_DMA32_PFN<<PAGE_SHIFT))
+ start = MAX_DMA32_PFN<<PAGE_SHIFT;
+ mem = find_e820_area(start, end, size, align);
if (mem != -1L)
return __va(mem);

-
- start = __pa(MAX_DMA_ADDRESS);
- end = max_low_pfn_mapped << PAGE_SHIFT;
+ /* extend the search scope */
+ end = max_pfn_mapped << PAGE_SHIFT;
+ if (end > (MAX_DMA32_PFN<<PAGE_SHIFT))
+ start = MAX_DMA32_PFN<<PAGE_SHIFT;
+ else
+ start = MAX_DMA_PFN<<PAGE_SHIFT;
mem = find_e820_area(start, end, size, align);
if (mem != -1L)
return __va(mem);
--
1.6.4.2

2010-02-09 19:36:13

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 19/35] x86: separate early_res related code from e820.c

to make e820.c smaller.

-v2: fix 32bit compiling with MAX_DMA32_PFN
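
To show the shape of the moved interface, a toy userspace model of
the range-based reservation scheme (simplified: fixed-size array, no
doubling, no overlap_ok handling; the hop-over-the-blocker loop
mirrors bad_addr()):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* one [start, end) range plus a name per entry, no bitmaps anywhere */
struct res { uint64_t start, end; char name[15]; };
static struct res resv[32];
static int nresv;

static void reserve_early(uint64_t start, uint64_t end, const char *name)
{
	for (int i = 0; i < nresv; i++)
		if (end > resv[i].start && start < resv[i].end) {
			fprintf(stderr, "overlap with %s\n", resv[i].name);
			return;
		}
	resv[nresv].start = start;
	resv[nresv].end = end;
	strncpy(resv[nresv].name, name, sizeof(resv[nresv].name) - 1);
	nresv++;
}

/* first-fit search in [lo, hi) that skips existing reservations */
static uint64_t find_area(uint64_t lo, uint64_t hi, uint64_t size)
{
	uint64_t addr = lo;
again:
	for (int i = 0; i < nresv; i++)
		if (addr + size > resv[i].start && addr < resv[i].end) {
			addr = resv[i].end;	/* hop over the blocker */
			goto again;
		}
	return addr + size <= hi ? addr : (uint64_t)-1;
}

int main(void)
{
	reserve_early(0x1000, 0x3000, "KERNEL");
	reserve_early(0x3000, 0x4000, "RAMDISK");
	printf("free 0x2000 bytes at %#llx\n", (unsigned long long)
	       find_area(0x1000, 0x10000, 0x2000));	/* prints 0x4000 */
	return 0;
}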

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/e820.h | 13 +-
arch/x86/include/asm/early_res.h | 20 ++
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/e820.c | 541 +-------------------------------------
arch/x86/kernel/early_res.c | 539 +++++++++++++++++++++++++++++++++++++
5 files changed, 562 insertions(+), 553 deletions(-)
create mode 100644 arch/x86/include/asm/early_res.h
create mode 100644 arch/x86/kernel/early_res.c

diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index 7d72e5f..efad699 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -109,19 +109,8 @@ static inline void early_memtest(unsigned long start, unsigned long end)

extern unsigned long end_user_pfn;

-extern u64 find_e820_area(u64 start, u64 end, u64 size, u64 align);
-extern u64 find_e820_area_size(u64 start, u64 *sizep, u64 align);
-extern void reserve_early(u64 start, u64 end, char *name);
-extern void reserve_early_overlap_ok(u64 start, u64 end, char *name);
-extern void free_early(u64 start, u64 end);
-extern void early_res_to_bootmem(u64 start, u64 end);
extern u64 early_reserve_e820(u64 startt, u64 sizet, u64 align);
-
-void reserve_early_without_check(u64 start, u64 end, char *name);
-u64 find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
- u64 size, u64 align);
-#include <linux/range.h>
-int get_free_all_memory_range(struct range **rangep, int nodeid);
+#include <asm/early_res.h>

extern unsigned long e820_end_of_ram_pfn(void);
extern unsigned long e820_end_of_low_ram_pfn(void);
diff --git a/arch/x86/include/asm/early_res.h b/arch/x86/include/asm/early_res.h
new file mode 100644
index 0000000..2d43b16
--- /dev/null
+++ b/arch/x86/include/asm/early_res.h
@@ -0,0 +1,20 @@
+#ifndef _ASM_X86_EARLY_RES_H
+#define _ASM_X86_EARLY_RES_H
+#ifdef __KERNEL__
+
+extern u64 find_e820_area(u64 start, u64 end, u64 size, u64 align);
+extern u64 find_e820_area_size(u64 start, u64 *sizep, u64 align);
+extern void reserve_early(u64 start, u64 end, char *name);
+extern void reserve_early_overlap_ok(u64 start, u64 end, char *name);
+extern void free_early(u64 start, u64 end);
+extern void early_res_to_bootmem(u64 start, u64 end);
+
+void reserve_early_without_check(u64 start, u64 end, char *name);
+u64 find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
+ u64 size, u64 align);
+#include <linux/range.h>
+int get_free_all_memory_range(struct range **rangep, int nodeid);
+
+#endif /* __KERNEL__ */
+
+#endif /* _ASM_X86_EARLY_RES_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index d87f09b..f5fb9f0 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -38,7 +38,7 @@ obj-$(CONFIG_X86_32) += probe_roms_32.o
obj-$(CONFIG_X86_32) += sys_i386_32.o i386_ksyms_32.o
obj-$(CONFIG_X86_64) += sys_x86_64.o x8664_ksyms_64.o
obj-$(CONFIG_X86_64) += syscall_64.o vsyscall_64.o
-obj-y += bootflag.o e820.o
+obj-y += bootflag.o e820.o early_res.o
obj-y += pci-dma.o quirks.o i8237.o topology.o kdebugfs.o
obj-y += alternative.o i8253.o pci-nommu.o hw_breakpoint.o
obj-y += tsc.o io_delay.o rtc.o
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 42dc8f1..096bb31 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -12,21 +12,14 @@
#include <linux/types.h>
#include <linux/init.h>
#include <linux/bootmem.h>
-#include <linux/ioport.h>
-#include <linux/string.h>
-#include <linux/kexec.h>
-#include <linux/module.h>
-#include <linux/mm.h>
#include <linux/pfn.h>
#include <linux/suspend.h>
#include <linux/firmware-map.h>

-#include <asm/pgtable.h>
-#include <asm/page.h>
#include <asm/e820.h>
+#include <asm/early_res.h>
#include <asm/proto.h>
#include <asm/setup.h>
-#include <asm/trampoline.h>

/*
* The e820 map is the map that gets modified e.g. with command line parameters
@@ -730,538 +723,6 @@ core_initcall(e820_mark_nvs_memory);
#endif

/*
- * Early reserved memory areas.
- */
-/*
- * need to make sure this one is bigger enough before
- * find_e820_area could be used
- */
-#define MAX_EARLY_RES_X 32
-
-struct early_res {
- u64 start, end;
- char name[15];
- char overlap_ok;
-};
-static struct early_res early_res_x[MAX_EARLY_RES_X] __initdata;
-
-static int max_early_res __initdata = MAX_EARLY_RES_X;
-static struct early_res *early_res __initdata = &early_res_x[0];
-static int early_res_count __initdata;
-
-static int __init find_overlapped_early(u64 start, u64 end)
-{
- int i;
- struct early_res *r;
-
- for (i = 0; i < max_early_res && early_res[i].end; i++) {
- r = &early_res[i];
- if (end > r->start && start < r->end)
- break;
- }
-
- return i;
-}
-
-/*
- * Drop the i-th range from the early reservation map,
- * by copying any higher ranges down one over it, and
- * clearing what had been the last slot.
- */
-static void __init drop_range(int i)
-{
- int j;
-
- for (j = i + 1; j < max_early_res && early_res[j].end; j++)
- ;
-
- memmove(&early_res[i], &early_res[i + 1],
- (j - 1 - i) * sizeof(struct early_res));
-
- early_res[j - 1].end = 0;
- early_res_count--;
-}
-
-/*
- * Split any existing ranges that:
- * 1) are marked 'overlap_ok', and
- * 2) overlap with the stated range [start, end)
- * into whatever portion (if any) of the existing range is entirely
- * below or entirely above the stated range. Drop the portion
- * of the existing range that overlaps with the stated range,
- * which will allow the caller of this routine to then add that
- * stated range without conflicting with any existing range.
- */
-static void __init drop_overlaps_that_are_ok(u64 start, u64 end)
-{
- int i;
- struct early_res *r;
- u64 lower_start, lower_end;
- u64 upper_start, upper_end;
- char name[15];
-
- for (i = 0; i < max_early_res && early_res[i].end; i++) {
- r = &early_res[i];
-
- /* Continue past non-overlapping ranges */
- if (end <= r->start || start >= r->end)
- continue;
-
- /*
- * Leave non-ok overlaps as is; let caller
- * panic "Overlapping early reservations"
- * when it hits this overlap.
- */
- if (!r->overlap_ok)
- return;
-
- /*
- * We have an ok overlap. We will drop it from the early
- * reservation map, and add back in any non-overlapping
- * portions (lower or upper) as separate, overlap_ok,
- * non-overlapping ranges.
- */
-
- /* 1. Note any non-overlapping (lower or upper) ranges. */
- strncpy(name, r->name, sizeof(name) - 1);
-
- lower_start = lower_end = 0;
- upper_start = upper_end = 0;
- if (r->start < start) {
- lower_start = r->start;
- lower_end = start;
- }
- if (r->end > end) {
- upper_start = end;
- upper_end = r->end;
- }
-
- /* 2. Drop the original ok overlapping range */
- drop_range(i);
-
- i--; /* resume for-loop on copied down entry */
-
- /* 3. Add back in any non-overlapping ranges. */
- if (lower_end)
- reserve_early_overlap_ok(lower_start, lower_end, name);
- if (upper_end)
- reserve_early_overlap_ok(upper_start, upper_end, name);
- }
-}
-
-static void __init __reserve_early(u64 start, u64 end, char *name,
- int overlap_ok)
-{
- int i;
- struct early_res *r;
-
- i = find_overlapped_early(start, end);
- if (i >= max_early_res)
- panic("Too many early reservations");
- r = &early_res[i];
- if (r->end)
- panic("Overlapping early reservations "
- "%llx-%llx %s to %llx-%llx %s\n",
- start, end - 1, name?name:"", r->start,
- r->end - 1, r->name);
- r->start = start;
- r->end = end;
- r->overlap_ok = overlap_ok;
- if (name)
- strncpy(r->name, name, sizeof(r->name) - 1);
- early_res_count++;
-}
-
-/*
- * A few early reservtations come here.
- *
- * The 'overlap_ok' in the name of this routine does -not- mean it
- * is ok for these reservations to overlap an earlier reservation.
- * Rather it means that it is ok for subsequent reservations to
- * overlap this one.
- *
- * Use this entry point to reserve early ranges when you are doing
- * so out of "Paranoia", reserving perhaps more memory than you need,
- * just in case, and don't mind a subsequent overlapping reservation
- * that is known to be needed.
- *
- * The drop_overlaps_that_are_ok() call here isn't really needed.
- * It would be needed if we had two colliding 'overlap_ok'
- * reservations, so that the second such would not panic on the
- * overlap with the first. We don't have any such as of this
- * writing, but might as well tolerate such if it happens in
- * the future.
- */
-void __init reserve_early_overlap_ok(u64 start, u64 end, char *name)
-{
- drop_overlaps_that_are_ok(start, end);
- __reserve_early(start, end, name, 1);
-}
-
-static void __init __check_and_double_early_res(u64 start)
-{
- u64 end, size, mem;
- struct early_res *new;
-
- /* do we have enough slots left ? */
- if ((max_early_res - early_res_count) > max(max_early_res/8, 2))
- return;
-
- /* double it */
- end = max_pfn_mapped << PAGE_SHIFT;
- size = sizeof(struct early_res) * max_early_res * 2;
- mem = find_e820_area(start, end, size, sizeof(struct early_res));
-
- if (mem == -1ULL)
- panic("can not find more space for early_res array");
-
- new = __va(mem);
- /* save the first one for own */
- new[0].start = mem;
- new[0].end = mem + size;
- new[0].overlap_ok = 0;
- /* copy old to new */
- if (early_res == early_res_x) {
- memcpy(&new[1], &early_res[0],
- sizeof(struct early_res) * max_early_res);
- memset(&new[max_early_res+1], 0,
- sizeof(struct early_res) * (max_early_res - 1));
- early_res_count++;
- } else {
- memcpy(&new[1], &early_res[1],
- sizeof(struct early_res) * (max_early_res - 1));
- memset(&new[max_early_res], 0,
- sizeof(struct early_res) * max_early_res);
- }
- memset(&early_res[0], 0, sizeof(struct early_res) * max_early_res);
- early_res = new;
- max_early_res *= 2;
- printk(KERN_DEBUG "early_res array is doubled to %d at [%llx - %llx]\n",
- max_early_res, mem, mem + size - 1);
-}
-
-/*
- * Most early reservations come here.
- *
- * We first have drop_overlaps_that_are_ok() drop any pre-existing
- * 'overlap_ok' ranges, so that we can then reserve this memory
- * range without risk of panic'ing on an overlapping overlap_ok
- * early reservation.
- */
-void __init reserve_early(u64 start, u64 end, char *name)
-{
- if (start >= end)
- return;
-
- __check_and_double_early_res(end);
-
- drop_overlaps_that_are_ok(start, end);
- __reserve_early(start, end, name, 0);
-}
-
-void __init reserve_early_without_check(u64 start, u64 end, char *name)
-{
- struct early_res *r;
-
- if (start >= end)
- return;
-
- __check_and_double_early_res(end);
-
- r = &early_res[early_res_count];
-
- r->start = start;
- r->end = end;
- r->overlap_ok = 0;
- if (name)
- strncpy(r->name, name, sizeof(r->name) - 1);
- early_res_count++;
-}
-
-void __init free_early(u64 start, u64 end)
-{
- struct early_res *r;
- int i;
-
- i = find_overlapped_early(start, end);
- r = &early_res[i];
- if (i >= max_early_res || r->end != end || r->start != start)
- panic("free_early on not reserved area: %llx-%llx!",
- start, end - 1);
-
- drop_range(i);
-}
-
-#ifdef CONFIG_NO_BOOTMEM
-static void __init subtract_early_res(struct range *range, int az)
-{
- int i, count;
- u64 final_start, final_end;
- int idx = 0;
-
- count = 0;
- for (i = 0; i < max_early_res && early_res[i].end; i++)
- count++;
-
- /* need to skip first one ?*/
- if (early_res != early_res_x)
- idx = 1;
-
-#if 1
- printk(KERN_INFO "Subtract (%d early reservations)\n", count);
-#endif
- for (i = idx; i < count; i++) {
- struct early_res *r = &early_res[i];
-#if 0
- printk(KERN_INFO " #%d [%010llx - %010llx] %15s", i,
- r->start, r->end, r->name);
-#endif
- final_start = PFN_DOWN(r->start);
- final_end = PFN_UP(r->end);
- if (final_start >= final_end) {
-#if 0
- printk(KERN_CONT "\n");
-#endif
- continue;
- }
-#if 0
- printk(KERN_CONT " subtract pfn [%010llx - %010llx]\n",
- final_start, final_end);
-#endif
- subtract_range(range, az, final_start, final_end);
- }
-
-}
-
-int __init get_free_all_memory_range(struct range **rangep, int nodeid)
-{
- int i, count;
- u64 start = 0, end;
- u64 size;
- u64 mem;
- struct range *range;
- int nr_range;
-
- count = 0;
- for (i = 0; i < max_early_res && early_res[i].end; i++)
- count++;
-
- count *= 2;
-
- size = sizeof(struct range) * count;
-#ifdef MAX_DMA32_PFN
- if (max_pfn_mapped > MAX_DMA32_PFN)
- start = MAX_DMA32_PFN << PAGE_SHIFT;
-#endif
- end = max_pfn_mapped << PAGE_SHIFT;
- mem = find_e820_area(start, end, size, sizeof(struct range));
- if (mem == -1ULL)
- panic("can not find more space for range free");
-
- range = __va(mem);
- /* use early_node_map[] and early_res to get range array at first */
- memset(range, 0, size);
- nr_range = 0;
-
- /* need to go over early_node_map to find out good range for node */
- nr_range = add_from_early_node_map(range, count, nr_range, nodeid);
- subtract_early_res(range, count);
- nr_range = clean_sort_range(range, count);
-
- /* need to clear it ? */
- if (nodeid == MAX_NUMNODES) {
- memset(&early_res[0], 0,
- sizeof(struct early_res) * max_early_res);
- early_res = NULL;
- max_early_res = 0;
- }
-
- *rangep = range;
- return nr_range;
-}
-#else
-void __init early_res_to_bootmem(u64 start, u64 end)
-{
- int i, count;
- u64 final_start, final_end;
- int idx = 0;
-
- count = 0;
- for (i = 0; i < max_early_res && early_res[i].end; i++)
- count++;
-
- /* need to skip first one ?*/
- if (early_res != early_res_x)
- idx = 1;
-
- printk(KERN_INFO "(%d/%d early reservations) ==> bootmem [%010llx - %010llx]\n",
- count - idx, max_early_res, start, end);
- for (i = idx; i < count; i++) {
- struct early_res *r = &early_res[i];
- printk(KERN_INFO " #%d [%010llx - %010llx] %16s", i,
- r->start, r->end, r->name);
- final_start = max(start, r->start);
- final_end = min(end, r->end);
- if (final_start >= final_end) {
- printk(KERN_CONT "\n");
- continue;
- }
- printk(KERN_CONT " ==> [%010llx - %010llx]\n",
- final_start, final_end);
- reserve_bootmem_generic(final_start, final_end - final_start,
- BOOTMEM_DEFAULT);
- }
- /* clear them */
- memset(&early_res[0], 0, sizeof(struct early_res) * max_early_res);
- early_res = NULL;
- max_early_res = 0;
- early_res_count = 0;
-}
-#endif
-
-/* Check for already reserved areas */
-static inline int __init bad_addr(u64 *addrp, u64 size, u64 align)
-{
- int i;
- u64 addr = *addrp;
- int changed = 0;
- struct early_res *r;
-again:
- i = find_overlapped_early(addr, addr + size);
- r = &early_res[i];
- if (i < max_early_res && r->end) {
- *addrp = addr = round_up(r->end, align);
- changed = 1;
- goto again;
- }
- return changed;
-}
-
-/* Check for already reserved areas */
-static inline int __init bad_addr_size(u64 *addrp, u64 *sizep, u64 align)
-{
- int i;
- u64 addr = *addrp, last;
- u64 size = *sizep;
- int changed = 0;
-again:
- last = addr + size;
- for (i = 0; i < max_early_res && early_res[i].end; i++) {
- struct early_res *r = &early_res[i];
- if (last > r->start && addr < r->start) {
- size = r->start - addr;
- changed = 1;
- goto again;
- }
- if (last > r->end && addr < r->end) {
- addr = round_up(r->end, align);
- size = last - addr;
- changed = 1;
- goto again;
- }
- if (last <= r->end && addr >= r->start) {
- (*sizep)++;
- return 0;
- }
- }
- if (changed) {
- *addrp = addr;
- *sizep = size;
- }
- return changed;
-}
-
-/*
- * Find a free area with specified alignment in a specific range.
- * only with the area.between start to end is active range from early_node_map
- * so they are good as RAM
- */
-u64 __init find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
- u64 size, u64 align)
-{
- u64 addr, last;
-
- addr = round_up(ei_start, align);
- if (addr < start)
- addr = round_up(start, align);
- if (addr >= ei_last)
- goto out;
- while (bad_addr(&addr, size, align) && addr+size <= ei_last)
- ;
- last = addr + size;
- if (last > ei_last)
- goto out;
- if (last > end)
- goto out;
-
- return addr;
-
-out:
- return -1ULL;
-}
-
-/*
- * Find a free area with specified alignment in a specific range.
- */
-u64 __init find_e820_area(u64 start, u64 end, u64 size, u64 align)
-{
- int i;
-
- for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *ei = &e820.map[i];
- u64 addr;
- u64 ei_start, ei_last;
-
- if (ei->type != E820_RAM)
- continue;
-
- ei_last = ei->addr + ei->size;
- ei_start = ei->addr;
- addr = find_early_area(ei_start, ei_last, start, end,
- size, align);
-
- if (addr == -1ULL)
- continue;
-
- return addr;
- }
- return -1ULL;
-}
-
-/*
- * Find next free range after *start
- */
-u64 __init find_e820_area_size(u64 start, u64 *sizep, u64 align)
-{
- int i;
-
- for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *ei = &e820.map[i];
- u64 addr, last;
- u64 ei_last;
-
- if (ei->type != E820_RAM)
- continue;
- addr = round_up(ei->addr, align);
- ei_last = ei->addr + ei->size;
- if (addr < start)
- addr = round_up(start, align);
- if (addr >= ei_last)
- continue;
- *sizep = ei_last - addr;
- while (bad_addr_size(&addr, sizep, align) &&
- addr + *sizep <= ei_last)
- ;
- last = addr + *sizep;
- if (last > ei_last)
- continue;
- return addr;
- }
-
- return -1ULL;
-}
-
-/*
* pre allocated 4k and reserved it in e820
*/
u64 __init early_reserve_e820(u64 startt, u64 sizet, u64 align)
diff --git a/arch/x86/kernel/early_res.c b/arch/x86/kernel/early_res.c
new file mode 100644
index 0000000..3ae0051
--- /dev/null
+++ b/arch/x86/kernel/early_res.c
@@ -0,0 +1,539 @@
+/*
+ * early_res, could be used to replace bootmem
+ */
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/mm.h>
+
+#include <asm/e820.h>
+#include <asm/early_res.h>
+#include <asm/proto.h>
+
+/*
+ * Early reserved memory areas.
+ */
+/*
+ * need to make sure this one is big enough before
+ * find_e820_area could be used
+ */
+#define MAX_EARLY_RES_X 32
+
+struct early_res {
+ u64 start, end;
+ char name[15];
+ char overlap_ok;
+};
+static struct early_res early_res_x[MAX_EARLY_RES_X] __initdata;
+
+static int max_early_res __initdata = MAX_EARLY_RES_X;
+static struct early_res *early_res __initdata = &early_res_x[0];
+static int early_res_count __initdata;
+
+static int __init find_overlapped_early(u64 start, u64 end)
+{
+ int i;
+ struct early_res *r;
+
+ for (i = 0; i < max_early_res && early_res[i].end; i++) {
+ r = &early_res[i];
+ if (end > r->start && start < r->end)
+ break;
+ }
+
+ return i;
+}
+
+/*
+ * Drop the i-th range from the early reservation map,
+ * by copying any higher ranges down one over it, and
+ * clearing what had been the last slot.
+ */
+static void __init drop_range(int i)
+{
+ int j;
+
+ for (j = i + 1; j < max_early_res && early_res[j].end; j++)
+ ;
+
+ memmove(&early_res[i], &early_res[i + 1],
+ (j - 1 - i) * sizeof(struct early_res));
+
+ early_res[j - 1].end = 0;
+ early_res_count--;
+}
+
+/*
+ * Split any existing ranges that:
+ * 1) are marked 'overlap_ok', and
+ * 2) overlap with the stated range [start, end)
+ * into whatever portion (if any) of the existing range is entirely
+ * below or entirely above the stated range. Drop the portion
+ * of the existing range that overlaps with the stated range,
+ * which will allow the caller of this routine to then add that
+ * stated range without conflicting with any existing range.
+ */
+static void __init drop_overlaps_that_are_ok(u64 start, u64 end)
+{
+ int i;
+ struct early_res *r;
+ u64 lower_start, lower_end;
+ u64 upper_start, upper_end;
+ char name[15];
+
+ for (i = 0; i < max_early_res && early_res[i].end; i++) {
+ r = &early_res[i];
+
+ /* Continue past non-overlapping ranges */
+ if (end <= r->start || start >= r->end)
+ continue;
+
+ /*
+ * Leave non-ok overlaps as is; let caller
+ * panic "Overlapping early reservations"
+ * when it hits this overlap.
+ */
+ if (!r->overlap_ok)
+ return;
+
+ /*
+ * We have an ok overlap. We will drop it from the early
+ * reservation map, and add back in any non-overlapping
+ * portions (lower or upper) as separate, overlap_ok,
+ * non-overlapping ranges.
+ */
+
+ /* 1. Note any non-overlapping (lower or upper) ranges. */
+ strncpy(name, r->name, sizeof(name) - 1);
+
+ lower_start = lower_end = 0;
+ upper_start = upper_end = 0;
+ if (r->start < start) {
+ lower_start = r->start;
+ lower_end = start;
+ }
+ if (r->end > end) {
+ upper_start = end;
+ upper_end = r->end;
+ }
+
+ /* 2. Drop the original ok overlapping range */
+ drop_range(i);
+
+ i--; /* resume for-loop on copied down entry */
+
+ /* 3. Add back in any non-overlapping ranges. */
+ if (lower_end)
+ reserve_early_overlap_ok(lower_start, lower_end, name);
+ if (upper_end)
+ reserve_early_overlap_ok(upper_start, upper_end, name);
+ }
+}
+
+static void __init __reserve_early(u64 start, u64 end, char *name,
+ int overlap_ok)
+{
+ int i;
+ struct early_res *r;
+
+ i = find_overlapped_early(start, end);
+ if (i >= max_early_res)
+ panic("Too many early reservations");
+ r = &early_res[i];
+ if (r->end)
+ panic("Overlapping early reservations "
+ "%llx-%llx %s to %llx-%llx %s\n",
+ start, end - 1, name ? name : "", r->start,
+ r->end - 1, r->name);
+ r->start = start;
+ r->end = end;
+ r->overlap_ok = overlap_ok;
+ if (name)
+ strncpy(r->name, name, sizeof(r->name) - 1);
+ early_res_count++;
+}
+
+/*
+ * A few early reservations come here.
+ *
+ * The 'overlap_ok' in the name of this routine does -not- mean it
+ * is ok for these reservations to overlap an earlier reservation.
+ * Rather it means that it is ok for subsequent reservations to
+ * overlap this one.
+ *
+ * Use this entry point to reserve early ranges when you are doing
+ * so out of "Paranoia", reserving perhaps more memory than you need,
+ * just in case, and don't mind a subsequent overlapping reservation
+ * that is known to be needed.
+ *
+ * The drop_overlaps_that_are_ok() call here isn't really needed.
+ * It would be needed if we had two colliding 'overlap_ok'
+ * reservations, so that the second such would not panic on the
+ * overlap with the first. We don't have any such as of this
+ * writing, but might as well tolerate such if it happens in
+ * the future.
+ */
+void __init reserve_early_overlap_ok(u64 start, u64 end, char *name)
+{
+ drop_overlaps_that_are_ok(start, end);
+ __reserve_early(start, end, name, 1);
+}
+
+static void __init __check_and_double_early_res(u64 start)
+{
+ u64 end, size, mem;
+ struct early_res *new;
+
+ /* do we have enough slots left ? */
+ if ((max_early_res - early_res_count) > max(max_early_res/8, 2))
+ return;
+
+ /* double it */
+ end = max_pfn_mapped << PAGE_SHIFT;
+ size = sizeof(struct early_res) * max_early_res * 2;
+ mem = find_e820_area(start, end, size, sizeof(struct early_res));
+
+ if (mem == -1ULL)
+ panic("can not find more space for early_res array");
+
+ new = __va(mem);
+ /* save the first one for own */
+ new[0].start = mem;
+ new[0].end = mem + size;
+ new[0].overlap_ok = 0;
+ /* copy old to new */
+ if (early_res == early_res_x) {
+ memcpy(&new[1], &early_res[0],
+ sizeof(struct early_res) * max_early_res);
+ memset(&new[max_early_res+1], 0,
+ sizeof(struct early_res) * (max_early_res - 1));
+ early_res_count++;
+ } else {
+ memcpy(&new[1], &early_res[1],
+ sizeof(struct early_res) * (max_early_res - 1));
+ memset(&new[max_early_res], 0,
+ sizeof(struct early_res) * max_early_res);
+ }
+ memset(&early_res[0], 0, sizeof(struct early_res) * max_early_res);
+ early_res = new;
+ max_early_res *= 2;
+ printk(KERN_DEBUG "early_res array is doubled to %d at [%llx - %llx]\n",
+ max_early_res, mem, mem + size - 1);
+}
+
+/*
+ * Most early reservations come here.
+ *
+ * We first have drop_overlaps_that_are_ok() drop any pre-existing
+ * 'overlap_ok' ranges, so that we can then reserve this memory
+ * range without risk of panic'ing on an overlapping overlap_ok
+ * early reservation.
+ */
+void __init reserve_early(u64 start, u64 end, char *name)
+{
+ if (start >= end)
+ return;
+
+ __check_and_double_early_res(end);
+
+ drop_overlaps_that_are_ok(start, end);
+ __reserve_early(start, end, name, 0);
+}
+
+void __init reserve_early_without_check(u64 start, u64 end, char *name)
+{
+ struct early_res *r;
+
+ if (start >= end)
+ return;
+
+ __check_and_double_early_res(end);
+
+ r = &early_res[early_res_count];
+
+ r->start = start;
+ r->end = end;
+ r->overlap_ok = 0;
+ if (name)
+ strncpy(r->name, name, sizeof(r->name) - 1);
+ early_res_count++;
+}
+
+void __init free_early(u64 start, u64 end)
+{
+ struct early_res *r;
+ int i;
+
+ i = find_overlapped_early(start, end);
+ r = &early_res[i];
+ if (i >= max_early_res || r->end != end || r->start != start)
+ panic("free_early on not reserved area: %llx-%llx!",
+ start, end - 1);
+
+ drop_range(i);
+}
+
+#ifdef CONFIG_NO_BOOTMEM
+static void __init subtract_early_res(struct range *range, int az)
+{
+ int i, count;
+ u64 final_start, final_end;
+ int idx = 0;
+
+ count = 0;
+ for (i = 0; i < max_early_res && early_res[i].end; i++)
+ count++;
+
+ /* need to skip first one ?*/
+ if (early_res != early_res_x)
+ idx = 1;
+
+#define DEBUG_PRINT_EARLY_RES 1
+
+#if DEBUG_PRINT_EARLY_RES
+ printk(KERN_INFO "Subtract (%d early reservations)\n", count);
+#endif
+ for (i = idx; i < count; i++) {
+ struct early_res *r = &early_res[i];
+#if DEBUG_PRINT_EARLY_RES
+ printk(KERN_INFO " #%d [%010llx - %010llx] %15s\n", i,
+ r->start, r->end, r->name);
+#endif
+ final_start = PFN_DOWN(r->start);
+ final_end = PFN_UP(r->end);
+ if (final_start >= final_end)
+ continue;
+ subtract_range(range, az, final_start, final_end);
+ }
+
+}
+
+int __init get_free_all_memory_range(struct range **rangep, int nodeid)
+{
+ int i, count;
+ u64 start = 0, end;
+ u64 size;
+ u64 mem;
+ struct range *range;
+ int nr_range;
+
+ count = 0;
+ for (i = 0; i < max_early_res && early_res[i].end; i++)
+ count++;
+
+ count *= 2;
+
+ size = sizeof(struct range) * count;
+#ifdef MAX_DMA32_PFN
+ if (max_pfn_mapped > MAX_DMA32_PFN)
+ start = MAX_DMA32_PFN << PAGE_SHIFT;
+#endif
+ end = max_pfn_mapped << PAGE_SHIFT;
+ mem = find_e820_area(start, end, size, sizeof(struct range));
+ if (mem == -1ULL)
+ panic("can not find more space for range free");
+
+ range = __va(mem);
+ /* use early_node_map[] and early_res to get range array at first */
+ memset(range, 0, size);
+ nr_range = 0;
+
+ /* need to go over early_node_map to find out good range for node */
+ nr_range = add_from_early_node_map(range, count, nr_range, nodeid);
+ subtract_early_res(range, count);
+ nr_range = clean_sort_range(range, count);
+
+ /* need to clear it ? */
+ if (nodeid == MAX_NUMNODES) {
+ memset(&early_res[0], 0,
+ sizeof(struct early_res) * max_early_res);
+ early_res = NULL;
+ max_early_res = 0;
+ }
+
+ *rangep = range;
+ return nr_range;
+}
+#else
+void __init early_res_to_bootmem(u64 start, u64 end)
+{
+ int i, count;
+ u64 final_start, final_end;
+ int idx = 0;
+
+ count = 0;
+ for (i = 0; i < max_early_res && early_res[i].end; i++)
+ count++;
+
+ /* need to skip first one ?*/
+ if (early_res != early_res_x)
+ idx = 1;
+
+ printk(KERN_INFO "(%d/%d early reservations) ==> bootmem [%010llx - %010llx]\n",
+ count - idx, max_early_res, start, end);
+ for (i = idx; i < count; i++) {
+ struct early_res *r = &early_res[i];
+ printk(KERN_INFO " #%d [%010llx - %010llx] %16s", i,
+ r->start, r->end, r->name);
+ final_start = max(start, r->start);
+ final_end = min(end, r->end);
+ if (final_start >= final_end) {
+ printk(KERN_CONT "\n");
+ continue;
+ }
+ printk(KERN_CONT " ==> [%010llx - %010llx]\n",
+ final_start, final_end);
+ reserve_bootmem_generic(final_start, final_end - final_start,
+ BOOTMEM_DEFAULT);
+ }
+ /* clear them */
+ memset(&early_res[0], 0, sizeof(struct early_res) * max_early_res);
+ early_res = NULL;
+ max_early_res = 0;
+ early_res_count = 0;
+}
+#endif
+
+/* Check for already reserved areas */
+static inline int __init bad_addr(u64 *addrp, u64 size, u64 align)
+{
+ int i;
+ u64 addr = *addrp;
+ int changed = 0;
+ struct early_res *r;
+again:
+ i = find_overlapped_early(addr, addr + size);
+ r = &early_res[i];
+ if (i < max_early_res && r->end) {
+ *addrp = addr = round_up(r->end, align);
+ changed = 1;
+ goto again;
+ }
+ return changed;
+}
+
+/* Check for already reserved areas */
+static inline int __init bad_addr_size(u64 *addrp, u64 *sizep, u64 align)
+{
+ int i;
+ u64 addr = *addrp, last;
+ u64 size = *sizep;
+ int changed = 0;
+again:
+ last = addr + size;
+ for (i = 0; i < max_early_res && early_res[i].end; i++) {
+ struct early_res *r = &early_res[i];
+ if (last > r->start && addr < r->start) {
+ size = r->start - addr;
+ changed = 1;
+ goto again;
+ }
+ if (last > r->end && addr < r->end) {
+ addr = round_up(r->end, align);
+ size = last - addr;
+ changed = 1;
+ goto again;
+ }
+ if (last <= r->end && addr >= r->start) {
+ (*sizep)++;
+ return 0;
+ }
+ }
+ if (changed) {
+ *addrp = addr;
+ *sizep = size;
+ }
+ return changed;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range.
+ * Only the part of the area between start and end that lies in an
+ * active range from early_node_map is considered, so it is good RAM.
+ */
+u64 __init find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
+ u64 size, u64 align)
+{
+ u64 addr, last;
+
+ addr = round_up(ei_start, align);
+ if (addr < start)
+ addr = round_up(start, align);
+ if (addr >= ei_last)
+ goto out;
+ while (bad_addr(&addr, size, align) && addr+size <= ei_last)
+ ;
+ last = addr + size;
+ if (last > ei_last)
+ goto out;
+ if (last > end)
+ goto out;
+
+ return addr;
+
+out:
+ return -1ULL;
+}
+
+/*
+ * Find a free area with specified alignment in a specific range.
+ */
+u64 __init find_e820_area(u64 start, u64 end, u64 size, u64 align)
+{
+ int i;
+
+ for (i = 0; i < e820.nr_map; i++) {
+ struct e820entry *ei = &e820.map[i];
+ u64 addr;
+ u64 ei_start, ei_last;
+
+ if (ei->type != E820_RAM)
+ continue;
+
+ ei_last = ei->addr + ei->size;
+ ei_start = ei->addr;
+ addr = find_early_area(ei_start, ei_last, start, end,
+ size, align);
+
+ if (addr == -1ULL)
+ continue;
+
+ return addr;
+ }
+ return -1ULL;
+}
+
+/*
+ * Find next free range after *start
+ */
+u64 __init find_e820_area_size(u64 start, u64 *sizep, u64 align)
+{
+ int i;
+
+ for (i = 0; i < e820.nr_map; i++) {
+ struct e820entry *ei = &e820.map[i];
+ u64 addr, last;
+ u64 ei_last;
+
+ if (ei->type != E820_RAM)
+ continue;
+ addr = round_up(ei->addr, align);
+ ei_last = ei->addr + ei->size;
+ if (addr < start)
+ addr = round_up(start, align);
+ if (addr >= ei_last)
+ continue;
+ *sizep = ei_last - addr;
+ while (bad_addr_size(&addr, sizep, align) &&
+ addr + *sizep <= ei_last)
+ ;
+ last = addr + *sizep;
+ if (last > ei_last)
+ continue;
+ return addr;
+ }
+
+ return -1ULL;
+}
+
--
1.6.4.2

2010-02-09 19:36:56

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 35/35] x86: use num_processors for possible cpus

Some systems have disabled-cpu entries because the same BIOS
supports 2-socket, 4-socket and larger configurations at the
same time, so the BIOS just leaves some entries disabled; those
systems do not support cpu hotplug. We don't need to treat
disabled_cpus as hotplug cpus.

So we can make nr_cpu_ids smaller, save space (per-cpu data
allocations), and let some systems run in logical flat instead
of physical flat apic mode.

-v2: change to a blacklist instead
-v3: just remove the disabled_cpus use; those who need extra cpus can use possible_cpus= directly.

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/kernel/smpboot.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index efdf81e..2a25e74 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1194,7 +1194,6 @@ early_param("possible_cpus", _setup_possible_cpus);
* - Ashok Raj
*
* Three ways to find out the number of additional hotplug CPUs:
- * - If the BIOS specified disabled CPUs in ACPI/mptables use that.
* - The user can overwrite it with possible_cpus=NUM
* - Otherwise don't reserve additional CPUs.
* We do this because additional CPUs waste a lot of memory.
@@ -1209,7 +1208,7 @@ __init void prefill_possible_map(void)
num_processors = 1;

if (setup_possible_cpus == -1)
- possible = num_processors + disabled_cpus;
+ possible = num_processors;
else
possible = setup_possible_cpus;

--
1.6.4.2

2010-02-09 19:38:23

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 08/35] x86: change range end to start+size

Make the interface more consistent with early_res, so that
later we can share some code with it.
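
The payoff in miniature, as a standalone sketch: with half-open
[start, end) ranges, adjacency and size computations need none of
the +1/-1 fixups the inclusive-end code was full of:

#include <stdio.h>
#include <stdint.h>

struct range { uint64_t start, end; };	/* half-open: [start, end) */

/* [a, b) and [b, c) merge exactly when they share b: no "+ 1" test */
static int try_merge(struct range *r, uint64_t start, uint64_t end)
{
	if (start > r->end || end < r->start)
		return 0;		/* disjoint, nothing to do */
	if (r->start > start)
		r->start = start;
	if (r->end < end)
		r->end = end;
	return 1;
}

int main(void)
{
	struct range r = { 0x1000, 0x2000 };

	/* [0x2000, 0x3000) touches [0x1000, 0x2000) exactly at 0x2000 */
	if (try_merge(&r, 0x2000, 0x3000))
		printf("merged: [%#llx, %#llx)\n",
		       (unsigned long long)r.start,
		       (unsigned long long)r.end);
	/* size is simply end - start */
	printf("size: %#llx\n", (unsigned long long)(r.end - r.start));
	return 0;
}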

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/kernel/cpu/mtrr/cleanup.c | 32 ++++++++++++++++----------------
arch/x86/pci/amd_bus.c | 24 ++++++++++++++----------
kernel/range.c | 20 ++++++++++----------
3 files changed, 40 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
index 669da09..06130b5 100644
--- a/arch/x86/kernel/cpu/mtrr/cleanup.c
+++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
@@ -78,13 +78,13 @@ x86_get_mtrr_mem_range(struct range *range, int nr_range,
base = range_state[i].base_pfn;
size = range_state[i].size_pfn;
nr_range = add_range_with_merge(range, RANGE_NUM, nr_range,
- base, base + size - 1);
+ base, base + size);
}
if (debug_print) {
printk(KERN_DEBUG "After WB checking\n");
for (i = 0; i < nr_range; i++)
printk(KERN_DEBUG "MTRR MAP PFN: %016llx - %016llx\n",
- range[i].start, range[i].end + 1);
+ range[i].start, range[i].end);
}

/* Take out UC ranges: */
@@ -106,11 +106,11 @@ x86_get_mtrr_mem_range(struct range *range, int nr_range,
size -= (1<<(20-PAGE_SHIFT)) - base;
base = 1<<(20-PAGE_SHIFT);
}
- subtract_range(range, RANGE_NUM, base, base + size - 1);
+ subtract_range(range, RANGE_NUM, base, base + size);
}
if (extra_remove_size)
subtract_range(range, RANGE_NUM, extra_remove_base,
- extra_remove_base + extra_remove_size - 1);
+ extra_remove_base + extra_remove_size);

if (debug_print) {
printk(KERN_DEBUG "After UC checking\n");
@@ -118,7 +118,7 @@ x86_get_mtrr_mem_range(struct range *range, int nr_range,
if (!range[i].end)
continue;
printk(KERN_DEBUG "MTRR MAP PFN: %016llx - %016llx\n",
- range[i].start, range[i].end + 1);
+ range[i].start, range[i].end);
}
}

@@ -128,7 +128,7 @@ x86_get_mtrr_mem_range(struct range *range, int nr_range,
printk(KERN_DEBUG "After sorting\n");
for (i = 0; i < nr_range; i++)
printk(KERN_DEBUG "MTRR MAP PFN: %016llx - %016llx\n",
- range[i].start, range[i].end + 1);
+ range[i].start, range[i].end);
}

return nr_range;
@@ -142,7 +142,7 @@ static unsigned long __init sum_ranges(struct range *range, int nr_range)
int i;

for (i = 0; i < nr_range; i++)
- sum += range[i].end + 1 - range[i].start;
+ sum += range[i].end - range[i].start;

return sum;
}
@@ -489,7 +489,7 @@ x86_setup_var_mtrrs(struct range *range, int nr_range,
/* Write the range: */
for (i = 0; i < nr_range; i++) {
set_var_mtrr_range(&var_state, range[i].start,
- range[i].end - range[i].start + 1);
+ range[i].end - range[i].start);
}

/* Write the last range: */
@@ -720,7 +720,7 @@ int __init mtrr_cleanup(unsigned address_bits)
* and fixed mtrrs should take effect before var mtrr for it:
*/
nr_range = add_range_with_merge(range, RANGE_NUM, nr_range, 0,
- (1ULL<<(20 - PAGE_SHIFT)) - 1);
+ 1ULL<<(20 - PAGE_SHIFT));
/* Sort the ranges: */
sort_range(range, nr_range);

@@ -939,9 +939,9 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
nr_range = 0;
if (mtrr_tom2) {
range[nr_range].start = (1ULL<<(32 - PAGE_SHIFT));
- range[nr_range].end = (mtrr_tom2 >> PAGE_SHIFT) - 1;
- if (highest_pfn < range[nr_range].end + 1)
- highest_pfn = range[nr_range].end + 1;
+ range[nr_range].end = mtrr_tom2 >> PAGE_SHIFT;
+ if (highest_pfn < range[nr_range].end)
+ highest_pfn = range[nr_range].end;
nr_range++;
}
nr_range = x86_get_mtrr_mem_range(range, nr_range, 0, 0);
@@ -953,15 +953,15 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)

/* Check the holes: */
for (i = 0; i < nr_range - 1; i++) {
- if (range[i].end + 1 < range[i+1].start)
- total_trim_size += real_trim_memory(range[i].end + 1,
+ if (range[i].end < range[i+1].start)
+ total_trim_size += real_trim_memory(range[i].end,
range[i+1].start);
}

/* Check the top: */
i = nr_range - 1;
- if (range[i].end + 1 < end_pfn)
- total_trim_size += real_trim_memory(range[i].end + 1,
+ if (range[i].end < end_pfn)
+ total_trim_size += real_trim_memory(range[i].end,
end_pfn);

if (total_trim_size) {
diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index badef12..1f4f84d 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -145,7 +145,7 @@ static int __init early_fill_mp_bus_info(void)
def_link = (reg >> 8) & 0x03;

memset(range, 0, sizeof(range));
- range[0].end = 0xffff;
+ add_range(range, RANGE_NUM, 0, 0, 0xffff + 1);
/* io port resource */
for (i = 0; i < 4; i++) {
reg = read_pci_config(bus, slot, 1, 0xc0 + (i << 3));
@@ -175,7 +175,7 @@ static int __init early_fill_mp_bus_info(void)
if (end > 0xffff)
end = 0xffff;
update_res(info, start, end, IORESOURCE_IO, 1);
- subtract_range(range, RANGE_NUM, start, end);
+ subtract_range(range, RANGE_NUM, start, end + 1);
}
/* add left over io port range to def node/link, [0, 0xffff] */
/* find the position */
@@ -190,14 +190,16 @@ static int __init early_fill_mp_bus_info(void)
if (!range[i].end)
continue;

- update_res(info, range[i].start, range[i].end,
+ update_res(info, range[i].start, range[i].end - 1,
IORESOURCE_IO, 1);
}
}

memset(range, 0, sizeof(range));
/* 0xfd00000000-0xffffffffff for HT */
- range[0].end = cap_resource((0xfdULL<<32) - 1);
+ end = cap_resource((0xfdULL<<32) - 1);
+ end++;
+ add_range(range, RANGE_NUM, 0, 0, end);

/* need to take out [0, TOM) for RAM*/
address = MSR_K8_TOP_MEM1;
@@ -205,14 +207,15 @@ static int __init early_fill_mp_bus_info(void)
end = (val & 0xffffff800000ULL);
printk(KERN_INFO "TOM: %016llx aka %lldM\n", end, end>>20);
if (end < (1ULL<<32))
- subtract_range(range, RANGE_NUM, 0, end - 1);
+ subtract_range(range, RANGE_NUM, 0, end);

/* get mmconfig */
get_pci_mmcfg_amd_fam10h_range();
/* need to take out mmconf range */
if (fam10h_mmconf_end) {
printk(KERN_DEBUG "Fam 10h mmconf [%llx, %llx]\n", fam10h_mmconf_start, fam10h_mmconf_end);
- subtract_range(range, RANGE_NUM, fam10h_mmconf_start, fam10h_mmconf_end);
+ subtract_range(range, RANGE_NUM, fam10h_mmconf_start,
+ fam10h_mmconf_end + 1);
}

/* mmio resource */
@@ -267,7 +270,8 @@ static int __init early_fill_mp_bus_info(void)
/* we got a hole */
endx = fam10h_mmconf_start - 1;
update_res(info, start, endx, IORESOURCE_MEM, 0);
- subtract_range(range, RANGE_NUM, start, endx);
+ subtract_range(range, RANGE_NUM, start,
+ endx + 1);
printk(KERN_CONT " ==> [%llx, %llx]", start, endx);
start = fam10h_mmconf_end + 1;
changed = 1;
@@ -284,7 +288,7 @@ static int __init early_fill_mp_bus_info(void)

update_res(info, cap_resource(start), cap_resource(end),
IORESOURCE_MEM, 1);
- subtract_range(range, RANGE_NUM, start, end);
+ subtract_range(range, RANGE_NUM, start, end + 1);
printk(KERN_CONT "\n");
}

@@ -299,7 +303,7 @@ static int __init early_fill_mp_bus_info(void)
rdmsrl(address, val);
end = (val & 0xffffff800000ULL);
printk(KERN_INFO "TOM2: %016llx aka %lldM\n", end, end>>20);
- subtract_range(range, RANGE_NUM, 1ULL<<32, end - 1);
+ subtract_range(range, RANGE_NUM, 1ULL<<32, end);
}

/*
@@ -319,7 +323,7 @@ static int __init early_fill_mp_bus_info(void)
continue;

update_res(info, cap_resource(range[i].start),
- cap_resource(range[i].end),
+ cap_resource(range[i].end - 1),
IORESOURCE_MEM, 1);
}
}
diff --git a/kernel/range.c b/kernel/range.c
index 71e0021..74e2e61 100644
--- a/kernel/range.c
+++ b/kernel/range.c
@@ -13,7 +13,7 @@

int add_range(struct range *range, int az, int nr_range, u64 start, u64 end)
{
- if (start > end)
+ if (start >= end)
return nr_range;

/* Out of slots: */
@@ -33,7 +33,7 @@ int add_range_with_merge(struct range *range, int az, int nr_range,
{
int i;

- if (start > end)
+ if (start >= end)
return nr_range;

/* Try to merge it with old one: */
@@ -46,7 +46,7 @@ int add_range_with_merge(struct range *range, int az, int nr_range,

common_start = max(range[i].start, start);
common_end = min(range[i].end, end);
- if (common_start > common_end + 1)
+ if (common_start > common_end)
continue;

final_start = min(range[i].start, start);
@@ -65,7 +65,7 @@ void subtract_range(struct range *range, int az, u64 start, u64 end)
{
int i, j;

- if (start > end)
+ if (start >= end)
return;

for (j = 0; j < az; j++) {
@@ -79,15 +79,15 @@ void subtract_range(struct range *range, int az, u64 start, u64 end)
}

if (start <= range[j].start && end < range[j].end &&
- range[j].start < end + 1) {
- range[j].start = end + 1;
+ range[j].start < end) {
+ range[j].start = end;
continue;
}


if (start > range[j].start && end >= range[j].end &&
- range[j].end > start - 1) {
- range[j].end = start - 1;
+ range[j].end > start) {
+ range[j].end = start;
continue;
}

@@ -99,11 +99,11 @@ void subtract_range(struct range *range, int az, u64 start, u64 end)
}
if (i < az) {
range[i].end = range[j].end;
- range[i].start = end + 1;
+ range[i].start = end;
} else {
printk(KERN_ERR "run of slot in ranges\n");
}
- range[j].end = start - 1;
+ range[j].end = start;
continue;
}
}
--
1.6.4.2

2010-02-09 19:39:40

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 12/35] x86: dynamic increase early_res array size

Use early_res_count to track the number of entries, and use
find_e820_area() to get a new buffer, then copy from the old
array to the new one.

Also clear the old early_res array to prevent invalid use later.

-v2: __check_and_double_early_res() should take the new start
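
The growth policy is plain amortized doubling, triggered while slack
remains so the array never fills up mid-operation. A compact
userspace sketch; the kernel version also reserves slot 0 of the new
array for the array's own memory, since it allocates from the very
map it tracks, a detail omitted here:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct res { unsigned long long start, end; };

static struct res *arr;
static int max_res, count;

/* grow before the array is nearly full, keeping at least
 * max(max_res/8, 2) slots free, as __check_and_double_early_res() does */
static void check_and_double(void)
{
	int slack = max_res / 8 > 2 ? max_res / 8 : 2;

	if (max_res - count > slack)
		return;
	arr = realloc(arr, 2 * max_res * sizeof(*arr));
	if (!arr)
		abort();
	memset(arr + max_res, 0, max_res * sizeof(*arr));
	max_res *= 2;
	printf("doubled to %d slots\n", max_res);
}

static void reserve(unsigned long long s, unsigned long long e)
{
	check_and_double();	/* always called before inserting */
	arr[count].start = s;
	arr[count].end = e;
	count++;
}

int main(void)
{
	max_res = 4;
	arr = calloc(max_res, sizeof(*arr));
	for (int i = 0; i < 20; i++)
		reserve(i * 0x1000ULL, (i + 1) * 0x1000ULL);
	printf("%d entries in %d slots\n", count, max_res);
	free(arr);
	return 0;
}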

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/kernel/e820.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index cf7f7d0..e13bbd0 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -916,6 +916,48 @@ void __init reserve_early_overlap_ok(u64 start, u64 end, char *name)
__reserve_early(start, end, name, 1);
}

+static void __init __check_and_double_early_res(u64 start)
+{
+ u64 end, size, mem;
+ struct early_res *new;
+
+ /* do we have enough slots left ? */
+ if ((max_early_res - early_res_count) > max(max_early_res/8, 2))
+ return;
+
+ /* double it */
+ end = max_pfn_mapped << PAGE_SHIFT;
+ size = sizeof(struct early_res) * max_early_res * 2;
+ mem = find_e820_area(start, end, size, sizeof(struct early_res));
+
+ if (mem == -1ULL)
+ panic("can not find more space for early_res array");
+
+ new = __va(mem);
+ /* save the first one for own */
+ new[0].start = mem;
+ new[0].end = mem + size;
+ new[0].overlap_ok = 0;
+ /* copy old to new */
+ if (early_res == early_res_x) {
+ memcpy(&new[1], &early_res[0],
+ sizeof(struct early_res) * max_early_res);
+ memset(&new[max_early_res+1], 0,
+ sizeof(struct early_res) * (max_early_res - 1));
+ early_res_count++;
+ } else {
+ memcpy(&new[1], &early_res[1],
+ sizeof(struct early_res) * (max_early_res - 1));
+ memset(&new[max_early_res], 0,
+ sizeof(struct early_res) * max_early_res);
+ }
+ memset(&early_res[0], 0, sizeof(struct early_res) * max_early_res);
+ early_res = new;
+ max_early_res *= 2;
+ printk(KERN_DEBUG "early_res array is doubled to %d at [%llx - %llx]\n",
+ max_early_res, mem, mem + size - 1);
+}
+
/*
* Most early reservations come here.
*
@@ -929,6 +971,8 @@ void __init reserve_early(u64 start, u64 end, char *name)
if (start >= end)
return;

+ __check_and_double_early_res(end);
+
drop_overlaps_that_are_ok(start, end);
__reserve_early(start, end, name, 0);
}
@@ -957,6 +1001,10 @@ void __init early_res_to_bootmem(u64 start, u64 end)
for (i = 0; i < max_early_res && early_res[i].end; i++)
count++;

+ /* need to skip first one ?*/
+ if (early_res != early_res_x)
+ idx = 1;
+
printk(KERN_INFO "(%d/%d early reservations) ==> bootmem [%010llx - %010llx]\n",
count - idx, max_early_res, start, end);
for (i = idx; i < count; i++) {
@@ -974,6 +1022,11 @@ void __init early_res_to_bootmem(u64 start, u64 end)
reserve_bootmem_generic(final_start, final_end - final_start,
BOOTMEM_DEFAULT);
}
+ /* clear them */
+ memset(&early_res[0], 0, sizeof(struct early_res) * max_early_res);
+ early_res = NULL;
+ max_early_res = 0;
+ early_res_count = 0;
}

/* Check for already reserved areas */
--
1.6.4.2

2010-02-09 19:38:33

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 05/35] x86/pci: use u64 instead of size_t in amd_bus.c

Prepare for enabling it on 32-bit.

-v2: remove casts that are no longer needed
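
The underlying issue: size_t is 32 bits on a 32-bit build, so a
physical address above 4G (TOM2, high MMIO windows) silently loses
its high bits there. A small demonstration; the truncation only
shows on an ILP32 target, on 64-bit both values match:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

int main(void)
{
	uint64_t tom2 = 0x140000000ULL;	/* top of memory at 5GB */
	size_t truncated = (size_t)tom2;	/* drops bit 32 on a
						   32-bit build */

	printf("u64:    %#llx\n", (unsigned long long)tom2);
	printf("size_t: %#llx (sizeof(size_t) = %zu)\n",
	       (unsigned long long)truncated, sizeof(size_t));
	return 0;
}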

Signed-off-by: Yinghai Lu <[email protected]>
Acked-by: Jesse Barnes <[email protected]>
---
arch/x86/pci/amd_bus.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index e467071..e8bb553 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -82,8 +82,8 @@ static int __init early_fill_mp_bus_info(void)
struct pci_root_info *info;
u32 reg;
struct resource *res;
- size_t start;
- size_t end;
+ u64 start;
+ u64 end;
struct range range[RANGE_NUM];
u64 val;
u32 address;
@@ -173,7 +173,7 @@ static int __init early_fill_mp_bus_info(void)

info = &pci_root_info[j];
printk(KERN_DEBUG "node %d link %d: io port [%llx, %llx]\n",
- node, link, (u64)start, (u64)end);
+ node, link, start, end);

/* kernel only handle 16 bit only */
if (end > 0xffff)
@@ -207,7 +207,7 @@ static int __init early_fill_mp_bus_info(void)
address = MSR_K8_TOP_MEM1;
rdmsrl(address, val);
end = (val & 0xffffff800000ULL);
- printk(KERN_INFO "TOM: %016lx aka %ldM\n", end, end>>20);
+ printk(KERN_INFO "TOM: %016llx aka %lldM\n", end, end>>20);
if (end < (1ULL<<32))
subtract_range(range, RANGE_NUM, 0, end - 1);

@@ -246,7 +246,7 @@ static int __init early_fill_mp_bus_info(void)
info = &pci_root_info[j];

printk(KERN_DEBUG "node %d link %d: mmio [%llx, %llx]",
- node, link, (u64)start, (u64)end);
+ node, link, start, end);
/*
* some sick allocation would have range overlap with fam10h
* mmconf range, so need to update start and end.
@@ -272,13 +272,13 @@ static int __init early_fill_mp_bus_info(void)
endx = fam10h_mmconf_start - 1;
update_res(info, start, endx, IORESOURCE_MEM, 0);
subtract_range(range, RANGE_NUM, start, endx);
- printk(KERN_CONT " ==> [%llx, %llx]", (u64)start, endx);
+ printk(KERN_CONT " ==> [%llx, %llx]", start, endx);
start = fam10h_mmconf_end + 1;
changed = 1;
}
if (changed) {
if (start <= end) {
- printk(KERN_CONT " %s [%llx, %llx]", endx?"and":"==>", (u64)start, (u64)end);
+ printk(KERN_CONT " %s [%llx, %llx]", endx?"and":"==>", start, end);
} else {
printk(KERN_CONT "%s\n", endx?"":" ==> none");
continue;
@@ -301,7 +301,7 @@ static int __init early_fill_mp_bus_info(void)
address = MSR_K8_TOP_MEM2;
rdmsrl(address, val);
end = (val & 0xffffff800000ULL);
- printk(KERN_INFO "TOM2: %016lx aka %ldM\n", end, end>>20);
+ printk(KERN_INFO "TOM2: %016llx aka %lldM\n", end, end>>20);
subtract_range(range, RANGE_NUM, 1ULL<<32, end - 1);
}

--
1.6.4.2

2010-02-09 19:39:36

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 17/35] sparsemem: put mem map for one node together.

Add vmemmap_alloc_block_buf() for the mem map only.

It falls back to the old way if it cannot get a buffer that big.

Before this patch, when a node has 128g of ram installed, the memmap
is split into two or more parts:
[ 0.000000] [ffffea0000000000-ffffea003fffffff] PMD -> [ffff880100600000-ffff88013e9fffff] on node 1
[ 0.000000] [ffffea0040000000-ffffea006fffffff] PMD -> [ffff88013ec00000-ffff88016ebfffff] on node 1
[ 0.000000] [ffffea0070000000-ffffea007fffffff] PMD -> [ffff882000600000-ffff8820105fffff] on node 0
[ 0.000000] [ffffea0080000000-ffffea00bfffffff] PMD -> [ffff882010800000-ffff8820507fffff] on node 0
[ 0.000000] [ffffea00c0000000-ffffea00dfffffff] PMD -> [ffff882050a00000-ffff8820709fffff] on node 0
[ 0.000000] [ffffea00e0000000-ffffea00ffffffff] PMD -> [ffff884000600000-ffff8840205fffff] on node 2
[ 0.000000] [ffffea0100000000-ffffea013fffffff] PMD -> [ffff884020800000-ffff8840607fffff] on node 2
[ 0.000000] [ffffea0140000000-ffffea014fffffff] PMD -> [ffff884060a00000-ffff8840709fffff] on node 2
[ 0.000000] [ffffea0150000000-ffffea017fffffff] PMD -> [ffff886000600000-ffff8860305fffff] on node 3
[ 0.000000] [ffffea0180000000-ffffea01bfffffff] PMD -> [ffff886030800000-ffff8860707fffff] on node 3
[ 0.000000] [ffffea01c0000000-ffffea01ffffffff] PMD -> [ffff888000600000-ffff8880405fffff] on node 4
[ 0.000000] [ffffea0200000000-ffffea022fffffff] PMD -> [ffff888040800000-ffff8880707fffff] on node 4
[ 0.000000] [ffffea0230000000-ffffea023fffffff] PMD -> [ffff88a000600000-ffff88a0105fffff] on node 5
[ 0.000000] [ffffea0240000000-ffffea027fffffff] PMD -> [ffff88a010800000-ffff88a0507fffff] on node 5
[ 0.000000] [ffffea0280000000-ffffea029fffffff] PMD -> [ffff88a050a00000-ffff88a0709fffff] on node 5
[ 0.000000] [ffffea02a0000000-ffffea02bfffffff] PMD -> [ffff88c000600000-ffff88c0205fffff] on node 6
[ 0.000000] [ffffea02c0000000-ffffea02ffffffff] PMD -> [ffff88c020800000-ffff88c0607fffff] on node 6
[ 0.000000] [ffffea0300000000-ffffea030fffffff] PMD -> [ffff88c060a00000-ffff88c0709fffff] on node 6
[ 0.000000] [ffffea0310000000-ffffea033fffffff] PMD -> [ffff88e000600000-ffff88e0305fffff] on node 7
[ 0.000000] [ffffea0340000000-ffffea037fffffff] PMD -> [ffff88e030800000-ffff88e0707fffff] on node 7

After this patch we get:
[ 0.000000] [ffffea0000000000-ffffea006fffffff] PMD -> [ffff880100200000-ffff88016e5fffff] on node 0
[ 0.000000] [ffffea0070000000-ffffea00dfffffff] PMD -> [ffff882000200000-ffff8820701fffff] on node 1
[ 0.000000] [ffffea00e0000000-ffffea014fffffff] PMD -> [ffff884000200000-ffff8840701fffff] on node 2
[ 0.000000] [ffffea0150000000-ffffea01bfffffff] PMD -> [ffff886000200000-ffff8860701fffff] on node 3
[ 0.000000] [ffffea01c0000000-ffffea022fffffff] PMD -> [ffff888000200000-ffff8880701fffff] on node 4
[ 0.000000] [ffffea0230000000-ffffea029fffffff] PMD -> [ffff88a000200000-ffff88a0701fffff] on node 5
[ 0.000000] [ffffea02a0000000-ffffea030fffffff] PMD -> [ffff88c000200000-ffff88c0701fffff] on node 6
[ 0.000000] [ffffea0310000000-ffffea037fffffff] PMD -> [ffff88e000200000-ffff88e0701fffff] on node 7


-v2: change buf to vmemmap_buf instead according to Ingo
also add CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER according to Ingo
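
vmemmap_alloc_block_buf() is essentially a bump allocator over one
big per-node buffer, falling back to the regular allocator when the
buffer is missing or exhausted. A self-contained sketch of the
mechanism; the chunk size and the malloc() fallback are illustrative:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* round x up to a power-of-two boundary a */
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((uintptr_t)(a) - 1))

static char *buf, *buf_end;

/* carve size-aligned chunks out of one preallocated buffer; the
 * alignment-to-size trick works because every early caller passes
 * the same size */
static void *alloc_block_buf(size_t size)
{
	char *ptr;

	if (!buf)
		return malloc(size);	/* no buffer: fallback path */
	ptr = (char *)ALIGN_UP((uintptr_t)buf, size);
	if (ptr + size > buf_end)
		return malloc(size);	/* buffer exhausted: fallback */
	buf = ptr + size;		/* bump the cursor */
	return ptr;
}

int main(void)
{
	size_t chunk = 1 << 12;		/* pretend PMD_SIZE */

	buf = malloc(16 * chunk);
	buf_end = buf + 16 * chunk;

	/* all chunks for this "node" come out back to back */
	void *a = alloc_block_buf(chunk);
	void *b = alloc_block_buf(chunk);
	printf("a=%p b=%p contiguous=%d\n", a, b,
	       (char *)b - (char *)a == (long)chunk);
	return 0;
}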

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Christoph Lameter <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
---
arch/x86/mm/init_64.c | 2 +-
include/linux/mm.h | 7 +++
mm/Kconfig | 4 ++
mm/sparse-vmemmap.c | 75 ++++++++++++++++++++++++++++++++-
mm/sparse.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++-
5 files changed, 196 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index fd9a004..8f197ce 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -976,7 +976,7 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
if (pmd_none(*pmd)) {
pte_t entry;

- p = vmemmap_alloc_block(PMD_SIZE, node);
+ p = vmemmap_alloc_block_buf(PMD_SIZE, node);
if (!p)
return -ENOMEM;

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f2c5b3c..f6002e5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1326,12 +1326,19 @@ extern int randomize_va_space;
const char * arch_vma_name(struct vm_area_struct *vma);
void print_vma_addr(char *prefix, unsigned long rip);

+void sparse_mem_maps_populate_node(struct page **map_map,
+ unsigned long pnum_begin,
+ unsigned long pnum_end,
+ unsigned long map_count,
+ int nodeid);
+
struct page *sparse_mem_map_populate(unsigned long pnum, int nid);
pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
pud_t *vmemmap_pud_populate(pgd_t *pgd, unsigned long addr, int node);
pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node);
pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node);
void *vmemmap_alloc_block(unsigned long size, int node);
+void *vmemmap_alloc_block_buf(unsigned long size, int node);
void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
int vmemmap_populate_basepages(struct page *start_page,
unsigned long pages, int node);
diff --git a/mm/Kconfig b/mm/Kconfig
index 28212b8..d84cdab 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -115,6 +115,10 @@ config SPARSEMEM_EXTREME
config SPARSEMEM_VMEMMAP_ENABLE
bool

+config SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
+ def_bool y
+ depends on SPARSEMEM && X86_64
+
config SPARSEMEM_VMEMMAP
bool "Sparse Memory virtual memmap"
depends on SPARSEMEM && SPARSEMEM_VMEMMAP_ENABLE
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 9506c39..6b4be75 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -43,6 +43,8 @@ static void * __init_refok __earlyonly_bootmem_alloc(int node,
return __alloc_bootmem_node_high(NODE_DATA(node), size, align, goal);
}

+static void *vmemmap_buf;
+static void *vmemmap_buf_end;

void * __meminit vmemmap_alloc_block(unsigned long size, int node)
{
@@ -64,6 +66,24 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
__pa(MAX_DMA_ADDRESS));
}

+/* need to make sure all callers pass the same size during the early stage */
+void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node)
+{
+ void *ptr;
+
+ if (!vmemmap_buf)
+ return vmemmap_alloc_block(size, node);
+
+ /* take it from the buf */
+ ptr = (void *)ALIGN((unsigned long)vmemmap_buf, size);
+ if (ptr + size > vmemmap_buf_end)
+ return vmemmap_alloc_block(size, node);
+
+ vmemmap_buf = ptr + size;
+
+ return ptr;
+}
+
void __meminit vmemmap_verify(pte_t *pte, int node,
unsigned long start, unsigned long end)
{
@@ -80,7 +100,7 @@ pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node)
pte_t *pte = pte_offset_kernel(pmd, addr);
if (pte_none(*pte)) {
pte_t entry;
- void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+ void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
if (!p)
return NULL;
entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
@@ -163,3 +183,56 @@ struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid)

return map;
}
+
+void __init sparse_mem_maps_populate_node(struct page **map_map,
+ unsigned long pnum_begin,
+ unsigned long pnum_end,
+ unsigned long map_count, int nodeid)
+{
+ unsigned long pnum;
+ unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
+ void *vmemmap_buf_start;
+
+ size = ALIGN(size, PMD_SIZE);
+ vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
+ PMD_SIZE, __pa(MAX_DMA_ADDRESS));
+
+ if (vmemmap_buf_start) {
+ vmemmap_buf = vmemmap_buf_start;
+ vmemmap_buf_end = vmemmap_buf_start + size * map_count;
+ }
+
+ for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
+ struct mem_section *ms;
+
+ if (!present_section_nr(pnum))
+ continue;
+
+ map_map[pnum] = sparse_mem_map_populate(pnum, nodeid);
+ if (map_map[pnum])
+ continue;
+ ms = __nr_to_section(pnum);
+ printk(KERN_ERR "%s: sparsemem memory map backing failed "
+ "some memory will not be available.\n", __func__);
+ ms->section_mem_map = 0;
+ }
+
+ if (vmemmap_buf_start) {
+ /* need to free left buf */
+#ifdef CONFIG_NO_BOOTMEM
+ free_early(__pa(vmemmap_buf_start), __pa(vmemmap_buf_end));
+ if (vmemmap_buf_start < vmemmap_buf) {
+ char name[15];
+
+ memset(name, 0, sizeof(name));
+ snprintf(name, 15, "MEMMAP %d", nodeid);
+ reserve_early_without_check(__pa(vmemmap_buf_start),
+ __pa(vmemmap_buf), name);
+ }
+#else
+ free_bootmem(__pa(vmemmap_buf), vmemmap_buf_end - vmemmap_buf);
+#endif
+ vmemmap_buf = NULL;
+ vmemmap_buf_end = NULL;
+ }
+}
diff --git a/mm/sparse.c b/mm/sparse.c
index 0cdaf0b..9b6b93a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -390,8 +390,65 @@ struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid)
PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION));
return map;
}
+void __init sparse_mem_maps_populate_node(struct page **map_map,
+ unsigned long pnum_begin,
+ unsigned long pnum_end,
+ unsigned long map_count, int nodeid)
+{
+ void *map;
+ unsigned long pnum;
+ unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
+
+ map = alloc_remap(nodeid, size * map_count);
+ if (map) {
+ for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
+ if (!present_section_nr(pnum))
+ continue;
+ map_map[pnum] = map;
+ map += size;
+ }
+ return;
+ }
+
+ size = PAGE_ALIGN(size);
+ map = alloc_bootmem_pages_node(NODE_DATA(nodeid), size * map_count);
+ if (map) {
+ for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
+ if (!present_section_nr(pnum))
+ continue;
+ map_map[pnum] = map;
+ map += size;
+ }
+ return;
+ }
+
+ /* fallback */
+ for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
+ struct mem_section *ms;
+
+ if (!present_section_nr(pnum))
+ continue;
+ map_map[pnum] = sparse_mem_map_populate(pnum, nodeid);
+ if (map_map[pnum])
+ continue;
+ ms = __nr_to_section(pnum);
+ printk(KERN_ERR "%s: sparsemem memory map backing failed "
+ "some memory will not be available.\n", __func__);
+ ms->section_mem_map = 0;
+ }
+}
#endif /* !CONFIG_SPARSEMEM_VMEMMAP */

+static void __init sparse_early_mem_maps_alloc_node(struct page **map_map,
+ unsigned long pnum_begin,
+ unsigned long pnum_end,
+ unsigned long map_count, int nodeid)
+{
+ sparse_mem_maps_populate_node(map_map, pnum_begin, pnum_end,
+ map_count, nodeid);
+}
+
+#ifndef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
{
struct page *map;
@@ -407,6 +464,7 @@ static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
ms->section_mem_map = 0;
return NULL;
}
+#endif

void __attribute__((weak)) __meminit vmemmap_populate_print_last(void)
{
@@ -420,12 +478,14 @@ void __init sparse_init(void)
{
unsigned long pnum;
struct page *map;
+ struct page **map_map;
unsigned long *usemap;
unsigned long **usemap_map;
- int size;
+ int size, size2;
int nodeid_begin = 0;
unsigned long pnum_begin = 0;
unsigned long usemap_count;
+ unsigned long map_count;

/*
* map is using big page (aka 2M in x86 64 bit)
@@ -478,6 +538,48 @@ void __init sparse_init(void)
sparse_early_usemaps_alloc_node(usemap_map, pnum_begin, NR_MEM_SECTIONS,
usemap_count, nodeid_begin);

+#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
+ size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
+ map_map = alloc_bootmem(size2);
+ if (!map_map)
+ panic("can not allocate map_map\n");
+
+ for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
+ struct mem_section *ms;
+
+ if (!present_section_nr(pnum))
+ continue;
+ ms = __nr_to_section(pnum);
+ nodeid_begin = sparse_early_nid(ms);
+ pnum_begin = pnum;
+ break;
+ }
+ map_count = 1;
+ for (pnum = pnum_begin + 1; pnum < NR_MEM_SECTIONS; pnum++) {
+ struct mem_section *ms;
+ int nodeid;
+
+ if (!present_section_nr(pnum))
+ continue;
+ ms = __nr_to_section(pnum);
+ nodeid = sparse_early_nid(ms);
+ if (nodeid == nodeid_begin) {
+ map_count++;
+ continue;
+ }
+ /* ok, we need to take care of pnum_begin to pnum - 1 */
+ sparse_early_mem_maps_alloc_node(map_map, pnum_begin, pnum,
+ map_count, nodeid_begin);
+ /* new start, update count etc*/
+ nodeid_begin = nodeid;
+ pnum_begin = pnum;
+ map_count = 1;
+ }
+ /* ok, last chunk */
+ sparse_early_mem_maps_alloc_node(map_map, pnum_begin, NR_MEM_SECTIONS,
+ map_count, nodeid_begin);
+#endif
+
for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
if (!present_section_nr(pnum))
continue;
@@ -486,7 +588,11 @@ void __init sparse_init(void)
if (!usemap)
continue;

+#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
+ map = map_map[pnum];
+#else
map = sparse_early_mem_map_alloc(pnum);
+#endif
if (!map)
continue;

@@ -496,6 +602,9 @@ void __init sparse_init(void)

vmemmap_populate_print_last();

+#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
+ free_bootmem(__pa(map_map), size2);
+#endif
free_bootmem(__pa(usemap_map), size);
}

--
1.6.4.2
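
The vmemmap_alloc_block_buf() added above is essentially a bump allocator
over one large per-node block: align the cursor up to the request size,
hand out the chunk, advance the cursor, and fall back to the normal
allocator when the buffer is exhausted. A minimal standalone sketch of the
same idea (names and the fallback hook are hypothetical; like the patch,
it assumes size is a power of two, as PAGE_SIZE is):

#include <stdint.h>
#include <stddef.h>

static char *buf, *buf_end;	/* one big block, set up per node */

/*
 * Align-and-bump: carve a size-aligned chunk out of the buffer,
 * falling back to a slow-path allocator when the buffer is gone.
 */
static void *alloc_block_buf(size_t size, void *(*fallback)(size_t))
{
	uintptr_t p;

	if (!buf)
		return fallback(size);

	p = ((uintptr_t)buf + size - 1) & ~((uintptr_t)size - 1);
	if (p + size > (uintptr_t)buf_end)
		return fallback(size);

	buf = (char *)(p + size);
	return (void *)p;
}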

2010-02-09 19:39:44

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 11/35] x86: introduce max_early_res and early_res_count

to prepare for allocating the early res array via find_e820_area

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/kernel/e820.c | 47 ++++++++++++++++++++++++++++++++---------------
1 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index dfb1689..cf7f7d0 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -732,14 +732,18 @@ core_initcall(e820_mark_nvs_memory);
/*
* Early reserved memory areas.
*/
-#define MAX_EARLY_RES 32
+/*
+ * need to make sure this one is big enough before
+ * find_e820_area could be used
+ */
+#define MAX_EARLY_RES_X 32

struct early_res {
u64 start, end;
- char name[16];
+ char name[15];
char overlap_ok;
};
-static struct early_res early_res[MAX_EARLY_RES] __initdata = {
+static struct early_res early_res_x[MAX_EARLY_RES_X] __initdata = {
{ 0, PAGE_SIZE, "BIOS data page", 1 }, /* BIOS data page */
#if defined(CONFIG_X86_32) && defined(CONFIG_X86_TRAMPOLINE)
/*
@@ -753,12 +757,22 @@ static struct early_res early_res[MAX_EARLY_RES] __initdata = {
{}
};

+static int max_early_res __initdata = MAX_EARLY_RES_X;
+static struct early_res *early_res __initdata = &early_res_x[0];
+static int early_res_count __initdata =
+#ifdef CONFIG_X86_32
+ 2
+#else
+ 1
+#endif
+ ;
+
static int __init find_overlapped_early(u64 start, u64 end)
{
int i;
struct early_res *r;

- for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
+ for (i = 0; i < max_early_res && early_res[i].end; i++) {
r = &early_res[i];
if (end > r->start && start < r->end)
break;
@@ -776,13 +790,14 @@ static void __init drop_range(int i)
{
int j;

- for (j = i + 1; j < MAX_EARLY_RES && early_res[j].end; j++)
+ for (j = i + 1; j < max_early_res && early_res[j].end; j++)
;

memmove(&early_res[i], &early_res[i + 1],
(j - 1 - i) * sizeof(struct early_res));

early_res[j - 1].end = 0;
+ early_res_count--;
}

/*
@@ -801,9 +816,9 @@ static void __init drop_overlaps_that_are_ok(u64 start, u64 end)
struct early_res *r;
u64 lower_start, lower_end;
u64 upper_start, upper_end;
- char name[16];
+ char name[15];

- for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
+ for (i = 0; i < max_early_res && early_res[i].end; i++) {
r = &early_res[i];

/* Continue past non-overlapping ranges */
@@ -859,7 +874,7 @@ static void __init __reserve_early(u64 start, u64 end, char *name,
struct early_res *r;

i = find_overlapped_early(start, end);
- if (i >= MAX_EARLY_RES)
+ if (i >= max_early_res)
panic("Too many early reservations");
r = &early_res[i];
if (r->end)
@@ -872,6 +887,7 @@ static void __init __reserve_early(u64 start, u64 end, char *name,
r->overlap_ok = overlap_ok;
if (name)
strncpy(r->name, name, sizeof(r->name) - 1);
+ early_res_count++;
}

/*
@@ -924,7 +940,7 @@ void __init free_early(u64 start, u64 end)

i = find_overlapped_early(start, end);
r = &early_res[i];
- if (i >= MAX_EARLY_RES || r->end != end || r->start != start)
+ if (i >= max_early_res || r->end != end || r->start != start)
panic("free_early on not reserved area: %llx-%llx!",
start, end - 1);

@@ -935,14 +951,15 @@ void __init early_res_to_bootmem(u64 start, u64 end)
{
int i, count;
u64 final_start, final_end;
+ int idx = 0;

count = 0;
- for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++)
+ for (i = 0; i < max_early_res && early_res[i].end; i++)
count++;

- printk(KERN_INFO "(%d early reservations) ==> bootmem [%010llx - %010llx]\n",
- count, start, end);
- for (i = 0; i < count; i++) {
+ printk(KERN_INFO "(%d/%d early reservations) ==> bootmem [%010llx - %010llx]\n",
+ count - idx, max_early_res, start, end);
+ for (i = idx; i < count; i++) {
struct early_res *r = &early_res[i];
printk(KERN_INFO " #%d [%010llx - %010llx] %16s", i,
r->start, r->end, r->name);
@@ -969,7 +986,7 @@ static inline int __init bad_addr(u64 *addrp, u64 size, u64 align)
again:
i = find_overlapped_early(addr, addr + size);
r = &early_res[i];
- if (i < MAX_EARLY_RES && r->end) {
+ if (i < max_early_res && r->end) {
*addrp = addr = round_up(r->end, align);
changed = 1;
goto again;
@@ -986,7 +1003,7 @@ static inline int __init bad_addr_size(u64 *addrp, u64 *sizep, u64 align)
int changed = 0;
again:
last = addr + size;
- for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
+ for (i = 0; i < max_early_res && early_res[i].end; i++) {
struct early_res *r = &early_res[i];
if (last > r->start && addr < r->start) {
size = r->start - addr;
--
1.6.4.2

2010-02-09 19:38:36

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 06/35] x86/pci: add cap_resource

prepare for 32bit pci root bus

-v2: hpa said we should compare with (resource_size_t)~0
-v3: according to Linus, use MAX_RESOURCE instead.
also need to put related patches together

Signed-off-by: Yinghai Lu <[email protected]>
Acked-by: Jesse Barnes <[email protected]>
---
arch/x86/pci/amd_bus.c | 8 +++++---
arch/x86/pci/bus_numa.c | 4 ++++
include/linux/range.h | 8 ++++++++
3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index e8bb553..9bcb6fe 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -201,7 +201,7 @@ static int __init early_fill_mp_bus_info(void)

memset(range, 0, sizeof(range));
/* 0xfd00000000-0xffffffffff for HT */
- range[0].end = (0xfdULL<<32) - 1;
+ range[0].end = cap_resource((0xfdULL<<32) - 1);

/* need to take out [0, TOM) for RAM*/
address = MSR_K8_TOP_MEM1;
@@ -286,7 +286,8 @@ static int __init early_fill_mp_bus_info(void)
}
}

- update_res(info, start, end, IORESOURCE_MEM, 1);
+ update_res(info, cap_resource(start), cap_resource(end),
+ IORESOURCE_MEM, 1);
subtract_range(range, RANGE_NUM, start, end);
printk(KERN_CONT "\n");
}
@@ -321,7 +322,8 @@ static int __init early_fill_mp_bus_info(void)
if (!range[i].end)
continue;

- update_res(info, range[i].start, range[i].end,
+ update_res(info, cap_resource(range[i].start),
+ cap_resource(range[i].end),
IORESOURCE_MEM, 1);
}
}
diff --git a/arch/x86/pci/bus_numa.c b/arch/x86/pci/bus_numa.c
index 3411687..3510778 100644
--- a/arch/x86/pci/bus_numa.c
+++ b/arch/x86/pci/bus_numa.c
@@ -1,5 +1,6 @@
#include <linux/init.h>
#include <linux/pci.h>
+#include <linux/range.h>

#include "bus_numa.h"

@@ -55,6 +56,9 @@ void __devinit update_res(struct pci_root_info *info, resource_size_t start,
if (start > end)
return;

+ if (start == MAX_RESOURCE)
+ return;
+
if (!merge)
goto addit;

diff --git a/include/linux/range.h b/include/linux/range.h
index 0789f14..44c623e 100644
--- a/include/linux/range.h
+++ b/include/linux/range.h
@@ -19,4 +19,12 @@ int clean_sort_range(struct range *range, int az);

void sort_range(struct range *range, int nr_range);

+#define MAX_RESOURCE ((resource_size_t)~0)
+static inline resource_size_t cap_resource(u64 val)
+{
+ if (val > MAX_RESOURCE)
+ return MAX_RESOURCE;
+ else
+ return val;
+}
#endif
--
1.6.4.2

2010-02-09 19:39:34

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 20/35] x86: add find_early_area_size

prepare to move find_e820_area_size back to e820.c

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/kernel/early_res.c | 45 ++++++++++++++++++++++++++++++------------
1 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/early_res.c b/arch/x86/kernel/early_res.c
index 3ae0051..aba02f2 100644
--- a/arch/x86/kernel/early_res.c
+++ b/arch/x86/kernel/early_res.c
@@ -476,6 +476,29 @@ out:
return -1ULL;
}

+u64 __init find_early_area_size(u64 ei_start, u64 ei_last, u64 start,
+ u64 *sizep, u64 align)
+{
+ u64 addr, last;
+
+ addr = round_up(ei_start, align);
+ if (addr < start)
+ addr = round_up(start, align);
+ if (addr >= ei_last)
+ goto out;
+ *sizep = ei_last - addr;
+ while (bad_addr_size(&addr, sizep, align) && addr + *sizep <= ei_last)
+ ;
+ last = addr + *sizep;
+ if (last > ei_last)
+ goto out;
+
+ return addr;
+
+out:
+ return -1ULL;
+}
+
/*
* Find a free area with specified alignment in a specific range.
*/
@@ -513,24 +536,20 @@ u64 __init find_e820_area_size(u64 start, u64 *sizep, u64 align)

for (i = 0; i < e820.nr_map; i++) {
struct e820entry *ei = &e820.map[i];
- u64 addr, last;
- u64 ei_last;
+ u64 addr;
+ u64 ei_start, ei_last;

if (ei->type != E820_RAM)
continue;
- addr = round_up(ei->addr, align);
+
ei_last = ei->addr + ei->size;
- if (addr < start)
- addr = round_up(start, align);
- if (addr >= ei_last)
- continue;
- *sizep = ei_last - addr;
- while (bad_addr_size(&addr, sizep, align) &&
- addr + *sizep <= ei_last)
- ;
- last = addr + *sizep;
- if (last > ei_last)
+ ei_start = ei->addr;
+ addr = find_early_area_size(ei_start, ei_last, start,
+ sizep, align);
+
+ if (addr == -1ULL)
continue;
+
return addr;
}

--
1.6.4.2

2010-02-09 19:39:47

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 21/35] x86: move back find_e820_area to e820.c

make early_res.c cleaner, so it can later be moved to kernel/

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/e820.h | 2 +
arch/x86/include/asm/early_res.h | 4 +-
arch/x86/kernel/e820.c | 57 ++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/early_res.c | 56 -------------------------------------
4 files changed, 61 insertions(+), 58 deletions(-)

diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index efad699..a8299e1 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -109,6 +109,8 @@ static inline void early_memtest(unsigned long start, unsigned long end)

extern unsigned long end_user_pfn;

+extern u64 find_e820_area(u64 start, u64 end, u64 size, u64 align);
+extern u64 find_e820_area_size(u64 start, u64 *sizep, u64 align);
extern u64 early_reserve_e820(u64 startt, u64 sizet, u64 align);
#include <asm/early_res.h>

diff --git a/arch/x86/include/asm/early_res.h b/arch/x86/include/asm/early_res.h
index 2d43b16..5a4d2eb 100644
--- a/arch/x86/include/asm/early_res.h
+++ b/arch/x86/include/asm/early_res.h
@@ -2,8 +2,6 @@
#define _ASM_X86_EARLY_RES_H
#ifdef __KERNEL__

-extern u64 find_e820_area(u64 start, u64 end, u64 size, u64 align);
-extern u64 find_e820_area_size(u64 start, u64 *sizep, u64 align);
extern void reserve_early(u64 start, u64 end, char *name);
extern void reserve_early_overlap_ok(u64 start, u64 end, char *name);
extern void free_early(u64 start, u64 end);
@@ -12,6 +10,8 @@ extern void early_res_to_bootmem(u64 start, u64 end);
void reserve_early_without_check(u64 start, u64 end, char *name);
u64 find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
u64 size, u64 align);
+u64 find_early_area_size(u64 ei_start, u64 ei_last, u64 start,
+ u64 *sizep, u64 align);
#include <linux/range.h>
int get_free_all_memory_range(struct range **rangep, int nodeid);

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 096bb31..86cbbd0 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -723,6 +723,63 @@ core_initcall(e820_mark_nvs_memory);
#endif

/*
+ * Find a free area with specified alignment in a specific range.
+ */
+u64 __init find_e820_area(u64 start, u64 end, u64 size, u64 align)
+{
+ int i;
+
+ for (i = 0; i < e820.nr_map; i++) {
+ struct e820entry *ei = &e820.map[i];
+ u64 addr;
+ u64 ei_start, ei_last;
+
+ if (ei->type != E820_RAM)
+ continue;
+
+ ei_last = ei->addr + ei->size;
+ ei_start = ei->addr;
+ addr = find_early_area(ei_start, ei_last, start, end,
+ size, align);
+
+ if (addr == -1ULL)
+ continue;
+
+ return addr;
+ }
+ return -1ULL;
+}
+
+/*
+ * Find next free range after *start
+ */
+u64 __init find_e820_area_size(u64 start, u64 *sizep, u64 align)
+{
+ int i;
+
+ for (i = 0; i < e820.nr_map; i++) {
+ struct e820entry *ei = &e820.map[i];
+ u64 addr;
+ u64 ei_start, ei_last;
+
+ if (ei->type != E820_RAM)
+ continue;
+
+ ei_last = ei->addr + ei->size;
+ ei_start = ei->addr;
+ addr = find_early_area_size(ei_start, ei_last, start,
+ sizep, align);
+
+ if (addr == -1ULL)
+ continue;
+
+ return addr;
+ }
+
+ return -1ULL;
+}
+
+/*
* pre allocated 4k and reserved it in e820
*/
u64 __init early_reserve_e820(u64 startt, u64 sizet, u64 align)
diff --git a/arch/x86/kernel/early_res.c b/arch/x86/kernel/early_res.c
index aba02f2..d0a70cc 100644
--- a/arch/x86/kernel/early_res.c
+++ b/arch/x86/kernel/early_res.c
@@ -499,60 +499,4 @@ out:
return -1ULL;
}

-/*
- * Find a free area with specified alignment in a specific range.
- */
-u64 __init find_e820_area(u64 start, u64 end, u64 size, u64 align)
-{
- int i;
-
- for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *ei = &e820.map[i];
- u64 addr;
- u64 ei_start, ei_last;
-
- if (ei->type != E820_RAM)
- continue;
-
- ei_last = ei->addr + ei->size;
- ei_start = ei->addr;
- addr = find_early_area(ei_start, ei_last, start, end,
- size, align);
-
- if (addr == -1ULL)
- continue;
-
- return addr;
- }
- return -1ULL;
-}
-
-/*
- * Find next free range after *start
- */
-u64 __init find_e820_area_size(u64 start, u64 *sizep, u64 align)
-{
- int i;
-
- for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *ei = &e820.map[i];
- u64 addr;
- u64 ei_start, ei_last;
-
- if (ei->type != E820_RAM)
- continue;
-
- ei_last = ei->addr + ei->size;
- ei_start = ei->addr;
- addr = find_early_area_size(ei_start, ei_last, start,
- sizep, align);
-
- if (addr == -1ULL)
- continue;
-
- return addr;
- }
-
- return -1ULL;
-}

--
1.6.4.2

2010-02-09 19:38:39

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 25/35] x86: add find_fw_memmap_area

so that early_res can be moved up

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/early_res.h | 1 +
arch/x86/kernel/e820.c | 4 ++++
arch/x86/kernel/early_res.c | 17 +++++++++++------
3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/early_res.h b/arch/x86/include/asm/early_res.h
index 5a4d2eb..9758f3d 100644
--- a/arch/x86/include/asm/early_res.h
+++ b/arch/x86/include/asm/early_res.h
@@ -12,6 +12,7 @@ u64 find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
u64 size, u64 align);
u64 find_early_area_size(u64 ei_start, u64 ei_last, u64 start,
u64 *sizep, u64 align);
+u64 find_fw_memmap_area(u64 start, u64 end, u64 size, u64 align);
#include <linux/range.h>
int get_free_all_memory_range(struct range **rangep, int nodeid);

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 86cbbd0..2c3b090 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -750,6 +750,10 @@ u64 __init find_e820_area(u64 start, u64 end, u64 size, u64 align)
return -1ULL;
}

+u64 __init find_fw_memmap_area(u64 start, u64 end, u64 size, u64 align)
+{
+ return find_e820_area(start, end, size, align);
+}
/*
* Find next free range after *start
*/
diff --git a/arch/x86/kernel/early_res.c b/arch/x86/kernel/early_res.c
index 38a0c61..e209fc4 100644
--- a/arch/x86/kernel/early_res.c
+++ b/arch/x86/kernel/early_res.c
@@ -7,16 +7,14 @@
#include <linux/bootmem.h>
#include <linux/mm.h>

-#include <asm/e820.h>
#include <asm/early_res.h>
-#include <asm/proto.h>

/*
* Early reserved memory areas.
*/
/*
 * need to make sure this one is big enough before
- * find_e820_area could be used
+ * find_fw_memmap_area could be used
*/
#define MAX_EARLY_RES_X 32

@@ -180,6 +178,13 @@ void __init reserve_early_overlap_ok(u64 start, u64 end, char *name)
__reserve_early(start, end, name, 1);
}

+u64 __init __weak find_fw_memmap_area(u64 start, u64 end, u64 size, u64 align)
+{
+ panic("should have find_fw_memmap_area defined with arch");
+
+ return -1ULL;
+}
+
static void __init __check_and_double_early_res(u64 ex_start, u64 ex_end)
{
u64 start, end, size, mem;
@@ -198,13 +203,13 @@ static void __init __check_and_double_early_res(u64 ex_start, u64 ex_end)
start = early_res[0].end;
end = ex_start;
if (start + size < end)
- mem = find_e820_area(start, end, size,
+ mem = find_fw_memmap_area(start, end, size,
sizeof(struct early_res));
if (mem == -1ULL) {
start = ex_end;
end = max_pfn_mapped << PAGE_SHIFT;
if (start + size < end)
- mem = find_e820_area(start, end, size,
+ mem = find_fw_memmap_area(start, end, size,
sizeof(struct early_res));
}
if (mem == -1ULL)
@@ -343,7 +348,7 @@ int __init get_free_all_memory_range(struct range **rangep, int nodeid)
start = MAX_DMA32_PFN << PAGE_SHIFT;
#endif
end = max_pfn_mapped << PAGE_SHIFT;
- mem = find_e820_area(start, end, size, sizeof(struct range));
+ mem = find_fw_memmap_area(start, end, size, sizeof(struct range));
if (mem == -1ULL)
panic("can not find more space for range free");

--
1.6.4.2

2010-02-09 19:38:28

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 31/35] x86: remove arch_probe_nr_irqs

so keep nr_irqs == NR_IRQS

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 22 ----------------------
1 files changed, 0 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ee2fba1..1966475 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3886,28 +3886,6 @@ void __init probe_nr_irqs_gsi(void)
printk(KERN_DEBUG "nr_irqs_gsi: %d\n", nr_irqs_gsi);
}

-#ifdef CONFIG_SPARSE_IRQ
-int __init arch_probe_nr_irqs(void)
-{
- int nr;
-
- if (nr_irqs > (NR_VECTORS * nr_cpu_ids))
- nr_irqs = NR_VECTORS * nr_cpu_ids;
-
- nr = nr_irqs_gsi + 8 * nr_cpu_ids;
-#if defined(CONFIG_PCI_MSI) || defined(CONFIG_HT_IRQ)
- /*
- * for MSI and HT dyn irq
- */
- nr += nr_irqs_gsi * 64;
-#endif
- if (nr < nr_irqs)
- nr_irqs = nr;
-
- return 0;
-}
-#endif
-
static int __io_apic_set_pci_routing(struct device *dev, int irq,
struct io_apic_irq_attr *irq_attr)
{
--
1.6.4.2

2010-02-09 19:40:48

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 04/35] x86/pci: amd one chain system to use pci read out res

found that MSI AMD K8-based laptops hide [0x70000000, 0x80000000) RAM from
e820.

enable the AMD read-out path even for one-chain systems.

Signed-off-by: Yinghai Lu <[email protected]>
Acked-by: Jesse Barnes <[email protected]>
---
arch/x86/pci/amd_bus.c | 7 ++++---
arch/x86/pci/bus_numa.c | 5 -----
arch/x86/pci/bus_numa.h | 1 -
3 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index 2356ea1..e467071 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -87,11 +87,12 @@ static int __init early_fill_mp_bus_info(void)
struct range range[RANGE_NUM];
u64 val;
u32 address;
+ int found;

if (!early_pci_allowed())
return -1;

- found_all_numa_early = 0;
+ found = 0;
for (i = 0; i < ARRAY_SIZE(pci_probes); i++) {
u32 id;
u16 device;
@@ -105,12 +106,12 @@ static int __init early_fill_mp_bus_info(void)
device = (id>>16) & 0xffff;
if (pci_probes[i].vendor == vendor &&
pci_probes[i].device == device) {
- found_all_numa_early = 1;
+ found = 1;
break;
}
}

- if (!found_all_numa_early)
+ if (!found)
return 0;

pci_root_num = 0;
diff --git a/arch/x86/pci/bus_numa.c b/arch/x86/pci/bus_numa.c
index 30f65ce..3411687 100644
--- a/arch/x86/pci/bus_numa.c
+++ b/arch/x86/pci/bus_numa.c
@@ -5,7 +5,6 @@

int pci_root_num;
struct pci_root_info pci_root_info[PCI_ROOT_NR];
-int found_all_numa_early;

void x86_pci_root_bus_res_quirks(struct pci_bus *b)
{
@@ -21,10 +20,6 @@ void x86_pci_root_bus_res_quirks(struct pci_bus *b)
if (!pci_root_num)
return;

- /* for amd, if only one root bus, don't need to do anything */
- if (pci_root_num < 2 && found_all_numa_early)
- return;
-
for (i = 0; i < pci_root_num; i++) {
if (pci_root_info[i].bus_min == b->number)
break;
diff --git a/arch/x86/pci/bus_numa.h b/arch/x86/pci/bus_numa.h
index 374ecc5..f63e802 100644
--- a/arch/x86/pci/bus_numa.h
+++ b/arch/x86/pci/bus_numa.h
@@ -20,7 +20,6 @@ struct pci_root_info {
#define PCI_ROOT_NR 4
extern int pci_root_num;
extern struct pci_root_info pci_root_info[PCI_ROOT_NR];
-extern int found_all_numa_early;

extern void update_res(struct pci_root_info *info, resource_size_t start,
resource_size_t end, unsigned long flags, int merge);
--
1.6.4.2

2010-02-09 19:38:30

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 16/35] sparsemem: put usemap for one node together

could save some buffer space by allocating together instead of one by one

could help systems that are going to use early_res instead of bootmem:
fewer entries in early_res make searching faster on systems with more memory.

Signed-off-by: Yinghai Lu <[email protected]>
---
mm/sparse.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++------------
1 files changed, 66 insertions(+), 18 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 6ce4aab..0cdaf0b 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -271,7 +271,8 @@ static unsigned long *__kmalloc_section_usemap(void)

#ifdef CONFIG_MEMORY_HOTREMOVE
static unsigned long * __init
-sparse_early_usemap_alloc_pgdat_section(struct pglist_data *pgdat)
+sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
+ unsigned long count)
{
unsigned long section_nr;

@@ -286,7 +287,7 @@ sparse_early_usemap_alloc_pgdat_section(struct pglist_data *pgdat)
* this problem.
*/
section_nr = pfn_to_section_nr(__pa(pgdat) >> PAGE_SHIFT);
- return alloc_bootmem_section(usemap_size(), section_nr);
+ return alloc_bootmem_section(usemap_size() * count, section_nr);
}

static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
@@ -329,7 +330,8 @@ static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
}
#else
static unsigned long * __init
-sparse_early_usemap_alloc_pgdat_section(struct pglist_data *pgdat)
+sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
+ unsigned long count)
{
return NULL;
}
@@ -339,27 +341,40 @@ static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
}
#endif /* CONFIG_MEMORY_HOTREMOVE */

-static unsigned long *__init sparse_early_usemap_alloc(unsigned long pnum)
+static void __init sparse_early_usemaps_alloc_node(unsigned long**usemap_map,
+ unsigned long pnum_begin,
+ unsigned long pnum_end,
+ unsigned long usemap_count, int nodeid)
{
- unsigned long *usemap;
- struct mem_section *ms = __nr_to_section(pnum);
- int nid = sparse_early_nid(ms);
-
- usemap = sparse_early_usemap_alloc_pgdat_section(NODE_DATA(nid));
- if (usemap)
- return usemap;
+ void *usemap;
+ unsigned long pnum;
+ int size = usemap_size();

- usemap = alloc_bootmem_node(NODE_DATA(nid), usemap_size());
+ usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
+ usemap_count);
if (usemap) {
- check_usemap_section_nr(nid, usemap);
- return usemap;
+ for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
+ if (!present_section_nr(pnum))
+ continue;
+ usemap_map[pnum] = usemap;
+ usemap += size;
+ }
+ return;
}

- /* Stupid: suppress gcc warning for SPARSEMEM && !NUMA */
- nid = 0;
+ usemap = alloc_bootmem_node(NODE_DATA(nodeid), size * usemap_count);
+ if (usemap) {
+ for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
+ if (!present_section_nr(pnum))
+ continue;
+ usemap_map[pnum] = usemap;
+ usemap += size;
+ check_usemap_section_nr(nodeid, usemap_map[pnum]);
+ }
+ return;
+ }

printk(KERN_WARNING "%s: allocation failed\n", __func__);
- return NULL;
}

#ifndef CONFIG_SPARSEMEM_VMEMMAP
@@ -396,6 +411,7 @@ static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
void __attribute__((weak)) __meminit vmemmap_populate_print_last(void)
{
}
+
/*
* Allocate the accumulated non-linear sections, allocate a mem_map
* for each and record the physical to section mapping.
@@ -407,6 +423,9 @@ void __init sparse_init(void)
unsigned long *usemap;
unsigned long **usemap_map;
int size;
+ int nodeid_begin = 0;
+ unsigned long pnum_begin = 0;
+ unsigned long usemap_count;

/*
* map is using big page (aka 2M in x86 64 bit)
@@ -425,10 +444,39 @@ void __init sparse_init(void)
panic("can not allocate usemap_map\n");

for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
+ struct mem_section *ms;
+
if (!present_section_nr(pnum))
continue;
- usemap_map[pnum] = sparse_early_usemap_alloc(pnum);
+ ms = __nr_to_section(pnum);
+ nodeid_begin = sparse_early_nid(ms);
+ pnum_begin = pnum;
+ break;
+ }
+ usemap_count = 1;
+ for (pnum = pnum_begin + 1; pnum < NR_MEM_SECTIONS; pnum++) {
+ struct mem_section *ms;
+ int nodeid;
+
+ if (!present_section_nr(pnum))
+ continue;
+ ms = __nr_to_section(pnum);
+ nodeid = sparse_early_nid(ms);
+ if (nodeid == nodeid_begin) {
+ usemap_count++;
+ continue;
+ }
+ /* ok, we need to take care of pnum_begin to pnum - 1 */
+ sparse_early_usemaps_alloc_node(usemap_map, pnum_begin, pnum,
+ usemap_count, nodeid_begin);
+ /* new start, update count etc*/
+ nodeid_begin = nodeid;
+ pnum_begin = pnum;
+ usemap_count = 1;
}
+ /* ok, last chunk */
+ sparse_early_usemaps_alloc_node(usemap_map, pnum_begin, NR_MEM_SECTIONS,
+ usemap_count, nodeid_begin);

for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
if (!present_section_nr(pnum))
--
1.6.4.2
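
The two-pass loop added to sparse_init() above follows a simple pattern:
find the first present section, then batch consecutive runs that share a
node id, flushing each run into one per-node allocation. A compact sketch
of that pattern (helper names hypothetical; the key stands in for the node
id, with a negative value meaning "section not present"):

static void walk_groups(int n, int (*key)(int),
			void (*emit)(int begin, int end, int k))
{
	int i, begin = -1, k = 0;

	for (i = 0; i < n; i++) {
		int ki = key(i);

		if (ki < 0)		/* not present, skip */
			continue;
		if (begin < 0) {	/* first present item */
			begin = i;
			k = ki;
			continue;
		}
		if (ki == k)		/* same run, keep batching */
			continue;
		emit(begin, i, k);	/* key changed, flush the run */
		begin = i;
		k = ki;
	}
	if (begin >= 0)			/* flush the last run */
		emit(begin, n, k);
}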

2010-02-09 19:33:48

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 03/35] x86/pci: use resource_size_t in update_res

prepare to enable 32bit intel and amd bus

Signed-off-by: Yinghai Lu <[email protected]>
Acked-by: Jesse Barnes <[email protected]>
---
arch/x86/pci/bus_numa.c | 16 ++++++++--------
arch/x86/pci/bus_numa.h | 4 ++--
2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/pci/bus_numa.c b/arch/x86/pci/bus_numa.c
index f939d60..30f65ce 100644
--- a/arch/x86/pci/bus_numa.c
+++ b/arch/x86/pci/bus_numa.c
@@ -51,8 +51,8 @@ void x86_pci_root_bus_res_quirks(struct pci_bus *b)
}
}

-void __devinit update_res(struct pci_root_info *info, size_t start,
- size_t end, unsigned long flags, int merge)
+void __devinit update_res(struct pci_root_info *info, resource_size_t start,
+ resource_size_t end, unsigned long flags, int merge)
{
int i;
struct resource *res;
@@ -65,20 +65,20 @@ void __devinit update_res(struct pci_root_info *info, size_t start,

/* try to merge it with old one */
for (i = 0; i < info->res_num; i++) {
- size_t final_start, final_end;
- size_t common_start, common_end;
+ resource_size_t final_start, final_end;
+ resource_size_t common_start, common_end;

res = &info->res[i];
if (res->flags != flags)
continue;

- common_start = max((size_t)res->start, start);
- common_end = min((size_t)res->end, end);
+ common_start = max(res->start, start);
+ common_end = min(res->end, end);
if (common_start > common_end + 1)
continue;

- final_start = min((size_t)res->start, start);
- final_end = max((size_t)res->end, end);
+ final_start = min(res->start, start);
+ final_end = max(res->end, end);

res->start = final_start;
res->end = final_end;
diff --git a/arch/x86/pci/bus_numa.h b/arch/x86/pci/bus_numa.h
index adbc23f..374ecc5 100644
--- a/arch/x86/pci/bus_numa.h
+++ b/arch/x86/pci/bus_numa.h
@@ -22,6 +22,6 @@ extern int pci_root_num;
extern struct pci_root_info pci_root_info[PCI_ROOT_NR];
extern int found_all_numa_early;

-extern void update_res(struct pci_root_info *info, size_t start,
- size_t end, unsigned long flags, int merge);
+extern void update_res(struct pci_root_info *info, resource_size_t start,
+ resource_size_t end, unsigned long flags, int merge);
#endif
--
1.6.4.2

2010-02-09 19:40:46

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 07/35] x86/pci: enable pci root res read out for 32bit too

should be good for 32bit too.

-v3: cast res->start
-v4: according to Linus, to use %pR instead of cast

Signed-off-by: Yinghai Lu <[email protected]>
Acked-by: Jesse Barnes <[email protected]>
---
arch/x86/pci/Makefile | 3 +--
arch/x86/pci/amd_bus.c | 18 ++----------------
arch/x86/pci/bus_numa.h | 4 ++--
arch/x86/pci/i386.c | 4 ----
4 files changed, 5 insertions(+), 24 deletions(-)

diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile
index 39fba37..0b7d3e9 100644
--- a/arch/x86/pci/Makefile
+++ b/arch/x86/pci/Makefile
@@ -14,8 +14,7 @@ obj-$(CONFIG_X86_VISWS) += visws.o
obj-$(CONFIG_X86_NUMAQ) += numaq_32.o

obj-y += common.o early.o
-obj-y += amd_bus.o
-obj-$(CONFIG_X86_64) += bus_numa.o
+obj-y += amd_bus.o bus_numa.o

ifeq ($(CONFIG_PCI_DEBUG),y)
EXTRA_CFLAGS += -DDEBUG
diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index 9bcb6fe..badef12 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -6,9 +6,7 @@

#include <asm/pci_x86.h>

-#ifdef CONFIG_X86_64
#include <asm/pci-direct.h>
-#endif

#include "bus_numa.h"

@@ -17,8 +15,6 @@
* also get peer root bus resource for io,mmio
*/

-#ifdef CONFIG_X86_64
-
struct pci_hostbridge_probe {
u32 bus;
u32 slot;
@@ -339,24 +335,14 @@ static int __init early_fill_mp_bus_info(void)
info->bus_min, info->bus_max, info->node, info->link);
for (j = 0; j < res_num; j++) {
res = &info->res[j];
- printk(KERN_DEBUG "bus: %02x index %x %s: [%llx, %llx]\n",
- busnum, j,
- (res->flags & IORESOURCE_IO)?"io port":"mmio",
- res->start, res->end);
+ printk(KERN_DEBUG "bus: %02x index %x %pR\n",
+ busnum, j, res);
}
}

return 0;
}

-#else /* !CONFIG_X86_64 */
-
-static int __init early_fill_mp_bus_info(void) { return 0; }
-
-#endif /* !CONFIG_X86_64 */
-
-/* common 32/64 bit code */
-
#define ENABLE_CF8_EXT_CFG (1ULL << 46)

static void enable_pci_io_ecs(void *unused)
diff --git a/arch/x86/pci/bus_numa.h b/arch/x86/pci/bus_numa.h
index f63e802..08d8e15 100644
--- a/arch/x86/pci/bus_numa.h
+++ b/arch/x86/pci/bus_numa.h
@@ -1,5 +1,5 @@
-#ifdef CONFIG_X86_64
-
+#ifndef __BUS_NUMA_H
+#define __BUS_NUMA_H
/*
* sub bus (transparent) will use entres from 3 to store extra from
* root, so need to make sure we have enough slot there, Should we
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 5dc9e8c..f4e8481 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -257,10 +257,6 @@ void __init pcibios_resource_survey(void)
*/
fs_initcall(pcibios_assign_resources);

-void __weak x86_pci_root_bus_res_quirks(struct pci_bus *b)
-{
-}
-
/*
* If we set up a device for bus mastering, we need to check the latency
* timer as certain crappy BIOSes forget to set it properly.
--
1.6.4.2

2010-02-09 19:41:11

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 14/35] x86: only call dma32_reserve_bootmem 64bit !CONFIG_NUMA

64bit NUMA already make enough space under 4G with new early_node_mem

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/pci.h | 2 ++
arch/x86/include/asm/pci_64.h | 2 --
arch/x86/kernel/pci-dma.c | 13 ++++++++++---
arch/x86/kernel/setup.c | 7 -------
4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index ada8c20..b4a00dd 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -124,6 +124,8 @@ extern void pci_iommu_alloc(void);
#include "pci_64.h"
#endif

+void dma32_reserve_bootmem(void);
+
/* implement the pci_ DMA API in terms of the generic device dma_ one */
#include <asm-generic/pci-dma-compat.h>

diff --git a/arch/x86/include/asm/pci_64.h b/arch/x86/include/asm/pci_64.h
index ae5e40f..fe15cfb 100644
--- a/arch/x86/include/asm/pci_64.h
+++ b/arch/x86/include/asm/pci_64.h
@@ -22,8 +22,6 @@ extern int (*pci_config_read)(int seg, int bus, int dev, int fn,
extern int (*pci_config_write)(int seg, int bus, int dev, int fn,
int reg, int len, u32 value);

-extern void dma32_reserve_bootmem(void);
-
#endif /* __KERNEL__ */

#endif /* _ASM_X86_PCI_64_H */
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 75e14e2..1aa966c 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -65,7 +65,7 @@ int dma_set_mask(struct device *dev, u64 mask)
}
EXPORT_SYMBOL(dma_set_mask);

-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_NUMA)
static __initdata void *dma32_bootmem_ptr;
static unsigned long dma32_bootmem_size __initdata = (128ULL<<20);

@@ -116,14 +116,21 @@ static void __init dma32_free_bootmem(void)
dma32_bootmem_ptr = NULL;
dma32_bootmem_size = 0;
}
+#else
+void __init dma32_reserve_bootmem(void)
+{
+}
+static void __init dma32_free_bootmem(void)
+{
+}
+
#endif

void __init pci_iommu_alloc(void)
{
-#ifdef CONFIG_X86_64
/* free the range so iommu could get some range less than 4G */
dma32_free_bootmem();
-#endif
+
if (pci_swiotlb_detect())
goto out;

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 48cadbb..ea4141b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -969,14 +969,7 @@ void __init setup_arch(char **cmdline_p)
initmem_init(0, max_pfn, acpi, k8);
early_res_to_bootmem(0, max_low_pfn<<PAGE_SHIFT);

-#ifdef CONFIG_X86_64
- /*
- * dma32_reserve_bootmem() allocates bootmem which may conflict
- * with the crashkernel command line, so do that after
- * reserve_crashkernel()
- */
dma32_reserve_bootmem();
-#endif

reserve_ibft_region();

--
1.6.4.2

2010-02-09 19:41:33

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 34/35] x86: make 32bit apic flat to physflat switch like 64bit

kill def_to_bigsmp
and move the switch from default to bigsmp into default_setup_apic_routing(),
so default_setup_apic_routing() becomes more like the 64-bit version.
also make the DMI-related code __init/__initdata

Signed-off-by: Yinghai Lu <[email protected]>
---
arch/x86/include/asm/apic.h | 3 -
arch/x86/include/asm/mpspec.h | 1 -
arch/x86/kernel/acpi/boot.c | 3 -
arch/x86/kernel/apic/apic.c | 15 -------
arch/x86/kernel/apic/bigsmp_32.c | 48 +----------------------
arch/x86/kernel/apic/probe_32.c | 80 ++++++++++++++++++++-----------------
arch/x86/kernel/mpparse.c | 4 --
arch/x86/kernel/setup.c | 2 -
arch/x86/kernel/smpboot.c | 2 +-
9 files changed, 46 insertions(+), 112 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index b4ac2cd..d544523 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -456,9 +456,6 @@ static inline void default_wait_for_init_deassert(atomic_t *deassert)
return;
}

-extern void generic_bigsmp_probe(void);
-
-
#ifdef CONFIG_X86_LOCAL_APIC

#include <asm/smp.h>
diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index d8bf23a..b4b295c 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -23,7 +23,6 @@ extern int pic_mode;

#define MAX_IRQ_SOURCES 256

-extern unsigned int def_to_bigsmp;
extern u8 apicid_2_node[];

#ifdef CONFIG_X86_NUMAQ
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 6083bfa..3dfe6aa 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1192,9 +1192,6 @@ static void __init acpi_process_madt(void)
if (!error) {
acpi_lapic = 1;

-#ifdef CONFIG_X86_BIGSMP
- generic_bigsmp_probe();
-#endif
/*
* Parse MADT IO-APIC entries
*/
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index e97f18b..6e29b2a 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1889,21 +1889,6 @@ void __cpuinit generic_processor_info(int apicid, int version)
if (apicid > max_physical_apicid)
max_physical_apicid = apicid;

-#ifdef CONFIG_X86_32
- if (num_processors > 8) {
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_INTEL:
- if (!APIC_XAPIC(version)) {
- def_to_bigsmp = 0;
- break;
- }
- /* If P4 and above fall through */
- case X86_VENDOR_AMD:
- def_to_bigsmp = 1;
- }
- }
-#endif
-
#if defined(CONFIG_SMP) || defined(CONFIG_X86_64)
early_per_cpu(x86_cpu_to_apicid, cpu) = apicid;
early_per_cpu(x86_bios_cpu_apicid, cpu) = apicid;
diff --git a/arch/x86/kernel/apic/bigsmp_32.c b/arch/x86/kernel/apic/bigsmp_32.c
index cb804c5..350d60e 100644
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -7,7 +7,6 @@
#include <linux/cpumask.h>
#include <linux/kernel.h>
#include <linux/init.h>
-#include <linux/dmi.h>
#include <linux/smp.h>

#include <asm/apicdef.h>
@@ -73,13 +72,6 @@ static void bigsmp_init_apic_ldr(void)
apic_write(APIC_LDR, val);
}

-static void bigsmp_setup_apic_routing(void)
-{
- printk(KERN_INFO
- "Enabling APIC mode: Physflat. Using %d I/O APICs\n",
- nr_ioapics);
-}
-
static int bigsmp_apicid_to_node(int logical_apicid)
{
return apicid_2_node[hard_smp_processor_id()];
@@ -154,52 +146,16 @@ static void bigsmp_send_IPI_all(int vector)
bigsmp_send_IPI_mask(cpu_online_mask, vector);
}

-static int dmi_bigsmp; /* can be set by dmi scanners */
-
-static int hp_ht_bigsmp(const struct dmi_system_id *d)
-{
- printk(KERN_NOTICE "%s detected: force use of apic=bigsmp\n", d->ident);
- dmi_bigsmp = 1;
-
- return 0;
-}
-
-
-static const struct dmi_system_id bigsmp_dmi_table[] = {
- { hp_ht_bigsmp, "HP ProLiant DL760 G2",
- { DMI_MATCH(DMI_BIOS_VENDOR, "HP"),
- DMI_MATCH(DMI_BIOS_VERSION, "P44-"),
- }
- },
-
- { hp_ht_bigsmp, "HP ProLiant DL740",
- { DMI_MATCH(DMI_BIOS_VENDOR, "HP"),
- DMI_MATCH(DMI_BIOS_VERSION, "P47-"),
- }
- },
- { } /* NULL entry stops DMI scanning */
-};
-
static void bigsmp_vector_allocation_domain(int cpu, struct cpumask *retmask)
{
cpumask_clear(retmask);
cpumask_set_cpu(cpu, retmask);
}

-static int probe_bigsmp(void)
-{
- if (def_to_bigsmp)
- dmi_bigsmp = 1;
- else
- dmi_check_system(bigsmp_dmi_table);
-
- return dmi_bigsmp;
-}
-
struct apic apic_bigsmp = {

.name = "bigsmp",
- .probe = probe_bigsmp,
+ .probe = NULL,
.acpi_madt_oem_check = NULL,
.apic_id_registered = bigsmp_apic_id_registered,

@@ -217,7 +173,7 @@ struct apic apic_bigsmp = {
.init_apic_ldr = bigsmp_init_apic_ldr,

.ioapic_phys_id_map = bigsmp_ioapic_phys_id_map,
- .setup_apic_routing = bigsmp_setup_apic_routing,
+ .setup_apic_routing = NULL,
.multi_timer_check = NULL,
.apicid_to_node = bigsmp_apicid_to_node,
.cpu_to_logical_apicid = bigsmp_cpu_to_logical_apicid,
diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index d657558..b05e543 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -14,6 +14,8 @@
#include <linux/ctype.h>
#include <linux/init.h>
#include <linux/errno.h>
+#include <linux/dmi.h>
+
#include <asm/fixmap.h>
#include <asm/mpspec.h>
#include <asm/apicdef.h>
@@ -52,15 +54,6 @@ static int __init print_ipi_mode(void)
}
late_initcall(print_ipi_mode);

-static void local_default_setup_apic_routing(void)
-{
-#ifdef CONFIG_X86_IO_APIC
- printk(KERN_INFO
- "Enabling APIC mode: Flat. Using %d I/O APICs\n",
- nr_ioapics);
-#endif
-}
-
static void default_vector_allocation_domain(int cpu, struct cpumask *retmask)
{
/*
@@ -76,16 +69,10 @@ static void default_vector_allocation_domain(int cpu, struct cpumask *retmask)
cpumask_bits(retmask)[0] = APIC_ALL_CPUS;
}

-/* should be called last. */
-static int probe_default(void)
-{
- return 1;
-}
-
struct apic apic_default = {

.name = "default",
- .probe = probe_default,
+ .probe = NULL,
.acpi_madt_oem_check = NULL,
.apic_id_registered = default_apic_id_registered,

@@ -103,7 +90,7 @@ struct apic apic_default = {
.init_apic_ldr = default_init_apic_ldr,

.ioapic_phys_id_map = default_ioapic_phys_id_map,
- .setup_apic_routing = local_default_setup_apic_routing,
+ .setup_apic_routing = NULL,
.multi_timer_check = NULL,
.apicid_to_node = default_apicid_to_node,
.cpu_to_logical_apicid = default_cpu_to_logical_apicid,
@@ -192,24 +179,40 @@ static int __init parse_apic(char *arg)
}
early_param("apic", parse_apic);

-void __init generic_bigsmp_probe(void)
+static int dmi_bigsmp __initdata; /* can be set by dmi scanners */
+
+static int __init hp_ht_bigsmp(const struct dmi_system_id *d)
{
-#ifdef CONFIG_X86_BIGSMP
- /*
- * This routine is used to switch to bigsmp mode when
- * - There is no apic= option specified by the user
- * - generic_apic_probe() has chosen apic_default as the sub_arch
- * - we find more than 8 CPUs in acpi LAPIC listing with xAPIC support
- */
+ printk(KERN_NOTICE "%s detected: force use of apic=bigsmp\n", d->ident);
+ dmi_bigsmp = 1;

- if (!cmdline_apic && apic == &apic_default) {
- if (apic_bigsmp.probe()) {
- apic = &apic_bigsmp;
- printk(KERN_INFO "Overriding APIC driver with %s\n",
- apic->name);
+ return 0;
+}
+
+static struct dmi_system_id bigsmp_dmi_table[] __initdata = {
+ { hp_ht_bigsmp, "HP ProLiant DL760 G2",
+ { DMI_MATCH(DMI_BIOS_VENDOR, "HP"),
+ DMI_MATCH(DMI_BIOS_VERSION, "P44-"),
}
- }
+ },
+
+ { hp_ht_bigsmp, "HP ProLiant DL740",
+ { DMI_MATCH(DMI_BIOS_VENDOR, "HP"),
+ DMI_MATCH(DMI_BIOS_VERSION, "P47-"),
+ }
+ },
+ { } /* NULL entry stops DMI scanning */
+};
+
+static inline const char *get_apic_name(struct apic *apic)
+{
+ if (apic == &apic_default)
+ return "flat";
+#ifdef CONFIG_X86_BIGSMP
+ if (apic == &apic_bigsmp)
+ return "physical flat";
#endif
+ return apic->name;
}

void __init default_setup_apic_routing(void)
@@ -219,13 +222,19 @@ void __init default_setup_apic_routing(void)
* make sure we go to bigsmp according to real nr_cpu_ids
*/
if (!cmdline_apic && apic == &apic_default) {
- if (nr_cpu_ids > 8) {
+ dmi_check_system(bigsmp_dmi_table);
+ if (nr_cpu_ids > 8 || dmi_bigsmp) {
apic = &apic_bigsmp;
printk(KERN_INFO "Overriding APIC driver with %s\n",
- apic->name);
+ get_apic_name(apic));
}
}
#endif
+#ifdef CONFIG_X86_IO_APIC
+ printk(KERN_INFO
+ "Enabling APIC mode: %s. Using %d I/O APICs\n",
+ get_apic_name(apic), nr_ioapics);
+#endif
}

void __init generic_apic_probe(void)
@@ -233,14 +242,11 @@ void __init generic_apic_probe(void)
if (!cmdline_apic) {
int i;
for (i = 0; apic_probe[i]; i++) {
- if (apic_probe[i]->probe()) {
+ if (apic_probe[i]->probe && apic_probe[i]->probe()) {
apic = apic_probe[i];
break;
}
}
- /* Not visible without early console */
- if (!apic_probe[i])
- panic("Didn't find an APIC driver");
}
printk(KERN_INFO "Using APIC driver %s\n", apic->name);
}
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index 40b54ce..0f885c6 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -359,10 +359,6 @@ static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early)
x86_init.mpparse.mpc_record(1);
}

-#ifdef CONFIG_X86_BIGSMP
- generic_bigsmp_probe();
-#endif
-
if (apic->setup_apic_routing)
apic->setup_apic_routing();

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d49e168..17759b7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -184,8 +184,6 @@ static void set_mca_bus(int x)
#endif
}

-unsigned int def_to_bigsmp;
-
/* for MCA, but anyone else can use it if they want */
unsigned int machine_id;
unsigned int machine_submodel_id;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index da99eef..efdf81e 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -950,7 +950,7 @@ static int __init smp_sanity_check(unsigned max_cpus)
preempt_disable();

#if !defined(CONFIG_X86_BIGSMP) && defined(CONFIG_X86_32)
- if (def_to_bigsmp && nr_cpu_ids > 8) {
+ if (apic == &apic_default && nr_cpu_ids > 8) {
unsigned int cpu;
unsigned nr;

--
1.6.4.2

2010-02-09 19:50:52

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 28/35] radix: move radix init early

On Tue, 09 Feb 2010 11:32:39 -0800
Yinghai Lu <[email protected]> wrote:

> prepare to use it in early_irq_init()
>
> Signed-off-by: Yinghai Lu <[email protected]>
> ---
> init/main.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/init/main.c b/init/main.c
> index 4cb47a1..8451878 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -584,6 +584,7 @@ asmlinkage void __init start_kernel(void)
> local_irq_disable();
> }
> rcu_init();
> + radix_tree_init();
> /* init some links before init_ISA_irqs() */
> early_irq_init();
> init_IRQ();
> @@ -657,7 +658,6 @@ asmlinkage void __init start_kernel(void)
> proc_caches_init();
> buffer_init();
> key_init();
> - radix_tree_init();
> security_init();
> vfs_caches_init(totalram_pages);
> signals_init();

Probably OK. Note that radix_tree_init() uses slab, and it is now being
called before the kernel has run kmem_cache_init_late(). So please
ensure that this code was tested with CONFIG_SLAB=y.

2010-02-09 19:53:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 09/35] x86: print out for RAM buffer

On Tue, 09 Feb 2010 11:32:20 -0800
Yinghai Lu <[email protected]> wrote:

> so could check that early in bootlog
>
> Acked-by: Yinghai Lu <[email protected]>
> Reviewed-by: Christoph Lameter <[email protected]>
> ---
> arch/x86/kernel/e820.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index a966b75..dfb1689 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1429,6 +1429,9 @@ void __init e820_reserve_resources_late(void)
> end = MAX_RESOURCE_SIZE;
> if (start >= end)
> continue;
> + printk(KERN_DEBUG "reserve RAM buffer: %016Lx - %016Lx ",
> + (unsigned long long) start,
> + (unsigned long long) end);
> reserve_region_with_split(&iomem_resource, start, end,
> "RAM buffer");
> }

The typecasts for printing u64's are unneeded within x86 code. In fact
I think they're now unneeded on all architectures.

2010-02-09 19:55:22

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 04/35] x86/pci: amd one chain system to use pci read out res

On Tue, 09 Feb 2010 11:32:15 -0800
Yinghai Lu <[email protected]> wrote:

> --- a/arch/x86/pci/amd_bus.c
> +++ b/arch/x86/pci/amd_bus.c
> @@ -87,11 +87,12 @@ static int __init early_fill_mp_bus_info(void)
> struct range range[RANGE_NUM];
> u64 val;
> u32 address;
> + int found;

It'd be clearer to give this the `bool' type. After all, it's a bool.

>
> if (!early_pci_allowed())
> return -1;
>
> - found_all_numa_early = 0;
> + found = 0;
> for (i = 0; i < ARRAY_SIZE(pci_probes); i++) {
> u32 id;
> u16 device;
> @@ -105,12 +106,12 @@ static int __init early_fill_mp_bus_info(void)
> device = (id>>16) & 0xffff;
> if (pci_probes[i].vendor == vendor &&
> pci_probes[i].device == device) {
> - found_all_numa_early = 1;
> + found = 1;
> break;
> }
> }
>
> - if (!found_all_numa_early)
> + if (!found)
> return 0;
>
> pci_root_num = 0;

2010-02-09 19:56:45

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 28/35] radix: move radix init early

On Tue, Feb 9, 2010 at 9:49 PM, Andrew Morton <[email protected]> wrote:
> On Tue, 09 Feb 2010 11:32:39 -0800
> Yinghai Lu <[email protected]> wrote:
>
>> prepare to use it in early_irq_init()
>>
>> Signed-off-by: Yinghai Lu <[email protected]>
>> ---
>> init/main.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/init/main.c b/init/main.c
>> index 4cb47a1..8451878 100644
>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -584,6 +584,7 @@ asmlinkage void __init start_kernel(void)
>> local_irq_disable();
>> }
>> rcu_init();
>> + radix_tree_init();
>> /* init some links before init_ISA_irqs() */
>> early_irq_init();
>> init_IRQ();
>> @@ -657,7 +658,6 @@ asmlinkage void __init start_kernel(void)
>> proc_caches_init();
>> buffer_init();
>> key_init();
>> - radix_tree_init();
>> security_init();
>> vfs_caches_init(totalram_pages);
>> signals_init();
>
> Probably OK. Note that radix_tree_init() uses slab, and it is now being
> called before the kernel has run kmem_cache_init_late(). So please
> ensure that this code was tested with CONFIG_SLAB=y.

That should be fine but yeah, definitely needs to be tested.

2010-02-09 19:57:31

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 06/35] x86/pci: add cap_resource

On Tue, 09 Feb 2010 11:32:17 -0800
Yinghai Lu <[email protected]> wrote:

> +static inline resource_size_t cap_resource(u64 val)
> +{
> + if (val > MAX_RESOURCE)
> + return MAX_RESOURCE;
> + else
> + return val;
> +}

return min(val, MAX_RESOURCE);
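
One wrinkle with the min() form, sketched below: the kernel's min() macro
type-checks its arguments, and here val is u64 while MAX_RESOURCE is
resource_size_t (32-bit on x86_32 without PAE), so min_t() with an
explicit comparison type would likely be needed:

static inline resource_size_t cap_resource(u64 val)
{
	/*
	 * min() would warn on mixed u64/resource_size_t arguments,
	 * so compare as u64; the implicit conversion on return is
	 * safe because the result never exceeds MAX_RESOURCE.
	 */
	return min_t(u64, val, (u64)MAX_RESOURCE);
}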

2010-02-09 20:04:21

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 23/35] x86: make 32bit support NO_BOOTMEM

On Tue, 09 Feb 2010 11:32:34 -0800
Yinghai Lu <[email protected]> wrote:

> --- a/arch/x86/kernel/early_res.c
> +++ b/arch/x86/kernel/early_res.c
> @@ -354,6 +354,9 @@ int __init get_free_all_memory_range(struct range **rangep, int nodeid)
>
> /* need to go over early_node_map to find out good range for node */
> nr_range = add_from_early_node_map(range, count, nr_range, nodeid);
> +#ifdef CONFIG_X86_32
> + subtract_range(range, count, max_low_pfn, -1UL);
> +#endif
> subtract_early_res(range, count);
> nr_range = clean_sort_range(range, count);

I'm too lazy to hunt through all the patches to see the surrounding code...

The "-1UL" is worrisome. If that contant is to be assigned to a u64
then on x86_32 we end up with 0x00000000ffffffff, which is surely
wrong.

Generally I think it's safer and more maintainable (but odd-looking) to
use plain old "-1" when specifying the all-ones constant pattern.
Because "-1" always works, whatever the size of the target scalar.
This particularly matters when you're dealing with types which have
different sizes on different architectures, or whose type changes with
config options.
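
A standalone illustration of the width issue (hypothetical values; on
x86_32, unsigned long is 32 bits):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/*
	 * On a 32-bit build, -1UL is 0xffffffff; assigning it to a
	 * u64 zero-extends to 0x00000000ffffffff, not all-ones.
	 * Plain -1 goes through signed conversion and becomes the
	 * all-ones pattern at whatever width the target type has.
	 */
	uint64_t from_ul32 = (uint32_t)-1;	/* what -1UL gives on x86_32 */
	uint64_t from_int = -1;			/* all-ones at u64 width */

	printf("%llx\n%llx\n",
	       (unsigned long long)from_ul32,
	       (unsigned long long)from_int);
	return 0;
}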

2010-02-09 20:06:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 32/35] use nr_cpus= to set nr_cpu_ids early

On Tue, 09 Feb 2010 11:32:43 -0800
Yinghai Lu <[email protected]> wrote:

> +early_param("nr_cpus", nrcpus);

Please document kernel boot parameters in Documentation/kernel-parameters.txt.

2010-02-09 20:16:12

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 21/35] x86: move back find_e820_area to e820.c

On Tue, 09 Feb 2010 11:32:32 -0800
Yinghai Lu <[email protected]> wrote:

> /*
> + * Find a free area with specified alignment in a specific range.
> + */
> +u64 __init find_e820_area(u64 start, u64 end, u64 size, u64 align)
> +{
> + int i;
> +
> + for (i = 0; i < e820.nr_map; i++) {
> + struct e820entry *ei = &e820.map[i];
> + u64 addr;
> + u64 ei_start, ei_last;
> +
> + if (ei->type != E820_RAM)
> + continue;
> +
> + ei_last = ei->addr + ei->size;
> + ei_start = ei->addr;
> + addr = find_early_area(ei_start, ei_last, start, end,
> + size, align);
> +
> + if (addr == -1ULL)
> + continue;
> +
> + return addr;

Small simplification:

if (addr != -1)
return addr;

> + }
> + return -1ULL;
> +}
> +
> +/*
> + * Find next free range after *start
> + */
> +u64 __init find_e820_area_size(u64 start, u64 *sizep, u64 align)
> +{
> + int i;
> +
> + for (i = 0; i < e820.nr_map; i++) {
> + struct e820entry *ei = &e820.map[i];
> + u64 addr;
> + u64 ei_start, ei_last;
> +
> + if (ei->type != E820_RAM)
> + continue;
> +
> + ei_last = ei->addr + ei->size;
> + ei_start = ei->addr;
> + addr = find_early_area_size(ei_start, ei_last, start,
> + sizep, align);
> +
> + if (addr == -1ULL)
> + continue;
> +
> + return addr;

Ditto.

> + }
> +
> + return -1ULL;
> +}

2010-02-09 20:22:51

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 17/35] sparsemem: put mem map for one node together.

On Tue, 09 Feb 2010 11:32:28 -0800
Yinghai Lu <[email protected]> wrote:

> +#ifdef CONFIG_NO_BOOTMEM
> + free_early(__pa(vmemmap_buf_start), __pa(vmemmap_buf_end));
> + if (vmemmap_buf_start < vmemmap_buf) {
> + char name[15];
> +
> + memset(name, 0, sizeof(name));

Is the memset actually needed?

> + snprintf(name, 15, "MEMMAP %d", nodeid);

s/15/sizeof(name)/

> + reserve_early_without_check(__pa(vmemmap_buf_start),
> + __pa(vmemmap_buf), name);
> + }
> +#else
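
On the memset question: snprintf() writes at most size - 1 characters and
always NUL-terminates, so the memset is indeed redundant as long as the
buffer is only consumed through string functions. A minimal equivalent
(nodeid value hypothetical):

#include <stdio.h>

int main(void)
{
	int nodeid = 3;		/* hypothetical node id */
	char name[15];

	/*
	 * snprintf() writes at most sizeof(name) - 1 characters and
	 * always NUL-terminates, so no prior memset() is required.
	 */
	snprintf(name, sizeof(name), "MEMMAP %d", nodeid);
	puts(name);
	return 0;
}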

2010-02-09 20:27:52

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 15/35] x86: make 64 bit use early_res instead of bootmem before slab

On Tue, 09 Feb 2010 11:32:26 -0800
Yinghai Lu <[email protected]> wrote:

> +u64 __init find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
> + u64 size, u64 align)
> +{
> + u64 addr, last;
> +
> + addr = round_up(ei_start, align);

Can we use include/linux/log2.h:roundup_pow_of_two() in this patchset?

2010-02-09 20:31:47

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 15/35] x86: make 64 bit use early_res instead of bootmem before slab

On Tue, 09 Feb 2010 11:32:26 -0800
Yinghai Lu <[email protected]> wrote:

> +config NO_BOOTMEM
> + default y
> + bool "Disable Bootmem code"
> + depends on X86_64
> + ---help---
> + use early_res directly instead of bootmem before slab is ready.
> +

s/use/Use/

It would help if the help was a bit more helpful than this. It
provides no indication why one would want to enable this option, nor
what the effects of doing so might be.

2010-02-09 21:38:16

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 15/35] x86: make 64 bit use early_res instead of bootmem before slab

On 02/09/2010 12:26 PM, Andrew Morton wrote:
> On Tue, 09 Feb 2010 11:32:26 -0800
> Yinghai Lu <[email protected]> wrote:
>
>> +u64 __init find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
>> + u64 size, u64 align)
>> +{
>> + u64 addr, last;
>> +
>> + addr = round_up(ei_start, align);
>
> Can we use include/linux/log2.h:roundup_pow_of_two() in this patchset?

no, that is not what we want.


/**
* roundup_pow_of_two - round the given value up to nearest power of two
* @n - parameter
*
* round the given value up to the nearest power of two
* - the result is undefined when n == 0
* - this can be used to initialise global variables from constant data
*/
#define roundup_pow_of_two(n) \
( \
__builtin_constant_p(n) ? ( \
(n == 1) ? 1 : \
(1UL << (ilog2((n) - 1) + 1)) \
) : \
__roundup_pow_of_two(n) \
)
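
The two helpers differ: round_up(x, align) rounds x up to the next
multiple of align, while roundup_pow_of_two(x) rounds the value itself
up to the next power of two. For illustration (values made up):

	round_up(0x9000, 0x4000)   == 0xc000	/* next multiple of 0x4000 */
	roundup_pow_of_two(0x9000) == 0x10000	/* next power of two >= 0x9000 */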

YH

2010-02-09 21:46:11

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 15/35] x86: make 64 bit use early_res instead of bootmem before slab

On 02/09/2010 01:37 PM, Yinghai Lu wrote:
> On 02/09/2010 12:26 PM, Andrew Morton wrote:
>> On Tue, 09 Feb 2010 11:32:26 -0800
>> Yinghai Lu <[email protected]> wrote:
>>
>>> +u64 __init find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
>>> + u64 size, u64 align)
>>> +{
>>> + u64 addr, last;
>>> +
>>> + addr = round_up(ei_start, align);
>>
>> Can we use include/linux/log2.h:roundup_pow_of_two() in this patchset?
>
> no, that is not what we want.
>
>
> /**
> * roundup_pow_of_two - round the given value up to nearest power of two
> * @n - parameter
> *

Hah. Naming can be a real headache with small, common operations.

At the same time, round_up() versus roundup() is a source of some
serious confusion. roundup_bin() might be a better name... I don't know.

-hpa

2010-02-09 21:47:59

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 15/35] x86: make 64 bit use early_res instead of bootmem before slab

On 02/09/2010 01:45 PM, H. Peter Anvin wrote:
>>
>> /**
>> * roundup_pow_of_two - round the given value up to nearest power of two
>> * @n - parameter
>> *
>
> Hah. Naming can be a real headache with small, common operations.
>
> At the same time, round_up() versus roundup() is a source of some
> serious confusion. roundup_bin() might be a better name... I don't know.
>

On the other hand... if we're doing a renaming patchset at some point
(which would be necessary anyway, since all these interfaces have
users), perhaps we should change roundup_pow_of_two() to
next_power_of_two() or something like that.

-hpa

2010-02-09 22:33:52

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 15/35] x86: make 64 bit use early_res instead of bootmem before slab

On Tue, 09 Feb 2010 13:37:04 -0800
Yinghai Lu <[email protected]> wrote:

> On 02/09/2010 12:26 PM, Andrew Morton wrote:
> > On Tue, 09 Feb 2010 11:32:26 -0800
> > Yinghai Lu <[email protected]> wrote:
> >
> >> +u64 __init find_early_area(u64 ei_start, u64 ei_last, u64 start, u64 end,
> >> + u64 size, u64 align)
> >> +{
> >> + u64 addr, last;
> >> +
> >> + addr = round_up(ei_start, align);
> >
> > Can we use include/linux/log2.h:roundup_pow_of_two() in this patchset?
>
> no, that is not what we want.
>
>
> /**
> * roundup_pow_of_two - round the given value up to nearest power of two
> * @n - parameter
> *
> * round the given value up to the nearest power of two
> * - the result is undefined when n == 0
> * - this can be used to initialise global variables from constant data
> */
> #define roundup_pow_of_two(n) \
> ( \
> __builtin_constant_p(n) ? ( \
> (n == 1) ? 1 : \
> (1UL << (ilog2((n) - 1) + 1)) \
> ) : \
> __roundup_pow_of_two(n) \
> )
>

err, crap, yeah.

It would be nice to fix all this up, and get the private round_up()
stuff out of x86 and into kernel.h.

The present naming scheme is ghastly:

ALIGN
DIV_ROUND_UP
roundup
round_up
round_down

Where round_up and ALIGN are kinda maybe the same.
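
For reference, the definitions at the time were roughly as follows
(paraphrased from kernel.h and the x86 headers, not quoted verbatim):

	/* work for any step value, power of two or not: */
	#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
	#define roundup(x, y)		((((x) + ((y) - 1)) / (y)) * (y))

	/* power-of-two steps only; ALIGN and round_up compute the same thing: */
	#define ALIGN(x, a)		(((x) + (a) - 1) & ~((typeof(x))(a) - 1))
	#define round_up(x, y)		(((x) + (y) - 1) & ~((y) - 1))	/* x86-private */
	#define round_down(x, y)	((x) & ~((y) - 1))		/* x86-private */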

2010-02-09 22:45:12

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 15/35] x86: make 64 bit use early_res instead of bootmem before slab

On 02/09/2010 02:33 PM, Andrew Morton wrote:
>
> err, crap, yeah.
>
> It would be nice to fix all this up, and get the private round_up()
> stuff out of x86 and into kernel.h.
>
> The present naming scheme is ghastly:
>
> ALIGN
> DIV_ROUND_UP
> roundup
> round_up
> round_down
>
> Where round_up and ALIGN are kinda maybe the same.
>

Yes indeed. It seems to be something for a separate patchset, though,
since it will touch stuff across a lot of code.

-hpa

2010-02-10 00:17:28

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 28/35] radix: move radix init early

On 02/09/2010 11:56 AM, Pekka Enberg wrote:
> On Tue, Feb 9, 2010 at 9:49 PM, Andrew Morton <[email protected]> wrote:
>> On Tue, 09 Feb 2010 11:32:39 -0800
>> Yinghai Lu <[email protected]> wrote:
>>
>>> prepare to use it in early_irq_init()
>>>
>>> Signed-off-by: Yinghai Lu <[email protected]>
>>> ---
>>> init/main.c | 2 +-
>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/init/main.c b/init/main.c
>>> index 4cb47a1..8451878 100644
>>> --- a/init/main.c
>>> +++ b/init/main.c
>>> @@ -584,6 +584,7 @@ asmlinkage void __init start_kernel(void)
>>> local_irq_disable();
>>> }
>>> rcu_init();
>>> + radix_tree_init();
>>> /* init some links before init_ISA_irqs() */
>>> early_irq_init();
>>> init_IRQ();
>>> @@ -657,7 +658,6 @@ asmlinkage void __init start_kernel(void)
>>> proc_caches_init();
>>> buffer_init();
>>> key_init();
>>> - radix_tree_init();
>>> security_init();
>>> vfs_caches_init(totalram_pages);
>>> signals_init();
>>
>> Probably OK. Note that radix_tree_init() uses slab, and it is now being
>> called before the kernel has run kmem_cache_init_late(). So please
>> ensure that this code was tested with CONFIG_SLAB=y.
>
> That should be fine but yeah, definitely needs to be tested.

tested on x86 with CONFIG_SLAB=y, and it works.

YH
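
The generic init order backs this up: mm_init() -- and with it
kmem_cache_init() -- runs before rcu_init() in start_kernel(), so
kmem_cache_create() should already be safe at the new call site, and
radix_tree_init() needs nothing from kmem_cache_init_late(). A rough
skeleton of the ordering (abbreviated, not the full function):

	asmlinkage void __init start_kernel(void)
	{
		...
		mm_init();		/* kmem_cache_init(): slab caches usable from here */
		sched_init();
		...
		rcu_init();
		radix_tree_init();	/* kmem_cache_create() is safe by now */
		early_irq_init();	/* may now rely on the radix tree */
		init_IRQ();
		...
	}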

2010-02-10 10:05:57

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 28/35] radix: move radix init early

On Wed, Feb 10, 2010 at 2:16 AM, Yinghai Lu <[email protected]> wrote:
> On 02/09/2010 11:56 AM, Pekka Enberg wrote:
>> On Tue, Feb 9, 2010 at 9:49 PM, Andrew Morton <[email protected]> wrote:
>>> On Tue, 09 Feb 2010 11:32:39 -0800
>>> Yinghai Lu <[email protected]> wrote:
>>>
>>>> prepare to use it in early_irq_init()
>>>>
>>>> Signed-off-by: Yinghai Lu <[email protected]>
>>>> ---
>>>> init/main.c | 2 +-
>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/init/main.c b/init/main.c
>>>> index 4cb47a1..8451878 100644
>>>> --- a/init/main.c
>>>> +++ b/init/main.c
>>>> @@ -584,6 +584,7 @@ asmlinkage void __init start_kernel(void)
>>>> local_irq_disable();
>>>> }
>>>> rcu_init();
>>>> + radix_tree_init();
>>>> /* init some links before init_ISA_irqs() */
>>>> early_irq_init();
>>>> init_IRQ();
>>>> @@ -657,7 +658,6 @@ asmlinkage void __init start_kernel(void)
>>>> proc_caches_init();
>>>> buffer_init();
>>>> key_init();
>>>> - radix_tree_init();
>>>> security_init();
>>>> vfs_caches_init(totalram_pages);
>>>> signals_init();
>>>
>>> Probably OK. ?Note that radix_tree_init() uses slab, and it is now being
>>> called before the kernel has run kmem_cache_init_late(). ?So please
>>> ensure that this code was tested with CONFIG_SLAB=y.
>>
>> That should be fine but yeah, definitely needs to be tested.
>
> tested on x86 with CONFIG_SLAB=y, and it works.

OK, great. Would be good to have some coverage on other architectures
as well. PPC was one of the archs that blew up last time we changed
init order.

2010-02-10 21:20:21

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 09/35] x86: print out for RAM buffer

On 02/09/2010 11:52 AM, Andrew Morton wrote:
>>
>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
>> index a966b75..dfb1689 100644
>> --- a/arch/x86/kernel/e820.c
>> +++ b/arch/x86/kernel/e820.c
>> @@ -1429,6 +1429,9 @@ void __init e820_reserve_resources_late(void)
>> end = MAX_RESOURCE_SIZE;
>> if (start >= end)
>> continue;
>> + printk(KERN_DEBUG "reserve RAM buffer: %016Lx - %016Lx ",
>> + (unsigned long long) start,
>> + (unsigned long long) end);
>> reserve_region_with_split(&iomem_resource, start, end,
>> "RAM buffer");
>> }
>
> The typecasts for printing u64's are unneeded within x86 code. In fact
> I think they're now unneeded on all architectures.
>

... not to mention that the proper printf format for long long is "ll".
"L" is used for long double, although a lot of implementations treat
"L" and "ll" the same.

And yes, it's a nitpick.

-hpa
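
With both points folded in, the line would read roughly as below (a
sketch; the trailing newline is an assumption, and start/end stay u64,
which is unsigned long long on x86):

	printk(KERN_DEBUG "reserve RAM buffer: %016llx - %016llx\n",
		start, end);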

2010-02-11 05:36:46

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 28/35] radix: move radix init early


> OK, great. Would be good to have some coverage on other architectures
> as well. PPC was one of the archs that blew up last time we changed
> init order.

Hopefully that should be ok. In fact we use the radix tree in our
interrupt handling, and today we have a horrible hack that falls back
to a linear search early during boot, until we can allocate the radix
trees in an arch_initcall.

Thus the patch should allow us to simplify that code and remove
that late-init hack.

Cheers,
Ben.