2013-10-12 22:00:11

by Santosh Shilimkar

Subject: [RFC 00/23] mm: Use memblock interface instead of bootmem

Tejun, Yinghai and others,

Here is an attempt to convert the core kernel code to memblock allocator
APIs when used with NO_BOOTMEM. Based on the discussion thread [1] and my
limited understanding of the topic, I tried to cook up this RFC with
help from Grygorii. I am counting on reviews, guidance and testing help
to move forward with the approach. This is one of the blocking items for
ARM LPAE machines, where physical memory starts after the 4GB boundary
and the early memory allocators hence need to handle such addresses.

As outlined by Tejun, we would like to remove the use of nobootmem.c and
then eventually the bootmem allocator once all architectures switch to
NO_BOOTMEM. So as not to break the existing architectures still using
bootmem, all the new memblock interfaces fall back to the bootmem layer
with !NO_BOOTMEM.

Testing was done on 32-bit ARM and ARM LPAE machines with the normal as
well as the sparse (faked) memory model. To convert ARM to NO_BOOTMEM,
I have used Russell's work [2] and a couple of patches on top of that.

Comments/suggestions are welcome !!

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Russell King <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Nicolas Pitre <[email protected]>
Cc: Olof Johansson <[email protected]>

Grygorii Strashko (9):
mm/bootmem: remove duplicated declaration of __free_pages_bootmem()
mm/block: remove unnecessary inclusion of bootmem.h
mm/memory_hotplug: remove unnecessary inclusion of bootmem.h
mm/staging: remove unnecessary inclusion of bootmem.h
mm/char: remove unnecessary inclusion of bootmem.h
mm/memblock: debug: correct displaying of upper memory boundary
mm/memblock: debug: don't free reserved array if
!ARCH_DISCARD_MEMBLOCK
mm/hugetlb: Use memblock apis for early memory allocations
mm/page_cgroup: Use memblock apis for early memory allocations

Santosh Shilimkar (14):
mm/memblock: Add memblock early memory allocation apis
mm/init: Use memblock apis for early memory allocations
mm/printk: Use memblock apis for early memory allocations
mm/page_alloc: Use memblock apis for early memory allocations
mm/power: Use memblock apis for early memory allocations
mm/lib: Use memblock apis for early memory allocations
mm/lib: Use memblock apis for early memory allocations
mm/sparse: Use memblock apis for early memory allocations
mm/percpu: Use memblock apis for early memory allocations
mm/memory_hotplug: Use memblock apis for early memory allocations
mm/firmware: Use memblock apis for early memory allocations
mm/ARM: kernel: Use memblock apis for early memory allocations
mm/ARM: mm: Use memblock apis for early memory allocations
mm/ARM: OMAP: Use memblock apis for early memory allocations

arch/arm/kernel/devtree.c | 2 +-
arch/arm/kernel/setup.c | 2 +-
arch/arm/mach-omap2/omap_hwmod.c | 8 +--
arch/arm/mm/init.c | 2 +-
block/blk-ioc.c | 1 -
drivers/char/mem.c | 1 -
drivers/firmware/memmap.c | 2 +-
drivers/staging/speakup/main.c | 2 -
include/linux/bootmem.h | 73 ++++++++++++++++++++++-
init/main.c | 4 +-
kernel/power/snapshot.c | 2 +-
kernel/printk/printk.c | 10 +---
lib/cpumask.c | 4 +-
lib/swiotlb.c | 30 +++++-----
mm/hugetlb.c | 10 ++--
mm/memblock.c | 122 +++++++++++++++++++++++++++++++++++++-
mm/memory_hotplug.c | 3 +-
mm/page_alloc.c | 26 ++++----
mm/page_cgroup.c | 5 +-
mm/percpu.c | 39 +++++++-----
mm/sparse-vmemmap.c | 5 +-
mm/sparse.c | 24 ++++----
22 files changed, 284 insertions(+), 93 deletions(-)

Regards,
Santosh

[1] https://lkml.org/lkml/2013/6/29/77
[2] http://lwn.net/Articles/561854/
--
1.7.9.5


2013-10-12 21:59:40

by Santosh Shilimkar

Subject: [RFC 01/23] mm/bootmem: remove duplicated declaration of __free_pages_bootmem()

From: Grygorii Strashko <[email protected]>

__free_pages_bootmem() is used internally by the MM core and is already
declared in mm/internal.h, so remove the duplicated declaration.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
---
include/linux/bootmem.h | 1 -
1 file changed, 1 deletion(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index f1f07d3..55d52fb 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -52,7 +52,6 @@ extern void free_bootmem_node(pg_data_t *pgdat,
unsigned long size);
extern void free_bootmem(unsigned long physaddr, unsigned long size);
extern void free_bootmem_late(unsigned long physaddr, unsigned long size);
-extern void __free_pages_bootmem(struct page *page, unsigned int order);

/*
* Flags for reserve_bootmem (also if CONFIG_HAVE_ARCH_BOOTMEM_NODE,
--
1.7.9.5

2013-10-12 21:59:45

by Santosh Shilimkar

Subject: [RFC 04/23] mm/staging: remove unnecessary inclusion of bootmem.h

From: Grygorii Strashko <[email protected]>

Clean-up to remove an unnecessary dependency on the bootmem header.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
---
drivers/staging/speakup/main.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/staging/speakup/main.c b/drivers/staging/speakup/main.c
index 14079c4..041f01e 100644
--- a/drivers/staging/speakup/main.c
+++ b/drivers/staging/speakup/main.c
@@ -37,8 +37,6 @@
#include <linux/input.h>
#include <linux/kmod.h>

-#include <linux/bootmem.h> /* for alloc_bootmem */
-
/* speakup_*_selection */
#include <linux/module.h>
#include <linux/sched.h>
--
1.7.9.5

2013-10-12 22:00:20

by Santosh Shilimkar

Subject: [RFC 11/23] mm/page_alloc: Use memblock apis for early memory allocations

Switch to the memblock interfaces for early memory allocations.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
---
mm/page_alloc.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0ee638f..a451ebd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4218,7 +4218,6 @@ static noinline __init_refok
int zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages)
{
int i;
- struct pglist_data *pgdat = zone->zone_pgdat;
size_t alloc_size;

/*
@@ -4234,7 +4233,8 @@ int zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages)

if (!slab_is_available()) {
zone->wait_table = (wait_queue_head_t *)
- alloc_bootmem_node_nopanic(pgdat, alloc_size);
+ memblock_early_alloc_node_nopanic(
+ zone->zone_pgdat->node_id, alloc_size);
} else {
/*
* This case means that a zone whose size was 0 gets new memory
@@ -4354,13 +4354,14 @@ bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
#endif

/**
- * free_bootmem_with_active_regions - Call free_bootmem_node for each active range
+ * free_bootmem_with_active_regions - Call memblock_free_early_nid for each active range
* @nid: The node to free memory on. If MAX_NUMNODES, all nodes are freed.
- * @max_low_pfn: The highest PFN that will be passed to free_bootmem_node
+ * @max_low_pfn: The highest PFN that will be passed to memblock_free_early_nid
*
* If an architecture guarantees that all ranges registered with
* add_active_ranges() contain no holes and may be freed, this
- * this function may be used instead of calling free_bootmem() manually.
+ * this function may be used instead of calling memblock_free_early_nid()
+ * manually.
*/
void __init free_bootmem_with_active_regions(int nid, unsigned long max_low_pfn)
{
@@ -4372,9 +4373,8 @@ void __init free_bootmem_with_active_regions(int nid, unsigned long max_low_pfn)
end_pfn = min(end_pfn, max_low_pfn);

if (start_pfn < end_pfn)
- free_bootmem_node(NODE_DATA(this_nid),
- PFN_PHYS(start_pfn),
- (end_pfn - start_pfn) << PAGE_SHIFT);
+ memblock_free_early_nid(this_nid, PFN_PHYS(start_pfn),
+ (end_pfn - start_pfn) << PAGE_SHIFT);
}
}

@@ -4645,8 +4645,9 @@ static void __init setup_usemap(struct pglist_data *pgdat,
unsigned long usemapsize = usemap_size(zone_start_pfn, zonesize);
zone->pageblock_flags = NULL;
if (usemapsize)
- zone->pageblock_flags = alloc_bootmem_node_nopanic(pgdat,
- usemapsize);
+ zone->pageblock_flags =
+ memblock_early_alloc_node_nopanic(pgdat->node_id,
+ usemapsize);
}
#else
static inline void setup_usemap(struct pglist_data *pgdat, struct zone *zone,
@@ -4840,7 +4841,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
size = (end - start) * sizeof(struct page);
map = alloc_remap(pgdat->node_id, size);
if (!map)
- map = alloc_bootmem_node_nopanic(pgdat, size);
+ map = memblock_early_alloc_node_nopanic(pgdat->node_id,
+ size);
pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
}
#ifndef CONFIG_NEED_MULTIPLE_NODES
@@ -5866,7 +5868,7 @@ void *__init alloc_large_system_hash(const char *tablename,
do {
size = bucketsize << log2qty;
if (flags & HASH_EARLY)
- table = alloc_bootmem_nopanic(size);
+ table = memblock_early_alloc_nopanic(size);
else if (hashdist)
table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL);
else {
--
1.7.9.5

2013-10-12 22:00:18

by Santosh Shilimkar

Subject: [RFC 10/23] mm/printk: Use memblock apis for early memory allocations

Switch to the memblock interfaces for early memory allocations.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
---
kernel/printk/printk.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b4e8500..8624466 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -757,14 +757,10 @@ void __init setup_log_buf(int early)
return;

if (early) {
- unsigned long mem;
-
- mem = memblock_alloc(new_log_buf_len, PAGE_SIZE);
- if (!mem)
- return;
- new_log_buf = __va(mem);
+ new_log_buf =
+ memblock_early_alloc_pages_nopanic(new_log_buf_len);
} else {
- new_log_buf = alloc_bootmem_nopanic(new_log_buf_len);
+ new_log_buf = memblock_early_alloc_nopanic(new_log_buf_len);
}

if (unlikely(!new_log_buf)) {
--
1.7.9.5

2013-10-12 22:00:16

by Santosh Shilimkar

Subject: [RFC 14/23] mm/lib: Use memblock apis for early memory allocations

Switch to the memblock interfaces for early memory allocations.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Santosh Shilimkar <[email protected]>
---
lib/cpumask.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/cpumask.c b/lib/cpumask.c
index d327b87..e85ff94 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -140,7 +140,7 @@ EXPORT_SYMBOL(zalloc_cpumask_var);
*/
void __init alloc_bootmem_cpumask_var(cpumask_var_t *mask)
{
- *mask = alloc_bootmem(cpumask_size());
+ *mask = memblock_early_alloc(cpumask_size());
}

/**
@@ -161,6 +161,6 @@ EXPORT_SYMBOL(free_cpumask_var);
*/
void __init free_bootmem_cpumask_var(cpumask_var_t mask)
{
- free_bootmem(__pa(mask), cpumask_size());
+ memblock_free_early(__pa(mask), cpumask_size());
}
#endif
--
1.7.9.5

2013-10-12 22:00:15

by Santosh Shilimkar

Subject: [RFC 07/23] mm/memblock: debug: correct displaying of upper memory boundary

From: Grygorii Strashko <[email protected]>

When debugging is enabled (the cmdline has "memblock=debug"), memblock
displays the upper memory boundary of each allocated/freed memory range
incorrectly. For example:
memblock_reserve: [0x0000009e7e8000-0x0000009e7ed000] _memblock_early_alloc_try_nid_nopanic+0xfc/0x12c

Here 0x0000009e7ed000 is displayed instead of 0x0000009e7ecfff.

Hence, correct this by changing the formula used to calculate the upper
memory boundary to (u64)base + size - 1 instead of (u64)base + size
everywhere in the debug messages.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
---
mm/memblock.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index c67f4bb..d903138 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -547,7 +547,7 @@ int __init_memblock memblock_free(phys_addr_t base, phys_addr_t size)
{
memblock_dbg(" memblock_free: [%#016llx-%#016llx] %pF\n",
(unsigned long long)base,
- (unsigned long long)base + size,
+ (unsigned long long)base + size - 1,
(void *)_RET_IP_);

return __memblock_remove(&memblock.reserved, base, size);
@@ -559,7 +559,7 @@ int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)

memblock_dbg("memblock_reserve: [%#016llx-%#016llx] %pF\n",
(unsigned long long)base,
- (unsigned long long)base + size,
+ (unsigned long long)base + size - 1,
(void *)_RET_IP_);

return memblock_add_region(_rgn, base, size, MAX_NUMNODES);
@@ -914,7 +914,7 @@ void * __init memblock_early_alloc_try_nid(int nid,
void __init __memblock_free_early(phys_addr_t base, phys_addr_t size)
{
memblock_dbg("%s: [%#016llx-%#016llx] %pF\n",
- __func__, (u64)base, (u64)base + size,
+ __func__, (u64)base, (u64)base + size - 1,
(void *)_RET_IP_);
kmemleak_free_part(__va(base), size);
__memblock_remove(&memblock.reserved, base, size);
@@ -925,7 +925,7 @@ void __init __memblock_free_late(phys_addr_t base, phys_addr_t size)
u64 cursor, end;

memblock_dbg("%s: [%#016llx-%#016llx] %pF\n",
- __func__, (u64)base, (u64)base + size,
+ __func__, (u64)base, (u64)base + size - 1,
(void *)_RET_IP_);
kmemleak_free_part(__va(base), size);
cursor = PFN_UP(base);
--
1.7.9.5

2013-10-12 22:01:26

by Santosh Shilimkar

Subject: [RFC 23/23] mm/ARM: OMAP: Use memblock apis for early memory allocations

Switch to the memblock interfaces for early memory allocations.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Santosh Shilimkar <[email protected]>
---
arch/arm/mach-omap2/omap_hwmod.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
index d9ee0ff..adfd6a2 100644
--- a/arch/arm/mach-omap2/omap_hwmod.c
+++ b/arch/arm/mach-omap2/omap_hwmod.c
@@ -2676,9 +2676,7 @@ static int __init _alloc_links(struct omap_hwmod_link **ml,
sz = sizeof(struct omap_hwmod_link) * LINKS_PER_OCP_IF;

*sl = NULL;
- *ml = alloc_bootmem(sz);
-
- memset(*ml, 0, sz);
+ *ml = memblock_early_alloc(sz);

*sl = (void *)(*ml) + sizeof(struct omap_hwmod_link);

@@ -2797,9 +2795,7 @@ static int __init _alloc_linkspace(struct omap_hwmod_ocp_if **ois)
pr_debug("omap_hwmod: %s: allocating %d byte linkspace (%d links)\n",
__func__, sz, max_ls);

- linkspace = alloc_bootmem(sz);
-
- memset(linkspace, 0, sz);
+ linkspace = memblock_early_alloc(sz);

return 0;
}
--
1.7.9.5

2013-10-12 22:00:13

by Santosh Shilimkar

Subject: [RFC 08/23] mm/memblock: debug: don't free reserved array if !ARCH_DISCARD_MEMBLOCK

From: Grygorii Strashko <[email protected]>

Currently the nobootmem allocator will always try to free the memory
allocated for the reserved memory regions array
(free_low_memory_core_early()) without taking into account the current
memblock debugging configuration (the CONFIG_ARCH_DISCARD_MEMBLOCK and
CONFIG_DEBUG_FS state).
As a result, if:
- CONFIG_DEBUG_FS is defined;
- CONFIG_ARCH_DISCARD_MEMBLOCK is not defined;
- the reserved memory regions array has been resized during boot

then:
- the memory allocated for the reserved memory regions array will be
freed to the buddy allocator;
- the debugfs entry "sys/kernel/debug/memblock/reserved" will show
garbage instead of the current state of the memory reservations, e.g.:
0: 0x98393bc0..0x9a393bbf
1: 0xff120000..0xff11ffff
2: 0x00000000..0xffffffff

Hence, do not free the memory allocated for the reserved memory regions
array if defined(CONFIG_DEBUG_FS) && !defined(CONFIG_ARCH_DISCARD_MEMBLOCK).

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
---
mm/memblock.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/mm/memblock.c b/mm/memblock.c
index d903138..1bb2cc0 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -169,6 +169,10 @@ phys_addr_t __init_memblock get_allocated_memblock_reserved_regions_info(
if (memblock.reserved.regions == memblock_reserved_init_regions)
return 0;

+ if (IS_ENABLED(CONFIG_DEBUG_FS) &&
+ !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK))
+ return 0;
+
*addr = __pa(memblock.reserved.regions);

return PAGE_ALIGN(sizeof(struct memblock_region) *
--
1.7.9.5

2013-10-12 22:01:44

by Santosh Shilimkar

Subject: [RFC 21/23] mm/ARM: kernel: Use memblock apis for early memory allocations

Switch to the memblock interfaces for early memory allocations.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Santosh Shilimkar <[email protected]>
---
arch/arm/kernel/devtree.c | 2 +-
arch/arm/kernel/setup.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c
index f35906b..881c495 100644
--- a/arch/arm/kernel/devtree.c
+++ b/arch/arm/kernel/devtree.c
@@ -33,7 +33,7 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)

void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
{
- return alloc_bootmem_align(size, align);
+ return memblock_early_alloc_align(size, align);
}

void __init arm_dt_memblock_reserve(void)
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index e1b1394..d928500 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -707,7 +707,7 @@ static void __init request_standard_resources(const struct machine_desc *mdesc)
kernel_data.end = virt_to_phys(_end - 1);

for_each_memblock(memory, region) {
- res = alloc_bootmem_low(sizeof(*res));
+ res = memblock_early_alloc(sizeof(*res));
res->name = "System RAM";
res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
--
1.7.9.5

2013-10-12 22:01:43

by Santosh Shilimkar

Subject: [RFC 22/23] mm/ARM: mm: Use memblock apis for early memory allocations

Switch to the memblock interfaces for early memory allocations.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Santosh Shilimkar <[email protected]>
---
arch/arm/mm/init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index cef338d..091e2c9 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -414,7 +414,7 @@ free_memmap(unsigned long start_pfn, unsigned long end_pfn)
* free the section of the memmap array.
*/
if (pg < pgend)
- free_bootmem(pg, pgend - pg);
+ memblock_free_early(pg, pgend - pg);
}

/*
--
1.7.9.5

2013-10-12 22:02:26

by Santosh Shilimkar

Subject: [RFC 16/23] mm/hugetlb: Use memblock apis for early memory allocations

From: Grygorii Strashko <[email protected]>

Switch to the memblock interfaces for early memory allocations.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
---
mm/hugetlb.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b49579c..fe0cab4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1282,9 +1282,9 @@ int __weak alloc_bootmem_huge_page(struct hstate *h)
for_each_node_mask_to_alloc(h, nr_nodes, node, &node_states[N_MEMORY]) {
void *addr;

- addr = __alloc_bootmem_node_nopanic(NODE_DATA(node),
- huge_page_size(h), huge_page_size(h), 0);
-
+ addr = memblock_early_alloc_try_nid_nopanic(node,
+ huge_page_size(h), huge_page_size(h),
+ 0, BOOTMEM_ALLOC_ACCESSIBLE);
if (addr) {
/*
* Use the beginning of the huge page to store the
@@ -1324,8 +1324,8 @@ static void __init gather_bootmem_prealloc(void)

#ifdef CONFIG_HIGHMEM
page = pfn_to_page(m->phys >> PAGE_SHIFT);
- free_bootmem_late((unsigned long)m,
- sizeof(struct huge_bootmem_page));
+ memblock_free_late(__pa(m),
+ sizeof(struct huge_bootmem_page));
#else
page = virt_to_page(m);
#endif
--
1.7.9.5

2013-10-12 22:02:47

by Santosh Shilimkar

Subject: [RFC 19/23] mm/memory_hotplug: Use memblock apis for early memory allocations

Update the ensure_zone_is_initialized() function description to mention
the memblock APIs introduced for early memory allocations.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
---
mm/memory_hotplug.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f7bda5e..482255b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -267,7 +267,7 @@ static void fix_zone_id(struct zone *zone, unsigned long start_pfn,
}

/* Can fail with -ENOMEM from allocating a wait table with vmalloc() or
- * alloc_bootmem_node_nopanic() */
+ * alloc_bootmem_node_nopanic()/memblock_early_alloc_node_nopanic() */
static int __ref ensure_zone_is_initialized(struct zone *zone,
unsigned long start_pfn, unsigned long num_pages)
{
--
1.7.9.5

2013-10-12 22:02:49

by Santosh Shilimkar

Subject: [RFC 18/23] mm/percpu: Use memblock apis for early memory allocations

Switch to the memblock interfaces for early memory allocations.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Santosh Shilimkar <[email protected]>
---
mm/percpu.c | 39 +++++++++++++++++++++++----------------
1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 8c8e08f..0b2117f 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1063,7 +1063,7 @@ struct pcpu_alloc_info * __init pcpu_alloc_alloc_info(int nr_groups,
__alignof__(ai->groups[0].cpu_map[0]));
ai_size = base_size + nr_units * sizeof(ai->groups[0].cpu_map[0]);

- ptr = alloc_bootmem_nopanic(PFN_ALIGN(ai_size));
+ ptr = memblock_early_alloc_pages_nopanic(PFN_ALIGN(ai_size));
if (!ptr)
return NULL;
ai = ptr;
@@ -1088,7 +1088,7 @@ struct pcpu_alloc_info * __init pcpu_alloc_alloc_info(int nr_groups,
*/
void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai)
{
- free_bootmem(__pa(ai), ai->__ai_size);
+ memblock_free_early(__pa(ai), ai->__ai_size);
}

/**
@@ -1246,10 +1246,12 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
PCPU_SETUP_BUG_ON(pcpu_verify_alloc_info(ai) < 0);

/* process group information and build config tables accordingly */
- group_offsets = alloc_bootmem(ai->nr_groups * sizeof(group_offsets[0]));
- group_sizes = alloc_bootmem(ai->nr_groups * sizeof(group_sizes[0]));
- unit_map = alloc_bootmem(nr_cpu_ids * sizeof(unit_map[0]));
- unit_off = alloc_bootmem(nr_cpu_ids * sizeof(unit_off[0]));
+ group_offsets = memblock_early_alloc(ai->nr_groups *
+ sizeof(group_offsets[0]));
+ group_sizes = memblock_early_alloc(ai->nr_groups *
+ sizeof(group_sizes[0]));
+ unit_map = memblock_early_alloc(nr_cpu_ids * sizeof(unit_map[0]));
+ unit_off = memblock_early_alloc(nr_cpu_ids * sizeof(unit_off[0]));

for (cpu = 0; cpu < nr_cpu_ids; cpu++)
unit_map[cpu] = UINT_MAX;
@@ -1311,7 +1313,7 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
* empty chunks.
*/
pcpu_nr_slots = __pcpu_size_to_slot(pcpu_unit_size) + 2;
- pcpu_slot = alloc_bootmem(pcpu_nr_slots * sizeof(pcpu_slot[0]));
+ pcpu_slot = memblock_early_alloc(pcpu_nr_slots * sizeof(pcpu_slot[0]));
for (i = 0; i < pcpu_nr_slots; i++)
INIT_LIST_HEAD(&pcpu_slot[i]);

@@ -1322,7 +1324,7 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
* covers static area + reserved area (mostly used for module
* static percpu allocation).
*/
- schunk = alloc_bootmem(pcpu_chunk_struct_size);
+ schunk = memblock_early_alloc(pcpu_chunk_struct_size);
INIT_LIST_HEAD(&schunk->list);
schunk->base_addr = base_addr;
schunk->map = smap;
@@ -1346,7 +1348,7 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,

/* init dynamic chunk if necessary */
if (dyn_size) {
- dchunk = alloc_bootmem(pcpu_chunk_struct_size);
+ dchunk = memblock_early_alloc(pcpu_chunk_struct_size);
INIT_LIST_HEAD(&dchunk->list);
dchunk->base_addr = base_addr;
dchunk->map = dmap;
@@ -1626,7 +1628,7 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
size_sum = ai->static_size + ai->reserved_size + ai->dyn_size;
areas_size = PFN_ALIGN(ai->nr_groups * sizeof(void *));

- areas = alloc_bootmem_nopanic(areas_size);
+ areas = memblock_early_alloc_pages_nopanic(areas_size);
if (!areas) {
rc = -ENOMEM;
goto out_free;
@@ -1711,7 +1713,7 @@ out_free_areas:
out_free:
pcpu_free_alloc_info(ai);
if (areas)
- free_bootmem(__pa(areas), areas_size);
+ memblock_free_early(__pa(areas), areas_size);
return rc;
}
#endif /* BUILD_EMBED_FIRST_CHUNK */
@@ -1759,7 +1761,7 @@ int __init pcpu_page_first_chunk(size_t reserved_size,
/* unaligned allocations can't be freed, round up to page size */
pages_size = PFN_ALIGN(unit_pages * num_possible_cpus() *
sizeof(pages[0]));
- pages = alloc_bootmem(pages_size);
+ pages = memblock_early_alloc_pages(pages_size);

/* allocate pages */
j = 0;
@@ -1822,7 +1824,7 @@ enomem:
free_fn(page_address(pages[j]), PAGE_SIZE);
rc = -ENOMEM;
out_free_ar:
- free_bootmem(__pa(pages), pages_size);
+ memblock_free_early(__pa(pages), pages_size);
pcpu_free_alloc_info(ai);
return rc;
}
@@ -1847,12 +1849,14 @@ EXPORT_SYMBOL(__per_cpu_offset);
static void * __init pcpu_dfl_fc_alloc(unsigned int cpu, size_t size,
size_t align)
{
- return __alloc_bootmem_nopanic(size, align, __pa(MAX_DMA_ADDRESS));
+ return memblock_early_alloc_try_nid_nopanic(MAX_NUMNODES, size, align,
+ __pa(MAX_DMA_ADDRESS),
+ BOOTMEM_ALLOC_ACCESSIBLE);
}

static void __init pcpu_dfl_fc_free(void *ptr, size_t size)
{
- free_bootmem(__pa(ptr), size);
+ memblock_free_early(__pa(ptr), size);
}

void __init setup_per_cpu_areas(void)
@@ -1895,7 +1899,10 @@ void __init setup_per_cpu_areas(void)
void *fc;

ai = pcpu_alloc_alloc_info(1, 1);
- fc = __alloc_bootmem(unit_size, PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
+ fc = memblock_early_alloc_try_nid_nopanic(MAX_NUMNODES, unit_size,
+ PAGE_SIZE,
+ __pa(MAX_DMA_ADDRESS),
+ BOOTMEM_ALLOC_ACCESSIBLE);
if (!ai || !fc)
panic("Failed to allocate memory for percpu areas.");
/* kmemleak tracks the percpu allocations separately */
--
1.7.9.5

2013-10-12 22:00:08

by Santosh Shilimkar

Subject: [RFC 09/23] mm/init: Use memblock apis for early memory allocations

Switch to the memblock interfaces for early memory allocations.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Santosh Shilimkar <[email protected]>
---
init/main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/init/main.c b/init/main.c
index af310af..e8d382a 100644
--- a/init/main.c
+++ b/init/main.c
@@ -346,8 +346,8 @@ static inline void smp_prepare_cpus(unsigned int maxcpus) { }
*/
static void __init setup_command_line(char *command_line)
{
- saved_command_line = alloc_bootmem(strlen (boot_command_line)+1);
- static_command_line = alloc_bootmem(strlen (command_line)+1);
+ saved_command_line = memblock_early_alloc(strlen(boot_command_line)+1);
+ static_command_line = memblock_early_alloc(strlen(command_line)+1);
strcpy (saved_command_line, boot_command_line);
strcpy (static_command_line, command_line);
}
--
1.7.9.5

2013-10-12 22:03:31

by Santosh Shilimkar

Subject: [RFC 20/23] mm/firmware: Use memblock apis for early memory allocations

Switch to the memblock interfaces for early memory allocations.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
---
drivers/firmware/memmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/firmware/memmap.c b/drivers/firmware/memmap.c
index e2e04b0..fa8a789 100644
--- a/drivers/firmware/memmap.c
+++ b/drivers/firmware/memmap.c
@@ -324,7 +324,7 @@ int __init firmware_map_add_early(u64 start, u64 end, const char *type)
{
struct firmware_map_entry *entry;

- entry = alloc_bootmem(sizeof(struct firmware_map_entry));
+ entry = memblock_early_alloc(sizeof(struct firmware_map_entry));
if (WARN_ON(!entry))
return -ENOMEM;

--
1.7.9.5

2013-10-12 22:03:33

by Santosh Shilimkar

Subject: [RFC 15/23] mm/sparse: Use memblock apis for early memory allocations

Switch to the memblock interfaces for early memory allocations.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Santosh Shilimkar <[email protected]>
---
mm/sparse-vmemmap.c | 5 +++--
mm/sparse.c | 24 +++++++++++++-----------
2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 27eeab3..c1fb952 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -40,7 +40,7 @@ static void * __init_refok __earlyonly_bootmem_alloc(int node,
unsigned long align,
unsigned long goal)
{
- return __alloc_bootmem_node_high(NODE_DATA(node), size, align, goal);
+ return memblock_early_alloc_try_nid(node, size, align, goal, BOOTMEM_ALLOC_ACCESSIBLE);
}

static void *vmemmap_buf;
@@ -226,7 +226,8 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,

if (vmemmap_buf_start) {
/* need to free left buf */
- free_bootmem(__pa(vmemmap_buf), vmemmap_buf_end - vmemmap_buf);
+ memblock_free_early(__pa(vmemmap_buf),
+ vmemmap_buf_end - vmemmap_buf);
vmemmap_buf = NULL;
vmemmap_buf_end = NULL;
}
diff --git a/mm/sparse.c b/mm/sparse.c
index 4ac1d7e..1e06a60 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -69,7 +69,7 @@ static struct mem_section noinline __init_refok *sparse_index_alloc(int nid)
else
section = kzalloc(array_size, GFP_KERNEL);
} else {
- section = alloc_bootmem_node(NODE_DATA(nid), array_size);
+ section = memblock_early_alloc_node(nid, array_size);
}

return section;
@@ -279,7 +279,7 @@ sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
limit = goal + (1UL << PA_SECTION_SHIFT);
nid = early_pfn_to_nid(goal >> PAGE_SHIFT);
again:
- p = ___alloc_bootmem_node_nopanic(NODE_DATA(nid), size,
+ p = memblock_early_alloc_try_nid_nopanic(nid, size,
SMP_CACHE_BYTES, goal, limit);
if (!p && limit) {
limit = 0;
@@ -331,7 +331,7 @@ static unsigned long * __init
sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat,
unsigned long size)
{
- return alloc_bootmem_node_nopanic(pgdat, size);
+ return memblock_early_alloc_node_nopanic(pgdat->node_id, size);
}

static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
@@ -376,8 +376,9 @@ struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid)
return map;

size = PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
- map = __alloc_bootmem_node_high(NODE_DATA(nid), size,
- PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
+ map = memblock_early_alloc_try_nid(nid, size,
+ PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
+ BOOTMEM_ALLOC_ACCESSIBLE);
return map;
}
void __init sparse_mem_maps_populate_node(struct page **map_map,
@@ -401,8 +402,9 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
}

size = PAGE_ALIGN(size);
- map = __alloc_bootmem_node_high(NODE_DATA(nodeid), size * map_count,
- PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
+ map = memblock_early_alloc_try_nid(nodeid, size * map_count,
+ PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
+ BOOTMEM_ALLOC_ACCESSIBLE);
if (map) {
for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
if (!present_section_nr(pnum))
@@ -545,7 +547,7 @@ void __init sparse_init(void)
* sparse_early_mem_map_alloc, so allocate usemap_map at first.
*/
size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
- usemap_map = alloc_bootmem(size);
+ usemap_map = memblock_early_alloc(size);
if (!usemap_map)
panic("can not allocate usemap_map\n");
alloc_usemap_and_memmap(sparse_early_usemaps_alloc_node,
@@ -553,7 +555,7 @@ void __init sparse_init(void)

#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
- map_map = alloc_bootmem(size2);
+ map_map = memblock_early_alloc(size2);
if (!map_map)
panic("can not allocate map_map\n");
alloc_usemap_and_memmap(sparse_early_mem_maps_alloc_node,
@@ -583,9 +585,9 @@ void __init sparse_init(void)
vmemmap_populate_print_last();

#ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
- free_bootmem(__pa(map_map), size2);
+ memblock_free_early(__pa(map_map), size2);
#endif
- free_bootmem(__pa(usemap_map), size);
+ memblock_free_early(__pa(usemap_map), size);
}

#ifdef CONFIG_MEMORY_HOTPLUG
--
1.7.9.5

2013-10-12 22:03:59

by Santosh Shilimkar

Subject: [RFC 17/23] mm/page_cgroup: Use memblock apis for early memory allocations

From: Grygorii Strashko <[email protected]>

Switch to memblock interfaces for the early memory allocator.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
---
mm/page_cgroup.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 6d757e3..7428f4c 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -54,8 +54,9 @@ static int __init alloc_node_page_cgroup(int nid)

table_size = sizeof(struct page_cgroup) * nr_pages;

- base = __alloc_bootmem_node_nopanic(NODE_DATA(nid),
- table_size, PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
+ base = memblock_early_alloc_try_nid_nopanic(nid,
+ table_size, PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
+ BOOTMEM_ALLOC_ACCESSIBLE);
if (!base)
return -ENOMEM;
NODE_DATA(nid)->node_page_cgroup = base;
--
1.7.9.5

2013-10-12 22:00:06

by Santosh Shilimkar

Subject: [RFC 06/23] mm/memblock: Add memblock early memory allocation apis

Introduce memblock early memory allocation APIs which allow supporting
the LPAE extension on 32-bit archs. Moreover, this is the next step
toward getting rid of the NO_BOOTMEM memblock wrapper (nobootmem.c) and
using memblock APIs directly.

The proposed interface will become active only if both CONFIG_HAVE_MEMBLOCK
and CONFIG_NO_BOOTMEM are specified by the arch. In the !CONFIG_NO_BOOTMEM
case, the memblock() wrappers fall back to the existing bootmem APIs so
that archs not converted to NO_BOOTMEM continue to work as is.

The meaning of MEMBLOCK_ALLOC_ACCESSIBLE and MEMBLOCK_ALLOC_ANYWHERE is
kept the same.

TODO: For now the free_all_bootmem() function is used as is from the
NO_BOOTMEM allocator. It can be moved to the memblock file once
nobootmem.c is removed.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
---
include/linux/bootmem.h | 72 ++++++++++++++++++++++++++++++
mm/memblock.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 186 insertions(+)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index 55d52fb..33b27bb 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -141,6 +141,78 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
#define alloc_bootmem_low_pages_node(pgdat, x) \
__alloc_bootmem_low_node(pgdat, x, PAGE_SIZE, 0)

+
+#if defined(CONFIG_HAVE_MEMBLOCK) && defined(CONFIG_NO_BOOTMEM)
+
+/* FIXME: use MEMBLOCK_ALLOC_* variants here */
+#define BOOTMEM_ALLOC_ACCESSIBLE 0
+#define BOOTMEM_ALLOC_ANYWHERE (~(phys_addr_t)0)
+
+/* FIXME: Move to memblock.h at a point where we remove nobootmem.c */
+void *memblock_early_alloc_try_nid_nopanic(int nid, phys_addr_t size,
+ phys_addr_t align, phys_addr_t from, phys_addr_t max_addr);
+void *memblock_early_alloc_try_nid(int nid, phys_addr_t size,
+ phys_addr_t align, phys_addr_t from, phys_addr_t max_addr);
+void __memblock_free_early(phys_addr_t base, phys_addr_t size);
+void __memblock_free_late(phys_addr_t base, phys_addr_t size);
+
+#define memblock_early_alloc(x) \
+ memblock_early_alloc_try_nid(MAX_NUMNODES, x, SMP_CACHE_BYTES, \
+ BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
+#define memblock_early_alloc_align(x, align) \
+ memblock_early_alloc_try_nid(MAX_NUMNODES, x, align, \
+ BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
+#define memblock_early_alloc_nopanic(x) \
+ memblock_early_alloc_try_nid_nopanic(MAX_NUMNODES, x, SMP_CACHE_BYTES, \
+ BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
+#define memblock_early_alloc_pages(x) \
+ memblock_early_alloc_try_nid(MAX_NUMNODES, x, PAGE_SIZE, \
+ BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
+#define memblock_early_alloc_pages_nopanic(x) \
+ memblock_early_alloc_try_nid_nopanic(MAX_NUMNODES, x, PAGE_SIZE, \
+ BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
+#define memblock_early_alloc_node(nid, x) \
+ memblock_early_alloc_try_nid(nid, x, SMP_CACHE_BYTES, \
+ BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
+#define memblock_early_alloc_node_nopanic(nid, x) \
+ memblock_early_alloc_try_nid_nopanic(nid, x, SMP_CACHE_BYTES, \
+ BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
+
+#define memblock_free_early(x, s) __memblock_free_early(x, s)
+#define memblock_free_early_nid(nid, x, s) __memblock_free_early(x, s)
+#define memblock_free_late(x, s) __memblock_free_late(x, s)
+
+#else
+
+/* Fall back to all the existing bootmem APIs */
+#define memblock_early_alloc(x) \
+ __alloc_bootmem(x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT)
+#define memblock_early_alloc_align(x, align) \
+ __alloc_bootmem(x, align, BOOTMEM_LOW_LIMIT)
+#define memblock_early_alloc_nopanic(x) \
+ __alloc_bootmem_nopanic(x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT)
+#define memblock_early_alloc_pages(x) \
+ __alloc_bootmem(x, PAGE_SIZE, BOOTMEM_LOW_LIMIT)
+#define memblock_early_alloc_pages_nopanic(x) \
+ __alloc_bootmem_nopanic(x, PAGE_SIZE, BOOTMEM_LOW_LIMIT)
+#define memblock_early_alloc_node(nid, x) \
+ __alloc_bootmem_node(NODE_DATA(nid), x, SMP_CACHE_BYTES, \
+ BOOTMEM_LOW_LIMIT)
+#define memblock_early_alloc_node_nopanic(nid, x) \
+ __alloc_bootmem_node_nopanic(NODE_DATA(nid), x, SMP_CACHE_BYTES, \
+ BOOTMEM_LOW_LIMIT)
+#define memblock_early_alloc_try_nid(nid, size, align, from, max_addr) \
+ __alloc_bootmem_node_high(NODE_DATA(nid), size, align, from)
+#define memblock_early_alloc_try_nid_nopanic(nid, size, align, from, max_addr) \
+ ___alloc_bootmem_node_nopanic(NODE_DATA(nid), size, align, \
+ from, max_addr)
+#define memblock_free_early(x, s) free_bootmem(x, s)
+#define memblock_free_early_nid(nid, x, s) \
+ free_bootmem_node(NODE_DATA(nid), x, s)
+#define memblock_free_late(x, s) free_bootmem_late(x, s)
+
+#endif /* defined(CONFIG_HAVE_MEMBLOCK) && defined(CONFIG_NO_BOOTMEM) */
+
#ifdef CONFIG_HAVE_ARCH_ALLOC_REMAP
extern void *alloc_remap(int nid, unsigned long size);
#else
diff --git a/mm/memblock.c b/mm/memblock.c
index 0ac412a..c67f4bb 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -20,6 +20,8 @@
#include <linux/seq_file.h>
#include <linux/memblock.h>

+#include "internal.h"
+
static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;

@@ -822,6 +824,118 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
}

+static void * __init _memblock_early_alloc_try_nid_nopanic(int nid,
+ phys_addr_t size, phys_addr_t align,
+ phys_addr_t from, phys_addr_t max_addr)
+{
+ phys_addr_t alloc;
+ void *ptr;
+
+ if (WARN_ON_ONCE(slab_is_available())) {
+ if (nid == MAX_NUMNODES)
+ return kzalloc(size, GFP_NOWAIT);
+ else
+ return kzalloc_node(size, GFP_NOWAIT, nid);
+ }
+
+ if (WARN_ON(!align))
+ align = __alignof__(long long);
+
+ /* align @size to avoid excessive fragmentation on reserved array */
+ size = round_up(size, align);
+
+again:
+ alloc = memblock_find_in_range_node(from, max_addr, size, align, nid);
+ if (alloc)
+ goto done;
+
+ if (nid != MAX_NUMNODES) {
+ alloc =
+ memblock_find_in_range_node(from, max_addr, size,
+ align, MAX_NUMNODES);
+ if (alloc)
+ goto done;
+ }
+
+ if (from) {
+ from = 0;
+ goto again;
+ } else {
+ goto error;
+ }
+
+done:
+ memblock_reserve(alloc, size);
+ ptr = phys_to_virt(alloc);
+ memset(ptr, 0, size);
+
+ /*
+ * The min_count is set to 0 so that bootmem allocated blocks
+ * are never reported as leaks.
+ */
+ kmemleak_alloc(ptr, size, 0, 0);
+
+ return ptr;
+
+error:
+ return NULL;
+}
+
+void * __init memblock_early_alloc_try_nid_nopanic(int nid,
+ phys_addr_t size, phys_addr_t align,
+ phys_addr_t from, phys_addr_t max_addr)
+{
+ memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx %pF\n",
+ __func__, (u64)size, (u64)align, nid, (u64)from,
+ (u64)max_addr, (void *)_RET_IP_);
+ return _memblock_early_alloc_try_nid_nopanic(nid, size,
+ align, from, max_addr);
+}
+
+void * __init memblock_early_alloc_try_nid(int nid,
+ phys_addr_t size, phys_addr_t align,
+ phys_addr_t from, phys_addr_t max_addr)
+{
+ void *ptr;
+
+ memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx %pF\n",
+ __func__, (u64)size, (u64)align, nid, (u64)from,
+ (u64)max_addr, (void *)_RET_IP_);
+ ptr = _memblock_early_alloc_try_nid_nopanic(nid, size,
+ align, from, max_addr);
+ if (ptr)
+ return ptr;
+
+ panic("%s: Failed to allocate %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx\n",
+ __func__, (u64)size, (u64)align, nid, (u64)from, (u64)max_addr);
+ return NULL;
+}
+
+void __init __memblock_free_early(phys_addr_t base, phys_addr_t size)
+{
+ memblock_dbg("%s: [%#016llx-%#016llx] %pF\n",
+ __func__, (u64)base, (u64)base + size,
+ (void *)_RET_IP_);
+ kmemleak_free_part(__va(base), size);
+ __memblock_remove(&memblock.reserved, base, size);
+}
+
+void __init __memblock_free_late(phys_addr_t base, phys_addr_t size)
+{
+ u64 cursor, end;
+
+ memblock_dbg("%s: [%#016llx-%#016llx] %pF\n",
+ __func__, (u64)base, (u64)base + size,
+ (void *)_RET_IP_);
+ kmemleak_free_part(__va(base), size);
+ cursor = PFN_UP(base);
+ end = PFN_DOWN(base + size);
+
+ for (; cursor < end; cursor++) {
+ __free_pages_bootmem(pfn_to_page(cursor), 0);
+ totalram_pages++;
+ }
+}

/*
* Remaining API functions
--
1.7.9.5

2013-10-12 22:00:04

by Santosh Shilimkar

Subject: [RFC 02/23] mm/block: remove unnecessary inclusion of bootmem.h

From: Grygorii Strashko <[email protected]>

Clean-up to remove the dependency on bootmem headers.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
---
block/blk-ioc.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 46cd7bd..242df01 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -6,7 +6,6 @@
#include <linux/init.h>
#include <linux/bio.h>
#include <linux/blkdev.h>
-#include <linux/bootmem.h> /* for max_pfn/max_low_pfn */
#include <linux/slab.h>

#include "blk.h"
--
1.7.9.5

2013-10-12 22:04:41

by Santosh Shilimkar

Subject: [RFC 13/23] mm/lib: Use memblock apis for early memory allocations

Switch to memblock interfaces for the early memory allocator.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Santosh Shilimkar <[email protected]>
---
lib/swiotlb.c | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 4e8686c..504de6d 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -169,7 +169,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
/*
* Get the overflow emergency buffer
*/
- v_overflow_buffer = alloc_bootmem_low_pages_nopanic(
+ v_overflow_buffer = memblock_early_alloc_pages_nopanic(
PAGE_ALIGN(io_tlb_overflow));
if (!v_overflow_buffer)
return -ENOMEM;
@@ -181,11 +181,13 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
* to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
* between io_tlb_start and io_tlb_end.
*/
- io_tlb_list = alloc_bootmem_pages(PAGE_ALIGN(io_tlb_nslabs * sizeof(int)));
+ io_tlb_list = memblock_early_alloc_pages(
+ PAGE_ALIGN(io_tlb_nslabs * sizeof(int)));
for (i = 0; i < io_tlb_nslabs; i++)
io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
io_tlb_index = 0;
- io_tlb_orig_addr = alloc_bootmem_pages(PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t)));
+ io_tlb_orig_addr = memblock_early_alloc_pages(
+ PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t)));

if (verbose)
swiotlb_print_info();
@@ -212,13 +214,13 @@ swiotlb_init(int verbose)
bytes = io_tlb_nslabs << IO_TLB_SHIFT;

/* Get IO TLB memory from the low pages */
- vstart = alloc_bootmem_low_pages_nopanic(PAGE_ALIGN(bytes));
+ vstart = memblock_early_alloc_pages_nopanic(PAGE_ALIGN(bytes));
if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
return;

if (io_tlb_start)
- free_bootmem(io_tlb_start,
- PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
+ memblock_free_early(io_tlb_start,
+ PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
pr_warn("Cannot allocate SWIOTLB buffer");
no_iotlb_memory = true;
}
@@ -354,14 +356,14 @@ void __init swiotlb_free(void)
free_pages((unsigned long)phys_to_virt(io_tlb_start),
get_order(io_tlb_nslabs << IO_TLB_SHIFT));
} else {
- free_bootmem_late(io_tlb_overflow_buffer,
- PAGE_ALIGN(io_tlb_overflow));
- free_bootmem_late(__pa(io_tlb_orig_addr),
- PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t)));
- free_bootmem_late(__pa(io_tlb_list),
- PAGE_ALIGN(io_tlb_nslabs * sizeof(int)));
- free_bootmem_late(io_tlb_start,
- PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
+ memblock_free_late(io_tlb_overflow_buffer,
+ PAGE_ALIGN(io_tlb_overflow));
+ memblock_free_late(__pa(io_tlb_orig_addr),
+ PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t)));
+ memblock_free_late(__pa(io_tlb_list),
+ PAGE_ALIGN(io_tlb_nslabs * sizeof(int)));
+ memblock_free_late(io_tlb_start,
+ PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
}
io_tlb_nslabs = 0;
}
--
1.7.9.5

2013-10-12 22:04:58

by Santosh Shilimkar

Subject: [RFC 12/23] mm/power: Use memblock apis for early memory allocations

Switch to memblock interfaces for the early memory allocator.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Santosh Shilimkar <[email protected]>
---
kernel/power/snapshot.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 358a146..26cbb4c 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -637,7 +637,7 @@ __register_nosave_region(unsigned long start_pfn, unsigned long end_pfn,
BUG_ON(!region);
} else
/* This allocation cannot fail */
- region = alloc_bootmem(sizeof(struct nosave_region));
+ region = memblock_early_alloc(sizeof(struct nosave_region));
region->start_pfn = start_pfn;
region->end_pfn = end_pfn;
list_add_tail(&region->list, &nosave_regions);
--
1.7.9.5

2013-10-12 22:00:01

by Santosh Shilimkar

Subject: [RFC 03/23] mm/memory_hotplug: remove unnecessary inclusion of bootmem.h

From: Grygorii Strashko <[email protected]>

Clean-up to remove the dependency on bootmem headers.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
---
mm/memory_hotplug.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ed85fe3..f7bda5e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -9,7 +9,6 @@
#include <linux/swap.h>
#include <linux/interrupt.h>
#include <linux/pagemap.h>
-#include <linux/bootmem.h>
#include <linux/compiler.h>
#include <linux/export.h>
#include <linux/pagevec.h>
--
1.7.9.5

2013-10-12 21:59:39

by Santosh Shilimkar

Subject: [RFC 05/23] mm/char: remove unnecessary inclusion of bootmem.h

From: Grygorii Strashko <[email protected]>

Clean-up to remove the dependency on bootmem headers.

Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
---
drivers/char/mem.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index f895a8c..92c5937 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -22,7 +22,6 @@
#include <linux/device.h>
#include <linux/highmem.h>
#include <linux/backing-dev.h>
-#include <linux/bootmem.h>
#include <linux/splice.h>
#include <linux/pfn.h>
#include <linux/export.h>
--
1.7.9.5

2013-10-13 17:56:56

by Tejun Heo

Subject: Re: [RFC 06/23] mm/memblock: Add memblock early memory allocation apis

Hello,

On Sat, Oct 12, 2013 at 05:58:49PM -0400, Santosh Shilimkar wrote:
> Introduce memblock early memory allocation APIs which allow to support
> LPAE extension on 32 bits archs. More over, this is the next step

LPAE isn't something people outside arm circle would understand.
Let's stick to highmem.

> to get rid of NO_BOOTMEM memblock wrapper(nobootmem.c) and directly use
> memblock APIs.
>
> The proposed interface will became active if both CONFIG_HAVE_MEMBLOCK
> and CONFIG_NO_BOOTMEM are specified by arch. In case !CONFIG_NO_BOOTMEM,
> the memblock() wrappers will fallback to the existing bootmem apis so
> that arch's noy converted to NO_BOOTMEM continue to work as is.
^^^
typo

> +/* FIXME: Move to memblock.h at a point where we remove nobootmem.c */
> +void *memblock_early_alloc_try_nid_nopanic(int nid, phys_addr_t size,
> + phys_addr_t align, phys_addr_t from, phys_addr_t max_addr);
> +void *memblock_early_alloc_try_nid(int nid, phys_addr_t size,
> + phys_addr_t align, phys_addr_t from, phys_addr_t max_addr);

Wouldn't it make more sense to put @nid at the end. @size is the main
parameter here and it gets confusing with _alloc_node() interface as
the positions of parameters change. Plus, kmalloc_node() puts @node at
the end too.

> +void __memblock_free_early(phys_addr_t base, phys_addr_t size);
> +void __memblock_free_late(phys_addr_t base, phys_addr_t size);

Would it be possible to drop "early"? It's redundant and makes the
function names unnecessarily long. When memblock is enabled, these
are basically doing about the same thing as memblock_alloc() and
friends, right? Wouldn't it make more sense to define these as
memblock_alloc_XXX()?

> +#define memblock_early_alloc(x) \
> + memblock_early_alloc_try_nid(MAX_NUMNODES, x, SMP_CACHE_BYTES, \
> + BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
> +#define memblock_early_alloc_align(x, align) \
> + memblock_early_alloc_try_nid(MAX_NUMNODES, x, align, \
> + BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
> +#define memblock_early_alloc_nopanic(x) \
> + memblock_early_alloc_try_nid_nopanic(MAX_NUMNODES, x, SMP_CACHE_BYTES, \
> + BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
> +#define memblock_early_alloc_pages(x) \
> + memblock_early_alloc_try_nid(MAX_NUMNODES, x, PAGE_SIZE, \
> + BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
> +#define memblock_early_alloc_pages_nopanic(x) \
> + memblock_early_alloc_try_nid_nopanic(MAX_NUMNODES, x, PAGE_SIZE, \
> + BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)

I've always felt a bit weird about the _pages() interface. It says
pages but takes bytes in size. Maybe we're better off just converting
the current _pages() users to _alloc_align()?

> +#define memblock_early_alloc_node(nid, x) \
> + memblock_early_alloc_try_nid(nid, x, SMP_CACHE_BYTES, \
> + BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
> +#define memblock_early_alloc_node_nopanic(nid, x) \
> + memblock_early_alloc_try_nid_nopanic(nid, x, SMP_CACHE_BYTES, \
> + BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)

Ditto as above. Maybe @nid can be moved to the end?

> +static void * __init _memblock_early_alloc_try_nid_nopanic(int nid,
> + phys_addr_t size, phys_addr_t align,
> + phys_addr_t from, phys_addr_t max_addr)
> +{
> + phys_addr_t alloc;
> + void *ptr;
> +
> + if (WARN_ON_ONCE(slab_is_available())) {
> + if (nid == MAX_NUMNODES)

Shouldn't we be using NUMA_NO_NODE?

> + return kzalloc(size, GFP_NOWAIT);
> + else
> + return kzalloc_node(size, GFP_NOWAIT, nid);

And kzalloc_node() understands NUMA_NO_NODE.

> + }
> +
> + if (WARN_ON(!align))
> + align = __alignof__(long long);

Wouldn't SMP_CACHE_BYTES make more sense? Also, I'm not sure we
actually want WARN on it. Interpreting 0 as "default align" isn't
that weird.

> + /* align @size to avoid excessive fragmentation on reserved array */
> + size = round_up(size, align);
> +
> +again:
> + alloc = memblock_find_in_range_node(from, max_addr, size, align, nid);
> + if (alloc)
> + goto done;
> +
> + if (nid != MAX_NUMNODES) {
> + alloc =
> + memblock_find_in_range_node(from, max_addr, size,
> + align, MAX_NUMNODES);
> + if (alloc)
> + goto done;
> + }
> +
> + if (from) {
> + from = 0;
> + goto again;
> + } else {
> + goto error;
> + }
> +
> +done:
> + memblock_reserve(alloc, size);
> + ptr = phys_to_virt(alloc);
> + memset(ptr, 0, size);

What if the address is high? Don't we need kmapping here?

> +
> + /*
> + * The min_count is set to 0 so that bootmem allocated blocks
> + * are never reported as leaks.
> + */
> + kmemleak_alloc(ptr, size, 0, 0);
> +
> + return ptr;
> +
> +error:
> + return NULL;
> +}
> +
> +void * __init memblock_early_alloc_try_nid_nopanic(int nid,
> + phys_addr_t size, phys_addr_t align,
> + phys_addr_t from, phys_addr_t max_addr)
> +{
> + memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx %pF\n",
> + __func__, (u64)size, (u64)align, nid, (u64)from,
> + (u64)max_addr, (void *)_RET_IP_);
> + return _memblock_early_alloc_try_nid_nopanic(nid, size,
> + align, from, max_addr);

Do we need the extra level of wrapping? Just implement
alloc_try_nid_nopanic() here and make the panicky version call it?

Thanks.

--
tejun

2013-10-13 18:01:22

by Russell King - ARM Linux

Subject: Re: [RFC 06/23] mm/memblock: Add memblock early memory allocation apis

On Sun, Oct 13, 2013 at 01:56:48PM -0400, Tejun Heo wrote:
> Hello,
>
> On Sat, Oct 12, 2013 at 05:58:49PM -0400, Santosh Shilimkar wrote:
> > Introduce memblock early memory allocation APIs which allow to support
> > LPAE extension on 32 bits archs. More over, this is the next step
>
> LPAE isn't something people outside arm circle would understand.
> Let's stick to highmem.

LPAE != highmem. Two totally different things, unless you believe
system memory always starts at physical address zero, which is very
far from the case on the majority of ARM platforms.

So replacing LPAE with "highmem" is pure misrepresentation and is
inaccurate. PAE might be a better term, and is also the x86 term
for this.

2013-10-13 18:02:33

by Tejun Heo

Subject: Re: [RFC 07/23] mm/memblock: debug: correct displaying of upper memory boundary

On Sat, Oct 12, 2013 at 05:58:50PM -0400, Santosh Shilimkar wrote:
> From: Grygorii Strashko <[email protected]>
>
> When debugging is enabled (cmdline has "memblock=debug") the memblock
> will display upper memory boundary per each allocated/freed memory range
> wrongly. For example:
> memblock_reserve: [0x0000009e7e8000-0x0000009e7ed000] _memblock_early_alloc_try_nid_nopanic+0xfc/0x12c
>
> The 0x0000009e7ed000 is displayed instead of 0x0000009e7ecfff
>
> Hence, correct this by changing formula used to calculate upper memory
> boundary to (u64)base + size - 1 instead of (u64)base + size everywhere
> in the debug messages.

I kinda prefer base + size because it's easier to actually know the
size, but yeah, it should have been [base, base + size), and other
places use the base + size - 1 notation, so it probably is better to
stick to that. Maybe move this one to the beginning of the series?

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2013-10-13 18:42:17

by Tejun Heo

Subject: Re: [RFC 06/23] mm/memblock: Add memblock early memory allocation apis

On Sun, Oct 13, 2013 at 07:00:59PM +0100, Russell King - ARM Linux wrote:
> On Sun, Oct 13, 2013 at 01:56:48PM -0400, Tejun Heo wrote:
> > Hello,
> >
> > On Sat, Oct 12, 2013 at 05:58:49PM -0400, Santosh Shilimkar wrote:
> > > Introduce memblock early memory allocation APIs which allow to support
> > > LPAE extension on 32 bits archs. More over, this is the next step
> >
> > LPAE isn't something people outside arm circle would understand.
> > Let's stick to highmem.
>
> LPAE != highmem. Two totally different things, unless you believe
> system memory always starts at physical address zero, which is very
> far from the case on the majority of ARM platforms.
>
> So replacing LPAE with "highmem" is pure misrepresentation and is
> inaccurate. PAE might be a better term, and is also the x86 term
> for this.

Ah, right, forgot about the base address. Let's please spell out the
requirements then. Briefly explaining both aspects (non-zero base
addr & highmem) and why the existing bootmem-based interfaces can't
serve them would be helpful to later readers.

Thanks.

--
tejun

2013-10-13 19:51:16

by Tejun Heo

Subject: Re: [RFC 08/23] mm/memblock: debug: don't free reserved array if !ARCH_DISCARD_MEMBLOCK

On Sat, Oct 12, 2013 at 05:58:51PM -0400, Santosh Shilimkar wrote:
> From: Grygorii Strashko <[email protected]>
>
> Now the Nobootmem allocator will always try to free memory allocated for
> reserved memory regions (free_low_memory_core_early()) without taking
> into to account current memblock debugging configuration
> (CONFIG_ARCH_DISCARD_MEMBLOCK and CONFIG_DEBUG_FS state).
> As result if:
> - CONFIG_DEBUG_FS defined
> - CONFIG_ARCH_DISCARD_MEMBLOCK not defined;
> - reserved memory regions array have been resized during boot
>
> then:
> - memory allocated for reserved memory regions array will be freed to
> buddy allocator;
> - debug_fs entry "sys/kernel/debug/memblock/reserved" will show garbage
> instead of state of memory reservations. like:
> 0: 0x98393bc0..0x9a393bbf
> 1: 0xff120000..0xff11ffff
> 2: 0x00000000..0xffffffff
>
> Hence, do not free memory allocated for reserved memory regions if
> defined(CONFIG_DEBUG_FS) && !defined(CONFIG_ARCH_DISCARD_MEMBLOCK).
>
> Cc: Yinghai Lu <[email protected]>
> Cc: Tejun Heo <[email protected]>
> Cc: Andrew Morton <[email protected]>
>
> Signed-off-by: Grygorii Strashko <[email protected]>
> Signed-off-by: Santosh Shilimkar <[email protected]>
> ---
> mm/memblock.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index d903138..1bb2cc0 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -169,6 +169,10 @@ phys_addr_t __init_memblock get_allocated_memblock_reserved_regions_info(
> if (memblock.reserved.regions == memblock_reserved_init_regions)
> return 0;
>

Please add comment explaining why the following test exists. It's
pretty difficult to deduce the reason only from the code.

> + if (IS_ENABLED(CONFIG_DEBUG_FS) &&
> + !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK))
> + return 0;
> +

Also, as this is another fix patch, can you please move this to the
head of the series?

Thanks.

--
tejun

2013-10-13 19:54:11

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC 09/23] mm/init: Use memblock apis for early memory allocations

On Sat, Oct 12, 2013 at 05:58:52PM -0400, Santosh Shilimkar wrote:
> Switch to memblock interfaces for early memory allocator

When posting actual (non-RFC) patches later, please cc the maintainers
of the target subsystem and briefly explain why the new interface is
needed and that this doesn't change visible behavior.

Thanks.

--
tejun

2013-10-14 13:49:30

by Santosh Shilimkar

Subject: Re: [RFC 06/23] mm/memblock: Add memblock early memory allocation apis

On Sunday 13 October 2013 02:42 PM, Tejun Heo wrote:
> On Sun, Oct 13, 2013 at 07:00:59PM +0100, Russell King - ARM Linux wrote:
>> On Sun, Oct 13, 2013 at 01:56:48PM -0400, Tejun Heo wrote:
>>> Hello,
>>>
>>> On Sat, Oct 12, 2013 at 05:58:49PM -0400, Santosh Shilimkar wrote:
>>>> Introduce memblock early memory allocation APIs which allow to support
>>>> LPAE extension on 32 bits archs. More over, this is the next step
>>>
>>> LPAE isn't something people outside arm circle would understand.
>>> Let's stick to highmem.
>>
>> LPAE != highmem. Two totally different things, unless you believe
>> system memory always starts at physical address zero, which is very
>> far from the case on the majority of ARM platforms.
>>
Thanks, Russell, for the clarification.

>> So replacing LPAE with "highmem" is pure misrepresentation and is
>> inaccurate. PAE might be a better term, and is also the x86 term
>> for this.
>
> Ah, right, forgot about the base address. Let's please spell out the
> requirements then. Briefly explaining both aspects (non-zero base
> addr & highmem) and why the existing bootmem based interfaced can't
> serve them would be helpful to later readers.
>
OK. Will try to describe it a bit more in the next version. The cover
letter had some of the information on the requirements, which I will
also mention in the patch commit message in the next version.

Regards,
Santosh

2013-10-14 14:40:26

by Santosh Shilimkar

Subject: Re: [RFC 06/23] mm/memblock: Add memblock early memory allocation apis

On Sunday 13 October 2013 01:56 PM, Tejun Heo wrote:
> Hello,
>
> On Sat, Oct 12, 2013 at 05:58:49PM -0400, Santosh Shilimkar wrote:
>> Introduce memblock early memory allocation APIs which allow to support
>> LPAE extension on 32 bits archs. More over, this is the next step
>

[..]

>> +/* FIXME: Move to memblock.h at a point where we remove nobootmem.c */
>> +void *memblock_early_alloc_try_nid_nopanic(int nid, phys_addr_t size,
>> + phys_addr_t align, phys_addr_t from, phys_addr_t max_addr);
>> +void *memblock_early_alloc_try_nid(int nid, phys_addr_t size,
>> + phys_addr_t align, phys_addr_t from, phys_addr_t max_addr);
>
> Wouldn't it make more sense to put @nid at the end. @size is the main
> parameter here and it gets confusing with _alloc_node() interface as
> the positions of paramters change. Plus, kmalloc_node() puts @node at
> the end too.
>
OK. Will make @nid the last parameter.

>> +void __memblock_free_early(phys_addr_t base, phys_addr_t size);
>> +void __memblock_free_late(phys_addr_t base, phys_addr_t size);
>
> Would it be possible to drop "early"? It's redundant and makes the
> function names unnecessarily long. When memblock is enabled, these
> are basically doing about the same thing as memblock_alloc() and
> friends, right? Wouldn't it make more sense to define these as
> memblock_alloc_XXX()?
>
A small difference w.r.t. the existing memblock_alloc(): these new
exports return virtually mapped memory pointers. Actually, I started
with memblock_alloc_xxx(), but memblock already exports memblock_alloc_xx()
returning a physical memory pointer. So I just wanted to make these
interfaces distinct and added "early". But I agree with you that the
'early' can be dropped. Will fix it.

>> +#define memblock_early_alloc(x) \
>> + memblock_early_alloc_try_nid(MAX_NUMNODES, x, SMP_CACHE_BYTES, \
>> + BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
>> +#define memblock_early_alloc_align(x, align) \
>> + memblock_early_alloc_try_nid(MAX_NUMNODES, x, align, \
>> + BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
>> +#define memblock_early_alloc_nopanic(x) \
>> + memblock_early_alloc_try_nid_nopanic(MAX_NUMNODES, x, SMP_CACHE_BYTES, \
>> + BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
>> +#define memblock_early_alloc_pages(x) \
>> + memblock_early_alloc_try_nid(MAX_NUMNODES, x, PAGE_SIZE, \
>> + BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
>> +#define memblock_early_alloc_pages_nopanic(x) \
>> + memblock_early_alloc_try_nid_nopanic(MAX_NUMNODES, x, PAGE_SIZE, \
>> + BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
>
> I always felt a bit weird about _pages() interface. It says pages but
> takes bytes in size. Maybe we're better off just converting the
> current _pages users to _alloc_align()?
>
I thought the pages interfaces were more for requesting memory
allocations that are page aligned. So yes, we could convert
these users to use the align interfaces.


>> +#define memblock_early_alloc_node(nid, x) \
>> + memblock_early_alloc_try_nid(nid, x, SMP_CACHE_BYTES, \
>> + BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
>> +#define memblock_early_alloc_node_nopanic(nid, x) \
>> + memblock_early_alloc_try_nid_nopanic(nid, x, SMP_CACHE_BYTES, \
>> + BOOTMEM_LOW_LIMIT, BOOTMEM_ALLOC_ACCESSIBLE)
>
> Ditto as above. Maybe @nid can be moved to the end?
>
ok

>> +static void * __init _memblock_early_alloc_try_nid_nopanic(int nid,
>> + phys_addr_t size, phys_addr_t align,
>> + phys_addr_t from, phys_addr_t max_addr)
>> +{
>> + phys_addr_t alloc;
>> + void *ptr;
>> +
>> + if (WARN_ON_ONCE(slab_is_available())) {
>> + if (nid == MAX_NUMNODES)
>
> Shouldn't we be using NUMA_NO_NODE?
>
>> + return kzalloc(size, GFP_NOWAIT);
>> + else
>> + return kzalloc_node(size, GFP_NOWAIT, nid);
>
> And kzalloc_node() understands NUMA_NO_NODE.
>
Will try this out.

>> + }
>> +
>> + if (WARN_ON(!align))
>> + align = __alignof__(long long);
>
> Wouldn't SMP_CACHE_BYTES make more sense? Also, I'm not sure we
> actually want WARN on it. Interpreting 0 as "default align" isn't
> that weird.
>
Will drop that WARN and use SMP_CACHE_BYTES as a default.


>> + /* align @size to avoid excessive fragmentation on reserved array */
>> + size = round_up(size, align);
>> +
>> +again:
>> + alloc = memblock_find_in_range_node(from, max_addr, size, align, nid);
>> + if (alloc)
>> + goto done;
>> +
>> + if (nid != MAX_NUMNODES) {
>> + alloc =
>> + memblock_find_in_range_node(from, max_addr, size,
>> + align, MAX_NUMNODES);
>> + if (alloc)
>> + goto done;
>> + }
>> +
>> + if (from) {
>> + from = 0;
>> + goto again;
>> + } else {
>> + goto error;
>> + }
>> +
>> +done:
>> + memblock_reserve(alloc, size);
>> + ptr = phys_to_virt(alloc);
>> + memset(ptr, 0, size);
>
> What if the address is high? Don't we need kmapping here?
>
The current nobootmem code actually doesn't handle high
addresses, since the max memory is limited by memblock.current_limit,
which is max_low_pfn. So I am assuming we don't need to support
it. The __alloc_bootmem_node_high() interface underneath uses
__alloc_memory_core_early(), and we tried to keep the same
functionality in the new code.

>> +
>> + /*
>> + * The min_count is set to 0 so that bootmem allocated blocks
>> + * are never reported as leaks.
>> + */
>> + kmemleak_alloc(ptr, size, 0, 0);
>> +
>> + return ptr;
>> +
>> +error:
>> + return NULL;
>> +}
>> +
>> +void * __init memblock_early_alloc_try_nid_nopanic(int nid,
>> + phys_addr_t size, phys_addr_t align,
>> + phys_addr_t from, phys_addr_t max_addr)
>> +{
>> + memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx %pF\n",
>> + __func__, (u64)size, (u64)align, nid, (u64)from,
>> + (u64)max_addr, (void *)_RET_IP_);
>> + return _memblock_early_alloc_try_nid_nopanic(nid, size,
>> + align, from, max_addr);
>
> Do we need the extra level of wrapping? Just implement
> alloc_try_nid_nopanic() here and make the panicky version call it?
>
It was useful to have the caller information (_RET_IP_) for
debugging, but it can be dropped if you insist.

Regards,
Santosh

2013-10-14 14:41:45

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [RFC 07/23] mm/memblock: debug: correct displaying of upper memory boundary

On Sunday 13 October 2013 02:02 PM, Tejun Heo wrote:
> On Sat, Oct 12, 2013 at 05:58:50PM -0400, Santosh Shilimkar wrote:
>> From: Grygorii Strashko <[email protected]>
>>
>> When debugging is enabled (cmdline has "memblock=debug") the memblock
>> will display upper memory boundary per each allocated/freed memory range
>> wrongly. For example:
>> memblock_reserve: [0x0000009e7e8000-0x0000009e7ed000] _memblock_early_alloc_try_nid_nopanic+0xfc/0x12c
>>
>> The 0x0000009e7ed000 is displayed instead of 0x0000009e7ecfff
>>
>> Hence, correct this by changing formula used to calculate upper memory
>> boundary to (u64)base + size - 1 instead of (u64)base + size everywhere
>> in the debug messages.
>
> I kinda prefer base + size because it's easier to actually know the
> size but yeah, it should have been [base, base + size) and other
> places use base + size - 1 notation so it probably is better to stick
> to that. Maybe move this one to the beginning of the series?
>
> Acked-by: Tejun Heo <[email protected]>
>
Thanks. Will do

2013-10-14 14:42:18

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [RFC 08/23] mm/memblock: debug: don't free reserved array if !ARCH_DISCARD_MEMBLOCK

On Sunday 13 October 2013 03:51 PM, Tejun Heo wrote:
> On Sat, Oct 12, 2013 at 05:58:51PM -0400, Santosh Shilimkar wrote:
>> From: Grygorii Strashko <[email protected]>
>>
>> Now the Nobootmem allocator will always try to free memory allocated for
>> reserved memory regions (free_low_memory_core_early()) without taking
>> into to account current memblock debugging configuration
>> (CONFIG_ARCH_DISCARD_MEMBLOCK and CONFIG_DEBUG_FS state).
>> As result if:
>> - CONFIG_DEBUG_FS defined
>> - CONFIG_ARCH_DISCARD_MEMBLOCK not defined;
>> - reserved memory regions array have been resized during boot
>>
>> then:
>> - memory allocated for reserved memory regions array will be freed to
>> buddy allocator;
>> - debug_fs entry "sys/kernel/debug/memblock/reserved" will show garbage
>> instead of state of memory reservations. like:
>> 0: 0x98393bc0..0x9a393bbf
>> 1: 0xff120000..0xff11ffff
>> 2: 0x00000000..0xffffffff
>>
>> Hence, do not free memory allocated for reserved memory regions if
>> defined(CONFIG_DEBUG_FS) && !defined(CONFIG_ARCH_DISCARD_MEMBLOCK).
>>
>> Cc: Yinghai Lu <[email protected]>
>> Cc: Tejun Heo <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>>
>> Signed-off-by: Grygorii Strashko <[email protected]>
>> Signed-off-by: Santosh Shilimkar <[email protected]>
>> ---
>> mm/memblock.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index d903138..1bb2cc0 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -169,6 +169,10 @@ phys_addr_t __init_memblock get_allocated_memblock_reserved_regions_info(
>> if (memblock.reserved.regions == memblock_reserved_init_regions)
>> return 0;
>>
>
> Please add comment explaining why the following test exists. It's
> pretty difficult to deduce the reason only from the code.
>
ok.

>> + if (IS_ENABLED(CONFIG_DEBUG_FS) &&
>> + !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK))
>> + return 0;
>> +
>
> Also, as this is another fix patch, can you please move this to the
> head of the series?
>
Sure

2013-10-14 14:43:50

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [RFC 09/23] mm/init: Use memblock apis for early memory allocations

On Sunday 13 October 2013 03:54 PM, Tejun Heo wrote:
> On Sat, Oct 12, 2013 at 05:58:52PM -0400, Santosh Shilimkar wrote:
>> Switch to memblock interfaces for early memory allocator
>
> When posting actual (non-RFC) patches later, please cc the maintainers
> of the target subsystem and briefly explain why the new interface is
> needed and that this doesn't change visible behavior.
>
Sure. Thanks a lot for the quick response on the series. I will give it
another week or so to see if there are more comments and then start
addressing them in the next version.

Regards,
Santosh

2013-10-14 14:58:41

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC 06/23] mm/memblock: Add memblock early memory allocation apis

Hello,

On Mon, Oct 14, 2013 at 10:39:54AM -0400, Santosh Shilimkar wrote:
> >> +void __memblock_free_early(phys_addr_t base, phys_addr_t size);
> >> +void __memblock_free_late(phys_addr_t base, phys_addr_t size);
> >
> > Would it be possible to drop "early"? It's redundant and makes the
> > function names unnecessarily long. When memblock is enabled, these
> > are basically doing about the same thing as memblock_alloc() and
> > friends, right? Wouldn't it make more sense to define these as
> > memblock_alloc_XXX()?
> >
> A small difference w.r.t. the existing memblock_alloc(): these new
> exports return virtually mapped memory pointers. Actually I started
> with memblock_alloc_xxx(), but memblock already exports memblock_alloc_xx()
> returning physical memory pointers. So I just wanted to make these interfaces
> distinct and added "early". But I agree with you that 'early' can
> be dropped. Will fix it.

Hmmm, so while this removes address limit on the base / limit side, it
keeps virt address on the result. In that case, we probably want to
somehow distinguish the two sets of interfaces - one set dealing with
phys and the other dealing with virts. Maybe we want to build the
base interface on phys address and add convenience wrappers for virts?
Would that make more sense?

Thanks.

--
tejun

2013-10-14 15:03:56

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [RFC 06/23] mm/memblock: Add memblock early memory allocation apis

On Monday 14 October 2013 10:58 AM, Tejun Heo wrote:
> Hello,
>
> On Mon, Oct 14, 2013 at 10:39:54AM -0400, Santosh Shilimkar wrote:
>>>> +void __memblock_free_early(phys_addr_t base, phys_addr_t size);
>>>> +void __memblock_free_late(phys_addr_t base, phys_addr_t size);
>>>
>>> Would it be possible to drop "early"? It's redundant and makes the
>>> function names unnecessarily long. When memblock is enabled, these
>>> are basically doing about the same thing as memblock_alloc() and
>>> friends, right? Wouldn't it make more sense to define these as
>>> memblock_alloc_XXX()?
>>>
>> A small difference w.r.t. the existing memblock_alloc(): these new
>> exports return virtually mapped memory pointers. Actually I started
>> with memblock_alloc_xxx(), but memblock already exports memblock_alloc_xx()
>> returning physical memory pointers. So I just wanted to make these interfaces
>> distinct and added "early". But I agree with you that 'early' can
>> be dropped. Will fix it.
>
> Hmmm, so while this removes address limit on the base / limit side, it
> keeps virt address on the result. In that case, we probably want to
> somehow distinguish the two sets of interfaces - one set dealing with
> phys and the other dealing with virts. Maybe we want to build the
> base interface on phys address and add convenience wrappers for virts?
> Would that make more sense?
>
That's more or less what we are doing, if you look at it. The only
additional code we have is to manage the virtual memory and the related
checks, just the same way it was initially done in the nobootmem.c wrappers.

Not sure if adding the word 'virt' to these APIs to make it explicit
would help avoid any confusion.

Regards,
Santosh