2015-02-06 22:22:50

by Tony Luck

Subject: [RFC 0/3] Mirrored memory support for boot time allocations

Platforms that support a mix of mirrored and regular memory are coming.

We'd like to use the mirrored memory for kernel code, data and dynamically
allocated data because our machine check recovery code cannot fix problems
there. This series modifies the memblock allocator to comprehend mirrored
memory and use it for all boot time allocations. Later I'll dig into page_alloc.c
to put the leftover mirrored memory into a zone to be used for kernel allocation
by slab/slob/slub and others.
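
In a nutshell, the intended flow looks like this (a sketch of usage, not
code from the patches; mirror_start/mirror_size stand in for whatever the
platform reports):

    /* early arch setup tells memblock which ranges are mirrored ... */
    memblock_mark_mirror(mirror_start, mirror_size);

    /* ... and boot time allocations then prefer those ranges, falling
     * back (with a warning) to ordinary memory if mirror runs out:
     */
    ptr = memblock_virt_alloc(size, align);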

You'll see why this is just RFC when you get to part 3.

Tony Luck (3):
mm/memblock: Add extra "flag" to memblock to allow selection of memory
based on attribute
mm/memblock: Allocate boot time data structures from mirrored memory
x86, mirror: x86 enabling - find mirrored memory ranges and tell
memblock

arch/s390/kernel/crash_dump.c | 4 +-
arch/sparc/mm/init_64.c | 4 +-
arch/x86/kernel/check.c | 2 +-
arch/x86/kernel/e820.c | 2 +-
arch/x86/mm/init_32.c | 2 +-
arch/x86/mm/memtest.c | 2 +-
include/linux/memblock.h | 43 ++++++++++------
mm/cma.c | 4 +-
mm/memblock.c | 113 ++++++++++++++++++++++++++++++++----------
mm/nobootmem.c | 12 ++++-
10 files changed, 135 insertions(+), 53 deletions(-)

--
2.1.0


2015-02-06 22:23:22

by Tony Luck

Subject: [RFC 1/3] mm/memblock: Add extra "flag" to memblock to allow selection of memory based on attribute

No functional changes: add a "flag" argument to the memblock range
iterators and allocation helpers so that callers can select memory
regions by attribute. All existing callers pass 0.
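
For illustration only (not part of this patch; handle_range() is a made-up
consumer), a caller that wants just the regions carrying a particular
attribute bit would do something like:

    phys_addr_t start, end;
    u64 i;

    /* walk only free ranges whose region flags include "flag" */
    for_each_free_mem_range(i, NUMA_NO_NODE, flag, &start, &end, NULL)
        handle_range(start, end);

In this patch the new argument is only plumbed through; nothing inspects
it yet, so passing 0 everywhere keeps behavior identical.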

Signed-off-by: Tony Luck <[email protected]>
---
arch/s390/kernel/crash_dump.c | 4 ++--
arch/sparc/mm/init_64.c | 4 ++--
arch/x86/kernel/check.c | 2 +-
arch/x86/kernel/e820.c | 2 +-
arch/x86/mm/init_32.c | 2 +-
arch/x86/mm/memtest.c | 2 +-
include/linux/memblock.h | 35 ++++++++++++++++++-------------
mm/cma.c | 4 ++--
mm/memblock.c | 49 +++++++++++++++++++++++++------------------
mm/nobootmem.c | 4 ++--
10 files changed, 61 insertions(+), 47 deletions(-)

diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index 9f73c8059022..1b117a2a60af 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -33,11 +33,11 @@ static struct memblock_type oldmem_type = {
};

#define for_each_dump_mem_range(i, nid, p_start, p_end, p_nid) \
- for (i = 0, __next_mem_range(&i, nid, &memblock.physmem, \
+ for (i = 0, __next_mem_range(&i, nid, 0, &memblock.physmem, \
&oldmem_type, p_start, \
p_end, p_nid); \
i != (u64)ULLONG_MAX; \
- __next_mem_range(&i, nid, &memblock.physmem, \
+ __next_mem_range(&i, nid, 0, &memblock.physmem, \
&oldmem_type, \
p_start, p_end, p_nid))

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 3ea267c53320..1f979c8ac2cd 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -1952,7 +1952,7 @@ static phys_addr_t __init available_memory(void)
phys_addr_t pa_start, pa_end;
u64 i;

- for_each_free_mem_range(i, NUMA_NO_NODE, &pa_start, &pa_end, NULL)
+ for_each_free_mem_range(i, NUMA_NO_NODE, 0, &pa_start, &pa_end, NULL)
available = available + (pa_end - pa_start);

return available;
@@ -1971,7 +1971,7 @@ static void __init reduce_memory(phys_addr_t limit_ram)
if (limit_ram >= avail_ram)
return;

- for_each_free_mem_range(i, NUMA_NO_NODE, &pa_start, &pa_end, NULL) {
+ for_each_free_mem_range(i, NUMA_NO_NODE, 0, &pa_start, &pa_end, NULL) {
phys_addr_t region_size = pa_end - pa_start;
phys_addr_t clip_start = pa_start;

diff --git a/arch/x86/kernel/check.c b/arch/x86/kernel/check.c
index 83a7995625a6..46c8bc62f840 100644
--- a/arch/x86/kernel/check.c
+++ b/arch/x86/kernel/check.c
@@ -91,7 +91,7 @@ void __init setup_bios_corruption_check(void)

corruption_check_size = round_up(corruption_check_size, PAGE_SIZE);

- for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL) {
+ for_each_free_mem_range(i, NUMA_NO_NODE, 0, &start, &end, NULL) {
start = clamp_t(phys_addr_t, round_up(start, PAGE_SIZE),
PAGE_SIZE, corruption_check_size);
end = clamp_t(phys_addr_t, round_down(end, PAGE_SIZE),
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index dd2f07ae9d0c..b160b65bb9c1 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1119,7 +1119,7 @@ void __init memblock_find_dma_reserve(void)
nr_pages += end_pfn - start_pfn;
}

- for_each_free_mem_range(u, NUMA_NO_NODE, &start, &end, NULL) {
+ for_each_free_mem_range(u, NUMA_NO_NODE, 0, &start, &end, NULL) {
start_pfn = min_t(unsigned long, PFN_UP(start), MAX_DMA_PFN);
end_pfn = min_t(unsigned long, PFN_DOWN(end), MAX_DMA_PFN);
if (start_pfn < end_pfn)
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index c8140e12816a..6455c9f86bc8 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -433,7 +433,7 @@ void __init add_highpages_with_active_regions(int nid,
phys_addr_t start, end;
u64 i;

- for_each_free_mem_range(i, nid, &start, &end, NULL) {
+ for_each_free_mem_range(i, nid, 0, &start, &end, NULL) {
unsigned long pfn = clamp_t(unsigned long, PFN_UP(start),
start_pfn, end_pfn);
unsigned long e_pfn = clamp_t(unsigned long, PFN_DOWN(end),
diff --git a/arch/x86/mm/memtest.c b/arch/x86/mm/memtest.c
index 1e9da795767a..7e7c6627b784 100644
--- a/arch/x86/mm/memtest.c
+++ b/arch/x86/mm/memtest.c
@@ -74,7 +74,7 @@ static void __init do_one_pass(u64 pattern, u64 start, u64 end)
u64 i;
phys_addr_t this_start, this_end;

- for_each_free_mem_range(i, NUMA_NO_NODE, &this_start, &this_end, NULL) {
+ for_each_free_mem_range(i, NUMA_NO_NODE, 0, &this_start, &this_end, NULL) {
this_start = clamp_t(phys_addr_t, this_start, start, end);
this_end = clamp_t(phys_addr_t, this_end, start, end);
if (this_start < this_end) {
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index e8cc45307f8f..8a036cc6cdd5 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -61,7 +61,7 @@ extern bool movable_node_enabled;

phys_addr_t memblock_find_in_range_node(phys_addr_t size, phys_addr_t align,
phys_addr_t start, phys_addr_t end,
- int nid);
+ int nid, u32 flag);
phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
phys_addr_t size, phys_addr_t align);
phys_addr_t get_allocated_memblock_reserved_regions_info(phys_addr_t *addr);
@@ -85,11 +85,11 @@ int memblock_remove_range(struct memblock_type *type,
phys_addr_t base,
phys_addr_t size);

-void __next_mem_range(u64 *idx, int nid, struct memblock_type *type_a,
+void __next_mem_range(u64 *idx, int nid, u32 flags, struct memblock_type *type_a,
struct memblock_type *type_b, phys_addr_t *out_start,
phys_addr_t *out_end, int *out_nid);

-void __next_mem_range_rev(u64 *idx, int nid, struct memblock_type *type_a,
+void __next_mem_range_rev(u64 *idx, int nid, u32 flags, struct memblock_type *type_a,
struct memblock_type *type_b, phys_addr_t *out_start,
phys_addr_t *out_end, int *out_nid);

@@ -100,16 +100,17 @@ void __next_mem_range_rev(u64 *idx, int nid, struct memblock_type *type_a,
* @type_a: ptr to memblock_type to iterate
* @type_b: ptr to memblock_type which excludes from the iteration
* @nid: node selector, %NUMA_NO_NODE for all nodes
+ * @flag: pick from blocks based on memory attributes
* @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
* @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
* @p_nid: ptr to int for nid of the range, can be %NULL
*/
-#define for_each_mem_range(i, type_a, type_b, nid, \
+#define for_each_mem_range(i, type_a, type_b, nid, flag, \
p_start, p_end, p_nid) \
- for (i = 0, __next_mem_range(&i, nid, type_a, type_b, \
+ for (i = 0, __next_mem_range(&i, nid, flag, type_a, type_b, \
p_start, p_end, p_nid); \
i != (u64)ULLONG_MAX; \
- __next_mem_range(&i, nid, type_a, type_b, \
+ __next_mem_range(&i, nid, flag, type_a, type_b, \
p_start, p_end, p_nid))

/**
@@ -119,17 +120,18 @@ void __next_mem_range_rev(u64 *idx, int nid, struct memblock_type *type_a,
* @type_a: ptr to memblock_type to iterate
* @type_b: ptr to memblock_type which excludes from the iteration
* @nid: node selector, %NUMA_NO_NODE for all nodes
+ * @flag: pick from blocks based on memory attributes
* @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
* @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
* @p_nid: ptr to int for nid of the range, can be %NULL
*/
-#define for_each_mem_range_rev(i, type_a, type_b, nid, \
+#define for_each_mem_range_rev(i, type_a, type_b, nid, flag, \
p_start, p_end, p_nid) \
for (i = (u64)ULLONG_MAX, \
- __next_mem_range_rev(&i, nid, type_a, type_b, \
+ __next_mem_range_rev(&i, nid, flag, type_a, type_b,\
p_start, p_end, p_nid); \
i != (u64)ULLONG_MAX; \
- __next_mem_range_rev(&i, nid, type_a, type_b, \
+ __next_mem_range_rev(&i, nid, flag, type_a, type_b, \
p_start, p_end, p_nid))

#ifdef CONFIG_MOVABLE_NODE
@@ -181,13 +183,14 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
+ * @flag: pick from blocks based on memory attributes
* @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
* @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
* @p_nid: ptr to int for nid of the range, can be %NULL
*
* Walks over free (memory && !reserved) areas of memblock. Available as
* soon as memblock is initialized.
*/
-#define for_each_free_mem_range(i, nid, p_start, p_end, p_nid) \
+#define for_each_free_mem_range(i, nid, flag, p_start, p_end, p_nid) \
for_each_mem_range(i, &memblock.memory, &memblock.reserved, \
- nid, p_start, p_end, p_nid)
+ nid, flag, p_start, p_end, p_nid)

/**
* for_each_free_mem_range_reverse - rev-iterate through free memblock areas
@@ -196,13 +199,14 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
+ * @flag: pick from blocks based on memory attributes
* @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
* @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
* @p_nid: ptr to int for nid of the range, can be %NULL
*
* Walks over free (memory && !reserved) areas of memblock in reverse
* order. Available as soon as memblock is initialized.
*/
-#define for_each_free_mem_range_reverse(i, nid, p_start, p_end, p_nid) \
- for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved, \
- nid, p_start, p_end, p_nid)
+#define for_each_free_mem_range_reverse(i, nid, flag, p_start, p_end, p_nid) \
+ for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved, \
+ nid, flag, p_start, p_end, p_nid)

static inline void memblock_set_region_flags(struct memblock_region *r,
unsigned long flags)
@@ -273,7 +277,8 @@ static inline bool memblock_bottom_up(void) { return false; }
#define MEMBLOCK_ALLOC_ACCESSIBLE 0

phys_addr_t __init memblock_alloc_range(phys_addr_t size, phys_addr_t align,
- phys_addr_t start, phys_addr_t end);
+ phys_addr_t start, phys_addr_t end,
+ u32 flag);
phys_addr_t memblock_alloc_base(phys_addr_t size, phys_addr_t align,
phys_addr_t max_addr);
phys_addr_t __memblock_alloc_base(phys_addr_t size, phys_addr_t align,
diff --git a/mm/cma.c b/mm/cma.c
index a85ae28709a3..5800c037562a 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -312,13 +312,13 @@ int __init cma_declare_contiguous(phys_addr_t base,
*/
if (base < highmem_start && limit > highmem_start) {
addr = memblock_alloc_range(size, alignment,
- highmem_start, limit);
+ highmem_start, limit, 0);
limit = highmem_start;
}

if (!addr) {
addr = memblock_alloc_range(size, alignment, base,
- limit);
+ limit, 0);
if (!addr) {
ret = -ENOMEM;
goto err;
diff --git a/mm/memblock.c b/mm/memblock.c
index 252b77bdf65e..3c8db6d84a32 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -107,6 +107,7 @@ static long __init_memblock memblock_overlaps_region(struct memblock_type *type,
* @size: size of free area to find
* @align: alignment of free area to find
* @nid: nid of the free area to find, %NUMA_NO_NODE for any node
+ * @flag: pick from blocks based on memory attributes
*
* Utility called from memblock_find_in_range_node(), find free area bottom-up.
*
@@ -115,12 +116,13 @@ static long __init_memblock memblock_overlaps_region(struct memblock_type *type,
*/
static phys_addr_t __init_memblock
__memblock_find_range_bottom_up(phys_addr_t start, phys_addr_t end,
- phys_addr_t size, phys_addr_t align, int nid)
+ phys_addr_t size, phys_addr_t align, int nid,
+ u32 flag)
{
phys_addr_t this_start, this_end, cand;
u64 i;

- for_each_free_mem_range(i, nid, &this_start, &this_end, NULL) {
+ for_each_free_mem_range(i, nid, flag, &this_start, &this_end, NULL) {
this_start = clamp(this_start, start, end);
this_end = clamp(this_end, start, end);

@@ -139,6 +141,7 @@ __memblock_find_range_bottom_up(phys_addr_t start, phys_addr_t end,
* @size: size of free area to find
* @align: alignment of free area to find
* @nid: nid of the free area to find, %NUMA_NO_NODE for any node
+ * @flag: pick from blocks based on memory attributes
*
* Utility called from memblock_find_in_range_node(), find free area top-down.
*
@@ -147,12 +150,13 @@ __memblock_find_range_bottom_up(phys_addr_t start, phys_addr_t end,
*/
static phys_addr_t __init_memblock
__memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
- phys_addr_t size, phys_addr_t align, int nid)
+ phys_addr_t size, phys_addr_t align, int nid,
+ u32 flag)
{
phys_addr_t this_start, this_end, cand;
u64 i;

- for_each_free_mem_range_reverse(i, nid, &this_start, &this_end, NULL) {
+ for_each_free_mem_range_reverse(i, nid, flag, &this_start, &this_end, NULL) {
this_start = clamp(this_start, start, end);
this_end = clamp(this_end, start, end);

@@ -174,6 +178,7 @@ __memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
* @start: start of candidate range
* @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
* @nid: nid of the free area to find, %NUMA_NO_NODE for any node
+ * @flag: pick from blocks based on memory attributes
*
* Find @size free area aligned to @align in the specified range and node.
*
@@ -190,7 +195,7 @@ __memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
*/
phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
phys_addr_t align, phys_addr_t start,
- phys_addr_t end, int nid)
+ phys_addr_t end, int nid, u32 flag)
{
phys_addr_t kernel_end, ret;

@@ -215,7 +220,7 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,

/* ok, try bottom-up allocation first */
ret = __memblock_find_range_bottom_up(bottom_up_start, end,
- size, align, nid);
+ size, align, nid, flag);
if (ret)
return ret;

@@ -233,7 +238,7 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
"memory hotunplug may be affected\n");
}

- return __memblock_find_range_top_down(start, end, size, align, nid);
+ return __memblock_find_range_top_down(start, end, size, align, nid, flag);
}

/**
@@ -242,6 +247,7 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
* @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
* @size: size of free area to find
* @align: alignment of free area to find
+ * @flag: pick from blocks based on memory attributes
*
* Find @size free area aligned to @align in the specified range.
*
@@ -253,7 +259,7 @@ phys_addr_t __init_memblock memblock_find_in_range(phys_addr_t start,
phys_addr_t align)
{
return memblock_find_in_range_node(size, align, start, end,
- NUMA_NO_NODE);
+ NUMA_NO_NODE, 0);
}

static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
@@ -768,6 +774,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
* __next__mem_range - next function for for_each_free_mem_range() etc.
* @idx: pointer to u64 loop variable
* @nid: node selector, %NUMA_NO_NODE for all nodes
+ * @flags: pick from blocks based on memory attributes
* @type_a: pointer to memblock_type from where the range is taken
* @type_b: pointer to memblock_type which excludes memory from being taken
* @out_start: ptr to phys_addr_t for start address of the range, can be %NULL
@@ -789,7 +796,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
* As both region arrays are sorted, the function advances the two indices
* in lockstep and returns each intersection.
*/
-void __init_memblock __next_mem_range(u64 *idx, int nid,
+void __init_memblock __next_mem_range(u64 *idx, int nid, u32 flags,
struct memblock_type *type_a,
struct memblock_type *type_b,
phys_addr_t *out_start,
@@ -881,6 +888,7 @@ void __init_memblock __next_mem_range(u64 *idx, int nid,
*
* @idx: pointer to u64 loop variable
* @nid: nid: node selector, %NUMA_NO_NODE for all nodes
+ * @flags: pick from blocks based on memory attributes
* @type_a: pointer to memblock_type from where the range is taken
* @type_b: pointer to memblock_type which excludes memory from being taken
* @out_start: ptr to phys_addr_t for start address of the range, can be %NULL
@@ -889,7 +897,7 @@ void __init_memblock __next_mem_range(u64 *idx, int nid,
*
* Reverse of __next_mem_range().
*/
-void __init_memblock __next_mem_range_rev(u64 *idx, int nid,
+void __init_memblock __next_mem_range_rev(u64 *idx, int nid, u32 flags,
struct memblock_type *type_a,
struct memblock_type *type_b,
phys_addr_t *out_start,
@@ -1036,14 +1044,14 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,

static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
phys_addr_t align, phys_addr_t start,
- phys_addr_t end, int nid)
+ phys_addr_t end, int nid, u32 flag)
{
phys_addr_t found;

if (!align)
align = SMP_CACHE_BYTES;

- found = memblock_find_in_range_node(size, align, start, end, nid);
+ found = memblock_find_in_range_node(size, align, start, end, nid, flag);
if (found && !memblock_reserve(found, size)) {
/*
* The min_count is set to 0 so that memblock allocations are
@@ -1056,26 +1064,27 @@ static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
}

phys_addr_t __init memblock_alloc_range(phys_addr_t size, phys_addr_t align,
- phys_addr_t start, phys_addr_t end)
+ phys_addr_t start, phys_addr_t end,
+ u32 flag)
{
- return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE);
+ return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE, flag);
}

static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
phys_addr_t align, phys_addr_t max_addr,
- int nid)
+ int nid, u32 flag)
{
- return memblock_alloc_range_nid(size, align, 0, max_addr, nid);
+ return memblock_alloc_range_nid(size, align, 0, max_addr, nid, flag);
}

phys_addr_t __init memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int nid)
{
- return memblock_alloc_base_nid(size, align, MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+ return memblock_alloc_base_nid(size, align, MEMBLOCK_ALLOC_ACCESSIBLE, nid, 0);
}

phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
{
- return memblock_alloc_base_nid(size, align, max_addr, NUMA_NO_NODE);
+ return memblock_alloc_base_nid(size, align, max_addr, NUMA_NO_NODE, 0);
}

phys_addr_t __init memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
@@ -1159,13 +1168,13 @@ static void * __init memblock_virt_alloc_internal(

again:
alloc = memblock_find_in_range_node(size, align, min_addr, max_addr,
- nid);
+ nid, 0);
if (alloc)
goto done;

if (nid != NUMA_NO_NODE) {
alloc = memblock_find_in_range_node(size, align, min_addr,
- max_addr, NUMA_NO_NODE);
+ max_addr, NUMA_NO_NODE, 0);
if (alloc)
goto done;
}
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 90b50468333e..a4903046bcba 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -41,7 +41,7 @@ static void * __init __alloc_memory_core_early(int nid, u64 size, u64 align,
if (limit > memblock.current_limit)
limit = memblock.current_limit;

- addr = memblock_find_in_range_node(size, align, goal, limit, nid);
+ addr = memblock_find_in_range_node(size, align, goal, limit, nid, 0);
if (!addr)
return NULL;

@@ -121,7 +121,7 @@ static unsigned long __init free_low_memory_core_early(void)

memblock_clear_hotplug(0, -1);

- for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
+ for_each_free_mem_range(i, NUMA_NO_NODE, 0, &start, &end, NULL)
count += __free_memory_core(start, end);

#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
--
2.1.0

2015-02-06 22:23:51

by Tony Luck

Subject: [RFC 2/3] mm/memblock: Allocate boot time data structures from mirrored memory

Try to allocate all boot time kernel data structures from mirrored
memory. If we run out of mirrored memory, print a warning and fall
back to non-mirrored memory to make sure that we still boot.
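
The fallback pattern used at each allocation site is sketched below
(taken from the shape of the hunks in this patch; "start"/"end"/"nid"
are whatever the site already had in hand):

    u32 flag = memblock_has_mirror();   /* MEMBLOCK_MIRROR or 0 */
    phys_addr_t addr;

again:
    addr = memblock_find_in_range_node(size, align, start, end, nid, flag);
    if (!addr && flag) {
        pr_warn("Could not allocate %pap bytes of mirrored memory\n", &size);
        flag = 0;       /* retry without the mirror restriction */
        goto again;
    }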

Signed-off-by: Tony Luck <[email protected]>
---
include/linux/memblock.h | 8 ++++++
mm/memblock.c | 72 +++++++++++++++++++++++++++++++++++++++++-------
mm/nobootmem.c | 10 ++++++-
3 files changed, 79 insertions(+), 11 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 8a036cc6cdd5..03c2b91db474 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -22,6 +22,7 @@

/* Definition of memblock flags. */
#define MEMBLOCK_HOTPLUG 0x1 /* hotpluggable region */
+#define MEMBLOCK_MIRROR 0x2 /* mirrored region */

struct memblock_region {
phys_addr_t base;
@@ -75,6 +76,8 @@ int memblock_reserve(phys_addr_t base, phys_addr_t size);
void memblock_trim_memory(phys_addr_t align);
int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
+int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
+u32 memblock_has_mirror(void);

/* Low level functions */
int memblock_add_range(struct memblock_type *type,
@@ -155,6 +158,11 @@ static inline bool movable_node_is_enabled(void)
}
#endif

+static inline bool memblock_is_mirror(struct memblock_region *m)
+{
+ return m->flags & MEMBLOCK_MIRROR;
+}
+
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
unsigned long *end_pfn);
diff --git a/mm/memblock.c b/mm/memblock.c
index 3c8db6d84a32..e0826ff5f59b 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -54,10 +54,16 @@ int memblock_debug __initdata_memblock;
#ifdef CONFIG_MOVABLE_NODE
bool movable_node_enabled __initdata_memblock = false;
#endif
+static bool memblock_have_mirror __initdata_memblock = false;
static int memblock_can_resize __initdata_memblock;
static int memblock_memory_in_slab __initdata_memblock = 0;
static int memblock_reserved_in_slab __initdata_memblock = 0;

+u32 __init_memblock memblock_has_mirror(void)
+{
+ return memblock_have_mirror ? MEMBLOCK_MIRROR : 0;
+}
+
/* inline so we don't get a warning when pr_debug is compiled out */
static __init_memblock const char *
memblock_type_name(struct memblock_type *type)
@@ -247,7 +253,6 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
* @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
* @size: size of free area to find
* @align: alignment of free area to find
- * @flag: pick from blocks based on memory attributes
*
* Find @size free area aligned to @align in the specified range.
*
@@ -258,8 +263,19 @@ phys_addr_t __init_memblock memblock_find_in_range(phys_addr_t start,
phys_addr_t end, phys_addr_t size,
phys_addr_t align)
{
- return memblock_find_in_range_node(size, align, start, end,
+ phys_addr_t ret;
+ u32 flag = memblock_has_mirror();
+
+ ret = memblock_find_in_range_node(size, align, start, end,
+ NUMA_NO_NODE, flag);
+
+ if (!ret && flag) {
+ pr_warn("Could not allocate %lld bytes of mirrored memory\n", size);
+ ret = memblock_find_in_range_node(size, align, start, end,
NUMA_NO_NODE, 0);
+ }
+
+ return ret;
}

static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
@@ -771,6 +787,21 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
}

/**
+ * memblock_mark_mirror - Mark mirrored memory with flag MEMBLOCK_MIRROR.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
+{
+ memblock_have_mirror = true;
+
+ return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR);
+}
+
+
+/**
* __next__mem_range - next function for for_each_free_mem_range() etc.
* @idx: pointer to u64 loop variable
* @nid: node selector, %NUMA_NO_NODE for all nodes
@@ -824,6 +855,10 @@ void __init_memblock __next_mem_range(u64 *idx, int nid, u32 flags,
if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
continue;

+ /* if we want mirror memory skip non-mirror memory regions */
+ if ((flags & MEMBLOCK_MIRROR) && !memblock_is_mirror(m))
+ continue;
+
if (!type_b) {
if (out_start)
*out_start = m_start;
@@ -929,6 +964,10 @@ void __init_memblock __next_mem_range_rev(u64 *idx, int nid, u32 flags,
if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
continue;

+ /* if we want mirror memory skip non-mirror memory regions */
+ if ((flags & MEMBLOCK_MIRROR) && !memblock_is_mirror(m))
+ continue;
+
if (!type_b) {
if (out_start)
*out_start = m_start;
@@ -1079,7 +1118,17 @@ static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,

phys_addr_t __init memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int nid)
{
- return memblock_alloc_base_nid(size, align, MEMBLOCK_ALLOC_ACCESSIBLE, nid, 0);
+ u32 flag = memblock_has_mirror();
+ phys_addr_t ret;
+
+again:
+ ret = memblock_alloc_base_nid(size, align, MEMBLOCK_ALLOC_ACCESSIBLE, nid, flag);
+
+ if (!ret && flag) {
+ flag = 0;
+ goto again;
+ }
+ return ret;
}

phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
@@ -1148,6 +1197,7 @@ static void * __init memblock_virt_alloc_internal(
{
phys_addr_t alloc;
void *ptr;
+ u32 flag = memblock_has_mirror();

if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
nid = NUMA_NO_NODE;
@@ -1168,13 +1218,13 @@ static void * __init memblock_virt_alloc_internal(

again:
alloc = memblock_find_in_range_node(size, align, min_addr, max_addr,
- nid, 0);
+ nid, flag);
if (alloc)
goto done;

if (nid != NUMA_NO_NODE) {
alloc = memblock_find_in_range_node(size, align, min_addr,
- max_addr, NUMA_NO_NODE, 0);
+ max_addr, NUMA_NO_NODE, flag);
if (alloc)
goto done;
}
@@ -1182,10 +1232,15 @@ again:
if (min_addr) {
min_addr = 0;
goto again;
- } else {
- goto error;
}

+ if (flag) {
+ flag = 0;
+ pr_warn("Could not allocate %lld bytes of mirrored memory\n", size);
+ goto again;
+ }
+
+ return NULL;
done:
memblock_reserve(alloc, size);
ptr = phys_to_virt(alloc);
@@ -1200,9 +1255,6 @@ done:
kmemleak_alloc(ptr, size, 0, 0);

return ptr;
-
-error:
- return NULL;
}

/**
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index a4903046bcba..35423c935a46 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -37,11 +37,19 @@ static void * __init __alloc_memory_core_early(int nid, u64 size, u64 align,
{
void *ptr;
u64 addr;
+ u32 flag = memblock_has_mirror();

if (limit > memblock.current_limit)
limit = memblock.current_limit;

- addr = memblock_find_in_range_node(size, align, goal, limit, nid, 0);
+again:
+ addr = memblock_find_in_range_node(size, align, goal, limit, nid, flag);
+
+ if (flag && !addr) {
+ flag = 0;
+ pr_warn("Could not allocate %lld bytes of mirrored memory\n", size);
+ goto again;
+ }
if (!addr)
return NULL;

--
2.1.0

2015-02-06 22:24:20

by Tony Luck

Subject: [RFC 3/3] x86, mirror: x86 enabling - find mirrored memory ranges and tell memblock

Can't post this part yet because it uses things in an upcoming[*] ACPI, UEFI, or some
other four-letter-ending-in-I standard. So just imagine a call someplace early
in startup that reads information about mirrored address ranges and does:

+ for (...) {
+ start = ...;
+ size = ...;
+ if (it looks mirrored)
+ memblock_mark_mirror(start, size);
+ }
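
If the attribute shows up as a new bit on the UEFI GetMemoryMap()
descriptors (which, per the follow-ups below, is where it landed; UEFI
2.5 calls it EFI_MEMORY_MORE_RELIABLE), a sketch of that early call could
look like the following ("memmap" being x86's cached copy of the EFI
memory map):

    void __init efi_find_mirror(void)
    {
        void *p;
        u64 mirror_size = 0, total_size = 0;

        for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
            efi_memory_desc_t *md = p;
            unsigned long long start = md->phys_addr;
            unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;

            total_size += size;
            if (md->attribute & EFI_MEMORY_MORE_RELIABLE) {
                memblock_mark_mirror(start, size);
                mirror_size += size;
            }
        }
        if (mirror_size)
            pr_info("Memory: %lldM/%lldM mirrored memory\n",
                    mirror_size >> 20, total_size >> 20);
    }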

Whole patch is pretty tiny:

3 files changed, 19 insertions(+)

How much damage could I possibly do in just 19 lines?

-Tony

[*] very soon, I'm told

2015-02-06 22:28:41

by Tony Luck

Subject: Re: [RFC 0/3] Mirrored memory support for boot time allocations

On Fri, Feb 6, 2015 at 1:54 PM, Tony Luck <[email protected]> wrote:
> Platforms that support a mix of mirrored and regular memory are coming.

Obviously I don't do enough -mm work to remember where the linux-mm mailing
list is hosted :-(

Let's see who finds this on the linux-kernel list (which I did spell right).
When v2 happens I'll get it to the right places.

-Tony

2015-05-18 07:59:11

by Xishi Qiu

Subject: Re: [RFC 0/3] Mirrored memory support for boot time allocations

On 2015/2/7 5:54, Tony Luck wrote:

> Platforms that support a mix of mirrored and regular memory are coming.
>
> We'd like to use the mirrored memory for kernel code, data and dynamically
> allocated data because our machine check recovery code cannot fix problems
> there. This series modifies the memblock allocator to comprehend mirrored
> memory and use it for all boot time allocations. Later I'll dig into page_alloc.c
> to put the leftover mirrored memory into a zone to be used for kernel allocation
> by slab/slob/slub and others.

Hi Tony,

Does this mean that you will create a new zone to hold the mirrored memory,
like the movable zone?
I think this will change a lot of code. Why not create a new migrate type
instead, as CMA does, e.g. MIGRATE_MIRROR?

Thanks,
Xishi Qiu

>
> You'll see why this is just RFC when you get to part 3.
>
> Tony Luck (3):
> mm/memblock: Add extra "flag" to memblock to allow selection of memory
> based on attribute
> mm/memblock: Allocate boot time data structures from mirrored memory
> x86, mirror: x86 enabling - find mirrored memory ranges and tell
> memblock
>
> arch/s390/kernel/crash_dump.c | 4 +-
> arch/sparc/mm/init_64.c | 4 +-
> arch/x86/kernel/check.c | 2 +-
> arch/x86/kernel/e820.c | 2 +-
> arch/x86/mm/init_32.c | 2 +-
> arch/x86/mm/memtest.c | 2 +-
> include/linux/memblock.h | 43 ++++++++++------
> mm/cma.c | 4 +-
> mm/memblock.c | 113 ++++++++++++++++++++++++++++++++----------
> mm/nobootmem.c | 12 ++++-
> 10 files changed, 135 insertions(+), 53 deletions(-)
>


2015-05-18 08:09:49

by Xishi Qiu

Subject: Re: [RFC 3/3] x86, mirror: x86 enabling - find mirrored memory ranges and tell memblock

On 2015/2/4 6:40, Tony Luck wrote:

> Can't post this part yet because it uses things in an upcoming[*] ACPI, UEFI, or some
> other four-letter-ending-in-I standard. So just imagine a call someplace early
> in startup that reads information about mirrored address ranges and does:
>

Hi Tony,

Will the upcoming[*] ACPI spec add a new flag to the SRAT tables, just like
memory hotplug?

#define ACPI_SRAT_MEM_HOT_PLUGGABLE (1<<1) /* 01: Memory region is hot pluggable */
+#define ACPI_SRAT_MEM_MIRROR (1<<3) /* 03: Memory region is mirrored */

acpi_numa_memory_affinity_init()
...
hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
+ mirrorable = ma->flags & ACPI_SRAT_MEM_MIRROR;
...
+ if (mirrorable)
+ memblock_mark_mirror(start, size);
...

Thanks,
Xishi Qiu

> + for (...) {
> + start = ...;
> + size = ...;
> + if (it looks mirrored)
> + memblock_mark_mirror(start, size);
> + }
>
> Whole patch is pretty tiny:
>
> 3 files changed, 19 insertions(+)
>
> How much damage could I possibly do in just 19 lines?
>
> -Tony
>
> [*] very soon, I'm told
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>


2015-05-18 08:25:02

by Xishi Qiu

Subject: Re: [RFC 0/3] Mirrored memory support for boot time allocations

Add [email protected]

On 2015/5/18 15:58, Xishi Qiu wrote:

> On 2015/2/7 5:54, Tony Luck wrote:
>
>> Platforms that support a mix of mirrored and regular memory are coming.
>>
>> We'd like to use the mirrored memory for kernel code, data and dynamically
>> allocated data because our machine check recovery code cannot fix problems
>> there. This series modifies the memblock allocator to comprehend mirrored
>> memory and use it for all boot time allocations. Later I'll dig into page_alloc.c
>> to put the leftover mirrored memory into a zone to be used for kernel allocation
>> by slab/slob/slub and others.
>
> Hi Tony,
>
> Is it means that you will create a new zone to fill mirrored memory, like the
> movable zone, right?
> I think this will change a lot of code, why not create a new migrate type?
> such as CMA, e.g. MIGRATE_MIRROR
>
> Thanks,
> Xishi Qiu
>
>>
>> You'll see why this is just RFC when you get to part 3.
>>
>> Tony Luck (3):
>> mm/memblock: Add extra "flag" to memblock to allow selection of memory
>> based on attribute
>> mm/memblock: Allocate boot time data structures from mirrored memory
>> x86, mirror: x86 enabling - find mirrored memory ranges and tell
>> memblock
>>
>> arch/s390/kernel/crash_dump.c | 4 +-
>> arch/sparc/mm/init_64.c | 4 +-
>> arch/x86/kernel/check.c | 2 +-
>> arch/x86/kernel/e820.c | 2 +-
>> arch/x86/mm/init_32.c | 2 +-
>> arch/x86/mm/memtest.c | 2 +-
>> include/linux/memblock.h | 43 ++++++++++------
>> mm/cma.c | 4 +-
>> mm/memblock.c | 113 ++++++++++++++++++++++++++++++++----------
>> mm/nobootmem.c | 12 ++++-
>> 10 files changed, 135 insertions(+), 53 deletions(-)
>>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
> .
>


2015-05-18 17:36:13

by Tony Luck

Subject: RE: [RFC 3/3] x86, mirror: x86 enabling - find mirrored memory ranges and tell memblock

On 2015/2/4 6:40, Tony Luck wrote:
>> Can't post this part yet because it uses things in an upcoming[*] ACPI, UEFI, or some
>> other four-letter-ending-in-I standard. So just imagine a call someplace early
>> in startup that reads information about mirrored address ranges and does:
>>

> Does the upcoming[*] ACPI will add a new flag in SRAT tables? just like memory hotplug.
>
> #define ACPI_SRAT_MEM_HOT_PLUGGABLE (1<<1) /* 01: Memory region is hot pluggable */
> +#define ACPI_SRAT_MEM_MIRROR (1<<3) /* 03: Memory region is mirrored */

The choice for this was UEFI - new attribute bit in the GetMemoryMap() return value.

UEFI 2.5 has been published with this change and I posted a newer patch 10 days ago:

https://lkml.org/lkml/2015/5/8/521

-Tony

2015-05-18 17:42:47

by Tony Luck

Subject: RE: [RFC 0/3] Mirrored memory support for boot time allocations

> Is it means that you will create a new zone to fill mirrored memory, like the
> movable zone, right?

That's my general plan.

> I think this will change a lot of code, why not create a new migrate type?
> such as CMA, e.g. MIGRATE_MIRROR

I'm still exploring options ... the idea is to use mirrored memory for kernel allocations
(because our machine check recovery code will always crash the system for errors
in kernel memory - while we can avoid the crash for errors in application memory).
I'm not familiar with CMA ... can you explain a bit how it might let me direct kernel
allocations to specific areas of memory?

-Tony