2018-07-09 17:54:55

by Pavel Tatashin

Subject: [PATCH v4 0/3] sparse_init rewrite

Changelog:
v4 - v3
- Addressed comments from Dave Hansen
v3 - v2
- Fixed two issues found by Baoquan He
v2 - v1
- Addressed comments from Oscar Salvador

In sparse_init() we allocate two large buffers to temporarily hold the
usemap and memmap for the whole machine. However, we can avoid that by
changing sparse_init() to operate on a per-node basis instead of
processing the whole machine beforehand.

As Baoquan showed in
http://lkml.kernel.org/r/[email protected]
these buffers are large enough to prevent the machine from booting on
small-memory systems.
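
For a sense of scale (rough, hedged figures, assuming x86_64 with
CONFIG_X86_5LEVEL, i.e. MAX_PHYSMEM_BITS = 52 and SECTION_SIZE_BITS = 27,
before the arrays were shrunk to nr_present_sections):

	NR_MEM_SECTIONS = 1UL << (52 - 27)                  /* 33,554,432 */
	usemap_map: NR_MEM_SECTIONS * sizeof(long *)        = 256 MiB
	map_map:    NR_MEM_SECTIONS * sizeof(struct page *) = 256 MiB

That is roughly half a gigabyte of early allocations, far more than a
small kdump kernel has available.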

These patches should be applied on top of Baoquan's work, as
CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER is removed in that work.

For ease of review, I split this work so that the first patch only adds new
interfaces, and the second patch enables them and removes the old ones.

Pavel Tatashin (3):
mm/sparse: add sparse_init_nid()
mm/sparse: start using sparse_init_nid(), and remove old code
mm/sparse: refactor sparse vmemmap buffer allocations

include/linux/mm.h | 13 +-
mm/sparse-vmemmap.c | 111 ++++++++++-------
mm/sparse.c | 281 +++++++++++++++-----------------------------
3 files changed, 170 insertions(+), 235 deletions(-)

--
2.18.0



2018-07-09 17:54:51

by Pavel Tatashin

Subject: [PATCH v4 3/3] mm/sparse: refactor sparse vmemmap buffer allocations

When struct pages are allocated for the sparse-vmemmap VA layout, we first
try to allocate one large buffer, and then, if that fails, allocate struct
pages for each section as we go.

The code that allocates the buffer uses global variables and is spread
across several call sites.

Clean up the code by introducing three functions to handle the global
buffer:
vmemmap_buffer_init() initializes the buffer
vmemmap_buffer_fini() frees the remaining part of the buffer
vmemmap_buffer_alloc() allocates from the buffer, and returns NULL if the
buffer is empty
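
In outline, the populate path becomes the following (a condensed, hedged
sketch of sparse_populate_node() after this patch; the loop header is
pseudocode):

	vmemmap_buffer_init(nid, map_count);	/* one large early allocation */
	for (each present section pnum in the node) {
		/*
		 * sparse_mem_map_populate() typically (arch-dependent)
		 * ends up in vmemmap_alloc_block_buf(), which now tries
		 * vmemmap_buffer_alloc() first and falls back to
		 * vmemmap_alloc_block() once the buffer is exhausted.
		 */
		sparse_mem_map_populate(pnum, nid, NULL);
	}
	vmemmap_buffer_fini();			/* free the unused tail */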

Signed-off-by: Pavel Tatashin <[email protected]>
---
mm/sparse-vmemmap.c | 72 ++++++++++++++++++++++++++-------------------
1 file changed, 41 insertions(+), 31 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 87ba7cf8c75b..4e7f51aebabf 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -46,8 +46,42 @@ static void * __ref __earlyonly_bootmem_alloc(int node,
BOOTMEM_ALLOC_ACCESSIBLE, node);
}

-static void *vmemmap_buf;
-static void *vmemmap_buf_end;
+static void *vmemmap_buf __meminitdata;
+static void *vmemmap_buf_end __meminitdata;
+
+static void __init vmemmap_buffer_init(int nid, unsigned long map_count)
+{
+ unsigned long sec_size = sizeof(struct page) * PAGES_PER_SECTION;
+ unsigned long alloc_size = ALIGN(sec_size, PMD_SIZE) * map_count;
+
+ BUG_ON(vmemmap_buf);
+ vmemmap_buf = __earlyonly_bootmem_alloc(nid, alloc_size, 0,
+ __pa(MAX_DMA_ADDRESS));
+ vmemmap_buf_end = vmemmap_buf + alloc_size;
+}
+
+static void __init vmemmap_buffer_fini(void)
+{
+ unsigned long size = vmemmap_buf_end - vmemmap_buf;
+
+ if (vmemmap_buf && size > 0)
+ memblock_free_early(__pa(vmemmap_buf), size);
+ vmemmap_buf = NULL;
+}
+
+static void * __meminit vmemmap_buffer_alloc(unsigned long size)
+{
+ void *ptr = NULL;
+
+ if (vmemmap_buf) {
+ ptr = (void *)ALIGN((unsigned long)vmemmap_buf, size);
+ if (ptr + size > vmemmap_buf_end)
+ ptr = NULL;
+ else
+ vmemmap_buf = ptr + size;
+ }
+ return ptr;
+}

void * __meminit vmemmap_alloc_block(unsigned long size, int node)
{
@@ -76,18 +110,10 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
/* need to make sure size is all the same during early stage */
void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node)
{
- void *ptr;
-
- if (!vmemmap_buf)
- return vmemmap_alloc_block(size, node);
-
- /* take the from buf */
- ptr = (void *)ALIGN((unsigned long)vmemmap_buf, size);
- if (ptr + size > vmemmap_buf_end)
- return vmemmap_alloc_block(size, node);
-
- vmemmap_buf = ptr + size;
+ void *ptr = vmemmap_buffer_alloc(size);

+ if (!ptr)
+ ptr = vmemmap_alloc_block(size, node);
return ptr;
}

@@ -282,19 +308,9 @@ struct page * __init sparse_populate_node(unsigned long pnum_begin,
unsigned long map_count,
int nid)
{
- unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
unsigned long pnum, map_index = 0;
- void *vmemmap_buf_start;
-
- size = ALIGN(size, PMD_SIZE) * map_count;
- vmemmap_buf_start = __earlyonly_bootmem_alloc(nid, size,
- PMD_SIZE,
- __pa(MAX_DMA_ADDRESS));
- if (vmemmap_buf_start) {
- vmemmap_buf = vmemmap_buf_start;
- vmemmap_buf_end = vmemmap_buf_start + size;
- }

+ vmemmap_buffer_init(nid, map_count);
for (pnum = pnum_begin; map_index < map_count; pnum++) {
if (!present_section_nr(pnum))
continue;
@@ -303,14 +319,8 @@ struct page * __init sparse_populate_node(unsigned long pnum_begin,
map_index++;
BUG_ON(pnum >= pnum_end);
}
+ vmemmap_buffer_fini();

- if (vmemmap_buf_start) {
- /* need to free left buf */
- memblock_free_early(__pa(vmemmap_buf),
- vmemmap_buf_end - vmemmap_buf);
- vmemmap_buf = NULL;
- vmemmap_buf_end = NULL;
- }
return pfn_to_page(section_nr_to_pfn(pnum_begin));
}

--
2.18.0


2018-07-09 17:55:16

by Pavel Tatashin

Subject: [PATCH v4 1/3] mm/sparse: add sparse_init_nid()

sparse_init() requires temporarily allocating two large buffers:
usemap_map and map_map. Baoquan He has identified that these buffers are so
large that Linux is not bootable on small-memory machines, such as a kdump
boot. The buffers are especially large when CONFIG_X86_5LEVEL is set, as
they are scaled to the maximum physical memory size.

Baoquan provided a fix, which reduces the sizes of these buffers, but it is
much better to get rid of them entirely.

Add a new way to initialize sparse memory: sparse_init_nid(), which only
operates within one memory node, and thus allocates memory either in one
large contiguous block or section by section. This eliminates the need for
the temporary buffers.
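
In outline (a simplified sketch of the sparse_init_nid() flow added below;
error handling and usemap setup omitted):

	/* Try one large per-node allocation for the whole memmap first. */
	map_base = sparse_populate_node(pnum_begin, pnum_end, map_count, nid);

	for_each_present_section_nr(pnum_begin, pnum) {
		if (pnum >= pnum_end)
			break;
		/*
		 * vmemmap: index into the node's populated memmap;
		 * non-vmemmap: use map_base if the large allocation
		 * succeeded, otherwise allocate this one section.
		 */
		map = sparse_populate_node_section(map_base, map_index++,
						   pnum, nid);
	}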

To simplify bisecting and review, the new interface is enabled, and the old
code removed, in the next patch.

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
---
include/linux/mm.h | 8 ++++
mm/sparse-vmemmap.c | 54 +++++++++++++++++++++++++++
mm/sparse.c | 91 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 153 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a0fbb9ffe380..5fdea58e67a5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2651,6 +2651,14 @@ void sparse_mem_maps_populate_node(struct page **map_map,
unsigned long pnum_end,
unsigned long map_count,
int nodeid);
+struct page *sparse_populate_node(unsigned long pnum_begin,
+ unsigned long pnum_end,
+ unsigned long map_count,
+ int nid);
+struct page *sparse_populate_node_section(struct page *map_base,
+ unsigned long map_index,
+ unsigned long pnum,
+ int nid);

struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
struct vmem_altmap *altmap);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index e1a54ba411ec..f91056bfe972 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -311,3 +311,57 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
vmemmap_buf_end = NULL;
}
}
+
+/*
+ * Allocate struct pages for every section in nid node. Number of present
+ * sections is specified by map_count, and range is [pnum_begin, pnum_end).
+ */
+struct page * __init sparse_populate_node(unsigned long pnum_begin,
+ unsigned long pnum_end,
+ unsigned long map_count,
+ int nid)
+{
+ unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
+ unsigned long pnum, map_index = 0;
+ void *vmemmap_buf_start;
+
+ size = ALIGN(size, PMD_SIZE) * map_count;
+ vmemmap_buf_start = __earlyonly_bootmem_alloc(nid, size,
+ PMD_SIZE,
+ __pa(MAX_DMA_ADDRESS));
+ if (vmemmap_buf_start) {
+ vmemmap_buf = vmemmap_buf_start;
+ vmemmap_buf_end = vmemmap_buf_start + size;
+ }
+
+ for (pnum = pnum_begin; map_index < map_count; pnum++) {
+ if (!present_section_nr(pnum))
+ continue;
+ if (!sparse_mem_map_populate(pnum, nid, NULL))
+ break;
+ map_index++;
+ BUG_ON(pnum >= pnum_end);
+ }
+
+ if (vmemmap_buf_start) {
+ /* need to free left buf */
+ memblock_free_early(__pa(vmemmap_buf),
+ vmemmap_buf_end - vmemmap_buf);
+ vmemmap_buf = NULL;
+ vmemmap_buf_end = NULL;
+ }
+ return pfn_to_page(section_nr_to_pfn(pnum_begin));
+}
+
+/*
+ * Return map for pnum section. sparse_populate_node() has populated memory map
+ * in this node, we simply do pnum to struct page conversion.
+ * Note: unused arguments are used in non-vmemmap version of this function.
+ */
+struct page * __init sparse_populate_node_section(struct page *map_base,
+ unsigned long map_index,
+ unsigned long pnum,
+ int nid)
+{
+ return pfn_to_page(section_nr_to_pfn(pnum));
+}
diff --git a/mm/sparse.c b/mm/sparse.c
index d18e2697a781..3cf66bfb6b81 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -456,6 +456,43 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
__func__);
}
}
+
+static unsigned long __init section_map_size(void)
+{
+ return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
+}
+
+/*
+ * Try to allocate all struct pages for this node, if this fails, we will
+ * be allocating one section at a time in sparse_populate_node_section().
+ */
+struct page * __init sparse_populate_node(unsigned long pnum_begin,
+ unsigned long pnum_end,
+ unsigned long map_count,
+ int nid)
+{
+ return memblock_virt_alloc_try_nid_raw(section_map_size() * map_count,
+ PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
+ BOOTMEM_ALLOC_ACCESSIBLE, nid);
+}
+
+/*
+ * Return map for pnum section. map_base is not NULL if we could allocate map
+ * for this node together. Otherwise we allocate one section at a time.
+ * map_index is the index of pnum in this node counting only present sections.
+ */
+struct page * __init sparse_populate_node_section(struct page *map_base,
+ unsigned long map_index,
+ unsigned long pnum,
+ int nid)
+{
+ if (map_base) {
+ unsigned long offset = section_map_size() * map_index;
+
+ return (struct page *)((char *)map_base + offset);
+ }
+ return sparse_mem_map_populate(pnum, nid, NULL);
+}
#endif /* !CONFIG_SPARSEMEM_VMEMMAP */

static void __init sparse_early_mem_maps_alloc_node(void *data,
@@ -520,6 +557,60 @@ static void __init alloc_usemap_and_memmap(void (*alloc_func)
map_count, nodeid_begin);
}

+/*
+ * Initialize sparse on a specific node. The node spans [pnum_begin, pnum_end)
+ * And number of present sections in this node is map_count.
+ */
+void __init sparse_init_nid(int nid, unsigned long pnum_begin,
+ unsigned long pnum_end,
+ unsigned long map_count)
+{
+ unsigned long pnum, usemap_longs, *usemap, map_index;
+ struct page *map, *map_base;
+
+ usemap_longs = BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS);
+ usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid),
+ usemap_size() *
+ map_count);
+ if (!usemap) {
+ pr_err("%s: node[%d] usemap allocation failed", __func__, nid);
+ goto failed;
+ }
+ map_base = sparse_populate_node(pnum_begin, pnum_end,
+ map_count, nid);
+ map_index = 0;
+ for_each_present_section_nr(pnum_begin, pnum) {
+ if (pnum >= pnum_end)
+ break;
+
+ BUG_ON(map_index == map_count);
+ map = sparse_populate_node_section(map_base, map_index,
+ pnum, nid);
+ if (!map) {
+ pr_err("%s: node[%d] memory map backing failed. Some memory will not be available.",
+ __func__, nid);
+ pnum_begin = pnum;
+ goto failed;
+ }
+ check_usemap_section_nr(nid, usemap);
+ sparse_init_one_section(__nr_to_section(pnum), pnum, map,
+ usemap);
+ map_index++;
+ usemap += usemap_longs;
+ }
+ return;
+failed:
+ /* We failed to allocate, mark all the following pnums as not present */
+ for_each_present_section_nr(pnum_begin, pnum) {
+ struct mem_section *ms;
+
+ if (pnum >= pnum_end)
+ break;
+ ms = __nr_to_section(pnum);
+ ms->section_mem_map = 0;
+ }
+}
+
/*
* Allocate the accumulated non-linear sections, allocate a mem_map
* for each and record the physical to section mapping.
--
2.18.0


2018-07-09 17:56:06

by Pavel Tatashin

Subject: [PATCH v4 2/3] mm/sparse: start using sparse_init_nid(), and remove old code

Change sparse_init() to only find the pnum ranges that belong to a specific
node and call sparse_init_nid() for each such range from sparse_init().

Delete all the code that became obsolete with this change.
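
The resulting flow in sparse_init() looks like this (condensed from the
sparse_init() hunk below; names as in the patch):

	unsigned long pnum_begin = first_present_section_nr();
	int nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
	unsigned long pnum_end, map_count = 1;

	for_each_present_section_nr(pnum_begin + 1, pnum_end) {
		int nid = sparse_early_nid(__nr_to_section(pnum_end));

		if (nid == nid_begin) {
			map_count++;
			continue;
		}
		sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
		nid_begin = nid;
		pnum_begin = pnum_end;
		map_count = 1;
	}
	sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);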

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
---
include/linux/mm.h | 5 -
mm/sparse-vmemmap.c | 39 --------
mm/sparse.c | 222 ++++----------------------------------------
3 files changed, 19 insertions(+), 247 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5fdea58e67a5..cb49611d1199 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2646,11 +2646,6 @@ extern int randomize_va_space;
const char * arch_vma_name(struct vm_area_struct *vma);
void print_vma_addr(char *prefix, unsigned long rip);

-void sparse_mem_maps_populate_node(struct page **map_map,
- unsigned long pnum_begin,
- unsigned long pnum_end,
- unsigned long map_count,
- int nodeid);
struct page *sparse_populate_node(unsigned long pnum_begin,
unsigned long pnum_end,
unsigned long map_count,
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index f91056bfe972..87ba7cf8c75b 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -273,45 +273,6 @@ struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid,
return map;
}

-void __init sparse_mem_maps_populate_node(struct page **map_map,
- unsigned long pnum_begin,
- unsigned long pnum_end,
- unsigned long map_count, int nodeid)
-{
- unsigned long pnum;
- unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
- void *vmemmap_buf_start;
- int nr_consumed_maps = 0;
-
- size = ALIGN(size, PMD_SIZE);
- vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
- PMD_SIZE, __pa(MAX_DMA_ADDRESS));
-
- if (vmemmap_buf_start) {
- vmemmap_buf = vmemmap_buf_start;
- vmemmap_buf_end = vmemmap_buf_start + size * map_count;
- }
-
- for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
- if (!present_section_nr(pnum))
- continue;
-
- map_map[nr_consumed_maps] = sparse_mem_map_populate(pnum, nodeid, NULL);
- if (map_map[nr_consumed_maps++])
- continue;
- pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
- __func__);
- }
-
- if (vmemmap_buf_start) {
- /* need to free left buf */
- memblock_free_early(__pa(vmemmap_buf),
- vmemmap_buf_end - vmemmap_buf);
- vmemmap_buf = NULL;
- vmemmap_buf_end = NULL;
- }
-}
-
/*
* Allocate struct pages for every section in nid node. Number of present
* sections is specified by map_count, and range is [pnum_begin, pnum_end).
diff --git a/mm/sparse.c b/mm/sparse.c
index 3cf66bfb6b81..629e0d979333 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -200,11 +200,10 @@ static inline int next_present_section_nr(int section_nr)
(section_nr <= __highest_present_section_nr)); \
section_nr = next_present_section_nr(section_nr))

-/*
- * Record how many memory sections are marked as present
- * during system bootup.
- */
-static int __initdata nr_present_sections;
+static inline unsigned long first_present_section_nr(void)
+{
+ return next_present_section_nr(-1);
+}

/* Record a memory area against a node. */
void __init memory_present(int nid, unsigned long start, unsigned long end)
@@ -235,7 +234,6 @@ void __init memory_present(int nid, unsigned long start, unsigned long end)
ms->section_mem_map = sparse_encode_early_nid(nid) |
SECTION_IS_ONLINE;
section_mark_present(ms);
- nr_present_sections++;
}
}
}
@@ -377,34 +375,6 @@ static void __init check_usemap_section_nr(int nid, unsigned long *usemap)
}
#endif /* CONFIG_MEMORY_HOTREMOVE */

-static void __init sparse_early_usemaps_alloc_node(void *data,
- unsigned long pnum_begin,
- unsigned long pnum_end,
- unsigned long usemap_count, int nodeid)
-{
- void *usemap;
- unsigned long pnum;
- unsigned long **usemap_map = (unsigned long **)data;
- int size = usemap_size();
- int nr_consumed_maps = 0;
-
- usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
- size * usemap_count);
- if (!usemap) {
- pr_warn("%s: allocation failed\n", __func__);
- return;
- }
-
- for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
- if (!present_section_nr(pnum))
- continue;
- usemap_map[nr_consumed_maps] = usemap;
- usemap += size;
- check_usemap_section_nr(nodeid, usemap_map[nr_consumed_maps]);
- nr_consumed_maps++;
- }
-}
-
#ifndef CONFIG_SPARSEMEM_VMEMMAP
struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid,
struct vmem_altmap *altmap)
@@ -418,44 +388,6 @@ struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid,
BOOTMEM_ALLOC_ACCESSIBLE, nid);
return map;
}
-void __init sparse_mem_maps_populate_node(struct page **map_map,
- unsigned long pnum_begin,
- unsigned long pnum_end,
- unsigned long map_count, int nodeid)
-{
- void *map;
- unsigned long pnum;
- unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
- int nr_consumed_maps;
-
- size = PAGE_ALIGN(size);
- map = memblock_virt_alloc_try_nid_raw(size * map_count,
- PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
- BOOTMEM_ALLOC_ACCESSIBLE, nodeid);
- if (map) {
- nr_consumed_maps = 0;
- for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
- if (!present_section_nr(pnum))
- continue;
- map_map[nr_consumed_maps] = map;
- map += size;
- nr_consumed_maps++;
- }
- return;
- }
-
- /* fallback */
- nr_consumed_maps = 0;
- for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
- if (!present_section_nr(pnum))
- continue;
- map_map[nr_consumed_maps] = sparse_mem_map_populate(pnum, nodeid, NULL);
- if (map_map[nr_consumed_maps++])
- continue;
- pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
- __func__);
- }
-}

static unsigned long __init section_map_size(void)
{
@@ -495,73 +427,15 @@ struct page * __init sparse_populate_node_section(struct page *map_base,
}
#endif /* !CONFIG_SPARSEMEM_VMEMMAP */

-static void __init sparse_early_mem_maps_alloc_node(void *data,
- unsigned long pnum_begin,
- unsigned long pnum_end,
- unsigned long map_count, int nodeid)
-{
- struct page **map_map = (struct page **)data;
- sparse_mem_maps_populate_node(map_map, pnum_begin, pnum_end,
- map_count, nodeid);
-}
-
void __weak __meminit vmemmap_populate_print_last(void)
{
}

-/**
- * alloc_usemap_and_memmap - memory alloction for pageblock flags and vmemmap
- * @map: usemap_map for pageblock flags or mmap_map for vmemmap
- * @unit_size: size of map unit
- */
-static void __init alloc_usemap_and_memmap(void (*alloc_func)
- (void *, unsigned long, unsigned long,
- unsigned long, int), void *data,
- int data_unit_size)
-{
- unsigned long pnum;
- unsigned long map_count;
- int nodeid_begin = 0;
- unsigned long pnum_begin = 0;
-
- for_each_present_section_nr(0, pnum) {
- struct mem_section *ms;
-
- ms = __nr_to_section(pnum);
- nodeid_begin = sparse_early_nid(ms);
- pnum_begin = pnum;
- break;
- }
- map_count = 1;
- for_each_present_section_nr(pnum_begin + 1, pnum) {
- struct mem_section *ms;
- int nodeid;
-
- ms = __nr_to_section(pnum);
- nodeid = sparse_early_nid(ms);
- if (nodeid == nodeid_begin) {
- map_count++;
- continue;
- }
- /* ok, we need to take cake of from pnum_begin to pnum - 1*/
- alloc_func(data, pnum_begin, pnum,
- map_count, nodeid_begin);
- /* new start, update count etc*/
- nodeid_begin = nodeid;
- pnum_begin = pnum;
- data += map_count * data_unit_size;
- map_count = 1;
- }
- /* ok, last chunk */
- alloc_func(data, pnum_begin, __highest_present_section_nr+1,
- map_count, nodeid_begin);
-}
-
/*
* Initialize sparse on a specific node. The node spans [pnum_begin, pnum_end)
* And number of present sections in this node is map_count.
*/
-void __init sparse_init_nid(int nid, unsigned long pnum_begin,
+static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
unsigned long pnum_end,
unsigned long map_count)
{
@@ -617,87 +491,29 @@ void __init sparse_init_nid(int nid, unsigned long pnum_begin,
*/
void __init sparse_init(void)
{
- unsigned long pnum;
- struct page *map;
- struct page **map_map;
- unsigned long *usemap;
- unsigned long **usemap_map;
- int size, size2;
- int nr_consumed_maps = 0;
-
- /* see include/linux/mmzone.h 'struct mem_section' definition */
- BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section)));
+ unsigned long pnum_begin = first_present_section_nr();
+ int nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
+ unsigned long pnum_end, map_count = 1;

/* Setup pageblock_order for HUGETLB_PAGE_SIZE_VARIABLE */
set_pageblock_order();

- /*
- * map is using big page (aka 2M in x86 64 bit)
- * usemap is less one page (aka 24 bytes)
- * so alloc 2M (with 2M align) and 24 bytes in turn will
- * make next 2M slip to one more 2M later.
- * then in big system, the memory will have a lot of holes...
- * here try to allocate 2M pages continuously.
- *
- * powerpc need to call sparse_init_one_section right after each
- * sparse_early_mem_map_alloc, so allocate usemap_map at first.
- */
- size = sizeof(unsigned long *) * nr_present_sections;
- usemap_map = memblock_virt_alloc(size, 0);
- if (!usemap_map)
- panic("can not allocate usemap_map\n");
- alloc_usemap_and_memmap(sparse_early_usemaps_alloc_node,
- (void *)usemap_map,
- sizeof(usemap_map[0]));
-
- size2 = sizeof(struct page *) * nr_present_sections;
- map_map = memblock_virt_alloc(size2, 0);
- if (!map_map)
- panic("can not allocate map_map\n");
- alloc_usemap_and_memmap(sparse_early_mem_maps_alloc_node,
- (void *)map_map,
- sizeof(map_map[0]));
-
- /* The numner of present sections stored in nr_present_sections
- * are kept the same since mem sections are marked as present in
- * memory_present(). In this for loop, we need check which sections
- * failed to allocate memmap or usemap, then clear its
- * ->section_mem_map accordingly. During this process, we need
- * increase 'nr_consumed_maps' whether its allocation of memmap
- * or usemap failed or not, so that after we handle the i-th
- * memory section, can get memmap and usemap of (i+1)-th section
- * correctly. */
- for_each_present_section_nr(0, pnum) {
- struct mem_section *ms;
-
- if (nr_consumed_maps >= nr_present_sections) {
- pr_err("nr_consumed_maps goes beyond nr_present_sections\n");
- break;
- }
- ms = __nr_to_section(pnum);
- usemap = usemap_map[nr_consumed_maps];
- if (!usemap) {
- ms->section_mem_map = 0;
- nr_consumed_maps++;
- continue;
- }
+ for_each_present_section_nr(pnum_begin + 1, pnum_end) {
+ int nid = sparse_early_nid(__nr_to_section(pnum_end));

- map = map_map[nr_consumed_maps];
- if (!map) {
- ms->section_mem_map = 0;
- nr_consumed_maps++;
+ if (nid == nid_begin) {
+ map_count++;
continue;
}
-
- sparse_init_one_section(__nr_to_section(pnum), pnum, map,
- usemap);
- nr_consumed_maps++;
+ /* Init node with sections in range [pnum_begin, pnum_end) */
+ sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
+ nid_begin = nid;
+ pnum_begin = pnum_end;
+ map_count = 1;
}
-
+ /* cover the last node */
+ sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
vmemmap_populate_print_last();
-
- memblock_free_early(__pa(map_map), size2);
- memblock_free_early(__pa(usemap_map), size);
}

#ifdef CONFIG_MEMORY_HOTPLUG
--
2.18.0


2018-07-09 21:32:01

by Andrew Morton

Subject: Re: [PATCH v4 0/3] sparse_init rewrite

On Mon, 9 Jul 2018 13:53:09 -0400 Pavel Tatashin <[email protected]> wrote:

> In sparse_init() we allocate two large buffers to temporarily hold the
> usemap and memmap for the whole machine. However, we can avoid that by
> changing sparse_init() to operate on a per-node basis instead of
> processing the whole machine beforehand.
>
> As Baoquan showed in
> http://lkml.kernel.org/r/[email protected]
> these buffers are large enough to prevent the machine from booting on
> small-memory systems.
>
> These patches should be applied on top of Baoquan's work, as
> CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER is removed in that work.
>
> For ease of review, I split this work so that the first patch only adds new
> interfaces, and the second patch enables them and removes the old ones.

This clashes pretty significantly with patches from Baoquan and Oscar:

mm-sparse-make-sparse_init_one_section-void-and-remove-check.patch
mm-sparse-make-sparse_init_one_section-void-and-remove-check-fix.patch
mm-sparse-make-sparse_init_one_section-void-and-remove-check-fix-2.patch
mm-sparse-add-a-static-variable-nr_present_sections.patch
mm-sparsemem-defer-the-ms-section_mem_map-clearing.patch
mm-sparse-add-a-new-parameter-data_unit_size-for-alloc_usemap_and_memmap.patch

Is there duplication of intent here? Any thoughts on the
prioritization of these efforts?



2018-07-09 22:56:04

by Pavel Tatashin

Subject: Re: [PATCH v4 0/3] sparse_init rewrite

On Mon, Jul 9, 2018 at 5:29 PM Andrew Morton <[email protected]> wrote:
>
> On Mon, 9 Jul 2018 13:53:09 -0400 Pavel Tatashin <[email protected]> wrote:
>
> > In sparse_init() we allocate two large buffers to temporarily hold the
> > usemap and memmap for the whole machine. However, we can avoid that by
> > changing sparse_init() to operate on a per-node basis instead of
> > processing the whole machine beforehand.
> >
> > As Baoquan showed in
> > http://lkml.kernel.org/r/[email protected]
> > these buffers are large enough to prevent the machine from booting on
> > small-memory systems.
> >
> > These patches should be applied on top of Baoquan's work, as
> > CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER is removed in that work.
> >
> > For ease of review, I split this work so that the first patch only adds new
> > interfaces, and the second patch enables them and removes the old ones.
>
> This clashes pretty significantly with patches from Baoquan and Oscar:
>
> mm-sparse-make-sparse_init_one_section-void-and-remove-check.patch
> mm-sparse-make-sparse_init_one_section-void-and-remove-check-fix.patch
> mm-sparse-make-sparse_init_one_section-void-and-remove-check-fix-2.patch
> mm-sparse-add-a-static-variable-nr_present_sections.patch
> mm-sparsemem-defer-the-ms-section_mem_map-clearing.patch
> mm-sparse-add-a-new-parameter-data_unit_size-for-alloc_usemap_and_memmap.patch
>
> Is there duplication of intent here? Any thoughts on the
> prioritization of these efforts?

Hi Andrew,

In the cover letter I wrote that these should be applied on top of
Baoquan's patches. His work fixes a bug by making the temporary buffers
smaller on smaller machines, and also starts the sparse_init() cleanup
by getting rid of CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER. My patches
remove those buffers entirely. However, if my patches conflict, I should
resend them rebased on the mm-tree, as Baoquan's patches are already in
and were probably modified slightly compared to what I have locally,
which I took from the mailing list.

Pavel

2018-07-09 23:57:19

by Baoquan He

Subject: Re: [PATCH v4 0/3] sparse_init rewrite

Hi Andrew,

On 07/09/18 at 02:29pm, Andrew Morton wrote:
> On Mon, 9 Jul 2018 13:53:09 -0400 Pavel Tatashin <[email protected]> wrote:
> > For ease of review, I split this work so that the first patch only adds new
> > interfaces, and the second patch enables them and removes the old ones.
>
> This clashes pretty significantly with patches from Baoquan and Oscar:
>
> mm-sparse-make-sparse_init_one_section-void-and-remove-check.patch
> mm-sparse-make-sparse_init_one_section-void-and-remove-check-fix.patch
> mm-sparse-make-sparse_init_one_section-void-and-remove-check-fix-2.patch
> mm-sparse-add-a-static-variable-nr_present_sections.patch
> mm-sparsemem-defer-the-ms-section_mem_map-clearing.patch
> mm-sparse-add-a-new-parameter-data_unit_size-for-alloc_usemap_and_memmap.patch

> Is there duplication of intent here? Any thoughts on the
> prioritization of these efforts?

The final version of my patches was posted here:
http://lkml.kernel.org/r/[email protected]

Currently, only the first three patches are merged.

mm-sparse-add-a-static-variable-nr_present_sections.patch
mm-sparsemem-defer-the-ms-section_mem_map-clearing.patch
mm-sparse-add-a-new-parameter-data_unit_size-for-alloc_usemap_and_memmap.patch

They are preparation patches; the 4th patch is the actual fix:
[PATCH v6 4/5] mm/sparse: Optimize memmap allocation during sparse_init()

The 5th patch is a cleanup made at a reviewer's suggestion:
[PATCH v6 5/5] mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER

I think Pavel's patches sit on top of all five patches above.

Thanks
Baoquan

2018-07-10 00:09:10

by Andrew Morton

Subject: Re: [PATCH v4 0/3] sparse_init rewrite

On Tue, 10 Jul 2018 07:56:04 +0800 Baoquan He <[email protected]> wrote:

> Hi Andrew,
>
> On 07/09/18 at 02:29pm, Andrew Morton wrote:
> > On Mon, 9 Jul 2018 13:53:09 -0400 Pavel Tatashin <[email protected]> wrote:
> > > For ease of review, I split this work so that the first patch only adds new
> > > interfaces, and the second patch enables them and removes the old ones.
> >
> > This clashes pretty significantly with patches from Baoquan and Oscar:
> >
> > mm-sparse-make-sparse_init_one_section-void-and-remove-check.patch
> > mm-sparse-make-sparse_init_one_section-void-and-remove-check-fix.patch
> > mm-sparse-make-sparse_init_one_section-void-and-remove-check-fix-2.patch
> > mm-sparse-add-a-static-variable-nr_present_sections.patch
> > mm-sparsemem-defer-the-ms-section_mem_map-clearing.patch
> > mm-sparse-add-a-new-parameter-data_unit_size-for-alloc_usemap_and_memmap.patch
>
> > Is there duplication of intent here? Any thoughts on the
> > prioritization of these efforts?
>
> The final version of my patches was posted here:
> http://lkml.kernel.org/r/[email protected]
>
> Currently, only the first three patches are merged.
>
> mm-sparse-add-a-static-variable-nr_present_sections.patch
> mm-sparsemem-defer-the-ms-section_mem_map-clearing.patch
> mm-sparse-add-a-new-parameter-data_unit_size-for-alloc_usemap_and_memmap.patch
>
> They are preparation patches, and the 4th patch is the formal fix patch:
> [PATCH v6 4/5] mm/sparse: Optimize memmap allocation during sparse_init()
>
> The 5th patch is a clean up patch according to reviewer's suggestion:
> [PATCH v6 5/5] mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
>
> I think Pavel's patches sits on top of all above five patches.

OK, thanks, I've just moved to the v6 series.

2018-07-10 06:01:06

by Oscar Salvador

Subject: Re: [PATCH v4 0/3] sparse_init rewrite

On Mon, Jul 09, 2018 at 02:29:28PM -0700, Andrew Morton wrote:
> On Mon, 9 Jul 2018 13:53:09 -0400 Pavel Tatashin <[email protected]> wrote:
>
> > In sparse_init() we allocate two large buffers to temporarily hold the
> > usemap and memmap for the whole machine. However, we can avoid that by
> > changing sparse_init() to operate on a per-node basis instead of
> > processing the whole machine beforehand.
> >
> > As Baoquan showed in
> > http://lkml.kernel.org/r/[email protected]
> > these buffers are large enough to prevent the machine from booting on
> > small-memory systems.
> >
> > These patches should be applied on top of Baoquan's work, as
> > CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER is removed in that work.
> >
> > For ease of review, I split this work so that the first patch only adds new
> > interfaces, and the second patch enables them and removes the old ones.
>
> This clashes pretty significantly with patches from Baoquan and Oscar:
>
> mm-sparse-make-sparse_init_one_section-void-and-remove-check.patch
> mm-sparse-make-sparse_init_one_section-void-and-remove-check-fix.patch
> mm-sparse-make-sparse_init_one_section-void-and-remove-check-fix-2.patch

Does this patchset still clash with those patches?
If so, since those patches are already in the -mm tree, would it be better to rebase the patchset on top of them?

Thanks
--
Oscar Salvador
SUSE L3