2017-03-24 19:21:01

by Pavel Tatashin

Subject: [v2 0/5] parallelized "struct page" zeroing

Changelog:
v1 - v2
- Per request, added s390 to deferred "struct page" zeroing
- Collected performance data on x86 which proves the importance of
keeping memset() as a prefetch (see below).

When deferred struct page initialization feature is enabled, we get a
performance gain of initializing vmemmap in parallel after other CPUs are
started. However, we still zero the memory for vmemmap using one boot CPU.
This patch-set fixes the memset-zeroing limitation by deferring it as well.

Performance gain on SPARC with 32T:
base: https://hastebin.com/ozanelatat.go
fix: https://hastebin.com/utonawukof.go

As you can see, without the fix it takes 97.89s to boot; with the fix
it takes 46.91s.

Performance gain on x86 with 1T:
base: https://hastebin.com/uvifasohon.pas
fix: https://hastebin.com/anodiqaguj.pas

On Intel we save 10.66s/T while on SPARC we save 1.59s/T. Intel has
twice as many pages per terabyte (4K vs. 8K base pages), and also fewer
nodes than SPARC (32 nodes on SPARC vs. 8 on Intel).

It takes one thread 11.25s to zero the vmemmap on Intel for 1T, so
spreading that work across this machine's 8 nodes should add
11.25 / 8 = 1.4s per node to initialize the memory. It actually adds
only an additional 0.456s per node, which means that on Intel we also
benefit from doing the memset() and initializing all the other fields
in one place: the memset() effectively prefetches the cache lines that
the subsequent stores hit.
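
To make the prefetch effect concrete, here is a rough userspace
analogue (not part of the series; a sketch only, numbers will vary by
machine). It compares one big memset() followed by a second
initializing pass against zeroing each object immediately before
writing its fields:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* 64 bytes, the size of struct page on x86_64 */
struct fake_page { unsigned long w[8]; };

static double now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
	size_t i, n = 4UL << 20;	/* 4M fake pages, 256MB */
	struct fake_page *map = malloc(n * sizeof(*map));
	double t;

	if (!map)
		return 1;
	memset(map, 1, n * sizeof(*map));	/* pre-fault the region */

	/* Two passes: bulk memset, then field initialization. */
	t = now();
	memset(map, 0, n * sizeof(*map));
	for (i = 0; i < n; i++)
		map[i].w[0] = i;
	printf("two passes: %.3fs\n", now() - t);

	/* One pass: the per-object memset warms the cache line that
	 * the field stores then hit. */
	t = now();
	for (i = 0; i < n; i++) {
		memset(&map[i], 0, sizeof(map[i]));
		map[i].w[0] = i;
	}
	printf("one pass:   %.3fs\n", now() - t);

	free(map);
	return 0;
}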

Pavel Tatashin (5):
sparc64: simplify vmemmap_populate
mm: defining memblock_virt_alloc_try_nid_raw
mm: add "zero" argument to vmemmap allocators
mm: zero struct pages during initialization
mm: teach platforms not to zero struct pages memory

arch/powerpc/mm/init_64.c | 4 +-
arch/s390/mm/vmem.c | 5 ++-
arch/sparc/mm/init_64.c | 26 +++++++----------------
arch/x86/mm/init_64.c | 3 +-
include/linux/bootmem.h | 3 ++
include/linux/mm.h | 15 +++++++++++--
mm/memblock.c | 46 ++++++++++++++++++++++++++++++++++++------
mm/page_alloc.c | 3 ++
mm/sparse-vmemmap.c | 48 +++++++++++++++++++++++++++++---------------
9 files changed, 103 insertions(+), 50 deletions(-)


2017-03-24 19:20:53

by Pavel Tatashin

Subject: [v2 5/5] mm: teach platforms not to zero struct pages memory

When the deferred struct page initialization feature is used, most
struct pages are initialized after the other CPUs are started, and
hence we benefit from doing this job in parallel. However, we still
zero all of the memory allocated for struct pages on the boot CPU.
This patch solves that problem by deferring the zeroing of struct
pages until they are initialized.

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
---
arch/powerpc/mm/init_64.c | 2 +-
arch/s390/mm/vmem.c | 2 +-
arch/sparc/mm/init_64.c | 2 +-
arch/x86/mm/init_64.c | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index eb4c270..24faf2d 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -181,7 +181,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
if (vmemmap_populated(start, page_size))
continue;

- p = vmemmap_alloc_block(page_size, node, true);
+ p = vmemmap_alloc_block(page_size, node, VMEMMAP_ZERO);
if (!p)
return -ENOMEM;

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 9c75214..ffe9ba1 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -252,7 +252,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
void *new_page;

new_page = vmemmap_alloc_block(PMD_SIZE, node,
- true);
+ VMEMMAP_ZERO);
if (!new_page)
goto out;
pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index d91e462..280834e 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2542,7 +2542,7 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
pte = pmd_val(*pmd);
if (!(pte & _PAGE_VALID)) {
void *block = vmemmap_alloc_block(PMD_SIZE, node,
- true);
+ VMEMMAP_ZERO);

if (!block)
return -ENOMEM;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 46101b6..9d8c72c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1177,7 +1177,7 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
void *p;

p = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap,
- true);
+ VMEMMAP_ZERO);
if (p) {
pte_t entry;

--
1.7.1

2017-03-24 19:21:22

by Pavel Tatashin

Subject: [v2 4/5] mm: zero struct pages during initialization

When deferred struct page initialization is enabled, do not expect the
memory allocated for struct pages to have been zeroed by the allocator.
Zero it when the struct pages are initialized.

Also, a boolean constant, VMEMMAP_ZERO, is defined to tell platforms
whether they should zero the memory themselves or can defer it.
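
For context, below is a simplified sketch of how the deferred path in
mm/page_alloc.c reaches __init_single_page() (not from this patch;
helpers, zone lookup and range handling are elided). One kthread runs
per node, so the memset() also runs in parallel and node-locally:

static int __init deferred_init_memmap(void *data)
{
	pg_data_t *pgdat = (pg_data_t *)data;
	unsigned long pfn, end_pfn = pgdat_end_pfn(pgdat);
	unsigned long zid = 0;	/* placeholder: real code derives the zone */

	for (pfn = pgdat->first_deferred_pfn; pfn < end_pfn; pfn++) {
		struct page *page = pfn_to_page(pfn);

		/* With this patch, the zeroing memset() happens here. */
		__init_single_page(page, pfn, zid, pgdat->node_id);
	}
	return 0;
}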

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
---
include/linux/mm.h | 9 +++++++++
mm/page_alloc.c | 3 +++
2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 54df194..eb052f6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2427,6 +2427,15 @@ int vmemmap_populate_basepages(unsigned long start, unsigned long end,
#ifdef CONFIG_MEMORY_HOTPLUG
void vmemmap_free(unsigned long start, unsigned long end);
#endif
+/*
+ * Don't zero "struct page"es during early boot, and zero only when they are
+ * initialized in parallel.
+ */
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+#define VMEMMAP_ZERO false
+#else
+#define VMEMMAP_ZERO true
+#endif
void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
unsigned long size);

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f202f8b..02945e4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1168,6 +1168,9 @@ static void free_one_page(struct zone *zone,
static void __meminit __init_single_page(struct page *page, unsigned long pfn,
unsigned long zone, int nid)
{
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+ memset(page, 0, sizeof(struct page));
+#endif
set_page_links(page, zone, nid, pfn);
init_page_count(page);
page_mapcount_reset(page);
--
1.7.1

2017-03-24 19:21:10

by Pavel Tatashin

Subject: [v2 2/5] mm: defining memblock_virt_alloc_try_nid_raw

A new variant of the memblock_virt_alloc_* allocators that:
- Does not zero the allocated memory
- Does not panic if the request cannot be satisfied
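
A caller that wants non-zeroed memory and handles failure itself can
then do, for example (illustrative; this mirrors how the vmemmap code
uses it later in the series):

	ptr = memblock_virt_alloc_try_nid_raw(size, align, goal,
					      BOOTMEM_ALLOC_ACCESSIBLE,
					      nid);
	if (!ptr)
		return NULL;	/* no panic: failure handled by the caller */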

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
---
include/linux/bootmem.h | 3 +++
mm/memblock.c | 46 +++++++++++++++++++++++++++++++++++++++-------
2 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index dbaf312..b61ea10 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -160,6 +160,9 @@ extern int reserve_bootmem_node(pg_data_t *pgdat,
#define BOOTMEM_ALLOC_ANYWHERE (~(phys_addr_t)0)

/* FIXME: Move to memblock.h at a point where we remove nobootmem.c */
+void *memblock_virt_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
+ phys_addr_t min_addr,
+ phys_addr_t max_addr, int nid);
void *memblock_virt_alloc_try_nid_nopanic(phys_addr_t size,
phys_addr_t align, phys_addr_t min_addr,
phys_addr_t max_addr, int nid);
diff --git a/mm/memblock.c b/mm/memblock.c
index 696f06d..7fdc555 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1271,7 +1271,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
static void * __init memblock_virt_alloc_internal(
phys_addr_t size, phys_addr_t align,
phys_addr_t min_addr, phys_addr_t max_addr,
- int nid)
+ int nid, bool zero)
{
phys_addr_t alloc;
void *ptr;
@@ -1322,7 +1322,8 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
return NULL;
done:
ptr = phys_to_virt(alloc);
- memset(ptr, 0, size);
+ if (zero)
+ memset(ptr, 0, size);

/*
* The min_count is set to 0 so that bootmem allocated blocks
@@ -1336,6 +1337,37 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
}

/**
+ * memblock_virt_alloc_try_nid_raw - allocate boot memory block without zeroing
+ * memory and without panicking
+ * @size: size of memory block to be allocated in bytes
+ * @align: alignment of the region and block's size
+ * @min_addr: the lower bound of the memory region from where the allocation
+ * is preferred (phys address)
+ * @max_addr: the upper bound of the memory region from where the allocation
+ * is preferred (phys address), or %BOOTMEM_ALLOC_ACCESSIBLE to
+ * allocate only from memory limited by memblock.current_limit value
+ * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
+ *
+ * Public function, provides additional debug information (including caller
+ * info), if enabled. Does not zero allocated memory, does not panic if request
+ * cannot be satisfied.
+ *
+ * RETURNS:
+ * Virtual address of allocated memory block on success, NULL on failure.
+ */
+void * __init memblock_virt_alloc_try_nid_raw(
+ phys_addr_t size, phys_addr_t align,
+ phys_addr_t min_addr, phys_addr_t max_addr,
+ int nid)
+{
+ memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=0x%llx max_addr=0x%llx %pF\n",
+ __func__, (u64)size, (u64)align, nid, (u64)min_addr,
+ (u64)max_addr, (void *)_RET_IP_);
+ return memblock_virt_alloc_internal(size, align,
+ min_addr, max_addr, nid, false);
+}
+
+/**
* memblock_virt_alloc_try_nid_nopanic - allocate boot memory block
* @size: size of memory block to be allocated in bytes
* @align: alignment of the region and block's size
@@ -1346,8 +1378,8 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
* allocate only from memory limited by memblock.current_limit value
* @nid: nid of the free area to find, %NUMA_NO_NODE for any node
*
- * Public version of _memblock_virt_alloc_try_nid_nopanic() which provides
- * additional debug information (including caller info), if enabled.
+ * Public function, provides additional debug information (including caller
+ * info), if enabled. This function zeroes the allocated memory.
*
* RETURNS:
* Virtual address of allocated memory block on success, NULL on failure.
@@ -1361,7 +1393,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
__func__, (u64)size, (u64)align, nid, (u64)min_addr,
(u64)max_addr, (void *)_RET_IP_);
return memblock_virt_alloc_internal(size, align, min_addr,
- max_addr, nid);
+ max_addr, nid, true);
}

/**
@@ -1375,7 +1407,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
* allocate only from memory limited by memblock.current_limit value
* @nid: nid of the free area to find, %NUMA_NO_NODE for any node
*
- * Public panicking version of _memblock_virt_alloc_try_nid_nopanic()
+ * Public panicking version of memblock_virt_alloc_try_nid_nopanic()
* which provides debug information (including caller info), if enabled,
* and panics if the request can not be satisfied.
*
@@ -1393,7 +1425,7 @@ phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, i
__func__, (u64)size, (u64)align, nid, (u64)min_addr,
(u64)max_addr, (void *)_RET_IP_);
ptr = memblock_virt_alloc_internal(size, align,
- min_addr, max_addr, nid);
+ min_addr, max_addr, nid, true);
if (ptr)
return ptr;

--
1.7.1

2017-03-24 19:21:34

by Pavel Tatashin

Subject: [v2 1/5] sparc64: simplify vmemmap_populate

Remove duplicated code by using the common vmemmap_pgd_populate() and
vmemmap_pud_populate() functions.
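
For reference, the common pgd helper in mm/sparse-vmemmap.c follows the
same allocate-if-missing pattern that was open-coded here (abridged):

pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
{
	pgd_t *pgd = pgd_offset_k(addr);

	if (pgd_none(*pgd)) {
		void *p = vmemmap_alloc_block(PAGE_SIZE, node);

		if (!p)
			return NULL;
		pgd_populate(&init_mm, pgd, p);
	}
	return pgd;
}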

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
---
arch/sparc/mm/init_64.c | 23 ++++++-----------------
1 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 2c0cb2a..01eccab 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2526,30 +2526,19 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
vstart = vstart & PMD_MASK;
vend = ALIGN(vend, PMD_SIZE);
for (; vstart < vend; vstart += PMD_SIZE) {
- pgd_t *pgd = pgd_offset_k(vstart);
+ pgd_t *pgd = vmemmap_pgd_populate(vstart, node);
unsigned long pte;
pud_t *pud;
pmd_t *pmd;

- if (pgd_none(*pgd)) {
- pud_t *new = vmemmap_alloc_block(PAGE_SIZE, node);
+ if (!pgd)
+ return -ENOMEM;

- if (!new)
- return -ENOMEM;
- pgd_populate(&init_mm, pgd, new);
- }
-
- pud = pud_offset(pgd, vstart);
- if (pud_none(*pud)) {
- pmd_t *new = vmemmap_alloc_block(PAGE_SIZE, node);
-
- if (!new)
- return -ENOMEM;
- pud_populate(&init_mm, pud, new);
- }
+ pud = vmemmap_pud_populate(pgd, vstart, node);
+ if (!pud)
+ return -ENOMEM;

pmd = pmd_offset(pud, vstart);
-
pte = pmd_val(*pmd);
if (!(pte & _PAGE_VALID)) {
void *block = vmemmap_alloc_block(PMD_SIZE, node);
--
1.7.1

2017-03-24 19:20:47

by Pavel Tatashin

Subject: [v2 3/5] mm: add "zero" argument to vmemmap allocators

Allow clients to request non-zeroed memory from the vmemmap allocator.
The following two public functions have a new boolean argument called
zero:

__vmemmap_alloc_block_buf()
vmemmap_alloc_block()

When zero is true, the memory allocated by the memblock allocator is
zeroed (the current behavior); when it is false, the memory is not
zeroed.

This change allows for optimizations where the client knows when it is
better to zero the memory: perhaps later, once other CPUs are started,
or perhaps the client is going to set every byte of the allocated
memory anyway, so there is no need to zero it beforehand.
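
At a call site the change looks like this (illustrative):

	/* Before: always zeroed by the allocator. */
	p = vmemmap_alloc_block(PAGE_SIZE, node);

	/* After: the caller decides. */
	p = vmemmap_alloc_block(PAGE_SIZE, node, true);  /* zeroed, as today */
	p = vmemmap_alloc_block(PAGE_SIZE, node, false); /* caller initializes */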

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
---
arch/powerpc/mm/init_64.c | 4 +-
arch/s390/mm/vmem.c | 5 ++-
arch/sparc/mm/init_64.c | 3 +-
arch/x86/mm/init_64.c | 3 +-
include/linux/mm.h | 6 ++--
mm/sparse-vmemmap.c | 48 +++++++++++++++++++++++++++++---------------
6 files changed, 43 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 9be9920..eb4c270 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -133,7 +133,7 @@ static int __meminit vmemmap_populated(unsigned long start, int page_size)

/* allocate a page when required and hand out chunks */
if (!num_left) {
- next = vmemmap_alloc_block(PAGE_SIZE, node);
+ next = vmemmap_alloc_block(PAGE_SIZE, node, true);
if (unlikely(!next)) {
WARN_ON(1);
return NULL;
@@ -181,7 +181,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
if (vmemmap_populated(start, page_size))
continue;

- p = vmemmap_alloc_block(page_size, node);
+ p = vmemmap_alloc_block(page_size, node, true);
if (!p)
return -ENOMEM;

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 60d3899..9c75214 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -251,7 +251,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
if (MACHINE_HAS_EDAT1) {
void *new_page;

- new_page = vmemmap_alloc_block(PMD_SIZE, node);
+ new_page = vmemmap_alloc_block(PMD_SIZE, node,
+ true);
if (!new_page)
goto out;
pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
@@ -271,7 +272,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
if (pte_none(*pt_dir)) {
void *new_page;

- new_page = vmemmap_alloc_block(PAGE_SIZE, node);
+ new_page = vmemmap_alloc_block(PAGE_SIZE, node, true);
if (!new_page)
goto out;
pte_val(*pt_dir) = __pa(new_page) | pgt_prot;
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 01eccab..d91e462 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2541,7 +2541,8 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
pmd = pmd_offset(pud, vstart);
pte = pmd_val(*pmd);
if (!(pte & _PAGE_VALID)) {
- void *block = vmemmap_alloc_block(PMD_SIZE, node);
+ void *block = vmemmap_alloc_block(PMD_SIZE, node,
+ true);

if (!block)
return -ENOMEM;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 15173d3..46101b6 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1176,7 +1176,8 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
if (pmd_none(*pmd)) {
void *p;

- p = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
+ p = __vmemmap_alloc_block_buf(PMD_SIZE, node, altmap,
+ true);
if (p) {
pte_t entry;

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5f01c88..54df194 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2410,13 +2410,13 @@ void sparse_mem_maps_populate_node(struct page **map_map,
pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node);
pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node);
-void *vmemmap_alloc_block(unsigned long size, int node);
+void *vmemmap_alloc_block(unsigned long size, int node, bool zero);
struct vmem_altmap;
void *__vmemmap_alloc_block_buf(unsigned long size, int node,
- struct vmem_altmap *altmap);
+ struct vmem_altmap *altmap, bool zero);
static inline void *vmemmap_alloc_block_buf(unsigned long size, int node)
{
- return __vmemmap_alloc_block_buf(size, node, NULL);
+ return __vmemmap_alloc_block_buf(size, node, NULL, true);
}

void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a56c398..1e9508b 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -39,16 +39,27 @@
static void * __ref __earlyonly_bootmem_alloc(int node,
unsigned long size,
unsigned long align,
- unsigned long goal)
+ unsigned long goal,
+ bool zero)
{
- return memblock_virt_alloc_try_nid(size, align, goal,
- BOOTMEM_ALLOC_ACCESSIBLE, node);
+ void *mem = memblock_virt_alloc_try_nid_raw(size, align, goal,
+ BOOTMEM_ALLOC_ACCESSIBLE,
+ node);
+ if (!mem) {
+ panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=0x%lx\n",
+ __func__, size, align, node, goal);
+ return NULL;
+ }
+
+ if (zero)
+ memset(mem, 0, size);
+ return mem;
}

static void *vmemmap_buf;
static void *vmemmap_buf_end;

-void * __meminit vmemmap_alloc_block(unsigned long size, int node)
+void * __meminit vmemmap_alloc_block(unsigned long size, int node, bool zero)
{
/* If the main allocator is up use that, fallback to bootmem. */
if (slab_is_available()) {
@@ -67,24 +78,27 @@
return NULL;
} else
return __earlyonly_bootmem_alloc(node, size, size,
- __pa(MAX_DMA_ADDRESS));
+ __pa(MAX_DMA_ADDRESS), zero);
}

/* need to make sure size is all the same during early stage */
-static void * __meminit alloc_block_buf(unsigned long size, int node)
+static void * __meminit alloc_block_buf(unsigned long size, int node, bool zero)
{
void *ptr;

if (!vmemmap_buf)
- return vmemmap_alloc_block(size, node);
+ return vmemmap_alloc_block(size, node, zero);

/* take the from buf */
ptr = (void *)ALIGN((unsigned long)vmemmap_buf, size);
if (ptr + size > vmemmap_buf_end)
- return vmemmap_alloc_block(size, node);
+ return vmemmap_alloc_block(size, node, zero);

vmemmap_buf = ptr + size;

+ if (zero)
+ memset(ptr, 0, size);
+
return ptr;
}

@@ -152,11 +166,11 @@ static unsigned long __meminit vmem_altmap_alloc(struct vmem_altmap *altmap,

/* need to make sure size is all the same during early stage */
void * __meminit __vmemmap_alloc_block_buf(unsigned long size, int node,
- struct vmem_altmap *altmap)
+ struct vmem_altmap *altmap, bool zero)
{
if (altmap)
return altmap_alloc_block_buf(size, altmap);
- return alloc_block_buf(size, node);
+ return alloc_block_buf(size, node, zero);
}

void __meminit vmemmap_verify(pte_t *pte, int node,
@@ -175,7 +189,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
pte_t *pte = pte_offset_kernel(pmd, addr);
if (pte_none(*pte)) {
pte_t entry;
- void *p = alloc_block_buf(PAGE_SIZE, node);
+ void *p = alloc_block_buf(PAGE_SIZE, node, true);
if (!p)
return NULL;
entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
@@ -188,7 +202,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
{
pmd_t *pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd)) {
- void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+ void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
if (!p)
return NULL;
pmd_populate_kernel(&init_mm, pmd, p);
@@ -200,7 +214,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
{
pud_t *pud = pud_offset(p4d, addr);
if (pud_none(*pud)) {
- void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+ void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
if (!p)
return NULL;
pud_populate(&init_mm, pud, p);
@@ -212,7 +226,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
{
p4d_t *p4d = p4d_offset(pgd, addr);
if (p4d_none(*p4d)) {
- void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+ void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
if (!p)
return NULL;
p4d_populate(&init_mm, p4d, p);
@@ -224,7 +238,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
{
pgd_t *pgd = pgd_offset_k(addr);
if (pgd_none(*pgd)) {
- void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+ void *p = vmemmap_alloc_block(PAGE_SIZE, node, true);
if (!p)
return NULL;
pgd_populate(&init_mm, pgd, p);
@@ -290,8 +304,8 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
void *vmemmap_buf_start;

size = ALIGN(size, PMD_SIZE);
- vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
- PMD_SIZE, __pa(MAX_DMA_ADDRESS));
+ vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size
+ * map_count, PMD_SIZE, __pa(MAX_DMA_ADDRESS), false);

if (vmemmap_buf_start) {
vmemmap_buf = vmemmap_buf_start;
--
1.7.1

2017-03-25 21:26:35

by Matthew Wilcox

Subject: Re: [v2 0/5] parallelized "struct page" zeroing

On Fri, Mar 24, 2017 at 03:19:47PM -0400, Pavel Tatashin wrote:
> Changelog:
> v1 - v2
> - Per request, added s390 to deferred "struct page" zeroing
> - Collected performance data on x86 which proves the importance of
> keeping memset() as a prefetch (see below).
>
> When deferred struct page initialization feature is enabled, we get a
> performance gain of initializing vmemmap in parallel after other CPUs are
> started. However, we still zero the memory for vmemmap using one boot CPU.
> This patch-set fixes the memset-zeroing limitation by deferring it as well.
>
> Performance gain on SPARC with 32T:
> base: https://hastebin.com/ozanelatat.go
> fix: https://hastebin.com/utonawukof.go
>
> As you can see, without the fix it takes 97.89s to boot; with the fix
> it takes 46.91s.
>
> Performance gain on x86 with 1T:
> base: https://hastebin.com/uvifasohon.pas
> fix: https://hastebin.com/anodiqaguj.pas
>
> On Intel we save 10.66s/T while on SPARC we save 1.59s/T. Intel has
> twice as many pages per terabyte (4K vs. 8K base pages), and also fewer
> nodes than SPARC (32 nodes on SPARC vs. 8 on Intel).
>
> It takes one thread 11.25s to zero the vmemmap on Intel for 1T, so
> spreading that work across this machine's 8 nodes should add
> 11.25 / 8 = 1.4s per node to initialize the memory. It actually adds
> only an additional 0.456s per node, which means that on Intel we also
> benefit from doing the memset() and initializing all the other fields
> in one place: the memset() effectively prefetches the cache lines that
> the subsequent stores hit.

My question was how long it takes if you memset in neither place.

2017-03-27 06:01:36

by Heiko Carstens

Subject: Re: [v2 5/5] mm: teach platforms not to zero struct pages memory

On Fri, Mar 24, 2017 at 03:19:52PM -0400, Pavel Tatashin wrote:
> When the deferred struct page initialization feature is used, most
> struct pages are initialized after the other CPUs are started, and
> hence we benefit from doing this job in parallel. However, we still
> zero all of the memory allocated for struct pages on the boot CPU.
> This patch solves that problem by deferring the zeroing of struct
> pages until they are initialized.
>
> Signed-off-by: Pavel Tatashin <[email protected]>
> Reviewed-by: Shannon Nelson <[email protected]>
> ---
> arch/powerpc/mm/init_64.c | 2 +-
> arch/s390/mm/vmem.c | 2 +-
> arch/sparc/mm/init_64.c | 2 +-
> arch/x86/mm/init_64.c | 2 +-
> 4 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index eb4c270..24faf2d 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -181,7 +181,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
> if (vmemmap_populated(start, page_size))
> continue;
>
> - p = vmemmap_alloc_block(page_size, node, true);
> + p = vmemmap_alloc_block(page_size, node, VMEMMAP_ZERO);
> if (!p)
> return -ENOMEM;
>
> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
> index 9c75214..ffe9ba1 100644
> --- a/arch/s390/mm/vmem.c
> +++ b/arch/s390/mm/vmem.c
> @@ -252,7 +252,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
> void *new_page;
>
> new_page = vmemmap_alloc_block(PMD_SIZE, node,
> - true);
> + VMEMMAP_ZERO);
> if (!new_page)
> goto out;
> pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;

s390 has two call sites that need to be converted, like you did in one of
your previous patches. The same seems to be true for powerpc, unless there
is a reason to not convert them?

2017-04-07 06:15:40

by Aneesh Kumar K.V

Subject: Re: [v2 5/5] mm: teach platforms not to zero struct pages memory

Heiko Carstens <[email protected]> writes:

> On Fri, Mar 24, 2017 at 03:19:52PM -0400, Pavel Tatashin wrote:
>> When the deferred struct page initialization feature is used, most
>> struct pages are initialized after the other CPUs are started, and
>> hence we benefit from doing this job in parallel. However, we still
>> zero all of the memory allocated for struct pages on the boot CPU.
>> This patch solves that problem by deferring the zeroing of struct
>> pages until they are initialized.
>>
>> Signed-off-by: Pavel Tatashin <[email protected]>
>> Reviewed-by: Shannon Nelson <[email protected]>
>> ---
>> arch/powerpc/mm/init_64.c | 2 +-
>> arch/s390/mm/vmem.c | 2 +-
>> arch/sparc/mm/init_64.c | 2 +-
>> arch/x86/mm/init_64.c | 2 +-
>> 4 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
>> index eb4c270..24faf2d 100644
>> --- a/arch/powerpc/mm/init_64.c
>> +++ b/arch/powerpc/mm/init_64.c
>> @@ -181,7 +181,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
>> if (vmemmap_populated(start, page_size))
>> continue;
>>
>> - p = vmemmap_alloc_block(page_size, node, true);
>> + p = vmemmap_alloc_block(page_size, node, VMEMMAP_ZERO);
>> if (!p)
>> return -ENOMEM;
>>
>> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
>> index 9c75214..ffe9ba1 100644
>> --- a/arch/s390/mm/vmem.c
>> +++ b/arch/s390/mm/vmem.c
>> @@ -252,7 +252,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
>> void *new_page;
>>
>> new_page = vmemmap_alloc_block(PMD_SIZE, node,
>> - true);
>> + VMEMMAP_ZERO);
>> if (!new_page)
>> goto out;
>> pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
>
> s390 has two call sites that need to be converted, like you did in one of
> your previous patches. The same seems to be true for powerpc, unless there
> is a reason to not convert them?
>

vmemmap_list_alloc is not really a struct page allocation, right? We
are just allocating memory to be used as vmemmap_backing. But
considering we are updating all three elements of the struct, we can
avoid that memset. But instead of VMEMMAP_ZERO, can we just pass false
in that case?
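
For reference, vmemmap_backing on powerpc is just three fields:

struct vmemmap_backing {
	struct vmemmap_backing *list;
	unsigned long phys;
	unsigned long virt_addr;
};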

-aneesh