2020-07-03 13:40:05

by David Hildenbrand

Subject: [PATCH v1 0/9] s390: implement and optimize vmemmap_free()

This series is based on the latest s390/features branch [1]. It implements
vmemmap_free(), consolidating it with vmem_add_range(), and optimizes it by
- Freeing empty page tables (now also done for the identity mapping).
- Handling cases where the vmemmap of a section does not fill huge pages
completely.

vmemmap_free() is currently never used, unless adding standby memory fails
(unlikely). This is relevant for virtio-mem, which adds/removes memory
in memory block/section granularity (always removes memory in the same
granularity it added it).

I gave this a proper test with my virtio-mem prototype (which I will share
once the basic QEMU implementation is upstream), both with 56 byte memmap
per page and 64 byte memmap per page, with and without huge page support.
In both cases, removing memory (routed through arch_remove_memory()) will
result in
- all populated vmemmap pages to get removed/freed
- all applicable page tables for the vmemmap getting removed/freed
- all applicable page tables for the identity mapping getting removed/freed
Unfortunately, I don't have access to bigger and z/VM (esp. dcss)
environments.

This is the basis for real memory hotunplug support for s390x and should
complete my journey to s390x vmem/vmemmap code for now :)

What needs double-checking is TLB flushing. AFAICS, as there are no valid
accesses, doing a single range flush at the end is sufficient, both when
removing vmemmap pages and the identity mapping.

Along the way, some minor cleanups.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features

David Hildenbrand (9):
s390/vmem: rename vmem_add_mem() to vmem_add_range()
s390/vmem: recursive implementation of vmem_remove_range()
s390/vmemmap: implement vmemmap_free()
s390/vmemmap: cleanup when vmemmap_populate() fails
s390/vmemmap: take the vmem_mutex when populating/freeing
s390/vmem: cleanup empty page tables
s390/vmemmap: fallback to PTEs if mapping large PMD fails
s390/vmemmap: remember unused sub-pmd ranges
s390/vmemmap: avoid memset(PAGE_UNUSED) when adding consecutive
sections

arch/s390/mm/vmem.c | 400 +++++++++++++++++++++++++++++++++++++-------
1 file changed, 338 insertions(+), 62 deletions(-)

--
2.26.2


2020-07-03 13:40:06

by David Hildenbrand

Subject: [PATCH v1 2/9] s390/vmem: recursive implementation of vmem_remove_range()

We want to reuse the same functionality in vmemmap_free(). Let's start by
introducing recursive remove_pagetable(), inspired by x86. We'll extend
it to cover the vmemmap similarly next.

A recursive implementation makes it easier to expand individual cases
without harming readability. In addition, we minimize traversing the
whole hierarchy over and over again.

One change is that we no longer unmap large PMDs/PUDs when they are not
completely covered by the request. That should never happen with direct
mappings, unless memory were removed in a different granularity than it was
added, which would already be broken.

Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/mm/vmem.c | 153 +++++++++++++++++++++++++++++++-------------
1 file changed, 107 insertions(+), 46 deletions(-)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 66c5333020ead..6fe156c3f035c 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -138,64 +138,125 @@ static int vmem_add_range(unsigned long start, unsigned long size)
return ret;
}

-/*
- * Remove a physical memory range from the 1:1 mapping.
- * Currently only invalidates page table entries.
- */
-static void vmem_remove_range(unsigned long start, unsigned long size)
+static void remove_pte_table(pmd_t *pmd, unsigned long addr,
+ unsigned long end)
{
- unsigned long pages4k, pages1m, pages2g;
- unsigned long end = start + size;
- unsigned long address = start;
- pgd_t *pg_dir;
- p4d_t *p4_dir;
- pud_t *pu_dir;
- pmd_t *pm_dir;
- pte_t *pt_dir;
+ unsigned long pages = 0;
+ pte_t *pte;

- pages4k = pages1m = pages2g = 0;
- while (address < end) {
- pg_dir = pgd_offset_k(address);
- if (pgd_none(*pg_dir)) {
- address += PGDIR_SIZE;
+ pte = pte_offset_kernel(pmd, addr);
+ for (; addr < end; addr += PAGE_SIZE, pte++) {
+ if (pte_none(*pte))
continue;
- }
- p4_dir = p4d_offset(pg_dir, address);
- if (p4d_none(*p4_dir)) {
- address += P4D_SIZE;
+
+ pte_clear(&init_mm, addr, pte);
+ pages++;
+ }
+
+ update_page_count(PG_DIRECT_MAP_4K, -pages);
+}
+
+static void remove_pmd_table(pud_t *pud, unsigned long addr,
+ unsigned long end)
+{
+ unsigned long next, pages = 0;
+ pmd_t *pmd;
+
+ pmd = pmd_offset(pud, addr);
+ for (; addr < end; addr = next, pmd++) {
+ next = pmd_addr_end(addr, end);
+
+ if (pmd_none(*pmd))
continue;
- }
- pu_dir = pud_offset(p4_dir, address);
- if (pud_none(*pu_dir)) {
- address += PUD_SIZE;
+
+ if (pmd_large(*pmd)) {
+ if (IS_ALIGNED(addr, PMD_SIZE) &&
+ IS_ALIGNED(next, PMD_SIZE)) {
+ pmd_clear(pmd);
+ pages++;
+ }
continue;
}
- if (pud_large(*pu_dir)) {
- pud_clear(pu_dir);
- address += PUD_SIZE;
- pages2g++;
+
+ remove_pte_table(pmd, addr, next);
+ }
+
+ update_page_count(PG_DIRECT_MAP_1M, -pages);
+}
+
+static void remove_pud_table(p4d_t *p4d, unsigned long addr,
+ unsigned long end)
+{
+ unsigned long next, pages = 0;
+ pud_t *pud;
+
+ pud = pud_offset(p4d, addr);
+ for (; addr < end; addr = next, pud++) {
+ next = pud_addr_end(addr, end);
+
+ if (pud_none(*pud))
continue;
- }
- pm_dir = pmd_offset(pu_dir, address);
- if (pmd_none(*pm_dir)) {
- address += PMD_SIZE;
+
+ if (pud_large(*pud)) {
+ if (IS_ALIGNED(addr, PUD_SIZE) &&
+ IS_ALIGNED(next, PUD_SIZE)) {
+ pud_clear(pud);
+ pages++;
+ }
continue;
}
- if (pmd_large(*pm_dir)) {
- pmd_clear(pm_dir);
- address += PMD_SIZE;
- pages1m++;
+
+ remove_pmd_table(pud, addr, next);
+ }
+
+ update_page_count(PG_DIRECT_MAP_2G, -pages);
+}
+
+static void remove_p4d_table(pgd_t *pgd, unsigned long addr,
+ unsigned long end)
+{
+ unsigned long next;
+ p4d_t *p4d;
+
+ p4d = p4d_offset(pgd, addr);
+ for (; addr < end; addr = next, p4d++) {
+ next = p4d_addr_end(addr, end);
+
+ if (p4d_none(*p4d))
continue;
- }
- pt_dir = pte_offset_kernel(pm_dir, address);
- pte_clear(&init_mm, address, pt_dir);
- address += PAGE_SIZE;
- pages4k++;
+
+ remove_pud_table(p4d, addr, next);
}
+}
+
+static void remove_pagetable(unsigned long start, unsigned long end)
+{
+ unsigned long addr, next;
+ pgd_t *pgd;
+
+ if (WARN_ON_ONCE(!PAGE_ALIGNED(start | end)))
+ return;
+
+ for (addr = start; addr < end; addr = next) {
+ next = pgd_addr_end(addr, end);
+ pgd = pgd_offset_k(addr);
+
+ if (pgd_none(*pgd))
+ continue;
+
+ remove_p4d_table(pgd, addr, next);
+ }
+
flush_tlb_kernel_range(start, end);
- update_page_count(PG_DIRECT_MAP_4K, -pages4k);
- update_page_count(PG_DIRECT_MAP_1M, -pages1m);
- update_page_count(PG_DIRECT_MAP_2G, -pages2g);
+}
+
+/*
+ * Remove a physical memory range from the 1:1 mapping.
+ * Currently only invalidates page table entries.
+ */
+static void vmem_remove_range(unsigned long start, unsigned long size)
+{
+ remove_pagetable(start, start + size);
}

/*
--
2.26.2

2020-07-03 13:40:13

by David Hildenbrand

Subject: [PATCH v1 1/9] s390/vmem: rename vmem_add_mem() to vmem_add_range()

Let's match the name to vmem_remove_range().

Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/mm/vmem.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 3b9e71654c37b..66c5333020ead 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -57,7 +57,7 @@ pte_t __ref *vmem_pte_alloc(void)
/*
* Add a physical memory range to the 1:1 mapping.
*/
-static int vmem_add_mem(unsigned long start, unsigned long size)
+static int vmem_add_range(unsigned long start, unsigned long size)
{
unsigned long pgt_prot, sgt_prot, r3_prot;
unsigned long pages4k, pages1m, pages2g;
@@ -308,7 +308,7 @@ int vmem_add_mapping(unsigned long start, unsigned long size)
return -ERANGE;

mutex_lock(&vmem_mutex);
- ret = vmem_add_mem(start, size);
+ ret = vmem_add_range(start, size);
if (ret)
vmem_remove_range(start, size);
mutex_unlock(&vmem_mutex);
@@ -325,7 +325,7 @@ void __init vmem_map_init(void)
struct memblock_region *reg;

for_each_memblock(memory, reg)
- vmem_add_mem(reg->base, reg->size);
+ vmem_add_range(reg->base, reg->size);
__set_memory((unsigned long)_stext,
(unsigned long)(_etext - _stext) >> PAGE_SHIFT,
SET_MEMORY_RO | SET_MEMORY_X);
--
2.26.2

2020-07-03 13:40:15

by David Hildenbrand

Subject: [PATCH v1 3/9] s390/vmemmap: implement vmemmap_free()

Reuse the shiny new remove_pagetable(), tweaking it to handle freeing of
vmemmap pages, similar to the x86-64 variant (passing "bool direct" to
distinguish).

Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/mm/vmem.c | 46 ++++++++++++++++++++++++++++++++-------------
1 file changed, 33 insertions(+), 13 deletions(-)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 6fe156c3f035c..16e109c292bf5 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -29,6 +29,15 @@ static void __ref *vmem_alloc_pages(unsigned int order)
return (void *) memblock_phys_alloc(size, size);
}

+static void vmem_free_pages(unsigned long addr, int order)
+{
+ /* We don't expect boot memory to be removed ever. */
+ if (!slab_is_available() ||
+ WARN_ON_ONCE(PageReserved(phys_to_page(addr))))
+ return;
+ free_pages(addr, order);
+}
+
void *vmem_crst_alloc(unsigned long val)
{
unsigned long *table;
@@ -139,7 +148,7 @@ static int vmem_add_range(unsigned long start, unsigned long size)
}

static void remove_pte_table(pmd_t *pmd, unsigned long addr,
- unsigned long end)
+ unsigned long end, bool direct)
{
unsigned long pages = 0;
pte_t *pte;
@@ -149,15 +158,18 @@ static void remove_pte_table(pmd_t *pmd, unsigned long addr,
if (pte_none(*pte))
continue;

+ if (!direct)
+ vmem_free_pages(pfn_to_phys(pte_pfn(*pte)), 0);
pte_clear(&init_mm, addr, pte);
pages++;
}

- update_page_count(PG_DIRECT_MAP_4K, -pages);
+ if (direct)
+ update_page_count(PG_DIRECT_MAP_4K, -pages);
}

static void remove_pmd_table(pud_t *pud, unsigned long addr,
- unsigned long end)
+ unsigned long end, bool direct)
{
unsigned long next, pages = 0;
pmd_t *pmd;
@@ -172,20 +184,24 @@ static void remove_pmd_table(pud_t *pud, unsigned long addr,
if (pmd_large(*pmd)) {
if (IS_ALIGNED(addr, PMD_SIZE) &&
IS_ALIGNED(next, PMD_SIZE)) {
+ if (!direct)
+ vmem_free_pages(pmd_deref(*pmd),
+ get_order(PMD_SIZE));
pmd_clear(pmd);
pages++;
}
continue;
}

- remove_pte_table(pmd, addr, next);
+ remove_pte_table(pmd, addr, next, direct);
}

- update_page_count(PG_DIRECT_MAP_1M, -pages);
+ if (direct)
+ update_page_count(PG_DIRECT_MAP_1M, -pages);
}

static void remove_pud_table(p4d_t *p4d, unsigned long addr,
- unsigned long end)
+ unsigned long end, bool direct)
{
unsigned long next, pages = 0;
pud_t *pud;
@@ -200,20 +216,22 @@ static void remove_pud_table(p4d_t *p4d, unsigned long addr,
if (pud_large(*pud)) {
if (IS_ALIGNED(addr, PUD_SIZE) &&
IS_ALIGNED(next, PUD_SIZE)) {
+ WARN_ON_ONCE(!direct);
pud_clear(pud);
pages++;
}
continue;
}

- remove_pmd_table(pud, addr, next);
+ remove_pmd_table(pud, addr, next, direct);
}

- update_page_count(PG_DIRECT_MAP_2G, -pages);
+ if (direct)
+ update_page_count(PG_DIRECT_MAP_2G, -pages);
}

static void remove_p4d_table(pgd_t *pgd, unsigned long addr,
- unsigned long end)
+ unsigned long end, bool direct)
{
unsigned long next;
p4d_t *p4d;
@@ -225,11 +243,12 @@ static void remove_p4d_table(pgd_t *pgd, unsigned long addr,
if (p4d_none(*p4d))
continue;

- remove_pud_table(p4d, addr, next);
+ remove_pud_table(p4d, addr, next, direct);
}
}

-static void remove_pagetable(unsigned long start, unsigned long end)
+static void remove_pagetable(unsigned long start, unsigned long end,
+ bool direct)
{
unsigned long addr, next;
pgd_t *pgd;
@@ -244,7 +263,7 @@ static void remove_pagetable(unsigned long start, unsigned long end)
if (pgd_none(*pgd))
continue;

- remove_p4d_table(pgd, addr, next);
+ remove_p4d_table(pgd, addr, next, direct);
}

flush_tlb_kernel_range(start, end);
@@ -256,7 +275,7 @@ static void remove_pagetable(unsigned long start, unsigned long end)
*/
static void vmem_remove_range(unsigned long start, unsigned long size)
{
- remove_pagetable(start, start + size);
+ remove_pagetable(start, start + size, true);
}

/*
@@ -351,6 +370,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
void vmemmap_free(unsigned long start, unsigned long end,
struct vmem_altmap *altmap)
{
+ remove_pagetable(start, end, false);
}

void vmem_remove_mapping(unsigned long start, unsigned long size)
--
2.26.2

2020-07-03 13:40:23

by David Hildenbrand

Subject: [PATCH v1 7/9] s390/vmemmap: fallback to PTEs if mapping large PMD fails

Let's fall back to single pages if short on huge pages. No need to stop
memory hotplug.

Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/mm/vmem.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 5239130770b7b..b7fdb9536707f 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -422,23 +422,23 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
}

pm_dir = pmd_offset(pu_dir, address);
- if (pmd_none(*pm_dir)) {
+ if (pmd_none(*pm_dir) && MACHINE_HAS_EDAT1) {
+ void *new_page;
+
/* Use 1MB frames for vmemmap if available. We always
* use large frames even if they are only partially
* used.
* Otherwise we would have also page tables since
* vmemmap_populate gets called for each section
* separately. */
- if (MACHINE_HAS_EDAT1) {
- void *new_page;
-
- new_page = vmemmap_alloc_block(PMD_SIZE, node);
- if (!new_page)
- goto out;
+ new_page = vmemmap_alloc_block(PMD_SIZE, node);
+ if (new_page) {
pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
address = (address + PMD_SIZE) & PMD_MASK;
continue;
}
+ }
+ if (pmd_none(*pm_dir)) {
pt_dir = vmem_pte_alloc();
if (!pt_dir)
goto out;
--
2.26.2

2020-07-03 13:40:29

by David Hildenbrand

Subject: [PATCH v1 6/9] s390/vmem: cleanup empty page tables

Let's clean up empty page tables. Consider only page tables that fully
fall into the identity mapping and the vmemmap range.

As there are no valid accesses to vmem/vmemmap within non-populated ranges,
the single TLB flush at the end should be sufficient.

Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/mm/vmem.c | 98 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 97 insertions(+), 1 deletion(-)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index aa968f67d7f9f..5239130770b7b 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -63,6 +63,15 @@ pte_t __ref *vmem_pte_alloc(void)
return pte;
}

+static void vmem_pte_free(unsigned long *table)
+{
+ /* We don't expect boot memory to be removed ever. */
+ if (!slab_is_available() ||
+ WARN_ON_ONCE(PageReserved(virt_to_page(table))))
+ return;
+ page_table_free(&init_mm, table);
+}
+
/*
* Add a physical memory range to the 1:1 mapping.
*/
@@ -168,6 +177,21 @@ static void remove_pte_table(pmd_t *pmd, unsigned long addr,
update_page_count(PG_DIRECT_MAP_4K, -pages);
}

+static void try_free_pte_table(pmd_t *pmd, unsigned long start)
+{
+ pte_t *pte;
+ int i;
+
+ /* We can safely assume this is fully in 1:1 mapping & vmemmap area */
+ pte = pte_offset_kernel(pmd, start);
+ for (i = 0; i < PTRS_PER_PTE; i++, pte++)
+ if (!pte_none(*pte))
+ return;
+
+ vmem_pte_free(__va(pmd_deref(*pmd)));
+ pmd_clear(pmd);
+}
+
static void remove_pmd_table(pud_t *pud, unsigned long addr,
unsigned long end, bool direct)
{
@@ -194,12 +218,36 @@ static void remove_pmd_table(pud_t *pud, unsigned long addr,
}

remove_pte_table(pmd, addr, next, direct);
+ try_free_pte_table(pmd, addr & PMD_MASK);
}

if (direct)
update_page_count(PG_DIRECT_MAP_1M, -pages);
}

+static void try_free_pmd_table(pud_t *pud, unsigned long start)
+{
+ const unsigned long end = start + PUD_SIZE;
+ pmd_t *pmd;
+ int i;
+
+ /* Don't mess with any tables not fully in 1:1 mapping & vmemmap area */
+ if (end > VMALLOC_START)
+ return;
+#ifdef CONFIG_KASAN
+ if (start < KASAN_SHADOW_END && KASAN_SHADOW_START > end)
+ return;
+#endif
+
+ pmd = pmd_offset(pud, start);
+ for (i = 0; i < PTRS_PER_PMD; i++, pmd++)
+ if (!pmd_none(*pmd))
+ return;
+
+ vmem_free_pages(pud_deref(*pud), CRST_ALLOC_ORDER);
+ pud_clear(pud);
+}
+
static void remove_pud_table(p4d_t *p4d, unsigned long addr,
unsigned long end, bool direct)
{
@@ -224,12 +272,36 @@ static void remove_pud_table(p4d_t *p4d, unsigned long addr,
}

remove_pmd_table(pud, addr, next, direct);
+ try_free_pmd_table(pud, addr & PUD_MASK);
}

if (direct)
update_page_count(PG_DIRECT_MAP_2G, -pages);
}

+static void try_free_pud_table(p4d_t *p4d, unsigned long start)
+{
+ const unsigned long end = start + P4D_SIZE;
+ pud_t *pud;
+ int i;
+
+ /* Don't mess with any tables not fully in 1:1 mapping & vmemmap area */
+ if (end > VMALLOC_START)
+ return;
+#ifdef CONFIG_KASAN
+ if (start < KASAN_SHADOW_END && KASAN_SHADOW_START > end)
+ return;
+#endif
+
+ pud = pud_offset(p4d, start);
+ for (i = 0; i < PTRS_PER_PUD; i++, pud++)
+ if (!pud_none(*pud))
+ return;
+
+ vmem_free_pages(p4d_deref(*p4d), CRST_ALLOC_ORDER);
+ p4d_clear(p4d);
+}
+
static void remove_p4d_table(pgd_t *pgd, unsigned long addr,
unsigned long end, bool direct)
{
@@ -244,9 +316,33 @@ static void remove_p4d_table(pgd_t *pgd, unsigned long addr,
continue;

remove_pud_table(p4d, addr, next, direct);
+ try_free_pud_table(p4d, addr & P4D_MASK);
}
}

+static void try_free_p4d_table(pgd_t *pgd, unsigned long start)
+{
+ const unsigned long end = start + PGDIR_SIZE;
+ p4d_t *p4d;
+ int i;
+
+ /* Don't mess with any tables not fully in 1:1 mapping & vmemmap area */
+ if (end > VMALLOC_START)
+ return;
+#ifdef CONFIG_KASAN
+ if (start < KASAN_SHADOW_END && KASAN_SHADOW_START > end)
+ return;
+#endif
+
+ p4d = p4d_offset(pgd, start);
+ for (i = 0; i < PTRS_PER_P4D; i++, p4d++)
+ if (!p4d_none(*p4d))
+ return;
+
+ vmem_free_pages(pgd_deref(*pgd), CRST_ALLOC_ORDER);
+ pgd_clear(pgd);
+}
+
static void remove_pagetable(unsigned long start, unsigned long end,
bool direct)
{
@@ -264,6 +360,7 @@ static void remove_pagetable(unsigned long start, unsigned long end,
continue;

remove_p4d_table(pgd, addr, next, direct);
+ try_free_p4d_table(pgd, addr & PGDIR_MASK);
}

flush_tlb_kernel_range(start, end);
@@ -271,7 +368,6 @@ static void remove_pagetable(unsigned long start, unsigned long end,

/*
* Remove a physical memory range from the 1:1 mapping.
- * Currently only invalidates page table entries.
*/
static void vmem_remove_range(unsigned long start, unsigned long size)
{
--
2.26.2

2020-07-03 13:40:44

by David Hildenbrand

Subject: [PATCH v1 8/9] s390/vmemmap: remember unused sub-pmd ranges

With a memmap size of 56 bytes or 72 bytes per page, the memmap for a
256 MB section won't span full PMDs. As we populate and depopulate single
sections, the depopulation step would no longer be able to free all
vmemmap PMDs.

Do it similarly to x86, marking the unused memmap ranges in a special way
(padding them with 0xFD).

This allows us to add/remove sections, cleaning up all allocated vmemmap
pages even if the memmap size is not a multiple of 16 bytes per page.

A 56 byte memmap can, for example, be created with !CONFIG_MEMCG and
!CONFIG_SLUB.
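
For illustration, here is a small userspace sketch of the arithmetic behind
this (the 256 MB section size, 4 KB base pages and 1 MB PMD mappings are
taken from the description above; the program itself is not part of the
series):

#include <stdio.h>

/*
 * Sketch only: show why a 56 or 72 byte struct page keeps the per-section
 * vmemmap from spanning full 1 MB PMDs, while 64 bytes lines up exactly.
 */
int main(void)
{
	const unsigned long section = 256UL << 20;	/* 256 MB section   */
	const unsigned long pages = section >> 12;	/* 4 KB base pages  */
	const unsigned long pmd_size = 1UL << 20;	/* 1 MB PMD mapping */
	const unsigned long memmap_sizes[] = { 56, 64, 72 };
	int i;

	for (i = 0; i < 3; i++) {
		unsigned long vmemmap = pages * memmap_sizes[i];

		printf("%lu byte memmap: %lu KB vmemmap per section (%s)\n",
		       memmap_sizes[i], vmemmap >> 10,
		       vmemmap % pmd_size ? "partial last PMD" : "PMD aligned");
	}
	return 0;
}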

Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/mm/vmem.c | 66 ++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 60 insertions(+), 6 deletions(-)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index b7fdb9536707f..a981ff5d47223 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -72,6 +72,42 @@ static void vmem_pte_free(unsigned long *table)
page_table_free(&init_mm, table);
}

+#define PAGE_UNUSED 0xFD
+
+static void vmemmap_use_sub_pmd(unsigned long start, unsigned long end)
+{
+ /*
+ * As we expect to add in the same granularity as we remove, it's
+ * sufficient to mark only some piece used to block the memmap page from
+ * getting removed (just in case the memmap never gets initialized,
+ * e.g., because the memory block never gets onlined).
+ */
+ memset(__va(start), 0, sizeof(struct page));
+}
+
+static void vmemmap_use_new_sub_pmd(unsigned long start, unsigned long end)
+{
+ void *page = __va(ALIGN_DOWN(start, PMD_SIZE));
+
+ /* Could be our memmap page is filled with PAGE_UNUSED already ... */
+ vmemmap_use_sub_pmd(start, end);
+
+ /* Mark the unused parts of the new memmap page PAGE_UNUSED. */
+ if (!IS_ALIGNED(start, PMD_SIZE))
+ memset(page, PAGE_UNUSED, start - __pa(page));
+ if (!IS_ALIGNED(end, PMD_SIZE))
+ memset(__va(end), PAGE_UNUSED, __pa(page) + PMD_SIZE - end);
+}
+
+/* Returns true if the PMD is completely unused and can be freed. */
+static bool vmemmap_unuse_sub_pmd(unsigned long start, unsigned long end)
+{
+ void *page = __va(ALIGN_DOWN(start, PMD_SIZE));
+
+ memset(__va(start), PAGE_UNUSED, end - start);
+ return !memchr_inv(page, PAGE_UNUSED, PMD_SIZE);
+}
+
/*
* Add a physical memory range to the 1:1 mapping.
*/
@@ -213,6 +249,11 @@ static void remove_pmd_table(pud_t *pud, unsigned long addr,
get_order(PMD_SIZE));
pmd_clear(pmd);
pages++;
+ } else if (!direct &&
+ vmemmap_unuse_sub_pmd(addr, next)) {
+ vmem_free_pages(pmd_deref(*pmd),
+ get_order(PMD_SIZE));
+ pmd_clear(pmd);
}
continue;
}
@@ -381,7 +422,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
unsigned long pgt_prot, sgt_prot;
- unsigned long address = start;
+ unsigned long next, address = start;
pgd_t *pg_dir;
p4d_t *p4_dir;
pud_t *pu_dir;
@@ -425,16 +466,27 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
if (pmd_none(*pm_dir) && MACHINE_HAS_EDAT1) {
void *new_page;

- /* Use 1MB frames for vmemmap if available. We always
+ /*
+ * Use 1MB frames for vmemmap if available. We always
* use large frames even if they are only partially
- * used.
+ * used, and mark the unused parts using PAGE_UNUSED.
+ *
+ * This is only an issue in some setups. E.g.,
+ * a full section with 64 byte memmap per page needs
+ * 4 MB in total. However, with 56 bytes, it's 3.5 MB.
+ *
* Otherwise we would have also page tables since
* vmemmap_populate gets called for each section
- * separately. */
+ * separately.
+ */
new_page = vmemmap_alloc_block(PMD_SIZE, node);
if (new_page) {
pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
- address = (address + PMD_SIZE) & PMD_MASK;
+ next = pmd_addr_end(address, end);
+ if (!IS_ALIGNED(next, PMD_SIZE) ||
+ !IS_ALIGNED(address, PMD_SIZE))
+ vmemmap_use_new_sub_pmd(address, next);
+ address = next;
continue;
}
}
@@ -444,7 +496,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
goto out;
pmd_populate(&init_mm, pm_dir, pt_dir);
} else if (pmd_large(*pm_dir)) {
- address = (address + PMD_SIZE) & PMD_MASK;
+ next = pmd_addr_end(address, end);
+ vmemmap_use_sub_pmd(address, next);
+ address = next;
continue;
}

--
2.26.2

2020-07-03 13:41:16

by David Hildenbrand

Subject: [PATCH v1 5/9] s390/vmemmap: take the vmem_mutex when populating/freeing

Let's synchronize all accesses to the 1:1 and vmemmap mappings. This will
be especially relevant when wanting to clean up empty page tables that could
be shared by both. This avoids races when removing tables that might just be
about to get reused.

Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/mm/vmem.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index bcddabd509da8..aa968f67d7f9f 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -293,6 +293,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
pte_t *pt_dir;
int ret = -ENOMEM;

+ mutex_lock(&vmem_mutex);
pgt_prot = pgprot_val(PAGE_KERNEL);
sgt_prot = pgprot_val(SEGMENT_KERNEL);
if (!MACHINE_HAS_NX) {
@@ -364,6 +365,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
}
ret = 0;
out:
+ mutex_unlock(&vmem_mutex);
if (ret)
vmemmap_free(start, end, altmap);
return ret;
@@ -372,7 +374,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
void vmemmap_free(unsigned long start, unsigned long end,
struct vmem_altmap *altmap)
{
+ mutex_lock(&vmem_mutex);
remove_pagetable(start, end, false);
+ mutex_unlock(&vmem_mutex);
}

void vmem_remove_mapping(unsigned long start, unsigned long size)
--
2.26.2

2020-07-03 13:41:32

by David Hildenbrand

Subject: [PATCH v1 4/9] s390/vmemmap: cleanup when vmemmap_populate() fails

Clean up what we partially added in case vmemmap_populate() fails.

Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/mm/vmem.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 16e109c292bf5..bcddabd509da8 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -364,6 +364,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
}
ret = 0;
out:
+ if (ret)
+ vmemmap_free(start, end, altmap);
return ret;
}

--
2.26.2

2020-07-03 13:42:41

by David Hildenbrand

Subject: [PATCH v1 9/9] s390/vmemmap: avoid memset(PAGE_UNUSED) when adding consecutive sections

Let's avoid memset(PAGE_UNUSED) when adding consecutive sections, in setups
where the vmemmap of a single section does not span full PMDs.

Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/s390/mm/vmem.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index a981ff5d47223..9db15fc864fc8 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -74,7 +74,22 @@ static void vmem_pte_free(unsigned long *table)

#define PAGE_UNUSED 0xFD

-static void vmemmap_use_sub_pmd(unsigned long start, unsigned long end)
+/*
+ * The unused vmemmap range, which was not yet memset(PAGE_UNUSED), ranges
+ * from unused_pmd_start to the next PMD_SIZE boundary.
+ */
+unsigned long unused_pmd_start;
+
+static void vmemmap_flush_unused_pmd(void)
+{
+ if (!unused_pmd_start)
+ return;
+ memset(__va(unused_pmd_start), PAGE_UNUSED,
+ ALIGN(unused_pmd_start, PMD_SIZE) - unused_pmd_start);
+ unused_pmd_start = 0;
+}
+
+static void __vmemmap_use_sub_pmd(unsigned long start, unsigned long end)
{
/*
* As we expect to add in the same granularity as we remove, it's
@@ -85,18 +100,41 @@ static void vmemmap_use_sub_pmd(unsigned long start, unsigned long end)
memset(__va(start), 0, sizeof(struct page));
}

+static void vmemmap_use_sub_pmd(unsigned long start, unsigned long end)
+{
+ /*
+ * We only optimize if the new used range directly follows the
+ * previously unused range (esp., when populating consecutive sections).
+ */
+ if (unused_pmd_start == start) {
+ unused_pmd_start = end;
+ if (likely(IS_ALIGNED(unused_pmd_start, PMD_SIZE)))
+ unused_pmd_start = 0;
+ return;
+ }
+ vmemmap_flush_unused_pmd();
+ __vmemmap_use_sub_pmd(start, end);
+}
+
static void vmemmap_use_new_sub_pmd(unsigned long start, unsigned long end)
{
void *page = __va(ALIGN_DOWN(start, PMD_SIZE));

+ vmemmap_flush_unused_pmd();
+
/* Could be our memmap page is filled with PAGE_UNUSED already ... */
- vmemmap_use_sub_pmd(start, end);
+ __vmemmap_use_sub_pmd(start, end);

/* Mark the unused parts of the new memmap page PAGE_UNUSED. */
if (!IS_ALIGNED(start, PMD_SIZE))
memset(page, PAGE_UNUSED, start - __pa(page));
+ /*
+ * We want to avoid memset(PAGE_UNUSED) when populating the vmemmap of
+ * consecutive sections. Remember for the last added PMD the last
+ * unused range in the populated PMD.
+ */
if (!IS_ALIGNED(end, PMD_SIZE))
- memset(__va(end), PAGE_UNUSED, __pa(page) + PMD_SIZE - end);
+ unused_pmd_start = end;
}

/* Returns true if the PMD is completely unused and can be freed. */
@@ -104,6 +142,7 @@ static bool vmemmap_unuse_sub_pmd(unsigned long start, unsigned long end)
{
void *page = __va(ALIGN_DOWN(start, PMD_SIZE));

+ vmemmap_flush_unused_pmd();
memset(__va(start), PAGE_UNUSED, end - start);
return !memchr_inv(page, PAGE_UNUSED, PMD_SIZE);
}
--
2.26.2

2020-07-03 15:50:45

by Heiko Carstens

Subject: Re: [PATCH v1 0/9] s390: implement and optimize vmemmap_free()

On Fri, Jul 03, 2020 at 03:39:08PM +0200, David Hildenbrand wrote:
> This series is based on the latest s390/features branch [1]. It implements
> vmemmap_free(), consolidating it with vmem_add_range(), and optimizes it by
> - Freeing empty page tables (now also done for the identity mapping).
> - Handling cases where the vmemmap of a section does not fill huge pages
> completely.

Nice! You implemented some things I "always wanted to do". Maybe I
should just do nothing and wait until patches appear ;)

Will take a look at the series next week. Thanks!

2020-07-03 17:11:46

by kernel test robot

Subject: Re: [PATCH v1 4/9] s390/vmemmap: cleanup when vmemmap_populate() fails

Hi David,

I love your patch! Yet something to improve:

[auto build test ERROR on s390/features]
[also build test ERROR on next-20200703]
[cannot apply to linux/master kvms390/next linus/master v5.8-rc3]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/David-Hildenbrand/s390-implement-and-optimize-vmemmap_free/20200703-214348
base: https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: s390-alldefconfig (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All error/warnings (new ones prefixed by >>):

arch/s390/mm/vmem.c: In function 'vmemmap_populate':
>> arch/s390/mm/vmem.c:368:3: error: implicit declaration of function 'vmemmap_free'; did you mean 'vmem_altmap_free'? [-Werror=implicit-function-declaration]
368 | vmemmap_free(start, end, altmap);
| ^~~~~~~~~~~~
| vmem_altmap_free
arch/s390/mm/vmem.c: At top level:
arch/s390/mm/vmem.c:372:6: warning: no previous prototype for 'vmemmap_free' [-Wmissing-prototypes]
372 | void vmemmap_free(unsigned long start, unsigned long end,
| ^~~~~~~~~~~~
>> arch/s390/mm/vmem.c:372:6: warning: conflicting types for 'vmemmap_free'
arch/s390/mm/vmem.c:368:3: note: previous implicit declaration of 'vmemmap_free' was here
368 | vmemmap_free(start, end, altmap);
| ^~~~~~~~~~~~
cc1: some warnings being treated as errors

vim +368 arch/s390/mm/vmem.c

280
281 /*
282 * Add a backed mem_map array to the virtual mem_map array.
283 */
284 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
285 struct vmem_altmap *altmap)
286 {
287 unsigned long pgt_prot, sgt_prot;
288 unsigned long address = start;
289 pgd_t *pg_dir;
290 p4d_t *p4_dir;
291 pud_t *pu_dir;
292 pmd_t *pm_dir;
293 pte_t *pt_dir;
294 int ret = -ENOMEM;
295
296 pgt_prot = pgprot_val(PAGE_KERNEL);
297 sgt_prot = pgprot_val(SEGMENT_KERNEL);
298 if (!MACHINE_HAS_NX) {
299 pgt_prot &= ~_PAGE_NOEXEC;
300 sgt_prot &= ~_SEGMENT_ENTRY_NOEXEC;
301 }
302 for (address = start; address < end;) {
303 pg_dir = pgd_offset_k(address);
304 if (pgd_none(*pg_dir)) {
305 p4_dir = vmem_crst_alloc(_REGION2_ENTRY_EMPTY);
306 if (!p4_dir)
307 goto out;
308 pgd_populate(&init_mm, pg_dir, p4_dir);
309 }
310
311 p4_dir = p4d_offset(pg_dir, address);
312 if (p4d_none(*p4_dir)) {
313 pu_dir = vmem_crst_alloc(_REGION3_ENTRY_EMPTY);
314 if (!pu_dir)
315 goto out;
316 p4d_populate(&init_mm, p4_dir, pu_dir);
317 }
318
319 pu_dir = pud_offset(p4_dir, address);
320 if (pud_none(*pu_dir)) {
321 pm_dir = vmem_crst_alloc(_SEGMENT_ENTRY_EMPTY);
322 if (!pm_dir)
323 goto out;
324 pud_populate(&init_mm, pu_dir, pm_dir);
325 }
326
327 pm_dir = pmd_offset(pu_dir, address);
328 if (pmd_none(*pm_dir)) {
329 /* Use 1MB frames for vmemmap if available. We always
330 * use large frames even if they are only partially
331 * used.
332 * Otherwise we would have also page tables since
333 * vmemmap_populate gets called for each section
334 * separately. */
335 if (MACHINE_HAS_EDAT1) {
336 void *new_page;
337
338 new_page = vmemmap_alloc_block(PMD_SIZE, node);
339 if (!new_page)
340 goto out;
341 pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
342 address = (address + PMD_SIZE) & PMD_MASK;
343 continue;
344 }
345 pt_dir = vmem_pte_alloc();
346 if (!pt_dir)
347 goto out;
348 pmd_populate(&init_mm, pm_dir, pt_dir);
349 } else if (pmd_large(*pm_dir)) {
350 address = (address + PMD_SIZE) & PMD_MASK;
351 continue;
352 }
353
354 pt_dir = pte_offset_kernel(pm_dir, address);
355 if (pte_none(*pt_dir)) {
356 void *new_page;
357
358 new_page = vmemmap_alloc_block(PAGE_SIZE, node);
359 if (!new_page)
360 goto out;
361 pte_val(*pt_dir) = __pa(new_page) | pgt_prot;
362 }
363 address += PAGE_SIZE;
364 }
365 ret = 0;
366 out:
367 if (ret)
> 368 vmemmap_free(start, end, altmap);
369 return ret;
370 }
371
> 372 void vmemmap_free(unsigned long start, unsigned long end,
373 struct vmem_altmap *altmap)
374 {
375 remove_pagetable(start, end, false);
376 }
377

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2020-07-04 11:54:04

by kernel test robot

Subject: Re: [PATCH v1 4/9] s390/vmemmap: cleanup when vmemmap_populate() fails

Hi David,

I love your patch! Yet something to improve:

[auto build test ERROR on s390/features]
[also build test ERROR on next-20200703]
[cannot apply to linux/master kvms390/next linus/master v5.8-rc3]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/David-Hildenbrand/s390-implement-and-optimize-vmemmap_free/20200703-214348
base: https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: s390-randconfig-r036-20200701 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project ca464639a1c9dd3944eb055ffd2796e8c2e7639f)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install s390 cross compiling tool for clang build
# apt-get install binutils-s390x-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

#define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
^
include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
___constant_swab32(x) : \
^
include/uapi/linux/swab.h:19:12: note: expanded from macro '___constant_swab32'
(((__u32)(x) & (__u32)0x000000ffUL) << 24) | \
^
In file included from arch/s390/mm/vmem.c:7:
In file included from include/linux/memblock.h:14:
In file included from arch/s390/include/asm/dma.h:5:
In file included from arch/s390/include/asm/io.h:72:
include/asm-generic/io.h:490:45: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu(__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
^
include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
___constant_swab32(x) : \
^
include/uapi/linux/swab.h:20:12: note: expanded from macro '___constant_swab32'
(((__u32)(x) & (__u32)0x0000ff00UL) << 8) | \
^
In file included from arch/s390/mm/vmem.c:7:
In file included from include/linux/memblock.h:14:
In file included from arch/s390/include/asm/dma.h:5:
In file included from arch/s390/include/asm/io.h:72:
include/asm-generic/io.h:490:45: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu(__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
^
include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
___constant_swab32(x) : \
^
include/uapi/linux/swab.h:21:12: note: expanded from macro '___constant_swab32'
(((__u32)(x) & (__u32)0x00ff0000UL) >> 8) | \
^
In file included from arch/s390/mm/vmem.c:7:
In file included from include/linux/memblock.h:14:
In file included from arch/s390/include/asm/dma.h:5:
In file included from arch/s390/include/asm/io.h:72:
include/asm-generic/io.h:490:45: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu(__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
^
include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
___constant_swab32(x) : \
^
include/uapi/linux/swab.h:22:12: note: expanded from macro '___constant_swab32'
(((__u32)(x) & (__u32)0xff000000UL) >> 24)))
^
In file included from arch/s390/mm/vmem.c:7:
In file included from include/linux/memblock.h:14:
In file included from arch/s390/include/asm/dma.h:5:
In file included from arch/s390/include/asm/io.h:72:
include/asm-generic/io.h:490:45: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu(__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
^
include/uapi/linux/swab.h:120:12: note: expanded from macro '__swab32'
__fswab32(x))
^
In file included from arch/s390/mm/vmem.c:7:
In file included from include/linux/memblock.h:14:
In file included from arch/s390/include/asm/dma.h:5:
In file included from arch/s390/include/asm/io.h:72:
include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writeb(value, PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:511:46: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writew(cpu_to_le16(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:521:46: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writel(cpu_to_le32(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
readsb(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
readsw(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
readsl(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
writesb(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
writesw(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
writesl(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
>> arch/s390/mm/vmem.c:368:3: error: implicit declaration of function 'vmemmap_free' [-Werror,-Wimplicit-function-declaration]
vmemmap_free(start, end, altmap);
^
>> arch/s390/mm/vmem.c:372:6: error: conflicting types for 'vmemmap_free'
void vmemmap_free(unsigned long start, unsigned long end,
^
arch/s390/mm/vmem.c:368:3: note: previous implicit declaration is here
vmemmap_free(start, end, altmap);
^
20 warnings and 2 errors generated.

vim +/vmemmap_free +368 arch/s390/mm/vmem.c

280
281 /*
282 * Add a backed mem_map array to the virtual mem_map array.
283 */
284 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
285 struct vmem_altmap *altmap)
286 {
287 unsigned long pgt_prot, sgt_prot;
288 unsigned long address = start;
289 pgd_t *pg_dir;
290 p4d_t *p4_dir;
291 pud_t *pu_dir;
292 pmd_t *pm_dir;
293 pte_t *pt_dir;
294 int ret = -ENOMEM;
295
296 pgt_prot = pgprot_val(PAGE_KERNEL);
297 sgt_prot = pgprot_val(SEGMENT_KERNEL);
298 if (!MACHINE_HAS_NX) {
299 pgt_prot &= ~_PAGE_NOEXEC;
300 sgt_prot &= ~_SEGMENT_ENTRY_NOEXEC;
301 }
302 for (address = start; address < end;) {
303 pg_dir = pgd_offset_k(address);
304 if (pgd_none(*pg_dir)) {
305 p4_dir = vmem_crst_alloc(_REGION2_ENTRY_EMPTY);
306 if (!p4_dir)
307 goto out;
308 pgd_populate(&init_mm, pg_dir, p4_dir);
309 }
310
311 p4_dir = p4d_offset(pg_dir, address);
312 if (p4d_none(*p4_dir)) {
313 pu_dir = vmem_crst_alloc(_REGION3_ENTRY_EMPTY);
314 if (!pu_dir)
315 goto out;
316 p4d_populate(&init_mm, p4_dir, pu_dir);
317 }
318
319 pu_dir = pud_offset(p4_dir, address);
320 if (pud_none(*pu_dir)) {
321 pm_dir = vmem_crst_alloc(_SEGMENT_ENTRY_EMPTY);
322 if (!pm_dir)
323 goto out;
324 pud_populate(&init_mm, pu_dir, pm_dir);
325 }
326
327 pm_dir = pmd_offset(pu_dir, address);
328 if (pmd_none(*pm_dir)) {
329 /* Use 1MB frames for vmemmap if available. We always
330 * use large frames even if they are only partially
331 * used.
332 * Otherwise we would have also page tables since
333 * vmemmap_populate gets called for each section
334 * separately. */
335 if (MACHINE_HAS_EDAT1) {
336 void *new_page;
337
338 new_page = vmemmap_alloc_block(PMD_SIZE, node);
339 if (!new_page)
340 goto out;
341 pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
342 address = (address + PMD_SIZE) & PMD_MASK;
343 continue;
344 }
345 pt_dir = vmem_pte_alloc();
346 if (!pt_dir)
347 goto out;
348 pmd_populate(&init_mm, pm_dir, pt_dir);
349 } else if (pmd_large(*pm_dir)) {
350 address = (address + PMD_SIZE) & PMD_MASK;
351 continue;
352 }
353
354 pt_dir = pte_offset_kernel(pm_dir, address);
355 if (pte_none(*pt_dir)) {
356 void *new_page;
357
358 new_page = vmemmap_alloc_block(PAGE_SIZE, node);
359 if (!new_page)
360 goto out;
361 pte_val(*pt_dir) = __pa(new_page) | pgt_prot;
362 }
363 address += PAGE_SIZE;
364 }
365 ret = 0;
366 out:
367 if (ret)
> 368 vmemmap_free(start, end, altmap);
369 return ret;
370 }
371
> 372 void vmemmap_free(unsigned long start, unsigned long end,
373 struct vmem_altmap *altmap)
374 {
375 remove_pagetable(start, end, false);
376 }
377

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2020-07-06 07:31:47

by David Hildenbrand

Subject: Re: [PATCH v1 4/9] s390/vmemmap: cleanup when vmemmap_populate() fails

On 03.07.20 19:09, kernel test robot wrote:
> Hi David,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on s390/features]
> [also build test ERROR on next-20200703]
> [cannot apply to linux/master kvms390/next linus/master v5.8-rc3]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://github.com/0day-ci/linux/commits/David-Hildenbrand/s390-implement-and-optimize-vmemmap_free/20200703-214348
> base: https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
> config: s390-alldefconfig (attached as .config)
> compiler: s390-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <[email protected]>
>
> All error/warnings (new ones prefixed by >>):
>
> arch/s390/mm/vmem.c: In function 'vmemmap_populate':
>>> arch/s390/mm/vmem.c:368:3: error: implicit declaration of function 'vmemmap_free'; did you mean 'vmem_altmap_free'? [-Werror=implicit-function-declaration]
> 368 | vmemmap_free(start, end, altmap);
> | ^~~~~~~~~~~~
> | vmem_altmap_free
> arch/s390/mm/vmem.c: At top level:
> arch/s390/mm/vmem.c:372:6: warning: no previous prototype for 'vmemmap_free' [-Wmissing-prototypes]
> 372 | void vmemmap_free(unsigned long start, unsigned long end,
> | ^~~~~~~~~~~~
>>> arch/s390/mm/vmem.c:372:6: warning: conflicting types for 'vmemmap_free'
> arch/s390/mm/vmem.c:368:3: note: previous implicit declaration of 'vmemmap_free' was here
> 368 | vmemmap_free(start, end, altmap);
> | ^~~~~~~~~~~~
> cc1: some warnings being treated as errors

Ah, vmemmap_free() is only defined with CONFIG_MEMORY_HOTPLUG. Easy to fix.
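
One possible shape of such a fix, sketched here only (the actual change in
v2 may differ): call the local remove_pagetable() directly in the error
path, so the cleanup no longer depends on the vmemmap_free() prototype,
which is hidden when CONFIG_MEMORY_HOTPLUG is not set. E.g.:

	ret = 0;
 out:
	if (ret)
-		vmemmap_free(start, end, altmap);
+		remove_pagetable(start, end, false);
	return ret;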


--
Thanks,

David / dhildenb

2020-07-07 12:11:28

by Heiko Carstens

Subject: Re: [PATCH v1 0/9] s390: implement and optimize vmemmap_free()

On Fri, Jul 03, 2020 at 03:39:08PM +0200, David Hildenbrand wrote:
> This series is based on the latest s390/features branch [1]. It implements
> vmemmap_free(), consolidating it with vmem_add_range(), and optimizes it by
> - Freeing empty page tables (now also done for the identity mapping).
> - Handling cases where the vmemmap of a section does not fill huge pages
> completely.
>
> vmemmap_free() is currently never used, unless adding standby memory fails
> (unlikely). This is relevant for virtio-mem, which adds/removes memory
> in memory block/section granularity (always removes memory in the same
> granularity it added it).
>
> I gave this a proper test with my virtio-mem prototype (which I will share
> once the basic QEMU implementation is upstream), both with 56 byte memmap
> per page and 64 byte memmap per page, with and without huge page support.
> In both cases, removing memory (routed through arch_remove_memory()) will
> result in
> - all populated vmemmap pages to get removed/freed
> - all applicable page tables for the vmemmap getting removed/freed
> - all applicable page tables for the identity mapping getting removed/freed
> Unfortunately, I don't have access to bigger and z/VM (esp. dcss)
> environments.
>
> This is the basis for real memory hotunplug support for s390x and should
> complete my journey to s390x vmem/vmemmap code for now :)
>
> What needs double-checking is TLB flushing. AFAICS, as there are no valid
> accesses, doing a single range flush at the end is sufficient, both when
> removing vmemmap pages and the identity mapping.
>
> Along the way, some minor cleanups.

Hmm.. I really would like to see if there would be only a single page
table walker left in vmem.c, which handles both adding and removing
things.
Now we end up with two different page table walk implementations
within the same file. However not sure if it is worth the effort to
unify them though.

2020-07-07 12:15:33

by David Hildenbrand

Subject: Re: [PATCH v1 0/9] s390: implement and optimize vmemmap_free()

On 07.07.20 14:08, Heiko Carstens wrote:
> On Fri, Jul 03, 2020 at 03:39:08PM +0200, David Hildenbrand wrote:
>> This series is based on the latest s390/features branch [1]. It implements
>> vmemmap_free(), consolidating it with vmem_add_range(), and optimizes it by
>> - Freeing empty page tables (now also done for the identity mapping).
>> - Handling cases where the vmemmap of a section does not fill huge pages
>> completely.
>>
>> vmemmap_free() is currently never used, unless adding standby memory fails
>> (unlikely). This is relevant for virtio-mem, which adds/removes memory
>> in memory block/section granularity (always removes memory in the same
>> granularity it added it).
>>
>> I gave this a proper test with my virtio-mem prototype (which I will share
>> once the basic QEMU implementation is upstream), both with 56 byte memmap
>> per page and 64 byte memmap per page, with and without huge page support.
>> In both cases, removing memory (routed through arch_remove_memory()) will
>> result in
>> - all populated vmemmap pages to get removed/freed
>> - all applicable page tables for the vmemmap getting removed/freed
>> - all applicable page tables for the identity mapping getting removed/freed
>> Unfortunately, I don't have access to bigger and z/VM (esp. dcss)
>> environments.
>>
>> This is the basis for real memory hotunplug support for s390x and should
>> complete my journey to s390x vmem/vmemmap code for now :)
>>
>> What needs double-checking is TLB flushing. AFAICS, as there are no valid
>> accesses, doing a single range flush at the end is sufficient, both when
>> removing vmemmap pages and the identity mapping.
>>
>> Along the way, some minor cleanups.
>
> Hmm.. I really would like to see if there would be only a single page
> table walker left in vmem.c, which handles both adding and removing
> things.
> Now we end up with two different page table walk implementations
> within the same file. However not sure if it is worth the effort to
> unify them though.

I tried to unify vmemmap_populate() and vmem_add_range() already and
didn't like the end result ... so, unifying these along with the removal
part won't be any better - most probably. Open for suggestions :)

(at least arm64 and x86-64 handle it similarly)

--
Thanks,

David / dhildenb

2020-07-08 06:51:49

by David Hildenbrand

Subject: Re: [PATCH v1 0/9] s390: implement and optimize vmemmap_free()

On 07.07.20 14:13, David Hildenbrand wrote:
> On 07.07.20 14:08, Heiko Carstens wrote:
>> On Fri, Jul 03, 2020 at 03:39:08PM +0200, David Hildenbrand wrote:
>>> This series is based on the latest s390/features branch [1]. It implements
>>> vmemmap_free(), consolidating it with vmem_add_range(), and optimizes it by
>>> - Freeing empty page tables (now also done for the identity mapping).
>>> - Handling cases where the vmemmap of a section does not fill huge pages
>>> completely.
>>>
>>> vmemmap_free() is currently never used, unless adding standby memory fails
>>> (unlikely). This is relevant for virtio-mem, which adds/removes memory
>>> in memory block/section granularity (always removes memory in the same
>>> granularity it added it).
>>>
>>> I gave this a proper test with my virtio-mem prototype (which I will share
>>> once the basic QEMU implementation is upstream), both with 56 byte memmap
>>> per page and 64 byte memmap per page, with and without huge page support.
>>> In both cases, removing memory (routed through arch_remove_memory()) will
>>> result in
>>> - all populated vmemmap pages to get removed/freed
>>> - all applicable page tables for the vmemmap getting removed/freed
>>> - all applicable page tables for the identity mapping getting removed/freed
>>> Unfortunately, I don't have access to bigger and z/VM (esp. dcss)
>>> environments.
>>>
>>> This is the basis for real memory hotunplug support for s390x and should
>>> complete my journey to s390x vmem/vmemmap code for now :)
>>>
>>> What needs double-checking is TLB flushing. AFAICS, as there are no valid
>>> accesses, doing a single range flush at the end is sufficient, both when
>>> removing vmemmap pages and the identity mapping.
>>>
>>> Along the way, some minor cleanups.
>>
>> Hmm.. I really would like to see if there would be only a single page
>> table walker left in vmem.c, which handles both adding and removing
>> things.
>> Now we end up with two different page table walk implementations
>> within the same file. However not sure if it is worth the effort to
>> unify them though.
>
> I tried to unify vmemmap_populate() and vmem_add_range() already and
> didn't like the end result ... so, unifying these along with the removal
> part won't be any better - most probably. Open for suggestions :)
>
> (at least arm64 and x86-64 handle it similarly)
>

I'll play with something like

static void modify_pagetable(unsigned long start, unsigned long end,
bool direct, bool add)

and see how it turns out.

--
Thanks,

David / dhildenb

2020-07-08 12:18:11

by David Hildenbrand

Subject: Re: [PATCH v1 0/9] s390: implement and optimize vmemmap_free()

On 08.07.20 08:50, David Hildenbrand wrote:
> On 07.07.20 14:13, David Hildenbrand wrote:
>> On 07.07.20 14:08, Heiko Carstens wrote:
>>> On Fri, Jul 03, 2020 at 03:39:08PM +0200, David Hildenbrand wrote:
>>>> This series is based on the latest s390/features branch [1]. It implements
>>>> vmemmap_free(), consolidating it with vmem_add_range(), and optimizes it by
>>>> - Freeing empty page tables (now also done for the identity mapping).
>>>> - Handling cases where the vmemmap of a section does not fill huge pages
>>>> completely.
>>>>
>>>> vmemmap_free() is currently never used, unless adding standby memory fails
>>>> (unlikely). This is relevant for virtio-mem, which adds/removes memory
>>>> in memory block/section granularity (always removes memory in the same
>>>> granularity it added it).
>>>>
>>>> I gave this a proper test with my virtio-mem prototype (which I will share
>>>> once the basic QEMU implementation is upstream), both with 56 byte memmap
>>>> per page and 64 byte memmap per page, with and without huge page support.
>>>> In both cases, removing memory (routed through arch_remove_memory()) will
>>>> result in
>>>> - all populated vmemmap pages to get removed/freed
>>>> - all applicable page tables for the vmemmap getting removed/freed
>>>> - all applicable page tables for the identity mapping getting removed/freed
>>>> Unfortunately, I don't have access to bigger and z/VM (esp. dcss)
>>>> environments.
>>>>
>>>> This is the basis for real memory hotunplug support for s390x and should
>>>> complete my journey to s390x vmem/vmemmap code for now :)
>>>>
>>>> What needs double-checking is TLB flushing. AFAICS, as there are no valid
>>>> accesses, doing a single range flush at the end is sufficient, both when
>>>> removing vmemmap pages and the identity mapping.
>>>>
>>>> Along the way, some minor cleanups.
>>>
>>> Hmm.. I really would like to see if there would be only a single page
>>> table walker left in vmem.c, which handles both adding and removing
>>> things.
>>> Now we end up with two different page table walk implementations
>>> within the same file. However not sure if it is worth the effort to
>>> unify them though.
>>
>> I tried to unify vmemmap_populate() and vmem_add_range() already and
>> didn't like the end result ... so, unifying these along with the removal
>> part won't be any better - most probably. Open for suggestions :)
>>
>> (at least arm64 and x86-64 handle it similarly)
>>
>
> I'll play with something like
>
> static void modify_pagetable(unsigned long start, unsigned long end,
> bool direct, bool add)
>
> and see how it turns out.
>

Did a quick hack. With a single walker (modify_pagetable) I get

arch/s390/mm/vmem.c | 628 ++++++++++++++++++++++++++++++--------------
1 file changed, 434 insertions(+), 194 deletions(-)

Overall looks cleaner, only modify_pte_table() and modify_pmd_table()
are a little more involved ...
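
For a rough idea of the shape being discussed, here is a sketch of the
top-level walker (illustrative only; modify_p4d_table() is an assumed helper
name following the modify_pte_table()/modify_pmd_table() naming above, and
the real v2 will differ in details):

static int modify_pagetable(unsigned long start, unsigned long end,
			    bool add, bool direct)
{
	unsigned long addr, next;
	int ret = -ENOMEM;
	pgd_t *pgd;
	p4d_t *p4d;

	if (WARN_ON_ONCE(!PAGE_ALIGNED(start | end)))
		return -EINVAL;

	for (addr = start; addr < end; addr = next) {
		next = pgd_addr_end(addr, end);
		pgd = pgd_offset_k(addr);

		if (pgd_none(*pgd)) {
			if (!add)
				continue;
			/* When adding, allocate the missing region-1 table. */
			p4d = vmem_crst_alloc(_REGION2_ENTRY_EMPTY);
			if (!p4d)
				goto out;
			pgd_populate(&init_mm, pgd, p4d);
		}
		ret = modify_p4d_table(pgd, addr, next, add, direct);
		if (ret)
			goto out;
		if (!add)
			try_free_p4d_table(pgd, addr & PGDIR_MASK);
	}
	ret = 0;
out:
	if (!add)
		flush_tlb_kernel_range(start, end);
	return ret;
}

vmem_add_range()/vmem_remove_range() and vmemmap_populate()/vmemmap_free()
would then become thin wrappers passing the appropriate add/direct flags.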

--
Thanks,

David / dhildenb

2020-07-10 14:01:16

by Heiko Carstens

Subject: Re: [PATCH v1 0/9] s390: implement and optimize vmemmap_free()

On Wed, Jul 08, 2020 at 02:16:39PM +0200, David Hildenbrand wrote:
> >>> Hmm.. I really would like to see if there would be only a single page
> >>> table walker left in vmem.c, which handles both adding and removing
> >>> things.
> >>> Now we end up with two different page table walk implementations
> >>> within the same file. However not sure if it is worth the effort to
> >>> unify them though.
> >>
> >> I tried to unify vmemmap_populate() and vmem_add_range() already and
> >> didn't like the end result ... so, unifying these along with the removal
> >> part won't be any better - most probably. Open for suggestions :)
> >>
> >> (at least arm64 and x86-64 handle it similarly)
> >>
> >
> > I'll play with something like
> >
> > static void modify_pagetable(unsigned long start, unsigned long end,
> > bool direct, bool add)
> >
> > and see how it turns out.
> >
>
> Did a quick hack. With a single walker (modify_pagetable) I get
>
> arch/s390/mm/vmem.c | 628 ++++++++++++++++++++++++++++++--------------
> 1 file changed, 434 insertions(+), 194 deletions(-)
>
> Overall looks cleaner, only modify_pte_table() and modify_pmd_table()
> are a little more involved ...

Would you mind to resend the series with this integrated?

2020-07-10 14:03:27

by David Hildenbrand

Subject: Re: [PATCH v1 0/9] s390: implement and optimize vmemmap_free()

On 10.07.20 15:57, Heiko Carstens wrote:
> On Wed, Jul 08, 2020 at 02:16:39PM +0200, David Hildenbrand wrote:
>>>>> Hmm.. I really would like to see if there would be only a single page
>>>>> table walker left in vmem.c, which handles both adding and removing
>>>>> things.
>>>>> Now we end up with two different page table walk implementations
>>>>> within the same file. However not sure if it is worth the effort to
>>>>> unify them though.
>>>>
>>>> I tried to unify vmemmap_populate() and vmem_add_range() already and
>>>> didn't like the end result ... so, unifying these along with the removal
>>>> part won't be any better - most probably. Open for suggestions :)
>>>>
>>>> (at least arm64 and x86-64 handle it similarly)
>>>>
>>>
>>> I'll play with something like
>>>
>>> static void modify_pagetable(unsigned long start, unsigned long end,
>>> bool direct, bool add)
>>>
>>> and see how it turns out.
>>>
>>
>> Did a quick hack. With a single walker (modify_pagetable) I get
>>
>> arch/s390/mm/vmem.c | 628 ++++++++++++++++++++++++++++++--------------
>> 1 file changed, 434 insertions(+), 194 deletions(-)
>>
>> Overall looks cleaner, only modify_pte_table() and modify_pmd_table()
>> are a little more involved ...
>
> Would you mind to resend the series with this integrated?
>

Yes, did some testing yesterday. Want to reshuffle/clean up some things
before posting.

Will be out of office next week, so expect a v2 sometime in 1-2 weeks.

Cheers!

--
Thanks,

David / dhildenb