2018-09-25 15:36:27

by Masayoshi Mizuma

[permalink] [raw]
Subject: [PATCH v2 0/3] mm: Fix for movable_node boot option

This patch series are the fix for movable_node boot option
issue which was introduced by commit 124049decbb1 ("x86/e820:
put !E820_TYPE_RAM regions into memblock.reserved").

First patch, revert the commit. Second and third patch fix the
original issue.

Masayoshi Mizuma (1):
Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"

Naoya Horiguchi (1):
mm: zero remaining unavailable struct pages

Pavel Tatashin (1):
mm: return zero_resv_unavail optimization

arch/x86/kernel/e820.c | 15 +++--------
include/linux/memblock.h | 15 -----------
mm/page_alloc.c | 54 +++++++++++++++++++++++++++-------------
3 files changed, 40 insertions(+), 44 deletions(-)

--
2.18.0



2018-09-25 15:36:32

by Masayoshi Mizuma

[permalink] [raw]
Subject: [PATCH v2 2/3] mm: zero remaining unavailable struct pages

From: Naoya Horiguchi <[email protected]>

There is a kernel panic that is triggered when reading /proc/kpageflags
on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':

BUG: unable to handle kernel paging request at fffffffffffffffe
PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
RIP: 0010:stable_page_flags+0x27/0x3c0
Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202
RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0
RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001
R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0
R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10
FS: 00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0
Call Trace:
kpageflags_read+0xc7/0x120
proc_reg_read+0x3c/0x60
__vfs_read+0x36/0x170
vfs_read+0x89/0x130
ksys_pread64+0x71/0x90
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7efc42e75e23
Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24

According to kernel bisection, this problem became visible due to commit
f7f99100d8d9 which changes how struct pages are initialized.

Memblock layout affects the pfn ranges covered by node/zone. Consider
that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
the default (no memmap= given) memblock layout is like below:

MEMBLOCK configuration:
memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000
memory.cnt = 0x4
memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
memory[0x1] [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
memory[0x2] [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0
memory[0x3] [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
...

If you give memmap=1G!4G (so it just covers memory[0x2]),
the range [0x100000000-0x13fffffff] is gone:

MEMBLOCK configuration:
memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000
memory.cnt = 0x3
memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
memory[0x1] [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
memory[0x2] [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
...

This causes shrinking node 0's pfn range because it is calculated by
the address range of memblock.memory. So some of struct pages in the
gap range are left uninitialized.

We have a function zero_resv_unavail() which does zeroing the struct
pages outside memblock.memory, but currently it covers only the reserved
unavailable range (i.e. memblock.memory && !memblock.reserved).
This patch extends it to cover all unavailable range, which fixes
the reported issue.

Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Naoya Horiguchi <[email protected]>
Tested-by: Oscar Salvador <[email protected]>
Tested-by: Masayoshi Mizuma <[email protected]>
---
include/linux/memblock.h | 15 ---------------
mm/page_alloc.c | 36 +++++++++++++++++++++++++-----------
2 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 5169205..2acdd04 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -265,21 +265,6 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved, \
nid, flags, p_start, p_end, p_nid)

-/**
- * for_each_resv_unavail_range - iterate through reserved and unavailable memory
- * @i: u64 used as loop variable
- * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
- * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
- *
- * Walks over unavailable but reserved (reserved && !memory) areas of memblock.
- * Available as soon as memblock is initialized.
- * Note: because this memory does not belong to any physical node, flags and
- * nid arguments do not make sense and thus not exported as arguments.
- */
-#define for_each_resv_unavail_range(i, p_start, p_end) \
- for_each_mem_range(i, &memblock.reserved, &memblock.memory, \
- NUMA_NO_NODE, MEMBLOCK_NONE, p_start, p_end, NULL)
-
static inline void memblock_set_region_flags(struct memblock_region *r,
enum memblock_flags flags)
{
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 89d2a2a..3b9d89e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6446,29 +6446,42 @@ void __init free_area_init_node(int nid, unsigned long *zones_size,
* struct pages which are reserved in memblock allocator and their fields
* may be accessed (for example page_to_pfn() on some configuration accesses
* flags). We must explicitly zero those struct pages.
+ *
+ * This function also addresses a similar issue where struct pages are left
+ * uninitialized because the physical address range is not covered by
+ * memblock.memory or memblock.reserved. That could happen when memblock
+ * layout is manually configured via memmap=.
*/
void __init zero_resv_unavail(void)
{
phys_addr_t start, end;
unsigned long pfn;
u64 i, pgcnt;
+ phys_addr_t next = 0;

/*
- * Loop through ranges that are reserved, but do not have reported
- * physical memory backing.
+ * Loop through unavailable ranges not covered by memblock.memory.
*/
pgcnt = 0;
- for_each_resv_unavail_range(i, &start, &end) {
- for (pfn = PFN_DOWN(start); pfn < PFN_UP(end); pfn++) {
- if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
- pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
- + pageblock_nr_pages - 1;
- continue;
+ for_each_mem_range(i, &memblock.memory, NULL,
+ NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
+ if (next < start) {
+ for (pfn = PFN_DOWN(next); pfn < PFN_UP(start); pfn++) {
+ if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages)))
+ continue;
+ mm_zero_struct_page(pfn_to_page(pfn));
+ pgcnt++;
}
- mm_zero_struct_page(pfn_to_page(pfn));
- pgcnt++;
}
+ next = end;
}
+ for (pfn = PFN_DOWN(next); pfn < max_pfn; pfn++) {
+ if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages)))
+ continue;
+ mm_zero_struct_page(pfn_to_page(pfn));
+ pgcnt++;
+ }
+

/*
* Struct pages that do not have backing memory. This could be because
@@ -6478,7 +6491,8 @@ void __init zero_resv_unavail(void)
* this code can be removed.
*/
if (pgcnt)
- pr_info("Reserved but unavailable: %lld pages", pgcnt);
+ pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt);
+
}
#endif /* CONFIG_HAVE_MEMBLOCK && !CONFIG_FLAT_NODE_MEM_MAP */

--
2.18.0


2018-09-25 15:37:17

by Masayoshi Mizuma

[permalink] [raw]
Subject: [PATCH v2 3/3] mm: return zero_resv_unavail optimization

From: Pavel Tatashin <[email protected]>

When checking for valid pfns in zero_resv_unavail(), it is not necessary to
verify that pfns within pageblock_nr_pages ranges are valid, only the first
one needs to be checked. This is because memory for pages are allocated in
contiguous chunks that contain pageblock_nr_pages struct pages.

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-off-by: Masayoshi Mizuma <[email protected]>
---
mm/page_alloc.c | 46 ++++++++++++++++++++++++++--------------------
1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3b9d89e..bd5b7e4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6440,6 +6440,29 @@ void __init free_area_init_node(int nid, unsigned long *zones_size,
}

#if defined(CONFIG_HAVE_MEMBLOCK) && !defined(CONFIG_FLAT_NODE_MEM_MAP)
+
+/*
+ * Zero all valid struct pages in range [spfn, epfn), return number of struct
+ * pages zeroed
+ */
+static u64 zero_pfn_range(unsigned long spfn, unsigned long epfn)
+{
+ unsigned long pfn;
+ u64 pgcnt = 0;
+
+ for (pfn = spfn; pfn < epfn; pfn++) {
+ if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
+ pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
+ + pageblock_nr_pages - 1;
+ continue;
+ }
+ mm_zero_struct_page(pfn_to_page(pfn));
+ pgcnt++;
+ }
+
+ return pgcnt;
+}
+
/*
* Only struct pages that are backed by physical memory are zeroed and
* initialized by going through __init_single_page(). But, there are some
@@ -6455,7 +6478,6 @@ void __init free_area_init_node(int nid, unsigned long *zones_size,
void __init zero_resv_unavail(void)
{
phys_addr_t start, end;
- unsigned long pfn;
u64 i, pgcnt;
phys_addr_t next = 0;

@@ -6465,34 +6487,18 @@ void __init zero_resv_unavail(void)
pgcnt = 0;
for_each_mem_range(i, &memblock.memory, NULL,
NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
- if (next < start) {
- for (pfn = PFN_DOWN(next); pfn < PFN_UP(start); pfn++) {
- if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages)))
- continue;
- mm_zero_struct_page(pfn_to_page(pfn));
- pgcnt++;
- }
- }
+ if (next < start)
+ pgcnt += zero_pfn_range(PFN_DOWN(next), PFN_UP(start));
next = end;
}
- for (pfn = PFN_DOWN(next); pfn < max_pfn; pfn++) {
- if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages)))
- continue;
- mm_zero_struct_page(pfn_to_page(pfn));
- pgcnt++;
- }
-
+ pgcnt += zero_pfn_range(PFN_DOWN(next), max_pfn);

/*
* Struct pages that do not have backing memory. This could be because
* firmware is using some of this memory, or for some other reasons.
- * Once memblock is changed so such behaviour is not allowed: i.e.
- * list of "reserved" memory must be a subset of list of "memory", then
- * this code can be removed.
*/
if (pgcnt)
pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt);
-
}
#endif /* CONFIG_HAVE_MEMBLOCK && !CONFIG_FLAT_NODE_MEM_MAP */

--
2.18.0


2018-09-25 15:37:45

by Masayoshi Mizuma

[permalink] [raw]
Subject: [PATCH v2 1/3] Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"

From: Masayoshi Mizuma <[email protected]>

commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into
memblock.reserved") breaks movable_node kernel option because it
changed the memory gap range to reserved memblock. So, the node
is marked as Normal zone even if the SRAT has Hot plaggable affinity.

=====================================================================
kernel: BIOS-e820: [mem 0x0000180000000000-0x0000180fffffffff] usable
kernel: BIOS-e820: [mem 0x00001c0000000000-0x00001c0fffffffff] usable
...
kernel: reserved[0x12]#011[0x0000181000000000-0x00001bffffffffff], 0x000003f000000000 bytes flags: 0x0
...
kernel: ACPI: SRAT: Node 2 PXM 6 [mem 0x180000000000-0x1bffffffffff] hotplug
kernel: ACPI: SRAT: Node 3 PXM 7 [mem 0x1c0000000000-0x1fffffffffff] hotplug
...
kernel: Movable zone start for each node
kernel: Node 3: 0x00001c0000000000
kernel: Early memory node ranges
...
=====================================================================

Naoya's v1 patch [*] fixes the original issue and this movable_node
issue doesn't occur.
Let's revert commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM
regions into memblock.reserved") and apply the v1 patch.

[*] https://lkml.org/lkml/2018/6/13/27

Signed-off-by: Masayoshi Mizuma <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
---
arch/x86/kernel/e820.c | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index c88c23c..d1f25c8 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1248,7 +1248,6 @@ void __init e820__memblock_setup(void)
{
int i;
u64 end;
- u64 addr = 0;

/*
* The bootstrap memblock region count maximum is 128 entries
@@ -1265,21 +1264,13 @@ void __init e820__memblock_setup(void)
struct e820_entry *entry = &e820_table->entries[i];

end = entry->addr + entry->size;
- if (addr < entry->addr)
- memblock_reserve(addr, entry->addr - addr);
- addr = end;
if (end != (resource_size_t)end)
continue;

- /*
- * all !E820_TYPE_RAM ranges (including gap ranges) are put
- * into memblock.reserved to make sure that struct pages in
- * such regions are not left uninitialized after bootup.
- */
if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
- memblock_reserve(entry->addr, entry->size);
- else
- memblock_add(entry->addr, entry->size);
+ continue;
+
+ memblock_add(entry->addr, entry->size);
}

/* Throw away partial pages: */
--
2.18.0


2018-09-27 20:43:44

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] mm: Fix for movable_node boot option

On Tue, 25 Sep 2018, Masayoshi Mizuma wrote:

> This patch series are the fix for movable_node boot option
> issue which was introduced by commit 124049decbb1 ("x86/e820:
> put !E820_TYPE_RAM regions into memblock.reserved").
>
> First patch, revert the commit. Second and third patch fix the
> original issue.

Can the mm folks please comment on this?

Thanks,

tglx

2018-09-28 00:23:20

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] mm: return zero_resv_unavail optimization

On Tue, Sep 25, 2018 at 11:35:32AM -0400, Masayoshi Mizuma wrote:
> From: Pavel Tatashin <[email protected]>
>
> When checking for valid pfns in zero_resv_unavail(), it is not necessary to
> verify that pfns within pageblock_nr_pages ranges are valid, only the first
> one needs to be checked. This is because memory for pages are allocated in
> contiguous chunks that contain pageblock_nr_pages struct pages.
>
> Signed-off-by: Pavel Tatashin <[email protected]>
> Reviewed-off-by: Masayoshi Mizuma <[email protected]>

According to convention, review tag is formatted like "Reviewed-by: ...",
Otherwise, looks good to me.

Acked-by: Naoya Horiguchi <[email protected]>

> ---
> mm/page_alloc.c | 46 ++++++++++++++++++++++++++--------------------
> 1 file changed, 26 insertions(+), 20 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3b9d89e..bd5b7e4 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6440,6 +6440,29 @@ void __init free_area_init_node(int nid, unsigned long *zones_size,
> }
>
> #if defined(CONFIG_HAVE_MEMBLOCK) && !defined(CONFIG_FLAT_NODE_MEM_MAP)
> +
> +/*
> + * Zero all valid struct pages in range [spfn, epfn), return number of struct
> + * pages zeroed
> + */
> +static u64 zero_pfn_range(unsigned long spfn, unsigned long epfn)
> +{
> + unsigned long pfn;
> + u64 pgcnt = 0;
> +
> + for (pfn = spfn; pfn < epfn; pfn++) {
> + if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
> + pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
> + + pageblock_nr_pages - 1;
> + continue;
> + }
> + mm_zero_struct_page(pfn_to_page(pfn));
> + pgcnt++;
> + }
> +
> + return pgcnt;
> +}
> +
> /*
> * Only struct pages that are backed by physical memory are zeroed and
> * initialized by going through __init_single_page(). But, there are some
> @@ -6455,7 +6478,6 @@ void __init free_area_init_node(int nid, unsigned long *zones_size,
> void __init zero_resv_unavail(void)
> {
> phys_addr_t start, end;
> - unsigned long pfn;
> u64 i, pgcnt;
> phys_addr_t next = 0;
>
> @@ -6465,34 +6487,18 @@ void __init zero_resv_unavail(void)
> pgcnt = 0;
> for_each_mem_range(i, &memblock.memory, NULL,
> NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
> - if (next < start) {
> - for (pfn = PFN_DOWN(next); pfn < PFN_UP(start); pfn++) {
> - if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages)))
> - continue;
> - mm_zero_struct_page(pfn_to_page(pfn));
> - pgcnt++;
> - }
> - }
> + if (next < start)
> + pgcnt += zero_pfn_range(PFN_DOWN(next), PFN_UP(start));
> next = end;
> }
> - for (pfn = PFN_DOWN(next); pfn < max_pfn; pfn++) {
> - if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages)))
> - continue;
> - mm_zero_struct_page(pfn_to_page(pfn));
> - pgcnt++;
> - }
> -
> + pgcnt += zero_pfn_range(PFN_DOWN(next), max_pfn);
>
> /*
> * Struct pages that do not have backing memory. This could be because
> * firmware is using some of this memory, or for some other reasons.
> - * Once memblock is changed so such behaviour is not allowed: i.e.
> - * list of "reserved" memory must be a subset of list of "memory", then
> - * this code can be removed.
> */
> if (pgcnt)
> pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt);
> -
> }
> #endif /* CONFIG_HAVE_MEMBLOCK && !CONFIG_FLAT_NODE_MEM_MAP */
>
> --
> 2.18.0
>
>

2018-09-28 15:36:44

by Masayoshi Mizuma

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] mm: return zero_resv_unavail optimization

On Fri, Sep 28, 2018 at 12:19:44AM +0000, Naoya Horiguchi wrote:
> On Tue, Sep 25, 2018 at 11:35:32AM -0400, Masayoshi Mizuma wrote:
> > From: Pavel Tatashin <[email protected]>
> >
> > When checking for valid pfns in zero_resv_unavail(), it is not necessary to
> > verify that pfns within pageblock_nr_pages ranges are valid, only the first
> > one needs to be checked. This is because memory for pages are allocated in
> > contiguous chunks that contain pageblock_nr_pages struct pages.
> >
> > Signed-off-by: Pavel Tatashin <[email protected]>
> > Reviewed-off-by: Masayoshi Mizuma <[email protected]>
>
> According to convention, review tag is formatted like "Reviewed-by: ...",

Sorry for the typo...

> Otherwise, looks good to me.
>
> Acked-by: Naoya Horiguchi <[email protected]>

Thanks!

- Masa

>
> > ---
> > mm/page_alloc.c | 46 ++++++++++++++++++++++++++--------------------
> > 1 file changed, 26 insertions(+), 20 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 3b9d89e..bd5b7e4 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -6440,6 +6440,29 @@ void __init free_area_init_node(int nid, unsigned long *zones_size,
> > }
> >
> > #if defined(CONFIG_HAVE_MEMBLOCK) && !defined(CONFIG_FLAT_NODE_MEM_MAP)
> > +
> > +/*
> > + * Zero all valid struct pages in range [spfn, epfn), return number of struct
> > + * pages zeroed
> > + */
> > +static u64 zero_pfn_range(unsigned long spfn, unsigned long epfn)
> > +{
> > + unsigned long pfn;
> > + u64 pgcnt = 0;
> > +
> > + for (pfn = spfn; pfn < epfn; pfn++) {
> > + if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
> > + pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
> > + + pageblock_nr_pages - 1;
> > + continue;
> > + }
> > + mm_zero_struct_page(pfn_to_page(pfn));
> > + pgcnt++;
> > + }
> > +
> > + return pgcnt;
> > +}
> > +
> > /*
> > * Only struct pages that are backed by physical memory are zeroed and
> > * initialized by going through __init_single_page(). But, there are some
> > @@ -6455,7 +6478,6 @@ void __init free_area_init_node(int nid, unsigned long *zones_size,
> > void __init zero_resv_unavail(void)
> > {
> > phys_addr_t start, end;
> > - unsigned long pfn;
> > u64 i, pgcnt;
> > phys_addr_t next = 0;
> >
> > @@ -6465,34 +6487,18 @@ void __init zero_resv_unavail(void)
> > pgcnt = 0;
> > for_each_mem_range(i, &memblock.memory, NULL,
> > NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
> > - if (next < start) {
> > - for (pfn = PFN_DOWN(next); pfn < PFN_UP(start); pfn++) {
> > - if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages)))
> > - continue;
> > - mm_zero_struct_page(pfn_to_page(pfn));
> > - pgcnt++;
> > - }
> > - }
> > + if (next < start)
> > + pgcnt += zero_pfn_range(PFN_DOWN(next), PFN_UP(start));
> > next = end;
> > }
> > - for (pfn = PFN_DOWN(next); pfn < max_pfn; pfn++) {
> > - if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages)))
> > - continue;
> > - mm_zero_struct_page(pfn_to_page(pfn));
> > - pgcnt++;
> > - }
> > -
> > + pgcnt += zero_pfn_range(PFN_DOWN(next), max_pfn);
> >
> > /*
> > * Struct pages that do not have backing memory. This could be because
> > * firmware is using some of this memory, or for some other reasons.
> > - * Once memblock is changed so such behaviour is not allowed: i.e.
> > - * list of "reserved" memory must be a subset of list of "memory", then
> > - * this code can be removed.
> > */
> > if (pgcnt)
> > pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt);
> > -
> > }
> > #endif /* CONFIG_HAVE_MEMBLOCK && !CONFIG_FLAT_NODE_MEM_MAP */
> >
> > --
> > 2.18.0
> >
> >

2018-10-02 09:40:13

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"


* Masayoshi Mizuma <[email protected]> wrote:

> From: Masayoshi Mizuma <[email protected]>
>
> commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into
> memblock.reserved") breaks movable_node kernel option because it
> changed the memory gap range to reserved memblock. So, the node
> is marked as Normal zone even if the SRAT has Hot plaggable affinity.
>
> =====================================================================
> kernel: BIOS-e820: [mem 0x0000180000000000-0x0000180fffffffff] usable
> kernel: BIOS-e820: [mem 0x00001c0000000000-0x00001c0fffffffff] usable
> ...
> kernel: reserved[0x12]#011[0x0000181000000000-0x00001bffffffffff], 0x000003f000000000 bytes flags: 0x0
> ...
> kernel: ACPI: SRAT: Node 2 PXM 6 [mem 0x180000000000-0x1bffffffffff] hotplug
> kernel: ACPI: SRAT: Node 3 PXM 7 [mem 0x1c0000000000-0x1fffffffffff] hotplug
> ...
> kernel: Movable zone start for each node
> kernel: Node 3: 0x00001c0000000000
> kernel: Early memory node ranges
> ...
> =====================================================================
>
> Naoya's v1 patch [*] fixes the original issue and this movable_node
> issue doesn't occur.
> Let's revert commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM
> regions into memblock.reserved") and apply the v1 patch.
>
> [*] https://lkml.org/lkml/2018/6/13/27
>
> Signed-off-by: Masayoshi Mizuma <[email protected]>
> Reviewed-by: Pavel Tatashin <[email protected]>
> ---
> arch/x86/kernel/e820.c | 15 +++------------
> 1 file changed, 3 insertions(+), 12 deletions(-)

Bad ordering which introduces the bug and thus breaks bisection of related issues: the fixes
should come first, then the revert of the unnecessary or bad fix.

Thanks,

Ingo

2018-10-02 13:52:52

by Masayoshi Mizuma

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"

On Tue, Oct 02, 2018 at 11:39:40AM +0200, Ingo Molnar wrote:
>
> * Masayoshi Mizuma <[email protected]> wrote:
>
> > From: Masayoshi Mizuma <[email protected]>
> >
> > commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into
> > memblock.reserved") breaks movable_node kernel option because it
> > changed the memory gap range to reserved memblock. So, the node
> > is marked as Normal zone even if the SRAT has Hot plaggable affinity.
> >
> > =====================================================================
> > kernel: BIOS-e820: [mem 0x0000180000000000-0x0000180fffffffff] usable
> > kernel: BIOS-e820: [mem 0x00001c0000000000-0x00001c0fffffffff] usable
> > ...
> > kernel: reserved[0x12]#011[0x0000181000000000-0x00001bffffffffff], 0x000003f000000000 bytes flags: 0x0
> > ...
> > kernel: ACPI: SRAT: Node 2 PXM 6 [mem 0x180000000000-0x1bffffffffff] hotplug
> > kernel: ACPI: SRAT: Node 3 PXM 7 [mem 0x1c0000000000-0x1fffffffffff] hotplug
> > ...
> > kernel: Movable zone start for each node
> > kernel: Node 3: 0x00001c0000000000
> > kernel: Early memory node ranges
> > ...
> > =====================================================================
> >
> > Naoya's v1 patch [*] fixes the original issue and this movable_node
> > issue doesn't occur.
> > Let's revert commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM
> > regions into memblock.reserved") and apply the v1 patch.
> >
> > [*] https://lkml.org/lkml/2018/6/13/27
> >
> > Signed-off-by: Masayoshi Mizuma <[email protected]>
> > Reviewed-by: Pavel Tatashin <[email protected]>
> > ---
> > arch/x86/kernel/e820.c | 15 +++------------
> > 1 file changed, 3 insertions(+), 12 deletions(-)
>
> Bad ordering which introduces the bug and thus breaks bisection of related issues: the fixes
> should come first, then the revert of the unnecessary or bad fix.

Thanks. I'll change the order, then resend them.

- Masa

2018-10-02 14:02:26

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] mm: Fix for movable_node boot option

On Thu 27-09-18 22:41:36, Thomas Gleixner wrote:
> On Tue, 25 Sep 2018, Masayoshi Mizuma wrote:
>
> > This patch series are the fix for movable_node boot option
> > issue which was introduced by commit 124049decbb1 ("x86/e820:
> > put !E820_TYPE_RAM regions into memblock.reserved").
> >
> > First patch, revert the commit. Second and third patch fix the
> > original issue.
>
> Can the mm folks please comment on this?

I was under impression that Pavel who authored the original change which
got reverted here has reviewed these patches.
--
Michal Hocko
SUSE Labs

2018-10-05 19:03:23

by Pavel Tatashin

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] mm: Fix for movable_node boot option

I have not reviewed them yet. I am waiting for Masayoshi to send a new
series with correct order as Ingo requested.

Pavel

On 10/2/18 10:01 AM, Michal Hocko wrote:
> On Thu 27-09-18 22:41:36, Thomas Gleixner wrote:
>> On Tue, 25 Sep 2018, Masayoshi Mizuma wrote:
>>
>>> This patch series are the fix for movable_node boot option
>>> issue which was introduced by commit 124049decbb1 ("x86/e820:
>>> put !E820_TYPE_RAM regions into memblock.reserved").
>>>
>>> First patch, revert the commit. Second and third patch fix the
>>> original issue.
>>
>> Can the mm folks please comment on this?
>
> I was under impression that Pavel who authored the original change which
> got reverted here has reviewed these patches.
>