2018-01-31 21:04:23

by Pavel Tatashin

Subject: [PATCH v2 0/2] optimize memory hotplug

Changelog:
v1 - v2
Added struct page poison checking in order to verify that
struct pages are never accessed before they are initialized during
memory hotplug

Pavel Tatashin (2):
mm: uninitialized struct page poisoning sanity checking
mm, memory_hotplug: optimize memory hotplug

drivers/base/memory.c | 38 +++++++++++++++++++++-----------------
drivers/base/node.c | 17 ++++++++++-------
include/linux/memory_hotplug.h | 2 ++
include/linux/mm.h | 4 +++-
include/linux/node.h | 4 ++--
include/linux/page-flags.h | 22 +++++++++++++++++-----
mm/memblock.c | 2 +-
mm/memory_hotplug.c | 21 ++-------------------
mm/page_alloc.c | 28 ++++++++++------------------
mm/sparse.c | 29 ++++++++++++++++++++++++++---
10 files changed, 94 insertions(+), 73 deletions(-)

--
2.16.1



2018-01-31 21:06:12

by Pavel Tatashin

Subject: [PATCH v2 2/2] mm, memory_hotplug: optimize memory hotplug

This patch was inspired by the discussion of this problem:
http://lkml.kernel.org/r/[email protected]

Currently, during memory hotplug we traverse struct pages several
times:

1. memset(0) in sparse_add_one_section()
2. loop in __add_section() to call set_page_node(page, nid) and
SetPageReserved(page)
3. loop in pages_correctly_reserved() to check that PageReserved is set
4. loop in memmap_init_zone() to call __init_single_pfn()

This patch removes loops 1, 2, and 3, and leaves only loop 4, where all
struct page fields are initialized in one go, the same as is now done
during boot.
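The single-pass idea can be sketched in user-space C. Note that struct
fake_page and its fields below are invented stand-ins for illustration,
not the kernel's actual struct page layout:

```c
#include <string.h>

/*
 * Illustrative stand-in for struct page; the field names and layout
 * are invented for this sketch.
 */
struct fake_page {
	unsigned long flags;
	int nid;		/* node id */
	int refcount;
};

/*
 * Previously the memmap was traversed several times: a memset pass, a
 * set-node/set-reserved pass, a verification pass, and the real init
 * pass.  The consolidated approach touches each struct exactly once,
 * analogous to what __init_single_page() does:
 */
static void init_single_pass(struct fake_page *map, int n, int nid)
{
	for (int i = 0; i < n; i++) {
		memset(&map[i], 0, sizeof(map[i]));	/* replaces the separate memset pass */
		map[i].nid = nid;			/* replaces the set_page_node() loop */
		map[i].refcount = 1;
	}
}
```

Touching each struct once keeps it hot in cache while all of its fields
are written, which is where the performance win comes from.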

The benefits:
- Improves memory hotplug performance, because we no longer evict the
cache several times, and we reduce loop branching overhead.

- Removes the condition from the __init_single_pfn() hot path that was
added to fix the problem reported by Bharata in the above email
thread, thus also improving performance during normal boot.

- Makes memory hotplug more similar to the boot memory initialization
path, because we zero and initialize struct pages in only one function.

- Simplifies the memory hotplug struct page initialization code, and thus
enables future improvements, such as multi-threading the initialization
of struct pages in order to improve hotplug performance even further
on larger machines.

Signed-off-by: Pavel Tatashin <[email protected]>
---
drivers/base/memory.c | 38 +++++++++++++++++++++-----------------
drivers/base/node.c | 17 ++++++++++-------
include/linux/memory_hotplug.h | 2 ++
include/linux/node.h | 4 ++--
mm/memory_hotplug.c | 21 ++-------------------
mm/page_alloc.c | 28 ++++++++++------------------
mm/sparse.c | 29 ++++++++++++++++++++++++++---
7 files changed, 73 insertions(+), 66 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index fe4b24f05f6a..a14fb0cd424a 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -187,13 +187,14 @@ int memory_isolate_notify(unsigned long val, void *v)
}

/*
- * The probe routines leave the pages reserved, just as the bootmem code does.
- * Make sure they're still that way.
+ * The probe routines leave the pages uninitialized, just as the bootmem code
+ * does. Make sure we do not access them, but instead use only information from
+ * within sections.
*/
-static bool pages_correctly_reserved(unsigned long start_pfn)
+static bool pages_correctly_probed(unsigned long start_pfn)
{
- int i, j;
- struct page *page;
+ unsigned long section_nr = pfn_to_section_nr(start_pfn);
+ unsigned long section_nr_end = section_nr + sections_per_block;
unsigned long pfn = start_pfn;

/*
@@ -201,21 +202,24 @@ static bool pages_correctly_reserved(unsigned long start_pfn)
* SPARSEMEM_VMEMMAP. We lookup the page once per section
* and assume memmap is contiguous within each section
*/
- for (i = 0; i < sections_per_block; i++, pfn += PAGES_PER_SECTION) {
+ for (; section_nr < section_nr_end; section_nr++) {
if (WARN_ON_ONCE(!pfn_valid(pfn)))
return false;
- page = pfn_to_page(pfn);
-
- for (j = 0; j < PAGES_PER_SECTION; j++) {
- if (PageReserved(page + j))
- continue;
-
- printk(KERN_WARNING "section number %ld page number %d "
- "not reserved, was it already online?\n",
- pfn_to_section_nr(pfn), j);

+ if (!present_section_nr(section_nr)) {
+ pr_warn("section %ld pfn[%lx, %lx) not present",
+ section_nr, pfn, pfn + PAGES_PER_SECTION);
+ return false;
+ } else if (!valid_section_nr(section_nr)) {
+ pr_warn("section %ld pfn[%lx, %lx) no valid memmap",
+ section_nr, pfn, pfn + PAGES_PER_SECTION);
+ return false;
+ } else if (online_section_nr(section_nr)) {
+ pr_warn("section %ld pfn[%lx, %lx) is already online",
+ section_nr, pfn, pfn + PAGES_PER_SECTION);
return false;
}
+ pfn += PAGES_PER_SECTION;
}

return true;
@@ -237,7 +241,7 @@ memory_block_action(unsigned long phys_index, unsigned long action, int online_t

switch (action) {
case MEM_ONLINE:
- if (!pages_correctly_reserved(start_pfn))
+ if (!pages_correctly_probed(start_pfn))
return -EBUSY;

ret = online_pages(start_pfn, nr_pages, online_type);
@@ -727,7 +731,7 @@ int register_new_memory(int nid, struct mem_section *section)
}

if (mem->section_count == sections_per_block)
- ret = register_mem_sect_under_node(mem, nid);
+ ret = register_mem_sect_under_node(mem, nid, false);
out:
mutex_unlock(&mem_sysfs_mutex);
return ret;
diff --git a/drivers/base/node.c b/drivers/base/node.c
index ee090ab9171c..f1c0c130ac88 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -397,7 +397,8 @@ static int __ref get_nid_for_pfn(unsigned long pfn)
}

/* register memory section under specified node if it spans that node */
-int register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
+int register_mem_sect_under_node(struct memory_block *mem_blk, int nid,
+ bool check_nid)
{
int ret;
unsigned long pfn, sect_start_pfn, sect_end_pfn;
@@ -423,11 +424,13 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
continue;
}

- page_nid = get_nid_for_pfn(pfn);
- if (page_nid < 0)
- continue;
- if (page_nid != nid)
- continue;
+ if (check_nid) {
+ page_nid = get_nid_for_pfn(pfn);
+ if (page_nid < 0)
+ continue;
+ if (page_nid != nid)
+ continue;
+ }
ret = sysfs_create_link_nowarn(&node_devices[nid]->dev.kobj,
&mem_blk->dev.kobj,
kobject_name(&mem_blk->dev.kobj));
@@ -502,7 +505,7 @@ int link_mem_sections(int nid, unsigned long start_pfn, unsigned long nr_pages)

mem_blk = find_memory_block_hinted(mem_sect, mem_blk);

- ret = register_mem_sect_under_node(mem_blk, nid);
+ ret = register_mem_sect_under_node(mem_blk, nid, true);
if (!err)
err = ret;

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 787411bdeadb..13acf181fb4b 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -234,6 +234,8 @@ void put_online_mems(void);
void mem_hotplug_begin(void);
void mem_hotplug_done(void);

+int get_section_nid(unsigned long section_nr);
+
#else /* ! CONFIG_MEMORY_HOTPLUG */
#define pfn_to_online_page(pfn) \
({ \
diff --git a/include/linux/node.h b/include/linux/node.h
index 4ece0fee0ffc..41f171861dcc 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -67,7 +67,7 @@ extern void unregister_one_node(int nid);
extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
extern int register_mem_sect_under_node(struct memory_block *mem_blk,
- int nid);
+ int nid, bool check_nid);
extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
unsigned long phys_index);

@@ -97,7 +97,7 @@ static inline int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
return 0;
}
static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
- int nid)
+ int nid, bool check_nid)
{
return 0;
}
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6a9bee33ffa7..e7d11a14b1d1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -250,7 +250,6 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
struct vmem_altmap *altmap, bool want_memblock)
{
int ret;
- int i;

if (pfn_valid(phys_start_pfn))
return -EEXIST;
@@ -259,23 +258,6 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
if (ret < 0)
return ret;

- /*
- * Make all the pages reserved so that nobody will stumble over half
- * initialized state.
- * FIXME: We also have to associate it with a node because page_to_nid
- * relies on having page with the proper node.
- */
- for (i = 0; i < PAGES_PER_SECTION; i++) {
- unsigned long pfn = phys_start_pfn + i;
- struct page *page;
- if (!pfn_valid(pfn))
- continue;
-
- page = pfn_to_page(pfn);
- set_page_node(page, nid);
- SetPageReserved(page);
- }
-
if (!want_memblock)
return 0;

@@ -913,7 +895,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
int ret;
struct memory_notify arg;

- nid = pfn_to_nid(pfn);
+ nid = get_section_nid(pfn_to_section_nr(pfn));
+
/* associate pfn range with the zone */
zone = move_pfn_range(online_type, nid, pfn, nr_pages);

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e2b42f603b1a..6af4c2cb88f6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1178,10 +1178,9 @@ static void free_one_page(struct zone *zone,
}

static void __meminit __init_single_page(struct page *page, unsigned long pfn,
- unsigned long zone, int nid, bool zero)
+ unsigned long zone, int nid)
{
- if (zero)
- mm_zero_struct_page(page);
+ mm_zero_struct_page(page);
set_page_links(page, zone, nid, pfn);
init_page_count(page);
page_mapcount_reset(page);
@@ -1195,12 +1194,6 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn,
#endif
}

-static void __meminit __init_single_pfn(unsigned long pfn, unsigned long zone,
- int nid, bool zero)
-{
- return __init_single_page(pfn_to_page(pfn), pfn, zone, nid, zero);
-}
-
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
static void __meminit init_reserved_page(unsigned long pfn)
{
@@ -1219,7 +1212,7 @@ static void __meminit init_reserved_page(unsigned long pfn)
if (pfn >= zone->zone_start_pfn && pfn < zone_end_pfn(zone))
break;
}
- __init_single_pfn(pfn, zid, nid, true);
+ __init_single_page(pfn_to_page(pfn), pfn, zid, nid);
}
#else
static inline void init_reserved_page(unsigned long pfn)
@@ -1536,7 +1529,7 @@ static unsigned long __init deferred_init_pages(int nid, int zid,
} else {
page++;
}
- __init_single_page(page, pfn, zid, nid, true);
+ __init_single_page(page, pfn, zid, nid);
nr_pages++;
}
return (nr_pages);
@@ -5329,6 +5322,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
pg_data_t *pgdat = NODE_DATA(nid);
unsigned long pfn;
unsigned long nr_initialised = 0;
+ struct page *page;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
struct memblock_region *r = NULL, *tmp;
#endif
@@ -5390,6 +5384,11 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
#endif

not_early:
+ page = pfn_to_page(pfn);
+ __init_single_page(page, pfn, zone, nid);
+ if (context == MEMMAP_HOTPLUG)
+ SetPageReserved(page);
+
/*
* Mark the block movable so that blocks are reserved for
* movable at startup. This will force kernel allocations
@@ -5406,15 +5405,8 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
* because this is done early in sparse_add_one_section
*/
if (!(pfn & (pageblock_nr_pages - 1))) {
- struct page *page = pfn_to_page(pfn);
-
- __init_single_page(page, pfn, zone, nid,
- context != MEMMAP_HOTPLUG);
set_pageblock_migratetype(page, MIGRATE_MOVABLE);
cond_resched();
- } else {
- __init_single_pfn(pfn, zone, nid,
- context != MEMMAP_HOTPLUG);
}
}
}
diff --git a/mm/sparse.c b/mm/sparse.c
index 7af5e7a92528..d7808307023b 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -30,11 +30,14 @@ struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]
#endif
EXPORT_SYMBOL(mem_section);

-#ifdef NODE_NOT_IN_PAGE_FLAGS
+#if defined(NODE_NOT_IN_PAGE_FLAGS) || defined(CONFIG_MEMORY_HOTPLUG)
/*
* If we did not store the node number in the page then we have to
* do a lookup in the section_to_node_table in order to find which
* node the page belongs to.
+ *
+ * We also use this data in case memory hotplugging is enabled to be
+ * able to determine nid while struct pages are not yet initialized.
*/
#if MAX_NUMNODES <= 256
static u8 section_to_node_table[NR_MEM_SECTIONS] __cacheline_aligned;
@@ -42,17 +45,28 @@ static u8 section_to_node_table[NR_MEM_SECTIONS] __cacheline_aligned;
static u16 section_to_node_table[NR_MEM_SECTIONS] __cacheline_aligned;
#endif

+#ifdef NODE_NOT_IN_PAGE_FLAGS
int page_to_nid(const struct page *page)
{
return section_to_node_table[page_to_section(page)];
}
EXPORT_SYMBOL(page_to_nid);
+#endif /* NODE_NOT_IN_PAGE_FLAGS */

static void set_section_nid(unsigned long section_nr, int nid)
{
section_to_node_table[section_nr] = nid;
}
-#else /* !NODE_NOT_IN_PAGE_FLAGS */
+
+/* Return NID for given section number */
+int get_section_nid(unsigned long section_nr)
+{
+ if (WARN_ON(section_nr >= NR_MEM_SECTIONS))
+ return 0;
+ return section_to_node_table[section_nr];
+}
+EXPORT_SYMBOL(get_section_nid);
+#else /* ! (NODE_NOT_IN_PAGE_FLAGS || CONFIG_MEMORY_HOTPLUG) */
static inline void set_section_nid(unsigned long section_nr, int nid)
{
}
@@ -816,7 +830,14 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
goto out;
}

- memset(memmap, 0, sizeof(struct page) * PAGES_PER_SECTION);
+#ifdef CONFIG_DEBUG_VM
+ /*
+ * poison uninitialized struct pages in order to catch invalid flags
+ * combinations.
+ */
+ memset(memmap, PAGE_POISON_PATTERN,
+ sizeof(struct page) * PAGES_PER_SECTION);
+#endif

section_mark_present(ms);

@@ -827,6 +848,8 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
if (ret <= 0) {
kfree(usemap);
__kfree_section_memmap(memmap, altmap);
+ } else {
+ set_section_nid(section_nr, pgdat->node_id);
}
return ret;
}
--
2.16.1


2018-01-31 21:06:45

by Pavel Tatashin

Subject: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

During boot we poison struct page memory in order to ensure that no one is
accessing this memory until the struct pages are initialized in
__init_single_page().

This patch adds more scrutiny to this checking by making sure that flags
do not equal the poison pattern when they are accessed. The pattern is all
ones.

Since the node id is also stored in struct page and may be accessed quite
early, we add the enforcement into the page_to_nid() function as well.
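A minimal user-space sketch of the poison check, assuming (as the patch
does) that an all-ones flags word marks an uninitialized page; the
fake_* names are invented for illustration:

```c
#include <string.h>

#define PAGE_POISON_PATTERN (~0UL)

/* Minimal stand-in for struct page; only the flags word matters here. */
struct fake_page {
	unsigned long flags;
};

/*
 * A struct page is considered uninitialized (poisoned) when its flags
 * word is all ones, matching the byte-wise 0xff fill applied before
 * initialization.
 */
static int fake_page_poisoned(const struct fake_page *page)
{
	return page->flags == PAGE_POISON_PATTERN;
}

static void fake_poison(struct fake_page *page)
{
	/* a byte-wise 0xff fill makes every word in the struct all ones */
	memset(page, 0xff, sizeof(*page));
}
```

In the kernel the check is wired into PF_POISONED_CHECK() so that every
page-flag accessor trips VM_BUG_ON_PGFLAGS() on a poisoned page.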

Signed-off-by: Pavel Tatashin <[email protected]>
---
include/linux/mm.h | 4 +++-
include/linux/page-flags.h | 22 +++++++++++++++++-----
mm/memblock.c | 2 +-
3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 42b7154291e4..e0281de2cb13 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -896,7 +896,9 @@ extern int page_to_nid(const struct page *page);
#else
static inline int page_to_nid(const struct page *page)
{
- return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
+ struct page *p = (struct page *)page;
+
+ return (PF_POISONED_CHECK(p)->flags >> NODES_PGSHIFT) & NODES_MASK;
}
#endif

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 3ec44e27aa9d..743644b73359 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -161,9 +161,18 @@ static __always_inline int PageCompound(struct page *page)
return test_bit(PG_head, &page->flags) || PageTail(page);
}

+#define PAGE_POISON_PATTERN ~0ul
+static inline int PagePoisoned(const struct page *page)
+{
+ return page->flags == PAGE_POISON_PATTERN;
+}
+
/*
* Page flags policies wrt compound pages
*
+ * PF_POISONED_CHECK
+ * check if this struct page poisoned/uninitialized
+ *
* PF_ANY:
* the page flag is relevant for small, head and tail pages.
*
@@ -181,17 +190,20 @@ static __always_inline int PageCompound(struct page *page)
* PF_NO_COMPOUND:
* the page flag is not relevant for compound pages.
*/
-#define PF_ANY(page, enforce) page
-#define PF_HEAD(page, enforce) compound_head(page)
+#define PF_POISONED_CHECK(page) ({ \
+ VM_BUG_ON_PGFLAGS(PagePoisoned(page), page); \
+ page; })
+#define PF_ANY(page, enforce) PF_POISONED_CHECK(page)
+#define PF_HEAD(page, enforce) PF_POISONED_CHECK(compound_head(page))
#define PF_ONLY_HEAD(page, enforce) ({ \
VM_BUG_ON_PGFLAGS(PageTail(page), page); \
- page;})
+ PF_POISONED_CHECK(page); })
#define PF_NO_TAIL(page, enforce) ({ \
VM_BUG_ON_PGFLAGS(enforce && PageTail(page), page); \
- compound_head(page);})
+ PF_POISONED_CHECK(compound_head(page)); })
#define PF_NO_COMPOUND(page, enforce) ({ \
VM_BUG_ON_PGFLAGS(enforce && PageCompound(page), page); \
- page;})
+ PF_POISONED_CHECK(page); })

/*
* Macros to create function definitions for page flags
diff --git a/mm/memblock.c b/mm/memblock.c
index 5a9ca2a1751b..d85c8754e0ce 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1373,7 +1373,7 @@ void * __init memblock_virt_alloc_try_nid_raw(
min_addr, max_addr, nid);
#ifdef CONFIG_DEBUG_VM
if (ptr && size > 0)
- memset(ptr, 0xff, size);
+ memset(ptr, PAGE_POISON_PATTERN, size);
#endif
return ptr;
}
--
2.16.1


2018-02-02 04:15:32

by kernel test robot

Subject: Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

Hi Pavel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on mmotm/master]
[also build test WARNING on v4.15 next-20180201]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Pavel-Tatashin/mm-uninitialized-struct-page-poisoning-sanity-checking/20180202-105827
base: git://git.cmpxchg.org/linux-mmotm.git master
config: x86_64-randconfig-x013-201804 (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All warnings (new ones prefixed by >>):

In file included from include/linux/page_ref.h:7:0,
from include/linux/mm.h:26,
from include/linux/memblock.h:18,
from mm/memblock.c:21:
mm/memblock.c: In function 'memblock_virt_alloc_try_nid_raw':
>> include/linux/page-flags.h:159:29: warning: overflow in implicit constant conversion [-Woverflow]
#define PAGE_POISON_PATTERN ~0ul
^
>> mm/memblock.c:1376:15: note: in expansion of macro 'PAGE_POISON_PATTERN'
memset(ptr, PAGE_POISON_PATTERN, size);
^~~~~~~~~~~~~~~~~~~

vim +159 include/linux/page-flags.h

158
> 159 #define PAGE_POISON_PATTERN ~0ul
160 static inline int PagePoisoned(const struct page *page)
161 {
162 return page->flags == PAGE_POISON_PATTERN;
163 }
164

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
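The warning arises because memset()'s fill argument is an int, so the
unsigned long constant ~0ul overflows when converted. Since memset()
replicates a single byte, a plain 0xff fill already yields an all-ones
word without the overflow; a small user-space sketch (the function name
is invented for illustration):

```c
#include <string.h>

/*
 * memset() stores its fill value one byte at a time, so filling with
 * 0xff produces a word equal to ~0UL -- the same result as the
 * intended all-ones poison pattern, but without passing an unsigned
 * long constant through memset's int parameter.
 */
static unsigned long fill_all_ones(void)
{
	unsigned long word;

	memset(&word, 0xff, sizeof(word));	/* no -Woverflow here */
	return word;
}
```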



2018-02-02 04:51:05

by kernel test robot

Subject: Re: [PATCH v2 2/2] mm, memory_hotplug: optimize memory hotplug

Hi Pavel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on mmotm/master]
[also build test WARNING on next-20180201]
[cannot apply to v4.15]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Pavel-Tatashin/mm-uninitialized-struct-page-poisoning-sanity-checking/20180202-105827
base: git://git.cmpxchg.org/linux-mmotm.git master
config: x86_64-randconfig-x019-201804 (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All warnings (new ones prefixed by >>):

In file included from include/linux/page_ref.h:7:0,
from include/linux/mm.h:26,
from mm/sparse.c:5:
mm/sparse.c: In function 'sparse_add_one_section':
include/linux/page-flags.h:159:29: warning: overflow in implicit constant conversion [-Woverflow]
#define PAGE_POISON_PATTERN ~0ul
^
>> mm/sparse.c:838:17: note: in expansion of macro 'PAGE_POISON_PATTERN'
memset(memmap, PAGE_POISON_PATTERN,
^~~~~~~~~~~~~~~~~~~
Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls64
Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u64
Cyclomatic Complexity 3 include/linux/log2.h:is_power_of_2
Cyclomatic Complexity 1 include/asm-generic/getorder.h:__get_order
Cyclomatic Complexity 3 include/linux/string.h:memset
Cyclomatic Complexity 1 include/linux/spinlock.h:spinlock_check
Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock_irqrestore
Cyclomatic Complexity 1 include/linux/nodemask.h:node_state
Cyclomatic Complexity 1 include/linux/memory_hotplug.h:pgdat_resize_lock
Cyclomatic Complexity 1 include/linux/memory_hotplug.h:pgdat_resize_unlock
Cyclomatic Complexity 1 include/linux/mmzone.h:pfn_to_section_nr
Cyclomatic Complexity 1 include/linux/mmzone.h:section_nr_to_pfn
Cyclomatic Complexity 3 include/linux/mmzone.h:__nr_to_section
Cyclomatic Complexity 1 include/linux/mmzone.h:__section_mem_map_addr
Cyclomatic Complexity 3 include/linux/mmzone.h:present_section
Cyclomatic Complexity 1 include/linux/mmzone.h:present_section_nr
Cyclomatic Complexity 3 include/linux/mmzone.h:valid_section
Cyclomatic Complexity 1 include/linux/mmzone.h:valid_section_nr
Cyclomatic Complexity 1 include/linux/mmzone.h:__pfn_to_section
Cyclomatic Complexity 2 include/linux/mmzone.h:pfn_present
Cyclomatic Complexity 1 arch/x86/include/asm/topology.h:numa_node_id
Cyclomatic Complexity 1 include/linux/topology.h:numa_mem_id
Cyclomatic Complexity 1 include/linux/mm.h:is_vmalloc_addr
Cyclomatic Complexity 1 include/linux/mm.h:page_to_section
Cyclomatic Complexity 28 include/linux/slab.h:kmalloc_index
Cyclomatic Complexity 1 include/linux/slab.h:__kmalloc_node
Cyclomatic Complexity 1 include/linux/slab.h:kmem_cache_alloc_node_trace
Cyclomatic Complexity 68 include/linux/slab.h:kmalloc_large
Cyclomatic Complexity 5 include/linux/slab.h:kmalloc
Cyclomatic Complexity 5 include/linux/slab.h:kmalloc_node
Cyclomatic Complexity 1 include/linux/slab.h:kzalloc_node
Cyclomatic Complexity 1 include/linux/bootmem.h:alloc_remap
Cyclomatic Complexity 1 mm/sparse.c:set_section_nid
Cyclomatic Complexity 1 mm/sparse.c:sparse_encode_early_nid
Cyclomatic Complexity 1 mm/sparse.c:sparse_early_nid
Cyclomatic Complexity 4 mm/sparse.c:next_present_section_nr
Cyclomatic Complexity 1 mm/sparse.c:check_usemap_section_nr
Cyclomatic Complexity 6 mm/sparse.c:alloc_usemap_and_memmap
Cyclomatic Complexity 1 include/linux/bootmem.h:memblock_virt_alloc
Cyclomatic Complexity 1 include/linux/bootmem.h:memblock_virt_alloc_node
Cyclomatic Complexity 2 mm/sparse.c:sparse_index_alloc
Cyclomatic Complexity 3 mm/sparse.c:sparse_index_init
Cyclomatic Complexity 1 include/linux/bootmem.h:memblock_virt_alloc_node_nopanic
Cyclomatic Complexity 1 mm/sparse.c:sparse_early_usemaps_alloc_pgdat_section
Cyclomatic Complexity 2 mm/sparse.c:sparse_encode_mem_map
Cyclomatic Complexity 2 mm/sparse.c:sparse_init_one_section
Cyclomatic Complexity 1 include/linux/bootmem.h:memblock_free_early
Cyclomatic Complexity 1 include/linux/gfp.h:__alloc_pages
Cyclomatic Complexity 2 include/linux/gfp.h:__alloc_pages_node
Cyclomatic Complexity 2 include/linux/gfp.h:alloc_pages_node
Cyclomatic Complexity 70 mm/sparse.c:__kmalloc_section_memmap
Cyclomatic Complexity 1 mm/sparse.c:kmalloc_section_memmap
Cyclomatic Complexity 2 mm/sparse.c:__kfree_section_memmap
Cyclomatic Complexity 3 mm/sparse.c:get_section_nid
Cyclomatic Complexity 5 mm/sparse.c:__section_nr
Cyclomatic Complexity 2 mm/sparse.c:section_mark_present
Cyclomatic Complexity 7 mm/sparse.c:mminit_validate_memmodel_limits
Cyclomatic Complexity 4 mm/sparse.c:node_memmap_size_bytes
Cyclomatic Complexity 4 mm/sparse.c:memory_present
Cyclomatic Complexity 1 mm/sparse.c:sparse_decode_mem_map
Cyclomatic Complexity 1 mm/sparse.c:usemap_size
Cyclomatic Complexity 4 mm/sparse.c:sparse_early_usemaps_alloc_node
Cyclomatic Complexity 1 mm/sparse.c:__kmalloc_section_usemap
Cyclomatic Complexity 2 mm/sparse.c:sparse_mem_map_populate
Cyclomatic Complexity 10 mm/sparse.c:sparse_mem_maps_populate_node
Cyclomatic Complexity 1 mm/sparse.c:sparse_early_mem_maps_alloc_node
Cyclomatic Complexity 1 mm/sparse.c:vmemmap_populate_print_last
Cyclomatic Complexity 6 mm/sparse.c:sparse_init
Cyclomatic Complexity 4 mm/sparse.c:online_mem_sections
Cyclomatic Complexity 6 mm/sparse.c:sparse_add_one_section

vim +/PAGE_POISON_PATTERN +838 mm/sparse.c

793
794 /*
795 * returns the number of sections whose mem_maps were properly
796 * set. If this is <=0, then that means that the passed-in
797 * map was not consumed and must be freed.
798 */
799 int __meminit sparse_add_one_section(struct pglist_data *pgdat,
800 unsigned long start_pfn, struct vmem_altmap *altmap)
801 {
802 unsigned long section_nr = pfn_to_section_nr(start_pfn);
803 struct mem_section *ms;
804 struct page *memmap;
805 unsigned long *usemap;
806 unsigned long flags;
807 int ret;
808
809 /*
810 * no locking for this, because it does its own
811 * plus, it does a kmalloc
812 */
813 ret = sparse_index_init(section_nr, pgdat->node_id);
814 if (ret < 0 && ret != -EEXIST)
815 return ret;
816 memmap = kmalloc_section_memmap(section_nr, pgdat->node_id, altmap);
817 if (!memmap)
818 return -ENOMEM;
819 usemap = __kmalloc_section_usemap();
820 if (!usemap) {
821 __kfree_section_memmap(memmap, altmap);
822 return -ENOMEM;
823 }
824
825 pgdat_resize_lock(pgdat, &flags);
826
827 ms = __pfn_to_section(start_pfn);
828 if (ms->section_mem_map & SECTION_MARKED_PRESENT) {
829 ret = -EEXIST;
830 goto out;
831 }
832
833 #ifdef CONFIG_DEBUG_VM
834 /*
835 * poison uninitialized struct pages in order to catch invalid flags
836 * combinations.
837 */
> 838 memset(memmap, PAGE_POISON_PATTERN,
839 sizeof(struct page) * PAGES_PER_SECTION);
840 #endif
841
842 section_mark_present(ms);
843
844 ret = sparse_init_one_section(ms, section_nr, memmap, usemap);
845
846 out:
847 pgdat_resize_unlock(pgdat, &flags);
848 if (ret <= 0) {
849 kfree(usemap);
850 __kfree_section_memmap(memmap, altmap);
851 } else {
852 set_section_nid(section_nr, pgdat->node_id);
853 }
854 return ret;
855 }
856

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2018-03-13 23:46:46

by Sasha Levin

Subject: Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

On Wed, Jan 31, 2018 at 04:02:59PM -0500, Pavel Tatashin wrote:
>During boot we poison struct page memory in order to ensure that no one is
>accessing this memory until the struct pages are initialized in
>__init_single_page().
>
>This patch adds more scrutiny to this checking by making sure that flags
>do not equal the poison pattern when they are accessed. The pattern is all
>ones.
>
>Since the node id is also stored in struct page and may be accessed quite
>early, we add the enforcement into the page_to_nid() function as well.
>
>Signed-off-by: Pavel Tatashin <[email protected]>
>---

Hey Pasha,

This patch is causing the following on boot:

[ 1.253732] BUG: unable to handle kernel paging request at fffffffffffffffe
[ 1.254000] PGD 2284e19067 P4D 2284e19067 PUD 2284e1b067 PMD 0
[ 1.254000] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 1.254000] Modules linked in:
[ 1.254000] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc5-next-20180313 #10
[ 1.254000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
[ 1.254000] RIP: 0010:__dump_page (??:?)
[ 1.254000] RSP: 0000:ffff881c63c17810 EFLAGS: 00010246
[ 1.254000] RAX: dffffc0000000000 RBX: ffffea0084000000 RCX: 1ffff1038c782f2b
[ 1.254000] RDX: 1fffffffffffffff RSI: ffffffff9e160640 RDI: ffffea0084000000
[ 1.254000] RBP: ffff881c63c17c00 R08: ffff8840107e8880 R09: ffffed0802167a4d
[ 1.254000] R10: 0000000000000001 R11: ffffed0802167a4c R12: 1ffff1038c782f07
[ 1.254000] R13: ffffea0084000020 R14: fffffffffffffffe R15: ffff881c63c17bd8
[ 1.254000] FS: 0000000000000000(0000) GS:ffff881c6ac00000(0000) knlGS:0000000000000000
[ 1.254000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.254000] CR2: fffffffffffffffe CR3: 0000002284e16000 CR4: 00000000003406e0
[ 1.254000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.254000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1.254000] Call Trace:
[ 1.254000] dump_page (/mm/debug.c:80)
[ 1.254000] get_nid_for_pfn (/./include/linux/mm.h:900 /drivers/base/node.c:396)
[ 1.254000] register_mem_sect_under_node (/drivers/base/node.c:438)
[ 1.254000] link_mem_sections (/drivers/base/node.c:517)
[ 1.254000] topology_init (/./include/linux/nodemask.h:271 /arch/x86/kernel/topology.c:164)
[ 1.254000] do_one_initcall (/init/main.c:835)
[ 1.254000] kernel_init_freeable (/init/main.c:901 /init/main.c:909 /init/main.c:927 /init/main.c:1076)
[ 1.254000] kernel_init (/init/main.c:1004)
[ 1.254000] ret_from_fork (/arch/x86/entry/entry_64.S:417)
[ 1.254000] Code: ff a8 01 4c 0f 44 f3 4d 85 f6 0f 84 31 0e 00 00 4c 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 2d 11 00 00 <49> 83 3e ff 0f 84 a9 06 00 00 4d 8d b7 c0 fd ff ff 48 b8 00 00
All code
========
0: ff a8 01 4c 0f 44 ljmp *0x440f4c01(%rax)
6: f3 4d 85 f6 repz test %r14,%r14
a: 0f 84 31 0e 00 00 je 0xe41
10: 4c 89 f2 mov %r14,%rdx
13: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
1a: fc ff df
1d: 48 c1 ea 03 shr $0x3,%rdx
21: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1)
25: 0f 85 2d 11 00 00 jne 0x1158
2b:* 49 83 3e ff cmpq $0xffffffffffffffff,(%r14) <-- trapping instruction
2f: 0f 84 a9 06 00 00 je 0x6de
35: 4d 8d b7 c0 fd ff ff lea -0x240(%r15),%r14
3c: 48 rex.W
3d: b8 .byte 0xb8
...

Code starting with the faulting instruction
===========================================
0: 49 83 3e ff cmpq $0xffffffffffffffff,(%r14)
4: 0f 84 a9 06 00 00 je 0x6b3
a: 4d 8d b7 c0 fd ff ff lea -0x240(%r15),%r14
11: 48 rex.W
12: b8 .byte 0xb8
...
[ 1.254000] RIP: __dump_page+0x1c8/0x13c0 RSP: ffff881c63c17810 (/./include/asm-generic/sections.h:42)
[ 1.254000] CR2: fffffffffffffffe
[ 1.254000] ---[ end trace e643dfbc44b562ca ]---

--

Thanks,
Sasha

2018-03-14 00:41:03

by Pavel Tatashin

Subject: Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

Hi Sasha,

It seems the patch is doing the right thing, and it catches bugs. Here
we access an uninitialized struct page. The question is why this happens.

register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
page_nid = get_nid_for_pfn(pfn);

The node id is stored in page flags; since the struct page is poisoned
and the pattern is recognized, the panic is triggered.

Do you have a config file? Also, instructions on how to reproduce it?

Thank you,
Pasha
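
The flags-based node id lookup described above can be sketched in
user-space C. The shift and mask below are invented illustrative values
(the kernel's are configuration-dependent); the point is that an
all-ones poisoned flags word decodes to a meaningless node id, which is
why the new check in page_to_nid() traps it instead of returning
garbage:

```c
/*
 * Illustrative layout: assume the node id occupies bits [24..29] of
 * the flags word.  The real shift/mask values are config-dependent.
 */
#define FAKE_NODES_SHIFT	24
#define FAKE_NODES_MASK		0x3FUL

/* Decode a node id from a flags word, as page_to_nid() does. */
static int fake_page_to_nid(unsigned long flags)
{
	return (int)((flags >> FAKE_NODES_SHIFT) & FAKE_NODES_MASK);
}
```

On a poisoned page, flags is ~0UL, so this decodes to the mask value
rather than any real node, and dereferencing per-node data with it
faults, as in the trace above.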


On Tue, Mar 13, 2018 at 7:43 PM, Sasha Levin
<[email protected]> wrote:
> On Wed, Jan 31, 2018 at 04:02:59PM -0500, Pavel Tatashin wrote:
>>During boot we poison struct page memory in order to ensure that no one is
>>accessing this memory until the struct pages are initialized in
>>__init_single_page().
>>
>>This patch adds more scrutiny to this checking by making sure that flags
>>do not equal the poison pattern when they are accessed. The pattern is all
>>ones.
>>
>>Since the node id is also stored in struct page and may be accessed quite
>>early, we add the enforcement into the page_to_nid() function as well.
>>
>>Signed-off-by: Pavel Tatashin <[email protected]>
>>---
>
> Hey Pasha,
>
> This patch is causing the following on boot:
>
> [ 1.253732] BUG: unable to handle kernel paging request at fffffffffffffffe
> [ 1.254000] PGD 2284e19067 P4D 2284e19067 PUD 2284e1b067 PMD 0
> [ 1.254000] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [ 1.254000] Modules linked in:
> [ 1.254000] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc5-next-20180313 #10
> [ 1.254000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
> [ 1.254000] RIP: 0010:__dump_page (??:?)
> [ 1.254000] RSP: 0000:ffff881c63c17810 EFLAGS: 00010246
> [ 1.254000] RAX: dffffc0000000000 RBX: ffffea0084000000 RCX: 1ffff1038c782f2b
> [ 1.254000] RDX: 1fffffffffffffff RSI: ffffffff9e160640 RDI: ffffea0084000000
> [ 1.254000] RBP: ffff881c63c17c00 R08: ffff8840107e8880 R09: ffffed0802167a4d
> [ 1.254000] R10: 0000000000000001 R11: ffffed0802167a4c R12: 1ffff1038c782f07
> [ 1.254000] R13: ffffea0084000020 R14: fffffffffffffffe R15: ffff881c63c17bd8
> [ 1.254000] FS: 0000000000000000(0000) GS:ffff881c6ac00000(0000) knlGS:0000000000000000
> [ 1.254000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1.254000] CR2: fffffffffffffffe CR3: 0000002284e16000 CR4: 00000000003406e0
> [ 1.254000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1.254000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1.254000] Call Trace:
> [ 1.254000] dump_page (/mm/debug.c:80)
> [ 1.254000] get_nid_for_pfn (/./include/linux/mm.h:900 /drivers/base/node.c:396)
> [ 1.254000] register_mem_sect_under_node (/drivers/base/node.c:438)
> [ 1.254000] link_mem_sections (/drivers/base/node.c:517)
> [ 1.254000] topology_init (/./include/linux/nodemask.h:271 /arch/x86/kernel/topology.c:164)
> [ 1.254000] do_one_initcall (/init/main.c:835)
> [ 1.254000] kernel_init_freeable (/init/main.c:901 /init/main.c:909 /init/main.c:927 /init/main.c:1076)
> [ 1.254000] kernel_init (/init/main.c:1004)
> [ 1.254000] ret_from_fork (/arch/x86/entry/entry_64.S:417)
> [ 1.254000] Code: ff a8 01 4c 0f 44 f3 4d 85 f6 0f 84 31 0e 00 00 4c 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 2d 11 00 00 <49> 83 3e ff 0f 84 a9 06 00 00 4d 8d b7 c0 fd ff ff 48 b8 00 00
> All code
> ========
> 0: ff a8 01 4c 0f 44 ljmp *0x440f4c01(%rax)
> 6: f3 4d 85 f6 repz test %r14,%r14
> a: 0f 84 31 0e 00 00 je 0xe41
> 10: 4c 89 f2 mov %r14,%rdx
> 13: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
> 1a: fc ff df
> 1d: 48 c1 ea 03 shr $0x3,%rdx
> 21: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1)
> 25: 0f 85 2d 11 00 00 jne 0x1158
> 2b:* 49 83 3e ff cmpq $0xffffffffffffffff,(%r14) <-- trapping instruction
> 2f: 0f 84 a9 06 00 00 je 0x6de
> 35: 4d 8d b7 c0 fd ff ff lea -0x240(%r15),%r14
> 3c: 48 rex.W
> 3d: b8 .byte 0xb8
> ...
>
> Code starting with the faulting instruction
> ===========================================
> 0: 49 83 3e ff cmpq $0xffffffffffffffff,(%r14)
> 4: 0f 84 a9 06 00 00 je 0x6b3
> a: 4d 8d b7 c0 fd ff ff lea -0x240(%r15),%r14
> 11: 48 rex.W
> 12: b8 .byte 0xb8
> ...
> [ 1.254000] RIP: __dump_page+0x1c8/0x13c0 RSP: ffff881c63c17810 (/./include/asm-generic/sections.h:42)
> [ 1.254000] CR2: fffffffffffffffe
> [ 1.254000] ---[ end trace e643dfbc44b562ca ]---
>
> --
>
> Thanks,
> Sasha

2018-03-14 00:55:06

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

On Tue, Mar 13, 2018 at 08:38:57PM -0400, Pavel Tatashin wrote:
>Hi Sasha,
>
>It seems the patch is doing the right thing, and it catches bugs. Here
>we access uninitialized struct page. The question is why this happens?

Not completely; note that we die on an invalid reference rather than
assertion failure.

>register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
> page_nid = get_nid_for_pfn(pfn);
>
>node id is stored in page flags, and since struct page is poisoned,
>and the pattern is recognized, the panic is triggered.
>
>Do you have config file? Also, instructions how to reproduce it?

Attached the config. It just happens on boot.

>Thank you,
>Pasha
>
>
>On Tue, Mar 13, 2018 at 7:43 PM, Sasha Levin
><[email protected]> wrote:
>> On Wed, Jan 31, 2018 at 04:02:59PM -0500, Pavel Tatashin wrote:
>>>During boot we poison struct page memory in order to ensure that no one is
>>>accessing this memory until the struct pages are initialized in
>>>__init_single_page().
>>>
>>>This patch adds more scrutiny to this checking by making sure that the flags
>>>do not equal the poison pattern when they are accessed. The pattern is all
>>>ones.
>>>
>>>Since the node id is also stored in struct page, and may be accessed quite
>>>early, we add the enforcement into the page_to_nid() function as well.
>>>
>>>Signed-off-by: Pavel Tatashin <[email protected]>
>>>---
>>
>> Hey Pasha,
>>
>> This patch is causing the following on boot:
>>
>> [ 1.253732] BUG: unable to handle kernel paging request at fffffffffffffffe
>> [ 1.254000] PGD 2284e19067 P4D 2284e19067 PUD 2284e1b067 PMD 0
>> [ 1.254000] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
>> [ 1.254000] Modules linked in:
>> [ 1.254000] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc5-next-20180313 #10
>> [ 1.254000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
>> [ 1.254000] RIP: 0010:__dump_page (??:?)
>> [ 1.254000] RSP: 0000:ffff881c63c17810 EFLAGS: 00010246
>> [ 1.254000] RAX: dffffc0000000000 RBX: ffffea0084000000 RCX: 1ffff1038c782f2b
>> [ 1.254000] RDX: 1fffffffffffffff RSI: ffffffff9e160640 RDI: ffffea0084000000
>> [ 1.254000] RBP: ffff881c63c17c00 R08: ffff8840107e8880 R09: ffffed0802167a4d
>> [ 1.254000] R10: 0000000000000001 R11: ffffed0802167a4c R12: 1ffff1038c782f07
>> [ 1.254000] R13: ffffea0084000020 R14: fffffffffffffffe R15: ffff881c63c17bd8
>> [ 1.254000] FS: 0000000000000000(0000) GS:ffff881c6ac00000(0000) knlGS:0000000000000000
>> [ 1.254000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 1.254000] CR2: fffffffffffffffe CR3: 0000002284e16000 CR4: 00000000003406e0
>> [ 1.254000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 1.254000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 1.254000] Call Trace:
>> [ 1.254000] dump_page (/mm/debug.c:80)
>> [ 1.254000] get_nid_for_pfn (/./include/linux/mm.h:900 /drivers/base/node.c:396)
>> [ 1.254000] register_mem_sect_under_node (/drivers/base/node.c:438)
>> [ 1.254000] link_mem_sections (/drivers/base/node.c:517)
>> [ 1.254000] topology_init (/./include/linux/nodemask.h:271 /arch/x86/kernel/topology.c:164)
>> [ 1.254000] do_one_initcall (/init/main.c:835)
>> [ 1.254000] kernel_init_freeable (/init/main.c:901 /init/main.c:909 /init/main.c:927 /init/main.c:1076)
>> [ 1.254000] kernel_init (/init/main.c:1004)
>> [ 1.254000] ret_from_fork (/arch/x86/entry/entry_64.S:417)
>> [ 1.254000] Code: ff a8 01 4c 0f 44 f3 4d 85 f6 0f 84 31 0e 00 00 4c 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 2d 11 00 00 <49> 83 3e ff 0f 84 a9 06 00 00 4d 8d b7 c0 fd ff ff 48 b8 00 00
>> All code
>> ========
>> 0: ff a8 01 4c 0f 44 ljmp *0x440f4c01(%rax)
>> 6: f3 4d 85 f6 repz test %r14,%r14
>> a: 0f 84 31 0e 00 00 je 0xe41
>> 10: 4c 89 f2 mov %r14,%rdx
>> 13: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
>> 1a: fc ff df
>> 1d: 48 c1 ea 03 shr $0x3,%rdx
>> 21: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1)
>> 25: 0f 85 2d 11 00 00 jne 0x1158
>> 2b:* 49 83 3e ff cmpq $0xffffffffffffffff,(%r14) <-- trapping instruction
>> 2f: 0f 84 a9 06 00 00 je 0x6de
>> 35: 4d 8d b7 c0 fd ff ff lea -0x240(%r15),%r14
>> 3c: 48 rex.W
>> 3d: b8 .byte 0xb8
>> ...
>>
>> Code starting with the faulting instruction
>> ===========================================
>> 0: 49 83 3e ff cmpq $0xffffffffffffffff,(%r14)
>> 4: 0f 84 a9 06 00 00 je 0x6b3
>> a: 4d 8d b7 c0 fd ff ff lea -0x240(%r15),%r14
>> 11: 48 rex.W
>> 12: b8 .byte 0xb8
>> ...
>> [ 1.254000] RIP: __dump_page+0x1c8/0x13c0 RSP: ffff881c63c17810 (/./include/asm-generic/sections.h:42)
>> [ 1.254000] CR2: fffffffffffffffe
>> [ 1.254000] ---[ end trace e643dfbc44b562ca ]---
>>
>> --
>>
>> Thanks,
>> Sasha

--

Thanks,
Sasha


Attachments:
config-sasha.gz (32.62 kB)

2018-03-14 01:04:45

by Pavel Tatashin

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

On Tue, Mar 13, 2018 at 8:53 PM, Sasha Levin
<[email protected]> wrote:
> On Tue, Mar 13, 2018 at 08:38:57PM -0400, Pavel Tatashin wrote:
>>Hi Sasha,
>>
>>It seems the patch is doing the right thing, and it catches bugs. Here
>>we access uninitialized struct page. The question is why this happens?
>
> Not completely; note that we die on an invalid reference rather than
> assertion failure.

I think the invalid reference happens within the assertion failure. As
far as I can tell, it is dump_page() where we get the invalid
reference, but to reach dump_page() from get_nid_for_pfn() we must
have already triggered the assertion.

>
>>register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
>> page_nid = get_nid_for_pfn(pfn);
>>
>>node id is stored in page flags, and since struct page is poisoned,
>>and the pattern is recognized, the panic is triggered.
>>
>>Do you have config file? Also, instructions how to reproduce it?
>
> Attached the config. It just happens on boot.

Thanks, I will try in qemu.

Pasha

2018-03-15 19:07:50

by Pavel Tatashin

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

>
> Attached the config. It just happens on boot.

Hi Sasha,

I have tried unsuccessfully to reproduce the bug in qemu with 20G RAM,
and 8 CPUs.

The patch "mm: uninitialized struct page poisoning sanity" should be improved
to make dump_page() detect a poisoned struct page and simply print the raw hex
in that case. I will send an updated patch later.

How do you run this on the Microsoft hypervisor? Do I need Windows 10 for
that?

BTW, I am going to be on vacation for the next two weeks (going to Israel),
so I may not be able to respond quickly.

Thank you,
Pasha

2018-03-15 20:46:27

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

On Thu, Mar 15, 2018 at 03:04:30PM -0400, Pavel Tatashin wrote:
>>
>> Attached the config. It just happens on boot.
>
>Hi Sasha,
>
>I have tried unsuccessfully to reproduce the bug in qemu with 20G RAM,
>and 8 CPUs.
>
>Patch "mm: uninitialized struct page poisoning sanity" should be improved
>to make dump_page() to detect poisoned struct page, and simply print hex
>in such case. I will send an updated patch later.
>
>How do you run this on Microsoft hypervisor? Do I need Windows 10 for
>that?

Booting a Linux VM on Azure would be the easiest, and free too :)

>BTW, I am going to be on vacation for the next two weeks (going to Israel),
>so I may not be able to respond quickly.

Have fun!

We may need to hold off on getting this patch merged for the time being.

--

Thanks,
Sasha

2018-04-04 02:19:35

by Pavel Tatashin

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

On 18-03-15 20:43:14, Sasha Levin wrote:
> On Thu, Mar 15, 2018 at 03:04:30PM -0400, Pavel Tatashin wrote:
> >>
> >> Attached the config. It just happens on boot.
> >
> >Hi Sasha,
> >
> >I have tried unsuccessfully to reproduce the bug in qemu with 20G RAM,
> >and 8 CPUs.
> >
> >Patch "mm: uninitialized struct page poisoning sanity" should be improved
> >to make dump_page() to detect poisoned struct page, and simply print hex
> >in such case. I will send an updated patch later.
> >
> >How do you run this on Microsoft hypervisor? Do I need Windows 10 for
> >that?
>
> Booting a Linux VM on Azure would be the easiest, and free too :)

Hi Sasha,

I have registered on Azure's portal, and created a VM with 4 CPUs and 16G
of RAM. However, I still was not able to reproduce the boot bug you found.

Could you please try the updated patch that I sent out today? The panic
message should improve:

https://lkml.org/lkml/2018/4/3/583

Thank you,
Pasha

2018-04-05 13:52:45

by Pavel Tatashin

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

> Hi Sasha,
>
> I have registered on Azure's portal, and created a VM with 4 CPUs and 16G
> of RAM. However, I still was not able to reproduce the boot bug you found.

I have also tried to reproduce this issue on Windows 10 + Hyper-V, still
unsuccessful.

>
> Could you please try an updated patch that I sent out today, the panic
> message should improve:
>
> https://lkml.org/lkml/2018/4/3/583
>
> Thank you,
> Pasha
>

2018-04-05 19:24:31

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

On Thu, Apr 05, 2018 at 09:49:40AM -0400, Pavel Tatashin wrote:
>> Hi Sasha,
>>
>> I have registered on Azure's portal, and created a VM with 4 CPUs and 16G
>> of RAM. However, I still was not able to reproduce the boot bug you found.
>
>I have also tried to reproduce this issue on Windows 10 + Hyper-V, still
>unsuccessful.

I'm not sure why you can't reproduce it. I built a 4.16 kernel + your 6
patches on top, and booting on a D64s_v3 instance gives me this:

[ 1.205726] page:ffffea0084000000 is uninitialized and poisoned
[ 1.205737] raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[ 1.207016] raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[ 1.208014] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
[ 1.209087] ------------[ cut here ]------------
[ 1.210000] kernel BUG at ./include/linux/mm.h:901!
[ 1.210015] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 1.211000] Modules linked in:
[ 1.211000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0+ #10
[ 1.211000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
[ 1.211000] RIP: 0010:get_nid_for_pfn+0x6e/0xa0
[ 1.211000] RSP: 0000:ffff881c63cbfc28 EFLAGS: 00010246
[ 1.211000] RAX: 0000000000000000 RBX: ffffea0084000000 RCX: 0000000000000000
[ 1.211000] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffed038c797f78
[ 1.211000] RBP: ffff881c63cbfc30 R08: ffff88401174a480 R09: 0000000000000000
[ 1.211000] R10: ffff8840e00d6040 R11: 0000000000000000 R12: 0000000002107fff
[ 1.211000] R13: fffffbfff4648234 R14: 0000000000000001 R15: 0000000000000001
[ 1.211000] FS: 0000000000000000(0000) GS:ffff881c6aa00000(0000) knlGS:0000000000000000
[ 1.211000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.211000] CR2: 0000000000000000 CR3: 0000002814216000 CR4: 00000000003406f0
[ 1.211000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.211000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1.211000] Call Trace:
[ 1.211000] register_mem_sect_under_node+0x1a2/0x530
[ 1.211000] link_mem_sections+0x12d/0x200
[ 1.211000] topology_init+0xe6/0x178
[ 1.211000] ? enable_cpu0_hotplug+0x1a/0x1a
[ 1.211000] do_one_initcall+0xb0/0x31f
[ 1.211000] ? initcall_blacklisted+0x220/0x220
[ 1.211000] ? up_write+0x78/0x140
[ 1.211000] ? up_read+0x40/0x40
[ 1.211000] ? __asan_register_globals+0x30/0xa0
[ 1.211000] ? kasan_unpoison_shadow+0x35/0x50
[ 1.211000] kernel_init_freeable+0x69d/0x764
[ 1.211000] ? start_kernel+0x8fd/0x8fd
[ 1.211000] ? finish_task_switch+0x1b6/0x9c0
[ 1.211000] ? rest_init+0x120/0x120
[ 1.211000] kernel_init+0x13/0x150
[ 1.211000] ? rest_init+0x120/0x120
[ 1.211000] ret_from_fork+0x3a/0x50
[ 1.211000] Code: ff df 48 c1 ea 03 80 3c 02 00 75 34 48 8b 03 48 83 f8 ff 74 07 48 c1 e8 36 5b 5d c3 48 c7 c6 00 ca f5 9e 48 89 df e8 82 13 d5 fd <0f> 0b 48 c7 c7 00 24 2e a1 e8 05 36 c1 fe e8 af 07 ea fd eb ac
[ 1.211000] RIP: get_nid_for_pfn+0x6e/0xa0 RSP: ffff881c63cbfc28
[ 1.211017] ---[ end trace d86a03841f7ef229 ]---
[ 1.212020] ==================================================================
[ 1.213000] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x64c/0x810
[ 1.213000] Read of size 8 at addr ffff881c63cbfaf8 by task swapper/0/1
[ 1.213000]
[ 1.213000] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G D 4.16.0+ #10
[ 1.213000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
[ 1.213000] Call Trace:
[ 1.213000] dump_stack+0xe3/0x196
[ 1.213000] ? _atomic_dec_and_lock+0x31a/0x31a
[ 1.213000] ? vprintk_func+0x27/0x60
[ 1.213000] ? printk+0x9c/0xc3
[ 1.213000] ? show_regs_print_info+0x10/0x10
[ 1.213000] ? lock_acquire+0x760/0x760
[ 1.213000] ? update_stack_state+0x64c/0x810
[ 1.213000] print_address_description+0xe4/0x480
[ 1.213000] ? update_stack_state+0x64c/0x810
[ 1.213000] kasan_report+0x1d7/0x460
[ 1.213000] ? console_unlock+0x652/0xe90
[ 1.213000] ? update_stack_state+0x64c/0x810
[ 1.213000] __asan_report_load8_noabort+0x19/0x20
[ 1.213000] update_stack_state+0x64c/0x810
[ 1.213000] ? __read_once_size_nocheck.constprop.2+0x50/0x50
[ 1.213000] ? put_files_struct+0x2a4/0x390
[ 1.213000] ? unwind_next_frame+0x202/0x1230
[ 1.213000] unwind_next_frame+0x202/0x1230
[ 1.213000] ? unwind_dump+0x590/0x590
[ 1.213000] ? get_stack_info+0x42/0x3b0
[ 1.213000] ? debug_check_no_locks_freed+0x300/0x300
[ 1.213000] ? __unwind_start+0x170/0x380
[ 1.213000] __save_stack_trace+0x82/0x140
[ 1.213000] ? put_files_struct+0x2a4/0x390
[ 1.213000] save_stack_trace+0x39/0x70
[ 1.213000] save_stack+0x43/0xd0
[ 1.213000] ? save_stack+0x43/0xd0
[ 1.213000] ? __kasan_slab_free+0x11f/0x170
[ 1.213000] ? kasan_slab_free+0xe/0x10
[ 1.213000] ? kmem_cache_free+0xe6/0x560
[ 1.213000] ? put_files_struct+0x2a4/0x390
[ 1.213000] ? _get_random_bytes+0x162/0x5a0
[ 1.213000] ? trace_hardirqs_off+0xd/0x10
[ 1.213000] ? lock_acquire+0x212/0x760
[ 1.213000] ? rcuwait_wake_up+0x15e/0x2c0
[ 1.213000] ? lock_acquire+0x212/0x760
[ 1.213000] ? free_obj_work+0x8a0/0x8a0
[ 1.213000] ? lock_acquire+0x212/0x760
[ 1.213000] ? acct_collect+0x776/0xe80
[ 1.213000] ? acct_collect+0x2e4/0xe80
[ 1.213000] ? acct_collect+0x2e4/0xe80
[ 1.213000] ? lock_acquire+0x760/0x760
[ 1.213000] ? lock_downgrade+0x910/0x910
[ 1.213000] __kasan_slab_free+0x11f/0x170
[ 1.213000] ? put_files_struct+0x2a4/0x390
[ 1.213000] kasan_slab_free+0xe/0x10
[ 1.213000] kmem_cache_free+0xe6/0x560
[ 1.213000] put_files_struct+0x2a4/0x390
[ 1.213000] ? get_files_struct+0x80/0x80
[ 1.213000] ? do_raw_spin_trylock+0x1f0/0x1f0
[ 1.213000] exit_files+0x83/0xc0
[ 1.213000] do_exit+0x9be/0x2190
[ 1.213000] ? do_invalid_op+0x20/0x30
[ 1.213000] ? mm_update_next_owner+0x1200/0x1200
[ 1.213000] ? get_nid_for_pfn+0x6e/0xa0
[ 1.213000] ? get_nid_for_pfn+0x6e/0xa0
[ 1.213000] ? register_mem_sect_under_node+0x1a2/0x530
[ 1.213000] ? link_mem_sections+0x12d/0x200
[ 1.213000] ? topology_init+0xe6/0x178
[ 1.213000] ? enable_cpu0_hotplug+0x1a/0x1a
[ 1.213000] ? do_one_initcall+0xb0/0x31f
[ 1.213000] ? initcall_blacklisted+0x220/0x220
[ 1.213000] ? up_write+0x78/0x140
[ 1.213000] ? up_read+0x40/0x40
[ 1.213000] ? __asan_register_globals+0x30/0xa0
[ 1.213000] ? kasan_unpoison_shadow+0x35/0x50
[ 1.213000] ? kernel_init_freeable+0x69d/0x764
[ 1.213000] ? start_kernel+0x8fd/0x8fd
[ 1.213000] ? finish_task_switch+0x1b6/0x9c0
[ 1.213000] ? rest_init+0x120/0x120
[ 1.213000] rewind_stack_do_exit+0x17/0x20
[ 1.213000]
[ 1.213000] The buggy address belongs to the page:
[ 1.213000] page:ffffea00718f2fc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 1.213000] flags: 0x17ffffc0000000()
[ 1.213000] raw: 0017ffffc0000000 0000000000000000 0000000000000000 00000000ffffffff
[ 1.213000] raw: ffffea00718f2fe0 ffffea00718f2fe0 0000000000000000 0000000000000000
[ 1.213000] page dumped because: kasan: bad access detected
[ 1.213000]
[ 1.213000] Memory state around the buggy address:
[ 1.213000] ffff881c63cbf980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1.213000] ffff881c63cbfa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
[ 1.213000] >ffff881c63cbfa80: f1 f8 f2 f2 f2 00 00 00 00 00 00 00 00 00 f3 f3
[ 1.213000] ^
[ 1.213000] ffff881c63cbfb00: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1.213000] ffff881c63cbfb80: f1 f1 f1 f1 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2
[ 1.213000] ==================================================================
[ 1.213033] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 1.213033]
[ 1.214000] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

2018-04-06 12:47:20

by Pavel Tatashin

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

On 18-04-05 19:22:58, Sasha Levin wrote:
> On Thu, Apr 05, 2018 at 09:49:40AM -0400, Pavel Tatashin wrote:
> >> Hi Sasha,
> >>
> >> I have registered on Azure's portal, and created a VM with 4 CPUs and 16G
> >> of RAM. However, I still was not able to reproduce the boot bug you found.
> >
> >I have also tried to reproduce this issue on Windows 10 + Hyper-V, still
> >unsuccessful.
>
> I'm not sure why you can't reproduce it. I built a 4.16 kernel + your 6
> patches on top, and booting on a D64s_v3 instance gives me this:

Hi Sasha,

Thank you for running it again, the new trace is cleaner, as we do not get
nested panics within dump_page.

Perhaps NUMA is required to reproduce this issue. I have tried,
unsuccessfully, on D4S_V3. This is the largest VM allowed with the free
trial, as 4 CPUs is the limit. D64S_V3 has 64 CPUs and costs over $2K a
month! :)

Let me study your trace; perhaps I will be able to figure out the issue
without reproducing it.

Thank you,
Pasha

2018-04-07 14:49:54

by Pavel Tatashin

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

> Let me study your trace; perhaps I will be able to figure out the issue
> without reproducing it.

Hi Sasha,

I've been studying this problem more. The issue happens in this stack:

...subsys_init...
topology_init()
register_one_node(nid)
link_mem_sections(nid, pgdat->node_start_pfn, pgdat->node_spanned_pages)
register_mem_sect_under_node(mem_blk, nid)
get_nid_for_pfn(pfn)
pfn_to_nid(pfn)
page_to_nid(page)
PF_POISONED_CHECK(page)

We are trying to get the nid from a struct page which has not been
initialized. My patches add this extra scrutiny to make sure that we
never get an invalid nid from a "struct page" by adding
PF_POISONED_CHECK() to page_to_nid(). So, the bug already exists in
Linux where an incorrect nid is read. The question is why does this happen?

First, I thought that perhaps the struct page is not yet initialized.
But the initcalls are done after deferred pages are initialized, and
thus every struct page must be initialized by now. Also, if deferred
pages were enabled, we would take a slightly different path and avoid
this bug by getting the nid from memblock instead of struct page:

get_nid_for_pfn(pfn)
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
if (system_state < SYSTEM_RUNNING)
return early_pfn_to_nid(pfn);
#endif

I also verified in your config that CONFIG_DEFERRED_STRUCT_PAGE_INIT
is not set. So, one way to fix this issue is to remove this "#ifdef"
(I have not checked for dependencies), but that would simply address
the symptom, not fix the actual issue.

Thus, we have a "struct page" backing the memory for this pfn, but we
have not initialized it. For some reason memmap_init_zone() decided to
skip it, and I am not sure why. Looking at the code, we skip
initializing a pfn if !early_pfn_valid(pfn) (aka !pfn_valid(pfn)) or
if !early_pfn_in_nid(pfn, nid).

I suspect this has something to do with !pfn_valid(pfn). But without
a machine on which I could reproduce this problem, I cannot study it
further to determine exactly why the pfn is not valid.

Please replace !pfn_valid_within() with !pfn_valid() in
get_nid_for_pfn() and see if the problem still happens. If it does
not, let's study the memory map, pgdat's start and end, and the value
of this pfn.

Thank you,
Pasha