This series adds a test to validate architecture-exported page table helpers.
The patch in the series adds basic transformation tests at various levels of
the page table.
This test was originally suggested by Catalin during the arm64 THP migration
RFC discussion earlier. Going forward it can include more specific tests with
respect to various generic MM functions like THP, HugeTLB etc. as well as
platform-specific tests.
https://lore.kernel.org/linux-mm/[email protected]/
Questions:
Should alloc_gigantic_page() be made available as an interface for general
use in the kernel? The test module here uses a very similar implementation to
the one in HugeTLB to allocate a PUD aligned memory block. The same question
applies to mm_alloc(), which needs to be exported through a header.
Matthew Wilcox had expressed concerns regarding memory allocation for mapped
page table entries at various levels. He also suggested using synthetic pfns
which can be derived from the virtual address of a known kernel text symbol
like kernel_init(). But as discussed previously, it seems like allocated
memory might still outweigh synthetic pfns. This proposal goes with allocated
memory, but if there is broader agreement on synthetic pfns, I will be happy
to rework the test.
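For reference, a synthetic pfn along those lines could presumably be derived
as below (an untested sketch, not part of this series; it assumes the address
of kernel_init() is visible at this point):

	/* Sketch: a pfn backed by known kernel text, no allocation needed */
	unsigned long pfn = PHYS_PFN(__pa_symbol(&kernel_init));

which could then be fed into pfn_pte()/pfn_pmd()/pfn_pud() for the basic
tests.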
Testing:
Build and boot tested on arm64 and x86 platforms. While arm64 passes all
these tests, the following errors were reported on x86.
1. WARN_ON(pud_bad(pud)) in pud_populate_tests()
2. WARN_ON(p4d_bad(p4d)) in p4d_populate_tests()
I would really appreciate it if folks could help validate this test on other
platforms and report back any problems. Suggestions, comments and inputs
welcome. Thank you.
Changes in V3:
- Added fallback mechanism for PMD aligned memory allocation failure
Changes in V2:
https://lore.kernel.org/linux-mm/[email protected]/T/#u
- Moved the test module and its config from lib/ to mm/
- Renamed config TEST_ARCH_PGTABLE as DEBUG_ARCH_PGTABLE_TEST
- Renamed file from test_arch_pgtable.c to arch_pgtable_test.c
- Added relevant MODULE_DESCRIPTION() and MODULE_AUTHOR() details
- Dropped loadable module config option
- Basic tests now use memory blocks with required size and alignment
- PUD aligned memory block gets allocated with alloc_contig_range()
- If PUD aligned memory could not be allocated, it falls back on a PMD aligned
  memory block from the page allocator and the pud_* tests are skipped
- Clear and populate tests now operate on real in memory page table entries
- Dummy mm_struct gets allocated with mm_alloc()
- Dummy page table entries get allocated with [pud|pmd|pte]_alloc_[map]()
- Simplified [p4d|pgd]_basic_tests(), now has random values in the entries
RFC V1:
https://lore.kernel.org/linux-mm/[email protected]/
Cc: Andrew Morton <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Sri Krishna chowdary <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Russell King - ARM Linux <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Anshuman Khandual (1):
mm/pgtable/debug: Add test validating architecture page table helpers
mm/Kconfig.debug | 14 ++
mm/Makefile | 1 +
mm/arch_pgtable_test.c | 425 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 440 insertions(+)
create mode 100644 mm/arch_pgtable_test.c
--
2.20.1
This adds a test module which will validate architecture page table helpers
and accessors regarding compliance with generic MM semantics expectations.
This will help various architectures in validating changes to the existing
page table helpers or addition of new ones.
The test page table and the memory pages used for creating its entries at
various levels are all allocated from system memory with the required
alignments. If memory pages with the required size and alignment could not be
allocated, then all dependent individual tests are skipped.
Cc: Andrew Morton <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Sri Krishna chowdary <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Russell King - ARM Linux <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Suggested-by: Catalin Marinas <[email protected]>
Signed-off-by: Anshuman Khandual <[email protected]>
---
mm/Kconfig.debug | 14 ++
mm/Makefile | 1 +
mm/arch_pgtable_test.c | 425 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 440 insertions(+)
create mode 100644 mm/arch_pgtable_test.c
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index 327b3ebf23bf..ce9c397f7b07 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -117,3 +117,17 @@ config DEBUG_RODATA_TEST
depends on STRICT_KERNEL_RWX
---help---
This option enables a testcase for the setting rodata read-only.
+
+config DEBUG_ARCH_PGTABLE_TEST
+ bool "Test arch page table helpers for semantics compliance"
+ depends on MMU
+ depends on DEBUG_KERNEL
+ help
+ This option provides a kernel module which can be used to test
+ architecture page table helper functions on various platforms,
+ verifying that they comply with expected generic MM semantics.
+ This will help architecture code make sure that any changes to
+ these helpers, or additions of new ones, still conform to the
+ expected generic MM semantics.
+
+ If unsure, say N.
diff --git a/mm/Makefile b/mm/Makefile
index d996846697ef..bb572c5aa8c5 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -86,6 +86,7 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o
+obj-$(CONFIG_DEBUG_ARCH_PGTABLE_TEST) += arch_pgtable_test.o
obj-$(CONFIG_PAGE_OWNER) += page_owner.o
obj-$(CONFIG_CLEANCACHE) += cleancache.o
obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
diff --git a/mm/arch_pgtable_test.c b/mm/arch_pgtable_test.c
new file mode 100644
index 000000000000..f15be8a73723
--- /dev/null
+++ b/mm/arch_pgtable_test.c
@@ -0,0 +1,425 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * This kernel module validates architecture page table helpers &
+ * accessors and helps in verifying their continued compliance with
+ * generic MM semantics.
+ *
+ * Copyright (C) 2019 ARM Ltd.
+ *
+ * Author: Anshuman Khandual <[email protected]>
+ */
+#define pr_fmt(fmt) "arch_pgtable_test: %s " fmt, __func__
+
+#include <linux/kernel.h>
+#include <linux/hugetlb.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
+#include <linux/pfn_t.h>
+#include <linux/gfp.h>
+#include <linux/spinlock.h>
+#include <linux/sched/mm.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+
+/*
+ * Basic operations
+ *
+ * mkold(entry) = An old and not a young entry
+ * mkyoung(entry) = A young and not an old entry
+ * mkdirty(entry) = A dirty and not a clean entry
+ * mkclean(entry) = A clean and not a dirty entry
+ * mkwrite(entry) = A write and not a write protected entry
+ * wrprotect(entry) = A write protected and not a write entry
+ * pxx_bad(entry) = A mapped and non-table entry
+ * pxx_same(entry1, entry2) = Both entries hold the exact same value
+ */
+#define VADDR_TEST (PGDIR_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)
+#define VMA_TEST_FLAGS (VM_READ|VM_WRITE|VM_EXEC)
+#define RANDOM_NZVALUE (0xbe)
+
+static bool pud_aligned;
+static bool pmd_aligned;
+
+extern struct mm_struct *mm_alloc(void);
+
+static void pte_basic_tests(struct page *page, pgprot_t prot)
+{
+ pte_t pte = mk_pte(page, prot);
+
+ WARN_ON(!pte_same(pte, pte));
+ WARN_ON(!pte_young(pte_mkyoung(pte)));
+ WARN_ON(!pte_dirty(pte_mkdirty(pte)));
+ WARN_ON(!pte_write(pte_mkwrite(pte)));
+ WARN_ON(pte_young(pte_mkold(pte)));
+ WARN_ON(pte_dirty(pte_mkclean(pte)));
+ WARN_ON(pte_write(pte_wrprotect(pte)));
+}
+
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
+static void pmd_basic_tests(struct page *page, pgprot_t prot)
+{
+ pmd_t pmd;
+
+ /*
+ * Memory block here must be PMD_SIZE aligned. Abort this
+ * test in case we could not allocate such a memory block.
+ */
+ if (!pmd_aligned) {
+ pr_warn("Could not proceed with PMD tests\n");
+ return;
+ }
+
+ pmd = mk_pmd(page, prot);
+ WARN_ON(!pmd_same(pmd, pmd));
+ WARN_ON(!pmd_young(pmd_mkyoung(pmd)));
+ WARN_ON(!pmd_dirty(pmd_mkdirty(pmd)));
+ WARN_ON(!pmd_write(pmd_mkwrite(pmd)));
+ WARN_ON(pmd_young(pmd_mkold(pmd)));
+ WARN_ON(pmd_dirty(pmd_mkclean(pmd)));
+ WARN_ON(pmd_write(pmd_wrprotect(pmd)));
+ /*
+ * A huge page does not point to next level page table
+ * entry. Hence this must qualify as pmd_bad().
+ */
+ WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
+}
+#else
+static void pmd_basic_tests(struct page *page, pgprot_t prot) { }
+#endif
+
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+static void pud_basic_tests(struct page *page, pgprot_t prot)
+{
+ pud_t pud;
+
+ /*
+ * Memory block here must be PUD_SIZE aligned. Abort this
+ * test in case we could not allocate such a memory block.
+ */
+ if (!pud_aligned) {
+ pr_warn("Could not proceed with PUD tests\n");
+ return;
+ }
+
+ pud = pfn_pud(page_to_pfn(page), prot);
+ WARN_ON(!pud_same(pud, pud));
+ WARN_ON(!pud_young(pud_mkyoung(pud)));
+ WARN_ON(!pud_write(pud_mkwrite(pud)));
+ WARN_ON(pud_write(pud_wrprotect(pud)));
+ WARN_ON(pud_young(pud_mkold(pud)));
+
+#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
+ /*
+ * A huge page does not point to next level page table
+ * entry. Hence this must qualify as pud_bad().
+ */
+ WARN_ON(!pud_bad(pud_mkhuge(pud)));
+#endif
+}
+#else
+static void pud_basic_tests(struct page *page, pgprot_t prot) { }
+#endif
+
+static void p4d_basic_tests(struct page *page, pgprot_t prot)
+{
+ p4d_t p4d;
+
+ memset(&p4d, RANDOM_NZVALUE, sizeof(p4d_t));
+ WARN_ON(!p4d_same(p4d, p4d));
+}
+
+static void pgd_basic_tests(struct page *page, pgprot_t prot)
+{
+ pgd_t pgd;
+
+ memset(&pgd, RANDOM_NZVALUE, sizeof(pgd_t));
+ WARN_ON(!pgd_same(pgd, pgd));
+}
+
+#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
+static void pud_clear_tests(pud_t *pudp)
+{
+ memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
+ pud_clear(pudp);
+ WARN_ON(!pud_none(READ_ONCE(*pudp)));
+}
+
+static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
+{
+ /*
+ * This entry points to next level page table page.
+ * Hence this must not qualify as pud_bad().
+ */
+ pmd_clear(pmdp);
+ pud_clear(pudp);
+ pud_populate(mm, pudp, pmdp);
+ WARN_ON(pud_bad(READ_ONCE(*pudp)));
+}
+#else
+static void pud_clear_tests(pud_t *pudp) { }
+static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
+{
+}
+#endif
+
+#if !defined(__PAGETABLE_PUD_FOLDED) && !defined(__ARCH_HAS_5LEVEL_HACK)
+static void p4d_clear_tests(p4d_t *p4dp)
+{
+ memset(p4dp, RANDOM_NZVALUE, sizeof(p4d_t));
+ p4d_clear(p4dp);
+ WARN_ON(!p4d_none(READ_ONCE(*p4dp)));
+}
+
+static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
+{
+ /*
+ * This entry points to next level page table page.
+ * Hence this must not qualify as p4d_bad().
+ */
+ pud_clear(pudp);
+ p4d_clear(p4dp);
+ p4d_populate(mm, p4dp, pudp);
+ WARN_ON(p4d_bad(READ_ONCE(*p4dp)));
+}
+#else
+static void p4d_clear_tests(p4d_t *p4dp) { }
+static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
+{
+}
+#endif
+
+#ifndef __PAGETABLE_P4D_FOLDED
+static void pgd_clear_tests(pgd_t *pgdp)
+{
+ memset(pgdp, RANDOM_NZVALUE, sizeof(pgd_t));
+ pgd_clear(pgdp);
+ WARN_ON(!pgd_none(READ_ONCE(*pgdp)));
+}
+
+static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
+{
+ /*
+ * This entry points to next level page table page.
+ * Hence this must not qualify as pgd_bad().
+ */
+ p4d_clear(p4dp);
+ pgd_clear(pgdp);
+ pgd_populate(mm, pgdp, p4dp);
+ WARN_ON(pgd_bad(READ_ONCE(*pgdp)));
+}
+#else
+static void pgd_clear_tests(pgd_t *pgdp) { }
+static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
+{
+}
+#endif
+
+static void pte_clear_tests(pte_t *ptep)
+{
+ memset(ptep, RANDOM_NZVALUE, sizeof(pte_t));
+ pte_clear(NULL, 0, ptep);
+ WARN_ON(!pte_none(READ_ONCE(*ptep)));
+}
+
+static void pmd_clear_tests(pmd_t *pmdp)
+{
+ memset(pmdp, RANDOM_NZVALUE, sizeof(pmd_t));
+ pmd_clear(pmdp);
+ WARN_ON(!pmd_none(READ_ONCE(*pmdp)));
+}
+
+static void pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp,
+ pgtable_t pgtable)
+{
+ /*
+ * This entry points to next level page table page.
+ * Hence this must not qualify as pmd_bad().
+ */
+ pmd_clear(pmdp);
+ pmd_populate(mm, pmdp, pgtable);
+ WARN_ON(pmd_bad(READ_ONCE(*pmdp)));
+}
+
+static bool pfn_range_valid(struct zone *z, unsigned long start_pfn,
+ unsigned long nr_pages)
+{
+ unsigned long i, end_pfn = start_pfn + nr_pages;
+ struct page *page;
+
+ for (i = start_pfn; i < end_pfn; i++) {
+ if (!pfn_valid(i))
+ return false;
+
+ page = pfn_to_page(i);
+
+ if (page_zone(page) != z)
+ return false;
+
+ if (PageReserved(page))
+ return false;
+
+ if (page_count(page) > 0)
+ return false;
+
+ if (PageHuge(page))
+ return false;
+ }
+ return true;
+}
+
+static struct page *alloc_gigantic_page(nodemask_t *nodemask,
+ int nid, gfp_t gfp_mask, int order)
+{
+ struct zonelist *zonelist;
+ struct zone *zone;
+ struct zoneref *z;
+ enum zone_type zonesel;
+ unsigned long ret, pfn, flags, nr_pages;
+
+ nr_pages = 1UL << order;
+ zonesel = gfp_zone(gfp_mask);
+ zonelist = node_zonelist(nid, gfp_mask);
+ for_each_zone_zonelist_nodemask(zone, z, zonelist, zonesel, nodemask) {
+ spin_lock_irqsave(&zone->lock, flags);
+ pfn = ALIGN(zone->zone_start_pfn, nr_pages);
+ while (zone_spans_pfn(zone, pfn + nr_pages - 1)) {
+ if (pfn_range_valid(zone, pfn, nr_pages)) {
+ spin_unlock_irqrestore(&zone->lock, flags);
+ ret = alloc_contig_range(pfn, pfn + nr_pages,
+ MIGRATE_MOVABLE,
+ gfp_mask);
+ if (!ret)
+ return pfn_to_page(pfn);
+ spin_lock_irqsave(&zone->lock, flags);
+ }
+ pfn += nr_pages;
+ }
+ spin_unlock_irqrestore(&zone->lock, flags);
+ }
+ return NULL;
+}
+
+static struct page *alloc_mapped_page(void)
+{
+ gfp_t gfp_mask = GFP_KERNEL | __GFP_ZERO;
+ struct page *page = NULL;
+
+ page = alloc_gigantic_page(&node_states[N_MEMORY], first_memory_node,
+ gfp_mask, get_order(PUD_SIZE));
+ if (page) {
+ pud_aligned = true;
+ pmd_aligned = true;
+ return page;
+ }
+
+ page = alloc_pages(gfp_mask, get_order(PMD_SIZE));
+ if (page) {
+ pmd_aligned = true;
+ return page;
+ }
+ return alloc_page(gfp_mask);
+}
+
+static void free_mapped_page(struct page *page)
+{
+ if (pud_aligned) {
+ unsigned long pfn = page_to_pfn(page);
+
+ free_contig_range(pfn, 1ULL << get_order(PUD_SIZE));
+ return;
+ }
+
+ if (pmd_aligned) {
+ int order = get_order(PMD_SIZE);
+
+ free_pages((unsigned long)page_address(page), order);
+ return;
+ }
+ free_page((unsigned long)page_address(page));
+}
+
+static int __init arch_pgtable_tests_init(void)
+{
+ struct mm_struct *mm;
+ struct page *page;
+ pgd_t *pgdp;
+ p4d_t *p4dp, *saved_p4dp;
+ pud_t *pudp, *saved_pudp;
+ pmd_t *pmdp, *saved_pmdp;
+ pte_t *ptep, *saved_ptep;
+ pgprot_t prot = vm_get_page_prot(VMA_TEST_FLAGS);
+ unsigned long vaddr = VADDR_TEST;
+
+ mm = mm_alloc();
+ if (!mm) {
+ pr_err("mm_struct allocation failed\n");
+ return 1;
+ }
+
+ page = alloc_mapped_page();
+ if (!page) {
+ pr_err("memory allocation failed\n");
+ return 1;
+ }
+
+ pgdp = pgd_offset(mm, vaddr);
+ p4dp = p4d_alloc(mm, pgdp, vaddr);
+ pudp = pud_alloc(mm, p4dp, vaddr);
+ pmdp = pmd_alloc(mm, pudp, vaddr);
+ ptep = pte_alloc_map(mm, pmdp, vaddr);
+
+ /*
+ * Save all the page table page addresses as the page table
+ * entries will be used for testing with random or garbage
+ * values. These saved addresses will be used for freeing
+ * page table pages.
+ */
+ saved_p4dp = p4d_offset(pgdp, 0UL);
+ saved_pudp = pud_offset(p4dp, 0UL);
+ saved_pmdp = pmd_offset(pudp, 0UL);
+ saved_ptep = pte_offset_map(pmdp, 0UL);
+
+ pte_basic_tests(page, prot);
+ pmd_basic_tests(page, prot);
+ pud_basic_tests(page, prot);
+ p4d_basic_tests(page, prot);
+ pgd_basic_tests(page, prot);
+
+ pte_clear_tests(ptep);
+ pmd_clear_tests(pmdp);
+ pud_clear_tests(pudp);
+ p4d_clear_tests(p4dp);
+ pgd_clear_tests(pgdp);
+
+ pmd_populate_tests(mm, pmdp, (pgtable_t) page);
+ pud_populate_tests(mm, pudp, pmdp);
+ p4d_populate_tests(mm, p4dp, pudp);
+ pgd_populate_tests(mm, pgdp, p4dp);
+
+ p4d_free(mm, saved_p4dp);
+ pud_free(mm, saved_pudp);
+ pmd_free(mm, saved_pmdp);
+ pte_free(mm, (pgtable_t) virt_to_page(saved_ptep));
+
+ mm_dec_nr_puds(mm);
+ mm_dec_nr_pmds(mm);
+ mm_dec_nr_ptes(mm);
+ __mmdrop(mm);
+
+ free_mapped_page(page);
+ return 0;
+}
+
+static void __exit arch_pgtable_tests_exit(void) { }
+
+module_init(arch_pgtable_tests_init);
+module_exit(arch_pgtable_tests_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Anshuman Khandual <[email protected]>");
+MODULE_DESCRIPTION("Test architecture page table helpers");
--
2.20.1
Hi Anshuman,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[cannot apply to v5.3-rc7 next-20190902]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-debug-Add-tests-for-architecture-exported-page-table-helpers/20190903-162959
config: m68k-allmodconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 7.4.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.4.0 make.cross ARCH=m68k
If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>
All error/warnings (new ones prefixed by >>):
In file included from arch/m68k/include/asm/bug.h:32:0,
from include/linux/bug.h:5,
from include/linux/thread_info.h:12,
from include/asm-generic/preempt.h:5,
from ./arch/m68k/include/generated/asm/preempt.h:1,
from include/linux/preempt.h:78,
from arch/m68k/include/asm/irqflags.h:6,
from include/linux/irqflags.h:16,
from arch/m68k/include/asm/atomic.h:6,
from include/linux/atomic.h:7,
from include/linux/mm_types_task.h:13,
from include/linux/mm_types.h:5,
from include/linux/hugetlb.h:5,
from mm/arch_pgtable_test.c:14:
mm/arch_pgtable_test.c: In function 'pmd_clear_tests':
>> arch/m68k/include/asm/page.h:31:22: error: lvalue required as unary '&' operand
#define pmd_val(x) ((&x)->pmd[0])
^
include/asm-generic/bug.h:124:25: note: in definition of macro 'WARN_ON'
int __ret_warn_on = !!(condition); \
^~~~~~~~~
>> arch/m68k/include/asm/motorola_pgtable.h:138:26: note: in expansion of macro 'pmd_val'
#define pmd_none(pmd) (!pmd_val(pmd))
^~~~~~~
>> mm/arch_pgtable_test.c:233:11: note: in expansion of macro 'pmd_none'
WARN_ON(!pmd_none(READ_ONCE(*pmdp)));
^~~~~~~~
mm/arch_pgtable_test.c: In function 'pmd_populate_tests':
>> arch/m68k/include/asm/page.h:31:22: error: lvalue required as unary '&' operand
#define pmd_val(x) ((&x)->pmd[0])
^
include/asm-generic/bug.h:124:25: note: in definition of macro 'WARN_ON'
int __ret_warn_on = !!(condition); \
^~~~~~~~~
arch/m68k/include/asm/motorola_pgtable.h:139:25: note: in expansion of macro 'pmd_val'
#define pmd_bad(pmd) ((pmd_val(pmd) & _DESCTYPE_MASK) != _PAGE_TABLE)
^~~~~~~
>> mm/arch_pgtable_test.c:245:10: note: in expansion of macro 'pmd_bad'
WARN_ON(pmd_bad(READ_ONCE(*pmdp)));
^~~~~~~
vim +/pmd_none +233 mm/arch_pgtable_test.c
228
229 static void pmd_clear_tests(pmd_t *pmdp)
230 {
231 memset(pmdp, RANDOM_NZVALUE, sizeof(pmd_t));
232 pmd_clear(pmdp);
> 233 WARN_ON(!pmd_none(READ_ONCE(*pmdp)));
234 }
235
236 static void pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp,
237 pgtable_t pgtable)
238 {
239 /*
240 * This entry points to next level page table page.
241 * Hence this must not qualify as pmd_bad().
242 */
243 pmd_clear(pmdp);
244 pmd_populate(mm, pmdp, pgtable);
> 245 WARN_ON(pmd_bad(READ_ONCE(*pmdp)));
246 }
247
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
On 09/03/2019 04:43 PM, kbuild test robot wrote:
> Hi Anshuman,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on linus/master]
> [cannot apply to v5.3-rc7 next-20190902]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-debug-Add-tests-for-architecture-exported-page-table-helpers/20190903-162959
> config: m68k-allmodconfig (attached as .config)
> compiler: m68k-linux-gcc (GCC) 7.4.0
> reproduce:
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> GCC_VERSION=7.4.0 make.cross ARCH=m68k
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <[email protected]>
>
> All error/warnings (new ones prefixed by >>):
>
> In file included from arch/m68k/include/asm/bug.h:32:0,
> from include/linux/bug.h:5,
> from include/linux/thread_info.h:12,
> from include/asm-generic/preempt.h:5,
> from ./arch/m68k/include/generated/asm/preempt.h:1,
> from include/linux/preempt.h:78,
> from arch/m68k/include/asm/irqflags.h:6,
> from include/linux/irqflags.h:16,
> from arch/m68k/include/asm/atomic.h:6,
> from include/linux/atomic.h:7,
> from include/linux/mm_types_task.h:13,
> from include/linux/mm_types.h:5,
> from include/linux/hugetlb.h:5,
> from mm/arch_pgtable_test.c:14:
> mm/arch_pgtable_test.c: In function 'pmd_clear_tests':
>>> arch/m68k/include/asm/page.h:31:22: error: lvalue required as unary '&' operand
> #define pmd_val(x) ((&x)->pmd[0])
> ^
> include/asm-generic/bug.h:124:25: note: in definition of macro 'WARN_ON'
> int __ret_warn_on = !!(condition); \
> ^~~~~~~~~
>>> arch/m68k/include/asm/motorola_pgtable.h:138:26: note: in expansion of macro 'pmd_val'
> #define pmd_none(pmd) (!pmd_val(pmd))
> ^~~~~~~
>>> mm/arch_pgtable_test.c:233:11: note: in expansion of macro 'pmd_none'
> WARN_ON(!pmd_none(READ_ONCE(*pmdp)));
> ^~~~~~~~
> mm/arch_pgtable_test.c: In function 'pmd_populate_tests':
>>> arch/m68k/include/asm/page.h:31:22: error: lvalue required as unary '&' operand
> #define pmd_val(x) ((&x)->pmd[0])
Storing READ_ONCE(*pmdp) in a local pmd_t variable first solves the problem.
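Something like this (sketch) should keep m68k building, since pmd_val() then
gets a real lvalue to take the address of:

	static void pmd_clear_tests(pmd_t *pmdp)
	{
		pmd_t pmd;

		memset(pmdp, RANDOM_NZVALUE, sizeof(pmd_t));
		pmd_clear(pmdp);
		pmd = READ_ONCE(*pmdp);
		WARN_ON(!pmd_none(pmd));
	}

The same treatment would apply to pmd_populate_tests() and the other
READ_ONCE(*p?dp) users.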
On Tue, Sep 03, 2019 at 01:31:46PM +0530, Anshuman Khandual wrote:
> This adds a test module which will validate architecture page table helpers
> and accessors regarding compliance with generic MM semantics expectations.
> This will help various architectures in validating changes to the existing
> page table helpers or addition of new ones.
>
> Test page table and memory pages creating it's entries at various level are
> all allocated from system memory with required alignments. If memory pages
> with required size and alignment could not be allocated, then all depending
> individual tests are skipped.
See my comments below.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Mark Brown <[email protected]>
> Cc: Steven Price <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Masahiro Yamada <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Tetsuo Handa <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Sri Krishna chowdary <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Russell King - ARM Linux <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Martin Schwidefsky <[email protected]>
> Cc: Heiko Carstens <[email protected]>
> Cc: "David S. Miller" <[email protected]>
> Cc: Vineet Gupta <[email protected]>
> Cc: James Hogan <[email protected]>
> Cc: Paul Burton <[email protected]>
> Cc: Ralf Baechle <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
>
> Suggested-by: Catalin Marinas <[email protected]>
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> mm/Kconfig.debug | 14 ++
> mm/Makefile | 1 +
> mm/arch_pgtable_test.c | 425 +++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 440 insertions(+)
> create mode 100644 mm/arch_pgtable_test.c
>
> diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
> index 327b3ebf23bf..ce9c397f7b07 100644
> --- a/mm/Kconfig.debug
> +++ b/mm/Kconfig.debug
> @@ -117,3 +117,17 @@ config DEBUG_RODATA_TEST
> depends on STRICT_KERNEL_RWX
> ---help---
> This option enables a testcase for the setting rodata read-only.
> +
> +config DEBUG_ARCH_PGTABLE_TEST
> + bool "Test arch page table helpers for semantics compliance"
> + depends on MMU
> + depends on DEBUG_KERNEL
> + help
> + This options provides a kernel module which can be used to test
> + architecture page table helper functions on various platform in
> + verifying if they comply with expected generic MM semantics. This
> + will help architectures code in making sure that any changes or
> + new additions of these helpers will still conform to generic MM
> + expected semantics.
> +
> + If unsure, say N.
> diff --git a/mm/Makefile b/mm/Makefile
> index d996846697ef..bb572c5aa8c5 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -86,6 +86,7 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
> obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
> obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
> obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o
> +obj-$(CONFIG_DEBUG_ARCH_PGTABLE_TEST) += arch_pgtable_test.o
> obj-$(CONFIG_PAGE_OWNER) += page_owner.o
> obj-$(CONFIG_CLEANCACHE) += cleancache.o
> obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
> diff --git a/mm/arch_pgtable_test.c b/mm/arch_pgtable_test.c
> new file mode 100644
> index 000000000000..f15be8a73723
> --- /dev/null
> +++ b/mm/arch_pgtable_test.c
> @@ -0,0 +1,425 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * This kernel module validates architecture page table helpers &
> + * accessors and helps in verifying their continued compliance with
> + * generic MM semantics.
> + *
> + * Copyright (C) 2019 ARM Ltd.
> + *
> + * Author: Anshuman Khandual <[email protected]>
> + */
> +#define pr_fmt(fmt) "arch_pgtable_test: %s " fmt, __func__
> +
> +#include <linux/kernel.h>
> +#include <linux/hugetlb.h>
> +#include <linux/mm.h>
> +#include <linux/mman.h>
> +#include <linux/mm_types.h>
> +#include <linux/module.h>
> +#include <linux/printk.h>
> +#include <linux/swap.h>
> +#include <linux/swapops.h>
> +#include <linux/pfn_t.h>
> +#include <linux/gfp.h>
> +#include <linux/spinlock.h>
> +#include <linux/sched/mm.h>
> +#include <asm/pgalloc.h>
> +#include <asm/pgtable.h>
> +
> +/*
> + * Basic operations
> + *
> + * mkold(entry) = An old and not a young entry
> + * mkyoung(entry) = A young and not an old entry
> + * mkdirty(entry) = A dirty and not a clean entry
> + * mkclean(entry) = A clean and not a dirty entry
> + * mkwrite(entry) = A write and not a write protected entry
> + * wrprotect(entry) = A write protected and not a write entry
> + * pxx_bad(entry) = A mapped and non-table entry
> + * pxx_same(entry1, entry2) = Both entries hold the exact same value
> + */
> +#define VADDR_TEST (PGDIR_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)
What is special about this address? How do you know if it is not occupied
yet?
> +#define VMA_TEST_FLAGS (VM_READ|VM_WRITE|VM_EXEC)
> +#define RANDOM_NZVALUE (0xbe)
> +
> +static bool pud_aligned;
> +static bool pmd_aligned;
> +
> +extern struct mm_struct *mm_alloc(void);
> +
> +static void pte_basic_tests(struct page *page, pgprot_t prot)
> +{
> + pte_t pte = mk_pte(page, prot);
> +
> + WARN_ON(!pte_same(pte, pte));
> + WARN_ON(!pte_young(pte_mkyoung(pte)));
> + WARN_ON(!pte_dirty(pte_mkdirty(pte)));
> + WARN_ON(!pte_write(pte_mkwrite(pte)));
> + WARN_ON(pte_young(pte_mkold(pte)));
> + WARN_ON(pte_dirty(pte_mkclean(pte)));
> + WARN_ON(pte_write(pte_wrprotect(pte)));
> +}
> +
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
> +static void pmd_basic_tests(struct page *page, pgprot_t prot)
> +{
> + pmd_t pmd;
> +
> + /*
> + * Memory block here must be PMD_SIZE aligned. Abort this
> + * test in case we could not allocate such a memory block.
> + */
> + if (!pmd_aligned) {
> + pr_warn("Could not proceed with PMD tests\n");
> + return;
> + }
> +
> + pmd = mk_pmd(page, prot);
> + WARN_ON(!pmd_same(pmd, pmd));
> + WARN_ON(!pmd_young(pmd_mkyoung(pmd)));
> + WARN_ON(!pmd_dirty(pmd_mkdirty(pmd)));
> + WARN_ON(!pmd_write(pmd_mkwrite(pmd)));
> + WARN_ON(pmd_young(pmd_mkold(pmd)));
> + WARN_ON(pmd_dirty(pmd_mkclean(pmd)));
> + WARN_ON(pmd_write(pmd_wrprotect(pmd)));
> + /*
> + * A huge page does not point to next level page table
> + * entry. Hence this must qualify as pmd_bad().
> + */
> + WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
> +}
> +#else
> +static void pmd_basic_tests(struct page *page, pgprot_t prot) { }
> +#endif
> +
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +static void pud_basic_tests(struct page *page, pgprot_t prot)
> +{
> + pud_t pud;
> +
> + /*
> + * Memory block here must be PUD_SIZE aligned. Abort this
> + * test in case we could not allocate such a memory block.
> + */
> + if (!pud_aligned) {
> + pr_warn("Could not proceed with PUD tests\n");
> + return;
> + }
> +
> + pud = pfn_pud(page_to_pfn(page), prot);
> + WARN_ON(!pud_same(pud, pud));
> + WARN_ON(!pud_young(pud_mkyoung(pud)));
> + WARN_ON(!pud_write(pud_mkwrite(pud)));
> + WARN_ON(pud_write(pud_wrprotect(pud)));
> + WARN_ON(pud_young(pud_mkold(pud)));
> +
> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
> + /*
> + * A huge page does not point to next level page table
> + * entry. Hence this must qualify as pud_bad().
> + */
> + WARN_ON(!pud_bad(pud_mkhuge(pud)));
> +#endif
> +}
> +#else
> +static void pud_basic_tests(struct page *page, pgprot_t prot) { }
> +#endif
> +
> +static void p4d_basic_tests(struct page *page, pgprot_t prot)
> +{
> + p4d_t p4d;
> +
> + memset(&p4d, RANDOM_NZVALUE, sizeof(p4d_t));
> + WARN_ON(!p4d_same(p4d, p4d));
> +}
> +
> +static void pgd_basic_tests(struct page *page, pgprot_t prot)
> +{
> + pgd_t pgd;
> +
> + memset(&pgd, RANDOM_NZVALUE, sizeof(pgd_t));
> + WARN_ON(!pgd_same(pgd, pgd));
> +}
> +
> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
> +static void pud_clear_tests(pud_t *pudp)
> +{
> + memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
> + pud_clear(pudp);
> + WARN_ON(!pud_none(READ_ONCE(*pudp)));
> +}
> +
> +static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
> +{
> + /*
> + * This entry points to next level page table page.
> + * Hence this must not qualify as pud_bad().
> + */
> + pmd_clear(pmdp);
> + pud_clear(pudp);
> + pud_populate(mm, pudp, pmdp);
> + WARN_ON(pud_bad(READ_ONCE(*pudp)));
> +}
> +#else
> +static void pud_clear_tests(pud_t *pudp) { }
> +static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
> +{
> +}
> +#endif
> +
> +#if !defined(__PAGETABLE_PUD_FOLDED) && !defined(__ARCH_HAS_5LEVEL_HACK)
> +static void p4d_clear_tests(p4d_t *p4dp)
> +{
> + memset(p4dp, RANDOM_NZVALUE, sizeof(p4d_t));
> + p4d_clear(p4dp);
> + WARN_ON(!p4d_none(READ_ONCE(*p4dp)));
> +}
> +
> +static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
> +{
> + /*
> + * This entry points to next level page table page.
> + * Hence this must not qualify as p4d_bad().
> + */
> + pud_clear(pudp);
> + p4d_clear(p4dp);
> + p4d_populate(mm, p4dp, pudp);
> + WARN_ON(p4d_bad(READ_ONCE(*p4dp)));
> +}
> +#else
> +static void p4d_clear_tests(p4d_t *p4dp) { }
> +static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
> +{
> +}
> +#endif
> +
> +#ifndef __PAGETABLE_P4D_FOLDED
> +static void pgd_clear_tests(pgd_t *pgdp)
> +{
> + memset(pgdp, RANDOM_NZVALUE, sizeof(pgd_t));
> + pgd_clear(pgdp);
> + WARN_ON(!pgd_none(READ_ONCE(*pgdp)));
> +}
> +
> +static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
> +{
> + /*
> + * This entry points to next level page table page.
> + * Hence this must not qualify as pgd_bad().
> + */
> + p4d_clear(p4dp);
> + pgd_clear(pgdp);
> + pgd_populate(mm, pgdp, p4dp);
> + WARN_ON(pgd_bad(READ_ONCE(*pgdp)));
> +}
> +#else
> +static void pgd_clear_tests(pgd_t *pgdp) { }
> +static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
> +{
> +}
> +#endif
This will not work if p4d is folded at runtime, as on x86-64 and s390.
Here's the fixup. It should work for both x86-64 and s390, but I only
tested on x86-64:
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 52e5f5f2240d..b882792a3999 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -40,6 +40,8 @@ static inline bool pgtable_l5_enabled(void)
#define pgtable_l5_enabled() 0
#endif /* CONFIG_X86_5LEVEL */
+#define mm_p4d_folded(mm) (!pgtable_l5_enabled())
+
extern unsigned int pgdir_shift;
extern unsigned int ptrs_per_p4d;
diff --git a/mm/arch_pgtable_test.c b/mm/arch_pgtable_test.c
index f15be8a73723..206fe3334a28 100644
--- a/mm/arch_pgtable_test.c
+++ b/mm/arch_pgtable_test.c
@@ -193,9 +193,11 @@ static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
}
#endif
-#ifndef __PAGETABLE_P4D_FOLDED
static void pgd_clear_tests(pgd_t *pgdp)
{
+ if (mm_p4d_folded(mm))
+ return;
+
memset(pgdp, RANDOM_NZVALUE, sizeof(pgd_t));
pgd_clear(pgdp);
WARN_ON(!pgd_none(READ_ONCE(*pgdp)));
@@ -203,6 +205,9 @@ static void pgd_clear_tests(pgd_t *pgdp)
static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
{
+ if (mm_p4d_folded(mm))
+ return;
+
/*
* This entry points to next level page table page.
* Hence this must not qualify as pgd_bad().
@@ -212,12 +217,6 @@ static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
pgd_populate(mm, pgdp, p4dp);
WARN_ON(pgd_bad(READ_ONCE(*pgdp)));
}
-#else
-static void pgd_clear_tests(pgd_t *pgdp) { }
-static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
-{
-}
-#endif
static void pte_clear_tests(pte_t *ptep)
{
> +
> +static void pte_clear_tests(pte_t *ptep)
> +{
> + memset(ptep, RANDOM_NZVALUE, sizeof(pte_t));
> + pte_clear(NULL, 0, ptep);
> + WARN_ON(!pte_none(READ_ONCE(*ptep)));
> +}
> +
> +static void pmd_clear_tests(pmd_t *pmdp)
> +{
> + memset(pmdp, RANDOM_NZVALUE, sizeof(pmd_t));
> + pmd_clear(pmdp);
> + WARN_ON(!pmd_none(READ_ONCE(*pmdp)));
> +}
> +
> +static void pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp,
> + pgtable_t pgtable)
> +{
> + /*
> + * This entry points to next level page table page.
> + * Hence this must not qualify as pmd_bad().
> + */
> + pmd_clear(pmdp);
> + pmd_populate(mm, pmdp, pgtable);
> + WARN_ON(pmd_bad(READ_ONCE(*pmdp)));
> +}
> +
> +static bool pfn_range_valid(struct zone *z, unsigned long start_pfn,
> + unsigned long nr_pages)
> +{
> + unsigned long i, end_pfn = start_pfn + nr_pages;
> + struct page *page;
> +
> + for (i = start_pfn; i < end_pfn; i++) {
> + if (!pfn_valid(i))
> + return false;
> +
> + page = pfn_to_page(i);
> +
> + if (page_zone(page) != z)
> + return false;
> +
> + if (PageReserved(page))
> + return false;
> +
> + if (page_count(page) > 0)
> + return false;
> +
> + if (PageHuge(page))
> + return false;
> + }
> + return true;
> +}
> +
> +static struct page *alloc_gigantic_page(nodemask_t *nodemask,
> + int nid, gfp_t gfp_mask, int order)
> +{
> + struct zonelist *zonelist;
> + struct zone *zone;
> + struct zoneref *z;
> + enum zone_type zonesel;
> + unsigned long ret, pfn, flags, nr_pages;
> +
> + nr_pages = 1UL << order;
> + zonesel = gfp_zone(gfp_mask);
> + zonelist = node_zonelist(nid, gfp_mask);
> + for_each_zone_zonelist_nodemask(zone, z, zonelist, zonesel, nodemask) {
> + spin_lock_irqsave(&zone->lock, flags);
> + pfn = ALIGN(zone->zone_start_pfn, nr_pages);
> + while (zone_spans_pfn(zone, pfn + nr_pages - 1)) {
> + if (pfn_range_valid(zone, pfn, nr_pages)) {
> + spin_unlock_irqrestore(&zone->lock, flags);
> + ret = alloc_contig_range(pfn, pfn + nr_pages,
> + MIGRATE_MOVABLE,
> + gfp_mask);
> + if (!ret)
> + return pfn_to_page(pfn);
> + spin_lock_irqsave(&zone->lock, flags);
> + }
> + pfn += nr_pages;
> + }
> + spin_unlock_irqrestore(&zone->lock, flags);
> + }
> + return NULL;
> +}
> +
> +static struct page *alloc_mapped_page(void)
> +{
> + gfp_t gfp_mask = GFP_KERNEL | __GFP_ZERO;
> + struct page *page = NULL;
> +
> + page = alloc_gigantic_page(&node_states[N_MEMORY], first_memory_node,
> + gfp_mask, get_order(PUD_SIZE));
> + if (page) {
> + pud_aligned = true;
> + pmd_aligned = true;
> + return page;
> + }
> +
> + page = alloc_pages(gfp_mask, get_order(PMD_SIZE));
> + if (page) {
> + pmd_aligned = true;
> + return page;
> + }
> + return alloc_page(gfp_mask);
> +}
> +
> +static void free_mapped_page(struct page *page)
> +{
> + if (pud_aligned) {
> + unsigned long pfn = page_to_pfn(page);
> +
> + free_contig_range(pfn, 1ULL << get_order(PUD_SIZE));
> + return;
> + }
> +
> + if (pmd_aligned) {
> + int order = get_order(PMD_SIZE);
> +
> + free_pages((unsigned long)page_address(page), order);
> + return;
> + }
> + free_page((unsigned long)page_address(page));
> +}
> +
> +static int __init arch_pgtable_tests_init(void)
> +{
> + struct mm_struct *mm;
> + struct page *page;
> + pgd_t *pgdp;
> + p4d_t *p4dp, *saved_p4dp;
> + pud_t *pudp, *saved_pudp;
> + pmd_t *pmdp, *saved_pmdp;
> + pte_t *ptep, *saved_ptep;
> + pgprot_t prot = vm_get_page_prot(VMA_TEST_FLAGS);
> + unsigned long vaddr = VADDR_TEST;
> +
> + mm = mm_alloc();
> + if (!mm) {
> + pr_err("mm_struct allocation failed\n");
> + return 1;
> + }
> +
> + page = alloc_mapped_page();
> + if (!page) {
> + pr_err("memory allocation failed\n");
> + return 1;
> + }
> +
> + pgdp = pgd_offset(mm, vaddr);
> + p4dp = p4d_alloc(mm, pgdp, vaddr);
> + pudp = pud_alloc(mm, p4dp, vaddr);
> + pmdp = pmd_alloc(mm, pudp, vaddr);
> + ptep = pte_alloc_map(mm, pmdp, vaddr);
> +
> + /*
> + * Save all the page table page addresses as the page table
> + * entries will be used for testing with random or garbage
> + * values. These saved addresses will be used for freeing
> + * page table pages.
> + */
> + saved_p4dp = p4d_offset(pgdp, 0UL);
> + saved_pudp = pud_offset(p4dp, 0UL);
> + saved_pmdp = pmd_offset(pudp, 0UL);
> + saved_ptep = pte_offset_map(pmdp, 0UL);
> +
> + pte_basic_tests(page, prot);
> + pmd_basic_tests(page, prot);
> + pud_basic_tests(page, prot);
> + p4d_basic_tests(page, prot);
> + pgd_basic_tests(page, prot);
> +
> + pte_clear_tests(ptep);
> + pmd_clear_tests(pmdp);
> + pud_clear_tests(pudp);
> + p4d_clear_tests(p4dp);
> + pgd_clear_tests(pgdp);
> +
> + pmd_populate_tests(mm, pmdp, (pgtable_t) page);
This is not correct for architectures that define pgtable_t as a pte_t
pointer rather than a struct page pointer.
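A sketch of a portable alternative would be to use whatever pte_alloc_one()
returns, since that is already the arch-defined pgtable_t (untested, just to
illustrate that the cast from struct page cannot be assumed):

	pgtable_t pgtable = pte_alloc_one(mm);	/* arch-defined pgtable_t */

	if (pgtable) {
		pmd_populate_tests(mm, pmdp, pgtable);
		pte_free(mm, pgtable);
	}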
> + pud_populate_tests(mm, pudp, pmdp);
> + p4d_populate_tests(mm, p4dp, pudp);
> + pgd_populate_tests(mm, pgdp, p4dp);
This is wrong. All the p?dp pointers point to the second entry of their page
table page. That is not a valid pointer to a page table and triggers
p?d_bad() on x86. Use saved_p?dp instead.
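That is, something like:

	pud_populate_tests(mm, pudp, saved_pmdp);
	p4d_populate_tests(mm, p4dp, saved_pudp);
	pgd_populate_tests(mm, pgdp, saved_p4dp);

so that each populated entry points to the start of the next level table.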
> +
> + p4d_free(mm, saved_p4dp);
> + pud_free(mm, saved_pudp);
> + pmd_free(mm, saved_pmdp);
> + pte_free(mm, (pgtable_t) virt_to_page(saved_ptep));
> +
> + mm_dec_nr_puds(mm);
> + mm_dec_nr_pmds(mm);
> + mm_dec_nr_ptes(mm);
> + __mmdrop(mm);
> +
> + free_mapped_page(page);
> + return 0;
> +}
> +
> +static void __exit arch_pgtable_tests_exit(void) { }
> +
> +module_init(arch_pgtable_tests_init);
> +module_exit(arch_pgtable_tests_exit);
> +
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR("Anshuman Khandual <[email protected]>");
> +MODULE_DESCRIPTION("Test archicture page table helpers");
> --
> 2.20.1
>
>
--
Kirill A. Shutemov
On Tue, 3 Sep 2019 13:31:46 +0530
Anshuman Khandual <[email protected]> wrote:
> This adds a test module which will validate architecture page table helpers
> and accessors regarding compliance with generic MM semantics expectations.
> This will help various architectures in validating changes to the existing
> page table helpers or addition of new ones.
>
> Test page table and memory pages creating it's entries at various level are
> all allocated from system memory with required alignments. If memory pages
> with required size and alignment could not be allocated, then all depending
> individual tests are skipped.
This looks very useful, thanks. Of course, s390 is quite special and does
not work nicely with this patch (yet), mostly because of our dynamic page
table levels/folding. Still need to figure out what can be fixed in the arch
code and what would need to be changed in the test module. See below for some
generic comments/questions.
At least one real bug in the s390 code was already revealed by this, which
is very nice. In pmd/pud_bad(), we also check large pmds/puds for sanity
instead of reporting them as bad, which is apparently not what is expected.
[...]
> +/*
> + * Basic operations
> + *
> + * mkold(entry) = An old and not a young entry
> + * mkyoung(entry) = A young and not an old entry
> + * mkdirty(entry) = A dirty and not a clean entry
> + * mkclean(entry) = A clean and not a dirty entry
> + * mkwrite(entry) = A write and not a write protected entry
> + * wrprotect(entry) = A write protected and not a write entry
> + * pxx_bad(entry) = A mapped and non-table entry
> + * pxx_same(entry1, entry2) = Both entries hold the exact same value
> + */
> +#define VADDR_TEST (PGDIR_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)
Why is P4D_SIZE missing in the VADDR_TEST calculation?
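Presumably it would need to be something like:

	#define VADDR_TEST	(PGDIR_SIZE + P4D_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)

so that the p4d level also ends up with a non-zero index.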
[...]
> +
> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
> +static void pud_clear_tests(pud_t *pudp)
> +{
> + memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
> + pud_clear(pudp);
> + WARN_ON(!pud_none(READ_ONCE(*pudp)));
> +}
For pgd/p4d/pud_clear(), we only clear if the page table level is present
and not folded. The memset() here overwrites the table type bits, so
pud_clear() will not clear anything on s390 and the pud_none() check will
fail.
Would it be possible to OR a (larger) random value into the table, so that
the lower 12 bits would be preserved?
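E.g. something like this sketch, with a hypothetical RANDOM_ORVALUE whose
low 12 bits are all zero:

	pud_t pud = READ_ONCE(*pudp);

	pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
	WRITE_ONCE(*pudp, pud);
	pud_clear(pudp);
	WARN_ON(!pud_none(READ_ONCE(*pudp)));

That would scramble the entry while keeping the s390 table type bits intact.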
> +
> +static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
> +{
> + /*
> + * This entry points to next level page table page.
> + * Hence this must not qualify as pud_bad().
> + */
> + pmd_clear(pmdp);
> + pud_clear(pudp);
> + pud_populate(mm, pudp, pmdp);
> + WARN_ON(pud_bad(READ_ONCE(*pudp)));
> +}
This will populate the pud with a pmd pointer that does not point to the
beginning of the pmd table, but to the second entry (because of how
VADDR_TEST is constructed). This will result in a failing pud_bad() check
on s390. Not sure why/how it works on other archs, but would it be possible
to align pmdp down to the beginning of the pmd table (and similar for the
other pxd_populate_tests)?
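A sketch, assuming the pmd table occupies a single page (which is notably
not the case on s390, where these tables span multiple pages):

	pmd_t *aligned_pmdp = (pmd_t *)((unsigned long)pmdp & PAGE_MASK);

	pud_populate(mm, pudp, aligned_pmdp);

Passing saved_pmdp instead, as also suggested elsewhere in this thread,
would avoid that assumption entirely.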
[...]
> +
> + p4d_free(mm, saved_p4dp);
> + pud_free(mm, saved_pudp);
> + pmd_free(mm, saved_pmdp);
> + pte_free(mm, (pgtable_t) virt_to_page(saved_ptep));
pgtable_t is arch-specific, and on s390 it is not a struct page pointer,
but a pte pointer. So this will go wrong, also on all other archs (if any)
where pgtable_t is not struct page.
Would it be possible to use pte_free_kernel() instead, and just pass
saved_ptep directly?
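I.e. simply:

	pte_free_kernel(mm, saved_ptep);

which takes a pte_t * directly and avoids the pgtable_t cast altogether.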
Regards,
Gerald
On 9/3/19 1:01 AM, Anshuman Khandual wrote:
> This adds a test module which will validate architecture page table helpers
> and accessors regarding compliance with generic MM semantics expectations.
> This will help various architectures in validating changes to the existing
> page table helpers or addition of new ones.
This looks really cool. The "only" complication on x86 is the large
number of compile and runtime options that we have. When this gets
merged, it would be really nice to make sure that the 0day guys have
good coverage of all the configurations.
I'm not _quite_ sure what kind of bugs it will catch on x86 and I
suspect it'll have more value for the other architectures, but it seems
harmless enough.
On 09/04/2019 07:49 PM, Kirill A. Shutemov wrote:
> On Tue, Sep 03, 2019 at 01:31:46PM +0530, Anshuman Khandual wrote:
>> This adds a test module which will validate architecture page table helpers
>> and accessors regarding compliance with generic MM semantics expectations.
>> This will help various architectures in validating changes to the existing
>> page table helpers or addition of new ones.
>>
>> Test page table and memory pages creating it's entries at various level are
>> all allocated from system memory with required alignments. If memory pages
>> with required size and alignment could not be allocated, then all depending
>> individual tests are skipped.
>
> See my comments below.
>
>>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Vlastimil Babka <[email protected]>
>> Cc: Greg Kroah-Hartman <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Mike Rapoport <[email protected]>
>> Cc: Jason Gunthorpe <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Michal Hocko <[email protected]>
>> Cc: Mark Rutland <[email protected]>
>> Cc: Mark Brown <[email protected]>
>> Cc: Steven Price <[email protected]>
>> Cc: Ard Biesheuvel <[email protected]>
>> Cc: Masahiro Yamada <[email protected]>
>> Cc: Kees Cook <[email protected]>
>> Cc: Tetsuo Handa <[email protected]>
>> Cc: Matthew Wilcox <[email protected]>
>> Cc: Sri Krishna chowdary <[email protected]>
>> Cc: Dave Hansen <[email protected]>
>> Cc: Russell King - ARM Linux <[email protected]>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: Paul Mackerras <[email protected]>
>> Cc: Martin Schwidefsky <[email protected]>
>> Cc: Heiko Carstens <[email protected]>
>> Cc: "David S. Miller" <[email protected]>
>> Cc: Vineet Gupta <[email protected]>
>> Cc: James Hogan <[email protected]>
>> Cc: Paul Burton <[email protected]>
>> Cc: Ralf Baechle <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>>
>> Suggested-by: Catalin Marinas <[email protected]>
>> Signed-off-by: Anshuman Khandual <[email protected]>
>> ---
>> mm/Kconfig.debug | 14 ++
>> mm/Makefile | 1 +
>> mm/arch_pgtable_test.c | 425 +++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 440 insertions(+)
>> create mode 100644 mm/arch_pgtable_test.c
>>
>> diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
>> index 327b3ebf23bf..ce9c397f7b07 100644
>> --- a/mm/Kconfig.debug
>> +++ b/mm/Kconfig.debug
>> @@ -117,3 +117,17 @@ config DEBUG_RODATA_TEST
>> depends on STRICT_KERNEL_RWX
>> ---help---
>> This option enables a testcase for the setting rodata read-only.
>> +
>> +config DEBUG_ARCH_PGTABLE_TEST
>> + bool "Test arch page table helpers for semantics compliance"
>> + depends on MMU
>> + depends on DEBUG_KERNEL
>> + help
>> + This options provides a kernel module which can be used to test
>> + architecture page table helper functions on various platform in
>> + verifying if they comply with expected generic MM semantics. This
>> + will help architectures code in making sure that any changes or
>> + new additions of these helpers will still conform to generic MM
>> + expected semantics.
>> +
>> + If unsure, say N.
>> diff --git a/mm/Makefile b/mm/Makefile
>> index d996846697ef..bb572c5aa8c5 100644
>> --- a/mm/Makefile
>> +++ b/mm/Makefile
>> @@ -86,6 +86,7 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
>> obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
>> obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
>> obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o
>> +obj-$(CONFIG_DEBUG_ARCH_PGTABLE_TEST) += arch_pgtable_test.o
>> obj-$(CONFIG_PAGE_OWNER) += page_owner.o
>> obj-$(CONFIG_CLEANCACHE) += cleancache.o
>> obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
>> diff --git a/mm/arch_pgtable_test.c b/mm/arch_pgtable_test.c
>> new file mode 100644
>> index 000000000000..f15be8a73723
>> --- /dev/null
>> +++ b/mm/arch_pgtable_test.c
>> @@ -0,0 +1,425 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * This kernel module validates architecture page table helpers &
>> + * accessors and helps in verifying their continued compliance with
>> + * generic MM semantics.
>> + *
>> + * Copyright (C) 2019 ARM Ltd.
>> + *
>> + * Author: Anshuman Khandual <[email protected]>
>> + */
>> +#define pr_fmt(fmt) "arch_pgtable_test: %s " fmt, __func__
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/hugetlb.h>
>> +#include <linux/mm.h>
>> +#include <linux/mman.h>
>> +#include <linux/mm_types.h>
>> +#include <linux/module.h>
>> +#include <linux/printk.h>
>> +#include <linux/swap.h>
>> +#include <linux/swapops.h>
>> +#include <linux/pfn_t.h>
>> +#include <linux/gfp.h>
>> +#include <linux/spinlock.h>
>> +#include <linux/sched/mm.h>
>> +#include <asm/pgalloc.h>
>> +#include <asm/pgtable.h>
>> +
>> +/*
>> + * Basic operations
>> + *
>> + * mkold(entry) = An old and not a young entry
>> + * mkyoung(entry) = A young and not an old entry
>> + * mkdirty(entry) = A dirty and not a clean entry
>> + * mkclean(entry) = A clean and not a dirty entry
>> + * mkwrite(entry) = A write and not a write protected entry
>> + * wrprotect(entry) = A write protected and not a write entry
>> + * pxx_bad(entry) = A mapped and non-table entry
>> + * pxx_same(entry1, entry2) = Both entries hold the exact same value
>> + */
>> +#define VADDR_TEST (PGDIR_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)
>
> What is special about this address? How do you know if it is not occupied
> yet?
We are creating the page table from scratch, after allocating an mm_struct,
for a given random virtual address 'VADDR_TEST'. Hence nothing is occupied
just yet. There is nothing special about this address, other than that it
tries to ensure the page table entries are created with some offset from the
beginning of the respective page table page at all levels. The idea is to
have a more representative form of page table structure for the test.
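For example, with VADDR_TEST as defined (sketch, assuming no folded levels):

	pgdp = pgd_offset(mm, VADDR_TEST);	/* entry 1 in the pgd page */
	pudp = pud_offset(p4dp, VADDR_TEST);	/* entry 1 in the pud page */
	pmdp = pmd_offset(pudp, VADDR_TEST);	/* entry 1 in the pmd page */
	ptep = pte_offset_map(pmdp, VADDR_TEST);	/* entry 1 in the pte page */

(only the p4d level, lacking a P4D_SIZE component, would stay at entry 0)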
>
>> +#define VMA_TEST_FLAGS (VM_READ|VM_WRITE|VM_EXEC)
>> +#define RANDOM_NZVALUE (0xbe)
>> +
>> +static bool pud_aligned;
>> +static bool pmd_aligned;
>> +
>> +extern struct mm_struct *mm_alloc(void);
>> +
>> +static void pte_basic_tests(struct page *page, pgprot_t prot)
>> +{
>> + pte_t pte = mk_pte(page, prot);
>> +
>> + WARN_ON(!pte_same(pte, pte));
>> + WARN_ON(!pte_young(pte_mkyoung(pte)));
>> + WARN_ON(!pte_dirty(pte_mkdirty(pte)));
>> + WARN_ON(!pte_write(pte_mkwrite(pte)));
>> + WARN_ON(pte_young(pte_mkold(pte)));
>> + WARN_ON(pte_dirty(pte_mkclean(pte)));
>> + WARN_ON(pte_write(pte_wrprotect(pte)));
>> +}
>> +
>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE
>> +static void pmd_basic_tests(struct page *page, pgprot_t prot)
>> +{
>> + pmd_t pmd;
>> +
>> + /*
>> + * Memory block here must be PMD_SIZE aligned. Abort this
>> + * test in case we could not allocate such a memory block.
>> + */
>> + if (!pmd_aligned) {
>> + pr_warn("Could not proceed with PMD tests\n");
>> + return;
>> + }
>> +
>> + pmd = mk_pmd(page, prot);
>> + WARN_ON(!pmd_same(pmd, pmd));
>> + WARN_ON(!pmd_young(pmd_mkyoung(pmd)));
>> + WARN_ON(!pmd_dirty(pmd_mkdirty(pmd)));
>> + WARN_ON(!pmd_write(pmd_mkwrite(pmd)));
>> + WARN_ON(pmd_young(pmd_mkold(pmd)));
>> + WARN_ON(pmd_dirty(pmd_mkclean(pmd)));
>> + WARN_ON(pmd_write(pmd_wrprotect(pmd)));
>> + /*
>> + * A huge page does not point to next level page table
>> + * entry. Hence this must qualify as pmd_bad().
>> + */
>> + WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
>> +}
>> +#else
>> +static void pmd_basic_tests(struct page *page, pgprot_t prot) { }
>> +#endif
>> +
>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>> +static void pud_basic_tests(struct page *page, pgprot_t prot)
>> +{
>> + pud_t pud;
>> +
>> + /*
>> + * Memory block here must be PUD_SIZE aligned. Abort this
>> + * test in case we could not allocate such a memory block.
>> + */
>> + if (!pud_aligned) {
>> + pr_warn("Could not proceed with PUD tests\n");
>> + return;
>> + }
>> +
>> + pud = pfn_pud(page_to_pfn(page), prot);
>> + WARN_ON(!pud_same(pud, pud));
>> + WARN_ON(!pud_young(pud_mkyoung(pud)));
>> + WARN_ON(!pud_write(pud_mkwrite(pud)));
>> + WARN_ON(pud_write(pud_wrprotect(pud)));
>> + WARN_ON(pud_young(pud_mkold(pud)));
>> +
>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>> + /*
>> + * A huge page does not point to next level page table
>> + * entry. Hence this must qualify as pud_bad().
>> + */
>> + WARN_ON(!pud_bad(pud_mkhuge(pud)));
>> +#endif
>> +}
>> +#else
>> +static void pud_basic_tests(struct page *page, pgprot_t prot) { }
>> +#endif
>> +
>> +static void p4d_basic_tests(struct page *page, pgprot_t prot)
>> +{
>> + p4d_t p4d;
>> +
>> + memset(&p4d, RANDOM_NZVALUE, sizeof(p4d_t));
>> + WARN_ON(!p4d_same(p4d, p4d));
>> +}
>> +
>> +static void pgd_basic_tests(struct page *page, pgprot_t prot)
>> +{
>> + pgd_t pgd;
>> +
>> + memset(&pgd, RANDOM_NZVALUE, sizeof(pgd_t));
>> + WARN_ON(!pgd_same(pgd, pgd));
>> +}
>> +
>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>> +static void pud_clear_tests(pud_t *pudp)
>> +{
>> + memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
>> + pud_clear(pudp);
>> + WARN_ON(!pud_none(READ_ONCE(*pudp)));
>> +}
>> +
>> +static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
>> +{
>> + /*
>> + * This entry points to next level page table page.
>> + * Hence this must not qualify as pud_bad().
>> + */
>> + pmd_clear(pmdp);
>> + pud_clear(pudp);
>> + pud_populate(mm, pudp, pmdp);
>> + WARN_ON(pud_bad(READ_ONCE(*pudp)));
>> +}
>> +#else
>> +static void pud_clear_tests(pud_t *pudp) { }
>> +static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
>> +{
>> +}
>> +#endif
>> +
>> +#if !defined(__PAGETABLE_PUD_FOLDED) && !defined(__ARCH_HAS_5LEVEL_HACK)
>> +static void p4d_clear_tests(p4d_t *p4dp)
>> +{
>> + memset(p4dp, RANDOM_NZVALUE, sizeof(p4d_t));
>> + p4d_clear(p4dp);
>> + WARN_ON(!p4d_none(READ_ONCE(*p4dp)));
>> +}
>> +
>> +static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
>> +{
>> + /*
>> + * This entry points to next level page table page.
>> + * Hence this must not qualify as p4d_bad().
>> + */
>> + pud_clear(pudp);
>> + p4d_clear(p4dp);
>> + p4d_populate(mm, p4dp, pudp);
>> + WARN_ON(p4d_bad(READ_ONCE(*p4dp)));
>> +}
>> +#else
>> +static void p4d_clear_tests(p4d_t *p4dp) { }
>> +static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
>> +{
>> +}
>> +#endif
>> +
>> +#ifndef __PAGETABLE_P4D_FOLDED
>> +static void pgd_clear_tests(pgd_t *pgdp)
>> +{
>> + memset(pgdp, RANDOM_NZVALUE, sizeof(pgd_t));
>> + pgd_clear(pgdp);
>> + WARN_ON(!pgd_none(READ_ONCE(*pgdp)));
>> +}
>> +
>> +static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
>> +{
>> + /*
>> + * This entry points to next level page table page.
>> + * Hence this must not qualify as pgd_bad().
>> + */
>> + p4d_clear(p4dp);
>> + pgd_clear(pgdp);
>> + pgd_populate(mm, pgdp, p4dp);
>> + WARN_ON(pgd_bad(READ_ONCE(*pgdp)));
>> +}
>> +#else
>> +static void pgd_clear_tests(pgd_t *pgdp) { }
>> +static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
>> +{
>> +}
>> +#endif
>
> This will not work if p4d is folded at runtime. Like for x86-64 and s390.
>
> Here's the fixup. It should work for both x86-64 and s390, but I only
> tested on x86-64:
>
> diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
> index 52e5f5f2240d..b882792a3999 100644
> --- a/arch/x86/include/asm/pgtable_64_types.h
> +++ b/arch/x86/include/asm/pgtable_64_types.h
> @@ -40,6 +40,8 @@ static inline bool pgtable_l5_enabled(void)
> #define pgtable_l5_enabled() 0
> #endif /* CONFIG_X86_5LEVEL */
>
> +#define mm_p4d_folded(mm) (!pgtable_l5_enabled())
> +
> extern unsigned int pgdir_shift;
> extern unsigned int ptrs_per_p4d;
>
> diff --git a/mm/arch_pgtable_test.c b/mm/arch_pgtable_test.c
> index f15be8a73723..206fe3334a28 100644
> --- a/mm/arch_pgtable_test.c
> +++ b/mm/arch_pgtable_test.c
> @@ -193,9 +193,11 @@ static void p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
> }
> #endif
>
> -#ifndef __PAGETABLE_P4D_FOLDED
> static void pgd_clear_tests(pgd_t *pgdp)
> {
> + if (mm_p4d_folded(mm))
> + return;
> +
> memset(pgdp, RANDOM_NZVALUE, sizeof(pgd_t));
> pgd_clear(pgdp);
> WARN_ON(!pgd_none(READ_ONCE(*pgdp)));
> @@ -203,6 +205,9 @@ static void pgd_clear_tests(pgd_t *pgdp)
>
> static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
> {
> + if (mm_p4d_folded(mm))
> + return;
> +
> /*
> * This entry points to next level page table page.
> * Hence this must not qualify as pgd_bad().
> @@ -212,12 +217,6 @@ static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
> pgd_populate(mm, pgdp, p4dp);
> WARN_ON(pgd_bad(READ_ONCE(*pgdp)));
> }
> -#else
> -static void pgd_clear_tests(pgd_t *pgdp) { }
> -static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
> -{
> -}
> -#endif
This makes sense for the runtime cases but there is a problem here.
On arm64, pgd_populate(), which takes (pud_t *) as its last argument instead
of (p4d_t *), will fail to build on certain configurations when not wrapped
in !__PAGETABLE_P4D_FOLDED.
./arch/arm64/include/asm/pgalloc.h:81:75: note:
expected ‘pud_t *’ {aka ‘struct <anonymous> *’}
but argument is of type ‘pgd_t *’ {aka ‘struct <anonymous> *’}
static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, pud_t *pudp)
~~~~~~~^~~~
Wondering if this is something to be fixed on arm64 or a more general
problem. Will look into this further.
>
> static void pte_clear_tests(pte_t *ptep)
> {
>
>> +
>> +static void pte_clear_tests(pte_t *ptep)
>> +{
>> + memset(ptep, RANDOM_NZVALUE, sizeof(pte_t));
>> + pte_clear(NULL, 0, ptep);
>> + WARN_ON(!pte_none(READ_ONCE(*ptep)));
>> +}
>> +
>> +static void pmd_clear_tests(pmd_t *pmdp)
>> +{
>> + memset(pmdp, RANDOM_NZVALUE, sizeof(pmd_t));
>> + pmd_clear(pmdp);
>> + WARN_ON(!pmd_none(READ_ONCE(*pmdp)));
>> +}
>> +
>> +static void pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp,
>> + pgtable_t pgtable)
>> +{
>> + /*
>> + * This entry points to next level page table page.
>> + * Hence this must not qualify as pmd_bad().
>> + */
>> + pmd_clear(pmdp);
>> + pmd_populate(mm, pmdp, pgtable);
>> + WARN_ON(pmd_bad(READ_ONCE(*pmdp)));
>> +}
>> +
>> +static bool pfn_range_valid(struct zone *z, unsigned long start_pfn,
>> + unsigned long nr_pages)
>> +{
>> + unsigned long i, end_pfn = start_pfn + nr_pages;
>> + struct page *page;
>> +
>> + for (i = start_pfn; i < end_pfn; i++) {
>> + if (!pfn_valid(i))
>> + return false;
>> +
>> + page = pfn_to_page(i);
>> +
>> + if (page_zone(page) != z)
>> + return false;
>> +
>> + if (PageReserved(page))
>> + return false;
>> +
>> + if (page_count(page) > 0)
>> + return false;
>> +
>> + if (PageHuge(page))
>> + return false;
>> + }
>> + return true;
>> +}
>> +
>> +static struct page *alloc_gigantic_page(nodemask_t *nodemask,
>> + int nid, gfp_t gfp_mask, int order)
>> +{
>> + struct zonelist *zonelist;
>> + struct zone *zone;
>> + struct zoneref *z;
>> + enum zone_type zonesel;
>> + unsigned long ret, pfn, flags, nr_pages;
>> +
>> + nr_pages = 1UL << order;
>> + zonesel = gfp_zone(gfp_mask);
>> + zonelist = node_zonelist(nid, gfp_mask);
>> + for_each_zone_zonelist_nodemask(zone, z, zonelist, zonesel, nodemask) {
>> + spin_lock_irqsave(&zone->lock, flags);
>> + pfn = ALIGN(zone->zone_start_pfn, nr_pages);
>> + while (zone_spans_pfn(zone, pfn + nr_pages - 1)) {
>> + if (pfn_range_valid(zone, pfn, nr_pages)) {
>> + spin_unlock_irqrestore(&zone->lock, flags);
>> + ret = alloc_contig_range(pfn, pfn + nr_pages,
>> + MIGRATE_MOVABLE,
>> + gfp_mask);
>> + if (!ret)
>> + return pfn_to_page(pfn);
>> + spin_lock_irqsave(&zone->lock, flags);
>> + }
>> + pfn += nr_pages;
>> + }
>> + spin_unlock_irqrestore(&zone->lock, flags);
>> + }
>> + return NULL;
>> +}
>> +
>> +static struct page *alloc_mapped_page(void)
>> +{
>> + gfp_t gfp_mask = GFP_KERNEL | __GFP_ZERO;
>> + struct page *page = NULL;
>> +
>> + page = alloc_gigantic_page(&node_states[N_MEMORY], first_memory_node,
>> + gfp_mask, get_order(PUD_SIZE));
>> + if (page) {
>> + pud_aligned = true;
>> + pmd_aligned = true;
>> + return page;
>> + }
>> +
>> + page = alloc_pages(gfp_mask, get_order(PMD_SIZE));
>> + if (page) {
>> + pmd_aligned = true;
>> + return page;
>> + }
>> + return alloc_page(gfp_mask);
>> +}
>> +
>> +static void free_mapped_page(struct page *page)
>> +{
>> + if (pud_aligned) {
>> + unsigned long pfn = page_to_pfn(page);
>> +
>> + free_contig_range(pfn, 1ULL << get_order(PUD_SIZE));
>> + return;
>> + }
>> +
>> + if (pmd_aligned) {
>> + int order = get_order(PMD_SIZE);
>> +
>> + free_pages((unsigned long)page_address(page), order);
>> + return;
>> + }
>> + free_page((unsigned long)page_address(page));
>> +}
>> +
>> +static int __init arch_pgtable_tests_init(void)
>> +{
>> + struct mm_struct *mm;
>> + struct page *page;
>> + pgd_t *pgdp;
>> + p4d_t *p4dp, *saved_p4dp;
>> + pud_t *pudp, *saved_pudp;
>> + pmd_t *pmdp, *saved_pmdp;
>> + pte_t *ptep, *saved_ptep;
>> + pgprot_t prot = vm_get_page_prot(VMA_TEST_FLAGS);
>> + unsigned long vaddr = VADDR_TEST;
>> +
>> + mm = mm_alloc();
>> + if (!mm) {
>> + pr_err("mm_struct allocation failed\n");
>> + return 1;
>> + }
>> +
>> + page = alloc_mapped_page();
>> + if (!page) {
>> + pr_err("memory allocation failed\n");
>> + return 1;
>> + }
>> +
>> + pgdp = pgd_offset(mm, vaddr);
>> + p4dp = p4d_alloc(mm, pgdp, vaddr);
>> + pudp = pud_alloc(mm, p4dp, vaddr);
>> + pmdp = pmd_alloc(mm, pudp, vaddr);
>> + ptep = pte_alloc_map(mm, pmdp, vaddr);
>> +
>> + /*
>> + * Save all the page table page addresses as the page table
>> + * entries will be used for testing with random or garbage
>> + * values. These saved addresses will be used for freeing
>> + * page table pages.
>> + */
>> + saved_p4dp = p4d_offset(pgdp, 0UL);
>> + saved_pudp = pud_offset(p4dp, 0UL);
>> + saved_pmdp = pmd_offset(pudp, 0UL);
>> + saved_ptep = pte_offset_map(pmdp, 0UL);
>> +
>> + pte_basic_tests(page, prot);
>> + pmd_basic_tests(page, prot);
>> + pud_basic_tests(page, prot);
>> + p4d_basic_tests(page, prot);
>> + pgd_basic_tests(page, prot);
>> +
>> + pte_clear_tests(ptep);
>> + pmd_clear_tests(pmdp);
>> + pud_clear_tests(pudp);
>> + p4d_clear_tests(p4dp);
>> + pgd_clear_tests(pgdp);
>> +
>> + pmd_populate_tests(mm, pmdp, (pgtable_t) page);
>
> This is not correct for architectures that defines pgtable_t as pte_t
> pointer, not struct page pointer.
Right, a grep on the source confirms that.
These platforms define pgtable_t as struct page *
arch/alpha/include/asm/page.h:typedef struct page *pgtable_t;
arch/arm/include/asm/page.h:typedef struct page *pgtable_t;
arch/arm64/include/asm/page.h:typedef struct page *pgtable_t;
arch/csky/include/asm/page.h:typedef struct page *pgtable_t;
arch/hexagon/include/asm/page.h:typedef struct page *pgtable_t;
arch/ia64/include/asm/page.h: typedef struct page *pgtable_t;
arch/ia64/include/asm/page.h: typedef struct page *pgtable_t;
arch/m68k/include/asm/page.h:typedef struct page *pgtable_t;
arch/microblaze/include/asm/page.h:typedef struct page *pgtable_t;
arch/mips/include/asm/page.h:typedef struct page *pgtable_t;
arch/nds32/include/asm/page.h:typedef struct page *pgtable_t;
arch/nios2/include/asm/page.h:typedef struct page *pgtable_t;
arch/openrisc/include/asm/page.h:typedef struct page *pgtable_t;
arch/parisc/include/asm/page.h:typedef struct page *pgtable_t;
arch/riscv/include/asm/page.h:typedef struct page *pgtable_t;
arch/sh/include/asm/page.h:typedef struct page *pgtable_t;
arch/sparc/include/asm/page_32.h:typedef struct page *pgtable_t;
arch/um/include/asm/page.h:typedef struct page *pgtable_t;
arch/unicore32/include/asm/page.h:typedef struct page *pgtable_t;
arch/x86/include/asm/pgtable_types.h:typedef struct page *pgtable_t;
arch/xtensa/include/asm/page.h:typedef struct page *pgtable_t;
These platforms define pgtable_t as pte_t *
arch/arc/include/asm/page.h:typedef pte_t * pgtable_t;
arch/powerpc/include/asm/mmu.h:typedef pte_t *pgtable_t;
arch/s390/include/asm/page.h:typedef pte_t *pgtable_t;
arch/sparc/include/asm/page_64.h:typedef pte_t *pgtable_t;
Should we have two pmd_populate_tests() definitions with different argument
types (struct page pointer or pte_t pointer) and then call the appropriate
one after detecting the given platform?
>
>> + pud_populate_tests(mm, pudp, pmdp);
>> + p4d_populate_tests(mm, p4dp, pudp);
>> + pgd_populate_tests(mm, pgdp, p4dp);
>
> This is wrong. All p?dp points to the second entry in page table entry.
> This is not valid pointer for page table and triggers p?d_bad() on x86.
Yeah, these are second entries because of the way we create the page table.
But I guess it's applicable only to the second argument in all these cases,
because the first argument can be any valid entry at the previous page table
level.
>
> Use saved_p?dp instead.
It works on x86. Will test on arm64 and update.
On Thu, Sep 05, 2019 at 01:48:27PM +0530, Anshuman Khandual wrote:
> >> +#define VADDR_TEST (PGDIR_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)
> >
> > What is special about this address? How do you know if it is not occupied
> > yet?
>
> We are creating the page table from scratch after allocating an mm_struct
> for a given random virtual address 'VADDR_TEST'. Hence nothing is occupied
> just yet. There is nothing special about this address, just that it tries
> to ensure the page table entries are being created with some offset from
> beginning of respective page table page at all levels ? The idea is to
> have a more representative form of page table structure for test.
Why is P4D_SIZE missing?
Are you sure it will not land in the kernel address space on any arch?
I think a more robust way to deal with this would be to use
get_unmapped_area() instead of a fixed address.
> This makes sense for runtime cases but there is a problem here.
>
> On arm64, pgd_populate() which takes (pud_t *) as last argument instead of
> (p4d_t *) will fail to build when not wrapped in !__PAGETABLE_P4D_FOLDED
> on certain configurations.
>
> ./arch/arm64/include/asm/pgalloc.h:81:75: note:
> expected ‘pud_t *’ {aka ‘struct <anonymous> *’}
> but argument is of type ‘pgd_t *’ {aka ‘struct <anonymous> *’}
> static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, pud_t *pudp)
> ~~~~~~~^~~~
> Wondering if this is something to be fixed on arm64 or a more general
> problem. Will look into this further.
I think you need to wrap this in #ifndef __ARCH_HAS_5LEVEL_HACK.
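For illustration, a rough sketch combining that guard with the
mm_p4d_folded() fixup from earlier in the thread (untested):

#ifndef __ARCH_HAS_5LEVEL_HACK
static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
{
	if (mm_p4d_folded(mm))
		return;

	/*
	 * This entry points to next level page table page.
	 * Hence this must not qualify as pgd_bad().
	 */
	p4d_clear(p4dp);
	pgd_clear(pgdp);
	pgd_populate(mm, pgdp, p4dp);
	WARN_ON(pgd_bad(READ_ONCE(*pgdp)));
}
#else
static void pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp) { }
#endif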
> >> + pmd_populate_tests(mm, pmdp, (pgtable_t) page);
> >
> > This is not correct for architectures that defines pgtable_t as pte_t
> > pointer, not struct page pointer.
>
> Right, a grep on the source confirms that.
>
> These platforms define pgtable_t as struct page *
>
> arch/alpha/include/asm/page.h:typedef struct page *pgtable_t;
> arch/arm/include/asm/page.h:typedef struct page *pgtable_t;
> arch/arm64/include/asm/page.h:typedef struct page *pgtable_t;
> arch/csky/include/asm/page.h:typedef struct page *pgtable_t;
> arch/hexagon/include/asm/page.h:typedef struct page *pgtable_t;
> arch/ia64/include/asm/page.h: typedef struct page *pgtable_t;
> arch/ia64/include/asm/page.h: typedef struct page *pgtable_t;
> arch/m68k/include/asm/page.h:typedef struct page *pgtable_t;
> arch/microblaze/include/asm/page.h:typedef struct page *pgtable_t;
> arch/mips/include/asm/page.h:typedef struct page *pgtable_t;
> arch/nds32/include/asm/page.h:typedef struct page *pgtable_t;
> arch/nios2/include/asm/page.h:typedef struct page *pgtable_t;
> arch/openrisc/include/asm/page.h:typedef struct page *pgtable_t;
> arch/parisc/include/asm/page.h:typedef struct page *pgtable_t;
> arch/riscv/include/asm/page.h:typedef struct page *pgtable_t;
> arch/sh/include/asm/page.h:typedef struct page *pgtable_t;
> arch/sparc/include/asm/page_32.h:typedef struct page *pgtable_t;
> arch/um/include/asm/page.h:typedef struct page *pgtable_t;
> arch/unicore32/include/asm/page.h:typedef struct page *pgtable_t;
> arch/x86/include/asm/pgtable_types.h:typedef struct page *pgtable_t;
> arch/xtensa/include/asm/page.h:typedef struct page *pgtable_t;
>
> These platforms define pgtable_t as pte_t *
>
> arch/arc/include/asm/page.h:typedef pte_t * pgtable_t;
> arch/powerpc/include/asm/mmu.h:typedef pte_t *pgtable_t;
> arch/s390/include/asm/page.h:typedef pte_t *pgtable_t;
> arch/sparc/include/asm/page_64.h:typedef pte_t *pgtable_t;
>
> Should we need have two pmd_populate_tests() definitions with
> different arguments (struct page pointer or pte_t pointer) and then
> call either one after detecting the given platform ?
Use pte_alloc_one() instead of alloc_mapped_page() to allocate the page
table.
> >> + pud_populate_tests(mm, pudp, pmdp);
> >> + p4d_populate_tests(mm, p4dp, pudp);
> >> + pgd_populate_tests(mm, pgdp, p4dp);
> >
> > This is wrong. All p?dp points to the second entry in page table entry.
> > This is not valid pointer for page table and triggers p?d_bad() on x86.
>
> Yeah these are second entries because of the way we create the page table.
> But I guess its applicable only to the second argument in all these above
> cases because the first argument can be any valid entry on previous page
> table level.
Yes:
@@ -397,9 +396,9 @@ static int __init arch_pgtable_tests_init(void)
pgd_clear_tests(pgdp);
pmd_populate_tests(mm, pmdp, (pgtable_t) page);
- pud_populate_tests(mm, pudp, pmdp);
- p4d_populate_tests(mm, p4dp, pudp);
- pgd_populate_tests(mm, pgdp, p4dp);
+ pud_populate_tests(mm, pudp, saved_pmdp);
+ p4d_populate_tests(mm, p4dp, saved_pudp);
+ pgd_populate_tests(mm, pgdp, saved_p4dp);
p4d_free(mm, saved_p4dp);
pud_free(mm, saved_pudp);
--
Kirill A. Shutemov
On 09/05/2019 01:46 AM, Gerald Schaefer wrote:
> On Tue, 3 Sep 2019 13:31:46 +0530
> Anshuman Khandual <[email protected]> wrote:
>
>> This adds a test module which will validate architecture page table helpers
>> and accessors regarding compliance with generic MM semantics expectations.
>> This will help various architectures in validating changes to the existing
>> page table helpers or addition of new ones.
>>
>> The test page table and the memory pages used for creating its entries at
>> various levels are all allocated from system memory with the required
>> alignments. If memory pages with the required size and alignment could not
>> be allocated, then all dependent individual tests are skipped.
>
> This looks very useful, thanks. Of course, s390 is quite special and does
> not work nicely with this patch (yet), mostly because of our dynamic page
> table levels/folding. Still need to figure out what can be fixed in the arch
Hmm.
> code and what would need to be changed in the test module. See below for some
> generic comments/questions.
Sure.
>
> At least one real bug in the s390 code was already revealed by this, which
> is very nice. In pmd/pud_bad(), we also check large pmds/puds for sanity,
> instead of reporting them as bad, which is apparently not what is expected.
Hmm, so it has already started being useful :)
>
> [...]
>> +/*
>> + * Basic operations
>> + *
>> + * mkold(entry) = An old and not a young entry
>> + * mkyoung(entry) = A young and not an old entry
>> + * mkdirty(entry) = A dirty and not a clean entry
>> + * mkclean(entry) = A clean and not a dirty entry
>> + * mkwrite(entry) = A write and not a write protected entry
>> + * wrprotect(entry) = A write protected and not a write entry
>> + * pxx_bad(entry) = A mapped and non-table entry
>> + * pxx_same(entry1, entry2) = Both entries hold the exact same value
>> + */
>> +#define VADDR_TEST (PGDIR_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)
>
> Why is P4D_SIZE missing in the VADDR_TEST calculation?
This was a random possible virtual address, meant to generate a representative
page table structure for the test. As there is a default (PGDIR_SIZE) for
P4D_SIZE on platforms which really do not have a P4D level, it should be okay
to add P4D_SIZE to the above calculation.
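i.e. presumably:

#define VADDR_TEST	(PGDIR_SIZE + P4D_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)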
>
> [...]
>> +
>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>> +static void pud_clear_tests(pud_t *pudp)
>> +{
>> + memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
>> + pud_clear(pudp);
>> + WARN_ON(!pud_none(READ_ONCE(*pudp)));
>> +}
>
> For pgd/p4d/pud_clear(), we only clear if the page table level is present
> and not folded. The memset() here overwrites the table type bits, so
> pud_clear() will not clear anything on s390 and the pud_none() check will
> fail.
> Would it be possible to OR a (larger) random value into the table, so that
> the lower 12 bits would be preserved?
So the suggestion is that instead of doing memset() on the entry with
RANDOM_NZVALUE, it should OR in a large random value while preserving the
lower 12 bits. Hmm, this should still do the trick for other platforms, which
just need a non-zero value. So on s390, the lower 12 bits of the page table
entry already hold a valid value when entering this function, which would
make sure that pud_clear() really does clear the entry?
>
>> +
>> +static void pud_populate_tests(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
>> +{
>> + /*
>> + * This entry points to next level page table page.
>> + * Hence this must not qualify as pud_bad().
>> + */
>> + pmd_clear(pmdp);
>> + pud_clear(pudp);
>> + pud_populate(mm, pudp, pmdp);
>> + WARN_ON(pud_bad(READ_ONCE(*pudp)));
>> +}
>
> This will populate the pud with a pmd pointer that does not point to the
> beginning of the pmd table, but to the second entry (because of how
> VADDR_TEST is constructed). This will result in failing pud_bad() check
> on s390. Not sure why/how it works on other archs, but would it be possible
> to align pmdp down to the beginning of the pmd table (and similar for the
> other pxd_populate_tests)?
Right, that was a problem. Will fix it by using the saved entries used for
freeing the page table pages at the end, which always point to the beginning
of a page table page.
>
> [...]
>> +
>> + p4d_free(mm, saved_p4dp);
>> + pud_free(mm, saved_pudp);
>> + pmd_free(mm, saved_pmdp);
>> + pte_free(mm, (pgtable_t) virt_to_page(saved_ptep));
>
> pgtable_t is arch-specific, and on s390 it is not a struct page pointer,
> but a pte pointer. So this will go wrong, also on all other archs (if any)
> where pgtable_t is not struct page.
> Would it be possible to use pte_free_kernel() instead, and just pass
> saved_ptep directly?
It makes sense, will change.
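i.e. presumably just:

	pte_free_kernel(mm, saved_ptep);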
On Thu, 5 Sep 2019 14:48:14 +0530
Anshuman Khandual <[email protected]> wrote:
> > [...]
> >> +
> >> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
> >> +static void pud_clear_tests(pud_t *pudp)
> >> +{
> >> + memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
> >> + pud_clear(pudp);
> >> + WARN_ON(!pud_none(READ_ONCE(*pudp)));
> >> +}
> >
> > For pgd/p4d/pud_clear(), we only clear if the page table level is present
> > and not folded. The memset() here overwrites the table type bits, so
> > pud_clear() will not clear anything on s390 and the pud_none() check will
> > fail.
> > Would it be possible to OR a (larger) random value into the table, so that
> > the lower 12 bits would be preserved?
>
> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE,
> it should OR a large random value preserving lower 12 bits. Hmm, this should
> still do the trick for other platforms, they just need non zero value. So on
> s390, the lower 12 bits on the page table entry already has valid value while
> entering this function which would make sure that pud_clear() really does
> clear the entry ?
Yes, in theory the table entry on s390 would have the type set in the last
4 bits, so preserving those would be enough. If it does not conflict with
others, I would still suggest preserving all 12 bits since those would contain
arch-specific flags in general, just to be sure. For s390, the pte/pmd tests
would also work with the memset, but for consistency I think the same logic
should be used in all pxd_clear_tests.
However, there is another issue on s390 which will make this only work
for pud_clear_tests(), and not for the p4d/pgd_tests. The problem is that
mm_alloc() will only give you a 3-level page table initially on s390.
This means that pudp == p4dp == pgdp, and so the p4d/pgd_tests will
both see the pud level (of course this also affects other tests).
Not sure yet how to fix this, i.e. how to initialize/update the page table
to 5 levels. We can handle 5 level page tables, and it would be good if
all levels could be tested, but using mm_alloc() to establish the page
tables might not work on s390. One option could be to provide an arch-hook
or weak function to allocate/initialize the mm.
IIUC, the (dummy) mm is really only needed to provide an mm->pgd as starting
point, right?
Regards,
Gerald
On 09/05/2019 10:36 PM, Gerald Schaefer wrote:
> On Thu, 5 Sep 2019 14:48:14 +0530
> Anshuman Khandual <[email protected]> wrote:
>
>>> [...]
>>>> +
>>>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>>>> +static void pud_clear_tests(pud_t *pudp)
>>>> +{
>>>> + memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
>>>> + pud_clear(pudp);
>>>> + WARN_ON(!pud_none(READ_ONCE(*pudp)));
>>>> +}
>>>
>>> For pgd/p4d/pud_clear(), we only clear if the page table level is present
>>> and not folded. The memset() here overwrites the table type bits, so
>>> pud_clear() will not clear anything on s390 and the pud_none() check will
>>> fail.
>>> Would it be possible to OR a (larger) random value into the table, so that
>>> the lower 12 bits would be preserved?
>>
>> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE,
>> it should OR a large random value preserving lower 12 bits. Hmm, this should
>> still do the trick for other platforms, they just need non zero value. So on
>> s390, the lower 12 bits on the page table entry already has valid value while
>> entering this function which would make sure that pud_clear() really does
>> clear the entry ?
>
> Yes, in theory the table entry on s390 would have the type set in the last
> 4 bits, so preserving those would be enough. If it does not conflict with
> others, I would still suggest preserving all 12 bits since those would contain
> arch-specific flags in general, just to be sure. For s390, the pte/pmd tests
> would also work with the memset, but for consistency I think the same logic
> should be used in all pxd_clear_tests.
Makes sense, but there is a small challenge with this. Modifying individual
bits of a given page table entry from generic code like this test case is a
bit tricky. That is because there are not enough helpers to create entries
with an absolute value. This would have been easier if all the platforms
provided functions like __pxx(), which is not the case now. Otherwise
something like this would have worked:
pud_t pud = READ_ONCE(*pudp);
pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0))
WRITE_ONCE(*pudp, pud);
But __pud() will fail to build on many platforms.
The other alternative would be to make sure memset() happens on all the other
bytes except those holding the lower 12 bits, which will depend on endianness.
If s390 has a fixed endianness, we can still use either of the following,
which will hold good for the others as well.

memset(pudp, RANDOM_NZVALUE, sizeof(pud_t) - 3);

OR

memset((char *)pudp + 3, RANDOM_NZVALUE, sizeof(pud_t) - 3);
>
> However, there is another issue on s390 which will make this only work
> for pud_clear_tests(), and not for the p4d/pgd_tests. The problem is that
> mm_alloc() will only give you a 3-level page table initially on s390.
> This means that pudp == p4dp == pgdp, and so the p4d/pgd_tests will
> both see the pud level (of course this also affects other tests).
Got it.
>
> Not sure yet how to fix this, i.e. how to initialize/update the page table
> to 5 levels. We can handle 5 level page tables, and it would be good if
> all levels could be tested, but using mm_alloc() to establish the page
> tables might not work on s390. One option could be to provide an arch-hook
> or weak function to allocate/initialize the mm.
Sure, got it. Though I plan to add some arch-specific tests or an init
sequence like the above later on, for now the idea is to get the smallest
possible set of test cases, which builds and runs on all platforms without
requiring any arch-specific hooks or special casing (#ifdef), to be agreed
upon broadly and accepted.
Do you think this is absolutely necessary on s390 for the very first set of
test cases, or can we add it later on as an improvement?
>
> IIUC, the (dummy) mm is really only needed to provide an mm->pgd as starting
> point, right?
Right.
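On most platforms pgd_offset() only ever dereferences mm->pgd, roughly:

#define pgd_offset(mm, addr)	((mm)->pgd + pgd_index(addr))

so nothing else in the dummy mm should matter for the walk itself.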
On 09/05/2019 02:29 PM, Kirill A. Shutemov wrote:
> On Thu, Sep 05, 2019 at 01:48:27PM +0530, Anshuman Khandual wrote:
>>>> +#define VADDR_TEST (PGDIR_SIZE + PUD_SIZE + PMD_SIZE + PAGE_SIZE)
>>>
>>> What is special about this address? How do you know if it is not occupied
>>> yet?
>>
>> We are creating the page table from scratch after allocating an mm_struct
>> for a given random virtual address 'VADDR_TEST'. Hence nothing is occupied
>> just yet. There is nothing special about this address, just that it tries
>> to ensure the page table entries are being created with some offset from
>> beginning of respective page table page at all levels ? The idea is to
>> have a more representative form of page table structure for test.
>
> Why is P4D_SIZE missing?
That was an omission, even though I was wondering whether it would be
applicable or even make sense on platforms which don't have a real P4D.
>
> Are you sure it will not land in the kernel address space on any arch?
Can it even cross the user virtual address range with just a single span
at each page table level? TBH I did not think about that possibility.
>
> I think a more robust way to deal with this would be to use
> get_unmapped_area() instead of a fixed address.
Makes sense, and it's probably better to get a virtual address which is
known to have been checked against all boundary conditions. Will explore
get_unmapped_area() in this regard.
>
>> This makes sense for runtime cases but there is a problem here.
>>
>> On arm64, pgd_populate() which takes (pud_t *) as last argument instead of
>> (p4d_t *) will fail to build when not wrapped in !__PAGETABLE_P4D_FOLDED
>> on certain configurations.
>>
>> ./arch/arm64/include/asm/pgalloc.h:81:75: note:
>> expected ‘pud_t *’ {aka ‘struct <anonymous> *’}
>> but argument is of type ‘pgd_t *’ {aka ‘struct <anonymous> *’}
>> static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, pud_t *pudp)
>> ~~~~~~~^~~~
>> Wondering if this is something to be fixed on arm64 or its more general
>> problem. Will look into this further.
>
> I think you need to wrap this in #ifndef __ARCH_HAS_5LEVEL_HACK.
Okay.
>
>>>> + pmd_populate_tests(mm, pmdp, (pgtable_t) page);
>>>
>>> This is not correct for architectures that defines pgtable_t as pte_t
>>> pointer, not struct page pointer.
>>
>> Right, a grep on the source confirms that.
>>
>> These platforms define pgtable_t as struct page *
>>
>> arch/alpha/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/arm/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/arm64/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/csky/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/hexagon/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/ia64/include/asm/page.h: typedef struct page *pgtable_t;
>> arch/ia64/include/asm/page.h: typedef struct page *pgtable_t;
>> arch/m68k/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/microblaze/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/mips/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/nds32/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/nios2/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/openrisc/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/parisc/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/riscv/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/sh/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/sparc/include/asm/page_32.h:typedef struct page *pgtable_t;
>> arch/um/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/unicore32/include/asm/page.h:typedef struct page *pgtable_t;
>> arch/x86/include/asm/pgtable_types.h:typedef struct page *pgtable_t;
>> arch/xtensa/include/asm/page.h:typedef struct page *pgtable_t;
>>
>> These platforms define pgtable_t as pte_t *
>>
>> arch/arc/include/asm/page.h:typedef pte_t * pgtable_t;
>> arch/powerpc/include/asm/mmu.h:typedef pte_t *pgtable_t;
>> arch/s390/include/asm/page.h:typedef pte_t *pgtable_t;
>> arch/sparc/include/asm/page_64.h:typedef pte_t *pgtable_t;
>>
>> Should we need have two pmd_populate_tests() definitions with
>> different arguments (struct page pointer or pte_t pointer) and then
>> call either one after detecting the given platform ?
>
> Use pte_alloc_one() instead of alloc_mapped_page() to allocate the page
> table.
Right, the PTE page table page should come from pte_alloc_one() instead of
directly from a struct page. The functions pte_alloc_one() and pte_free()
operate on either (struct page *) or (pte_t *) pointers, depending on the
applicable pgtable_t definition for the platform. Will fix it.
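Roughly like this (untested sketch; pte_alloc_one() hands back a pgtable_t
and pte_free() takes one, so both sides stay behind the arch-specific type):

	pgtable_t pgtable;

	pgtable = pte_alloc_one(mm);
	if (!pgtable) {
		pr_err("pgtable allocation failed\n");
		return 1;
	}

	pmd_populate_tests(mm, pmdp, pgtable);
	pte_free(mm, pgtable);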
>
>>>> + pud_populate_tests(mm, pudp, pmdp);
>>>> + p4d_populate_tests(mm, p4dp, pudp);
>>>> + pgd_populate_tests(mm, pgdp, p4dp);
>>>
>>> This is wrong. All p?dp points to the second entry in page table entry.
>>> This is not valid pointer for page table and triggers p?d_bad() on x86.
>>
>> Yeah these are second entries because of the way we create the page table.
>> But I guess its applicable only to the second argument in all these above
>> cases because the first argument can be any valid entry on previous page
>> table level.
>
> Yes:
>
> @@ -397,9 +396,9 @@ static int __init arch_pgtable_tests_init(void)
> pgd_clear_tests(pgdp);
>
> pmd_populate_tests(mm, pmdp, (pgtable_t) page);
> - pud_populate_tests(mm, pudp, pmdp);
> - p4d_populate_tests(mm, p4dp, pudp);
> - pgd_populate_tests(mm, pgdp, p4dp);
> + pud_populate_tests(mm, pudp, saved_pmdp);
> + p4d_populate_tests(mm, p4dp, saved_pudp);
> + pgd_populate_tests(mm, pgdp, saved_p4dp);
Sure.
On Fri, 6 Sep 2019 11:58:59 +0530
Anshuman Khandual <[email protected]> wrote:
> On 09/05/2019 10:36 PM, Gerald Schaefer wrote:
> > On Thu, 5 Sep 2019 14:48:14 +0530
> > Anshuman Khandual <[email protected]> wrote:
> >
> >>> [...]
> >>>> +
> >>>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
> >>>> +static void pud_clear_tests(pud_t *pudp)
> >>>> +{
> >>>> + memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
> >>>> + pud_clear(pudp);
> >>>> + WARN_ON(!pud_none(READ_ONCE(*pudp)));
> >>>> +}
> >>>
> >>> For pgd/p4d/pud_clear(), we only clear if the page table level is present
> >>> and not folded. The memset() here overwrites the table type bits, so
> >>> pud_clear() will not clear anything on s390 and the pud_none() check will
> >>> fail.
> >>> Would it be possible to OR a (larger) random value into the table, so that
> >>> the lower 12 bits would be preserved?
> >>
> >> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE,
> >> it should OR a large random value preserving lower 12 bits. Hmm, this should
> >> still do the trick for other platforms, they just need non zero value. So on
> >> s390, the lower 12 bits on the page table entry already has valid value while
> >> entering this function which would make sure that pud_clear() really does
> >> clear the entry ?
> >
> > Yes, in theory the table entry on s390 would have the type set in the last
> > 4 bits, so preserving those would be enough. If it does not conflict with
> > others, I would still suggest preserving all 12 bits since those would contain
> > arch-specific flags in general, just to be sure. For s390, the pte/pmd tests
> > would also work with the memset, but for consistency I think the same logic
> > should be used in all pxd_clear_tests.
>
> Makes sense but..
>
> There is a small challenge with this. Modifying individual bits on a given
> page table entry from generic code like this test case is bit tricky. That
> is because there are not enough helpers to create entries with an absolute
> value. This would have been easier if all the platforms provided functions
> like __pxx() which is not the case now. Otherwise something like this should
> have worked.
>
>
> pud_t pud = READ_ONCE(*pudp);
> pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0))
> WRITE_ONCE(*pudp, pud);
>
> But __pud() will fail to build in many platforms.
Hmm, I simply used this on my system to make pud_clear_tests() work, not
sure if it works on all archs:
pud_val(*pudp) |= RANDOM_NZVALUE;
>
> The other alternative will be to make sure memset() happens on all other
> bits except the lower 12 bits which will depend on endianness. If s390
> has a fixed endianness, we can still use either of them which will hold
> good for others as well.
>
> memset(pudp, RANDOM_NZVALUE, sizeof(pud_t) - 3);
>
> OR
>
> memset(pudp + 3, RANDOM_NZVALUE, sizeof(pud_t) - 3);
>
> >
> > However, there is another issue on s390 which will make this only work
> > for pud_clear_tests(), and not for the p4d/pgd_tests. The problem is that
> > mm_alloc() will only give you a 3-level page table initially on s390.
> > This means that pudp == p4dp == pgdp, and so the p4d/pgd_tests will
> > both see the pud level (of course this also affects other tests).
>
> Got it.
>
> >
> > Not sure yet how to fix this, i.e. how to initialize/update the page table
> > to 5 levels. We can handle 5 level page tables, and it would be good if
> > all levels could be tested, but using mm_alloc() to establish the page
> > tables might not work on s390. One option could be to provide an arch-hook
> > or weak function to allocate/initialize the mm.
>
> Sure, got it. Though I plan to do add some arch specific tests or init sequence
> like the above later on but for now the idea is to get the smallest possible set
> of test cases which builds and runs on all platforms without requiring any arch
> specific hooks or special casing (#ifdef) to be agreed upon broadly and accepted.
>
> Do you think this is absolutely necessary on s390 for the very first set of test
> cases or we can add this later on as an improvement ?
It can be added later, no problem. I did not expect this to work flawlessly
on s390 right from the start anyway, with all our peculiarities, so don't
let this hinder you. I might come up with an add-on patch later.
Actually, using get_unmapped_area() as suggested by Kirill could also
solve this issue. We do create a new mm with 3-level page tables on s390,
and the dynamic upgrade to 4 or 5 levels is then triggered exactly by
arch_get_unmapped_area(), depending on the addr. But I currently don't
see how / where arch_get_unmapped_area() is set up for such a dummy mm
created by mm_alloc().
Regards,
Gerald
On 09/07/2019 12:33 AM, Gerald Schaefer wrote:
> On Fri, 6 Sep 2019 11:58:59 +0530
> Anshuman Khandual <[email protected]> wrote:
>
>> On 09/05/2019 10:36 PM, Gerald Schaefer wrote:
>>> On Thu, 5 Sep 2019 14:48:14 +0530
>>> Anshuman Khandual <[email protected]> wrote:
>>>
>>>>> [...]
>>>>>> +
>>>>>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>>>>>> +static void pud_clear_tests(pud_t *pudp)
>>>>>> +{
>>>>>> + memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
>>>>>> + pud_clear(pudp);
>>>>>> + WARN_ON(!pud_none(READ_ONCE(*pudp)));
>>>>>> +}
>>>>>
>>>>> For pgd/p4d/pud_clear(), we only clear if the page table level is present
>>>>> and not folded. The memset() here overwrites the table type bits, so
>>>>> pud_clear() will not clear anything on s390 and the pud_none() check will
>>>>> fail.
>>>>> Would it be possible to OR a (larger) random value into the table, so that
>>>>> the lower 12 bits would be preserved?
>>>>
>>>> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE,
>>>> it should OR a large random value preserving lower 12 bits. Hmm, this should
>>>> still do the trick for other platforms, they just need non zero value. So on
>>>> s390, the lower 12 bits on the page table entry already has valid value while
>>>> entering this function which would make sure that pud_clear() really does
>>>> clear the entry ?
>>>
>>> Yes, in theory the table entry on s390 would have the type set in the last
>>> 4 bits, so preserving those would be enough. If it does not conflict with
>>> others, I would still suggest preserving all 12 bits since those would contain
>>> arch-specific flags in general, just to be sure. For s390, the pte/pmd tests
>>> would also work with the memset, but for consistency I think the same logic
>>> should be used in all pxd_clear_tests.
>>
>> Makes sense but..
>>
>> There is a small challenge with this. Modifying individual bits on a given
>> page table entry from generic code like this test case is bit tricky. That
>> is because there are not enough helpers to create entries with an absolute
>> value. This would have been easier if all the platforms provided functions
>> like __pxx() which is not the case now. Otherwise something like this should
>> have worked.
>>
>>
>> pud_t pud = READ_ONCE(*pudp);
>> pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0))
>> WRITE_ONCE(*pudp, pud);
>>
>> But __pud() will fail to build in many platforms.
>
> Hmm, I simply used this on my system to make pud_clear_tests() work, not
> sure if it works on all archs:
>
> pud_val(*pudp) |= RANDOM_NZVALUE;
Which compiles on arm64 but then fails on x86 because of the way pmd_val()
has been defined there. On arm64 and s390 (along with many others) pmd_val()
is a macro which still yields the member itself and can be used as an lvalue,
but that is not true for some other platforms like x86.
arch/arm64/include/asm/pgtable-types.h: #define pmd_val(x) ((x).pmd)
arch/s390/include/asm/page.h: #define pmd_val(x) ((x).pmd)
arch/x86/include/asm/pgtable.h: #define pmd_val(x) native_pmd_val(x)
static inline pmdval_t native_pmd_val(pmd_t pmd)
{
return pmd.pmd;
}
Unless I am mistaken, the return value from this function cannot be used as
an lvalue for assignments.
mm/arch_pgtable_test.c: In function ‘pud_clear_tests’:
mm/arch_pgtable_test.c:156:17: error: lvalue required as left operand of assignment
pud_val(*pudp) |= RANDOM_ORVALUE;
^~
AFAICS pxx_val() was never intended to be used as an lvalue, and using it that
way might just happen to work on all those platforms which define it as a
macro. It is meant to just provide the value of an entry as determined by the
platform. In principle, an entry's pxx_val() was not supposed to be modified
directly from generic code without going through (again) platform helpers for
any specific state change (write, old, dirty, special, huge etc). The current
use case is a deviation from that.
I originally went with memset() just to load up the entries with a non-zero
value so that we know pxx_clear() is really doing the clearing. The same is
being followed for all pxx_same() checks.
Another way of fixing the problem would be to mark the entries with known
attributes like write/young/huge etc instead, which will certainly create
non-zero entries. We can do that for the pxx_clear() and pxx_same() tests
and drop RANDOM_NZVALUE completely. Does that sound good?
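For instance, a rough sketch of that idea at the pte level (untested; the
changed signature passing the test page and prot is hypothetical):

static void pte_clear_tests(struct page *page, pgprot_t prot, pte_t *ptep)
{
	pte_t pte = mk_pte(page, prot);

	/* Known attributes guarantee a non-zero entry without memset() */
	pte = pte_mkyoung(pte_mkdirty(pte_mkwrite(pte)));
	WRITE_ONCE(*ptep, pte);
	pte_clear(NULL, 0, ptep);
	WARN_ON(!pte_none(READ_ONCE(*ptep)));
}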
>
>>
>> The other alternative will be to make sure memset() happens on all other
>> bits except the lower 12 bits which will depend on endianness. If s390
>> has a fixed endianness, we can still use either of them which will hold
>> good for others as well.
>>
>> memset(pudp, RANDOM_NZVALUE, sizeof(pud_t) - 3);
>>
>> OR
>>
>> memset(pudp + 3, RANDOM_NZVALUE, sizeof(pud_t) - 3);
>>
>>>
>>> However, there is another issue on s390 which will make this only work
>>> for pud_clear_tests(), and not for the p4d/pgd_tests. The problem is that
>>> mm_alloc() will only give you a 3-level page table initially on s390.
>>> This means that pudp == p4dp == pgdp, and so the p4d/pgd_tests will
>>> both see the pud level (of course this also affects other tests).
>>
>> Got it.
>>
>>>
>>> Not sure yet how to fix this, i.e. how to initialize/update the page table
>>> to 5 levels. We can handle 5 level page tables, and it would be good if
>>> all levels could be tested, but using mm_alloc() to establish the page
>>> tables might not work on s390. One option could be to provide an arch-hook
>>> or weak function to allocate/initialize the mm.
>>
>> Sure, got it. Though I plan to do add some arch specific tests or init sequence
>> like the above later on but for now the idea is to get the smallest possible set
>> of test cases which builds and runs on all platforms without requiring any arch
>> specific hooks or special casing (#ifdef) to be agreed upon broadly and accepted.
>>
>> Do you think this is absolutely necessary on s390 for the very first set of test
>> cases or we can add this later on as an improvement ?
>
> It can be added later, no problem. I did not expect this to work flawlessly
> on s390 right from the start anyway, with all our peculiarities, so don't
> let this hinder you. I might come up with an add-on patch later.
Sure.
>
> Actually, using get_unmapped_area() as suggested by Kirill could also
> solve this issue. We do create a new mm with 3-level page tables on s390,
> and the dynamic upgrade to 4 or 5 levels is then triggered exactly by
> arch_get_unmapped_area(), depending on the addr. But I currently don't
> see how / where arch_get_unmapped_area() is set up for such a dummy mm
> created by mm_alloc().
Normally they are set up during program loading, but we could set this up
explicitly for the test mm_struct if we need to (see the sketch after the
call chain below), though there are some other challenges.
load_[aout|elf|flat|..]_binary()
setup_new_exec()
arch_pick_mmap_layout().
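Presumably something like the following could do it for the test mm_struct
(a sketch, assuming the current arch_pick_mmap_layout() signature which also
takes the stack rlimit):

	/* Borrow current's stack rlimit for the dummy mm */
	struct rlimit rlim_stack = current->signal->rlim[RLIMIT_STACK];

	arch_pick_mmap_layout(mm, &rlim_stack);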
I did some initial experiments around get_unmapped_area(). It seems a bit
tricky to get it working on a pure 'test' mm_struct, as it expects a real
user context in the form of current->mm.
get_unmapped_area()
{
....
get_area = current->mm->get_unmapped_area;
....
addr = get_area(file, addr, len, pgoff, flags); {
....
struct mm_struct *mm = current->mm;
....
if (addr) {
...
vma = find_vma_prev(mm, addr, &prev);
}
....
vm_unmapped_area() {
struct mm_struct *mm = current->mm;
....
/* Walks across mm->mm_rb.rb_node */
}
}
....
}
A simple call like get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0) to get an
address fails right away on current->mm->get_unmapped_area, which does not
have a valid value in the kernel context.
There might be two methods to get around this problem:
1) Write a custom get_unmapped_area() imitating the real one, but going
around the problem by taking an appropriately initialized mm_struct
instead of current->mm.
2) Create a dummy user task with a dummy mm, switch the 'current' context
before calling into get_unmapped_area(), and switch back again. Don't know
if this is even possible.
Wondering if this might deviate too much from the original goal of
testing the page table helpers.
Looking back again at the proposed test vaddr, what would be the real problem
in case it goes beyond the user address range? Will pxx_alloc() fail to
create page table ranges at the required level? Apart from skipping
pgtable_page_ctor/dtor for the page table pages, it might not really affect
any helpers as such.
VADDR_TEST (PGDIR_SIZE + [P4D_SIZE] + PUD_SIZE + PMD_SIZE + PAGE_SIZE)
OR
A random page aligned address in the [FIRST_USER_ADDRESS..TASK_SIZE] range?
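A rough sketch of the latter (untested):

	unsigned long vaddr;

	/* Random page aligned address within the user range */
	vaddr = get_random_long() % (TASK_SIZE - FIRST_USER_ADDRESS);
	vaddr = (vaddr + FIRST_USER_ADDRESS) & PAGE_MASK;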
On 09/09/2019 08:43 PM, Kirill A. Shutemov wrote:
> On Mon, Sep 09, 2019 at 11:56:50AM +0530, Anshuman Khandual wrote:
>>
>>
>> On 09/07/2019 12:33 AM, Gerald Schaefer wrote:
>>> On Fri, 6 Sep 2019 11:58:59 +0530
>>> Anshuman Khandual <[email protected]> wrote:
>>>
>>>> On 09/05/2019 10:36 PM, Gerald Schaefer wrote:
>>>>> On Thu, 5 Sep 2019 14:48:14 +0530
>>>>> Anshuman Khandual <[email protected]> wrote:
>>>>>
>>>>>>> [...]
>>>>>>>> +
>>>>>>>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>>>>>>>> +static void pud_clear_tests(pud_t *pudp)
>>>>>>>> +{
>>>>>>>> + memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
>>>>>>>> + pud_clear(pudp);
>>>>>>>> + WARN_ON(!pud_none(READ_ONCE(*pudp)));
>>>>>>>> +}
>>>>>>>
>>>>>>> For pgd/p4d/pud_clear(), we only clear if the page table level is present
>>>>>>> and not folded. The memset() here overwrites the table type bits, so
>>>>>>> pud_clear() will not clear anything on s390 and the pud_none() check will
>>>>>>> fail.
>>>>>>> Would it be possible to OR a (larger) random value into the table, so that
>>>>>>> the lower 12 bits would be preserved?
>>>>>>
>>>>>> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE,
>>>>>> it should OR a large random value preserving lower 12 bits. Hmm, this should
>>>>>> still do the trick for other platforms, they just need non zero value. So on
>>>>>> s390, the lower 12 bits on the page table entry already has valid value while
>>>>>> entering this function which would make sure that pud_clear() really does
>>>>>> clear the entry ?
>>>>>
>>>>> Yes, in theory the table entry on s390 would have the type set in the last
>>>>> 4 bits, so preserving those would be enough. If it does not conflict with
>>>>> others, I would still suggest preserving all 12 bits since those would contain
>>>>> arch-specific flags in general, just to be sure. For s390, the pte/pmd tests
>>>>> would also work with the memset, but for consistency I think the same logic
>>>>> should be used in all pxd_clear_tests.
>>>>
>>>> Makes sense but..
>>>>
>>>> There is a small challenge with this. Modifying individual bits on a given
>>>> page table entry from generic code like this test case is bit tricky. That
>>>> is because there are not enough helpers to create entries with an absolute
>>>> value. This would have been easier if all the platforms provided functions
>>>> like __pxx() which is not the case now. Otherwise something like this should
>>>> have worked.
>>>>
>>>>
>>>> pud_t pud = READ_ONCE(*pudp);
>>>> pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0))
>>>> WRITE_ONCE(*pudp, pud);
>>>>
>>>> But __pud() will fail to build in many platforms.
>>>
>>> Hmm, I simply used this on my system to make pud_clear_tests() work, not
>>> sure if it works on all archs:
>>>
>>> pud_val(*pudp) |= RANDOM_NZVALUE;
>>
>> Which compiles on arm64 but then fails on x86 because of the way pmd_val()
>> has been defined there.
>
> Use instead
>
> *pudp = __pud(pud_val(*pudp) | RANDOM_NZVALUE);
Agreed.
As I had mentioned before, this would really have been the cleanest approach.
>
> It *should* be more portable.
Not really, because not all the platforms have __pxx() definitions right now.
Going with these will clearly cause build failures on the affected platforms.
Let's examine __pud(), for instance. It is defined only on these platforms:
arch/arm64/include/asm/pgtable-types.h: #define __pud(x) ((pud_t) { (x) } )
arch/mips/include/asm/pgtable-64.h: #define __pud(x) ((pud_t) { (x) })
arch/powerpc/include/asm/pgtable-be-types.h: #define __pud(x) ((pud_t) { cpu_to_be64(x) })
arch/powerpc/include/asm/pgtable-types.h: #define __pud(x) ((pud_t) { (x) })
arch/s390/include/asm/page.h: #define __pud(x) ((pud_t) { (x) } )
arch/sparc/include/asm/page_64.h: #define __pud(x) ((pud_t) { (x) } )
arch/sparc/include/asm/page_64.h: #define __pud(x) (x)
arch/x86/include/asm/pgtable.h: #define __pud(x) native_make_pud(x)
Similarly for __pmd()
arch/alpha/include/asm/page.h: #define __pmd(x) ((pmd_t) { (x) } )
arch/arm/include/asm/page-nommu.h: #define __pmd(x) (x)
arch/arm/include/asm/pgtable-2level-types.h: #define __pmd(x) ((pmd_t) { (x) } )
arch/arm/include/asm/pgtable-2level-types.h: #define __pmd(x) (x)
arch/arm/include/asm/pgtable-3level-types.h: #define __pmd(x) ((pmd_t) { (x) } )
arch/arm/include/asm/pgtable-3level-types.h: #define __pmd(x) (x)
arch/arm64/include/asm/pgtable-types.h: #define __pmd(x) ((pmd_t) { (x) } )
arch/m68k/include/asm/page.h: #define __pmd(x) ((pmd_t) { { (x) }, })
arch/mips/include/asm/pgtable-64.h: #define __pmd(x) ((pmd_t) { (x) } )
arch/nds32/include/asm/page.h: #define __pmd(x) (x)
arch/parisc/include/asm/page.h: #define __pmd(x) ((pmd_t) { (x) } )
arch/parisc/include/asm/page.h: #define __pmd(x) (x)
arch/powerpc/include/asm/pgtable-be-types.h: #define __pmd(x) ((pmd_t) { cpu_to_be64(x) })
arch/powerpc/include/asm/pgtable-types.h: #define __pmd(x) ((pmd_t) { (x) })
arch/riscv/include/asm/pgtable-64.h: #define __pmd(x) ((pmd_t) { (x) })
arch/s390/include/asm/page.h: #define __pmd(x) ((pmd_t) { (x) } )
arch/sh/include/asm/pgtable-3level.h: #define __pmd(x) ((pmd_t) { (x) } )
arch/sparc/include/asm/page_32.h: #define __pmd(x) ((pmd_t) { { (x) }, })
arch/sparc/include/asm/page_32.h: #define __pmd(x) ((pmd_t) { { (x) }, })
arch/sparc/include/asm/page_64.h: #define __pmd(x) ((pmd_t) { (x) } )
arch/sparc/include/asm/page_64.h: #define __pmd(x) (x)
arch/um/include/asm/page.h: #define __pmd(x) ((pmd_t) { (x) } )
arch/um/include/asm/page.h: #define __pmd(x) ((pmd_t) { (x) } )
arch/x86/include/asm/pgtable.h: #define __pmd(x) native_make_pmd(x)
Similarly for __pgd()
arch/alpha/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
arch/alpha/include/asm/page.h: #define __pgd(x) (x)
arch/arc/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) })
arch/arc/include/asm/page.h: #define __pgd(x) (x)
arch/arm/include/asm/pgtable-3level-types.h: #define __pgd(x) ((pgd_t) { (x) } )
arch/arm/include/asm/pgtable-3level-types.h: #define __pgd(x) (x)
arch/arm64/include/asm/pgtable-types.h: #define __pgd(x) ((pgd_t) { (x) } )
arch/csky/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) })
arch/hexagon/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) })
arch/m68k/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
arch/mips/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
arch/nds32/include/asm/page.h: #define __pgd(x) (x)
arch/nios2/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) })
arch/openrisc/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) })
arch/parisc/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
arch/parisc/include/asm/page.h: #define __pgd(x) (x)
arch/powerpc/include/asm/pgtable-be-types.h: #define __pgd(x) ((pgd_t) { cpu_to_be64(x) })
arch/powerpc/include/asm/pgtable-types.h: #define __pgd(x) ((pgd_t) { (x) })
arch/riscv/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) })
arch/s390/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
arch/sh/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
arch/sparc/include/asm/page_32.h: #define __pgd(x) ((pgd_t) { (x) } )
arch/sparc/include/asm/page_32.h: #define __pgd(x) (x)
arch/sparc/include/asm/page_64.h: #define __pgd(x) ((pgd_t) { (x) } )
arch/sparc/include/asm/page_64.h: #define __pgd(x) (x)
arch/um/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
arch/unicore32/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) })
arch/unicore32/include/asm/page.h: #define __pgd(x) (x)
arch/x86/include/asm/pgtable.h: #define __pgd(x) native_make_pgd(x)
arch/xtensa/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
Similarly for __p4d()
arch/s390/include/asm/page.h: #define __p4d(x) ((p4d_t) { (x) } )
arch/x86/include/asm/pgtable.h: #define __p4d(x) native_make_p4d(x)
The search pattern here has been "#define __pxx(". Unless I am missing
something, I don't see how we can use these without risking build failures.
On 09/10/2019 03:56 AM, Anshuman Khandual wrote:
>
>
> On 09/09/2019 08:43 PM, Kirill A. Shutemov wrote:
>> On Mon, Sep 09, 2019 at 11:56:50AM +0530, Anshuman Khandual wrote:
>>>
>>>
>>> On 09/07/2019 12:33 AM, Gerald Schaefer wrote:
>>>> On Fri, 6 Sep 2019 11:58:59 +0530
>>>> Anshuman Khandual <[email protected]> wrote:
>>>>
>>>>> On 09/05/2019 10:36 PM, Gerald Schaefer wrote:
>>>>>> On Thu, 5 Sep 2019 14:48:14 +0530
>>>>>> Anshuman Khandual <[email protected]> wrote:
>>>>>>
>>>>>>>> [...]
>>>>>>>>> +
>>>>>>>>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>>>>>>>>> +static void pud_clear_tests(pud_t *pudp)
>>>>>>>>> +{
>>>>>>>>> + memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
>>>>>>>>> + pud_clear(pudp);
>>>>>>>>> + WARN_ON(!pud_none(READ_ONCE(*pudp)));
>>>>>>>>> +}
>>>>>>>>
>>>>>>>> For pgd/p4d/pud_clear(), we only clear if the page table level is present
>>>>>>>> and not folded. The memset() here overwrites the table type bits, so
>>>>>>>> pud_clear() will not clear anything on s390 and the pud_none() check will
>>>>>>>> fail.
>>>>>>>> Would it be possible to OR a (larger) random value into the table, so that
>>>>>>>> the lower 12 bits would be preserved?
>>>>>>>
>>>>>>> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE,
>>>>>>> it should OR a large random value preserving lower 12 bits. Hmm, this should
>>>>>>> still do the trick for other platforms, they just need non zero value. So on
>>>>>>> s390, the lower 12 bits on the page table entry already has valid value while
>>>>>>> entering this function which would make sure that pud_clear() really does
>>>>>>> clear the entry ?
>>>>>>
>>>>>> Yes, in theory the table entry on s390 would have the type set in the last
>>>>>> 4 bits, so preserving those would be enough. If it does not conflict with
>>>>>> others, I would still suggest preserving all 12 bits since those would contain
>>>>>> arch-specific flags in general, just to be sure. For s390, the pte/pmd tests
>>>>>> would also work with the memset, but for consistency I think the same logic
>>>>>> should be used in all pxd_clear_tests.
>>>>>
>>>>> Makes sense but..
>>>>>
>>>>> There is a small challenge with this. Modifying individual bits on a given
>>>>> page table entry from generic code like this test case is bit tricky. That
>>>>> is because there are not enough helpers to create entries with an absolute
>>>>> value. This would have been easier if all the platforms provided functions
>>>>> like __pxx() which is not the case now. Otherwise something like this should
>>>>> have worked.
>>>>>
>>>>>
>>>>> pud_t pud = READ_ONCE(*pudp);
>>>>> pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0))
>>>>> WRITE_ONCE(*pudp, pud);
>>>>>
>>>>> But __pud() will fail to build in many platforms.
>>>>
>>>> Hmm, I simply used this on my system to make pud_clear_tests() work, not
>>>> sure if it works on all archs:
>>>>
>>>> pud_val(*pudp) |= RANDOM_NZVALUE;
>>>
>>> Which compiles on arm64 but then fails on x86 because of the way pmd_val()
>>> has been defined there.
>>
>> Use instead
>>
>> *pudp = __pud(pud_val(*pudp) | RANDOM_NZVALUE);
>
> Agreed.
>
> As I had mentioned before this would have been really the cleanest approach.
>
>>
>> It *should* be more portable.
>
> Not really, because not all the platforms have __pxx() definitions right now.
> Going with these will clearly cause build failures on affected platforms. Lets
> examine __pud() for instance. It is defined only on these platforms.
>
> arch/arm64/include/asm/pgtable-types.h: #define __pud(x) ((pud_t) { (x) } )
> arch/mips/include/asm/pgtable-64.h: #define __pud(x) ((pud_t) { (x) })
> arch/powerpc/include/asm/pgtable-be-types.h: #define __pud(x) ((pud_t) { cpu_to_be64(x) })
> arch/powerpc/include/asm/pgtable-types.h: #define __pud(x) ((pud_t) { (x) })
> arch/s390/include/asm/page.h: #define __pud(x) ((pud_t) { (x) } )
> arch/sparc/include/asm/page_64.h: #define __pud(x) ((pud_t) { (x) } )
> arch/sparc/include/asm/page_64.h: #define __pud(x) (x)
> arch/x86/include/asm/pgtable.h: #define __pud(x) native_make_pud(x)
You missed:
arch/x86/include/asm/paravirt.h:static inline pud_t __pud(pudval_t val)
include/asm-generic/pgtable-nop4d-hack.h:#define __pud(x) ((pud_t) { __pgd(x) })
include/asm-generic/pgtable-nopud.h:#define __pud(x) ((pud_t) { __p4d(x) })
>
> Similarly for __pmd()
>
> arch/alpha/include/asm/page.h: #define __pmd(x) ((pmd_t) { (x) } )
> arch/arm/include/asm/page-nommu.h: #define __pmd(x) (x)
> arch/arm/include/asm/pgtable-2level-types.h: #define __pmd(x) ((pmd_t) { (x) } )
> arch/arm/include/asm/pgtable-2level-types.h: #define __pmd(x) (x)
> arch/arm/include/asm/pgtable-3level-types.h: #define __pmd(x) ((pmd_t) { (x) } )
> arch/arm/include/asm/pgtable-3level-types.h: #define __pmd(x) (x)
> arch/arm64/include/asm/pgtable-types.h: #define __pmd(x) ((pmd_t) { (x) } )
> arch/m68k/include/asm/page.h: #define __pmd(x) ((pmd_t) { { (x) }, })
> arch/mips/include/asm/pgtable-64.h: #define __pmd(x) ((pmd_t) { (x) } )
> arch/nds32/include/asm/page.h: #define __pmd(x) (x)
> arch/parisc/include/asm/page.h: #define __pmd(x) ((pmd_t) { (x) } )
> arch/parisc/include/asm/page.h: #define __pmd(x) (x)
> arch/powerpc/include/asm/pgtable-be-types.h: #define __pmd(x) ((pmd_t) { cpu_to_be64(x) })
> arch/powerpc/include/asm/pgtable-types.h: #define __pmd(x) ((pmd_t) { (x) })
> arch/riscv/include/asm/pgtable-64.h: #define __pmd(x) ((pmd_t) { (x) })
> arch/s390/include/asm/page.h: #define __pmd(x) ((pmd_t) { (x) } )
> arch/sh/include/asm/pgtable-3level.h: #define __pmd(x) ((pmd_t) { (x) } )
> arch/sparc/include/asm/page_32.h: #define __pmd(x) ((pmd_t) { { (x) }, })
> arch/sparc/include/asm/page_32.h: #define __pmd(x) ((pmd_t) { { (x) }, })
> arch/sparc/include/asm/page_64.h: #define __pmd(x) ((pmd_t) { (x) } )
> arch/sparc/include/asm/page_64.h: #define __pmd(x) (x)
> arch/um/include/asm/page.h: #define __pmd(x) ((pmd_t) { (x) } )
> arch/um/include/asm/page.h: #define __pmd(x) ((pmd_t) { (x) } )
> arch/x86/include/asm/pgtable.h: #define __pmd(x) native_make_pmd(x)
You missed:
arch/x86/include/asm/paravirt.h:static inline pmd_t __pmd(pmdval_t val)
include/asm-generic/page.h:#define __pmd(x) ((pmd_t) { (x) } )
include/asm-generic/pgtable-nopmd.h:#define __pmd(x) ((pmd_t) { __pud(x) } )
>
> Similarly for __pgd()
>
> arch/alpha/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
> arch/alpha/include/asm/page.h: #define __pgd(x) (x)
> arch/arc/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) })
> arch/arc/include/asm/page.h: #define __pgd(x) (x)
> arch/arm/include/asm/pgtable-3level-types.h: #define __pgd(x) ((pgd_t) { (x) } )
> arch/arm/include/asm/pgtable-3level-types.h: #define __pgd(x) (x)
> arch/arm64/include/asm/pgtable-types.h: #define __pgd(x) ((pgd_t) { (x) } )
> arch/csky/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) })
> arch/hexagon/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) })
> arch/m68k/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
> arch/mips/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
> arch/nds32/include/asm/page.h: #define __pgd(x) (x)
> arch/nios2/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) })
> arch/openrisc/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) })
> arch/parisc/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
> arch/parisc/include/asm/page.h: #define __pgd(x) (x)
> arch/powerpc/include/asm/pgtable-be-types.h: #define __pgd(x) ((pgd_t) { cpu_to_be64(x) })
> arch/powerpc/include/asm/pgtable-types.h: #define __pgd(x) ((pgd_t) { (x) })
> arch/riscv/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) })
> arch/s390/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
> arch/sh/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
> arch/sparc/include/asm/page_32.h: #define __pgd(x) ((pgd_t) { (x) } )
> arch/sparc/include/asm/page_32.h: #define __pgd(x) (x)
> arch/sparc/include/asm/page_64.h: #define __pgd(x) ((pgd_t) { (x) } )
> arch/sparc/include/asm/page_64.h: #define __pgd(x) (x)
> arch/um/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
> arch/unicore32/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) })
> arch/unicore32/include/asm/page.h: #define __pgd(x) (x)
> arch/x86/include/asm/pgtable.h: #define __pgd(x) native_make_pgd(x)
> arch/xtensa/include/asm/page.h: #define __pgd(x) ((pgd_t) { (x) } )
You missed:
arch/x86/include/asm/paravirt.h:static inline pgd_t __pgd(pgdval_t val)
include/asm-generic/page.h:#define __pgd(x) ((pgd_t) { (x) } )
>
> Similarly for __p4d()
>
> arch/s390/include/asm/page.h: #define __p4d(x) ((p4d_t) { (x) } )
> arch/x86/include/asm/pgtable.h: #define __p4d(x) native_make_p4d(x)
You missed:
arch/x86/include/asm/paravirt.h:static inline p4d_t __p4d(p4dval_t val)
include/asm-generic/5level-fixup.h:#define __p4d(x) __pgd(x)
include/asm-generic/pgtable-nop4d.h:#define __p4d(x) ((p4d_t) { __pgd(x) })
>
> The search pattern here has been "#define __pxx(". Unless I am missing something,
> I don't see how we can use these without risking build failures.
>
I guess you missed that arches not defining them fall back on the
definitions in include/asm-generic.
Christophe
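Putting Kirill's __pud() idiom together with Gerald's request to preserve the
lower 12 bits, the reworked clear test could look like the sketch below. This
is a minimal sketch, not the patch itself: RANDOM_ORVALUE (a name that shows
up in a build error later in this thread) is assumed here to be any constant
with the low 12 bits clear, and GENMASK() is used only to keep that portable
across word sizes.

	#include <linux/bits.h>

	/* Assumption: any pattern with the low 12 bits clear, so the
	 * arch-specific table type/flag bits (e.g. on s390) survive. */
	#define RANDOM_ORVALUE	GENMASK(BITS_PER_LONG - 1, 12)

	static void pud_clear_tests(pud_t *pudp)
	{
		pud_t pud = READ_ONCE(*pudp);

		/* OR in a non-zero pattern instead of memset(); __pud()
		 * resolves via the asm-generic fallbacks listed above on
		 * arches that do not define it themselves. */
		pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
		WRITE_ONCE(*pudp, pud);
		pud_clear(pudp);
		WARN_ON(!pud_none(READ_ONCE(*pudp)));
	}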
On 09/10/2019 10:15 AM, Christophe Leroy wrote:
>
>
> On 09/10/2019 03:56 AM, Anshuman Khandual wrote:
>>
>>
>> [...]
>>
>> The search pattern here has been "#define __pxx(". Unless I am missing something,
>> I don't see how we can use these without risking build failures.
>>
>
> I guess you missed that arches not defining them fall back on the definitions in include/asm-generic
You are right. I was confused about whether these generic definitions were really
applicable to all those platforms as fallbacks (with so many page table
level folding combinations available) when they don't define their own. Sure,
will take this approach and try building on multiple platforms.
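For reference, the asm-generic fallbacks in question (as grepped above) chain
the folded levels together, which is why the __pud() form builds even where an
arch defines nothing itself:

	/* include/asm-generic/pgtable-nopud.h */
	#define __pud(x) ((pud_t) { __p4d(x) })

	/* include/asm-generic/pgtable-nop4d.h */
	#define __p4d(x) ((p4d_t) { __pgd(x) })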
On Mon, Sep 09, 2019 at 11:56:50AM +0530, Anshuman Khandual wrote:
> [...]
> >
> > Hmm, I simply used this on my system to make pud_clear_tests() work, not
> > sure if it works on all archs:
> >
> > pud_val(*pudp) |= RANDOM_NZVALUE;
>
> Which compiles on arm64 but then fails on x86 because of the way pmd_val()
> has been defined there.
Use instead
*pudp = __pud(pud_val(*pudp) | RANDOM_NZVALUE);
It *should* be more portable.
--
Kirill A. Shutemov
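In sketch form, the portability difference Kirill's one-liner sidesteps
(spelled out in the next message) is the following; pudp and RANDOM_NZVALUE
are as used in the test above:

	/* arm64/s390 style: pud_val() expands to the struct member, so
	 * this compiles as an lvalue, even though it was never meant to. */
	pud_val(*pudp) |= RANDOM_NZVALUE;

	/* x86 style: pud_val() calls native_pud_val(), which returns by
	 * value, so the line above fails with "lvalue required". Building
	 * a fresh entry through __pud() works everywhere: */
	*pudp = __pud(pud_val(*pudp) | RANDOM_NZVALUE);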
On Mon, 9 Sep 2019 11:56:50 +0530
Anshuman Khandual <[email protected]> wrote:
[..]
> >
> > Hmm, I simply used this on my system to make pud_clear_tests() work, not
> > sure if it works on all archs:
> >
> > pud_val(*pudp) |= RANDOM_NZVALUE;
>
> Which compiles on arm64 but then fails on x86 because of the way pmd_val()
> has been defined there. On arm64 and s390 (and many others) pmd_val() is
> a macro that still exposes the underlying variable, which can be used as an
> lvalue, but that is not true for some other platforms like x86.
>
> arch/arm64/include/asm/pgtable-types.h: #define pmd_val(x) ((x).pmd)
> arch/s390/include/asm/page.h: #define pmd_val(x) ((x).pmd)
> arch/x86/include/asm/pgtable.h: #define pmd_val(x) native_pmd_val(x)
>
> static inline pmdval_t native_pmd_val(pmd_t pmd)
> {
> return pmd.pmd;
> }
>
> Unless I am mistaken, the return value from this function cannot be used as
> an lvalue in subsequent assignments.
>
> mm/arch_pgtable_test.c: In function ‘pud_clear_tests’:
> mm/arch_pgtable_test.c:156:17: error: lvalue required as left operand of assignment
> pud_val(*pudp) |= RANDOM_ORVALUE;
> ^~
> AFAICS pxx_val() was never intended to be used as an lvalue, and using it that
> way might just happen to work on all those platforms which define it as a macro.
> It was only meant to provide the value of an entry as determined by the platform.
>
> In principle, pxx_val() of an entry was not supposed to be modified directly
> from generic code without going through (again) platform helpers for any
> specific state change (write, old, dirty, special, huge etc.). The current
> use case is a deviation from that.
>
> I originally went with memset() just to load up the entries with a non-zero
> value so that we know pxx_clear() is really doing the clearing. The same is
> being followed for all pxx_same() checks.
>
> Another way of fixing the problem would be to mark the entries with known
> attributes like write/young/huge etc. instead, which will definitely create
> non-zero entries. We could do that for the pxx_clear() and pxx_same() tests
> and drop RANDOM_NZVALUE completely. Does that sound good?
Umm, not really. Those mkwrite/young/huge etc. helpers only exist for
page table levels where we can also have large mappings, at least on s390.
Also, on s390 we again perform certain sanity checks before actually setting
the bits.
The good news is that, at least for the pxx_same() checks, the memset() is no
problem, because pxx_same() does not do any checks beyond the sameness
comparison itself.
For the pxx_clear_tests(), maybe it could be an option to run them after the
pxx_populate_tests(), and rely on the entries having properly populated
(non-clear) values at that point?
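In sketch form, that ordering would be something like the following, with the
signature of pud_populate_tests() assumed from the patch's pxx_*_tests()
convention:

	/* Hypothetical ordering: populate first, so the entry holds a
	 * real, arch-valid (non-clear) value, then verify clearing it.
	 * This avoids seeding entries via memset()/RANDOM_NZVALUE. */
	pud_populate_tests(mm, pudp, pmdp);	/* leaves *pudp non-clear */
	pud_clear_tests(pudp);			/* checks pud_none() again */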
[...]
> >
> > Actually, using get_unmapped_area() as suggested by Kirill could also
> > solve this issue. We do create a new mm with 3-level page tables on s390,
> > and the dynamic upgrade to 4 or 5 levels is then triggered exactly by
> > arch_get_unmapped_area(), depending on the addr. But I currently don't
> > see how / where arch_get_unmapped_area() is set up for such a dummy mm
> > created by mm_alloc().
>
> Normally they are set up during program loading, but we could set this up
> explicitly for the test mm_struct if we need to. There are some other
> challenges, though.
>
> load_[aout|elf|flat|..]_binary()
>   setup_new_exec()
>     arch_pick_mmap_layout()
>
> I did some initial experiments around get_unmapped_area(). Seems a bit tricky
> to get it working on a pure 'test' mm_struct. It expects a real user context
> in the form of current->mm.
Yes, that's where I stopped because it looked rather complicated :-)
Not sure why Kirill suggested it initially, but if using get_unmapped_area()
would only be necessary to get properly initialized page table levels
on s390, you could also defer this to a later add-on patch.
Regards,
Gerald
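For completeness, a rough sketch of what explicit setup for the dummy mm might
involve; this is hypothetical and untested, and as noted above it would still
not help generic get_unmapped_area(), which reaches the hook through
current->mm rather than through a caller-supplied mm:

	struct mm_struct *mm = mm_alloc();

	/* arch_pick_mmap_layout() normally runs from setup_new_exec();
	 * invoking it by hand on a test mm is the hypothetical part. */
	arch_pick_mmap_layout(mm, &current->signal->rlim[RLIMIT_STACK]);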