2020-03-24 05:23:53

by Anshuman Khandual

Subject: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests

This series adds more arch page table helper tests. The new tests are related
either to core memory functions or to advanced arch pgtable helpers. The series
also creates a documentation file listing all expected semantics, as suggested
by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).

This series has been tested on arm64 and x86 platforms. There is just one
expected failure on arm64 that will be fixed when we enable THP migration.

[ 21.741634] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:782

which corresponds to

WARN_ON(!pmd_present(pmd_mknotpresent(pmd_mkhuge(pmd))))

There are many TRANSPARENT_HUGEPAGE and ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD
ifdefs scattered across the test. But consolidating all the fallback stubs
is not very straightforward, because ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD does
not explicitly depend on ARCH_HAS_TRANSPARENT_HUGEPAGE.

This series has been build tested on many platforms, including the ones that
subscribe to the test through ARCH_HAS_DEBUG_VM_PGTABLE.

This series is based on v5.6-rc7 after applying the following patches.

1. https://patchwork.kernel.org/patch/11431277/
2. https://patchwork.kernel.org/patch/11452185/

Changes in V2:

- Dropped CONFIG_ARCH_HAS_PTE_SPECIAL per Christophe
- Dropped CONFIG_NUMA_BALANCING per Christophe
- Dropped CONFIG_HAVE_ARCH_SOFT_DIRTY per Christophe
- Dropped CONFIG_MIGRATION per Christophe
- Replaced CONFIG_S390 with __HAVE_ARCH_PMDP_INVALIDATE
- Moved page allocation & free inside swap_migration_tests() per Christophe
- Added CONFIG_TRANSPARENT_HUGEPAGE to protect pfn_pmd()
- Added CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD to protect pfn_pud()
- Added a patch for other arch advanced page table helper tests
- Added a patch creating a documentation for page table helper semantics

Changes in V1: (https://patchwork.kernel.org/patch/11408253/)

Cc: Jonathan Corbet <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Anshuman Khandual (3):
mm/debug: Add tests validating arch page table helpers for core features
mm/debug: Add tests validating arch advanced page table helpers
Documentation/mm: Add descriptions for arch page table helpers

Documentation/vm/arch_pgtable_helpers.rst | 256 ++++++++++
mm/debug_vm_pgtable.c | 594 +++++++++++++++++++++-
2 files changed, 839 insertions(+), 11 deletions(-)
create mode 100644 Documentation/vm/arch_pgtable_helpers.rst

--
2.20.1


2020-03-24 05:24:56

by Anshuman Khandual

Subject: [PATCH V2 3/3] Documentation/mm: Add descriptions for arch page table helpers

This removes the existing description from the test and instead adds a
dedicated description file for all arch page table helpers, in sync with the
semantics being tested via CONFIG_DEBUG_VM_PGTABLE. Any future change to
either these descriptions or the debug test should keep the two in sync.

Cc: Jonathan Corbet <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Suggested-by: Mike Rapoport <[email protected]>
Signed-off-by: Anshuman Khandual <[email protected]>
---
Documentation/vm/arch_pgtable_helpers.rst | 256 ++++++++++++++++++++++
mm/debug_vm_pgtable.c | 13 +-
2 files changed, 259 insertions(+), 10 deletions(-)
create mode 100644 Documentation/vm/arch_pgtable_helpers.rst

diff --git a/Documentation/vm/arch_pgtable_helpers.rst b/Documentation/vm/arch_pgtable_helpers.rst
new file mode 100644
index 000000000000..1fc8a1d8932a
--- /dev/null
+++ b/Documentation/vm/arch_pgtable_helpers.rst
@@ -0,0 +1,256 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _arch_page_table_helpers:
+
+===============================
+Architecture Page Table Helpers
+===============================
+
+Generic MM expects architectures (with MMU) to provide helpers to create, access
+and modify page table entries at various levels for different memory functions.
+These page table helpers need to conform to common semantics across platforms.
+The following tables describe the expected semantics, which can also be tested
+during boot via the CONFIG_DEBUG_VM_PGTABLE option. All future changes here or
+in the debug test need to be kept in sync.
+
+======================
+PTE Page Table Helpers
+======================
+
+--------------------------------------------------------------------------------
+| pte_same | Tests whether both PTE entries are the same |
+--------------------------------------------------------------------------------
+| pte_bad | Tests a non-table mapped PTE |
+--------------------------------------------------------------------------------
+| pte_present | Tests a valid mapped PTE |
+--------------------------------------------------------------------------------
+| pte_young | Tests a young PTE |
+--------------------------------------------------------------------------------
+| pte_dirty | Tests a dirty PTE |
+--------------------------------------------------------------------------------
+| pte_write | Tests a writable PTE |
+--------------------------------------------------------------------------------
+| pte_special | Tests a special PTE |
+--------------------------------------------------------------------------------
+| pte_protnone | Tests a PROT_NONE PTE |
+--------------------------------------------------------------------------------
+| pte_devmap | Tests a ZONE_DEVICE mapped PTE |
+--------------------------------------------------------------------------------
+| pte_soft_dirty | Tests a soft dirty PTE |
+--------------------------------------------------------------------------------
+| pte_swp_soft_dirty | Tests a soft dirty swapped PTE |
+--------------------------------------------------------------------------------
+| pte_mkyoung | Creates a young PTE |
+--------------------------------------------------------------------------------
+| pte_mkold | Creates an old PTE |
+--------------------------------------------------------------------------------
+| pte_mkdirty | Creates a dirty PTE |
+--------------------------------------------------------------------------------
+| pte_mkclean | Creates a clean PTE |
+--------------------------------------------------------------------------------
+| pte_mkwrite | Creates a writable PTE |
+--------------------------------------------------------------------------------
+| pte_wrprotect | Creates a write protected PTE |
+--------------------------------------------------------------------------------
+| pte_mkspecial | Creates a special PTE |
+--------------------------------------------------------------------------------
+| pte_mkdevmap | Creates a ZONE_DEVICE mapped PTE |
+--------------------------------------------------------------------------------
+| pte_mksoft_dirty | Creates a soft dirty PTE |
+--------------------------------------------------------------------------------
+| pte_clear_soft_dirty | Clears a soft dirty PTE |
+--------------------------------------------------------------------------------
+| pte_swp_mksoft_dirty | Creates a soft dirty swapped PTE |
+--------------------------------------------------------------------------------
+| pte_swp_clear_soft_dirty | Clears a soft dirty swapped PTE |
+--------------------------------------------------------------------------------
+| pte_mknotpresent | Invalidates a mapped PTE |
+--------------------------------------------------------------------------------
+| ptep_get_and_clear | Clears a PTE |
+--------------------------------------------------------------------------------
+| ptep_get_and_clear_full | Clears a PTE |
+--------------------------------------------------------------------------------
+| ptep_test_and_clear_young | Clears young from a PTE |
+--------------------------------------------------------------------------------
+| ptep_set_wrprotect | Converts into a write protected PTE |
+--------------------------------------------------------------------------------
+| ptep_set_access_flags | Converts into a more permissive PTE |
+--------------------------------------------------------------------------------
+
+======================
+PMD Page Table Helpers
+======================
+
+--------------------------------------------------------------------------------
+| pmd_same | Tests whether both PMD entries are the same |
+--------------------------------------------------------------------------------
+| pmd_bad | Tests a non-table mapped PMD |
+--------------------------------------------------------------------------------
+| pmd_leaf | Tests a leaf mapped PMD |
+--------------------------------------------------------------------------------
+| pmd_huge | Tests a HugeTLB mapped PMD |
+--------------------------------------------------------------------------------
+| pmd_trans_huge | Tests a Transparent Huge Page (THP) at PMD |
+--------------------------------------------------------------------------------
+| pmd_present | Tests a valid mapped PMD |
+--------------------------------------------------------------------------------
+| pmd_young | Tests a young PMD |
+--------------------------------------------------------------------------------
+| pmd_dirty | Tests a dirty PMD |
+--------------------------------------------------------------------------------
+| pmd_write | Tests a writable PMD |
+--------------------------------------------------------------------------------
+| pmd_special | Tests a special PMD |
+--------------------------------------------------------------------------------
+| pmd_protnone | Tests a PROT_NONE PMD |
+--------------------------------------------------------------------------------
+| pmd_devmap | Tests a ZONE_DEVICE mapped PMD |
+--------------------------------------------------------------------------------
+| pmd_soft_dirty | Tests a soft dirty PMD |
+--------------------------------------------------------------------------------
+| pmd_swp_soft_dirty | Tests a soft dirty swapped PMD |
+--------------------------------------------------------------------------------
+| pmd_mkyoung | Creates a young PMD |
+--------------------------------------------------------------------------------
+| pmd_mkold | Creates an old PMD |
+--------------------------------------------------------------------------------
+| pmd_mkdirty | Creates a dirty PMD |
+--------------------------------------------------------------------------------
+| pmd_mkclean | Creates a clean PMD |
+--------------------------------------------------------------------------------
+| pmd_mkwrite | Creates a writable PMD |
+--------------------------------------------------------------------------------
+| pmd_wrprotect | Creates a write protected PMD |
+--------------------------------------------------------------------------------
+| pmd_mkspecial | Creates a special PMD |
+--------------------------------------------------------------------------------
+| pmd_mkdevmap | Creates a ZONE_DEVICE mapped PMD |
+--------------------------------------------------------------------------------
+| pmd_mksoft_dirty | Creates a soft dirty PMD |
+--------------------------------------------------------------------------------
+| pmd_clear_soft_dirty | Clears a soft dirty PMD |
+--------------------------------------------------------------------------------
+| pmd_swp_mksoft_dirty | Creates a soft dirty swapped PMD |
+--------------------------------------------------------------------------------
+| pmd_swp_clear_soft_dirty | Clears a soft dirty swapped PMD |
+--------------------------------------------------------------------------------
+| pmd_mknotpresent | Invalidates a mapped PMD |
+--------------------------------------------------------------------------------
+| pmd_set_huge | Creates a PMD huge mapping |
+--------------------------------------------------------------------------------
+| pmd_clear_huge | Clears a PMD huge mapping |
+--------------------------------------------------------------------------------
+| pmdp_get_and_clear | Clears a PMD |
+--------------------------------------------------------------------------------
+| pmdp_get_and_clear_full | Clears a PMD |
+--------------------------------------------------------------------------------
+| pmdp_test_and_clear_young | Clears young from a PMD |
+--------------------------------------------------------------------------------
+| pmdp_set_wrprotect | Converts into a write protected PMD |
+--------------------------------------------------------------------------------
+| pmdp_set_access_flags | Converts into a more permissive PMD |
+--------------------------------------------------------------------------------
+
+======================
+PUD Page Table Helpers
+======================
+
+--------------------------------------------------------------------------------
+| pud_same | Tests whether both PUD entries are the same |
+--------------------------------------------------------------------------------
+| pud_bad | Tests a non-table mapped PUD |
+--------------------------------------------------------------------------------
+| pud_leaf | Tests a leaf mapped PUD |
+--------------------------------------------------------------------------------
+| pud_huge | Tests a HugeTLB mapped PUD |
+--------------------------------------------------------------------------------
+| pud_trans_huge | Tests a Transparent Huge Page (THP) at PUD |
+--------------------------------------------------------------------------------
+| pud_present | Tests a valid mapped PUD |
+--------------------------------------------------------------------------------
+| pud_young | Tests a young PUD |
+--------------------------------------------------------------------------------
+| pud_dirty | Tests a dirty PUD |
+--------------------------------------------------------------------------------
+| pud_write | Tests a writable PUD |
+--------------------------------------------------------------------------------
+| pud_devmap | Tests a ZONE_DEVICE mapped PUD |
+--------------------------------------------------------------------------------
+| pud_mkyoung | Creates a young PUD |
+--------------------------------------------------------------------------------
+| pud_mkold | Creates an old PUD |
+--------------------------------------------------------------------------------
+| pud_mkdirty | Creates a dirty PUD |
+--------------------------------------------------------------------------------
+| pud_mkclean | Creates a clean PUD |
+--------------------------------------------------------------------------------
+| pud_mkwrite | Creates a writable PUD |
+--------------------------------------------------------------------------------
+| pud_wrprotect | Creates a write protected PUD |
+--------------------------------------------------------------------------------
+| pud_mkdevmap | Creates a ZONE_DEVICE mapped PUD |
+--------------------------------------------------------------------------------
+| pud_mknotpresent | Invalidates a mapped PUD |
+--------------------------------------------------------------------------------
+| pud_set_huge | Creates a PUD huge mapping |
+--------------------------------------------------------------------------------
+| pud_clear_huge | Clears a PUD huge mapping |
+--------------------------------------------------------------------------------
+| pudp_get_and_clear | Clears a PUD |
+--------------------------------------------------------------------------------
+| pudp_get_and_clear_full | Clears a PUD |
+--------------------------------------------------------------------------------
+| pudp_test_and_clear_young | Clears young from a PUD |
+--------------------------------------------------------------------------------
+| pudp_set_wrprotect | Converts into a write protected PUD |
+--------------------------------------------------------------------------------
+| pudp_set_access_flags | Converts into a more permissive PUD |
+--------------------------------------------------------------------------------
+
+==========================
+HugeTLB Page Table Helpers
+==========================
+
+--------------------------------------------------------------------------------
+| pte_huge | Tests a HugeTLB mapped PTE |
+--------------------------------------------------------------------------------
+| pte_mkhuge | Creates a HugeTLB mapped PTE |
+--------------------------------------------------------------------------------
+| huge_pte_dirty | Tests a dirty HugeTLB |
+--------------------------------------------------------------------------------
+| huge_pte_write | Tests a writable HugeTLB |
+--------------------------------------------------------------------------------
+| huge_pte_mkdirty | Creates a dirty HugeTLB |
+--------------------------------------------------------------------------------
+| huge_pte_mkwrite | Creates a writable HugeTLB |
+--------------------------------------------------------------------------------
+| huge_pte_wrprotect | Creates a write protected HugeTLB |
+--------------------------------------------------------------------------------
+| huge_ptep_get_and_clear | Clears a HugeTLB |
+--------------------------------------------------------------------------------
+| huge_ptep_set_wrprotect | Converts into a write protected HugeTLB |
+--------------------------------------------------------------------------------
+| huge_ptep_set_access_flags | Converts into a more permissive HugeTLB |
+--------------------------------------------------------------------------------
+
+========================
+SWAP Page Table Helpers
+========================
+
+--------------------------------------------------------------------------------
+| __pte_to_swp_entry | Creates a swapped entry (arch) from a mapped PTE |
+--------------------------------------------------------------------------------
+| __swp_entry_to_pte | Creates a mapped PTE from a swapped entry (arch) |
+--------------------------------------------------------------------------------
+| __pmd_to_swp_entry | Creates a swapped entry (arch) from a mapped PMD |
+--------------------------------------------------------------------------------
+| __swp_entry_to_pmd | Creates a mapped PMD from a swapped entry (arch) |
+--------------------------------------------------------------------------------
+| is_migration_entry | Tests a migration (read or write) swapped entry |
+--------------------------------------------------------------------------------
+| is_write_migration_entry | Tests a write migration swapped entry |
+--------------------------------------------------------------------------------
+| make_migration_entry_read | Converts into read migration swapped entry |
+--------------------------------------------------------------------------------
+| make_migration_entry | Creates a migration swapped entry (read or write)|
+--------------------------------------------------------------------------------
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 87b4b495333b..7c210f99a812 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -32,16 +32,9 @@
#include <asm/tlbflush.h>

/*
- * Basic operations
- *
- * mkold(entry) = An old and not a young entry
- * mkyoung(entry) = A young and not an old entry
- * mkdirty(entry) = A dirty and not a clean entry
- * mkclean(entry) = A clean and not a dirty entry
- * mkwrite(entry) = A write and not a write protected entry
- * wrprotect(entry) = A write protected and not a write entry
- * pxx_bad(entry) = A mapped and non-table entry
- * pxx_same(entry1, entry2) = Both entries hold the exact same value
+ * Please refer to Documentation/vm/arch_pgtable_helpers.rst for the
+ * semantics expectations that are being validated here. All future
+ * changes here or in the documentation need to be kept in sync.
*/
#define VMFLAGS (VM_READ|VM_WRITE|VM_EXEC)

--
2.20.1

2020-03-24 05:25:03

by Anshuman Khandual

Subject: [PATCH V2 1/3] mm/debug: Add tests validating arch page table helpers for core features

This adds new tests validating arch page table helpers for the following
core memory features. These tests create and test specific mapping types at
various page table levels.

1. SPECIAL mapping
2. PROTNONE mapping
3. DEVMAP mapping
4. SOFTDIRTY mapping
5. SWAP mapping
6. MIGRATION mapping
7. HUGETLB mapping
8. THP mapping

Cc: Andrew Morton <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Suggested-by: Catalin Marinas <[email protected]>
Signed-off-by: Anshuman Khandual <[email protected]>
---
mm/debug_vm_pgtable.c | 291 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 290 insertions(+), 1 deletion(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 98990a515268..15055a8f6478 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -289,6 +289,267 @@ static void __init pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp,
WARN_ON(pmd_bad(pmd));
}

+static void __init pte_special_tests(unsigned long pfn, pgprot_t prot)
+{
+ pte_t pte = pfn_pte(pfn, prot);
+
+ if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL))
+ return;
+
+ WARN_ON(!pte_special(pte_mkspecial(pte)));
+}
+
+static void __init pte_protnone_tests(unsigned long pfn, pgprot_t prot)
+{
+ pte_t pte = pfn_pte(pfn, prot);
+
+ if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+ return;
+
+ WARN_ON(!pte_protnone(pte));
+ WARN_ON(!pte_present(pte));
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void __init pmd_protnone_tests(unsigned long pfn, pgprot_t prot)
+{
+ pmd_t pmd = pfn_pmd(pfn, prot);
+
+ if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+ return;
+
+ WARN_ON(!pmd_protnone(pmd));
+ WARN_ON(!pmd_present(pmd));
+}
+#else
+static void __init pmd_protnone_tests(unsigned long pfn, pgprot_t prot) { }
+#endif
+
+#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
+static void __init pte_devmap_tests(unsigned long pfn, pgprot_t prot)
+{
+ pte_t pte = pfn_pte(pfn, prot);
+
+ WARN_ON(!pte_devmap(pte_mkdevmap(pte)));
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot)
+{
+ pmd_t pmd = pfn_pmd(pfn, prot);
+
+ WARN_ON(!pmd_devmap(pmd_mkdevmap(pmd)));
+}
+
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot)
+{
+ pud_t pud = pfn_pud(pfn, prot);
+
+ WARN_ON(!pud_devmap(pud_mkdevmap(pud)));
+}
+#else
+static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { }
+#endif
+#else
+static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { }
+#endif
+#else
+static void __init pte_devmap_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { }
+#endif
+
+static void __init pte_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
+{
+ pte_t pte = pfn_pte(pfn, prot);
+
+ if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY))
+ return;
+
+ WARN_ON(!pte_soft_dirty(pte_mksoft_dirty(pte)));
+ WARN_ON(pte_soft_dirty(pte_clear_soft_dirty(pte)));
+}
+
+static void __init pte_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
+{
+ pte_t pte = pfn_pte(pfn, prot);
+
+ if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY))
+ return;
+
+ WARN_ON(!pte_swp_soft_dirty(pte_swp_mksoft_dirty(pte)));
+ WARN_ON(pte_swp_soft_dirty(pte_swp_clear_soft_dirty(pte)));
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void __init pmd_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
+{
+ pmd_t pmd = pfn_pmd(pfn, prot);
+
+ if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY))
+ return;
+
+ WARN_ON(!pmd_soft_dirty(pmd_mksoft_dirty(pmd)));
+ WARN_ON(pmd_soft_dirty(pmd_clear_soft_dirty(pmd)));
+}
+
+static void __init pmd_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
+{
+ pmd_t pmd = pfn_pmd(pfn, prot);
+
+ if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY) ||
+ !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
+ return;
+
+ WARN_ON(!pmd_swp_soft_dirty(pmd_swp_mksoft_dirty(pmd)));
+ WARN_ON(pmd_swp_soft_dirty(pmd_swp_clear_soft_dirty(pmd)));
+}
+#else
+static void __init pmd_soft_dirty_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pmd_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
+{
+}
+#endif
+
+static void __init pte_swap_tests(unsigned long pfn, pgprot_t prot)
+{
+ swp_entry_t swp;
+ pte_t pte;
+
+ pte = pfn_pte(pfn, prot);
+ swp = __pte_to_swp_entry(pte);
+ WARN_ON(!pte_same(pte, __swp_entry_to_pte(swp)));
+}
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot)
+{
+ swp_entry_t swp;
+ pmd_t pmd;
+
+ pmd = pfn_pmd(pfn, prot);
+ swp = __pmd_to_swp_entry(pmd);
+ WARN_ON(!pmd_same(pmd, __swp_entry_to_pmd(swp)));
+}
+#else
+static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot) { }
+#endif
+
+static void __init swap_migration_tests(void)
+{
+ struct page *page;
+ swp_entry_t swp;
+
+ if (!IS_ENABLED(CONFIG_MIGRATION))
+ return;
+ /*
+ * swap_migration_tests() requires a dedicated page as it needs to
+ * be locked before creating a migration entry from it. Locking the
+ * page that actually maps kernel text ('start_kernel') can be real
+ * problematic. Lets allocate a dedicated page explicitly for this
+ * purpose that will be freed subsequently.
+ */
+ page = alloc_page(GFP_KERNEL);
+ if (!page) {
+ pr_err("page allocation failed\n");
+ return;
+ }
+
+ /*
+ * make_migration_entry() expects given page to be
+ * locked, otherwise it stumbles upon a BUG_ON().
+ */
+ __SetPageLocked(page);
+ swp = make_migration_entry(page, 1);
+ WARN_ON(!is_migration_entry(swp));
+ WARN_ON(!is_write_migration_entry(swp));
+
+ make_migration_entry_read(&swp);
+ WARN_ON(!is_migration_entry(swp));
+ WARN_ON(is_write_migration_entry(swp));
+
+ swp = make_migration_entry(page, 0);
+ WARN_ON(!is_migration_entry(swp));
+ WARN_ON(is_write_migration_entry(swp));
+ __ClearPageLocked(page);
+ __free_page(page);
+}
+
+#ifdef CONFIG_HUGETLB_PAGE
+static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+ struct page *page;
+ pte_t pte;
+
+ /*
+ * Accessing the page associated with the pfn is safe here,
+ * as it was previously derived from a real kernel symbol.
+ */
+ page = pfn_to_page(pfn);
+ pte = mk_huge_pte(page, prot);
+
+ WARN_ON(!huge_pte_dirty(huge_pte_mkdirty(pte)));
+ WARN_ON(!huge_pte_write(huge_pte_mkwrite(huge_pte_wrprotect(pte))));
+ WARN_ON(huge_pte_write(huge_pte_wrprotect(huge_pte_mkwrite(pte))));
+
+#ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
+ pte = pfn_pte(pfn, prot);
+
+ WARN_ON(!pte_huge(pte_mkhuge(pte)));
+#endif
+}
+#else
+static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot) { }
+#endif
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void __init pmd_thp_tests(unsigned long pfn, pgprot_t prot)
+{
+ pmd_t pmd;
+
+ /*
+ * pmd_trans_huge() and pmd_present() must return positive
+ * after MMU invalidation with pmd_mknotpresent().
+ */
+ pmd = pfn_pmd(pfn, prot);
+ WARN_ON(!pmd_trans_huge(pmd_mkhuge(pmd)));
+
+#ifndef __HAVE_ARCH_PMDP_INVALIDATE
+ WARN_ON(!pmd_trans_huge(pmd_mknotpresent(pmd_mkhuge(pmd))));
+ WARN_ON(!pmd_present(pmd_mknotpresent(pmd_mkhuge(pmd))));
+#endif
+}
+
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+static void __init pud_thp_tests(unsigned long pfn, pgprot_t prot)
+{
+ pud_t pud;
+
+ /*
+ * pud_trans_huge() and pud_present() must return positive
+ * after MMU invalidation with pud_mknotpresent().
+ */
+ pud = pfn_pud(pfn, prot);
+ WARN_ON(!pud_trans_huge(pud_mkhuge(pud)));
+
+ /*
+ * pud_mknotpresent() has been dropped for now. Enable back
+ * these tests when it comes back with a modified pud_present().
+ *
+ * WARN_ON(!pud_trans_huge(pud_mknotpresent(pud_mkhuge(pud))));
+ * WARN_ON(!pud_present(pud_mknotpresent(pud_mkhuge(pud))));
+ */
+}
+#else
+static void __init pud_thp_tests(unsigned long pfn, pgprot_t prot) { }
+#endif
+#else
+static void __init pmd_thp_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pud_thp_tests(unsigned long pfn, pgprot_t prot) { }
+#endif
+
static unsigned long __init get_random_vaddr(void)
{
unsigned long random_vaddr, random_pages, total_user_pages;
@@ -310,7 +571,7 @@ void __init debug_vm_pgtable(void)
pmd_t *pmdp, *saved_pmdp, pmd;
pte_t *ptep;
pgtable_t saved_ptep;
- pgprot_t prot;
+ pgprot_t prot, protnone;
phys_addr_t paddr;
unsigned long vaddr, pte_aligned, pmd_aligned;
unsigned long pud_aligned, p4d_aligned, pgd_aligned;
@@ -325,6 +586,12 @@ void __init debug_vm_pgtable(void)
return;
}

+ /*
+ * __P000 (or even __S000) will help create page table entries with
+ * PROT_NONE permission as required for pxx_protnone_tests().
+ */
+ protnone = __P000;
+
/*
* PFN for mapping at PTE level is determined from a standard kernel
* text symbol. But pfns for higher page table levels are derived by
@@ -380,6 +647,28 @@ void __init debug_vm_pgtable(void)
p4d_populate_tests(mm, p4dp, saved_pudp);
pgd_populate_tests(mm, pgdp, saved_p4dp);

+ pte_special_tests(pte_aligned, prot);
+ pte_protnone_tests(pte_aligned, protnone);
+ pmd_protnone_tests(pmd_aligned, protnone);
+
+ pte_devmap_tests(pte_aligned, prot);
+ pmd_devmap_tests(pmd_aligned, prot);
+ pud_devmap_tests(pud_aligned, prot);
+
+ pte_soft_dirty_tests(pte_aligned, prot);
+ pmd_soft_dirty_tests(pmd_aligned, prot);
+ pte_swap_soft_dirty_tests(pte_aligned, prot);
+ pmd_swap_soft_dirty_tests(pmd_aligned, prot);
+
+ pte_swap_tests(pte_aligned, prot);
+ pmd_swap_tests(pmd_aligned, prot);
+
+ swap_migration_tests();
+ hugetlb_basic_tests(pte_aligned, prot);
+
+ pmd_thp_tests(pmd_aligned, prot);
+ pud_thp_tests(pud_aligned, prot);
+
p4d_free(mm, saved_p4dp);
pud_free(mm, saved_pudp);
pmd_free(mm, saved_pmdp);
--
2.20.1

2020-03-24 05:25:45

by Anshuman Khandual

Subject: [PATCH V2 2/3] mm/debug: Add tests validating arch advanced page table helpers

This adds new tests validating the following arch advanced page table
helpers. These tests create and test specific mapping types at various page
table levels.

1. pxxp_set_wrprotect()
2. pxxp_get_and_clear()
3. pxxp_set_access_flags()
4. pxxp_get_and_clear_full()
5. pxxp_test_and_clear_young()
6. pxx_leaf()
7. pxx_set_huge()
8. pxx_(clear|mk)_savedwrite()
9. huge_pxxp_xxx()

Cc: Andrew Morton <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Suggested-by: Catalin Marinas <[email protected]>
Signed-off-by: Anshuman Khandual <[email protected]>
---
mm/debug_vm_pgtable.c | 290 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 290 insertions(+)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 15055a8f6478..87b4b495333b 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -29,6 +29,7 @@
#include <linux/sched/mm.h>
#include <asm/pgalloc.h>
#include <asm/pgtable.h>
+#include <asm/tlbflush.h>

/*
* Basic operations
@@ -68,6 +69,54 @@ static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
}

+static void __init pte_advanced_tests(struct mm_struct *mm,
+ struct vm_area_struct *vma, pte_t *ptep,
+ unsigned long pfn, unsigned long vaddr, pgprot_t prot)
+{
+ pte_t pte = pfn_pte(pfn, prot);
+
+ pte = pfn_pte(pfn, prot);
+ set_pte_at(mm, vaddr, ptep, pte);
+ ptep_set_wrprotect(mm, vaddr, ptep);
+ pte = READ_ONCE(*ptep);
+ WARN_ON(pte_write(pte));
+
+ pte = pfn_pte(pfn, prot);
+ set_pte_at(mm, vaddr, ptep, pte);
+ ptep_get_and_clear(mm, vaddr, ptep);
+ pte = READ_ONCE(*ptep);
+ WARN_ON(!pte_none(pte));
+
+ pte = pfn_pte(pfn, prot);
+ pte = pte_wrprotect(pte);
+ pte = pte_mkclean(pte);
+ set_pte_at(mm, vaddr, ptep, pte);
+ pte = pte_mkwrite(pte);
+ pte = pte_mkdirty(pte);
+ ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
+ pte = READ_ONCE(*ptep);
+ WARN_ON(!(pte_write(pte) && pte_dirty(pte)));
+
+ pte = pfn_pte(pfn, prot);
+ set_pte_at(mm, vaddr, ptep, pte);
+ ptep_get_and_clear_full(mm, vaddr, ptep, 1);
+ pte = READ_ONCE(*ptep);
+ WARN_ON(!pte_none(pte));
+
+ pte = pte_mkyoung(pte);
+ set_pte_at(mm, vaddr, ptep, pte);
+ ptep_test_and_clear_young(vma, vaddr, ptep);
+ pte = READ_ONCE(*ptep);
+ WARN_ON(pte_young(pte));
+}
+
+static void __init pte_savedwrite_tests(unsigned long pfn, pgprot_t prot)
+{
+ pte_t pte = pfn_pte(pfn, prot);
+
+ WARN_ON(!pte_savedwrite(pte_mk_savedwrite(pte_clear_savedwrite(pte))));
+ WARN_ON(pte_savedwrite(pte_clear_savedwrite(pte_mk_savedwrite(pte))));
+}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
{
@@ -87,6 +136,83 @@ static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
}

+static void __init pmd_advanced_tests(struct mm_struct *mm,
+ struct vm_area_struct *vma, pmd_t *pmdp,
+ unsigned long pfn, unsigned long vaddr, pgprot_t prot)
+{
+ pmd_t pmd = pfn_pmd(pfn, prot);
+
+ /* Align the address wrt HPAGE_PMD_SIZE */
+ vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
+
+ pmd = pfn_pmd(pfn, prot);
+ set_pmd_at(mm, vaddr, pmdp, pmd);
+ pmdp_set_wrprotect(mm, vaddr, pmdp);
+ pmd = READ_ONCE(*pmdp);
+ WARN_ON(pmd_write(pmd));
+
+ pmd = pfn_pmd(pfn, prot);
+ set_pmd_at(mm, vaddr, pmdp, pmd);
+ pmdp_huge_get_and_clear(mm, vaddr, pmdp);
+ pmd = READ_ONCE(*pmdp);
+ WARN_ON(!pmd_none(pmd));
+
+ pmd = pfn_pmd(pfn, prot);
+ pmd = pmd_wrprotect(pmd);
+ pmd = pmd_mkclean(pmd);
+ set_pmd_at(mm, vaddr, pmdp, pmd);
+ pmd = pmd_mkwrite(pmd);
+ pmd = pmd_mkdirty(pmd);
+ pmdp_set_access_flags(vma, vaddr, pmdp, pmd, 1);
+ pmd = READ_ONCE(*pmdp);
+ WARN_ON(!(pmd_write(pmd) && pmd_dirty(pmd)));
+
+ pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
+ set_pmd_at(mm, vaddr, pmdp, pmd);
+ pmdp_huge_get_and_clear_full(mm, vaddr, pmdp, 1);
+ pmd = READ_ONCE(*pmdp);
+ WARN_ON(!pmd_none(pmd));
+
+ pmd = pmd_mkyoung(pmd);
+ set_pmd_at(mm, vaddr, pmdp, pmd);
+ pmdp_test_and_clear_young(vma, vaddr, pmdp);
+ pmd = READ_ONCE(*pmdp);
+ WARN_ON(pmd_young(pmd));
+}
+
+static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot)
+{
+ pmd_t pmd = pfn_pmd(pfn, prot);
+
+ /*
+ * PMD based THP is a leaf entry.
+ */
+ pmd = pmd_mkhuge(pmd);
+ WARN_ON(!pmd_leaf(pmd));
+}
+
+static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
+{
+ pmd_t pmd;
+
+ /*
+ * X86 defined pmd_set_huge() verifies that the given
+ * PMD is not a populated non-leaf entry.
+ */
+ WRITE_ONCE(*pmdp, __pmd(0));
+ WARN_ON(!pmd_set_huge(pmdp, __pfn_to_phys(pfn), prot));
+ WARN_ON(!pmd_clear_huge(pmdp));
+ pmd = READ_ONCE(*pmdp);
+ WARN_ON(!pmd_none(pmd));
+}
+
+static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
+{
+ pmd_t pmd = pfn_pmd(pfn, prot);
+
+ WARN_ON(!pmd_savedwrite(pmd_mk_savedwrite(pmd_clear_savedwrite(pmd))));
+ WARN_ON(pmd_savedwrite(pmd_clear_savedwrite(pmd_mk_savedwrite(pmd))));
+}
#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
{
@@ -107,12 +233,110 @@ static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
*/
WARN_ON(!pud_bad(pud_mkhuge(pud)));
}
+
+static void pud_advanced_tests(struct mm_struct *mm,
+ struct vm_area_struct *vma, pud_t *pudp,
+ unsigned long pfn, unsigned long vaddr, pgprot_t prot)
+{
+ pud_t pud = pfn_pud(pfn, prot);
+
+ /* Align the address wrt HPAGE_PUD_SIZE */
+ vaddr = (vaddr & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE;
+
+ set_pud_at(mm, vaddr, pudp, pud);
+ pudp_set_wrprotect(mm, vaddr, pudp);
+ pud = READ_ONCE(*pudp);
+ WARN_ON(pud_write(pud));
+
+#ifndef __PAGETABLE_PMD_FOLDED
+ pud = pfn_pud(pfn, prot);
+ set_pud_at(mm, vaddr, pudp, pud);
+ pudp_huge_get_and_clear(mm, vaddr, pudp);
+ pud = READ_ONCE(*pudp);
+ WARN_ON(!pud_none(pud));
+
+ pud = pfn_pud(pfn, prot);
+ set_pud_at(mm, vaddr, pudp, pud);
+ pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
+ pud = READ_ONCE(*pudp);
+ WARN_ON(!pud_none(pud));
+#endif
+ pud = pfn_pud(pfn, prot);
+ pud = pud_wrprotect(pud);
+ pud = pud_mkclean(pud);
+ set_pud_at(mm, vaddr, pudp, pud);
+ pud = pud_mkwrite(pud);
+ pud = pud_mkdirty(pud);
+ pudp_set_access_flags(vma, vaddr, pudp, pud, 1);
+ pud = READ_ONCE(*pudp);
+ WARN_ON(!(pud_write(pud) && pud_dirty(pud)));
+
+ pud = pud_mkyoung(pud);
+ set_pud_at(mm, vaddr, pudp, pud);
+ pudp_test_and_clear_young(vma, vaddr, pudp);
+ pud = READ_ONCE(*pudp);
+ WARN_ON(pud_young(pud));
+}
+
+static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot)
+{
+ pud_t pud = pfn_pud(pfn, prot);
+
+ /*
+ * PUD based THP is a leaf entry.
+ */
+ pud = pud_mkhuge(pud);
+ WARN_ON(!pud_leaf(pud));
+}
+
+static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
+{
+ pud_t pud;
+
+ /*
+ * X86 defined pud_set_huge() verifies that the given
+ * PUD is not a populated non-leaf entry.
+ */
+ WRITE_ONCE(*pudp, __pud(0));
+ WARN_ON(!pud_set_huge(pudp, __pfn_to_phys(pfn), prot));
+ WARN_ON(!pud_clear_huge(pudp));
+ pud = READ_ONCE(*pudp);
+ WARN_ON(!pud_none(pud));
+}
#else
static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
+static void pud_advanced_tests(struct mm_struct *mm,
+ struct vm_area_struct *vma, pud_t *pudp,
+ unsigned long pfn, unsigned long vaddr, pgprot_t prot)
+{
+}
+static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
+{
+}
#endif
#else
static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { }
static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pmd_advanced_tests(struct mm_struct *mm,
+ struct vm_area_struct *vma, pmd_t *pmdp,
+ unsigned long pfn, unsigned long vaddr, pgprot_t prot)
+{
+}
+static void __init pud_advanced_tests(struct mm_struct *mm,
+ struct vm_area_struct *vma, pud_t *pudp,
+ unsigned long pfn, unsigned long vaddr, pgprot_t prot)
+{
+}
+static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
+{
+}
+static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
+{
+}
+static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot) { }
#endif

static void __init p4d_basic_tests(unsigned long pfn, pgprot_t prot)
@@ -500,8 +724,52 @@ static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot)
WARN_ON(!pte_huge(pte_mkhuge(pte)));
#endif
}
+
+static void __init hugetlb_advanced_tests(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ pte_t *ptep, unsigned long pfn,
+ unsigned long vaddr, pgprot_t prot)
+{
+ struct page *page = pfn_to_page(pfn);
+ pte_t pte = READ_ONCE(*ptep);
+
+ pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
+ set_huge_pte_at(mm, vaddr, ptep, pte);
+ barrier();
+ WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
+ huge_pte_clear(mm, vaddr, ptep, PMD_SIZE);
+ pte = READ_ONCE(*ptep);
+ WARN_ON(!huge_pte_none(pte));
+
+ pte = mk_huge_pte(page, prot);
+ set_huge_pte_at(mm, vaddr, ptep, pte);
+ huge_ptep_set_wrprotect(mm, vaddr, ptep);
+ pte = READ_ONCE(*ptep);
+ WARN_ON(huge_pte_write(pte));
+
+ pte = mk_huge_pte(page, prot);
+ set_huge_pte_at(mm, vaddr, ptep, pte);
+ huge_ptep_get_and_clear(mm, vaddr, ptep);
+ pte = READ_ONCE(*ptep);
+ WARN_ON(!huge_pte_none(pte));
+
+ pte = mk_huge_pte(page, prot);
+ pte = huge_pte_wrprotect(pte);
+ set_huge_pte_at(mm, vaddr, ptep, pte);
+ pte = huge_pte_mkwrite(pte);
+ pte = huge_pte_mkdirty(pte);
+ huge_ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
+ pte = READ_ONCE(*ptep);
+ WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte)));
+}
#else
static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init hugetlb_advanced_tests(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ pte_t *ptep, unsigned long pfn,
+ unsigned long vaddr, pgprot_t prot)
+{
+}
#endif

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -564,6 +832,7 @@ static unsigned long __init get_random_vaddr(void)

void __init debug_vm_pgtable(void)
{
+ struct vm_area_struct *vma;
struct mm_struct *mm;
pgd_t *pgdp;
p4d_t *p4dp, *saved_p4dp;
@@ -592,6 +861,12 @@ void __init debug_vm_pgtable(void)
*/
protnone = __P000;

+ vma = vm_area_alloc(mm);
+ if (!vma) {
+ pr_err("vma allocation failed\n");
+ return;
+ }
+
/*
* PFN for mapping at PTE level is determined from a standard kernel
* text symbol. But pfns for higher page table levels are derived by
@@ -640,6 +915,20 @@ void __init debug_vm_pgtable(void)
p4d_clear_tests(mm, p4dp);
pgd_clear_tests(mm, pgdp);

+ pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+ pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot);
+ pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
+ hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+
+ pmd_leaf_tests(pmd_aligned, prot);
+ pud_leaf_tests(pud_aligned, prot);
+
+ pmd_huge_tests(pmdp, pmd_aligned, prot);
+ pud_huge_tests(pudp, pud_aligned, prot);
+
+ pte_savedwrite_tests(pte_aligned, prot);
+ pmd_savedwrite_tests(pmd_aligned, prot);
+
pte_unmap_unlock(ptep, ptl);

pmd_populate_tests(mm, pmdp, saved_ptep);
@@ -674,6 +963,7 @@ void __init debug_vm_pgtable(void)
pmd_free(mm, saved_pmdp);
pte_free(mm, saved_ptep);

+ vm_area_free(vma);
mm_dec_nr_puds(mm);
mm_dec_nr_pmds(mm);
mm_dec_nr_ptes(mm);
--
2.20.1

2020-03-24 13:30:56

by Zi Yan

Subject: Re: [PATCH V2 1/3] mm/debug: Add tests validating arch page table helpers for core features

On 24 Mar 2020, at 1:22, Anshuman Khandual wrote:

> This adds new tests validating arch page table helpers for these following
> core memory features. These tests create and test specific mapping types at
> various page table levels.
>
> 1. SPECIAL mapping
> 2. PROTNONE mapping
> 3. DEVMAP mapping
> 4. SOFTDIRTY mapping
> 5. SWAP mapping
> 6. MIGRATION mapping
> 7. HUGETLB mapping
> 8. THP mapping
>
> Cc: Andrew Morton <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Cc: Vineet Gupta <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Heiko Carstens <[email protected]>
> Cc: Vasily Gorbik <[email protected]>
> Cc: Christian Borntraeger <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Kirill A. Shutemov <[email protected]>
> Cc: Paul Walmsley <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Suggested-by: Catalin Marinas <[email protected]>
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> mm/debug_vm_pgtable.c | 291 +++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 290 insertions(+), 1 deletion(-)
>
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 98990a515268..15055a8f6478 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -289,6 +289,267 @@ static void __init pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp,
> WARN_ON(pmd_bad(pmd));
> }
>
> +static void __init pte_special_tests(unsigned long pfn, pgprot_t prot)
> +{
> + pte_t pte = pfn_pte(pfn, prot);
> +
> + if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL))
> + return;
> +
> + WARN_ON(!pte_special(pte_mkspecial(pte)));
> +}
> +
> +static void __init pte_protnone_tests(unsigned long pfn, pgprot_t prot)
> +{
> + pte_t pte = pfn_pte(pfn, prot);
> +
> + if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
> + return;
> +
> + WARN_ON(!pte_protnone(pte));
> + WARN_ON(!pte_present(pte));
> +}
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +static void __init pmd_protnone_tests(unsigned long pfn, pgprot_t prot)
> +{
> + pmd_t pmd = pfn_pmd(pfn, prot);
> +
> + if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
> + return;
> +
> + WARN_ON(!pmd_protnone(pmd));
> + WARN_ON(!pmd_present(pmd));
> +}
> +#else
> +static void __init pmd_protnone_tests(unsigned long pfn, pgprot_t prot) { }
> +#endif
> +
> +#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
> +static void __init pte_devmap_tests(unsigned long pfn, pgprot_t prot)
> +{
> + pte_t pte = pfn_pte(pfn, prot);
> +
> + WARN_ON(!pte_devmap(pte_mkdevmap(pte)));
> +}
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot)
> +{
> + pmd_t pmd = pfn_pmd(pfn, prot);
> +
> + WARN_ON(!pmd_devmap(pmd_mkdevmap(pmd)));
> +}
> +
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot)
> +{
> + pud_t pud = pfn_pud(pfn, prot);
> +
> + WARN_ON(!pud_devmap(pud_mkdevmap(pud)));
> +}
> +#else
> +static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { }
> +#endif
> +#else
> +static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot) { }
> +static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { }
> +#endif
> +#else
> +static void __init pte_devmap_tests(unsigned long pfn, pgprot_t prot) { }
> +static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot) { }
> +static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { }
> +#endif
> +
> +static void __init pte_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
> +{
> + pte_t pte = pfn_pte(pfn, prot);
> +
> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY))
> + return;
> +
> + WARN_ON(!pte_soft_dirty(pte_mksoft_dirty(pte)));
> + WARN_ON(pte_soft_dirty(pte_clear_soft_dirty(pte)));
> +}
> +
> +static void __init pte_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
> +{
> + pte_t pte = pfn_pte(pfn, prot);
> +
> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY))
> + return;
> +
> + WARN_ON(!pte_swp_soft_dirty(pte_swp_mksoft_dirty(pte)));
> + WARN_ON(pte_swp_soft_dirty(pte_swp_clear_soft_dirty(pte)));
> +}
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +static void __init pmd_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
> +{
> + pmd_t pmd = pfn_pmd(pfn, prot);
> +
> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY))
> + return;
> +
> + WARN_ON(!pmd_soft_dirty(pmd_mksoft_dirty(pmd)));
> + WARN_ON(pmd_soft_dirty(pmd_clear_soft_dirty(pmd)));
> +}
> +
> +static void __init pmd_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
> +{
> + pmd_t pmd = pfn_pmd(pfn, prot);
> +
> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY) ||
> + !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
> + return;
> +
> + WARN_ON(!pmd_swp_soft_dirty(pmd_swp_mksoft_dirty(pmd)));
> + WARN_ON(pmd_swp_soft_dirty(pmd_swp_clear_soft_dirty(pmd)));
> +}
> +#else
> +static void __init pmd_soft_dirty_tests(unsigned long pfn, pgprot_t prot) { }
> +static void __init pmd_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
> +{
> +}
> +#endif
> +
> +static void __init pte_swap_tests(unsigned long pfn, pgprot_t prot)
> +{
> + swp_entry_t swp;
> + pte_t pte;
> +
> + pte = pfn_pte(pfn, prot);
> + swp = __pte_to_swp_entry(pte);
> + WARN_ON(!pte_same(pte, __swp_entry_to_pte(swp)));
> +}
> +
> +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> +static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot)
> +{
> + swp_entry_t swp;
> + pmd_t pmd;
> +
> + pmd = pfn_pmd(pfn, prot);
> + swp = __pmd_to_swp_entry(pmd);
> + WARN_ON(!pmd_same(pmd, __swp_entry_to_pmd(swp)));
> +}
> +#else
> +static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot) { }
> +#endif
> +
> +static void __init swap_migration_tests(void)
> +{
> + struct page *page;
> + swp_entry_t swp;
> +
> + if (!IS_ENABLED(CONFIG_MIGRATION))
> + return;
> + /*
> + * swap_migration_tests() requires a dedicated page as it needs to
> + * be locked before creating a migration entry from it. Locking the
> + * page that actually maps kernel text ('start_kernel') can be real
> + * problematic. Lets allocate a dedicated page explicitly for this
> + * purpose that will be freed subsequently.
> + */
> + page = alloc_page(GFP_KERNEL);
> + if (!page) {
> + pr_err("page allocation failed\n");
> + return;
> + }
> +
> + /*
> + * make_migration_entry() expects given page to be
> + * locked, otherwise it stumbles upon a BUG_ON().
> + */
> + __SetPageLocked(page);
> + swp = make_migration_entry(page, 1);
> + WARN_ON(!is_migration_entry(swp));
> + WARN_ON(!is_write_migration_entry(swp));
> +
> + make_migration_entry_read(&swp);
> + WARN_ON(!is_migration_entry(swp));
> + WARN_ON(is_write_migration_entry(swp));
> +
> + swp = make_migration_entry(page, 0);
> + WARN_ON(!is_migration_entry(swp));
> + WARN_ON(is_write_migration_entry(swp));
> + __ClearPageLocked(page);
> + __free_page(page);
> +}
> +
> +#ifdef CONFIG_HUGETLB_PAGE
> +static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot)
> +{
> + struct page *page;
> + pte_t pte;
> +
> + /*
> + * Accessing the page associated with the pfn is safe here,
> + * as it was previously derived from a real kernel symbol.
> + */
> + page = pfn_to_page(pfn);
> + pte = mk_huge_pte(page, prot);
> +
> + WARN_ON(!huge_pte_dirty(huge_pte_mkdirty(pte)));
> + WARN_ON(!huge_pte_write(huge_pte_mkwrite(huge_pte_wrprotect(pte))));
> + WARN_ON(huge_pte_write(huge_pte_wrprotect(huge_pte_mkwrite(pte))));
> +
> +#ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
> + pte = pfn_pte(pfn, prot);
> +
> + WARN_ON(!pte_huge(pte_mkhuge(pte)));
> +#endif
> +}
> +#else
> +static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot) { }
> +#endif
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +static void __init pmd_thp_tests(unsigned long pfn, pgprot_t prot)
> +{
> + pmd_t pmd;
> +
> + /*
> + * pmd_trans_huge() and pmd_present() must return positive
> + * after MMU invalidation with pmd_mknotpresent().
> + */
> + pmd = pfn_pmd(pfn, prot);
> + WARN_ON(!pmd_trans_huge(pmd_mkhuge(pmd)));
> +
> +#ifndef __HAVE_ARCH_PMDP_INVALIDATE
> + WARN_ON(!pmd_trans_huge(pmd_mknotpresent(pmd_mkhuge(pmd))));
> + WARN_ON(!pmd_present(pmd_mknotpresent(pmd_mkhuge(pmd))));
> +#endif

I think we need a better comment here, because requiring pmd_trans_huge() and
pmd_present() to return true after pmd_mknotpresent() is not straightforward.

According to Andrea Arcangeli’s email (https://lore.kernel.org/linux-mm/[email protected]/),
this behavior is an optimization for transparent huge pages.
pmd_trans_huge() must be true whenever pmd_page() returns a valid THP, so that
others walking over non-transhuge pmds (i.e. when there are no THPs allocated)
can avoid taking the pmd_lock. In particular, when we split a THP and remove
the present bit from the pmd, pmd_trans_huge() still needs to return true, and
pmd_present() should be true whenever pmd_trans_huge() returns true.

I think it is also worth putting either Andrea’s email or a link to it
in the rst file in your 3rd patch. It is good documentation for this special
case.

--
Best Regards,
Yan Zi



2020-03-26 02:20:33

by Anshuman Khandual

Subject: Re: [PATCH V2 1/3] mm/debug: Add tests validating arch page table helpers for core features


On 03/24/2020 06:59 PM, Zi Yan wrote:
> On 24 Mar 2020, at 1:22, Anshuman Khandual wrote:
>
>> This adds new tests validating arch page table helpers for these following
>> core memory features. These tests create and test specific mapping types at
>> various page table levels.
>>
>> 1. SPECIAL mapping
>> 2. PROTNONE mapping
>> 3. DEVMAP mapping
>> 4. SOFTDIRTY mapping
>> 5. SWAP mapping
>> 6. MIGRATION mapping
>> 7. HUGETLB mapping
>> 8. THP mapping
>>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Mike Rapoport <[email protected]>
>> Cc: Vineet Gupta <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: Benjamin Herrenschmidt <[email protected]>
>> Cc: Paul Mackerras <[email protected]>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: Heiko Carstens <[email protected]>
>> Cc: Vasily Gorbik <[email protected]>
>> Cc: Christian Borntraeger <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Borislav Petkov <[email protected]>
>> Cc: "H. Peter Anvin" <[email protected]>
>> Cc: Kirill A. Shutemov <[email protected]>
>> Cc: Paul Walmsley <[email protected]>
>> Cc: Palmer Dabbelt <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Suggested-by: Catalin Marinas <[email protected]>
>> Signed-off-by: Anshuman Khandual <[email protected]>
>> ---
>> mm/debug_vm_pgtable.c | 291 +++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 290 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>> index 98990a515268..15055a8f6478 100644
>> --- a/mm/debug_vm_pgtable.c
>> +++ b/mm/debug_vm_pgtable.c
>> @@ -289,6 +289,267 @@ static void __init pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp,
>> WARN_ON(pmd_bad(pmd));
>> }
>>
>> +static void __init pte_special_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> + pte_t pte = pfn_pte(pfn, prot);
>> +
>> + if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL))
>> + return;
>> +
>> + WARN_ON(!pte_special(pte_mkspecial(pte)));
>> +}
>> +
>> +static void __init pte_protnone_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> + pte_t pte = pfn_pte(pfn, prot);
>> +
>> + if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
>> + return;
>> +
>> + WARN_ON(!pte_protnone(pte));
>> + WARN_ON(!pte_present(pte));
>> +}
>> +
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +static void __init pmd_protnone_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> + pmd_t pmd = pfn_pmd(pfn, prot);
>> +
>> + if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
>> + return;
>> +
>> + WARN_ON(!pmd_protnone(pmd));
>> + WARN_ON(!pmd_present(pmd));
>> +}
>> +#else
>> +static void __init pmd_protnone_tests(unsigned long pfn, pgprot_t prot) { }
>> +#endif
>> +
>> +#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
>> +static void __init pte_devmap_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> + pte_t pte = pfn_pte(pfn, prot);
>> +
>> + WARN_ON(!pte_devmap(pte_mkdevmap(pte)));
>> +}
>> +
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> + pmd_t pmd = pfn_pmd(pfn, prot);
>> +
>> + WARN_ON(!pmd_devmap(pmd_mkdevmap(pmd)));
>> +}
>> +
>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>> +static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> + pud_t pud = pfn_pud(pfn, prot);
>> +
>> + WARN_ON(!pud_devmap(pud_mkdevmap(pud)));
>> +}
>> +#else
>> +static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { }
>> +#endif
>> +#else
>> +static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot) { }
>> +static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { }
>> +#endif
>> +#else
>> +static void __init pte_devmap_tests(unsigned long pfn, pgprot_t prot) { }
>> +static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot) { }
>> +static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { }
>> +#endif
>> +
>> +static void __init pte_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> + pte_t pte = pfn_pte(pfn, prot);
>> +
>> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY))
>> + return;
>> +
>> + WARN_ON(!pte_soft_dirty(pte_mksoft_dirty(pte)));
>> + WARN_ON(pte_soft_dirty(pte_clear_soft_dirty(pte)));
>> +}
>> +
>> +static void __init pte_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> + pte_t pte = pfn_pte(pfn, prot);
>> +
>> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY))
>> + return;
>> +
>> + WARN_ON(!pte_swp_soft_dirty(pte_swp_mksoft_dirty(pte)));
>> + WARN_ON(pte_swp_soft_dirty(pte_swp_clear_soft_dirty(pte)));
>> +}
>> +
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +static void __init pmd_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> + pmd_t pmd = pfn_pmd(pfn, prot);
>> +
>> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY))
>> + return;
>> +
>> + WARN_ON(!pmd_soft_dirty(pmd_mksoft_dirty(pmd)));
>> + WARN_ON(pmd_soft_dirty(pmd_clear_soft_dirty(pmd)));
>> +}
>> +
>> +static void __init pmd_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> + pmd_t pmd = pfn_pmd(pfn, prot);
>> +
>> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY) ||
>> + !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
>> + return;
>> +
>> + WARN_ON(!pmd_swp_soft_dirty(pmd_swp_mksoft_dirty(pmd)));
>> + WARN_ON(pmd_swp_soft_dirty(pmd_swp_clear_soft_dirty(pmd)));
>> +}
>> +#else
>> +static void __init pmd_soft_dirty_tests(unsigned long pfn, pgprot_t prot) { }
>> +static void __init pmd_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +}
>> +#endif
>> +
>> +static void __init pte_swap_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> + swp_entry_t swp;
>> + pte_t pte;
>> +
>> + pte = pfn_pte(pfn, prot);
>> + swp = __pte_to_swp_entry(pte);
>> + WARN_ON(!pte_same(pte, __swp_entry_to_pte(swp)));
>> +}
>> +
>> +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>> +static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> + swp_entry_t swp;
>> + pmd_t pmd;
>> +
>> + pmd = pfn_pmd(pfn, prot);
>> + swp = __pmd_to_swp_entry(pmd);
>> + WARN_ON(!pmd_same(pmd, __swp_entry_to_pmd(swp)));
>> +}
>> +#else
>> +static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot) { }
>> +#endif
>> +
>> +static void __init swap_migration_tests(void)
>> +{
>> + struct page *page;
>> + swp_entry_t swp;
>> +
>> + if (!IS_ENABLED(CONFIG_MIGRATION))
>> + return;
>> + /*
>> + * swap_migration_tests() requires a dedicated page as it needs to
>> + * be locked before creating a migration entry from it. Locking the
>> + * page that actually maps kernel text ('start_kernel') can be real
>> + * problematic. Lets allocate a dedicated page explicitly for this
>> + * purpose that will be freed subsequently.
>> + */
>> + page = alloc_page(GFP_KERNEL);
>> + if (!page) {
>> + pr_err("page allocation failed\n");
>> + return;
>> + }
>> +
>> + /*
>> + * make_migration_entry() expects given page to be
>> + * locked, otherwise it stumbles upon a BUG_ON().
>> + */
>> + __SetPageLocked(page);
>> + swp = make_migration_entry(page, 1);
>> + WARN_ON(!is_migration_entry(swp));
>> + WARN_ON(!is_write_migration_entry(swp));
>> +
>> + make_migration_entry_read(&swp);
>> + WARN_ON(!is_migration_entry(swp));
>> + WARN_ON(is_write_migration_entry(swp));
>> +
>> + swp = make_migration_entry(page, 0);
>> + WARN_ON(!is_migration_entry(swp));
>> + WARN_ON(is_write_migration_entry(swp));
>> + __ClearPageLocked(page);
>> + __free_page(page);
>> +}
>> +
>> +#ifdef CONFIG_HUGETLB_PAGE
>> +static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> + struct page *page;
>> + pte_t pte;
>> +
>> + /*
>> + * Accessing the page associated with the pfn is safe here,
>> + * as it was previously derived from a real kernel symbol.
>> + */
>> + page = pfn_to_page(pfn);
>> + pte = mk_huge_pte(page, prot);
>> +
>> + WARN_ON(!huge_pte_dirty(huge_pte_mkdirty(pte)));
>> + WARN_ON(!huge_pte_write(huge_pte_mkwrite(huge_pte_wrprotect(pte))));
>> + WARN_ON(huge_pte_write(huge_pte_wrprotect(huge_pte_mkwrite(pte))));
>> +
>> +#ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
>> + pte = pfn_pte(pfn, prot);
>> +
>> + WARN_ON(!pte_huge(pte_mkhuge(pte)));
>> +#endif
>> +}
>> +#else
>> +static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot) { }
>> +#endif
>> +
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +static void __init pmd_thp_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> + pmd_t pmd;
>> +
>> + /*
>> + * pmd_trans_huge() and pmd_present() must return positive
>> + * after MMU invalidation with pmd_mknotpresent().
>> + */
>> + pmd = pfn_pmd(pfn, prot);
>> + WARN_ON(!pmd_trans_huge(pmd_mkhuge(pmd)));
>> +
>> +#ifndef __HAVE_ARCH_PMDP_INVALIDATE
>> + WARN_ON(!pmd_trans_huge(pmd_mknotpresent(pmd_mkhuge(pmd))));
>> + WARN_ON(!pmd_present(pmd_mknotpresent(pmd_mkhuge(pmd))));
>> +#endif
>
> I think we need a better comment here, because requiring pmd_trans_huge() and
> pmd_present() returning true after pmd_mknotpresent() is not straightforward.

Thats right.

>
> According to Andrea Arcangeli’s email (https://lore.kernel.org/linux-mm/[email protected]/),
> This behavior is an optimization for transparent huge page.
> pmd_trans_huge() must be true if pmd_page() returns you a valid THP to avoid
> taking the pmd_lock when others walk over non transhuge pmds (i.e. there are no
> THP allocated). Especially when we split a THP, removing the present bit from
> the pmd, pmd_trans_huge() still needs to return true. pmd_present() should
> be true whenever pmd_trans_huge() returns true.

Sure, will modify the existing comment here like this.

/*
* pmd_trans_huge() and pmd_present() must return positive after
* MMU invalidation with pmd_mknotpresent(). This behavior is an
* optimization for transparent huge page. pmd_trans_huge() must
* be true if pmd_page() returns a valid THP to avoid taking the
* pmd_lock when others walk over non transhuge pmds (i.e. there
* are no THP allocated). Especially when splitting a THP and
* removing the present bit from the pmd, pmd_trans_huge() still
* needs to return true. pmd_present() should be true whenever
* pmd_trans_huge() returns true.
*/

>
> I think it is also worth either putting Andres’s email or the link to it
> in the rst file in your 3rd patch. It is a good documentation for this special
> case.

Makes sense. Will update Andrea's email link in the .rst file as well.

>
> —
> Best Regards,
> Yan Zi
>

2020-03-26 02:24:43

by Anshuman Khandual

Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests



On 03/24/2020 10:52 AM, Anshuman Khandual wrote:
> This series adds more arch page table helper tests. The new tests here are
> either related to core memory functions and advanced arch pgtable helpers.
> This also creates a documentation file enlisting all expected semantics as
> suggested by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).
>
> This series has been tested on arm64 and x86 platforms.

If folks can test these patches out on remaining ARCH_HAS_DEBUG_VM_PGTABLE
enabled platforms i.e. s390, arc, powerpc (32 and 64), that will be really
appreciated. Thank you.

- Anshuman

2020-03-26 15:26:58

by Christophe Leroy

Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests



On 26/03/2020 at 03:23, Anshuman Khandual wrote:
>
>
> On 03/24/2020 10:52 AM, Anshuman Khandual wrote:
>> This series adds more arch page table helper tests. The new tests here are
>> either related to core memory functions and advanced arch pgtable helpers.
>> This also creates a documentation file enlisting all expected semantics as
>> suggested by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).
>>
>> This series has been tested on arm64 and x86 platforms.
>
> If folks can test these patches out on remaining ARCH_HAS_DEBUG_VM_PGTABLE
> enabled platforms i.e s390, arc, powerpc (32 and 64), that will be really
> appreciated. Thank you.
>

On powerpc 8xx (PPC32), I get:

[ 53.338368] debug_vm_pgtable: debug_vm_pgtable: Validating architecture page table helpers
[ 53.347403] ------------[ cut here ]------------
[ 53.351832] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:647 debug_vm_pgtable+0x280/0x3f4
[ 53.360140] CPU: 0 PID: 1 Comm: swapper Not tainted 5.6.0-rc7-s3k-dev-01090-g92710e99881f #3544
[ 53.368718] NIP: c0777c04 LR: c0777bb8 CTR: 00000000
[ 53.373720] REGS: c9023df0 TRAP: 0700 Not tainted (5.6.0-rc7-s3k-dev-01090-g92710e99881f)
[ 53.382042] MSR: 00029032 <EE,ME,IR,DR,RI> CR: 22000222 XER: 20000000
[ 53.388667]
[ 53.388667] GPR00: c0777bb8 c9023ea8 c6120000 00000001 1e410000 00000000 00000000 007641c9
[ 53.388667] GPR08: 00000000 00000001 00000000 ffffffff 82000222 00000000 c00039b8 00000000
[ 53.388667] GPR16: 00000000 00000000 00000000 fffffff0 065fc000 1e410000 c6600000 000001e4
[ 53.388667] GPR24: 000001d9 c062d14c c65fc000 c642d448 000006c9 00000000 c65f8000 c65fc040
[ 53.423400] NIP [c0777c04] debug_vm_pgtable+0x280/0x3f4
[ 53.428559] LR [c0777bb8] debug_vm_pgtable+0x234/0x3f4
[ 53.433593] Call Trace:
[ 53.436048] [c9023ea8] [c0777bb8] debug_vm_pgtable+0x234/0x3f4 (unreliable)
[ 53.442936] [c9023f28] [c00039e0] kernel_init+0x28/0x124
[ 53.448184] [c9023f38] [c000f174] ret_from_kernel_thread+0x14/0x1c
[ 53.454245] Instruction dump:
[ 53.457180] 41a20008 4bea3ed9 62890021 7d36b92e 7d36b82e 71290fd0 3149ffff 7d2a4910
[ 53.464838] 0f090000 5789077e 3149ffff 7d2a4910 <0f090000> 38c00000 38a00000 38800000
[ 53.472671] ---[ end trace fd5dd92744dc0065 ]---
[ 53.519778] Freeing unused kernel memory: 608K

2020-03-26 17:10:50

by Zi Yan

Subject: Re: [PATCH V2 1/3] mm/debug: Add tests validating arch page table helpers for core features

On 25 Mar 2020, at 22:18, Anshuman Khandual wrote:

> External email: Use caution opening links or attachments
>
>
> On 03/24/2020 06:59 PM, Zi Yan wrote:
>> On 24 Mar 2020, at 1:22, Anshuman Khandual wrote:
>>
>>> This adds new tests validating arch page table helpers for these following
>>> core memory features. These tests create and test specific mapping types at
>>> various page table levels.
>>>
>>> 1. SPECIAL mapping
>>> 2. PROTNONE mapping
>>> 3. DEVMAP mapping
>>> 4. SOFTDIRTY mapping
>>> 5. SWAP mapping
>>> 6. MIGRATION mapping
>>> 7. HUGETLB mapping
>>> 8. THP mapping
>>>
>>> Cc: Andrew Morton <[email protected]>
>>> Cc: Mike Rapoport <[email protected]>
>>> Cc: Vineet Gupta <[email protected]>
>>> Cc: Catalin Marinas <[email protected]>
>>> Cc: Will Deacon <[email protected]>
>>> Cc: Benjamin Herrenschmidt <[email protected]>
>>> Cc: Paul Mackerras <[email protected]>
>>> Cc: Michael Ellerman <[email protected]>
>>> Cc: Heiko Carstens <[email protected]>
>>> Cc: Vasily Gorbik <[email protected]>
>>> Cc: Christian Borntraeger <[email protected]>
>>> Cc: Thomas Gleixner <[email protected]>
>>> Cc: Ingo Molnar <[email protected]>
>>> Cc: Borislav Petkov <[email protected]>
>>> Cc: "H. Peter Anvin" <[email protected]>
>>> Cc: Kirill A. Shutemov <[email protected]>
>>> Cc: Paul Walmsley <[email protected]>
>>> Cc: Palmer Dabbelt <[email protected]>
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Suggested-by: Catalin Marinas <[email protected]>
>>> Signed-off-by: Anshuman Khandual <[email protected]>
>>> ---
>>> mm/debug_vm_pgtable.c | 291 +++++++++++++++++++++++++++++++++++++++++-
>>> 1 file changed, 290 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>> index 98990a515268..15055a8f6478 100644
>>> --- a/mm/debug_vm_pgtable.c
>>> +++ b/mm/debug_vm_pgtable.c
>>> @@ -289,6 +289,267 @@ static void __init pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp,
>>> WARN_ON(pmd_bad(pmd));
>>> }
>>>
>>> +static void __init pte_special_tests(unsigned long pfn, pgprot_t prot)
>>> +{
>>> + pte_t pte = pfn_pte(pfn, prot);
>>> +
>>> + if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL))
>>> + return;
>>> +
>>> + WARN_ON(!pte_special(pte_mkspecial(pte)));
>>> +}
>>> +
>>> +static void __init pte_protnone_tests(unsigned long pfn, pgprot_t prot)
>>> +{
>>> + pte_t pte = pfn_pte(pfn, prot);
>>> +
>>> + if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
>>> + return;
>>> +
>>> + WARN_ON(!pte_protnone(pte));
>>> + WARN_ON(!pte_present(pte));
>>> +}
>>> +
>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> +static void __init pmd_protnone_tests(unsigned long pfn, pgprot_t prot)
>>> +{
>>> + pmd_t pmd = pfn_pmd(pfn, prot);
>>> +
>>> + if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
>>> + return;
>>> +
>>> + WARN_ON(!pmd_protnone(pmd));
>>> + WARN_ON(!pmd_present(pmd));
>>> +}
>>> +#else
>>> +static void __init pmd_protnone_tests(unsigned long pfn, pgprot_t prot) { }
>>> +#endif
>>> +
>>> +#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
>>> +static void __init pte_devmap_tests(unsigned long pfn, pgprot_t prot)
>>> +{
>>> + pte_t pte = pfn_pte(pfn, prot);
>>> +
>>> + WARN_ON(!pte_devmap(pte_mkdevmap(pte)));
>>> +}
>>> +
>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> +static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot)
>>> +{
>>> + pmd_t pmd = pfn_pmd(pfn, prot);
>>> +
>>> + WARN_ON(!pmd_devmap(pmd_mkdevmap(pmd)));
>>> +}
>>> +
>>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>> +static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot)
>>> +{
>>> + pud_t pud = pfn_pud(pfn, prot);
>>> +
>>> + WARN_ON(!pud_devmap(pud_mkdevmap(pud)));
>>> +}
>>> +#else
>>> +static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { }
>>> +#endif
>>> +#else
>>> +static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot) { }
>>> +static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { }
>>> +#endif
>>> +#else
>>> +static void __init pte_devmap_tests(unsigned long pfn, pgprot_t prot) { }
>>> +static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot) { }
>>> +static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { }
>>> +#endif
>>> +
>>> +static void __init pte_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
>>> +{
>>> + pte_t pte = pfn_pte(pfn, prot);
>>> +
>>> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY))
>>> + return;
>>> +
>>> + WARN_ON(!pte_soft_dirty(pte_mksoft_dirty(pte)));
>>> + WARN_ON(pte_soft_dirty(pte_clear_soft_dirty(pte)));
>>> +}
>>> +
>>> +static void __init pte_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
>>> +{
>>> + pte_t pte = pfn_pte(pfn, prot);
>>> +
>>> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY))
>>> + return;
>>> +
>>> + WARN_ON(!pte_swp_soft_dirty(pte_swp_mksoft_dirty(pte)));
>>> + WARN_ON(pte_swp_soft_dirty(pte_swp_clear_soft_dirty(pte)));
>>> +}
>>> +
>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> +static void __init pmd_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
>>> +{
>>> + pmd_t pmd = pfn_pmd(pfn, prot);
>>> +
>>> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY))
>>> + return;
>>> +
>>> + WARN_ON(!pmd_soft_dirty(pmd_mksoft_dirty(pmd)));
>>> + WARN_ON(pmd_soft_dirty(pmd_clear_soft_dirty(pmd)));
>>> +}
>>> +
>>> +static void __init pmd_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
>>> +{
>>> + pmd_t pmd = pfn_pmd(pfn, prot);
>>> +
>>> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY) ||
>>> + !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
>>> + return;
>>> +
>>> + WARN_ON(!pmd_swp_soft_dirty(pmd_swp_mksoft_dirty(pmd)));
>>> + WARN_ON(pmd_swp_soft_dirty(pmd_swp_clear_soft_dirty(pmd)));
>>> +}
>>> +#else
>>> +static void __init pmd_soft_dirty_tests(unsigned long pfn, pgprot_t prot) { }
>>> +static void __init pmd_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
>>> +{
>>> +}
>>> +#endif
>>> +
>>> +static void __init pte_swap_tests(unsigned long pfn, pgprot_t prot)
>>> +{
>>> + swp_entry_t swp;
>>> + pte_t pte;
>>> +
>>> + pte = pfn_pte(pfn, prot);
>>> + swp = __pte_to_swp_entry(pte);
>>> + WARN_ON(!pte_same(pte, __swp_entry_to_pte(swp)));
>>> +}
>>> +
>>> +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>>> +static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot)
>>> +{
>>> + swp_entry_t swp;
>>> + pmd_t pmd;
>>> +
>>> + pmd = pfn_pmd(pfn, prot);
>>> + swp = __pmd_to_swp_entry(pmd);
>>> + WARN_ON(!pmd_same(pmd, __swp_entry_to_pmd(swp)));
>>> +}
>>> +#else
>>> +static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot) { }
>>> +#endif
>>> +
>>> +static void __init swap_migration_tests(void)
>>> +{
>>> + struct page *page;
>>> + swp_entry_t swp;
>>> +
>>> + if (!IS_ENABLED(CONFIG_MIGRATION))
>>> + return;
>>> + /*
>>> + * swap_migration_tests() requires a dedicated page as it needs to
>>> + * be locked before creating a migration entry from it. Locking the
>>> + * page that actually maps kernel text ('start_kernel') can be really
>>> + * problematic. Let's allocate a dedicated page explicitly for this
>>> + * purpose that will be freed subsequently.
>>> + */
>>> + page = alloc_page(GFP_KERNEL);
>>> + if (!page) {
>>> + pr_err("page allocation failed\n");
>>> + return;
>>> + }
>>> +
>>> + /*
>>> + * make_migration_entry() expects given page to be
>>> + * locked, otherwise it stumbles upon a BUG_ON().
>>> + */
>>> + __SetPageLocked(page);
>>> + swp = make_migration_entry(page, 1);
>>> + WARN_ON(!is_migration_entry(swp));
>>> + WARN_ON(!is_write_migration_entry(swp));
>>> +
>>> + make_migration_entry_read(&swp);
>>> + WARN_ON(!is_migration_entry(swp));
>>> + WARN_ON(is_write_migration_entry(swp));
>>> +
>>> + swp = make_migration_entry(page, 0);
>>> + WARN_ON(!is_migration_entry(swp));
>>> + WARN_ON(is_write_migration_entry(swp));
>>> + __ClearPageLocked(page);
>>> + __free_page(page);
>>> +}
>>> +
>>> +#ifdef CONFIG_HUGETLB_PAGE
>>> +static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot)
>>> +{
>>> + struct page *page;
>>> + pte_t pte;
>>> +
>>> + /*
>>> + * Accessing the page associated with the pfn is safe here,
>>> + * as it was previously derived from a real kernel symbol.
>>> + */
>>> + page = pfn_to_page(pfn);
>>> + pte = mk_huge_pte(page, prot);
>>> +
>>> + WARN_ON(!huge_pte_dirty(huge_pte_mkdirty(pte)));
>>> + WARN_ON(!huge_pte_write(huge_pte_mkwrite(huge_pte_wrprotect(pte))));
>>> + WARN_ON(huge_pte_write(huge_pte_wrprotect(huge_pte_mkwrite(pte))));
>>> +
>>> +#ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
>>> + pte = pfn_pte(pfn, prot);
>>> +
>>> + WARN_ON(!pte_huge(pte_mkhuge(pte)));
>>> +#endif
>>> +}
>>> +#else
>>> +static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot) { }
>>> +#endif
>>> +
>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> +static void __init pmd_thp_tests(unsigned long pfn, pgprot_t prot)
>>> +{
>>> + pmd_t pmd;
>>> +
>>> + /*
>>> + * pmd_trans_huge() and pmd_present() must return positive
>>> + * after MMU invalidation with pmd_mknotpresent().
>>> + */
>>> + pmd = pfn_pmd(pfn, prot);
>>> + WARN_ON(!pmd_trans_huge(pmd_mkhuge(pmd)));
>>> +
>>> +#ifndef __HAVE_ARCH_PMDP_INVALIDATE
>>> + WARN_ON(!pmd_trans_huge(pmd_mknotpresent(pmd_mkhuge(pmd))));
>>> + WARN_ON(!pmd_present(pmd_mknotpresent(pmd_mkhuge(pmd))));
>>> +#endif
>>
>> I think we need a better comment here, because requiring pmd_trans_huge() and
>> pmd_present() returning true after pmd_mknotpresent() is not straightforward.
>
> That's right.
>
>>
>> According to Andrea Arcangeli’s email (https://lore.kernel.org/linux-mm/[email protected]/),
>> This behavior is an optimization for transparent huge page.
>> pmd_trans_huge() must be true if pmd_page() returns you a valid THP to avoid
>> taking the pmd_lock when others walk over non transhuge pmds (i.e. there are no
>> THP allocated). Especially when we split a THP, removing the present bit from
>> the pmd, pmd_trans_huge() still needs to return true. pmd_present() should
>> be true whenever pmd_trans_huge() returns true.
>
> Sure, will modify the existing comment here like this.
>
> /*
> * pmd_trans_huge() and pmd_present() must return positive after
> * MMU invalidation with pmd_mknotpresent(). This behavior is an
> * optimization for transparent huge page. pmd_trans_huge() must
> * be true if pmd_page() returns a valid THP to avoid taking the
> * pmd_lock when others walk over non transhuge pmds (i.e. there
> * are no THP allocated). Especially when splitting a THP and
> * removing the present bit from the pmd, pmd_trans_huge() still
> * needs to return true. pmd_present() should be true whenever
> * pmd_trans_huge() returns true.
> */
>
>>
>> I think it is also worth either putting Andrea’s email or the link to it
>> in the rst file in your 3rd patch. It is a good documentation for this special
>> case.
>
> Makes sense. Will update Andrea's email link in the .rst file as well.
>

Looks good to me. Thanks.

Reviewed-by: Zi Yan <[email protected]>

—
Best Regards,
Yan Zi


Attachments:
signature.asc (871.00 B)
OpenPGP digital signature

2020-03-27 06:47:20

by Anshuman Khandual

Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests


On 03/26/2020 08:53 PM, Christophe Leroy wrote:
>
>
> On 26/03/2020 at 03:23, Anshuman Khandual wrote:
>>
>>
>> On 03/24/2020 10:52 AM, Anshuman Khandual wrote:
>>> This series adds more arch page table helper tests. The new tests here are
>>> either related to core memory functions and advanced arch pgtable helpers.
>>> This also creates a documentation file enlisting all expected semantics as
>>> suggested by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).
>>>
>>> This series has been tested on arm64 and x86 platforms.
>>
>> If folks can test these patches out on remaining ARCH_HAS_DEBUG_VM_PGTABLE
>> enabled platforms i.e s390, arc, powerpc (32 and 64), that will be really
>> appreciated. Thank you.
>>
>
> On powerpc 8xx (PPC32), I get:
>
> [   53.338368] debug_vm_pgtable: debug_vm_pgtable: Validating architecture page table helpers
> [   53.347403] ------------[ cut here ]------------
> [   53.351832] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:647 debug_vm_pgtable+0x280/0x3f4

mm/debug_vm_pgtable.c:647 ?

With the following commits in place

53a8338ce (HEAD) Documentation/mm: Add descriptions for arch page table helper
5d4913fc1 mm/debug: Add tests validating arch advanced page table helpers
bcaf120a7 mm/debug: Add tests validating arch page table helpers for core features
d6ed5a4a5 x86/memory: Drop pud_mknotpresent()
0739d1f8d mm/debug: Add tests validating architecture page table helpers
16fbf79b0 (tag: v5.6-rc7) Linux 5.6-rc7

mm/debug_vm_pgtable.c:647 is here.

#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot)
{
	swp_entry_t swp;
	pmd_t pmd; -----------------------------> Line #647

	pmd = pfn_pmd(pfn, prot);
	swp = __pmd_to_swp_entry(pmd);
	WARN_ON(!pmd_same(pmd, __swp_entry_to_pmd(swp)));
}
#else
static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot) { }
#endif

Did I miss something ?

> [   53.360140] CPU: 0 PID: 1 Comm: swapper Not tainted 5.6.0-rc7-s3k-dev-01090-g92710e99881f #3544
> [   53.368718] NIP:  c0777c04 LR: c0777bb8 CTR: 00000000
> [   53.373720] REGS: c9023df0 TRAP: 0700   Not tainted (5.6.0-rc7-s3k-dev-01090-g92710e99881f)
> [   53.382042] MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 22000222  XER: 20000000
> [   53.388667]
> [   53.388667] GPR00: c0777bb8 c9023ea8 c6120000 00000001 1e410000 00000000 00000000 007641c9
> [   53.388667] GPR08: 00000000 00000001 00000000 ffffffff 82000222 00000000 c00039b8 00000000
> [   53.388667] GPR16: 00000000 00000000 00000000 fffffff0 065fc000 1e410000 c6600000 000001e4
> [   53.388667] GPR24: 000001d9 c062d14c c65fc000 c642d448 000006c9 00000000 c65f8000 c65fc040
> [   53.423400] NIP [c0777c04] debug_vm_pgtable+0x280/0x3f4
> [   53.428559] LR [c0777bb8] debug_vm_pgtable+0x234/0x3f4
> [   53.433593] Call Trace:
> [   53.436048] [c9023ea8] [c0777bb8] debug_vm_pgtable+0x234/0x3f4 (unreliable)
> [   53.442936] [c9023f28] [c00039e0] kernel_init+0x28/0x124
> [   53.448184] [c9023f38] [c000f174] ret_from_kernel_thread+0x14/0x1c
> [   53.454245] Instruction dump:
> [   53.457180] 41a20008 4bea3ed9 62890021 7d36b92e 7d36b82e 71290fd0 3149ffff 7d2a4910
> [   53.464838] 0f090000 5789077e 3149ffff 7d2a4910 <0f090000> 38c00000 38a00000 38800000
> [   53.472671] ---[ end trace fd5dd92744dc0065 ]---
Could you please point me to the exact test which is failing ?

> [   53.519778] Freeing unused kernel memory: 608K
>
>
So I assume that the system should have come up to runtime just fine, apart from
the above warning message.

2020-03-27 07:01:24

by Christophe Leroy

Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests



On 03/27/2020 06:46 AM, Anshuman Khandual wrote:
>
> On 03/26/2020 08:53 PM, Christophe Leroy wrote:
>>
>>
>> On 26/03/2020 at 03:23, Anshuman Khandual wrote:
>>>
>>>
>>> On 03/24/2020 10:52 AM, Anshuman Khandual wrote:
>>>> This series adds more arch page table helper tests. The new tests here are
>>>> either related to core memory functions and advanced arch pgtable helpers.
>>>> This also creates a documentation file enlisting all expected semantics as
>>>> suggested by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).
>>>>
>>>> This series has been tested on arm64 and x86 platforms.
>>>
>>> If folks can test these patches out on remaining ARCH_HAS_DEBUG_VM_PGTABLE
>>> enabled platforms i.e s390, arc, powerpc (32 and 64), that will be really
>>> appreciated. Thank you.
>>>
>>
>> On powerpc 8xx (PPC32), I get:
>>
>> [   53.338368] debug_vm_pgtable: debug_vm_pgtable: Validating architecture page table helpers
>> [   53.347403] ------------[ cut here ]------------
>> [   53.351832] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:647 debug_vm_pgtable+0x280/0x3f4
>
> mm/debug_vm_pgtable.c:647 ?
>
> With the following commits in place
>
> 53a8338ce (HEAD) Documentation/mm: Add descriptions for arch page table helper
> 5d4913fc1 mm/debug: Add tests validating arch advanced page table helpers
> bcaf120a7 mm/debug: Add tests validating arch page table helpers for core features
> d6ed5a4a5 x86/memory: Drop pud_mknotpresent()
> 0739d1f8d mm/debug: Add tests validating architecture page table helpers
> 16fbf79b0 (tag: v5.6-rc7) Linux 5.6-rc7

I have:

facaa5eb5909 (HEAD -> helpers0) mm/debug: Add tests validating arch advanced page table helpers
6389fed515fc mm/debug: Add tests validating arch page table helpers for core features
dc14ecc8b94e mm/debug: add tests validating architecture page table helpers
c6624071c338 (origin/merge, merge) Automatic merge of branches 'master', 'next' and 'fixes' into merge
58e05c5508e6 Automatic merge of branches 'master', 'next' and 'fixes' into merge
1b649e0bcae7 (origin/master, origin/HEAD) Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

origin is https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git

I can't see your last patch in the powerpc mailing list
(https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166237)

>
> mm/debug_vm_pgtable.c:647 is here.

Line 647 is:

WARN_ON(!pte_same(pte, __swp_entry_to_pte(swp)));


>
> #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot)
> {
> swp_entry_t swp;
> pmd_t pmd; -----------------------------> Line #647
>
> pmd = pfn_pmd(pfn, prot);
> swp = __pmd_to_swp_entry(pmd);
> WARN_ON(!pmd_same(pmd, __swp_entry_to_pmd(swp)));
> }
> #else
> static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot) { }
> #endif
>
> Did I miss something ?
>

[...]

> Could you please point me to the exact test which is failing ?
>
>> [   53.519778] Freeing unused kernel memory: 608K
>>
>>
> So I assume that the system should have come up to runtime just fine, apart from
> the above warning message.
>

Yes it boots fine otherwise.

Christophe

2020-03-27 07:43:45

by Chen, Rong A

Subject: [mm/debug] d157503f6f: WARNING:at_mm/debug_vm_pgtable.c:#debug_vm_pgtable

FYI, we noticed the following commit (built with gcc-7):

commit: d157503f6ff935d0ab4e0c3f324099a458597b5d ("[PATCH V2 2/3] mm/debug: Add tests validating arch advanced page table helpers")
url: https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-debug-Add-more-arch-page-table-helper-tests/20200324-161233


in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+-----------------------------------------------------------+------------+------------+
| | f675f2f91d | d157503f6f |
+-----------------------------------------------------------+------------+------------+
| boot_successes | 0 | 0 |
| boot_failures | 4 | 4 |
| Kernel_panic-not_syncing:Attempted_to_kill_init!exitcode= | 4 | 4 |
| WARNING:at_mm/debug_vm_pgtable.c:#debug_vm_pgtable | 0 | 4 |
| EIP:debug_vm_pgtable | 0 | 4 |
+-----------------------------------------------------------+------------+------------+


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 25.884391] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:203 debug_vm_pgtable+0x320/0x51a
[ 25.886281] Modules linked in:
[ 25.886838] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc7-next-20200323-00002-gd157503f6ff93 #1
[ 25.888378] EIP: debug_vm_pgtable+0x320/0x51a
[ 25.889113] Code: 00 00 00 83 f8 01 19 c0 25 00 f0 3f 00 2d 00 00 40 00 f7 d0 21 d0 a8 20 74 02 0f 0b 8b 45 f0 e8 3b ff 31 fe c7 03 00 00 00 00 <0f> 0b 0f 0b 8b 03 85 c0 74 02 0f 0b 8b 45 f0 e8 21 ff 31 fe 8b 45
[ 25.892453] EAX: 00000025 EBX: e9160444 ECX: e9160444 EDX: 00000000
[ 25.893652] ESI: 44800000 EDI: 16800080 EBP: c0265fa4 ESP: c0265f7c
[ 25.894842] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246
[ 25.900205] CR0: 80050033 CR2: b7d17b20 CR3: 16ae4000 CR4: 000006d0
[ 25.901289] Call Trace:
[ 25.901728] ? rest_init+0x110/0x110
[ 25.902395] kernel_init+0x12/0x110
[ 25.903065] ret_from_fork+0x2e/0x40
[ 25.903797] ---[ end trace a6228a4356840124 ]---


To reproduce:

# build kernel
cd linux
cp config-5.6.0-rc7-next-20200323-00002-gd157503f6ff93 .config
make HOSTCC=gcc-7 CC=gcc-7 ARCH=i386 olddefconfig prepare modules_prepare bzImage

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Rong Chen


Attachments:
(No filename) (2.76 kB)
config-5.6.0-rc7-next-20200323-00002-gd157503f6ff93 (141.19 kB)
job-script (4.71 kB)
dmesg.xz (15.99 kB)

2020-03-29 14:23:11

by Anshuman Khandual

Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests


On 03/27/2020 12:30 PM, Christophe Leroy wrote:
>
>
> On 03/27/2020 06:46 AM, Anshuman Khandual wrote:
>>
>> On 03/26/2020 08:53 PM, Christophe Leroy wrote:
>>>
>>>
>>> On 26/03/2020 at 03:23, Anshuman Khandual wrote:
>>>>
>>>>
>>>> On 03/24/2020 10:52 AM, Anshuman Khandual wrote:
>>>>> This series adds more arch page table helper tests. The new tests here are
>>>>> either related to core memory functions and advanced arch pgtable helpers.
>>>>> This also creates a documentation file enlisting all expected semantics as
>>>>> suggested by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).
>>>>>
>>>>> This series has been tested on arm64 and x86 platforms.
>>>>
>>>> If folks can test these patches out on remaining ARCH_HAS_DEBUG_VM_PGTABLE
>>>> enabled platforms i.e s390, arc, powerpc (32 and 64), that will be really
>>>> appreciated. Thank you.
>>>>
>>>
>>> On powerpc 8xx (PPC32), I get:
>>>
>>> [   53.338368] debug_vm_pgtable: debug_vm_pgtable: Validating architecture page table helpers
>>> [   53.347403] ------------[ cut here ]------------
>>> [   53.351832] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:647 debug_vm_pgtable+0x280/0x3f4
>>
>> mm/debug_vm_pgtable.c:647 ?
>>
>> With the following commits in place
>>
>> 53a8338ce (HEAD) Documentation/mm: Add descriptions for arch page table helper
>> 5d4913fc1 mm/debug: Add tests validating arch advanced page table helpers
>> bcaf120a7 mm/debug: Add tests validating arch page table helpers for core features
>> d6ed5a4a5 x86/memory: Drop pud_mknotpresent()
>> 0739d1f8d mm/debug: Add tests validating architecture page table helpers
>> 16fbf79b0 (tag: v5.6-rc7) Linux 5.6-rc7
>
> I have:
>
> facaa5eb5909 (HEAD -> helpers0) mm/debug: Add tests validating arch advanced page table helpers
> 6389fed515fc mm/debug: Add tests validating arch page table helpers for core features
> dc14ecc8b94e mm/debug: add tests validating architecture page table helpers
> c6624071c338 (origin/merge, merge) Automatic merge of branches 'master', 'next' and 'fixes' into merge
> 58e05c5508e6 Automatic merge of branches 'master', 'next' and 'fixes' into merge
> 1b649e0bcae7 (origin/master, origin/HEAD) Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
>
> origin is https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
>
> I can't see your last patch in powerpc mailing list (https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166237)

My bad, did not update the last patch with required lists (will fix).

>
>>
>> mm/debug_vm_pgtable.c:647 is here.
>
> Line 647 is:
>
>     WARN_ON(!pte_same(pte, __swp_entry_to_pte(swp)));

Both sets of definitions suggest that the low three bits (if set) of the PTE
will be discarded during the PTE->SWP->PTE conversion, which might be what
leads to this mismatch and the subsequent failure.

arch/powerpc/include/asm/nohash/32/pgtable.h
arch/powerpc/include/asm/book3s/32/pgtable.h

#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) >> 3 })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 })

There are also some more architectures (microblaze, sh, etc.) where these
conversions are not always value-preserving. On powerpc64, the conversion sets
_PAGE_PTE back irrespective of whether the bit was originally set or not.

It is probably wrong to expect that the PTE->SWP->PTE conversion will always
be preserving. So I am wondering whether it is worth changing this test to
accommodate all such architectures, or just dropping it instead.

>
>
>>
>> #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>> static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot)
>> {
>>          swp_entry_t swp;
>>          pmd_t pmd;  -----------------------------> Line #647
>>
>>          pmd = pfn_pmd(pfn, prot);
>>          swp = __pmd_to_swp_entry(pmd);
>>          WARN_ON(!pmd_same(pmd, __swp_entry_to_pmd(swp)));
>> }
>> #else
>> static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot) { }
>> #endif
>>
>> Did I miss something ?
>>
>
> [...]
>
>> Could you please point me to the exact test which is failing ?
>>
>>> [   53.519778] Freeing unused kernel memory: 608K
>>>
>>>
>> So I assume that the system should have come till runtime just fine apart from
>> the above warning message because.
>>
>
> Yes it boots fine otherwise.

Cool, that is good to know.

>
> Christophe
>

2020-03-30 08:58:55

by Chen, Rong A

Subject: [mm/debug] f675f2f91d: WARNING:at_mm/debug_vm_pgtable.c:#debug_vm_pgtable

FYI, we noticed the following commit (built with gcc-7):

commit: f675f2f91d0457e2b430936f0d756457fdcad1ca ("[PATCH V2 1/3] mm/debug: Add tests validating arch page table helpers for core features")
url: https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-debug-Add-more-arch-page-table-helper-tests/20200324-161233


in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+-------------------------------------------------------------------------------+---------------+------------+
| | next-20200323 | f675f2f91d |
+-------------------------------------------------------------------------------+---------------+------------+
| boot_successes | 1 | 0 |
| boot_failures | 3 | 4 |
| WARNING:suspicious_RCU_usage | 3 | 4 |
| drivers/char/ipmi/ipmi_msghandler.c:#RCU-list_traversed_in_non-reader_section | 3 | 4 |
| net/ipv4/fib_trie.c:#RCU-list_traversed_in_non-reader_section | 3 | 2 |
| WARNING:at_mm/debug_vm_pgtable.c:#debug_vm_pgtable | 0 | 4 |
| RIP:debug_vm_pgtable | 0 | 4 |
| calltrace:hrtimers_dead_cpu | 0 | 1 |
| calltrace:irq_exit | 0 | 3 |
+-------------------------------------------------------------------------------+---------------+------------+


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 283.486118] WARNING: CPU: 1 PID: 1 at mm/debug_vm_pgtable.c:371 debug_vm_pgtable+0x4dc/0x7e3
[ 283.487342] Modules linked in:
[ 283.487752] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc7-next-20200323-00001-gf675f2f91d045 #1
[ 283.488817] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 283.489794] RIP: 0010:debug_vm_pgtable+0x4dc/0x7e3
[ 283.490361] Code: b5 fd 48 8b 7d d0 be 20 01 00 00 e8 3d 9f b5 fd 48 8b 75 c8 48 8b 7d d0 e8 30 9f b5 fd 48 8b 75 c8 48 8b 7d d0 e8 23 9f b5 fd <0f> 0b 48 8b 75 c8 48 8b 7d d0 e8 14 9f b5 fd 0f 0b 48 8b 75 c8 48
[ 283.492577] RSP: 0000:ffff888236493ed8 EFLAGS: 00010202
[ 283.493235] RAX: 00000001e1d31025 RBX: ffff88823e7f6cd8 RCX: ffffffffffffffff
[ 283.494135] RDX: 0000000000000000 RSI: 0000000000000025 RDI: 00000001e1d31000
[ 283.495002] RBP: ffff888236493f38 R08: 0000000000000001 R09: 0000000000000001
[ 283.495858] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88821d907000
[ 283.496748] R13: ffff88821d8fc498 R14: ffff88821d8fda90 R15: ffff88821d8fc000
[ 283.497614] FS: 0000000000000000(0000) GS:ffff888237800000(0000) knlGS:0000000000000000
[ 283.498585] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 283.499290] CR2: 00000000ffffffff CR3: 00000001e1222000 CR4: 00000000000406e0
[ 283.500165] Call Trace:
[ 283.500499] ? rest_init+0x240/0x240
[ 283.500985] kernel_init+0x13/0x110
[ 283.501433] ret_from_fork+0x24/0x30
[ 283.501907] irq event stamp: 4760776
[ 283.502366] hardirqs last enabled at (4760775): [<ffffffffb481e34d>] _raw_spin_unlock_irqrestore+0x4d/0x60
[ 283.511686] hardirqs last disabled at (4760776): [<ffffffffb3c038d4>] trace_hardirqs_off_thunk+0x1a/0x1c
[ 283.512914] softirqs last enabled at (4760748): [<ffffffffb4c002cf>] __do_softirq+0x2cf/0x4ad
[ 283.514086] softirqs last disabled at (4760741): [<ffffffffb3cf4f4d>] irq_exit+0xcd/0xe0
[ 283.515114] ---[ end trace 7e3383c4261f8faa ]---


To reproduce:

# build kernel
cd linux
cp config-5.6.0-rc7-next-20200323-00001-gf675f2f91d045 .config
make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Rong Chen


Attachments:
(No filename) (4.53 kB)
config-5.6.0-rc7-next-20200323-00001-gf675f2f91d045 (135.55 kB)
job-script (4.63 kB)
dmesg.xz (49.82 kB)
trinity (275.96 kB)

2020-03-31 12:31:46

by Gerald Schaefer

Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests

On Tue, 24 Mar 2020 10:52:52 +0530
Anshuman Khandual <[email protected]> wrote:

> This series adds more arch page table helper tests. The new tests here are
> either related to core memory functions and advanced arch pgtable helpers.
> This also creates a documentation file enlisting all expected semantics as
> suggested by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).
>
> This series has been tested on arm64 and x86 platforms. There is just one
> expected failure on arm64 that will be fixed when we enable THP migration.
>
> [ 21.741634] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:782
>
> which corresponds to
>
> WARN_ON(!pmd_present(pmd_mknotpresent(pmd_mkhuge(pmd))))
>
> There are many TRANSPARENT_HUGEPAGE and ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD
> ifdefs scattered across the test. But consolidating all the fallback stubs
> is not very straight forward because ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD is
> not explicitly dependent on ARCH_HAS_TRANSPARENT_HUGEPAGE.
>
> This series has been build tested on many platforms including the ones that
> subscribe the test through ARCH_HAS_DEBUG_VM_PGTABLE.
>

Hi Anshuman,

thanks for the update. There are a couple of issues on s390, some might
also affect other archs.

1) The pxd_huge_tests are using pxd_set/clear_huge, which defaults to
returning 0 if !CONFIG_HAVE_ARCH_HUGE_VMAP. As result, the checks for
!pxd_test/clear_huge in the pxd_huge_tests will always trigger the
warning. This should affect all archs w/o CONFIG_HAVE_ARCH_HUGE_VMAP.
Could be fixed like this:

@@ -923,8 +923,10 @@ void __init debug_vm_pgtable(void)
pmd_leaf_tests(pmd_aligned, prot);
pud_leaf_tests(pud_aligned, prot);

- pmd_huge_tests(pmdp, pmd_aligned, prot);
- pud_huge_tests(pudp, pud_aligned, prot);
+ if (IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP)) {
+ pmd_huge_tests(pmdp, pmd_aligned, prot);
+ pud_huge_tests(pudp, pud_aligned, prot);
+ }

pte_savedwrite_tests(pte_aligned, prot);
pmd_savedwrite_tests(pmd_aligned, prot);

BTW, please add some comments to the various #ifdef/#else stuff, especially
when the different parts are far away and/or nested.

2) The hugetlb_advanced_test will fail because it directly de-references
huge *ptep pointers instead of using huge_ptep_get() for this. We have
very different pagetable entry layout for pte and (large) pmd on s390,
and unfortunately the whole hugetlbfs code is using pte_t instead of pmd_t
like THP. For this reason, huge_ptep_get() was introduced, which will
return a "converted" pte, because directly reading from a *ptep (pointing
to a large pmd) will not return a proper pte. Only ARM also has an
implementation of huge_ptep_get(), so they could be affected, depending
on what exactly they need it for.

Could be fixed like this (the first de-reference is a bit special,
because at that point *ptep does not really point to a large (pmd) entry
yet, it is initially an invalid pte entry, which breaks our huge_ptep_get()
conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE,
because we do have some special bits there in our large pmds. It seems
to also work w/o that alignment, but it feels a bit wrong):

@@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test
unsigned long vaddr, pgprot_t prot)
{
struct page *page = pfn_to_page(pfn);
- pte_t pte = READ_ONCE(*ptep);
+ pte_t pte;

- pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
+ pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot));
set_huge_pte_at(mm, vaddr, ptep, pte);
barrier();
WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
huge_pte_clear(mm, vaddr, ptep, PMD_SIZE);
- pte = READ_ONCE(*ptep);
+ pte = huge_ptep_get(ptep);
WARN_ON(!huge_pte_none(pte));

pte = mk_huge_pte(page, prot);
set_huge_pte_at(mm, vaddr, ptep, pte);
huge_ptep_set_wrprotect(mm, vaddr, ptep);
- pte = READ_ONCE(*ptep);
+ pte = huge_ptep_get(ptep);
WARN_ON(huge_pte_write(pte));

pte = mk_huge_pte(page, prot);
set_huge_pte_at(mm, vaddr, ptep, pte);
huge_ptep_get_and_clear(mm, vaddr, ptep);
- pte = READ_ONCE(*ptep);
+ pte = huge_ptep_get(ptep);
WARN_ON(!huge_pte_none(pte));

pte = mk_huge_pte(page, prot);
@@ -759,7 +759,7 @@ static void __init hugetlb_advanced_test
pte = huge_pte_mkwrite(pte);
pte = huge_pte_mkdirty(pte);
huge_ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
- pte = READ_ONCE(*ptep);
+ pte = huge_ptep_get(ptep);
WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte)));
}
#else

3) The pmd_protnone_tests() has an issue, because it passes a pmd to
pmd_protnone() which has not been marked as large. We check for large
pmd in the s390 implementation of pmd_protnone(), and will fail if a
pmd is not large. We had similar issues before, in other helpers, where
I changed the logic on s390 to not require the pmd large check, but I'm
not so sure in this case. Is there a valid use case for doing
pmd_protnone() on "normal" pmds? Or could this be changed like this:

@@ -537,7 +537,7 @@ static void __init pte_protnone_tests(un
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static void __init pmd_protnone_tests(unsigned long pfn, pgprot_t prot)
{
- pmd_t pmd = pfn_pmd(pfn, prot);
+ pmd_t pmd = mk_huge_pmd(pfn_to_page(pfn), prot);

if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
return;

Regards,
Gerald

2020-04-05 12:53:01

by Anshuman Khandual

Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests

On 03/31/2020 06:00 PM, Gerald Schaefer wrote:
> On Tue, 24 Mar 2020 10:52:52 +0530
> Anshuman Khandual <[email protected]> wrote:
>
>> This series adds more arch page table helper tests. The new tests here are
>> either related to core memory functions and advanced arch pgtable helpers.
>> This also creates a documentation file enlisting all expected semantics as
>> suggested by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).
>>
>> This series has been tested on arm64 and x86 platforms. There is just one
>> expected failure on arm64 that will be fixed when we enable THP migration.
>>
>> [ 21.741634] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:782
>>
>> which corresponds to
>>
>> WARN_ON(!pmd_present(pmd_mknotpresent(pmd_mkhuge(pmd))))
>>
>> There are many TRANSPARENT_HUGEPAGE and ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD
>> ifdefs scattered across the test. But consolidating all the fallback stubs
>> is not very straight forward because ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD is
>> not explicitly dependent on ARCH_HAS_TRANSPARENT_HUGEPAGE.
>>
>> This series has been build tested on many platforms including the ones that
>> subscribe the test through ARCH_HAS_DEBUG_VM_PGTABLE.
>>
>
> Hi Anshuman,
>
> thanks for the update. There are a couple of issues on s390, some might
> also affect other archs.

Sure, thanks for taking a look and giving it a spin on s390.

>
> 1) The pxd_huge_tests are using pxd_set/clear_huge, which defaults to
> returning 0 if !CONFIG_HAVE_ARCH_HUGE_VMAP. As result, the checks for
> !pxd_test/clear_huge in the pxd_huge_tests will always trigger the
> warning. This should affect all archs w/o CONFIG_HAVE_ARCH_HUGE_VMAP.
> Could be fixed like this:
>
> @@ -923,8 +923,10 @@ void __init debug_vm_pgtable(void)
> pmd_leaf_tests(pmd_aligned, prot);
> pud_leaf_tests(pud_aligned, prot);
>
> - pmd_huge_tests(pmdp, pmd_aligned, prot);
> - pud_huge_tests(pudp, pud_aligned, prot);
> + if (IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP)) {
> + pmd_huge_tests(pmdp, pmd_aligned, prot);
> + pud_huge_tests(pudp, pud_aligned, prot);
> + }

That is correct. It was an omission on my part and will fix it.

>
> pte_savedwrite_tests(pte_aligned, prot);
> pmd_savedwrite_tests(pmd_aligned, prot);
>
> BTW, please add some comments to the various #ifdef/#else stuff, especially
> when the different parts are far away and/or nested.

Sure, will do.

>
> 2) The hugetlb_advanced_test will fail because it directly de-references
> huge *ptep pointers instead of using huge_ptep_get() for this. We have
> very different pagetable entry layout for pte and (large) pmd on s390,
> and unfortunately the whole hugetlbfs code is using pte_t instead of pmd_t
> like THP. For this reason, huge_ptep_get() was introduced, which will
> return a "converted" pte, because directly reading from a *ptep (pointing
> to a large pmd) will not return a proper pte. Only ARM has also an
> implementation of huge_ptep_get(), so they could be affected, depending
> on what exactly they need it for.

Currently, we don't support ARM (32). But as huge_ptep_get() already has a
fallback, it's better to use that than a direct READ_ONCE().

>
> Could be fixed like this (the first de-reference is a bit special,
> because at that point *ptep does not really point to a large (pmd) entry
> yet, it is initially an invalid pte entry, which breaks our huge_ptep_get()

There seems to be an inconsistency on the s390 platform. Even though it defines
a huge_ptep_get() override, it does not subscribe to __HAVE_ARCH_HUGE_PTEP_GET,
which should have forced it to fall back on the generic huge_ptep_get(), but it
does not :) Then I realized that __HAVE_ARCH_HUGE_PTEP_GET only makes sense when
an arch uses <asm-generic/hugetlb.h>. s390 does not use that and hence gets
away with its own huge_ptep_get() without __HAVE_ARCH_HUGE_PTEP_GET. Sounds
confusing? But I might not have the entire context here.

> conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE,
> because we do have some special bits there in our large pmds. It seems
> to also work w/o that alignment, but it feels a bit wrong):

Sure, we can accommodate that.

>
> @@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test
> unsigned long vaddr, pgprot_t prot)
> {
> struct page *page = pfn_to_page(pfn);
> - pte_t pte = READ_ONCE(*ptep);
> + pte_t pte;
>
> - pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
> + pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot));

So that keeps the existing value in the 'ptep' pointer at bay and instead
constructs a PTE from scratch. I would rather have READ_ONCE(*ptep) at
least provide the seed that can be ORed with RANDOM_ORVALUE before
being masked with PMD_MASK. Do you see any problem?

Something like this instead:

pte_t pte = READ_ONCE(*ptep);
pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));

We cannot use mk_pte_phys() as it is defined only on some platforms
without any generic fallback for others.

> set_huge_pte_at(mm, vaddr, ptep, pte);
> barrier();
> WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
> huge_pte_clear(mm, vaddr, ptep, PMD_SIZE);
> - pte = READ_ONCE(*ptep);
> + pte = huge_ptep_get(ptep);
> WARN_ON(!huge_pte_none(pte));
>
> pte = mk_huge_pte(page, prot);
> set_huge_pte_at(mm, vaddr, ptep, pte);
> huge_ptep_set_wrprotect(mm, vaddr, ptep);
> - pte = READ_ONCE(*ptep);
> + pte = huge_ptep_get(ptep);
> WARN_ON(huge_pte_write(pte));
>
> pte = mk_huge_pte(page, prot);
> set_huge_pte_at(mm, vaddr, ptep, pte);
> huge_ptep_get_and_clear(mm, vaddr, ptep);
> - pte = READ_ONCE(*ptep);
> + pte = huge_ptep_get(ptep);
> WARN_ON(!huge_pte_none(pte));
>
> pte = mk_huge_pte(page, prot);
> @@ -759,7 +759,7 @@ static void __init hugetlb_advanced_test
> pte = huge_pte_mkwrite(pte);
> pte = huge_pte_mkdirty(pte);
> huge_ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
> - pte = READ_ONCE(*ptep);
> + pte = huge_ptep_get(ptep);
> WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte)));
> }
> #else
>
> 3) The pmd_protnone_tests() has an issue, because it passes a pmd to
> pmd_protnone() which has not been marked as large. We check for large
> pmd in the s390 implementation of pmd_protnone(), and will fail if a
> pmd is not large. We had similar issues before, in other helpers, where
> I changed the logic on s390 to not require the pmd large check, but I'm
> not so sure in this case. Is there a valid use case for doing
> pmd_protnone() on "normal" pmds? Or could this be changed like this:

That is a valid question. IIUC, all existing callers of pmd_protnone()
ensure that it is indeed a huge PMD. But even assuming otherwise, should
not the huge PMD requirement be checked in the caller itself rather than
in the arch helper, which is just supposed to check the existence of the
dedicated PTE bit(s) for this purpose? Purely from a helper perspective,
pmd_protnone() should not really care about being large, even though it
might never get used without one.

Also, all platforms (except s390) derive pmd_protnone() from their
respective pte_protnone(). I wonder why s390 should be any different,
unless it is absolutely necessary.

>
> @@ -537,7 +537,7 @@ static void __init pte_protnone_tests(un
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> static void __init pmd_protnone_tests(unsigned long pfn, pgprot_t prot)
> {
> - pmd_t pmd = pfn_pmd(pfn, prot);
> + pmd_t pmd = mk_huge_pmd(pfn_to_page(pfn), prot);
>
> if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
> return;
>
> Regards,
> Gerald
>
>

2020-04-05 14:53:50

by Anshuman Khandual

Subject: Re: [mm/debug] f675f2f91d: WARNING:at_mm/debug_vm_pgtable.c:#debug_vm_pgtable

On 03/30/2020 02:26 PM, kernel test robot wrote:
> [ 283.486118] WARNING: CPU: 1 PID: 1 at mm/debug_vm_pgtable.c:371 debug_vm_pgtable+0x4dc/0x7e3
> [ 283.487342] Modules linked in:
> [ 283.487752] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc7-next-20200323-00001-gf675f2f91d045 #1
> [ 283.488817] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [ 283.489794] RIP: 0010:debug_vm_pgtable+0x4dc/0x7e3
> [ 283.490361] Code: b5 fd 48 8b 7d d0 be 20 01 00 00 e8 3d 9f b5 fd 48 8b 75 c8 48 8b 7d d0 e8 30 9f b5 fd 48 8b 75 c8 48 8b 7d d0 e8 23 9f b5 fd <0f> 0b 48 8b 75 c8 48 8b 7d d0 e8 14 9f b5 fd 0f 0b 48 8b 75 c8 48
> [ 283.492577] RSP: 0000:ffff888236493ed8 EFLAGS: 00010202
> [ 283.493235] RAX: 00000001e1d31025 RBX: ffff88823e7f6cd8 RCX: ffffffffffffffff
> [ 283.494135] RDX: 0000000000000000 RSI: 0000000000000025 RDI: 00000001e1d31000
> [ 283.495002] RBP: ffff888236493f38 R08: 0000000000000001 R09: 0000000000000001
> [ 283.495858] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88821d907000
> [ 283.496748] R13: ffff88821d8fc498 R14: ffff88821d8fda90 R15: ffff88821d8fc000
> [ 283.497614] FS: 0000000000000000(0000) GS:ffff888237800000(0000) knlGS:0000000000000000
> [ 283.498585] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 283.499290] CR2: 00000000ffffffff CR3: 00000001e1222000 CR4: 00000000000406e0
> [ 283.500165] Call Trace:
> [ 283.500499] ? rest_init+0x240/0x240
> [ 283.500985] kernel_init+0x13/0x110
> [ 283.501433] ret_from_fork+0x24/0x30
> [ 283.501907] irq event stamp: 4760776
> [ 283.502366] hardirqs last enabled at (4760775): [<ffffffffb481e34d>] _raw_spin_unlock_irqrestore+0x4d/0x60
> [ 283.511686] hardirqs last disabled at (4760776): [<ffffffffb3c038d4>] trace_hardirqs_off_thunk+0x1a/0x1c
> [ 283.512914] softirqs last enabled at (4760748): [<ffffffffb4c002cf>] __do_softirq+0x2cf/0x4ad
> [ 283.514086] softirqs last disabled at (4760741): [<ffffffffb3cf4f4d>] irq_exit+0xcd/0xe0
> [ 283.515114] ---[ end trace 7e3383c4261f8faa ]---

The above failure here and the one on the other thread can be solved with
the following change. The failure is caused by the fact that even though
the soft dirty helpers are defined within CONFIG_HAVE_ARCH_SOFT_DIRTY, the
required PTE bits (_PAGE_SOFT_DIRTY and _PAGE_SWP_SOFT_DIRTY) are available
only when CONFIG_MEM_SOFT_DIRTY is enabled. Hence these tests should not
proceed unless CONFIG_MEM_SOFT_DIRTY is enabled. Similar situation exists
in s390 (_PAGE_SOFT_DIRTY and _SEGMENT_ENTRY_SOFT_DIRTY) and powerpc (at
least with _PAGE_SWP_SOFT_DIRTY).

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 87b4b495333b..2a75a51fed06 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -589,7 +589,7 @@ static void __init pte_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
{
pte_t pte = pfn_pte(pfn, prot);

- if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY))
+ if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
return;

WARN_ON(!pte_soft_dirty(pte_mksoft_dirty(pte)));
@@ -600,7 +600,7 @@ static void __init pte_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
{
pte_t pte = pfn_pte(pfn, prot);

- if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY))
+ if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
return;

WARN_ON(!pte_swp_soft_dirty(pte_swp_mksoft_dirty(pte)));
@@ -612,7 +612,7 @@ static void __init pmd_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
{
pmd_t pmd = pfn_pmd(pfn, prot);

- if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY))
+ if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
return;

WARN_ON(!pmd_soft_dirty(pmd_mksoft_dirty(pmd)));
@@ -623,7 +623,7 @@ static void __init pmd_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot)
{
pmd_t pmd = pfn_pmd(pfn, prot);

- if (!IS_ENABLED(CONFIG_HAVE_ARCH_SOFT_DIRTY) ||
+ if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) ||
!IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
return;

2020-04-07 15:55:43

by Gerald Schaefer

Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests

On Sun, 5 Apr 2020 17:58:14 +0530
Anshuman Khandual <[email protected]> wrote:

[...]
> >
> > Could be fixed like this (the first de-reference is a bit special,
> > because at that point *ptep does not really point to a large (pmd) entry
> > yet, it is initially an invalid pte entry, which breaks our huge_ptep_get()
>
> There seems to be an inconsistency on s390 platform. Even though it defines
> a huge_ptep_get() override, it does not subscribe __HAVE_ARCH_HUGE_PTEP_GET
> which should have forced it fallback on generic huge_ptep_get() but it does
> not :) Then I realized that __HAVE_ARCH_HUGE_PTEP_GET only makes sense when
> an arch uses <asm-generic/hugetlb.h>. s390 does not use that and hence gets
> away with it's own huge_ptep_get() without __HAVE_ARCH_HUGE_PTEP_GET. Sounds
> confusing ? But I might not have the entire context here.

Yes, that sounds very confusing. Also a bit ironic, since huge_ptep_get()
was initially introduced because of s390, and now we don't select
__HAVE_ARCH_HUGE_PTEP_GET...

As you realized, I guess this is because we do not use generic hugetlb.h.
And when __HAVE_ARCH_HUGE_PTEP_GET was introduced with commit 544db7597ad
("hugetlb: introduce generic version of huge_ptep_get"), that was probably
the reason why we did not get our share of __HAVE_ARCH_HUGE_PTEP_GET.

Nothing really wrong with that, but yes, very confusing. Maybe we could
also select it for s390, even though it wouldn't have any functional
impact (so far), just for less confusion. Maybe we should also think about
using the generic hugetlb.h; not sure if the original reasons for not
doing so would still apply. Now I only need to find the time...

>
> > conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE,
> > because we do have some special bits there in our large pmds. It seems
> > to also work w/o that alignment, but it feels a bit wrong):
>
> Sure, we can accommodate that.
>
> >
> > @@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test
> > unsigned long vaddr, pgprot_t prot)
> > {
> > struct page *page = pfn_to_page(pfn);
> > - pte_t pte = READ_ONCE(*ptep);
> > + pte_t pte;
> >
> > - pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
> > + pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot));
>
> So that keeps the existing value in 'ptep' pointer at bay and instead
> construct a PTE from scratch. I would rather have READ_ONCE(*ptep) at
> least provide the seed that can be ORed with RANDOM_ORVALUE before
> being masked with PMD_MASK. Do you see any problem ?

Yes, unfortunately. The problem is that the resulting pte is not marked
as present. The conversion pte -> (huge) pmd, which is done in
set_huge_pte_at() for s390, will establish an empty pmd for non-present
ptes, all the RANDOM_ORVALUE stuff is lost. And a subsequent
huge_ptep_get() will not result in the same original pte value. If you
want to preserve and check the RANDOM_ORVALUE, it has to be a present
pte, hence the mk_pte(_phys).

>
> Some thing like this instead.
>
> pte_t pte = READ_ONCE(*ptep);
> pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));
>
> We cannot use mk_pte_phys() as it is defined only on some platforms
> without any generic fallback for others.

Oh, didn't know that, sorry. What about using mk_pte() instead? At least
it would result in a present pte:

pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));

And if you also want to do something with the existing value, which seems
to be an empty pte, then maybe just check whether writing and reading that
value with set_huge_pte_at() / huge_ptep_get() returns the same,
i.e. initially w/o RANDOM_ORVALUE.

So, in combination, like this (BTW, why is the barrier() needed? It
is not used for the other set_huge_pte_at() calls later):

@@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test
struct page *page = pfn_to_page(pfn);
pte_t pte = READ_ONCE(*ptep);

- pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
+ set_huge_pte_at(mm, vaddr, ptep, pte);
+ WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
+
+ pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));
set_huge_pte_at(mm, vaddr, ptep, pte);
barrier();
WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));

This would actually add a new test "write empty pte with
set_huge_pte_at(), then verify with huge_ptep_get()", which happens
to trigger a warning on s390 :-)

That (new) warning actually points to misbehavior on s390, we do not
write a correct empty pmd in this case, but one that is empty and also
marked as large. huge_ptep_get() will then not correctly recognize it
as empty and do wrong conversion. It is also not consistent with
huge_ptep_get_and_clear(), where we write the empty pmd w/o marking
as large. Last but not least it would also break our pmd_protnone()
logic (see below). Another nice finding on s390 :-)

I don't think this has any effect in practice (yet), but I will post a
fix for that, just in case you will add / change this test.

>
> > set_huge_pte_at(mm, vaddr, ptep, pte);
> > barrier();
> > WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
> > huge_pte_clear(mm, vaddr, ptep, PMD_SIZE);
> > - pte = READ_ONCE(*ptep);
> > + pte = huge_ptep_get(ptep);
> > WARN_ON(!huge_pte_none(pte));
> >
> > pte = mk_huge_pte(page, prot);
> > set_huge_pte_at(mm, vaddr, ptep, pte);
> > huge_ptep_set_wrprotect(mm, vaddr, ptep);
> > - pte = READ_ONCE(*ptep);
> > + pte = huge_ptep_get(ptep);
> > WARN_ON(huge_pte_write(pte));
> >
> > pte = mk_huge_pte(page, prot);
> > set_huge_pte_at(mm, vaddr, ptep, pte);
> > huge_ptep_get_and_clear(mm, vaddr, ptep);
> > - pte = READ_ONCE(*ptep);
> > + pte = huge_ptep_get(ptep);
> > WARN_ON(!huge_pte_none(pte));
> >
> > pte = mk_huge_pte(page, prot);
> > @@ -759,7 +759,7 @@ static void __init hugetlb_advanced_test
> > pte = huge_pte_mkwrite(pte);
> > pte = huge_pte_mkdirty(pte);
> > huge_ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
> > - pte = READ_ONCE(*ptep);
> > + pte = huge_ptep_get(ptep);
> > WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte)));
> > }
> > #else
> >
> > 3) The pmd_protnone_tests() has an issue, because it passes a pmd to
> > pmd_protnone() which has not been marked as large. We check for large
> > pmd in the s390 implementation of pmd_protnone(), and will fail if a
> > pmd is not large. We had similar issues before, in other helpers, where
> > I changed the logic on s390 to not require the pmd large check, but I'm
> > not so sure in this case. Is there a valid use case for doing
> > pmd_protnone() on "normal" pmds? Or could this be changed like this:
>
> That is a valid question. IIUC, all existing callers for pmd_protnone()
> ensure that it is indeed a huge PMD. But even assuming otherwise should
> not the huge PMD requirement get checked in the caller itself rather than
> in the arch helper which is just supposed to check the existence of the
> dedicated PTE bit(s) for this purpose. Purely from a helper perspective
> pmd_protnone() should not really care about being large even though it
> might never get used without one.
>
> Also all platforms (except s390) derive the pmd_protnone() from their
> respective pte_protnone(). I wonder why should s390 be any different
> unless it is absolutely necessary.

This is again because of our different page table entry layouts for
pte/pmd and (large) pmd. The bits we check for pmd_protnone() are
not valid for normal pmd/pte, and we would return undefined result for
normal entries.

Of course, we could rely on nobody calling pmd_protnone() on normal
pmds, but in this case we also use pmd_large() check in pmd_protnone()
for indication if the pmd is present. W/o that, we would return
true for empty pmds, that doesn't sound right. Not sure if we also
want to rely on nobody calling pmd_protnone() on empty pmds.

Anyway, if in practice it is not correct to use pmd_protnone()
on normal pmds, then I would suggest that your tests should also
not do / test it. And I strongly assume that it is not correct, at
least I cannot think of a valid case, and of course s390 would
already be broken if there was such a case.

Regards,
Gerald

2020-04-08 07:30:32

by Anshuman Khandual

Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests


On 04/07/2020 09:24 PM, Gerald Schaefer wrote:
> On Sun, 5 Apr 2020 17:58:14 +0530
> Anshuman Khandual <[email protected]> wrote:
>
> [...]
>>>
>>> Could be fixed like this (the first de-reference is a bit special,
>>> because at that point *ptep does not really point to a large (pmd) entry
>>> yet, it is initially an invalid pte entry, which breaks our huge_ptep_get()
>>
>> There seems to be an inconsistency on s390 platform. Even though it defines
>> a huge_ptep_get() override, it does not subscribe __HAVE_ARCH_HUGE_PTEP_GET
>> which should have forced it fallback on generic huge_ptep_get() but it does
>> not :) Then I realized that __HAVE_ARCH_HUGE_PTEP_GET only makes sense when
>> an arch uses <asm-generic/hugetlb.h>. s390 does not use that and hence gets
>> away with it's own huge_ptep_get() without __HAVE_ARCH_HUGE_PTEP_GET. Sounds
>> confusing ? But I might not have the entire context here.
>
> Yes, that sounds very confusing. Also a bit ironic, since huge_ptep_get()
> was initially introduced because of s390, and now we don't select
> __HAVE_ARCH_HUGE_PTEP_GET...
>
> As you realized, I guess this is because we do not use generic hugetlb.h.
> And when __HAVE_ARCH_HUGE_PTEP_GET was introduced with commit 544db7597ad
> ("hugetlb: introduce generic version of huge_ptep_get"), that was probably
> the reason why we did not get our share of __HAVE_ARCH_HUGE_PTEP_GET.

Understood.

>
> Nothing really wrong with that, but yes, very confusing. Maybe we could
> also select it for s390, even though it wouldn't have any functional
> impact (so far), just for less confusion. Maybe also thinking about
> using the generic hugetlb.h, not sure if the original reasons for not
> doing so would still apply. Now I only need to find the time...

Seems like something worth exploring if we could remove this confusion.

>
>>
>>> conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE,
>>> because we do have some special bits there in our large pmds. It seems
>>> to also work w/o that alignment, but it feels a bit wrong):
>>
>> Sure, we can accommodate that.
>>
>>>
>>> @@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test
>>> unsigned long vaddr, pgprot_t prot)
>>> {
>>> struct page *page = pfn_to_page(pfn);
>>> - pte_t pte = READ_ONCE(*ptep);
>>> + pte_t pte;
>>>
>>> - pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
>>> + pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot));
>>
>> So that keeps the existing value in 'ptep' pointer at bay and instead
>> construct a PTE from scratch. I would rather have READ_ONCE(*ptep) at
>> least provide the seed that can be ORed with RANDOM_ORVALUE before
>> being masked with PMD_MASK. Do you see any problem ?
>
> Yes, unfortunately. The problem is that the resulting pte is not marked
> as present. The conversion pte -> (huge) pmd, which is done in
> set_huge_pte_at() for s390, will establish an empty pmd for non-present
> ptes, all the RANDOM_ORVALUE stuff is lost. And a subsequent
> huge_ptep_get() will not result in the same original pte value. If you

Ohh.

> want to preserve and check the RANDOM_ORVALUE, it has to be a present
> pte, hence the mk_pte(_phys).

Understood and mk_pte() is also available on all platforms.

>
>>
>> Some thing like this instead.
>>
>> pte_t pte = READ_ONCE(*ptep);
>> pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));
>>
>> We cannot use mk_pte_phys() as it is defined only on some platforms
>> without any generic fallback for others.
>
> Oh, didn't know that, sorry. What about using mk_pte() instead, at least
> it would result in a present pte:
>
> pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));

Let's use mk_pte() here, but can we do this instead:

paddr = (__pfn_to_phys(pfn) | RANDOM_ORVALUE) & PMD_MASK;
pte = pte_mkhuge(mk_pte(phys_to_page(paddr), prot));

>
> And if you also want to do some with the existing value, which seems
> to be an empty pte, then maybe just check if writing and reading that
> value with set_huge_pte_at() / huge_ptep_get() returns the same,
> i.e. initially w/o RANDOM_ORVALUE.
>
> So, in combination, like this (BTW, why is the barrier() needed, it
> is not used for the other set_huge_pte_at() calls later?):

Ah, missed that, will add them. Earlier we faced a problem without it after
set_pte_at() in a test on the powerpc (64) platform. Hence I just added it
here to be extra careful.

>
> @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test
> struct page *page = pfn_to_page(pfn);
> pte_t pte = READ_ONCE(*ptep);
>
> - pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
> + set_huge_pte_at(mm, vaddr, ptep, pte);
> + WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
> +
> + pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));
> set_huge_pte_at(mm, vaddr, ptep, pte);
> barrier();
> WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
>
> This would actually add a new test "write empty pte with
> set_huge_pte_at(), then verify with huge_ptep_get()", which happens
> to trigger a warning on s390 :-)

This happens on arm64 as well, which checks for pte_present() in
set_huge_pte_at(). But the PTE present check is not present in every
set_huge_pte_at() implementation, especially those without
__HAVE_ARCH_HUGE_SET_HUGE_PTE_AT. Hence I am wondering if we should add
this new test here, as it will keep giving warnings on s390 and arm64
(at the least).

>
> That (new) warning actually points to misbehavior on s390, we do not
> write a correct empty pmd in this case, but one that is empty and also
> marked as large. huge_ptep_get() will then not correctly recognize it
> as empty and do wrong conversion. It is also not consistent with
> huge_ptep_get_and_clear(), where we write the empty pmd w/o marking
> as large. Last but not least it would also break our pmd_protnone()
> logic (see below). Another nice finding on s390 :-)

:)

>
> I don't think this has any effect in practice (yet), but I will post a
> fix for that, just in case you will add / change this test.

Okay.

>
>>
>>> set_huge_pte_at(mm, vaddr, ptep, pte);
>>> barrier();
>>> WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
>>> huge_pte_clear(mm, vaddr, ptep, PMD_SIZE);
>>> - pte = READ_ONCE(*ptep);
>>> + pte = huge_ptep_get(ptep);
>>> WARN_ON(!huge_pte_none(pte));
>>>
>>> pte = mk_huge_pte(page, prot);
>>> set_huge_pte_at(mm, vaddr, ptep, pte);
>>> huge_ptep_set_wrprotect(mm, vaddr, ptep);
>>> - pte = READ_ONCE(*ptep);
>>> + pte = huge_ptep_get(ptep);
>>> WARN_ON(huge_pte_write(pte));
>>>
>>> pte = mk_huge_pte(page, prot);
>>> set_huge_pte_at(mm, vaddr, ptep, pte);
>>> huge_ptep_get_and_clear(mm, vaddr, ptep);
>>> - pte = READ_ONCE(*ptep);
>>> + pte = huge_ptep_get(ptep);
>>> WARN_ON(!huge_pte_none(pte));
>>>
>>> pte = mk_huge_pte(page, prot);
>>> @@ -759,7 +759,7 @@ static void __init hugetlb_advanced_test
>>> pte = huge_pte_mkwrite(pte);
>>> pte = huge_pte_mkdirty(pte);
>>> huge_ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
>>> - pte = READ_ONCE(*ptep);
>>> + pte = huge_ptep_get(ptep);
>>> WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte)));
>>> }
>>> #else
>>>
>>> 3) The pmd_protnone_tests() has an issue, because it passes a pmd to
>>> pmd_protnone() which has not been marked as large. We check for large
>>> pmd in the s390 implementation of pmd_protnone(), and will fail if a
>>> pmd is not large. We had similar issues before, in other helpers, where
>>> I changed the logic on s390 to not require the pmd large check, but I'm
>>> not so sure in this case. Is there a valid use case for doing
>>> pmd_protnone() on "normal" pmds? Or could this be changed like this:
>>
>> That is a valid question. IIUC, all existing callers of pmd_protnone()
>> ensure that it is indeed a huge PMD. But even assuming otherwise, should
>> not the huge PMD requirement be checked in the caller itself rather than
>> in the arch helper, which is just supposed to check the existence of the
>> dedicated PTE bit(s) for this purpose? Purely from a helper perspective,
>> pmd_protnone() should not really care about being large, even though it
>> might never get used without one.
>>
>> Also all platforms (except s390) derive the pmd_protnone() from their
>> respective pte_protnone(). I wonder why should s390 be any different
>> unless it is absolutely necessary.
>
> This is again because of our different page table entry layouts for
> pte/pmd and (large) pmd. The bits we check for pmd_protnone() are
> not valid for normal pmd/pte, and we would return undefined result for
> normal entries.
>
> Of course, we could rely on nobody calling pmd_protnone() on normal
> pmds, but in this case we also use pmd_large() check in pmd_protnone()
> for indication if the pmd is present. W/o that, we would return
> true for empty pmds, that doesn't sound right. Not sure if we also
> want to rely on nobody calling pmd_protnone() on empty pmds.

That might be problematic.

>
> Anyway, if in practice it is not correct to use pmd_protnone()
> on normal pmds, then I would suggest that your tests should also
> not do / test it. And I strongly assume that it is not correct, at
> least I cannot think of a valid case, and of course s390 would
> already be broken if there was such a case.

Okay, will use huge PMD here as you had suggested earlier.

>
> Regards,
> Gerald
>
>

2020-04-08 12:26:24

by Gerald Schaefer

[permalink] [raw]
Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests

On Wed, 8 Apr 2020 12:41:51 +0530
Anshuman Khandual <[email protected]> wrote:

[...]
> >
> >>
> >> Some thing like this instead.
> >>
> >> pte_t pte = READ_ONCE(*ptep);
> >> pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));
> >>
> >> We cannot use mk_pte_phys() as it is defined only on some platforms
> >> without any generic fallback for others.
> >
> > Oh, didn't know that, sorry. What about using mk_pte() instead, at least
> > it would result in a present pte:
> >
> > pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));
>
> Let's use mk_pte() here, but can we do this instead:
>
> paddr = (__pfn_to_phys(pfn) | RANDOM_ORVALUE) & PMD_MASK;
> pte = pte_mkhuge(mk_pte(phys_to_page(paddr), prot));
>

Sure, that will also work.

BTW, this RANDOM_ORVALUE is not really very random, the way it is
defined. For s390 we already changed it to mask out some arch bits,
but I guess there are other archs and bits that would always be
set with this "not so random" value, and I wonder if/how that would
affect all the tests using this value, see also below.

> >
> > And if you also want to do some with the existing value, which seems
> > to be an empty pte, then maybe just check if writing and reading that
> > value with set_huge_pte_at() / huge_ptep_get() returns the same,
> > i.e. initially w/o RANDOM_ORVALUE.
> >
> > So, in combination, like this (BTW, why is the barrier() needed, it
> > is not used for the other set_huge_pte_at() calls later?):
>
> Ahh, missed that, will add it. Earlier we faced a problem without it after
> set_pte_at() in a test on a powerpc (64) platform, hence just added it
> here to be extra careful.
>
> >
> > @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test
> > struct page *page = pfn_to_page(pfn);
> > pte_t pte = READ_ONCE(*ptep);
> >
> > - pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
> > + set_huge_pte_at(mm, vaddr, ptep, pte);
> > + WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
> > +
> > + pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));
> > set_huge_pte_at(mm, vaddr, ptep, pte);
> > barrier();
> > WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
> >
> > This would actually add a new test "write empty pte with
> > set_huge_pte_at(), then verify with huge_ptep_get()", which happens
> > to trigger a warning on s390 :-)
>
> On arm64 as well, which checks for pte_present() in set_huge_pte_at().
> But a PTE present check is not there in every set_huge_pte_at()
> implementation, especially those without __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT.
> Hence I am wondering whether we should add this new test here, which will
> keep giving warnings on s390 and arm64 (at the least).

Hmm, interesting. I forgot about huge swap / migration, which is not
(and probably cannot be) supported on s390. The pte_present() check
on arm64 seems to check for such huge swap / migration entries,
according to the comment.

The new test "write empty pte with set_huge_pte_at(), then verify
with huge_ptep_get()" would then probably trigger the
WARN_ON(!pte_present(pte)) in arm64 code. So I guess "writing empty
ptes with set_huge_pte_at()" is not really a valid use case in practice,
or else you would have seen this warning before. In that case, it
might not be a good idea to add this test.

I also do wonder now why the original test with
"pte = __pte(pte_val(pte) | RANDOM_ORVALUE);"
did not also trigger that warning on arm64. On s390 this test failed
exactly because the constructed pte was not present (initially empty,
or'ing RANDOM_ORVALUE does not make it present for s390). I guess this
just worked by chance on arm64, because the bits from RANDOM_ORVALUE
also happened to mark the pte present for arm64.

This brings us back to the question above, regarding the "randomness"
of RANDOM_ORVALUE. Not really sure what the intention behind that was,
but maybe it would make sense to restrict this RANDOM_ORVALUE to
non-arch-specific bits, i.e. only bits that would be part of the
address value within a page table entry? Or was it intentionally
chosen to also mess with other bits?

Regards,
Gerald

2020-04-09 01:08:07

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests


On 04/08/2020 05:45 PM, Gerald Schaefer wrote:
> On Wed, 8 Apr 2020 12:41:51 +0530
> Anshuman Khandual <[email protected]> wrote:
>
> [...]
>>>
>>>>
>>>> Some thing like this instead.
>>>>
>>>> pte_t pte = READ_ONCE(*ptep);
>>>> pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));
>>>>
>>>> We cannot use mk_pte_phys() as it is defined only on some platforms
>>>> without any generic fallback for others.
>>>
>>> Oh, didn't know that, sorry. What about using mk_pte() instead, at least
>>> it would result in a present pte:
>>>
>>> pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));
>>
>> Let's use mk_pte() here, but can we do this instead:
>>
>> paddr = (__pfn_to_phys(pfn) | RANDOM_ORVALUE) & PMD_MASK;
>> pte = pte_mkhuge(mk_pte(phys_to_page(paddr), prot));
>>
>
> Sure, that will also work.
>
> BTW, this RANDOM_ORVALUE is not really very random, the way it is
> defined. For s390 we already changed it to mask out some arch bits,
> but I guess there are other archs and bits that would always be
> set with this "not so random" value, and I wonder if/how that would
> affect all the tests using this value, see also below.

RANDOM_ORVALUE is a constant that was added to make sure that page
table entries contain some non-zero value before pxx_clear() gets
called on them, followed by a pxx_none() check. It is currently used
only in the pxx_clear_tests(), hence there is no impact on the
existing tests.

>
>>>
>>> And if you also want to do some with the existing value, which seems
>>> to be an empty pte, then maybe just check if writing and reading that
>>> value with set_huge_pte_at() / huge_ptep_get() returns the same,
>>> i.e. initially w/o RANDOM_ORVALUE.
>>>
>>> So, in combination, like this (BTW, why is the barrier() needed, it
>>> is not used for the other set_huge_pte_at() calls later?):
>>
>> Ahh, missed that, will add it. Earlier we faced a problem without it after
>> set_pte_at() in a test on a powerpc (64) platform, hence just added it
>> here to be extra careful.
>>
>>>
>>> @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test
>>> struct page *page = pfn_to_page(pfn);
>>> pte_t pte = READ_ONCE(*ptep);
>>>
>>> - pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
>>> + set_huge_pte_at(mm, vaddr, ptep, pte);
>>> + WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
>>> +
>>> + pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));
>>> set_huge_pte_at(mm, vaddr, ptep, pte);
>>> barrier();
>>> WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
>>>
>>> This would actually add a new test "write empty pte with
>>> set_huge_pte_at(), then verify with huge_ptep_get()", which happens
>>> to trigger a warning on s390 :-)
>>
>> On arm64 as well, which checks for pte_present() in set_huge_pte_at().
>> But a PTE present check is not there in every set_huge_pte_at()
>> implementation, especially those without __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT.
>> Hence I am wondering whether we should add this new test here, which will
>> keep giving warnings on s390 and arm64 (at the least).
>
> Hmm, interesting. I forgot about huge swap / migration, which is not
> (and probably cannot be) supported on s390. The pte_present() check
> on arm64 seems to check for such huge swap / migration entries,
> according to the comment.
>
> The new test "write empty pte with set_huge_pte_at(), then verify
> with huge_ptep_get()" would then probably trigger the
> WARN_ON(!pte_present(pte)) in arm64 code. So I guess "writing empty
> ptes with set_huge_pte_at()" is not really a valid use case in practice,
> or else you would have seen this warning before. In that case, it
> might not be a good idea to add this test.

Got it.

>
> I also do wonder now, why the original test with
> "pte = __pte(pte_val(pte) | RANDOM_ORVALUE);"
> did not also trigger that warning on arm64. On s390 this test failed
> exactly because the constructed pte was not present (initially empty,
> or'ing RANDOM_ORVALUE does not make it present for s390). I guess this
> just worked by chance on arm64, because the bits from RANDOM_ORVALUE
> also happened to mark the pte present for arm64.

That is correct. RANDOM_ORVALUE has the PTE_PROT_NONE bit set, which
makes the PTE pass the pte_present() test.

On arm64 platform,

#define pte_present(pte) (!!(pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)))

>
> This brings us back to the question above, regarding the "randomness"
> of RANDOM_ORVALUE. Not really sure what the intention behind that was,
> but maybe it would make sense to restrict this RANDOM_ORVALUE to
> non-arch-specific bits, i.e. only bits that would be part of the
> address value within a page table entry? Or was it intentionally
> chosen to also mess with other bits?

As mentioned before, RANDOM_ORVALUE just helped make a given page table
entry contain a non-zero value before getting cleared. AFAICS we should
not need RANDOM_ORVALUE for the HugeTLB test here. I believe the following
'paddr' construct will be just fine instead.

paddr = __pfn_to_phys(pfn) & PMD_MASK;
pte = pte_mkhuge(mk_pte(phys_to_page(paddr), prot));

>
> Regards,
> Gerald
>
>