2024-02-26 20:56:27

by Zi Yan

Subject: [PATCH v5 0/8] Split a folio to any lower order folios

From: Zi Yan <[email protected]>

Hi all,

File folios support any order and multi-size THP is upstreamed[1], so
both file and anonymous folios can have order >0. Currently,
split_huge_page() only splits a huge page to order-0 pages, but
splitting to orders higher than 0 might better utilize large folios,
if done properly. In addition, Large Block Size support in XFS would
benefit from it during truncate[2]. This patchset adds support for
splitting a large folio to any lower order folios. The patchset is on
top of mm-everything-2024-02-24-02-40.

In addition to this implementation of split_huge_page_to_list_to_order(),
a possible optimization could be splitting a large folio to arbitrary
smaller folios instead of a single order. As both Hugh and Ryan pointed
out [3,5], splitting to a single order might not be optimal: an order-9
folio might be better split into 1 order-8, 1 order-7, ..., 1 order-1,
and 2 order-0 folios, depending on subsequent folio operations. This is
left as future work.


Changelog
===

Since v4[4]
1. Picked up Matthew's order-1 folio support in the page cache patch, so
that XFS Large Block Sizes patchset can avoid additional code churn in
split_huge_page_to_list_to_order().
2. Dropped truncate change patch and corresponding testing code.
3. Removed thp_nr_pages() use in __split_huge_page()
(per David Hildenbrand).
4. Fixed __split_page_owner() (per David Hildenbrand).
5. Changed unmap_folio() to only add TTU_SPLIT_HUGE_PMD if the folio is
pmd mappable (per Ryan Roberts).
6. Moved the swap-cached folio split warning upfront and returned
-EINVAL (per Ryan Roberts).

Since v3
---
1. Excluded shmem folios and pagecache folios without FS support from
splitting to any order (per Hugh Dickins).
2. Allowed splitting anonymous large folio to any lower order since
multi-size THP is upstreamed.
3. Adapted selftests code to new framework.

Since v2
---
1. Fixed an issue in __split_page_owner() introduced during my rebase

Since v1
---
1. Changed split_page_memcg() and split_page_owner() parameter to use order
2. Used folio_test_pmd_mappable() in place of the equivalent code

Details
===

* Patch 1 changes unmap_folio() to only add TTU_SPLIT_HUGE_PMD if the
folio is pmd mappable.
* Patch 2 adds support for order-1 page cache folio.
* Patch 3 changes split_page_memcg() to use order instead of nr_pages.
* Patch 4 changes split_page_owner() to use order instead of nr_pages.
* Patches 5 and 6 add a new_order parameter to split_page_memcg() and
split_page_owner() respectively, preparing for upcoming changes.
* Patch 7 adds split_huge_page_to_list_to_order() to split a huge page
to any lower order. The original split_huge_page_to_list() calls
split_huge_page_to_list_to_order() with new_order = 0.
* Patch 8 adds a test API to debugfs and test cases in
split_huge_page_test selftests.

Comments and/or suggestions are welcome.

[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/
[3] https://lore.kernel.org/linux-mm/[email protected]/
[4] https://lore.kernel.org/linux-mm/[email protected]/
[5] https://lore.kernel.org/linux-mm/[email protected]/


Matthew Wilcox (Oracle) (1):
mm: Support order-1 folios in the page cache

Zi Yan (7):
mm/huge_memory: only split PMD mapping when necessary in unmap_folio()
mm/memcg: use order instead of nr in split_page_memcg()
mm/page_owner: use order instead of nr in split_page_owner()
mm: memcg: make memcg huge page split support any order split.
mm: page_owner: add support for splitting to any order in split
page_owner.
mm: thp: split huge page to any lower order pages
mm: huge_memory: enable debugfs to split huge pages to any order.

include/linux/huge_mm.h | 21 ++-
include/linux/memcontrol.h | 4 +-
include/linux/page_owner.h | 14 +-
mm/filemap.c | 2 -
mm/huge_memory.c | 173 +++++++++++++-----
mm/internal.h | 3 +-
mm/memcontrol.c | 10 +-
mm/page_alloc.c | 8 +-
mm/page_owner.c | 6 +-
mm/readahead.c | 3 -
.../selftests/mm/split_huge_page_test.c | 115 +++++++++++-
11 files changed, 276 insertions(+), 83 deletions(-)

--
2.43.0



2024-02-26 20:56:36

by Zi Yan

Subject: [PATCH v5 1/8] mm/huge_memory: only split PMD mapping when necessary in unmap_folio()

From: Zi Yan <[email protected]>

Now that multi-size THP support has been added, not all THPs are
PMD-mapped; thus, during a huge page split, there is no need to always
split the PMD mapping in unmap_folio(). Make it conditional.

Signed-off-by: Zi Yan <[email protected]>
---
mm/huge_memory.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 28341a5067fb..b20e535e874c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2727,11 +2727,14 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,

static void unmap_folio(struct folio *folio)
{
- enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
- TTU_SYNC | TTU_BATCH_FLUSH;
+ enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SYNC |
+ TTU_BATCH_FLUSH;

VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);

+ if (folio_test_pmd_mappable(folio))
+ ttu_flags |= TTU_SPLIT_HUGE_PMD;
+
/*
* Anon pages need migration entries to preserve them, but file
* pages can simply be left unmapped, then faulted back on demand.
--
2.43.0


2024-02-26 20:56:51

by Zi Yan

Subject: [PATCH v5 2/8] mm: Support order-1 folios in the page cache

From: "Matthew Wilcox (Oracle)" <[email protected]>

Folios of order 1 have no space to store the deferred list. This is
not a problem for the page cache as file-backed folios are never
placed on the deferred list. All we need to do is prevent the core
MM from touching the deferred list for order 1 folios and remove the
code which prevented us from allocating order 1 folios.

Link: https://lore.kernel.org/linux-mm/[email protected]/
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Zi Yan <[email protected]>
---
mm/filemap.c | 2 --
mm/huge_memory.c | 19 +++++++++++++++----
mm/internal.h | 3 +--
mm/readahead.c | 3 ---
4 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index b7a21551fbc7..b4858d89f1b1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1912,8 +1912,6 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
gfp_t alloc_gfp = gfp;

err = -ENOMEM;
- if (order == 1)
- order = 0;
if (order > 0)
alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
folio = filemap_alloc_folio(alloc_gfp, order);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b20e535e874c..9840f312c08f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -790,8 +790,10 @@ struct deferred_split *get_deferred_split_queue(struct folio *folio)

void folio_prep_large_rmappable(struct folio *folio)
{
- VM_BUG_ON_FOLIO(folio_order(folio) < 2, folio);
- INIT_LIST_HEAD(&folio->_deferred_list);
+ if (!folio || !folio_test_large(folio))
+ return;
+ if (folio_order(folio) > 1)
+ INIT_LIST_HEAD(&folio->_deferred_list);
folio_set_large_rmappable(folio);
}

@@ -3114,7 +3116,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
/* Prevent deferred_split_scan() touching ->_refcount */
spin_lock(&ds_queue->split_queue_lock);
if (folio_ref_freeze(folio, 1 + extra_pins)) {
- if (!list_empty(&folio->_deferred_list)) {
+ if (folio_order(folio) > 1 &&
+ !list_empty(&folio->_deferred_list)) {
ds_queue->split_queue_len--;
list_del(&folio->_deferred_list);
}
@@ -3165,6 +3168,9 @@ void folio_undo_large_rmappable(struct folio *folio)
struct deferred_split *ds_queue;
unsigned long flags;

+ if (folio_order(folio) <= 1)
+ return;
+
/*
* At this point, there is no one trying to add the folio to
* deferred_list. If folio is not in deferred_list, it's safe
@@ -3190,7 +3196,12 @@ void deferred_split_folio(struct folio *folio)
#endif
unsigned long flags;

- VM_BUG_ON_FOLIO(folio_order(folio) < 2, folio);
+ /*
+ * Order 1 folios have no space for a deferred list, but we also
+ * won't waste much memory by not adding them to the deferred list.
+ */
+ if (folio_order(folio) <= 1)
+ return;

/*
* The try_to_unmap() in page reclaim path might reach here too,
diff --git a/mm/internal.h b/mm/internal.h
index 2b7efffbe4d7..c4853ebfa030 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -420,8 +420,7 @@ static inline struct folio *page_rmappable_folio(struct page *page)
{
struct folio *folio = (struct folio *)page;

- if (folio && folio_order(folio) > 1)
- folio_prep_large_rmappable(folio);
+ folio_prep_large_rmappable(folio);
return folio;
}

diff --git a/mm/readahead.c b/mm/readahead.c
index 1e74455f908e..130c0e7df99f 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -514,9 +514,6 @@ void page_cache_ra_order(struct readahead_control *ractl,
/* Don't allocate pages past EOF */
while (index + (1UL << order) - 1 > limit)
order--;
- /* THP machinery does not support order-1 */
- if (order == 1)
- order = 0;
err = ra_alloc_folio(ractl, index, mark, order, gfp);
if (err)
break;
--
2.43.0


2024-02-26 20:57:00

by Zi Yan

Subject: [PATCH v5 3/8] mm/memcg: use order instead of nr in split_page_memcg()

From: Zi Yan <[email protected]>

We do not have non-power-of-two pages; using nr is error prone if nr
is not a power of two. Use the page order instead.

Signed-off-by: Zi Yan <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
---
include/linux/memcontrol.h | 4 ++--
mm/huge_memory.c | 5 +++--
mm/memcontrol.c | 3 ++-
mm/page_alloc.c | 4 ++--
4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4e4caeaea404..173bbb53c1ec 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1163,7 +1163,7 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
rcu_read_unlock();
}

-void split_page_memcg(struct page *head, unsigned int nr);
+void split_page_memcg(struct page *head, int order);

unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
gfp_t gfp_mask,
@@ -1621,7 +1621,7 @@ void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx)
{
}

-static inline void split_page_memcg(struct page *head, unsigned int nr)
+static inline void split_page_memcg(struct page *head, int order)
{
}

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9840f312c08f..96ac7c62c375 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2889,11 +2889,12 @@ static void __split_huge_page(struct page *page, struct list_head *list,
struct lruvec *lruvec;
struct address_space *swap_cache = NULL;
unsigned long offset = 0;
- unsigned int nr = thp_nr_pages(head);
int i, nr_dropped = 0;
+ int order = folio_order(folio);
+ unsigned int nr = 1 << order;

/* complete memcg works before add pages to LRU */
- split_page_memcg(head, nr);
+ split_page_memcg(head, order);

if (folio_test_anon(folio) && folio_test_swapcache(folio)) {
offset = swp_offset(folio->swap);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 95c3fccb321b..1a09f0e77c44 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3608,11 +3608,12 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
/*
* Because page_memcg(head) is not set on tails, set it now.
*/
-void split_page_memcg(struct page *head, unsigned int nr)
+void split_page_memcg(struct page *head, int order)
{
struct folio *folio = page_folio(head);
struct mem_cgroup *memcg = folio_memcg(folio);
int i;
+ unsigned int nr = 1 << order;

if (mem_cgroup_disabled() || !memcg)
return;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 96839b210abe..a7a96bc97e0b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2653,7 +2653,7 @@ void split_page(struct page *page, unsigned int order)
for (i = 1; i < (1 << order); i++)
set_page_refcounted(page + i);
split_page_owner(page, 1 << order);
- split_page_memcg(page, 1 << order);
+ split_page_memcg(page, order);
}
EXPORT_SYMBOL_GPL(split_page);

@@ -4840,7 +4840,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
struct page *last = page + nr;

split_page_owner(page, 1 << order);
- split_page_memcg(page, 1 << order);
+ split_page_memcg(page, order);
while (page < --last)
set_page_refcounted(last);

--
2.43.0


2024-02-26 20:57:16

by Zi Yan

Subject: [PATCH v5 4/8] mm/page_owner: use order instead of nr in split_page_owner()

From: Zi Yan <[email protected]>

We do not have non-power-of-two pages; using nr is error prone if nr
is not a power of two. Use the page order instead.

Signed-off-by: Zi Yan <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
---
include/linux/page_owner.h | 9 ++++-----
mm/huge_memory.c | 2 +-
mm/page_alloc.c | 4 ++--
mm/page_owner.c | 3 ++-
4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
index 119a0c9d2a8b..2b39c8e19d98 100644
--- a/include/linux/page_owner.h
+++ b/include/linux/page_owner.h
@@ -11,7 +11,7 @@ extern struct page_ext_operations page_owner_ops;
extern void __reset_page_owner(struct page *page, unsigned short order);
extern void __set_page_owner(struct page *page,
unsigned short order, gfp_t gfp_mask);
-extern void __split_page_owner(struct page *page, unsigned int nr);
+extern void __split_page_owner(struct page *page, int order);
extern void __folio_copy_owner(struct folio *newfolio, struct folio *old);
extern void __set_page_owner_migrate_reason(struct page *page, int reason);
extern void __dump_page_owner(const struct page *page);
@@ -31,10 +31,10 @@ static inline void set_page_owner(struct page *page,
__set_page_owner(page, order, gfp_mask);
}

-static inline void split_page_owner(struct page *page, unsigned int nr)
+static inline void split_page_owner(struct page *page, int order)
{
if (static_branch_unlikely(&page_owner_inited))
- __split_page_owner(page, nr);
+ __split_page_owner(page, order);
}
static inline void folio_copy_owner(struct folio *newfolio, struct folio *old)
{
@@ -59,8 +59,7 @@ static inline void set_page_owner(struct page *page,
unsigned int order, gfp_t gfp_mask)
{
}
-static inline void split_page_owner(struct page *page,
- unsigned short order)
+static inline void split_page_owner(struct page *page, int order)
{
}
static inline void folio_copy_owner(struct folio *newfolio, struct folio *folio)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 96ac7c62c375..2b95dbc95fae 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2933,7 +2933,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
unlock_page_lruvec(lruvec);
/* Caller disabled irqs, so they are still disabled here */

- split_page_owner(head, nr);
+ split_page_owner(head, order);

/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a7a96bc97e0b..eae77e76a671 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2652,7 +2652,7 @@ void split_page(struct page *page, unsigned int order)

for (i = 1; i < (1 << order); i++)
set_page_refcounted(page + i);
- split_page_owner(page, 1 << order);
+ split_page_owner(page, order);
split_page_memcg(page, order);
}
EXPORT_SYMBOL_GPL(split_page);
@@ -4839,7 +4839,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
struct page *page = virt_to_page((void *)addr);
struct page *last = page + nr;

- split_page_owner(page, 1 << order);
+ split_page_owner(page, order);
split_page_memcg(page, order);
while (page < --last)
set_page_refcounted(last);
diff --git a/mm/page_owner.c b/mm/page_owner.c
index e56c1e92eccf..b678f7a6e702 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -306,11 +306,12 @@ void __set_page_owner_migrate_reason(struct page *page, int reason)
page_ext_put(page_ext);
}

-void __split_page_owner(struct page *page, unsigned int nr)
+void __split_page_owner(struct page *page, int order)
{
int i;
struct page_ext *page_ext = page_ext_get(page);
struct page_owner *page_owner;
+ unsigned int nr = 1 << order;

if (unlikely(!page_ext))
return;
--
2.43.0


2024-02-26 20:57:33

by Zi Yan

Subject: [PATCH v5 5/8] mm: memcg: make memcg huge page split support any order split.

From: Zi Yan <[email protected]>

split_page_memcg() sets memcg information for the pages after the
split. Add a new parameter, new_order, to tell the order of the
subpages in the new page, always 0 for now. This prepares for upcoming
changes to support splitting a huge page to any lower order.

Signed-off-by: Zi Yan <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
---
include/linux/memcontrol.h | 4 ++--
mm/huge_memory.c | 2 +-
mm/memcontrol.c | 11 ++++++-----
mm/page_alloc.c | 4 ++--
4 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 173bbb53c1ec..9a2dea92be0e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1163,7 +1163,7 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
rcu_read_unlock();
}

-void split_page_memcg(struct page *head, int order);
+void split_page_memcg(struct page *head, int old_order, int new_order);

unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
gfp_t gfp_mask,
@@ -1621,7 +1621,7 @@ void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx)
{
}

-static inline void split_page_memcg(struct page *head, int order)
+static inline void split_page_memcg(struct page *head, int old_order, int new_order)
{
}

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2b95dbc95fae..5d4b7c17b9bc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2894,7 +2894,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
unsigned int nr = 1 << order;

/* complete memcg works before add pages to LRU */
- split_page_memcg(head, order);
+ split_page_memcg(head, order, 0);

if (folio_test_anon(folio) && folio_test_swapcache(folio)) {
offset = swp_offset(folio->swap);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1a09f0e77c44..669bc8de3780 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3608,23 +3608,24 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
/*
* Because page_memcg(head) is not set on tails, set it now.
*/
-void split_page_memcg(struct page *head, int order)
+void split_page_memcg(struct page *head, int old_order, int new_order)
{
struct folio *folio = page_folio(head);
struct mem_cgroup *memcg = folio_memcg(folio);
int i;
- unsigned int nr = 1 << order;
+ unsigned int old_nr = 1 << old_order;
+ unsigned int new_nr = 1 << new_order;

if (mem_cgroup_disabled() || !memcg)
return;

- for (i = 1; i < nr; i++)
+ for (i = new_nr; i < old_nr; i += new_nr)
folio_page(folio, i)->memcg_data = folio->memcg_data;

if (folio_memcg_kmem(folio))
- obj_cgroup_get_many(__folio_objcg(folio), nr - 1);
+ obj_cgroup_get_many(__folio_objcg(folio), old_nr / new_nr - 1);
else
- css_get_many(&memcg->css, nr - 1);
+ css_get_many(&memcg->css, old_nr / new_nr - 1);
}

#ifdef CONFIG_SWAP
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index eae77e76a671..c31a468fe317 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2653,7 +2653,7 @@ void split_page(struct page *page, unsigned int order)
for (i = 1; i < (1 << order); i++)
set_page_refcounted(page + i);
split_page_owner(page, order);
- split_page_memcg(page, order);
+ split_page_memcg(page, order, 0);
}
EXPORT_SYMBOL_GPL(split_page);

@@ -4840,7 +4840,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
struct page *last = page + nr;

split_page_owner(page, order);
- split_page_memcg(page, order);
+ split_page_memcg(page, order, 0);
while (page < --last)
set_page_refcounted(last);

--
2.43.0


2024-02-26 20:57:46

by Zi Yan

Subject: [PATCH v5 6/8] mm: page_owner: add support for splitting to any order in split page_owner.

From: Zi Yan <[email protected]>

Add a new_order parameter to set the new page order in page_owner.
This prepares for upcoming changes to support splitting a huge page
to any lower order.

Signed-off-by: Zi Yan <[email protected]>
---
include/linux/page_owner.h | 13 ++++++++-----
mm/huge_memory.c | 2 +-
mm/page_alloc.c | 4 ++--
mm/page_owner.c | 7 +++----
4 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
index 2b39c8e19d98..debdc25f08b9 100644
--- a/include/linux/page_owner.h
+++ b/include/linux/page_owner.h
@@ -11,7 +11,8 @@ extern struct page_ext_operations page_owner_ops;
extern void __reset_page_owner(struct page *page, unsigned short order);
extern void __set_page_owner(struct page *page,
unsigned short order, gfp_t gfp_mask);
-extern void __split_page_owner(struct page *page, int order);
+extern void __split_page_owner(struct page *page, int old_order,
+ int new_order);
extern void __folio_copy_owner(struct folio *newfolio, struct folio *old);
extern void __set_page_owner_migrate_reason(struct page *page, int reason);
extern void __dump_page_owner(const struct page *page);
@@ -31,10 +32,11 @@ static inline void set_page_owner(struct page *page,
__set_page_owner(page, order, gfp_mask);
}

-static inline void split_page_owner(struct page *page, int order)
+static inline void split_page_owner(struct page *page, int old_order,
+ int new_order)
{
if (static_branch_unlikely(&page_owner_inited))
- __split_page_owner(page, order);
+ __split_page_owner(page, old_order, new_order);
}
static inline void folio_copy_owner(struct folio *newfolio, struct folio *old)
{
@@ -56,10 +58,11 @@ static inline void reset_page_owner(struct page *page, unsigned short order)
{
}
static inline void set_page_owner(struct page *page,
- unsigned int order, gfp_t gfp_mask)
+ unsigned short order, gfp_t gfp_mask)
{
}
-static inline void split_page_owner(struct page *page, int order)
+static inline void split_page_owner(struct page *page, int old_order,
+ int new_order)
{
}
static inline void folio_copy_owner(struct folio *newfolio, struct folio *folio)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5d4b7c17b9bc..b2df788c11fa 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2933,7 +2933,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
unlock_page_lruvec(lruvec);
/* Caller disabled irqs, so they are still disabled here */

- split_page_owner(head, order);
+ split_page_owner(head, order, 0);

/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c31a468fe317..cc41341c08f4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2652,7 +2652,7 @@ void split_page(struct page *page, unsigned int order)

for (i = 1; i < (1 << order); i++)
set_page_refcounted(page + i);
- split_page_owner(page, order);
+ split_page_owner(page, order, 0);
split_page_memcg(page, order, 0);
}
EXPORT_SYMBOL_GPL(split_page);
@@ -4839,7 +4839,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
struct page *page = virt_to_page((void *)addr);
struct page *last = page + nr;

- split_page_owner(page, order);
+ split_page_owner(page, order, 0);
split_page_memcg(page, order, 0);
while (page < --last)
set_page_refcounted(last);
diff --git a/mm/page_owner.c b/mm/page_owner.c
index b678f7a6e702..033e349f6479 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -306,19 +306,18 @@ void __set_page_owner_migrate_reason(struct page *page, int reason)
page_ext_put(page_ext);
}

-void __split_page_owner(struct page *page, int order)
+void __split_page_owner(struct page *page, int old_order, int new_order)
{
int i;
struct page_ext *page_ext = page_ext_get(page);
struct page_owner *page_owner;
- unsigned int nr = 1 << order;

if (unlikely(!page_ext))
return;

- for (i = 0; i < nr; i++) {
+ for (i = 0; i < (1 << old_order); i++) {
page_owner = get_page_owner(page_ext);
- page_owner->order = 0;
+ page_owner->order = new_order;
page_ext = page_ext_next(page_ext);
}
page_ext_put(page_ext);
--
2.43.0


2024-02-26 20:58:18

by Zi Yan

Subject: [PATCH v5 7/8] mm: thp: split huge page to any lower order pages

From: Zi Yan <[email protected]>

To split a THP to any lower order pages, we need to re-form THPs on the
subpages at the given order and add page refcounts based on the new
page order. We also need to reinitialize page_deferred_list after
removing the page from the split_queue; otherwise a subsequent split
will see list corruption when checking page_deferred_list again.

Note: anonymous order-1 folios are not supported because _deferred_list,
which is used by partially mapped folios, is stored in subpage 2, and an
order-1 folio only has subpages 0 and 1. File-backed order-1 folios are
fine, since they do not use _deferred_list.

Signed-off-by: Zi Yan <[email protected]>
---
include/linux/huge_mm.h | 21 +++++---
mm/huge_memory.c | 110 +++++++++++++++++++++++++++++++---------
2 files changed, 99 insertions(+), 32 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 5adb86af35fc..de0c89105076 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -265,10 +265,11 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,

void folio_prep_large_rmappable(struct folio *folio);
bool can_split_folio(struct folio *folio, int *pextra_pins);
-int split_huge_page_to_list(struct page *page, struct list_head *list);
+int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
+ unsigned int new_order);
static inline int split_huge_page(struct page *page)
{
- return split_huge_page_to_list(page, NULL);
+ return split_huge_page_to_list_to_order(page, NULL, 0);
}
void deferred_split_folio(struct folio *folio);

@@ -422,7 +423,8 @@ can_split_folio(struct folio *folio, int *pextra_pins)
return false;
}
static inline int
-split_huge_page_to_list(struct page *page, struct list_head *list)
+split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
+ unsigned int new_order)
{
return 0;
}
@@ -519,17 +521,20 @@ static inline bool thp_migration_supported(void)
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */

-static inline int split_folio_to_list(struct folio *folio,
- struct list_head *list)
+static inline int split_folio_to_list_to_order(struct folio *folio,
+ struct list_head *list, int new_order)
{
- return split_huge_page_to_list(&folio->page, list);
+ return split_huge_page_to_list_to_order(&folio->page, list, new_order);
}

-static inline int split_folio(struct folio *folio)
+static inline int split_folio_to_order(struct folio *folio, int new_order)
{
- return split_folio_to_list(folio, NULL);
+ return split_folio_to_list_to_order(folio, NULL, new_order);
}

+#define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0)
+#define split_folio(f) split_folio_to_order(f, 0)
+
/*
* archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
* limitations in the implementation like arm64 MTE can override this to
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b2df788c11fa..8b47a96a28f9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2770,7 +2770,6 @@ static void lru_add_page_tail(struct page *head, struct page *tail,
struct lruvec *lruvec, struct list_head *list)
{
VM_BUG_ON_PAGE(!PageHead(head), head);
- VM_BUG_ON_PAGE(PageCompound(tail), head);
VM_BUG_ON_PAGE(PageLRU(tail), head);
lockdep_assert_held(&lruvec->lru_lock);

@@ -2791,7 +2790,8 @@ static void lru_add_page_tail(struct page *head, struct page *tail,
}

static void __split_huge_page_tail(struct folio *folio, int tail,
- struct lruvec *lruvec, struct list_head *list)
+ struct lruvec *lruvec, struct list_head *list,
+ unsigned int new_order)
{
struct page *head = &folio->page;
struct page *page_tail = head + tail;
@@ -2861,10 +2861,15 @@ static void __split_huge_page_tail(struct folio *folio, int tail,
* which needs correct compound_head().
*/
clear_compound_head(page_tail);
+ if (new_order) {
+ prep_compound_page(page_tail, new_order);
+ folio_prep_large_rmappable(new_folio);
+ }

/* Finally unfreeze refcount. Additional reference from page cache. */
- page_ref_unfreeze(page_tail, 1 + (!folio_test_anon(folio) ||
- folio_test_swapcache(folio)));
+ page_ref_unfreeze(page_tail,
+ 1 + ((!folio_test_anon(folio) || folio_test_swapcache(folio)) ?
+ folio_nr_pages(new_folio) : 0));

if (folio_test_young(folio))
folio_set_young(new_folio);
@@ -2882,7 +2887,7 @@ static void __split_huge_page_tail(struct folio *folio, int tail,
}

static void __split_huge_page(struct page *page, struct list_head *list,
- pgoff_t end)
+ pgoff_t end, unsigned int new_order)
{
struct folio *folio = page_folio(page);
struct page *head = &folio->page;
@@ -2890,11 +2895,12 @@ static void __split_huge_page(struct page *page, struct list_head *list,
struct address_space *swap_cache = NULL;
unsigned long offset = 0;
int i, nr_dropped = 0;
+ unsigned int new_nr = 1 << new_order;
int order = folio_order(folio);
unsigned int nr = 1 << order;

/* complete memcg works before add pages to LRU */
- split_page_memcg(head, order, 0);
+ split_page_memcg(head, order, new_order);

if (folio_test_anon(folio) && folio_test_swapcache(folio)) {
offset = swp_offset(folio->swap);
@@ -2907,8 +2913,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,

ClearPageHasHWPoisoned(head);

- for (i = nr - 1; i >= 1; i--) {
- __split_huge_page_tail(folio, i, lruvec, list);
+ for (i = nr - new_nr; i >= new_nr; i -= new_nr) {
+ __split_huge_page_tail(folio, i, lruvec, list, new_order);
/* Some pages can be beyond EOF: drop them from page cache */
if (head[i].index >= end) {
struct folio *tail = page_folio(head + i);
@@ -2929,24 +2935,30 @@ static void __split_huge_page(struct page *page, struct list_head *list,
}
}

- ClearPageCompound(head);
+ if (!new_order)
+ ClearPageCompound(head);
+ else {
+ struct folio *new_folio = (struct folio *)head;
+
+ folio_set_order(new_folio, new_order);
+ }
unlock_page_lruvec(lruvec);
/* Caller disabled irqs, so they are still disabled here */

- split_page_owner(head, order, 0);
+ split_page_owner(head, order, new_order);

/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
/* Additional pin to swap cache */
if (PageSwapCache(head)) {
- page_ref_add(head, 2);
+ page_ref_add(head, 1 + new_nr);
xa_unlock(&swap_cache->i_pages);
} else {
page_ref_inc(head);
}
} else {
/* Additional pin to page cache */
- page_ref_add(head, 2);
+ page_ref_add(head, 1 + new_nr);
xa_unlock(&head->mapping->i_pages);
}
local_irq_enable();
@@ -2958,7 +2970,15 @@ static void __split_huge_page(struct page *page, struct list_head *list,
if (folio_test_swapcache(folio))
split_swap_cluster(folio->swap);

- for (i = 0; i < nr; i++) {
+ /*
+ * set page to its compound_head when split to non order-0 pages, so
+ * we can skip unlocking it below, since PG_locked is transferred to
+ * the compound_head of the page and the caller will unlock it.
+ */
+ if (new_order)
+ page = compound_head(page);
+
+ for (i = 0; i < nr; i += new_nr) {
struct page *subpage = head + i;
if (subpage == page)
continue;
@@ -2992,29 +3012,36 @@ bool can_split_folio(struct folio *folio, int *pextra_pins)
}

/*
- * This function splits huge page into normal pages. @page can point to any
- * subpage of huge page to split. Split doesn't change the position of @page.
+ * This function splits huge page into pages in @new_order. @page can point to
+ * any subpage of huge page to split. Split doesn't change the position of
+ * @page.
+ *
+ * NOTE: order-1 anonymous folio is not supported because _deferred_list,
+ * which is used by partially mapped folios, is stored in subpage 2 and an
+ * order-1 folio only has subpage 0 and 1. File-backed order-1 folios are OK,
+ * since they do not use _deferred_list.
*
* Only caller must hold pin on the @page, otherwise split fails with -EBUSY.
* The huge page must be locked.
*
* If @list is null, tail pages will be added to LRU list, otherwise, to @list.
*
- * Both head page and tail pages will inherit mapping, flags, and so on from
- * the hugepage.
+ * Pages in new_order will inherit mapping, flags, and so on from the hugepage.
*
- * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
- * they are not mapped.
+ * GUP pin and PG_locked transferred to @page or the compound page @page belongs
+ * to. Rest subpages can be freed if they are not mapped.
*
* Returns 0 if the hugepage is split successfully.
* Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
* us.
*/
-int split_huge_page_to_list(struct page *page, struct list_head *list)
+int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
+ unsigned int new_order)
{
struct folio *folio = page_folio(page);
struct deferred_split *ds_queue = get_deferred_split_queue(folio);
- XA_STATE(xas, &folio->mapping->i_pages, folio->index);
+ /* reset xarray order to new order after split */
+ XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
int extra_pins, ret;
@@ -3024,6 +3051,34 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);

+ /* Cannot split anonymous THP to order-1 */
+ if (new_order == 1 && folio_test_anon(folio)) {
+ VM_WARN_ONCE(1, "Cannot split to order-1 folio");
+ return -EINVAL;
+ }
+
+ if (new_order) {
+ /* Only swapping a whole PMD-mapped folio is supported */
+ if (folio_test_swapcache(folio)) {
+ VM_WARN_ONCE(1,
+ "Cannot split swap-cached folio to non-0 order");
+ return -EINVAL;
+ }
+ /* Split shmem folio to non-zero order not supported */
+ if (shmem_mapping(folio->mapping)) {
+ VM_WARN_ONCE(1,
+ "Cannot split shmem folio to non-0 order");
+ return -EINVAL;
+ }
+ /* No split if the file system does not support large folio */
+ if (!mapping_large_folio_support(folio->mapping)) {
+ VM_WARN_ONCE(1,
+ "Cannot split file folio to non-0 order");
+ return -EINVAL;
+ }
+ }
+
+
is_hzp = is_huge_zero_page(&folio->page);
if (is_hzp) {
pr_warn_ratelimited("Called split_huge_page for huge zero page\n");
@@ -3120,14 +3175,21 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
if (folio_order(folio) > 1 &&
!list_empty(&folio->_deferred_list)) {
ds_queue->split_queue_len--;
- list_del(&folio->_deferred_list);
+ /*
+ * Reinitialize page_deferred_list after removing the
+ * page from the split_queue, otherwise a subsequent
+ * split will see list corruption when checking the
+ * page_deferred_list.
+ */
+ list_del_init(&folio->_deferred_list);
}
spin_unlock(&ds_queue->split_queue_lock);
if (mapping) {
int nr = folio_nr_pages(folio);

xas_split(&xas, folio, folio_order(folio));
- if (folio_test_pmd_mappable(folio)) {
+ if (folio_test_pmd_mappable(folio) &&
+ new_order < HPAGE_PMD_ORDER) {
if (folio_test_swapbacked(folio)) {
__lruvec_stat_mod_folio(folio,
NR_SHMEM_THPS, -nr);
@@ -3139,7 +3201,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
}
}

- __split_huge_page(page, list, end);
+ __split_huge_page(page, list, end, new_order);
ret = 0;
} else {
spin_unlock(&ds_queue->split_queue_lock);
--
2.43.0


2024-02-26 20:58:24

by Zi Yan

Subject: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

From: Zi Yan <[email protected]>

It is used to test split_huge_page_to_list_to_order for pagecache THPs.
Also add test cases for split_huge_page_to_list_to_order via debugfs.

Signed-off-by: Zi Yan <[email protected]>
---
mm/huge_memory.c | 34 ++++--
.../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
2 files changed, 131 insertions(+), 18 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8b47a96a28f9..50d146eb248f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3422,7 +3422,7 @@ static inline bool vma_not_suitable_for_thp_split(struct vm_area_struct *vma)
}

static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
- unsigned long vaddr_end)
+ unsigned long vaddr_end, unsigned int new_order)
{
int ret = 0;
struct task_struct *task;
@@ -3486,13 +3486,19 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
goto next;

total++;
- if (!can_split_folio(folio, NULL))
+ /*
+ * For folios with private, split_huge_page_to_list_to_order()
+ * will try to drop it before split and then check if the folio
+ * can be split or not. So skip the check here.
+ */
+ if (!folio_test_private(folio) &&
+ !can_split_folio(folio, NULL))
goto next;

if (!folio_trylock(folio))
goto next;

- if (!split_folio(folio))
+ if (!split_folio_to_order(folio, new_order))
split++;

folio_unlock(folio);
@@ -3510,7 +3516,7 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
}

static int split_huge_pages_in_file(const char *file_path, pgoff_t off_start,
- pgoff_t off_end)
+ pgoff_t off_end, unsigned int new_order)
{
struct filename *file;
struct file *candidate;
@@ -3549,7 +3555,7 @@ static int split_huge_pages_in_file(const char *file_path, pgoff_t off_start,
if (!folio_trylock(folio))
goto next;

- if (!split_folio(folio))
+ if (!split_folio_to_order(folio, new_order))
split++;

folio_unlock(folio);
@@ -3574,10 +3580,14 @@ static ssize_t split_huge_pages_write(struct file *file, const char __user *buf,
{
static DEFINE_MUTEX(split_debug_mutex);
ssize_t ret;
- /* hold pid, start_vaddr, end_vaddr or file_path, off_start, off_end */
+ /*
+ * hold pid, start_vaddr, end_vaddr, new_order or
+ * file_path, off_start, off_end, new_order
+ */
char input_buf[MAX_INPUT_BUF_SZ];
int pid;
unsigned long vaddr_start, vaddr_end;
+ unsigned int new_order = 0;

ret = mutex_lock_interruptible(&split_debug_mutex);
if (ret)
@@ -3606,29 +3616,29 @@ static ssize_t split_huge_pages_write(struct file *file, const char __user *buf,
goto out;
}

- ret = sscanf(buf, "0x%lx,0x%lx", &off_start, &off_end);
- if (ret != 2) {
+ ret = sscanf(buf, "0x%lx,0x%lx,%d", &off_start, &off_end, &new_order);
+ if (ret != 2 && ret != 3) {
ret = -EINVAL;
goto out;
}
- ret = split_huge_pages_in_file(file_path, off_start, off_end);
+ ret = split_huge_pages_in_file(file_path, off_start, off_end, new_order);
if (!ret)
ret = input_len;

goto out;
}

- ret = sscanf(input_buf, "%d,0x%lx,0x%lx", &pid, &vaddr_start, &vaddr_end);
+ ret = sscanf(input_buf, "%d,0x%lx,0x%lx,%d", &pid, &vaddr_start, &vaddr_end, &new_order);
if (ret == 1 && pid == 1) {
split_huge_pages_all();
ret = strlen(input_buf);
goto out;
- } else if (ret != 3) {
+ } else if (ret != 3 && ret != 4) {
ret = -EINVAL;
goto out;
}

- ret = split_huge_pages_pid(pid, vaddr_start, vaddr_end);
+ ret = split_huge_pages_pid(pid, vaddr_start, vaddr_end, new_order);
if (!ret)
ret = strlen(input_buf);
out:
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index 7b698a848bab..cf09fdc9ef22 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -16,6 +16,7 @@
#include <sys/mount.h>
#include <malloc.h>
#include <stdbool.h>
+#include <time.h>
#include "vm_util.h"
#include "../kselftest.h"

@@ -24,10 +25,12 @@ unsigned int pageshift;
uint64_t pmd_pagesize;

#define SPLIT_DEBUGFS "/sys/kernel/debug/split_huge_pages"
+#define SMAP_PATH "/proc/self/smaps"
+#define THP_FS_PATH "/mnt/thp_fs"
#define INPUT_MAX 80

-#define PID_FMT "%d,0x%lx,0x%lx"
-#define PATH_FMT "%s,0x%lx,0x%lx"
+#define PID_FMT "%d,0x%lx,0x%lx,%d"
+#define PATH_FMT "%s,0x%lx,0x%lx,%d"

#define PFN_MASK ((1UL<<55)-1)
#define KPF_THP (1UL<<22)
@@ -102,7 +105,7 @@ void split_pmd_thp(void)

/* split all THPs */
write_debugfs(PID_FMT, getpid(), (uint64_t)one_page,
- (uint64_t)one_page + len);
+ (uint64_t)one_page + len, 0);

for (i = 0; i < len; i++)
if (one_page[i] != (char)i)
@@ -177,7 +180,7 @@ void split_pte_mapped_thp(void)

/* split all remapped THPs */
write_debugfs(PID_FMT, getpid(), (uint64_t)pte_mapped,
- (uint64_t)pte_mapped + pagesize * 4);
+ (uint64_t)pte_mapped + pagesize * 4, 0);

/* smap does not show THPs after mremap, use kpageflags instead */
thp_size = 0;
@@ -237,7 +240,7 @@ void split_file_backed_thp(void)
}

/* split the file-backed THP */
- write_debugfs(PATH_FMT, testfile, pgoff_start, pgoff_end);
+ write_debugfs(PATH_FMT, testfile, pgoff_start, pgoff_end, 0);

status = unlink(testfile);
if (status) {
@@ -265,8 +268,101 @@ void split_file_backed_thp(void)
ksft_exit_fail_msg("Error occurred\n");
}

+void create_pagecache_thp_and_fd(const char *testfile, size_t fd_size, int *fd, char **addr)
+{
+ size_t i;
+ int dummy;
+
+ srand(time(NULL));
+
+ *fd = open(testfile, O_CREAT | O_RDWR, 0664);
+ if (*fd == -1)
+ ksft_exit_fail_msg("Failed to create a file at "THP_FS_PATH);
+
+ for (i = 0; i < fd_size; i++) {
+ unsigned char byte = (unsigned char)i;
+
+ write(*fd, &byte, sizeof(byte));
+ }
+ close(*fd);
+ sync();
+ *fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
+ if (*fd == -1) {
+ ksft_perror("open drop_caches");
+ goto err_out_unlink;
+ }
+ if (write(*fd, "3", 1) != 1) {
+ ksft_perror("write to drop_caches");
+ goto err_out_unlink;
+ }
+ close(*fd);
+
+ *fd = open(testfile, O_RDWR);
+ if (*fd == -1) {
+ ksft_perror("Failed to open a file at "THP_FS_PATH);
+ goto err_out_unlink;
+ }
+
+ *addr = mmap(NULL, fd_size, PROT_READ|PROT_WRITE, MAP_SHARED, *fd, 0);
+ if (*addr == (char *)-1) {
+ ksft_perror("cannot mmap");
+ goto err_out_close;
+ }
+ madvise(*addr, fd_size, MADV_HUGEPAGE);
+
+ for (size_t i = 0; i < fd_size; i++)
+ dummy += *(*addr + i);
+
+ if (!check_huge_file(*addr, fd_size / pmd_pagesize, pmd_pagesize)) {
+ ksft_print_msg("No large pagecache folio generated, please mount a filesystem supporting large folio at "THP_FS_PATH"\n");
+ goto err_out_close;
+ }
+ return;
+err_out_close:
+ close(*fd);
+err_out_unlink:
+ unlink(testfile);
+ ksft_exit_fail_msg("Failed to create large pagecache folios\n");
+}
+
+void split_thp_in_pagecache_to_order(size_t fd_size, int order)
+{
+ int fd;
+ char *addr;
+ size_t i;
+ const char testfile[] = THP_FS_PATH "/test";
+ int err = 0;
+
+ create_pagecache_thp_and_fd(testfile, fd_size, &fd, &addr);
+
+ write_debugfs(PID_FMT, getpid(), (uint64_t)addr, (uint64_t)addr + fd_size, order);
+
+ for (i = 0; i < fd_size; i++)
+ if (*(addr + i) != (char)i) {
+ ksft_print_msg("%lu byte corrupted in the file\n", i);
+ err = EXIT_FAILURE;
+ goto out;
+ }
+
+ if (!check_huge_file(addr, 0, pmd_pagesize)) {
+ ksft_print_msg("Still FilePmdMapped not split\n");
+ err = EXIT_FAILURE;
+ goto out;
+ }
+
+out:
+ close(fd);
+ unlink(testfile);
+ if (err)
+ ksft_exit_fail_msg("Split PMD-mapped pagecache folio to order %d failed\n", order);
+ ksft_test_result_pass("Split PMD-mapped pagecache folio to order %d passed\n", order);
+}
+
int main(int argc, char **argv)
{
+ int i;
+ size_t fd_size;
+
ksft_print_header();

if (geteuid() != 0) {
@@ -274,7 +370,7 @@ int main(int argc, char **argv)
ksft_finished();
}

- ksft_set_plan(3);
+ ksft_set_plan(3+9);

pagesize = getpagesize();
pageshift = ffs(pagesize) - 1;
@@ -282,9 +378,16 @@ int main(int argc, char **argv)
if (!pmd_pagesize)
ksft_exit_fail_msg("Reading PMD pagesize failed\n");

+ fd_size = 2 * pmd_pagesize;
+
split_pmd_thp();
split_pte_mapped_thp();
split_file_backed_thp();

+ for (i = 8; i >= 0; i--)
+ split_thp_in_pagecache_to_order(fd_size, i);
+
ksft_finished();
+
+ return 0;
}
--
2.43.0


2024-02-28 08:23:42

by Ryan Roberts

Subject: Re: [PATCH v5 7/8] mm: thp: split huge page to any lower order pages

Hi Zi,


On 26/02/2024 20:55, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> To split a THP to any lower order pages, we need to reform THPs on
> subpages at given order and add page refcount based on the new page
> order. Also we need to reinitialize page_deferred_list after removing
> the page from the split_queue, otherwise a subsequent split will
> see list corruption when checking the page_deferred_list again.
>
> Note: Anonymous order-1 folio is not supported because _deferred_list,
> which is used by partially mapped folios, is stored in subpage 2 and an
> order-1 folio only has subpage 0 and 1. File-backed order-1 folios are
> fine, since they do not use _deferred_list.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> include/linux/huge_mm.h | 21 +++++---
> mm/huge_memory.c | 110 +++++++++++++++++++++++++++++++---------
> 2 files changed, 99 insertions(+), 32 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 5adb86af35fc..de0c89105076 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -265,10 +265,11 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
>
> void folio_prep_large_rmappable(struct folio *folio);
> bool can_split_folio(struct folio *folio, int *pextra_pins);
> -int split_huge_page_to_list(struct page *page, struct list_head *list);
> +int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> + unsigned int new_order);
> static inline int split_huge_page(struct page *page)
> {
> - return split_huge_page_to_list(page, NULL);
> + return split_huge_page_to_list_to_order(page, NULL, 0);
> }
> void deferred_split_folio(struct folio *folio);
>
> @@ -422,7 +423,8 @@ can_split_folio(struct folio *folio, int *pextra_pins)
> return false;
> }
> static inline int
> -split_huge_page_to_list(struct page *page, struct list_head *list)
> +split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> + unsigned int new_order)
> {
> return 0;
> }
> @@ -519,17 +521,20 @@ static inline bool thp_migration_supported(void)
> }
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> -static inline int split_folio_to_list(struct folio *folio,
> - struct list_head *list)
> +static inline int split_folio_to_list_to_order(struct folio *folio,
> + struct list_head *list, int new_order)
> {
> - return split_huge_page_to_list(&folio->page, list);
> + return split_huge_page_to_list_to_order(&folio->page, list, new_order);
> }
>
> -static inline int split_folio(struct folio *folio)
> +static inline int split_folio_to_order(struct folio *folio, int new_order)
> {
> - return split_folio_to_list(folio, NULL);
> + return split_folio_to_list_to_order(folio, NULL, new_order);
> }
>
> +#define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0)
> +#define split_folio(f) split_folio_to_order(f, 0)
> +
> /*
> * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
> * limitations in the implementation like arm64 MTE can override this to
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b2df788c11fa..8b47a96a28f9 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2770,7 +2770,6 @@ static void lru_add_page_tail(struct page *head, struct page *tail,
> struct lruvec *lruvec, struct list_head *list)
> {
> VM_BUG_ON_PAGE(!PageHead(head), head);
> - VM_BUG_ON_PAGE(PageCompound(tail), head);
> VM_BUG_ON_PAGE(PageLRU(tail), head);
> lockdep_assert_held(&lruvec->lru_lock);
>
> @@ -2791,7 +2790,8 @@ static void lru_add_page_tail(struct page *head, struct page *tail,
> }
>
> static void __split_huge_page_tail(struct folio *folio, int tail,
> - struct lruvec *lruvec, struct list_head *list)
> + struct lruvec *lruvec, struct list_head *list,
> + unsigned int new_order)
> {
> struct page *head = &folio->page;
> struct page *page_tail = head + tail;
> @@ -2861,10 +2861,15 @@ static void __split_huge_page_tail(struct folio *folio, int tail,
> * which needs correct compound_head().
> */
> clear_compound_head(page_tail);
> + if (new_order) {
> + prep_compound_page(page_tail, new_order);
> + folio_prep_large_rmappable(new_folio);
> + }
>
> /* Finally unfreeze refcount. Additional reference from page cache. */
> - page_ref_unfreeze(page_tail, 1 + (!folio_test_anon(folio) ||
> - folio_test_swapcache(folio)));
> + page_ref_unfreeze(page_tail,
> + 1 + ((!folio_test_anon(folio) || folio_test_swapcache(folio)) ?
> + folio_nr_pages(new_folio) : 0));
>
> if (folio_test_young(folio))
> folio_set_young(new_folio);
> @@ -2882,7 +2887,7 @@ static void __split_huge_page_tail(struct folio *folio, int tail,
> }
>
> static void __split_huge_page(struct page *page, struct list_head *list,
> - pgoff_t end)
> + pgoff_t end, unsigned int new_order)
> {
> struct folio *folio = page_folio(page);
> struct page *head = &folio->page;
> @@ -2890,11 +2895,12 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> struct address_space *swap_cache = NULL;
> unsigned long offset = 0;
> int i, nr_dropped = 0;
> + unsigned int new_nr = 1 << new_order;
> int order = folio_order(folio);
> unsigned int nr = 1 << order;
>
> /* complete memcg works before add pages to LRU */
> - split_page_memcg(head, order, 0);
> + split_page_memcg(head, order, new_order);
>
> if (folio_test_anon(folio) && folio_test_swapcache(folio)) {
> offset = swp_offset(folio->swap);
> @@ -2907,8 +2913,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>
> ClearPageHasHWPoisoned(head);
>
> - for (i = nr - 1; i >= 1; i--) {
> - __split_huge_page_tail(folio, i, lruvec, list);
> + for (i = nr - new_nr; i >= new_nr; i -= new_nr) {
> + __split_huge_page_tail(folio, i, lruvec, list, new_order);
> /* Some pages can be beyond EOF: drop them from page cache */
> if (head[i].index >= end) {
> struct folio *tail = page_folio(head + i);
> @@ -2929,24 +2935,30 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> }
> }
>
> - ClearPageCompound(head);
> + if (!new_order)
> + ClearPageCompound(head);
> + else {
> + struct folio *new_folio = (struct folio *)head;
> +
> + folio_set_order(new_folio, new_order);
> + }
> unlock_page_lruvec(lruvec);
> /* Caller disabled irqs, so they are still disabled here */
>
> - split_page_owner(head, order, 0);
> + split_page_owner(head, order, new_order);
>
> /* See comment in __split_huge_page_tail() */
> if (PageAnon(head)) {
> /* Additional pin to swap cache */
> if (PageSwapCache(head)) {
> - page_ref_add(head, 2);
> + page_ref_add(head, 1 + new_nr);
> xa_unlock(&swap_cache->i_pages);
> } else {
> page_ref_inc(head);
> }
> } else {
> /* Additional pin to page cache */
> - page_ref_add(head, 2);
> + page_ref_add(head, 1 + new_nr);
> xa_unlock(&head->mapping->i_pages);
> }
> local_irq_enable();
> @@ -2958,7 +2970,15 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> if (folio_test_swapcache(folio))
> split_swap_cluster(folio->swap);
>
> - for (i = 0; i < nr; i++) {
> + /*
> + * set page to its compound_head when split to non order-0 pages, so
> + * we can skip unlocking it below, since PG_locked is transferred to
> + * the compound_head of the page and the caller will unlock it.
> + */
> + if (new_order)
> + page = compound_head(page);
> +
> + for (i = 0; i < nr; i += new_nr) {
> struct page *subpage = head + i;
> if (subpage == page)
> continue;
> @@ -2992,29 +3012,36 @@ bool can_split_folio(struct folio *folio, int *pextra_pins)
> }
>
> /*
> - * This function splits huge page into normal pages. @page can point to any
> - * subpage of huge page to split. Split doesn't change the position of @page.
> + * This function splits huge page into pages in @new_order. @page can point to
> + * any subpage of huge page to split. Split doesn't change the position of
> + * @page.
> + *
> + * NOTE: order-1 anonymous folio is not supported because _deferred_list,
> + * which is used by partially mapped folios, is stored in subpage 2 and an
> + * order-1 folio only has subpage 0 and 1. File-backed order-1 folios are OK,
> + * since they do not use _deferred_list.
> *
> * Only caller must hold pin on the @page, otherwise split fails with -EBUSY.
> * The huge page must be locked.
> *
> * If @list is null, tail pages will be added to LRU list, otherwise, to @list.
> *
> - * Both head page and tail pages will inherit mapping, flags, and so on from
> - * the hugepage.
> + * Pages in new_order will inherit mapping, flags, and so on from the hugepage.
> *
> - * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
> - * they are not mapped.
> + * GUP pin and PG_locked transferred to @page or the compound page @page belongs
> + * to. Rest subpages can be freed if they are not mapped.
> *
> * Returns 0 if the hugepage is split successfully.
> * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
> * us.
> */
> -int split_huge_page_to_list(struct page *page, struct list_head *list)
> +int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> + unsigned int new_order)
> {
> struct folio *folio = page_folio(page);
> struct deferred_split *ds_queue = get_deferred_split_queue(folio);
> - XA_STATE(xas, &folio->mapping->i_pages, folio->index);
> + /* reset xarray order to new order after split */
> + XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
> struct anon_vma *anon_vma = NULL;
> struct address_space *mapping = NULL;
> int extra_pins, ret;
> @@ -3024,6 +3051,34 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
> VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
> VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
>
> + /* Cannot split anonymous THP to order-1 */
> + if (new_order == 1 && folio_test_anon(folio)) {
> + VM_WARN_ONCE(1, "Cannot split to order-1 folio");
> + return -EINVAL;
> + }
> +
> + if (new_order) {
> + /* Only swapping a whole PMD-mapped folio is supported */
> + if (folio_test_swapcache(folio)) {
> + VM_WARN_ONCE(1,
> + "Cannot split swap-cached folio to non-0 order");

My understanding may be wrong here, but can't the folio be moved to swapcache
asynchronously? How does the caller guarantee that the folio is not in
swapcache and will not be moved between the call to
split_huge_page_to_list_to_order() and this test? If the caller can't prevent
it, then isn't it wrong to raise a warning here? Perhaps you just have to fail
to split?

I'm guessing this restriction is because swap only supports order-0 and
pmd-order folios currently? (And you only have split_swap_cluster() to downgrade
from pmd-order to order-0). Perhaps you need my series that allows swapping out
any order THP? Current version at [1] but I'm working on a new version.

[1] https://lore.kernel.org/linux-mm/[email protected]/

Thanks,
Ryan


> + return -EINVAL;
> + }
> + /* Split shmem folio to non-zero order not supported */
> + if (shmem_mapping(folio->mapping)) {
> + VM_WARN_ONCE(1,
> + "Cannot split shmem folio to non-0 order");
> + return -EINVAL;
> + }
> + /* No split if the file system does not support large folio */
> + if (!mapping_large_folio_support(folio->mapping)) {
> + VM_WARN_ONCE(1,
> + "Cannot split file folio to non-0 order");
> + return -EINVAL;
> + }
> + }
> +
> +
> is_hzp = is_huge_zero_page(&folio->page);
> if (is_hzp) {
> pr_warn_ratelimited("Called split_huge_page for huge zero page\n");
> @@ -3120,14 +3175,21 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
> if (folio_order(folio) > 1 &&
> !list_empty(&folio->_deferred_list)) {
> ds_queue->split_queue_len--;
> - list_del(&folio->_deferred_list);
> + /*
> + * Reinitialize page_deferred_list after removing the
> + * page from the split_queue, otherwise a subsequent
> + * split will see list corruption when checking the
> + * page_deferred_list.
> + */
> + list_del_init(&folio->_deferred_list);
> }
> spin_unlock(&ds_queue->split_queue_lock);
> if (mapping) {
> int nr = folio_nr_pages(folio);
>
> xas_split(&xas, folio, folio_order(folio));
> - if (folio_test_pmd_mappable(folio)) {
> + if (folio_test_pmd_mappable(folio) &&
> + new_order < HPAGE_PMD_ORDER) {
> if (folio_test_swapbacked(folio)) {
> __lruvec_stat_mod_folio(folio,
> NR_SHMEM_THPS, -nr);
> @@ -3139,7 +3201,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
> }
> }
>
> - __split_huge_page(page, list, end);
> + __split_huge_page(page, list, end, new_order);
> ret = 0;
> } else {
> spin_unlock(&ds_queue->split_queue_lock);


2024-02-28 10:30:57

by David Hildenbrand

Subject: Re: [PATCH v5 1/8] mm/huge_memory: only split PMD mapping when necessary in unmap_folio()

On 26.02.24 21:55, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> As multi-size THP support is added, not all THPs are PMD-mapped, thus
> during a huge page split, there is no need to always split PMD mapping
> in unmap_folio(). Make it conditional.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> mm/huge_memory.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 28341a5067fb..b20e535e874c 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2727,11 +2727,14 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
>
> static void unmap_folio(struct folio *folio)
> {
> - enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
> - TTU_SYNC | TTU_BATCH_FLUSH;
> + enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SYNC |
> + TTU_BATCH_FLUSH;
>
> VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
>
> + if (folio_test_pmd_mappable(folio))
> + ttu_flags |= TTU_SPLIT_HUGE_PMD;
> +
> /*
> * Anon pages need migration entries to preserve them, but file
> * pages can simply be left unmapped, then faulted back on demand.

Reviewed-by: David Hildenbrand <[email protected]>

--
Cheers,

David / dhildenb


2024-02-28 10:45:41

by David Hildenbrand

Subject: Re: [PATCH v5 6/8] mm: page_owner: add support for splitting to any order in split page_owner.

On 26.02.24 21:55, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> It adds a new_order parameter to set new page order in page owner.
> It prepares for upcoming changes to support split huge page to any
> lower order.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> include/linux/page_owner.h | 13 ++++++++-----
> mm/huge_memory.c | 2 +-
> mm/page_alloc.c | 4 ++--
> mm/page_owner.c | 7 +++----
> 4 files changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
> index 2b39c8e19d98..debdc25f08b9 100644
> --- a/include/linux/page_owner.h
> +++ b/include/linux/page_owner.h
> @@ -11,7 +11,8 @@ extern struct page_ext_operations page_owner_ops;
> extern void __reset_page_owner(struct page *page, unsigned short order);
> extern void __set_page_owner(struct page *page,
> unsigned short order, gfp_t gfp_mask);
> -extern void __split_page_owner(struct page *page, int order);
> +extern void __split_page_owner(struct page *page, int old_order,
> + int new_order);
> extern void __folio_copy_owner(struct folio *newfolio, struct folio *old);
> extern void __set_page_owner_migrate_reason(struct page *page, int reason);
> extern void __dump_page_owner(const struct page *page);
> @@ -31,10 +32,11 @@ static inline void set_page_owner(struct page *page,
> __set_page_owner(page, order, gfp_mask);
> }
>
> -static inline void split_page_owner(struct page *page, int order)
> +static inline void split_page_owner(struct page *page, int old_order,
> + int new_order)
> {
> if (static_branch_unlikely(&page_owner_inited))
> - __split_page_owner(page, order);
> + __split_page_owner(page, old_order, new_order);
> }
> static inline void folio_copy_owner(struct folio *newfolio, struct folio *old)
> {
> @@ -56,10 +58,11 @@ static inline void reset_page_owner(struct page *page, unsigned short order)
> {
> }
> static inline void set_page_owner(struct page *page,
> - unsigned int order, gfp_t gfp_mask)
> + unsigned short order, gfp_t gfp_mask)
> {
> }
> -static inline void split_page_owner(struct page *page, int order)
> +static inline void split_page_owner(struct page *page, int old_order,
> + int new_order)
> {
> }
> static inline void folio_copy_owner(struct folio *newfolio, struct folio *folio)
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 5d4b7c17b9bc..b2df788c11fa 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2933,7 +2933,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> unlock_page_lruvec(lruvec);
> /* Caller disabled irqs, so they are still disabled here */
>
> - split_page_owner(head, order);
> + split_page_owner(head, order, 0);
>
> /* See comment in __split_huge_page_tail() */
> if (PageAnon(head)) {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c31a468fe317..cc41341c08f4 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2652,7 +2652,7 @@ void split_page(struct page *page, unsigned int order)
>
> for (i = 1; i < (1 << order); i++)
> set_page_refcounted(page + i);
> - split_page_owner(page, order);
> + split_page_owner(page, order, 0);
> split_page_memcg(page, order, 0);
> }
> EXPORT_SYMBOL_GPL(split_page);
> @@ -4839,7 +4839,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
> struct page *page = virt_to_page((void *)addr);
> struct page *last = page + nr;
>
> - split_page_owner(page, order);
> + split_page_owner(page, order, 0);
> split_page_memcg(page, order, 0);
> while (page < --last)
> set_page_refcounted(last);
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index b678f7a6e702..033e349f6479 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -306,19 +306,18 @@ void __set_page_owner_migrate_reason(struct page *page, int reason)
> page_ext_put(page_ext);
> }
>
> -void __split_page_owner(struct page *page, int order)
> +void __split_page_owner(struct page *page, int old_order, int new_order)
> {
> int i;
> struct page_ext *page_ext = page_ext_get(page);
> struct page_owner *page_owner;
> - unsigned int nr = 1 << order;
>
> if (unlikely(!page_ext))
> return;
>
> - for (i = 0; i < nr; i++) {
> + for (i = 0; i < (1 << old_order); i++) {
> page_owner = get_page_owner(page_ext);
> - page_owner->order = 0;
> + page_owner->order = new_order;
> page_ext = page_ext_next(page_ext);
> }
> page_ext_put(page_ext);

Acked-by: David Hildenbrand <[email protected]>

--
Cheers,

David / dhildenb


2024-02-28 15:46:37

by Ryan Roberts

Subject: Re: [PATCH v5 7/8] mm: thp: split huge page to any lower order pages

On 28/02/2024 15:42, Zi Yan wrote:
> On 28 Feb 2024, at 3:23, Ryan Roberts wrote:
>
>> Hi Zi,
>>
>>
>> On 26/02/2024 20:55, Zi Yan wrote:
>>> From: Zi Yan <[email protected]>
>>>
>>> To split a THP to any lower order pages, we need to reform THPs on
>>> subpages at given order and add page refcount based on the new page
>>> order. Also we need to reinitialize page_deferred_list after removing
>>> the page from the split_queue, otherwise a subsequent split will
>>> see list corruption when checking the page_deferred_list again.
>>>
>>> Note: Anonymous order-1 folio is not supported because _deferred_list,
>>> which is used by partially mapped folios, is stored in subpage 2 and an
>>> order-1 folio only has subpage 0 and 1. File-backed order-1 folios are
>>> fine, since they do not use _deferred_list.
>>>
>>> Signed-off-by: Zi Yan <[email protected]>
>>> ---
>>> include/linux/huge_mm.h | 21 +++++---
>>> mm/huge_memory.c | 110 +++++++++++++++++++++++++++++++---------
>>> 2 files changed, 99 insertions(+), 32 deletions(-)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 5adb86af35fc..de0c89105076 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -265,10 +265,11 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
>>>
>>> void folio_prep_large_rmappable(struct folio *folio);
>>> bool can_split_folio(struct folio *folio, int *pextra_pins);
>>> -int split_huge_page_to_list(struct page *page, struct list_head *list);
>>> +int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>> + unsigned int new_order);
>>> static inline int split_huge_page(struct page *page)
>>> {
>>> - return split_huge_page_to_list(page, NULL);
>>> + return split_huge_page_to_list_to_order(page, NULL, 0);
>>> }
>>> void deferred_split_folio(struct folio *folio);
>>>
>>> @@ -422,7 +423,8 @@ can_split_folio(struct folio *folio, int *pextra_pins)
>>> return false;
>>> }
>>> static inline int
>>> -split_huge_page_to_list(struct page *page, struct list_head *list)
>>> +split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>> + unsigned int new_order)
>>> {
>>> return 0;
>>> }
>>> @@ -519,17 +521,20 @@ static inline bool thp_migration_supported(void)
>>> }
>>> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>>
>>> -static inline int split_folio_to_list(struct folio *folio,
>>> - struct list_head *list)
>>> +static inline int split_folio_to_list_to_order(struct folio *folio,
>>> + struct list_head *list, int new_order)
>>> {
>>> - return split_huge_page_to_list(&folio->page, list);
>>> + return split_huge_page_to_list_to_order(&folio->page, list, new_order);
>>> }
>>>
>>> -static inline int split_folio(struct folio *folio)
>>> +static inline int split_folio_to_order(struct folio *folio, int new_order)
>>> {
>>> - return split_folio_to_list(folio, NULL);
>>> + return split_folio_to_list_to_order(folio, NULL, new_order);
>>> }
>>>
>>> +#define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0)
>>> +#define split_folio(f) split_folio_to_order(f, 0)
>>> +
>>> /*
>>> * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
>>> * limitations in the implementation like arm64 MTE can override this to
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index b2df788c11fa..8b47a96a28f9 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2770,7 +2770,6 @@ static void lru_add_page_tail(struct page *head, struct page *tail,
>>> struct lruvec *lruvec, struct list_head *list)
>>> {
>>> VM_BUG_ON_PAGE(!PageHead(head), head);
>>> - VM_BUG_ON_PAGE(PageCompound(tail), head);
>>> VM_BUG_ON_PAGE(PageLRU(tail), head);
>>> lockdep_assert_held(&lruvec->lru_lock);
>>>
>>> @@ -2791,7 +2790,8 @@ static void lru_add_page_tail(struct page *head, struct page *tail,
>>> }
>>>
>>> static void __split_huge_page_tail(struct folio *folio, int tail,
>>> - struct lruvec *lruvec, struct list_head *list)
>>> + struct lruvec *lruvec, struct list_head *list,
>>> + unsigned int new_order)
>>> {
>>> struct page *head = &folio->page;
>>> struct page *page_tail = head + tail;
>>> @@ -2861,10 +2861,15 @@ static void __split_huge_page_tail(struct folio *folio, int tail,
>>> * which needs correct compound_head().
>>> */
>>> clear_compound_head(page_tail);
>>> + if (new_order) {
>>> + prep_compound_page(page_tail, new_order);
>>> + folio_prep_large_rmappable(new_folio);
>>> + }
>>>
>>> /* Finally unfreeze refcount. Additional reference from page cache. */
>>> - page_ref_unfreeze(page_tail, 1 + (!folio_test_anon(folio) ||
>>> - folio_test_swapcache(folio)));
>>> + page_ref_unfreeze(page_tail,
>>> + 1 + ((!folio_test_anon(folio) || folio_test_swapcache(folio)) ?
>>> + folio_nr_pages(new_folio) : 0));
>>>
>>> if (folio_test_young(folio))
>>> folio_set_young(new_folio);
>>> @@ -2882,7 +2887,7 @@ static void __split_huge_page_tail(struct folio *folio, int tail,
>>> }
>>>
>>> static void __split_huge_page(struct page *page, struct list_head *list,
>>> - pgoff_t end)
>>> + pgoff_t end, unsigned int new_order)
>>> {
>>> struct folio *folio = page_folio(page);
>>> struct page *head = &folio->page;
>>> @@ -2890,11 +2895,12 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>>> struct address_space *swap_cache = NULL;
>>> unsigned long offset = 0;
>>> int i, nr_dropped = 0;
>>> + unsigned int new_nr = 1 << new_order;
>>> int order = folio_order(folio);
>>> unsigned int nr = 1 << order;
>>>
>>> /* complete memcg works before add pages to LRU */
>>> - split_page_memcg(head, order, 0);
>>> + split_page_memcg(head, order, new_order);
>>>
>>> if (folio_test_anon(folio) && folio_test_swapcache(folio)) {
>>> offset = swp_offset(folio->swap);
>>> @@ -2907,8 +2913,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>>>
>>> ClearPageHasHWPoisoned(head);
>>>
>>> - for (i = nr - 1; i >= 1; i--) {
>>> - __split_huge_page_tail(folio, i, lruvec, list);
>>> + for (i = nr - new_nr; i >= new_nr; i -= new_nr) {
>>> + __split_huge_page_tail(folio, i, lruvec, list, new_order);
>>> /* Some pages can be beyond EOF: drop them from page cache */
>>> if (head[i].index >= end) {
>>> struct folio *tail = page_folio(head + i);
>>> @@ -2929,24 +2935,30 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>>> }
>>> }
>>>
>>> - ClearPageCompound(head);
>>> + if (!new_order)
>>> + ClearPageCompound(head);
>>> + else {
>>> + struct folio *new_folio = (struct folio *)head;
>>> +
>>> + folio_set_order(new_folio, new_order);
>>> + }
>>> unlock_page_lruvec(lruvec);
>>> /* Caller disabled irqs, so they are still disabled here */
>>>
>>> - split_page_owner(head, order, 0);
>>> + split_page_owner(head, order, new_order);
>>>
>>> /* See comment in __split_huge_page_tail() */
>>> if (PageAnon(head)) {
>>> /* Additional pin to swap cache */
>>> if (PageSwapCache(head)) {
>>> - page_ref_add(head, 2);
>>> + page_ref_add(head, 1 + new_nr);
>>> xa_unlock(&swap_cache->i_pages);
>>> } else {
>>> page_ref_inc(head);
>>> }
>>> } else {
>>> /* Additional pin to page cache */
>>> - page_ref_add(head, 2);
>>> + page_ref_add(head, 1 + new_nr);
>>> xa_unlock(&head->mapping->i_pages);
>>> }
>>> local_irq_enable();
>>> @@ -2958,7 +2970,15 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>>> if (folio_test_swapcache(folio))
>>> split_swap_cluster(folio->swap);
>>>
>>> - for (i = 0; i < nr; i++) {
>>> + /*
>>> + * set page to its compound_head when split to non order-0 pages, so
>>> + * we can skip unlocking it below, since PG_locked is transferred to
>>> + * the compound_head of the page and the caller will unlock it.
>>> + */
>>> + if (new_order)
>>> + page = compound_head(page);
>>> +
>>> + for (i = 0; i < nr; i += new_nr) {
>>> struct page *subpage = head + i;
>>> if (subpage == page)
>>> continue;
>>> @@ -2992,29 +3012,36 @@ bool can_split_folio(struct folio *folio, int *pextra_pins)
>>> }
>>>
>>> /*
>>> - * This function splits huge page into normal pages. @page can point to any
>>> - * subpage of huge page to split. Split doesn't change the position of @page.
>>> + * This function splits huge page into pages in @new_order. @page can point to
>>> + * any subpage of huge page to split. Split doesn't change the position of
>>> + * @page.
>>> + *
>>> + * NOTE: order-1 anonymous folio is not supported because _deferred_list,
>>> + * which is used by partially mapped folios, is stored in subpage 2 and an
>>> + * order-1 folio only has subpage 0 and 1. File-backed order-1 folios are OK,
>>> + * since they do not use _deferred_list.
>>> *
>>> * Only caller must hold pin on the @page, otherwise split fails with -EBUSY.
>>> * The huge page must be locked.
>>> *
>>> * If @list is null, tail pages will be added to LRU list, otherwise, to @list.
>>> *
>>> - * Both head page and tail pages will inherit mapping, flags, and so on from
>>> - * the hugepage.
>>> + * Pages in new_order will inherit mapping, flags, and so on from the hugepage.
>>> *
>>> - * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
>>> - * they are not mapped.
>>> + * GUP pin and PG_locked transferred to @page or the compound page @page belongs
>>> + * to. Rest subpages can be freed if they are not mapped.
>>> *
>>> * Returns 0 if the hugepage is split successfully.
>>> * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
>>> * us.
>>> */
>>> -int split_huge_page_to_list(struct page *page, struct list_head *list)
>>> +int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>> + unsigned int new_order)
>>> {
>>> struct folio *folio = page_folio(page);
>>> struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>> - XA_STATE(xas, &folio->mapping->i_pages, folio->index);
>>> + /* reset xarray order to new order after split */
>>> + XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
>>> struct anon_vma *anon_vma = NULL;
>>> struct address_space *mapping = NULL;
>>> int extra_pins, ret;
>>> @@ -3024,6 +3051,34 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>>> VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
>>> VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
>>>
>>> + /* Cannot split anonymous THP to order-1 */
>>> + if (new_order == 1 && folio_test_anon(folio)) {
>>> + VM_WARN_ONCE(1, "Cannot split to order-1 folio");
>>> + return -EINVAL;
>>> + }
>>> +
>>> + if (new_order) {
>>> + /* Only swapping a whole PMD-mapped folio is supported */
>>> + if (folio_test_swapcache(folio)) {
>>> + VM_WARN_ONCE(1,
>>> + "Cannot split swap-cached folio to non-0 order");
>>
>> My understanding may be wrong here, but can't the folio be moved to swapcache
>> asynchronously? How does the caller guarantee that the folio is not in
>> swapcache and will not be moved between the call to
>> split_huge_page_to_list_to_order() and this test? If the caller can't prevent
>> it, then isn't it wrong to raise a warning here? Perhaps you just have to fail
>> to split?
>
> Right. That is why I only use VM_WARN_ONCE here. You mean it is better to
> get rid of the warning? I have no strong preference about it.

Yes; I don't think we should be issuing warnings when the caller has done
nothing wrong.

>
>>
>> I'm guessing this restriction is because swap only supports order-0 and
>> pmd-order folios currently? (And you only have split_swap_cluster() to downgrade
>> from pmd-order to order-0). Perhaps you need my series that allows swapping out
>> any order THP? Current version at [1] but I'm working on a new version.
>>
>> [1] https://lore.kernel.org/linux-mm/[email protected]/
>
> Right. Once your patchset is in, the above check can be removed.
>
>>> + return -EINVAL;
>>> + }
>>> + /* Split shmem folio to non-zero order not supported */
>>> + if (shmem_mapping(folio->mapping)) {
>>> + VM_WARN_ONCE(1,
>>> + "Cannot split shmem folio to non-0 order");
>>> + return -EINVAL;
>>> + }
>>> + /* No split if the file system does not support large folio */
>>> + if (!mapping_large_folio_support(folio->mapping)) {
>>> + VM_WARN_ONCE(1,
>>> + "Cannot split file folio to non-0 order");
>>> + return -EINVAL;
>>> + }
>>> + }
>>> +
>>> +
>>> is_hzp = is_huge_zero_page(&folio->page);
>>> if (is_hzp) {
>>> pr_warn_ratelimited("Called split_huge_page for huge zero page\n");
>>> @@ -3120,14 +3175,21 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>>> if (folio_order(folio) > 1 &&
>>> !list_empty(&folio->_deferred_list)) {
>>> ds_queue->split_queue_len--;
>>> - list_del(&folio->_deferred_list);
>>> + /*
>>> + * Reinitialize page_deferred_list after removing the
>>> + * page from the split_queue, otherwise a subsequent
>>> + * split will see list corruption when checking the
>>> + * page_deferred_list.
>>> + */
>>> + list_del_init(&folio->_deferred_list);
>>> }
>>> spin_unlock(&ds_queue->split_queue_lock);
>>> if (mapping) {
>>> int nr = folio_nr_pages(folio);
>>>
>>> xas_split(&xas, folio, folio_order(folio));
>>> - if (folio_test_pmd_mappable(folio)) {
>>> + if (folio_test_pmd_mappable(folio) &&
>>> + new_order < HPAGE_PMD_ORDER) {
>>> if (folio_test_swapbacked(folio)) {
>>> __lruvec_stat_mod_folio(folio,
>>> NR_SHMEM_THPS, -nr);
>>> @@ -3139,7 +3201,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>>> }
>>> }
>>>
>>> - __split_huge_page(page, list, end);
>>> + __split_huge_page(page, list, end, new_order);
>>> ret = 0;
>>> } else {
>>> spin_unlock(&ds_queue->split_queue_lock);
>
>
> --
> Best Regards,
> Yan, Zi


2024-02-28 15:52:42

by Zi Yan

[permalink] [raw]
Subject: Re: [PATCH v5 7/8] mm: thp: split huge page to any lower order pages

Hi Andrew,

On 26 Feb 2024, at 15:55, Zi Yan wrote:

> From: Zi Yan <[email protected]>
>
> To split a THP to any lower order pages, we need to reform THPs on
> subpages at given order and add page refcount based on the new page
> order. Also we need to reinitialize page_deferred_list after removing
> the page from the split_queue, otherwise a subsequent split will
> see list corruption when checking the page_deferred_list again.
>
> Note: Anonymous order-1 folio is not supported because _deferred_list,
> which is used by partially mapped folios, is stored in subpage 2 and an
> order-1 folio only has subpage 0 and 1. File-backed order-1 folios are
> fine, since they do not use _deferred_list.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> include/linux/huge_mm.h | 21 +++++---
> mm/huge_memory.c | 110 +++++++++++++++++++++++++++++++---------
> 2 files changed, 99 insertions(+), 32 deletions(-)
>

Can you fold the fixup below into this patch (per discussion with Ryan at [1])? Thanks.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 50d146eb248f..fd745bcc97ff 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3059,11 +3059,8 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,

if (new_order) {
/* Only swapping a whole PMD-mapped folio is supported */
- if (folio_test_swapcache(folio)) {
- VM_WARN_ONCE(1,
- "Cannot split swap-cached folio to non-0 order");
+ if (folio_test_swapcache(folio))
return -EINVAL;
- }
/* Split shmem folio to non-zero order not supported */
if (shmem_mapping(folio->mapping)) {
VM_WARN_ONCE(1,


[1] https://lore.kernel.org/linux-mm/[email protected]/

--
Best Regards,
Yan, Zi


Attachments:
signature.asc (871.00 B)
OpenPGP digital signature

2024-02-28 16:26:56

by Zi Yan

[permalink] [raw]
Subject: Re: [PATCH v5 7/8] mm: thp: split huge page to any lower order pages

On 28 Feb 2024, at 3:23, Ryan Roberts wrote:

> Hi Zi,
>
>
> On 26/02/2024 20:55, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> To split a THP to any lower order pages, we need to reform THPs on
>> subpages at given order and add page refcount based on the new page
>> order. Also we need to reinitialize page_deferred_list after removing
>> the page from the split_queue, otherwise a subsequent split will
>> see list corruption when checking the page_deferred_list again.
>>
>> Note: Anonymous order-1 folio is not supported because _deferred_list,
>> which is used by partially mapped folios, is stored in subpage 2 and an
>> order-1 folio only has subpage 0 and 1. File-backed order-1 folios are
>> fine, since they do not use _deferred_list.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> include/linux/huge_mm.h | 21 +++++---
>> mm/huge_memory.c | 110 +++++++++++++++++++++++++++++++---------
>> 2 files changed, 99 insertions(+), 32 deletions(-)
>>
>> [...]
>>
>> + /* Cannot split anonymous THP to order-1 */
>> + if (new_order == 1 && folio_test_anon(folio)) {
>> + VM_WARN_ONCE(1, "Cannot split to order-1 folio");
>> + return -EINVAL;
>> + }
>> +
>> + if (new_order) {
>> + /* Only swapping a whole PMD-mapped folio is supported */
>> + if (folio_test_swapcache(folio)) {
>> + VM_WARN_ONCE(1,
>> + "Cannot split swap-cached folio to non-0 order");
>
> My understanding may be wrong here, but can't the folio be moved to swapcache
> asynchronously? How does the caller guarantee that the folio is not in
> swapcache and will not be moved between the call to
> split_huge_page_to_list_to_order() and this test? If the caller can't prevent
> it, then isn't it wrong to raise a warning here? Perhaps you just have to fail
> to split?

Right. That is why I only use VM_WARN_ONCE here. You mean it is better to
get rid of the warning? I have no strong preference about it.

>
> I'm guessing this restriction is because swap only supports order-0 and
> pmd-order folios currently? (And you only have split_swap_cluster() to downgrade
> from pmd-order to order-0). Perhaps you need my series that allows swapping out
> any order THP? Current version at [1] but I'm working on a new version.
>
> [1] https://lore.kernel.org/linux-mm/[email protected]/

Right. Once your patchset is in, the above check can be removed.

>> [...]


--
Best Regards,
Yan, Zi


Attachments:
signature.asc (871.00 B)
OpenPGP digital signature

2024-03-01 09:51:39

by Aishwarya TCV

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.



On 26/02/2024 20:55, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> It is used to test split_huge_page_to_list_to_order for pagecache THPs.
> Also add test cases for split_huge_page_to_list_to_order via debugfs.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> mm/huge_memory.c | 34 ++++--
> .../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
> 2 files changed, 131 insertions(+), 18 deletions(-)
>

Hi Zi,

When booting the kernel against next-master (20240228) with Arm64 on
Marvell Thunder X2 (TX2), the kselftest-mm test 'split_huge_page_test'
is failing in our CI (with rootfs over NFS). I can send the full logs if
required.

A bisect (full log below) identified this patch as introducing the
failure. Bisected it on the tag "next-20240228" at repo
"https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".

This works fine on Linux version 6.8.0-rc6


Sample log from failure against run on TX2:
------
07:17:34.056125 # # ------------------------------
07:17:34.056543 # # running ./split_huge_page_test
07:17:34.056839 # # ------------------------------
07:17:34.057114 # # TAP version 13
07:17:34.058564 # # 1..12
07:17:34.156822 # # ok 1 Split huge pages successful
07:17:34.214074 # # ok 2 Split PTE-mapped huge pages successful
07:17:34.215630 # # # Please enable pr_debug in
split_huge_pages_in_file() for more info.
07:17:34.225503 # # # Please check dmesg for more information
07:17:34.225862 # # ok 3 File-backed THP split test done
07:17:34.236944 # # Bail out! Failed to create a file at /mnt/thp_fs#
Planned tests != run tests (12 != 3)
07:17:34.237307 # # # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
07:17:34.237620 # # [FAIL]
07:17:34.246430 # not ok 51 split_huge_page_test # exit=1


Bisect log:
------
git bisect start
# good: [d206a76d7d2726f3b096037f2079ce0bd3ba329b] Linux 6.8-rc6
git bisect good d206a76d7d2726f3b096037f2079ce0bd3ba329b
# bad: [20af1ca418d2c0b11bc2a1fe8c0c88f67bcc2a7e] Add linux-next
specific files for 20240228
git bisect bad 20af1ca418d2c0b11bc2a1fe8c0c88f67bcc2a7e
# bad: [1322f1801e59dddce10591d602d246c1bf49990c] Merge branch 'main' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect bad 1322f1801e59dddce10591d602d246c1bf49990c
# bad: [a82f70041487790b7b09fe4bb45436e1b57021d3] Merge branch 'dev' of
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git
git bisect bad a82f70041487790b7b09fe4bb45436e1b57021d3
# bad: [ce90480b9352ba2bebe8946dad9223e3f24c6e9a] Merge branch
'for-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap.git
git bisect bad ce90480b9352ba2bebe8946dad9223e3f24c6e9a
# bad: [5daac92ed3881fd0c656478a301a4e1d124100ee] Merge branch
'mm-everything' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect bad 5daac92ed3881fd0c656478a301a4e1d124100ee
# good: [acc2643d9e988c63dd4629a9af380ad9ac69c54a] Merge branch
'mm-stable' into mm-unstable
git bisect good acc2643d9e988c63dd4629a9af380ad9ac69c54a
# good: [0294de8fe7d7c1a7eddc979cbf4c1886406e36b7] Merge branch 'fixes'
of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply.git
git bisect good 0294de8fe7d7c1a7eddc979cbf4c1886406e36b7
# good: [83e0c8f0e777a1ef0977b2f8189101765703b32d] Merge branch
'mm-nonmm-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect good 83e0c8f0e777a1ef0977b2f8189101765703b32d
# good: [a739cbe236e0dd3b6ff26a01fa1d31c73d4fac93] mm: memcg: make memcg
huge page split support any order split
git bisect good a739cbe236e0dd3b6ff26a01fa1d31c73d4fac93
# bad: [efb520aa333b2f11daaaaa13f4a598b5ae4ae823] mm: allow non-hugetlb
large folios to be batch processed
git bisect bad efb520aa333b2f11daaaaa13f4a598b5ae4ae823
# bad: [2258bdebb55e3ad3d30fd3849ddb955ff36825de] mm/zsmalloc: don't
hold locks of all pages when free_zspage()
git bisect bad 2258bdebb55e3ad3d30fd3849ddb955ff36825de
# bad: [7fc0be45acf2878cbacc4dba56923c34c3fd8b1e] mm: remove
total_mapcount()
git bisect bad 7fc0be45acf2878cbacc4dba56923c34c3fd8b1e
# good: [d55fac55da2f87ad5a99178e107df09770bbc411] mm: thp: split huge
page to any lower order pages
git bisect good d55fac55da2f87ad5a99178e107df09770bbc411
# bad: [4050d591c1aaf9336c08511fa5984827186e9ad1] mm/memfd: refactor
memfd_tag_pins() and memfd_wait_for_pins()
git bisect bad 4050d591c1aaf9336c08511fa5984827186e9ad1
# bad: [c0ba89c29ef559c95273feb481b049f622c43c17] mm: huge_memory:
enable debugfs to split huge pages to any order
git bisect bad c0ba89c29ef559c95273feb481b049f622c43c17
# first bad commit: [c0ba89c29ef559c95273feb481b049f622c43c17] mm:
huge_memory: enable debugfs to split huge pages to any order


Thanks,
Aishwarya

2024-03-01 10:33:36

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 01/03/2024 09:51, Aishwarya TCV wrote:
>
>
> On 26/02/2024 20:55, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> It is used to test split_huge_page_to_list_to_order for pagecache THPs.
>> Also add test cases for split_huge_page_to_list_to_order via both
>> debugfs.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> mm/huge_memory.c | 34 ++++--
>> .../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
>> 2 files changed, 131 insertions(+), 18 deletions(-)
>>
>
> Hi Zi,
>
> When booting the kernel against next-master (20240228) with Arm64 on
> Marvell Thunder X2 (TX2), the kselftest-mm test 'split_huge_page_test'
> is failing in our CI (with rootfs over NFS). I can send the full logs if
> required.

Just to add, I took a quick eyeball and I think there are a couple of potential issues:

- In create_pagecache_thp_and_fd() you do *fd = open(testfile, O_CREAT ...);
where testfile is /mnt/thp_fs/testfile. So if /mnt/thp_fs doesn't exist,
then the open will fail, I think? I'm pretty sure that's what's happening on
our CI. I suggest the test set up this dir itself. Is thp_fs a mounted
fs or just a dir? If the latter, can you just mktemp()?

- Later in create_pagecache_thp_and_fd() you fail the test if you don't have a
filesystem that supports large folios. Can we turn that into a skip? That
would reduce noise on the CI.

Thanks,
Ryan

>
> A bisect (full log below) identified this patch as introducing the
> failure. Bisected it on the tag "next-20240228" at repo
> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
>
> This works fine on Linux version 6.8.0-rc6
>
>
> Sample log from failure against run on TX2:
> ------
> 07:17:34.056125 # # ------------------------------
> 07:17:34.056543 # # running ./split_huge_page_test
> 07:17:34.056839 # # ------------------------------
> 07:17:34.057114 # # TAP version 13
> 07:17:34.058564 # # 1..12
> 07:17:34.156822 # # ok 1 Split huge pages successful
> 07:17:34.214074 # # ok 2 Split PTE-mapped huge pages successful
> 07:17:34.215630 # # # Please enable pr_debug in
> split_huge_pages_in_file() for more info.
> 07:17:34.225503 # # # Please check dmesg for more information
> 07:17:34.225862 # # ok 3 File-backed THP split test done
> 07:17:34.236944 # # Bail out! Failed to create a file at /mnt/thp_fs#
> Planned tests != run tests (12 != 3)
> 07:17:34.237307 # # # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
> 07:17:34.237620 # # [FAIL]
> 07:17:34.246430 # not ok 51 split_huge_page_test # exit=1
>
>
> Bisect log:
> ------
> git bisect start
> # good: [d206a76d7d2726f3b096037f2079ce0bd3ba329b] Linux 6.8-rc6
> git bisect good d206a76d7d2726f3b096037f2079ce0bd3ba329b
> # bad: [20af1ca418d2c0b11bc2a1fe8c0c88f67bcc2a7e] Add linux-next
> specific files for 20240228
> git bisect bad 20af1ca418d2c0b11bc2a1fe8c0c88f67bcc2a7e
> # bad: [1322f1801e59dddce10591d602d246c1bf49990c] Merge branch 'main' of
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
> git bisect bad 1322f1801e59dddce10591d602d246c1bf49990c
> # bad: [a82f70041487790b7b09fe4bb45436e1b57021d3] Merge branch 'dev' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git
> git bisect bad a82f70041487790b7b09fe4bb45436e1b57021d3
> # bad: [ce90480b9352ba2bebe8946dad9223e3f24c6e9a] Merge branch
> 'for-next' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap.git
> git bisect bad ce90480b9352ba2bebe8946dad9223e3f24c6e9a
> # bad: [5daac92ed3881fd0c656478a301a4e1d124100ee] Merge branch
> 'mm-everything' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> git bisect bad 5daac92ed3881fd0c656478a301a4e1d124100ee
> # good: [acc2643d9e988c63dd4629a9af380ad9ac69c54a] Merge branch
> 'mm-stable' into mm-unstable
> git bisect good acc2643d9e988c63dd4629a9af380ad9ac69c54a
> # good: [0294de8fe7d7c1a7eddc979cbf4c1886406e36b7] Merge branch 'fixes'
> of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply.git
> git bisect good 0294de8fe7d7c1a7eddc979cbf4c1886406e36b7
> # good: [83e0c8f0e777a1ef0977b2f8189101765703b32d] Merge branch
> 'mm-nonmm-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> git bisect good 83e0c8f0e777a1ef0977b2f8189101765703b32d
> # good: [a739cbe236e0dd3b6ff26a01fa1d31c73d4fac93] mm: memcg: make memcg
> huge page split support any order split
> git bisect good a739cbe236e0dd3b6ff26a01fa1d31c73d4fac93
> # bad: [efb520aa333b2f11daaaaa13f4a598b5ae4ae823] mm: allow non-hugetlb
> large folios to be batch processed
> git bisect bad efb520aa333b2f11daaaaa13f4a598b5ae4ae823
> # bad: [2258bdebb55e3ad3d30fd3849ddb955ff36825de] mm/zsmalloc: don't
> hold locks of all pages when free_zspage()
> git bisect bad 2258bdebb55e3ad3d30fd3849ddb955ff36825de
> # bad: [7fc0be45acf2878cbacc4dba56923c34c3fd8b1e] mm: remove
> total_mapcount()
> git bisect bad 7fc0be45acf2878cbacc4dba56923c34c3fd8b1e
> # good: [d55fac55da2f87ad5a99178e107df09770bbc411] mm: thp: split huge
> page to any lower order pages
> git bisect good d55fac55da2f87ad5a99178e107df09770bbc411
> # bad: [4050d591c1aaf9336c08511fa5984827186e9ad1] mm/memfd: refactor
> memfd_tag_pins() and memfd_wait_for_pins()
> git bisect bad 4050d591c1aaf9336c08511fa5984827186e9ad1
> # bad: [c0ba89c29ef559c95273feb481b049f622c43c17] mm: huge_memory:
> enable debugfs to split huge pages to any order
> git bisect bad c0ba89c29ef559c95273feb481b049f622c43c17
> # first bad commit: [c0ba89c29ef559c95273feb481b049f622c43c17] mm:
> huge_memory: enable debugfs to split huge pages to any order
>
>
> Thanks,
> Aishwarya


2024-03-01 12:12:17

by Mark Brown

Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On Fri, Mar 01, 2024 at 10:33:15AM +0000, Ryan Roberts wrote:

> - In create_pagecache_thp_and_fd() you do *fd = open(testfile, O_CREAT ...);
> where testfile is /mnt/thp_fs/testfile. So if /mnt/thp_fs doesn't exist,
> then the open will fail I think? I'm pretty sure that's what's happening on
> our CI. Suggest the test needs to setup this dir itself. Is thp_fs a mounted
> fs or just a dir? If the latter can you just mktemp()?

Mounting on /mnt would also be a bit of an issue; that's something
people are relatively likely to have used for something else, so it could be
disruptive. If the test is going to do a new mount, it's probably better
to make a temporary directory and then mount on top of that.
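Mark's suggestion can be sketched roughly like this (the /tmp template, device, and fs type are assumptions for illustration; a real test would need a large-folio-capable filesystem and root privileges to mount):

```shell
# Create a throwaway mount point instead of hard-coding /mnt/thp_fs.
mnt=$(mktemp -d /tmp/thp_fs.XXXXXX) || exit 1
# Clean up the directory (and any mount on it) when the script exits.
trap 'umount "$mnt" 2>/dev/null; rmdir "$mnt"' EXIT
# mount -t xfs /dev/loop0 "$mnt"   # illustrative only: needs root and a device
echo "mount point: $mnt"
```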



2024-03-01 12:53:07

by Zi Yan

Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 1 Mar 2024, at 5:33, Ryan Roberts wrote:

> On 01/03/2024 09:51, Aishwarya TCV wrote:
>>
>>
>> On 26/02/2024 20:55, Zi Yan wrote:
>>> From: Zi Yan <[email protected]>
>>>
>>> It is used to test split_huge_page_to_list_to_order for pagecache THPs.
>>> Also add test cases for split_huge_page_to_list_to_order via both
>>> debugfs.
>>>
>>> Signed-off-by: Zi Yan <[email protected]>
>>> ---
>>> mm/huge_memory.c | 34 ++++--
>>> .../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
>>> 2 files changed, 131 insertions(+), 18 deletions(-)
>>>
>>
>> Hi Zi,
>>
>> When booting the kernel against next-master (20240228) with Arm64 on
>> Marvell Thunder X2 (TX2), the kselftest-mm test 'split_huge_page_test'
>> is failing in our CI (with rootfs over NFS). I can send the full logs if
>> required.
>
> Just to add, I took a quick eyeball and I think there are a couple of potential issues:
>
> - In create_pagecache_thp_and_fd() you do *fd = open(testfile, O_CREAT ...);
> where testfile is /mnt/thp_fs/testfile. So if /mnt/thp_fs doesn't exist,
> then the open will fail I think? I'm pretty sure that's what's happening on
> our CI. Suggest the test needs to setup this dir itself. Is thp_fs a mounted
> fs or just a dir? If the latter can you just mktemp()?

The former. The page cache folio split tests require a filesystem supporting
large folios, and I used XFS.

> - Later in create_pagecache_thp_and_fd() you fail the test if you don't have a
> filesystem that supports large folios. Can we turn that into a skip? That
> would reduce noise on the CI.

I can do that. But is this a new requirement, that selftests have to finish
in a CI/CD environment? Can you provide a guideline for it? I have always
assumed selftests are run by humans who can set up the environment. In
addition, I do not think it is realistic to have the test binary set up the
whole environment, since everyone's machine is different. It is much easier
for the CI/CD environment to perform the mount.

>
> Thanks,
> Ryan
>
>>
>> A bisect (full log below) identified this patch as introducing the
>> failure. Bisected it on the tag "next-20240228" at repo
>> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
>>
>> This works fine on Linux version 6.8.0-rc6
>>
>>
>> [failure log and bisect log snipped; identical to the copies earlier in the thread]
>>
>>
>> Thanks,
>> Aishwarya


--
Best Regards,
Yan, Zi



2024-03-01 12:57:01

by Zi Yan

Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 1 Mar 2024, at 7:11, Mark Brown wrote:

> On Fri, Mar 01, 2024 at 10:33:15AM +0000, Ryan Roberts wrote:
>
>> - In create_pagecache_thp_and_fd() you do *fd = open(testfile, O_CREAT ...);
>> where testfile is /mnt/thp_fs/testfile. So if /mnt/thp_fs doesn't exist,
>> then the open will fail I think? I'm pretty sure that's what's happening on
>> our CI. Suggest the test needs to setup this dir itself. Is thp_fs a mounted
>> fs or just a dir? If the latter can you just mktemp()?
>
> Mounting on /mnt would also be a bit of an issue, that's something
> people are relatively likely to have used for something so could be
> disruptive. If the test is going to do a new mount it's probably better
> to do something like make a temporary directory then mount on top of that.

To move it to a temp directory for mounting, the test needs to do the mount
itself. But it is impossible to know whether the running environment has the
required FS at all, or where that FS is. Should I add that as a parameter to
the test binary?

--
Best Regards,
Yan, Zi



2024-03-01 13:09:27

by Ryan Roberts

Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 01/03/2024 12:52, Zi Yan wrote:
> On 1 Mar 2024, at 5:33, Ryan Roberts wrote:
>
>> On 01/03/2024 09:51, Aishwarya TCV wrote:
>>>
>>>
>>> On 26/02/2024 20:55, Zi Yan wrote:
>>>> From: Zi Yan <[email protected]>
>>>>
>>>> It is used to test split_huge_page_to_list_to_order for pagecache THPs.
>>>> Also add test cases for split_huge_page_to_list_to_order via both
>>>> debugfs.
>>>>
>>>> Signed-off-by: Zi Yan <[email protected]>
>>>> ---
>>>> mm/huge_memory.c | 34 ++++--
>>>> .../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
>>>> 2 files changed, 131 insertions(+), 18 deletions(-)
>>>>
>>>
>>> Hi Zi,
>>>
>>> When booting the kernel against next-master (20240228) with Arm64 on
>>> Marvell Thunder X2 (TX2), the kselftest-mm test 'split_huge_page_test'
>>> is failing in our CI (with rootfs over NFS). I can send the full logs if
>>> required.
>>
>> Just to add, I took a quick eyeball and I think there are a couple of potential issues:
>>
>> - In create_pagecache_thp_and_fd() you do *fd = open(testfile, O_CREAT ...);
>> where testfile is /mnt/thp_fs/testfile. So if /mnt/thp_fs doesn't exist,
>> then the open will fail I think? I'm pretty sure that's what's happening on
>> our CI. Suggest the test needs to setup this dir itself. Is thp_fs a mounted
>> fs or just a dir? If the latter can you just mktemp()?
>
>> The former. The page cache folio split tests require a filesystem supporting
>> large folios, and I used XFS.

OK got it.

>
>> - Later in create_pagecache_thp_and_fd() you fail the test if you don't have a
>> filesystem that supports large folios. Can we turn that into a skip? That
>> would reduce noise on the CI.
>
> I can do that. But is this a new requirement that self tests have to be finish
> in CI/CD environment? Can you provide a guideline for it?

I'm not sure what's written down, but certainly anyone should be able to run the
selftests with as little knowledge as possible, and they should only fail if
they detect a real problem. By convention a test should be skipped if the
environment (or kernel) isn't compatible. There are lots of examples of that in
mm selftests (just grep ksft_test_result_skip). mm selftests also has
run_vmtests.sh which does a lot of environment setup (e.g. reserving hugetlb
pages, etc) before actually running the tests.
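The skip convention Ryan describes can be sketched in TAP terms like this (the test number, description, and mount path are illustrative, not taken from the actual test):

```shell
thp_fs=/mnt/thp_fs   # hypothetical prerequisite the test needs
if [ -d "$thp_fs" ]; then
    line="ok 4 File-backed THP split test done"
else
    # Report a SKIP, not a failure, when the environment lacks the mount.
    line="ok 4 File-backed THP split test # SKIP no large-folio FS at $thp_fs"
fi
echo "$line"
```

Either way the test counts as "ok", so a CI run without the mount produces no noise.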

> Since I always assume
> selftests are just ran by human who can set up environment.

I believe kernelci has been running mm selftests on x86 for a long time. We
have been running them against arm64 on our CI for the last couple of months,
and that has found a number of real issues in the kernel in -next, so this is
helping find and fix things early. So there is definitely benefit to keeping
these tests clean and robust.

> In addition, I do
> not think it is realistic to make the test file to set up all the environment,
> since everyone's machine is different. It is much easier to make the CI/CD
> environment to make the mount.

That's reasonable, but then the requirements should be documented and you
probably would want to be able to optionally pass the mount on the command line.

Thanks,
Ryan

>
>>
>> Thanks,
>> Ryan
>>
>>>
>>> A bisect (full log below) identified this patch as introducing the
>>> failure. Bisected it on the tag "next-20240228" at repo
>>> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
>>>
>>> This works fine on Linux version 6.8.0-rc6
>>>
>>>
>>> [failure log and bisect log snipped; identical to the copies earlier in the thread]
>>>
>>>
>>> Thanks,
>>> Aishwarya
>
>
> --
> Best Regards,
> Yan, Zi


2024-03-01 13:53:24

by Zi Yan

Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 1 Mar 2024, at 8:09, Ryan Roberts wrote:

> On 01/03/2024 12:52, Zi Yan wrote:
>> On 1 Mar 2024, at 5:33, Ryan Roberts wrote:
>>
>>> On 01/03/2024 09:51, Aishwarya TCV wrote:
>>>>
>>>>
>>>> On 26/02/2024 20:55, Zi Yan wrote:
>>>>> From: Zi Yan <[email protected]>
>>>>>
>>>>> It is used to test split_huge_page_to_list_to_order for pagecache THPs.
>>>>> Also add test cases for split_huge_page_to_list_to_order via both
>>>>> debugfs.
>>>>>
>>>>> Signed-off-by: Zi Yan <[email protected]>
>>>>> ---
>>>>> mm/huge_memory.c | 34 ++++--
>>>>> .../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
>>>>> 2 files changed, 131 insertions(+), 18 deletions(-)
>>>>>
>>>>
>>>> Hi Zi,
>>>>
>>>> When booting the kernel against next-master (20240228) with Arm64 on
>>>> Marvell Thunder X2 (TX2), the kselftest-mm test 'split_huge_page_test'
>>>> is failing in our CI (with rootfs over NFS). I can send the full logs if
>>>> required.
>>>
>>> Just to add, I took a quick eyeball and I think there are a couple of potential issues:
>>>
>>> - In create_pagecache_thp_and_fd() you do *fd = open(testfile, O_CREAT ...);
>>> where testfile is /mnt/thp_fs/testfile. So if /mnt/thp_fs doesn't exist,
>>> then the open will fail I think? I'm pretty sure that's what's happening on
>>> our CI. Suggest the test needs to setup this dir itself. Is thp_fs a mounted
>>> fs or just a dir? If the latter can you just mktemp()?
>>
>> The former. the page cache folio split tests require a file system supporting
>> large folio and I used XFS.
>
> OK got it.
>
>>
>>> - Later in create_pagecache_thp_and_fd() you fail the test if you don't have a
>>> filesystem that supports large folios. Can we turn that into a skip? That
>>> would reduce noise on the CI.
>>
>> I can do that. But is this a new requirement that self tests have to be finish
>> in CI/CD environment? Can you provide a guideline for it?
>
> I'm not sure what's written down, but certainly anyone should be able to run the
> selftests with as little knowledge as possible, and they should only fail if
> they detect a real problem. By convention a test should be skipped if the
> environment (or kernel) isn't compatible. There are lots of examples of that in
> mm selftests (just grep ksft_test_result_skip). mm selftests also has
> run_vmtests.sh which does a lot of environment setup (e.g. reserving hugetlb
> pages, etc) before actually running the tests.

Got it. I will send a fixup to skip the page cache split test when the mount
is not ready, then send a separate patch to set up XFS in run_vmtests.sh and
pass it to this test.

>
>> Since I always assume
>> selftests are just ran by human who can set up environment.
>
> I believe kernelci has been running mm selftests on x86 for a long time. We
> have started running them against arm64 on our CI for the last couple of months
> and it had found a number of real issues in the kernel in -next, so this is
> helping find and fix things early. So there is definitely benefit to keeping
> these tests clean and robust.

Got it. Makes sense.

>
>> In addition, I do
>> not think it is realistic to make the test file to set up all the environment,
>> since everyone's machine is different. It is much easier to make the CI/CD
>> environment to make the mount.
>
> That's reasonable, but then the requirements should be documented and you
> probably would want to be able to optionally pass the mount on the command line.

Will do.

Thank you for the explanation.

>>>
>>> Thanks,
>>> Ryan
>>>
>>>>
>>>> A bisect (full log below) identified this patch as introducing the
>>>> failure. Bisected it on the tag "next-20240228" at repo
>>>> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
>>>>
>>>> This works fine on Linux version 6.8.0-rc6
>>>>
>>>>
>>>> [failure log and bisect log snipped; identical to the copies earlier in the thread]
>>>>
>>>>
>>>> Thanks,
>>>> Aishwarya
>>
>>
>> --
>> Best Regards,
>> Yan, Zi


--
Best Regards,
Yan, Zi



2024-03-01 14:01:56

by Zi Yan

Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 1 Mar 2024, at 4:51, Aishwarya TCV wrote:

> On 26/02/2024 20:55, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> It is used to test split_huge_page_to_list_to_order for pagecache THPs.
>> Also add test cases for split_huge_page_to_list_to_order via both
>> debugfs.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> mm/huge_memory.c | 34 ++++--
>> .../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
>> 2 files changed, 131 insertions(+), 18 deletions(-)
>>
>
> Hi Zi,
>
> When booting the kernel against next-master (20240228) with Arm64 on
> Marvell Thunder X2 (TX2), the kselftest-mm test 'split_huge_page_test'
> is failing in our CI (with rootfs over NFS). I can send the full logs if
> required.
>
> A bisect (full log below) identified this patch as introducing the
> failure. Bisected it on the tag "next-20240228" at repo
> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
>
> This works fine on Linux version 6.8.0-rc6

Hi Aishwarya,

I am trying to fix the issue, but when compiling selftests/mm I hit the
errors below on running make in that directory. Am I missing some
configuration? Since you are able to run the test, I assume you know what is
happening. Thanks.

vm_util.c: In function ‘__pagemap_scan_get_categories’:
vm_util.c:34:28: error: storage size of ‘arg’ isn’t known
34 | struct pm_scan_arg arg;
| ^~~
vm_util.c:41:27: error: invalid application of ‘sizeof’ to incomplete type ‘struct pm_scan_arg’
41 | arg.size = sizeof(struct pm_scan_arg);
| ^~~~~~
vm_util.c:45:35: error: ‘PAGE_IS_WPALLOWED’ undeclared (first use in this function)
45 | arg.category_anyof_mask = PAGE_IS_WPALLOWED | PAGE_IS_WRITTEN | PAGE_IS_FILE |
| ^~~~~~~~~~~~~~~~~
vm_util.c:45:35: note: each undeclared identifier is reported only once for each function it appears in
vm_util.c:45:55: error: ‘PAGE_IS_WRITTEN’ undeclared (first use in this function)
45 | arg.category_anyof_mask = PAGE_IS_WPALLOWED | PAGE_IS_WRITTEN | PAGE_IS_FILE |
| ^~~~~~~~~~~~~~~
vm_util.c:45:73: error: ‘PAGE_IS_FILE’ undeclared (first use in this function)
45 | arg.category_anyof_mask = PAGE_IS_WPALLOWED | PAGE_IS_WRITTEN | PAGE_IS_FILE |
| ^~~~~~~~~~~~
vm_util.c:46:35: error: ‘PAGE_IS_PRESENT’ undeclared (first use in this function); did you mean ‘PAGEMAP_PRESENT’?
46 | PAGE_IS_PRESENT | PAGE_IS_SWAPPED | PAGE_IS_PFNZERO |
| ^~~~~~~~~~~~~~~
| PAGEMAP_PRESENT
vm_util.c:46:53: error: ‘PAGE_IS_SWAPPED’ undeclared (first use in this function)
46 | PAGE_IS_PRESENT | PAGE_IS_SWAPPED | PAGE_IS_PFNZERO |
| ^~~~~~~~~~~~~~~
vm_util.c:46:71: error: ‘PAGE_IS_PFNZERO’ undeclared (first use in this function)
46 | PAGE_IS_PRESENT | PAGE_IS_SWAPPED | PAGE_IS_PFNZERO |
| ^~~~~~~~~~~~~~~
vm_util.c:47:35: error: ‘PAGE_IS_HUGE’ undeclared (first use in this function)
47 | PAGE_IS_HUGE | PAGE_IS_SOFT_DIRTY;
| ^~~~~~~~~~~~
vm_util.c:47:50: error: ‘PAGE_IS_SOFT_DIRTY’ undeclared (first use in this function); did you mean ‘PM_SOFT_DIRTY’?
47 | PAGE_IS_HUGE | PAGE_IS_SOFT_DIRTY;
| ^~~~~~~~~~~~~~~~~~
| PM_SOFT_DIRTY
vm_util.c:50:26: error: ‘PAGEMAP_SCAN’ undeclared (first use in this function); did you mean ‘PAGEMAP_PFN’?
50 | return ioctl(fd, PAGEMAP_SCAN, &arg);
| ^~~~~~~~~~~~
| PAGEMAP_PFN

--
Best Regards,
Yan, Zi


Attachments:
signature.asc (871.00 B)
OpenPGP digital signature

2024-03-01 14:15:31

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On Fri, Mar 01, 2024 at 07:56:41AM -0500, Zi Yan wrote:
> On 1 Mar 2024, at 7:11, Mark Brown wrote:

> > Mounting on /mnt would also be a bit of an issue, that's something
> > people are relatively likely to have used for something so could be
> > disruptive. If the test is going to do a new mount it's probably better
> > to do something like make a temporary directory then mount on top of that.

> To move it to a temp folder for mounting, the test needs to do the mount.
> But it is impossible to know if the running environment has the required FS or not
> and where the FS is. Should I add that as a parameter to the test binary?

You can check the supported filesystem types in /proc/filesystems
(possibly also elsewhere, that's just my first thought). There are some
standard APIs for getting/naming a temporary file or directory which
should pay attention to any system settings - mktemp() is a generally
available one for C code IIRC.
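Mark's two suggestions can be sketched together in shell (the mktemp template name below is an illustrative assumption, not from the actual test; the C test would use mkdtemp(3) for the same effect):

```shell
#!/bin/sh
# Sketch of the environment probing Mark describes.

# 1. Check /proc/filesystems for XFS support instead of assuming it.
if grep -qw xfs /proc/filesystems; then
    fs_supported=yes
else
    fs_supported=no
fi

# 2. Create a scratch directory with mktemp rather than hard-coding
#    a mountpoint such as /mnt/thp_fs.
scratch=$(mktemp -d /tmp/split_huge_page_XXXXXX)

echo "xfs supported: $fs_supported, scratch dir: $scratch"
```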


Attachments:
(No filename) (963.00 B)
signature.asc (499.00 B)
Download all attachments

2024-03-01 14:18:37

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 01/03/2024 13:53, Zi Yan wrote:
> On 1 Mar 2024, at 8:09, Ryan Roberts wrote:
>
>> On 01/03/2024 12:52, Zi Yan wrote:
>>> On 1 Mar 2024, at 5:33, Ryan Roberts wrote:
>>>
>>>> On 01/03/2024 09:51, Aishwarya TCV wrote:
>>>>>
>>>>>
>>>>> On 26/02/2024 20:55, Zi Yan wrote:
>>>>>> From: Zi Yan <[email protected]>
>>>>>>
>>>>>> It is used to test split_huge_page_to_list_to_order for pagecache THPs.
>>>>>> Also add test cases for split_huge_page_to_list_to_order via both
>>>>>> debugfs.
>>>>>>
>>>>>> Signed-off-by: Zi Yan <[email protected]>
>>>>>> ---
>>>>>> mm/huge_memory.c | 34 ++++--
>>>>>> .../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
>>>>>> 2 files changed, 131 insertions(+), 18 deletions(-)
>>>>>>
>>>>>
>>>>> Hi Zi,
>>>>>
>>>>> When booting the kernel against next-master(20240228)with Arm64 on
>>>>> Marvell Thunder X2 (TX2), the kselftest-mm test 'split_huge_page_test'
>>>>> is failing in our CI (with rootfs over NFS). I can send the full logs if
>>>>> required.
>>>>
>>>> Just to add, I took a quick eyeball and I think there a couple of potential issues:
>>>>
>>>> - In create_pagecache_thp_and_fd() you do *fd = open(testfile, O_CREAT ...);
>>>> where testfile is /mnt/thp_fs/testfile. So if /mnt/thp_fs doesn't exist,
>>>> then the open will fail I think? I'm pretty sure that's what's happening on
>>>> our CI. Suggest the test needs to setup this dir itself. Is thp_fs a mounted
>>>> fs or just a dir? If the latter can you just mktemp()?
>>>
>>> The former. the page cache folio split tests require a file system supporting
>>> large folio and I used XFS.
>>
>> OK got it.
>>
>>>
>>>> - Later in create_pagecache_thp_and_fd() you fail the test if you don't have a
>>>> filesystem that supports large folios. Can we turn that into a skip? That
>>>> would reduce noise on the CI.
>>>
>>> I can do that. But is this a new requirement that self tests have to be finish
>>> in CI/CD environment? Can you provide a guideline for it?
>>
>> I'm not sure what's written down, but certainly anyone should be able to run the
>> selftests with as little knowledge as possible, and they should only fail if
>> they detect a real problem. By convention a test should be skipped if the
>> environment (or kernel) isn't compatible. There are lots of examples of that in
>> mm selftests (just grep ksft_test_result_skip). mm selftests also has
>> run_vmtests.sh which does a lot of environment setup (e.g. reserving hugetlb
>> pages, etc) before actually running the tests.
>
> Got it. I will send a fixup to skip the page cache split test when the mount
> is not ready, then send a separate patch to set up XFS in run_vmtests.sh and
> pass it to this test.

Thanks - appreciated!

Although I agree it might be a tall order to create and mount an XFS fs in
run_vmtests.sh. Perhaps it might be good enough to add an optional param to the
test to pass a path when running the test manually, and if that's not provided,
just try to create a temp file in the current dir and skip if it's not the right
sort of fs?
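A minimal sketch of that fallback logic, assuming the path arrives as the first argument and using stat(1) for the filesystem check (the real test would do this in C, e.g. via statfs()):

```shell
#!/bin/sh
# Hedged sketch of Ryan's suggestion: take an optional path argument,
# fall back to the current directory, and skip unless the path sits on
# a filesystem known to support large folios (XFS here).
path="${1:-.}"
fstype=$(stat -f -c %T "$path")
case "$fstype" in
    xfs) echo "ok: running pagecache split test under $path" ;;
    *)   echo "SKIP: $path is on $fstype, which may not support large folios" ;;
esac
```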

>
>>
>>> Since I always assume
>>> selftests are just ran by human who can set up environment.
>>
>> I believe kernelci have been running mm selftests on x86 for a long time. We
>> have started running them against arm64 on our CI for the last couple of months
>> and it had found a number of real issues in the kernel in -next, so this is
>> helping find and fix things early. So there is definitely benefit to keeping
>> these tests clean and robust.
>
> Got it. Make sense.
>
>>
>>> In addition, I do
>>> not think it is realistic to make the test file to set up all the environment,
>>> since everyone's machine is different. It is much easier to make the CI/CD
>>> environment to make the mount.
>>
>> That's reasonable, but then the requirements should be documented and you
>> probably would want to be able to optionally pass the mount on the command line.
>
> Will do.
>
> Thank you for the explanation.
>
>>>>
>>>> Thanks,
>>>> Ryan
>>>>
>>>>>
>>>>> A bisect (full log below) identified this patch as introducing the
>>>>> failure. Bisected it on the tag "next-20240228" at repo
>>>>> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
>>>>>
>>>>> This works fine on Linux version 6.8.0-rc6
>>>>>
>>>>>
>>>>> Sample log from failure against run on TX2:
>>>>> ------
>>>>> 07:17:34.056125 # # ------------------------------
>>>>> 07:17:34.056543 # # running ./split_huge_page_test
>>>>> 07:17:34.056839 # # ------------------------------
>>>>> 07:17:34.057114 # # TAP version 13
>>>>> 07:17:34.058564 # # 1..12
>>>>> 07:17:34.156822 # # ok 1 Split huge pages successful
>>>>> 07:17:34.214074 # # ok 2 Split PTE-mapped huge pages successful
>>>>> 07:17:34.215630 # # # Please enable pr_debug in
>>>>> split_huge_pages_in_file() for more info.
>>>>> 07:17:34.225503 # # # Please check dmesg for more information
>>>>> 07:17:34.225862 # # ok 3 File-backed THP split test done
>>>>> 07:17:34.236944 # # Bail out! Failed to create a file at /mnt/thp_fs#
>>>>> Planned tests != run tests (12 != 3)
>>>>> 07:17:34.237307 # # # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
>>>>> 07:17:34.237620 # # [FAIL]
>>>>> 07:17:34.246430 # not ok 51 split_huge_page_test # exit=1
>>>>>
>>>>>
>>>>> Bisect log:
>>>>> ------
>>>>> git bisect start
>>>>> # good: [d206a76d7d2726f3b096037f2079ce0bd3ba329b] Linux 6.8-rc6
>>>>> git bisect good d206a76d7d2726f3b096037f2079ce0bd3ba329b
>>>>> # bad: [20af1ca418d2c0b11bc2a1fe8c0c88f67bcc2a7e] Add linux-next
>>>>> specific files for 20240228
>>>>> git bisect bad 20af1ca418d2c0b11bc2a1fe8c0c88f67bcc2a7e
>>>>> # bad: [1322f1801e59dddce10591d602d246c1bf49990c] Merge branch 'main' of
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
>>>>> git bisect bad 1322f1801e59dddce10591d602d246c1bf49990c
>>>>> # bad: [a82f70041487790b7b09fe4bb45436e1b57021d3] Merge branch 'dev' of
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git
>>>>> git bisect bad a82f70041487790b7b09fe4bb45436e1b57021d3
>>>>> # bad: [ce90480b9352ba2bebe8946dad9223e3f24c6e9a] Merge branch
>>>>> 'for-next' of
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap.git
>>>>> git bisect bad ce90480b9352ba2bebe8946dad9223e3f24c6e9a
>>>>> # bad: [5daac92ed3881fd0c656478a301a4e1d124100ee] Merge branch
>>>>> 'mm-everything' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>>>>> git bisect bad 5daac92ed3881fd0c656478a301a4e1d124100ee
>>>>> # good: [acc2643d9e988c63dd4629a9af380ad9ac69c54a] Merge branch
>>>>> 'mm-stable' into mm-unstable
>>>>> git bisect good acc2643d9e988c63dd4629a9af380ad9ac69c54a
>>>>> # good: [0294de8fe7d7c1a7eddc979cbf4c1886406e36b7] Merge branch 'fixes'
>>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply.git
>>>>> git bisect good 0294de8fe7d7c1a7eddc979cbf4c1886406e36b7
>>>>> # good: [83e0c8f0e777a1ef0977b2f8189101765703b32d] Merge branch
>>>>> 'mm-nonmm-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>>>>> git bisect good 83e0c8f0e777a1ef0977b2f8189101765703b32d
>>>>> # good: [a739cbe236e0dd3b6ff26a01fa1d31c73d4fac93] mm: memcg: make memcg
>>>>> huge page split support any order split
>>>>> git bisect good a739cbe236e0dd3b6ff26a01fa1d31c73d4fac93
>>>>> # bad: [efb520aa333b2f11daaaaa13f4a598b5ae4ae823] mm: allow non-hugetlb
>>>>> large folios to be batch processed
>>>>> git bisect bad efb520aa333b2f11daaaaa13f4a598b5ae4ae823
>>>>> # bad: [2258bdebb55e3ad3d30fd3849ddb955ff36825de] mm/zsmalloc: don't
>>>>> hold locks of all pages when free_zspage()
>>>>> git bisect bad 2258bdebb55e3ad3d30fd3849ddb955ff36825de
>>>>> # bad: [7fc0be45acf2878cbacc4dba56923c34c3fd8b1e] mm: remove
>>>>> total_mapcount()
>>>>> git bisect bad 7fc0be45acf2878cbacc4dba56923c34c3fd8b1e
>>>>> # good: [d55fac55da2f87ad5a99178e107df09770bbc411] mm: thp: split huge
>>>>> page to any lower order pages
>>>>> git bisect good d55fac55da2f87ad5a99178e107df09770bbc411
>>>>> # bad: [4050d591c1aaf9336c08511fa5984827186e9ad1] mm/memfd: refactor
>>>>> memfd_tag_pins() and memfd_wait_for_pins()
>>>>> git bisect bad 4050d591c1aaf9336c08511fa5984827186e9ad1
>>>>> # bad: [c0ba89c29ef559c95273feb481b049f622c43c17] mm: huge_memory:
>>>>> enable debugfs to split huge pages to any order
>>>>> git bisect bad c0ba89c29ef559c95273feb481b049f622c43c17
>>>>> # first bad commit: [c0ba89c29ef559c95273feb481b049f622c43c17] mm:
>>>>> huge_memory: enable debugfs to split huge pages to any order
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Aishwarya
>>>
>>>
>>> --
>>> Best Regards,
>>> Yan, Zi
>
>
> --
> Best Regards,
> Yan, Zi


2024-03-01 14:23:39

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 01/03/2024 14:00, Zi Yan wrote:
> On 1 Mar 2024, at 4:51, Aishwarya TCV wrote:
>
>> On 26/02/2024 20:55, Zi Yan wrote:
>>> From: Zi Yan <[email protected]>
>>>
>>> It is used to test split_huge_page_to_list_to_order for pagecache THPs.
>>> Also add test cases for split_huge_page_to_list_to_order via both
>>> debugfs.
>>>
>>> Signed-off-by: Zi Yan <[email protected]>
>>> ---
>>> mm/huge_memory.c | 34 ++++--
>>> .../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
>>> 2 files changed, 131 insertions(+), 18 deletions(-)
>>>
>>
>> Hi Zi,
>>
>> When booting the kernel against next-master(20240228)with Arm64 on
>> Marvell Thunder X2 (TX2), the kselftest-mm test 'split_huge_page_test'
>> is failing in our CI (with rootfs over NFS). I can send the full logs if
>> required.
>>
>> A bisect (full log below) identified this patch as introducing the
>> failure. Bisected it on the tag "next-20240228" at repo
>> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
>>
>> This works fine on Linux version 6.8.0-rc6
>
> Hi Aishwarya,
>
> I am trying to fix the issue. When I am compiling selftests/mm, I encountered
> the error below when I run make under the folder. Am I missing any configuration?
> Since you are able to run the test, I assume you know what is happening. Thanks.

For what it's worth, I usually compile from the top level directory with:

# make headers_install
# make -C tools/testing/selftests TARGETS=mm install INSTALL_PATH=~/kself

Perhaps the below is due to the headers not being exported properly. Bad things definitely happen if you omit the headers_install step.

>
> vm_util.c: In function ‘__pagemap_scan_get_categories’:
> vm_util.c:34:28: error: storage size of ‘arg’ isn’t known
> 34 | struct pm_scan_arg arg;
> | ^~~
> vm_util.c:41:27: error: invalid application of ‘sizeof’ to incomplete type ‘struct pm_scan_arg’
> 41 | arg.size = sizeof(struct pm_scan_arg);
> | ^~~~~~
> vm_util.c:45:35: error: ‘PAGE_IS_WPALLOWED’ undeclared (first use in this function)
> 45 | arg.category_anyof_mask = PAGE_IS_WPALLOWED | PAGE_IS_WRITTEN | PAGE_IS_FILE |
> | ^~~~~~~~~~~~~~~~~
> vm_util.c:45:35: note: each undeclared identifier is reported only once for each function it appears in
> vm_util.c:45:55: error: ‘PAGE_IS_WRITTEN’ undeclared (first use in this function)
> 45 | arg.category_anyof_mask = PAGE_IS_WPALLOWED | PAGE_IS_WRITTEN | PAGE_IS_FILE |
> | ^~~~~~~~~~~~~~~
> vm_util.c:45:73: error: ‘PAGE_IS_FILE’ undeclared (first use in this function)
> 45 | arg.category_anyof_mask = PAGE_IS_WPALLOWED | PAGE_IS_WRITTEN | PAGE_IS_FILE |
> | ^~~~~~~~~~~~
> vm_util.c:46:35: error: ‘PAGE_IS_PRESENT’ undeclared (first use in this function); did you mean ‘PAGEMAP_PRESENT’?
> 46 | PAGE_IS_PRESENT | PAGE_IS_SWAPPED | PAGE_IS_PFNZERO |
> | ^~~~~~~~~~~~~~~
> | PAGEMAP_PRESENT
> vm_util.c:46:53: error: ‘PAGE_IS_SWAPPED’ undeclared (first use in this function)
> 46 | PAGE_IS_PRESENT | PAGE_IS_SWAPPED | PAGE_IS_PFNZERO |
> | ^~~~~~~~~~~~~~~
> vm_util.c:46:71: error: ‘PAGE_IS_PFNZERO’ undeclared (first use in this function)
> 46 | PAGE_IS_PRESENT | PAGE_IS_SWAPPED | PAGE_IS_PFNZERO |
> | ^~~~~~~~~~~~~~~
> vm_util.c:47:35: error: ‘PAGE_IS_HUGE’ undeclared (first use in this function)
> 47 | PAGE_IS_HUGE | PAGE_IS_SOFT_DIRTY;
> | ^~~~~~~~~~~~
> vm_util.c:47:50: error: ‘PAGE_IS_SOFT_DIRTY’ undeclared (first use in this function); did you mean ‘PM_SOFT_DIRTY’?
> 47 | PAGE_IS_HUGE | PAGE_IS_SOFT_DIRTY;
> | ^~~~~~~~~~~~~~~~~~
> | PM_SOFT_DIRTY
> vm_util.c:50:26: error: ‘PAGEMAP_SCAN’ undeclared (first use in this function); did you mean ‘PAGEMAP_PFN’?
> 50 | return ioctl(fd, PAGEMAP_SCAN, &arg);
> | ^~~~~~~~~~~~
> | PAGEMAP_PFN
>
> --
> Best Regards,
> Yan, Zi


2024-03-01 14:24:25

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On Fri, Mar 01, 2024 at 01:09:04PM +0000, Ryan Roberts wrote:
> On 01/03/2024 12:52, Zi Yan wrote:

> > I can do that. But is this a new requirement that self tests have to be finish
> > in CI/CD environment? Can you provide a guideline for it?

> I'm not sure what's written down, but certainly anyone should be able to run the
> selftests with as little knowledge as possible, and they should only fail if
> they detect a real problem. By convention a test should be skipped if the

It does get mentioned in talks and things but TBH I think it's just
generally a good practice thing.

> > Since I always assume
> > selftests are just ran by human who can set up environment.

> I believe kernelci have been running mm selftests on x86 for a long time. We
> have started running them against arm64 on our CI for the last couple of months
> and it had found a number of real issues in the kernel in -next, so this is
> helping find and fix things early. So there is definitely benefit to keeping
> these tests clean and robust.

They're also being routinely run on at least some platforms by LKFT
(though from a quick check it seems not with the magic command line
options to set up secretmem and huge pages) and CKI, possibly also some
of the other CI systems I'm less aware of.


Attachments:
(No filename) (1.28 kB)
signature.asc (499.00 B)
Download all attachments

2024-03-01 14:27:42

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On Fri, Mar 01, 2024 at 02:18:16PM +0000, Ryan Roberts wrote:

> Although I agree it might be a tall order create and mount an XFS fs in
> run_vmtests.sh. Perhaps it might be good enough to add an optional param to the
> test to pass a path when running the test manually, and if that's not provided,
> just try to create a temp file in the current dir and skip if its not the right
> sort of fs?

Yeah, if it needs to be a specific kind of on disk filesystem then that
needs a lot more integration with CI systems (a lot of them run entirely
from nfsroot by default!). Being able to specify the location via an
environment variable would also be good, it could fall back to the
current directory if one isn't set up.
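That fallback could look like this in the test wrapper (the variable name matches the one adopted later in the thread; the echo is purely illustrative):

```shell
#!/bin/sh
# Sketch of the environment-variable fallback Mark describes: use the
# caller-provided location if set, otherwise the current directory.
backing="${SPLIT_HUGE_PAGE_TEST_XFS_PATH:-$PWD}"
echo "pagecache split test will use: $backing"
```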


Attachments:
(No filename) (732.00 B)
signature.asc (499.00 B)
Download all attachments

2024-03-01 14:33:45

by Zi Yan

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 1 Mar 2024, at 9:23, Ryan Roberts wrote:

> On 01/03/2024 14:00, Zi Yan wrote:
>> On 1 Mar 2024, at 4:51, Aishwarya TCV wrote:
>>
>>> On 26/02/2024 20:55, Zi Yan wrote:
>>>> From: Zi Yan <[email protected]>
>>>>
>>>> It is used to test split_huge_page_to_list_to_order for pagecache THPs.
>>>> Also add test cases for split_huge_page_to_list_to_order via both
>>>> debugfs.
>>>>
>>>> Signed-off-by: Zi Yan <[email protected]>
>>>> ---
>>>> mm/huge_memory.c | 34 ++++--
>>>> .../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
>>>> 2 files changed, 131 insertions(+), 18 deletions(-)
>>>>
>>>
>>> Hi Zi,
>>>
>>> When booting the kernel against next-master(20240228)with Arm64 on
>>> Marvell Thunder X2 (TX2), the kselftest-mm test 'split_huge_page_test'
>>> is failing in our CI (with rootfs over NFS). I can send the full logs if
>>> required.
>>>
>>> A bisect (full log below) identified this patch as introducing the
>>> failure. Bisected it on the tag "next-20240228" at repo
>>> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
>>>
>>> This works fine on Linux version 6.8.0-rc6
>>
>> Hi Aishwarya,
>>
>> I am trying to fix the issue. When I am compiling selftests/mm, I encountered
>> the error below when I run make under the folder. Am I missing any configuration?
>> Since you are able to run the test, I assume you know what is happening. Thanks.
>
> for what its worth, I usually compile from the top level directory with:
>
> # make headers_install
> # make -C tools/testing/selftests TARGETS=mm install INSTALL_PATH=~/kself
>
> Perhaps the below is due to the headers not being exported properly. Bad things definitely happen if you omit the headers_install step.

It works. Thanks a lot!

>>
>> vm_util.c: In function ‘__pagemap_scan_get_categories’:
>> vm_util.c:34:28: error: storage size of ‘arg’ isn’t known
>> 34 | struct pm_scan_arg arg;
>> | ^~~
>> vm_util.c:41:27: error: invalid application of ‘sizeof’ to incomplete type ‘struct pm_scan_arg’
>> 41 | arg.size = sizeof(struct pm_scan_arg);
>> | ^~~~~~
>> vm_util.c:45:35: error: ‘PAGE_IS_WPALLOWED’ undeclared (first use in this function)
>> 45 | arg.category_anyof_mask = PAGE_IS_WPALLOWED | PAGE_IS_WRITTEN | PAGE_IS_FILE |
>> | ^~~~~~~~~~~~~~~~~
>> vm_util.c:45:35: note: each undeclared identifier is reported only once for each function it appears in
>> vm_util.c:45:55: error: ‘PAGE_IS_WRITTEN’ undeclared (first use in this function)
>> 45 | arg.category_anyof_mask = PAGE_IS_WPALLOWED | PAGE_IS_WRITTEN | PAGE_IS_FILE |
>> | ^~~~~~~~~~~~~~~
>> vm_util.c:45:73: error: ‘PAGE_IS_FILE’ undeclared (first use in this function)
>> 45 | arg.category_anyof_mask = PAGE_IS_WPALLOWED | PAGE_IS_WRITTEN | PAGE_IS_FILE |
>> | ^~~~~~~~~~~~
>> vm_util.c:46:35: error: ‘PAGE_IS_PRESENT’ undeclared (first use in this function); did you mean ‘PAGEMAP_PRESENT’?
>> 46 | PAGE_IS_PRESENT | PAGE_IS_SWAPPED | PAGE_IS_PFNZERO |
>> | ^~~~~~~~~~~~~~~
>> | PAGEMAP_PRESENT
>> vm_util.c:46:53: error: ‘PAGE_IS_SWAPPED’ undeclared (first use in this function)
>> 46 | PAGE_IS_PRESENT | PAGE_IS_SWAPPED | PAGE_IS_PFNZERO |
>> | ^~~~~~~~~~~~~~~
>> vm_util.c:46:71: error: ‘PAGE_IS_PFNZERO’ undeclared (first use in this function)
>> 46 | PAGE_IS_PRESENT | PAGE_IS_SWAPPED | PAGE_IS_PFNZERO |
>> | ^~~~~~~~~~~~~~~
>> vm_util.c:47:35: error: ‘PAGE_IS_HUGE’ undeclared (first use in this function)
>> 47 | PAGE_IS_HUGE | PAGE_IS_SOFT_DIRTY;
>> | ^~~~~~~~~~~~
>> vm_util.c:47:50: error: ‘PAGE_IS_SOFT_DIRTY’ undeclared (first use in this function); did you mean ‘PM_SOFT_DIRTY’?
>> 47 | PAGE_IS_HUGE | PAGE_IS_SOFT_DIRTY;
>> | ^~~~~~~~~~~~~~~~~~
>> | PM_SOFT_DIRTY
>> vm_util.c:50:26: error: ‘PAGEMAP_SCAN’ undeclared (first use in this function); did you mean ‘PAGEMAP_PFN’?
>> 50 | return ioctl(fd, PAGEMAP_SCAN, &arg);
>> | ^~~~~~~~~~~~
>> | PAGEMAP_PFN
>>
>> --
>> Best Regards,
>> Yan, Zi


--
Best Regards,
Yan, Zi


Attachments:
signature.asc (871.00 B)
OpenPGP digital signature

2024-03-01 15:21:45

by Zi Yan

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 1 Mar 2024, at 9:27, Mark Brown wrote:

> On Fri, Mar 01, 2024 at 02:18:16PM +0000, Ryan Roberts wrote:
>
>> Although I agree it might be a tall order create and mount an XFS fs in
>> run_vmtests.sh. Perhaps it might be good enough to add an optional param to the
>> test to pass a path when running the test manually, and if that's not provided,
>> just try to create a temp file in the current dir and skip if its not the right
>> sort of fs?
>
> Yeah, if it needs to be a specific kind of on disk filesystem then that
> needs a lot more integration with CI systems (a lot of them run entirely
> from nfsroot by default!). Being able to specify the location via an
> environment variable would also be good, it could fall back to the
> current directory if one isn't set up.

Sure. lkp creates an XFS image and mounts it. I will give it a try. If it is too
hard to do, I will do what Ryan suggested above.

--
Best Regards,
Yan, Zi


Attachments:
signature.asc (871.00 B)
OpenPGP digital signature

2024-03-01 19:37:45

by Zi Yan

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 1 Mar 2024, at 4:51, Aishwarya TCV wrote:

> On 26/02/2024 20:55, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> It is used to test split_huge_page_to_list_to_order for pagecache THPs.
>> Also add test cases for split_huge_page_to_list_to_order via both
>> debugfs.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> mm/huge_memory.c | 34 ++++--
>> .../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
>> 2 files changed, 131 insertions(+), 18 deletions(-)
>>
>
> Hi Zi,
>
> When booting the kernel against next-master(20240228)with Arm64 on
> Marvell Thunder X2 (TX2), the kselftest-mm test 'split_huge_page_test'
> is failing in our CI (with rootfs over NFS). I can send the full logs if
> required.
>
> A bisect (full log below) identified this patch as introducing the
> failure. Bisected it on the tag "next-20240228" at repo
> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
>
> This works fine on Linux version 6.8.0-rc6

Hi Aishwarya,

Can you try the attached patch and see if it fixes the failure? I changed
the test to accept an XFS dev as input, mount XFS on a temp folder under /tmp,
and skip if no XFS is mounted.

Thanks.

--
Best Regards,
Yan, Zi


Attachments:
selftest.patch (4.78 kB)
signature.asc (871.00 B)
OpenPGP digital signature
Download all attachments

2024-03-01 20:22:31

by Zi Yan

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 1 Mar 2024, at 9:27, Mark Brown wrote:

> On Fri, Mar 01, 2024 at 02:18:16PM +0000, Ryan Roberts wrote:
>
>> Although I agree it might be a tall order create and mount an XFS fs in
>> run_vmtests.sh. Perhaps it might be good enough to add an optional param to the
>> test to pass a path when running the test manually, and if that's not provided,
>> just try to create a temp file in the current dir and skip if its not the right
>> sort of fs?
>
> Yeah, if it needs to be a specific kind of on disk filesystem then that
> needs a lot more integration with CI systems (a lot of them run entirely
> from nfsroot by default!). Being able to specify the location via an
> environment variable would also be good, it could fall back to the
> current directory if one isn't set up.
Hi Mark,

I have changed the test[1] to:
1. accept an XFS dev as an input,
2. mount XFS on a temp folder,
3. skip the test instead of failing if no XFS is mounted.

Let me know if the change looks good. Thanks.

[1] https://lore.kernel.org/linux-mm/[email protected]/T/#u

--
Best Regards,
Yan, Zi


Attachments:
signature.asc (871.00 B)
OpenPGP digital signature

2024-03-01 20:34:02

by Zi Yan

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 1 Mar 2024, at 14:37, Zi Yan wrote:

> On 1 Mar 2024, at 4:51, Aishwarya TCV wrote:
>
>> On 26/02/2024 20:55, Zi Yan wrote:
>>> From: Zi Yan <[email protected]>
>>>
>>> It is used to test split_huge_page_to_list_to_order for pagecache THPs.
>>> Also add test cases for split_huge_page_to_list_to_order via both
>>> debugfs.
>>>
>>> Signed-off-by: Zi Yan <[email protected]>
>>> ---
>>> mm/huge_memory.c | 34 ++++--
>>> .../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
>>> 2 files changed, 131 insertions(+), 18 deletions(-)
>>>
>>
>> Hi Zi,
>>
>> When booting the kernel against next-master(20240228)with Arm64 on
>> Marvell Thunder X2 (TX2), the kselftest-mm test 'split_huge_page_test'
>> is failing in our CI (with rootfs over NFS). I can send the full logs if
>> required.
>>
>> A bisect (full log below) identified this patch as introducing the
>> failure. Bisected it on the tag "next-20240228" at repo
>> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
>>
>> This works fine on Linux version 6.8.0-rc6
>
> Hi Aishwarya,
>
> Can you try the attached patch and see if it fixes the failure? I changed
> the test to accept XFS dev as input, mount XFS on a temp folder under /tmp,
> and skip if no XFS is mounted.

Please try this updated one. It allows you to specify an XFS device path
in the SPLIT_HUGE_PAGE_TEST_XFS_PATH env variable, which is passed to
split_huge_page_test in run_vmtests.sh. It at least allows CI/CD to run
the test without too much change.

--
Best Regards,
Yan, Zi


Attachments:
selftest.patch (5.30 kB)
signature.asc (871.00 B)
OpenPGP digital signature
Download all attachments

2024-03-01 21:20:38

by Zi Yan

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 1 Mar 2024, at 15:02, Zi Yan wrote:

> On 1 Mar 2024, at 14:37, Zi Yan wrote:
>
>> On 1 Mar 2024, at 4:51, Aishwarya TCV wrote:
>>
>>> On 26/02/2024 20:55, Zi Yan wrote:
>>>> From: Zi Yan <[email protected]>
>>>>
>>>> It is used to test split_huge_page_to_list_to_order for pagecache THPs.
>>>> Also add test cases for split_huge_page_to_list_to_order via both
>>>> debugfs.
>>>>
>>>> Signed-off-by: Zi Yan <[email protected]>
>>>> ---
>>>> mm/huge_memory.c | 34 ++++--
>>>> .../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
>>>> 2 files changed, 131 insertions(+), 18 deletions(-)
>>>>
>>>
>>> Hi Zi,
>>>
>>> When booting the kernel against next-master(20240228)with Arm64 on
>>> Marvell Thunder X2 (TX2), the kselftest-mm test 'split_huge_page_test'
>>> is failing in our CI (with rootfs over NFS). I can send the full logs if
>>> required.
>>>
>>> A bisect (full log below) identified this patch as introducing the
>>> failure. Bisected it on the tag "next-20240228" at repo
>>> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
>>>
>>> This works fine on Linux version 6.8.0-rc6
>>
>> Hi Aishwarya,
>>
>> Can you try the attached patch and see if it fixes the failure? I changed
>> the test to accept XFS dev as input, mount XFS on a temp folder under /tmp,
>> and skip if no XFS is mounted.
>
> Please try this updated one. It allows you to specify a XFS device path
> in SPLIT_HUGE_PAGE_TEST_XFS_PATH env variable, which is passed to
> split_huge_page_test in run_vmtests.sh. It at least allow CI/CD to run
> the test without too much change.

OK. This hopefully will be my last churn. Now split_huge_page_test accepts
a path that is backed by XFS, and run_vmtests.sh creates an XFS image in /tmp,
mounts it in /tmp, and gives the path to split_huge_page_test. I tested
it locally and it works. Let me know if you have any issue. Thanks.
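The run_vmtests.sh flow described above might look roughly like this (sizes, file names, and the test invocation are illustrative, not taken from the patch; root and xfsprogs are required, so the sketch skips otherwise):

```shell
#!/bin/sh
# Hedged sketch of the setup Zi describes: build a small XFS image in
# /tmp, loop-mount it, and hand the mountpoint to split_huge_page_test.
img=$(mktemp /tmp/split_huge_page_xfs_XXXXXX)
mnt=$(mktemp -d /tmp/split_huge_page_mnt_XXXXXX)
truncate -s 300M "$img"                  # sparse backing file

if [ "$(id -u)" -eq 0 ] && command -v mkfs.xfs >/dev/null 2>&1; then
    mkfs.xfs -q "$img"
    mount -o loop "$img" "$mnt"
    [ -x ./split_huge_page_test ] && ./split_huge_page_test "$mnt"
    umount "$mnt"
else
    echo "SKIP: need root and mkfs.xfs to exercise the XFS path"
fi

rm -f "$img"
rmdir "$mnt" 2>/dev/null || true
```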

--
Best Regards,
Yan, Zi


Attachments:
selftest.patch (5.65 kB)
signature.asc (871.00 B)
OpenPGP digital signature
Download all attachments

2024-03-04 09:52:44

by Aishwarya TCV

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.



On 01/03/2024 21:10, Zi Yan wrote:
> On 1 Mar 2024, at 15:02, Zi Yan wrote:
>
>> On 1 Mar 2024, at 14:37, Zi Yan wrote:
>>
>>> On 1 Mar 2024, at 4:51, Aishwarya TCV wrote:
>>>
>>>> On 26/02/2024 20:55, Zi Yan wrote:
>>>>> From: Zi Yan <[email protected]>
>>>>>
>>>>> It is used to test split_huge_page_to_list_to_order for pagecache THPs.
>>>>> Also add test cases for split_huge_page_to_list_to_order via both
>>>>> debugfs.
>>>>>
>>>>> Signed-off-by: Zi Yan <[email protected]>
>>>>> ---
>>>>> mm/huge_memory.c | 34 ++++--
>>>>> .../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
>>>>> 2 files changed, 131 insertions(+), 18 deletions(-)
>>>>>
>>>>
>>>> Hi Zi,
>>>>
>>>> When booting the kernel against next-master(20240228)with Arm64 on
>>>> Marvell Thunder X2 (TX2), the kselftest-mm test 'split_huge_page_test'
>>>> is failing in our CI (with rootfs over NFS). I can send the full logs if
>>>> required.
>>>>
>>>> A bisect (full log below) identified this patch as introducing the
>>>> failure. Bisected it on the tag "next-20240228" at repo
>>>> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
>>>>
>>>> This works fine on Linux version 6.8.0-rc6
>>>
>>> Hi Aishwarya,
>>>
>>> Can you try the attached patch and see if it fixes the failure? I changed
>>> the test to accept XFS dev as input, mount XFS on a temp folder under /tmp,
>>> and skip if no XFS is mounted.
>>
>> Please try this updated one. It allows you to specify a XFS device path
>> in SPLIT_HUGE_PAGE_TEST_XFS_PATH env variable, which is passed to
>> split_huge_page_test in run_vmtests.sh. It at least allow CI/CD to run
>> the test without too much change.
>
> OK. This hopefully will be my last churn. Now split_huge_page_test accepts
> a path that is backed by XFS and run_vmtest.sh creates a XFS image in /tmp,
> mounts it in /tmp, and gives the path to split_huge_page_test. I tested
> it locally and it works. Let me know if you have any issue. Thanks.
>
> --
> Best Regards,
> Yan, Zi

Hi Zi,

Tested the patch by applying it on next-20240304. Logs from our CI with
rootfs over NFS are attached below. "Bail out! cannot remove tmp dir:
Directory not empty" is still observed.


Test run log:
# # ------------------------------
# # running ./split_huge_page_test
# # ------------------------------
# # TAP version 13
# # 1..12
# # ok 1 Split huge pages successful
# # ok 2 Split PTE-mapped huge pages successful
# # # Please enable pr_debug in split_huge_pages_in_file() for more info.
# # # Please check dmesg for more information
# # ok 3 File-backed THP split test done
<6>[ 639.821657] split_huge_page (111099): drop_caches: 3
<6>[ 639.821657] split_huge_page (111099): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 4 # SKIP Pagecache folio split skipped
<6>[ 645.392184] split_huge_page (111099): drop_caches: 3
<6>[ 645.392184] split_huge_page (111099): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 5 # SKIP Pagecache folio split skipped
<6>[ 650.938248] split_huge_page (111099): drop_caches: 3
<6>[ 650.938248] split_huge_page (111099): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 6 # SKIP Pagecache folio split skipped
<6>[ 656.500149] split_huge_page (111099): drop_caches: 3
<6>[ 656.500149] split_huge_page (111099): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 7 # SKIP Pagecache folio split skipped
<6>[ 662.044085] split_huge_page (111099): drop_caches: 3
<6>[ 662.044085] split_huge_page (111099): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 8 # SKIP Pagecache folio split skipped
<6>[ 667.591841] split_huge_page (111099): drop_caches: 3
<6>[ 667.591841] split_huge_page (111099): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 9 # SKIP Pagecache folio split skipped
<6>[ 673.172441] split_huge_page (111099): drop_caches: 3
<6>[ 673.172441] split_huge_page (111099): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 10 # SKIP Pagecache folio split skipped
<6>[ 678.726263] split_huge_page (111099): drop_caches: 3
<6>[ 678.726263] split_huge_page (111099): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 11 # SKIP Pagecache folio split skipped
<6>[ 684.272851] split_huge_page (111099): drop_caches: 3
<6>[ 684.272851] split_huge_page (111099): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 12 # SKIP Pagecache folio split skipped
# # Bail out! cannot remove tmp dir: Directory not empty
# # # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:9 error:0
# # [FAIL]
# not ok 51 split_huge_page_test # exit=1
# # ------------------

Thanks,
Aishwarya

2024-03-04 14:58:46

by Zi Yan

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 4 Mar 2024, at 4:50, Aishwarya TCV wrote:

> [...]
>
> Hi Zi,
>
> Tested the patch by applying it on next-20240304. Logs from our CI with
> rootfs over nfs is attached below. "Bail out! cannot remove tmp dir:
> Directory not empty" is still observed.

Hi Aishwarya,

Do you have the config file for the CI kernel? And /tmp is also on nfs?
Any detailed information about CI machine environment? I cannot reproduce
the error locally, either on bare metal or VM. Maybe because my /tmp is
not NFS mounted?

> [...]


--
Best Regards,
Yan, Zi


Attachments:
signature.asc (871.00 B)
OpenPGP digital signature

2024-03-04 15:50:31

by Aishwarya TCV

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.



On 04/03/2024 14:58, Zi Yan wrote:
> [...]
>
>
> Hi Aishwarya,
>
> Do you have the config file for the CI kernel? And /tmp is also on nfs?
> Any detailed information about CI machine environment? I cannot reproduce
> the error locally, either on bare metal or VM. Maybe because my /tmp is
> not NFS mounted?
>

Hi Zi,

Please find the details below. Hope it helps.

Do you have the config file for the CI kernel?
- We are using:
defconfig+https://github.com/torvalds/linux/blob/master/tools/testing/selftests/mm/config

And /tmp is also on nfs?
- Yes

Any detailed information about CI machine environment?
- We are running the test using LAVA device Cavium Thunder X2 (TX2)
- We have a rootfs very similar to this nfsrootfs:
https://storage.kernelci.org/images/rootfs/debian/bullseye-kselftest/20240129.0/arm64/full.rootfs.tar.xz
- We are using the GRUB boot method over NFS
- Additionally, Ryan mentioned: "Looks like it is failing because he is
trying to delete the temp dir with rmdir(), but rmdir() requires the
directory to be empty, which it is not."

Thanks,
Aishwarya

>> [...]

2024-03-04 15:58:28

by Zi Yan

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 4 Mar 2024, at 10:44, Aishwarya TCV wrote:

> [...]
>
> Hi Zi,
>
> Please find the details below. Hope it helps.
>
> Do you have the config file for the CI kernel?
> - We are using:
> defconfig+https://github.com/torvalds/linux/blob/master/tools/testing/selftests/mm/config
>
> And /tmp is also on nfs?
> - Yes
>
> Any detailed information about CI machine environment?
> - We are running the test using LAVA device Cavium Thunder X2 (TX2),
> - We have very similar rootfs as - nfsrootfs:
> https://storage.kernelci.org/images/rootfs/debian/bullseye-kselftest/20240129.0/arm64/full.rootfs.tar.xz
> - We are using grub boot method over nfs
> - Additionally Ryan mentioned "Looks like it is failing because he is
> trying to delete the temp dir with rmdir() but rmdir() requires the
> directory to be empty, which it is not."

Hi Aishwarya,

Thank you for the information; I am able to reproduce it on an NFS folder.
The error comes from the opened test files not being munmapped and their
file descriptors not being closed in the skip path. NFS creates .nfsXXX files
for them, leaving the temp folder non-empty.

The attached patch cleans up properly and works on a NFS folder. Let me know
if it works on your side. Thanks.

--
Best Regards,
Yan, Zi


Attachments:
selftest.patch (5.69 kB)
signature.asc (871.00 B)
OpenPGP digital signature
Download all attachments

2024-03-04 18:25:57

by Aishwarya TCV

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.



On 04/03/2024 15:57, Zi Yan wrote:
> [...]
>
> Hi Aishwarya,
>
> Thank you for the information and I am able to reproduce it on a NFS folder.
> The error comes from that the opened test files are not munmapped and their
> file descriptors are not closed in the skip path. NFS creates .nfsXXX files
> for them, making the temp folder not empty.
>
> The attached patch cleans up properly and works on a NFS folder. Let me know
> if it works on your side. Thanks.
>
> --
> Best Regards,
> Yan, Zi

Hi Zi,

Tested the attached patch on next-20240304. Confirming that the test is
running fine. Test run log is attached below.

Test run log:
# # ------------------------------
# # running ./split_huge_page_test
# # ------------------------------
# # TAP version 13
# # 1..12
# # ok 1 Split huge pages successful
# # ok 2 Split PTE-mapped huge pages successful
# # # Please enable pr_debug in split_huge_pages_in_file() for more info.
# # # Please check dmesg for more information
# # ok 3 File-backed THP split test done
<6>[ 1769.710429] split_huge_page (111119): drop_caches: 3
<6>[ 1769.710429] split_huge_page (111119): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 4 # SKIP Pagecache folio split skipped
<6>[ 1775.302315] split_huge_page (111119): drop_caches: 3
<6>[ 1775.302315] split_huge_page (111119): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 5 # SKIP Pagecache folio split skipped
<6>[ 1780.924147] split_huge_page (111119): drop_caches: 3
<6>[ 1780.924147] split_huge_page (111119): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 6 # SKIP Pagecache folio split skipped
<6>[ 1786.524931] split_huge_page (111119): drop_caches: 3
<6>[ 1786.524931] split_huge_page (111119): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 7 # SKIP Pagecache folio split skipped
<6>[ 1792.112869] split_huge_page (111119): drop_caches: 3
<6>[ 1792.112869] split_huge_page (111119): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 8 # SKIP Pagecache folio split skipped
<6>[ 1797.718863] split_huge_page (111119): drop_caches: 3
<6>[ 1797.718863] split_huge_page (111119): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 9 # SKIP Pagecache folio split skipped
<6>[ 1803.332343] split_huge_page (111119): drop_caches: 3
<6>[ 1803.332343] split_huge_page (111119): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 10 # SKIP Pagecache folio split skipped
<6>[ 1808.947913] split_huge_page (111119): drop_caches: 3
<6>[ 1808.947913] split_huge_page (111119): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 11 # SKIP Pagecache folio split skipped
<6>[ 1814.537995] split_huge_page (111119): drop_caches: 3
<6>[ 1814.537995] split_huge_page (111119): drop_caches: 3
# # # No large pagecache folio generated, please provide a filesystem
supporting large folio
# # ok 12 # SKIP Pagecache folio split skipped
# # # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:9 error:0
# # [PASS]
# ok 51 split_huge_page_test
# # -------------------

Thanks,
Aishwarya




2024-03-04 18:32:43

by Zi Yan

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 4 Mar 2024, at 10:57, Zi Yan wrote:

> [...]
>
> Hi Aishwarya,
>
> Thank you for the information and I am able to reproduce it on a NFS folder.
> The error comes from that the opened test files are not munmapped and their
> file descriptors are not closed in the skip path. NFS creates .nfsXXX files
> for them, making the temp folder not empty.
>
> The attached patch cleans up properly and works on a NFS folder. Let me know
> if it works on your side. Thanks.

Hi Andrew,

Could you fold the patch from [1] to this one (i.e., Patch 8)? Let me know
if you want me to send it separately or resend the entire patchset. Thanks.

[1] https://lore.kernel.org/linux-mm/[email protected]/

--
Best Regards,
Yan, Zi


Attachments:
signature.asc (871.00 B)
OpenPGP digital signature

2024-03-04 19:01:11

by Zi Yan

[permalink] [raw]
Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 4 Mar 2024, at 13:25, Aishwarya TCV wrote:

> On 04/03/2024 15:57, Zi Yan wrote:
>> On 4 Mar 2024, at 10:44, Aishwarya TCV wrote:
>>
>>> On 04/03/2024 14:58, Zi Yan wrote:
>>>> On 4 Mar 2024, at 4:50, Aishwarya TCV wrote:
>>>>
>>>>> On 01/03/2024 21:10, Zi Yan wrote:
>>>>>> On 1 Mar 2024, at 15:02, Zi Yan wrote:
>>>>>>
>>>>>>> On 1 Mar 2024, at 14:37, Zi Yan wrote:
>>>>>>>
>>>>>>>> On 1 Mar 2024, at 4:51, Aishwarya TCV wrote:
>>>>>>>>
>>>>>>>>> On 26/02/2024 20:55, Zi Yan wrote:
>>>>>>>>>> From: Zi Yan <[email protected]>
>>>>>>>>>>
>>>>>>>>>> It is used to test split_huge_page_to_list_to_order for pagecache THPs.
>>>>>>>>>> Also add test cases for split_huge_page_to_list_to_order via
>>>>>>>>>> debugfs.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Zi Yan <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>> mm/huge_memory.c | 34 ++++--
>>>>>>>>>> .../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
>>>>>>>>>> 2 files changed, 131 insertions(+), 18 deletions(-)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Zi,
>>>>>>>>>
>>>>>>>>> When booting the kernel against next-master (20240228) with Arm64 on
>>>>>>>>> Marvell Thunder X2 (TX2), the kselftest-mm test 'split_huge_page_test'
>>>>>>>>> is failing in our CI (with rootfs over NFS). I can send the full logs if
>>>>>>>>> required.
>>>>>>>>>
>>>>>>>>> A bisect (full log below) identified this patch as introducing the
>>>>>>>>> failure. Bisected it on the tag "next-20240228" at repo
>>>>>>>>> "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git".
>>>>>>>>>
>>>>>>>>> This works fine on Linux version 6.8.0-rc6
>>>>>>>>
>>>>>>>> Hi Aishwarya,
>>>>>>>>
>>>>>>>> Can you try the attached patch and see if it fixes the failure? I changed
>>>>>>>> the test to accept an XFS device as input, mount it on a temp folder under
>>>>>>>> /tmp, and skip if no XFS is mounted.
>>>>>>>
>>>>>>> Please try this updated one. It allows you to specify an XFS device path
>>>>>>> in the SPLIT_HUGE_PAGE_TEST_XFS_PATH env variable, which is passed to
>>>>>>> split_huge_page_test in run_vmtests.sh. It at least allows CI/CD to run
>>>>>>> the test without too much change.
>>>>>>
>>>>>> OK. This will hopefully be my last churn. Now split_huge_page_test accepts
>>>>>> a path that is backed by XFS, and run_vmtests.sh creates an XFS image in
>>>>>> /tmp, mounts it in /tmp, and gives the path to split_huge_page_test. I tested
>>>>>> it locally and it works. Let me know if you have any issue. Thanks.
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Yan, Zi
>>>>>
>>>>> Hi Zi,
>>>>>
>>>>> Tested the patch by applying it on next-20240304. The log from our CI with
>>>>> rootfs over NFS is attached below. "Bail out! cannot remove tmp dir:
>>>>> Directory not empty" is still observed.
>>>>
>>>> Hi Aishwarya,
>>>>
>>>> Do you have the config file for the CI kernel? And is /tmp also on NFS?
>>>> Any detailed information about the CI machine environment? I cannot
>>>> reproduce the error locally, either on bare metal or in a VM. Maybe that
>>>> is because my /tmp is not NFS-mounted?
>>>>
>>>
>>> Hi Zi,
>>>
>>> Please find the details below. Hope it helps.
>>>
>>> Do you have the config file for the CI kernel?
>>> - We are using:
>>> defconfig+https://github.com/torvalds/linux/blob/master/tools/testing/selftests/mm/config
>>>
>>> And /tmp is also on nfs?
>>> - Yes
>>>
>>> Any detailed information about CI machine environment?
>>> - We are running the test using LAVA device Cavium Thunder X2 (TX2),
>>> - We have very similar rootfs as - nfsrootfs:
>>> https://storage.kernelci.org/images/rootfs/debian/bullseye-kselftest/20240129.0/arm64/full.rootfs.tar.xz
>>> - We are using grub boot method over nfs
>>> - Additionally Ryan mentioned "Looks like it is failing because he is
>>> trying to delete the temp dir with rmdir() but rmdir() requires the
>>> directory to be empty, which it is not."
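Since the failure mode hinges on whether the temp directory is NFS-backed, a test could detect this up front with statfs(2). A sketch, with the magic constants copied from linux/magic.h so the snippet is self-contained (the helper names are hypothetical):

```c
#include <sys/vfs.h>

/* Values from linux/magic.h, repeated here for self-containment. */
#define SKETCH_NFS_SUPER_MAGIC  0x6969
#define SKETCH_XFS_SUPER_MAGIC  0x58465342

/* Return the filesystem magic for @path, or 0 if statfs() fails. */
long fs_magic(const char *path)
{
	struct statfs sb;

	if (statfs(path, &sb) != 0)
		return 0;
	return (long)sb.f_type;
}

/* A test could branch on these, e.g. to expect .nfsXXX placeholder files
 * in its temp dir or to verify large-folio support on XFS. */
int path_is_nfs(const char *path)
{
	return fs_magic(path) == SKETCH_NFS_SUPER_MAGIC;
}

int path_is_xfs(const char *path)
{
	return fs_magic(path) == SKETCH_XFS_SUPER_MAGIC;
}
```
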
>>
>> Hi Aishwarya,
>>
>> Thank you for the information; I am able to reproduce it on an NFS folder.
>> The error is that the opened test files are not munmap()ed and their
>> file descriptors are not closed in the skip path. NFS creates .nfsXXX files
>> for them, leaving the temp folder not empty.
>>
>> The attached patch cleans up properly and works on an NFS folder. Let me know
>> if it works on your side. Thanks.
>>
>> --
>> Best Regards,
>> Yan, Zi
>
> Hi Zi,
>
> Tested the attached patch on next-20240304. Confirming that the test is
> running fine. Test run log is attached below.
>
> Test run log:
> # # ------------------------------
> # # running ./split_huge_page_test
> # # ------------------------------
> # # TAP version 13
> # # 1..12
> # # ok 1 Split huge pages successful
> # # ok 2 Split PTE-mapped huge pages successful
> # # # Please enable pr_debug in split_huge_pages_in_file() for more info.
> # # # Please check dmesg for more information
> # # ok 3 File-backed THP split test done
> <6>[ 1769.710429] split_huge_page (111119): drop_caches: 3
> <6>[ 1769.710429] split_huge_page (111119): drop_caches: 3
> # # # No large pagecache folio generated, please provide a filesystem
> supporting large folio
> # # ok 4 # SKIP Pagecache folio split skipped
> <6>[ 1775.302315] split_huge_page (111119): drop_caches: 3
> <6>[ 1775.302315] split_huge_page (111119): drop_caches: 3
> # # # No large pagecache folio generated, please provide a filesystem
> supporting large folio
> # # ok 5 # SKIP Pagecache folio split skipped
> <6>[ 1780.924147] split_huge_page (111119): drop_caches: 3
> <6>[ 1780.924147] split_huge_page (111119): drop_caches: 3
> # # # No large pagecache folio generated, please provide a filesystem
> supporting large folio
> # # ok 6 # SKIP Pagecache folio split skipped
> <6>[ 1786.524931] split_huge_page (111119): drop_caches: 3
> <6>[ 1786.524931] split_huge_page (111119): drop_caches: 3
> # # # No large pagecache folio generated, please provide a filesystem
> supporting large folio
> # # ok 7 # SKIP Pagecache folio split skipped
> <6>[ 1792.112869] split_huge_page (111119): drop_caches: 3
> <6>[ 1792.112869] split_huge_page (111119): drop_caches: 3
> # # # No large pagecache folio generated, please provide a filesystem
> supporting large folio
> # # ok 8 # SKIP Pagecache folio split skipped
> <6>[ 1797.718863] split_huge_page (111119): drop_caches: 3
> <6>[ 1797.718863] split_huge_page (111119): drop_caches: 3
> # # # No large pagecache folio generated, please provide a filesystem
> supporting large folio
> # # ok 9 # SKIP Pagecache folio split skipped
> <6>[ 1803.332343] split_huge_page (111119): drop_caches: 3
> <6>[ 1803.332343] split_huge_page (111119): drop_caches: 3
> # # # No large pagecache folio generated, please provide a filesystem
> supporting large folio
> # # ok 10 # SKIP Pagecache folio split skipped
> <6>[ 1808.947913] split_huge_page (111119): drop_caches: 3
> <6>[ 1808.947913] split_huge_page (111119): drop_caches: 3
> # # # No large pagecache folio generated, please provide a filesystem
> supporting large folio
> # # ok 11 # SKIP Pagecache folio split skipped
> <6>[ 1814.537995] split_huge_page (111119): drop_caches: 3
> <6>[ 1814.537995] split_huge_page (111119): drop_caches: 3
> # # # No large pagecache folio generated, please provide a filesystem
> supporting large folio
> # # ok 12 # SKIP Pagecache folio split skipped
> # # # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:9 error:0
> # # [PASS]
> # ok 51 split_huge_page_test
> # # -------------------

Thanks a lot for testing.

--
Best Regards,
Yan, Zi


2024-03-07 14:58:58

by Zi Yan

Subject: Re: [PATCH v5 7/8] mm: thp: split huge page to any lower order pages

On 26 Feb 2024, at 15:55, Zi Yan wrote:

> From: Zi Yan <[email protected]>
>
> To split a THP to any lower-order pages, we need to re-form THPs on
> subpages at the given order and add page refcounts based on the new page
> order. We also need to reinitialize page_deferred_list after removing
> the page from the split_queue; otherwise a subsequent split will
> see list corruption when checking page_deferred_list again.
>
> Note: Anonymous order-1 folio is not supported because _deferred_list,
> which is used by partially mapped folios, is stored in subpage 2 and an
> order-1 folio only has subpage 0 and 1. File-backed order-1 folios are
> fine, since they do not use _deferred_list.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> include/linux/huge_mm.h | 21 +++++---
> mm/huge_memory.c | 110 +++++++++++++++++++++++++++++++---------
> 2 files changed, 99 insertions(+), 32 deletions(-)
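The refcount arithmetic the patch description refers to can be sanity-checked with a small sketch: splitting an order `old_order` folio down to `new_order` yields 2^(old_order - new_order) folios of 2^new_order pages each (function names here are illustrative, not the kernel's):

```c
/* Splitting an order `old_order` folio to `new_order` (which must be
 * strictly smaller) produces this many new folios. */
unsigned long nr_folios_after_split(unsigned int old_order,
				    unsigned int new_order)
{
	return 1UL << (old_order - new_order);
}

/* Each resulting folio covers 2^new_order base pages, which is what the
 * per-folio refcount added during the split has to account for. */
unsigned long pages_per_new_folio(unsigned int new_order)
{
	return 1UL << new_order;
}
```

For example, an order-9 (PMD-sized on x86-64) folio split to order 2 becomes 128 order-2 folios, still covering 512 base pages in total.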

Hi Andrew,

Can you fold the patch below into this patch 7? It is based on the discussion
with Dan Carpenter at https://lore.kernel.org/linux-mm/[email protected]/.
It prevents an invalid new_order input from causing an unexpected outcome in
split_huge_page_to_list_to_order(). This matters especially for patch 8, where
new_order can come from debugfs without any restriction. I will send another
fixup to patch 8 to check new_order from debugfs.

Thanks.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a81a09236c16..57fca7bffd20 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3052,6 +3052,9 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);

+ if (new_order >= folio_order(folio))
+ return -EINVAL;
+
/* Cannot split anonymous THP to order-1 */
if (new_order == 1 && folio_test_anon(folio)) {
VM_WARN_ONCE(1, "Cannot split to order-1 folio");


--
Best Regards,
Yan, Zi


2024-03-07 15:07:11

by Zi Yan

Subject: Re: [PATCH v5 8/8] mm: huge_memory: enable debugfs to split huge pages to any order.

On 26 Feb 2024, at 15:55, Zi Yan wrote:

> From: Zi Yan <[email protected]>
>
> It is used to test split_huge_page_to_list_to_order for pagecache THPs.
> Also add test cases for split_huge_page_to_list_to_order via
> debugfs.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> mm/huge_memory.c | 34 ++++--
> .../selftests/mm/split_huge_page_test.c | 115 +++++++++++++++++-
> 2 files changed, 131 insertions(+), 18 deletions(-)

Hi Andrew,

This is the fixup for patch 8. It is based on the discussion with Dan Carpenter
at https://lore.kernel.org/linux-mm/[email protected]/. It checks the new_order
input from debugfs and skips folios early if new_order is greater than or equal
to the folio order.

Thanks.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a81a09236c16..42d4f62d7760 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3484,6 +3484,9 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
goto next;

total++;
+
+ if (new_order >= folio_order(folio))
+ goto next;
/*
* For folios with private, split_huge_page_to_list_to_order()
* will try to drop it before split and then check if the folio
@@ -3550,6 +3553,9 @@ static int split_huge_pages_in_file(const char *file_path, pgoff_t off_start,
total++;
nr_pages = folio_nr_pages(folio);

+ if (new_order >= folio_order(folio))
+ goto next;
+
if (!folio_trylock(folio))
goto next;
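With this in place, the pid-based debugfs interface can be driven from userspace. A hedged sketch (requires root and CONFIG_DEBUG_FS; the `<pid>,<vaddr_start>,<vaddr_end>,<new_order>` input format assumed here follows the existing split_huge_pages conventions with the new_order field appended, and the helper names are not from the kernel tree):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build the "<pid>,<vaddr_start>,<vaddr_end>,<new_order>" string that the
 * pid-based split_huge_pages interface parses (addresses in hex). */
int format_split_cmd(char *buf, size_t len, int pid, unsigned long start,
		     unsigned long end, unsigned int new_order)
{
	return snprintf(buf, len, "%d,0x%lx,0x%lx,%u", pid, start, end,
			new_order);
}

/* Ask the kernel to split huge pages in [start, end) of @pid down to
 * @new_order. Returns 0 on success, -1 on failure. */
int split_huge_pages_of(int pid, unsigned long start, unsigned long end,
			unsigned int new_order)
{
	char buf[128];
	int fd, ret;

	fd = open("/sys/kernel/debug/split_huge_pages", O_WRONLY);
	if (fd < 0)
		return -1;
	format_split_cmd(buf, sizeof(buf), pid, start, end, new_order);
	ret = write(fd, buf, strlen(buf)) == (ssize_t)strlen(buf) ? 0 : -1;
	close(fd);
	return ret;
}
```

With the fixup above, a new_order greater than or equal to a folio's order simply makes that folio be skipped rather than producing a surprising result.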



--
Best Regards,
Yan, Zi

