2024-04-12 11:49:26

by Barry Song

Subject: [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters

From: Barry Song <[email protected]>

The patchset introduces a framework to facilitate mTHP counters, starting
with the allocation and swap-out counters. Currently, only five new nodes
are appended to the stats directory for each mTHP size.

/sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
anon_fault_alloc
anon_fault_fallback
anon_fault_fallback_charge
anon_swpout
anon_swpout_fallback

These nodes are crucial for us to monitor the fragmentation levels of
both the buddy system and the swap partitions. In the future, we may
consider adding additional nodes for further insights.
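
For illustration only (not part of the patchset), a userspace monitor
could sum one of these counters across all sizes roughly as below; the
glob pattern and the choice of counter are just assumptions based on the
sysfs layout above:

#include <glob.h>
#include <stdio.h>

int main(void)
{
	glob_t g;
	unsigned long total = 0, val;

	if (glob("/sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/anon_fault_alloc",
		 0, NULL, &g))
		return 1;

	for (size_t i = 0; i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "r");

		if (f && fscanf(f, "%lu", &val) == 1)
			total += val;
		if (f)
			fclose(f);
	}
	globfree(&g);
	printf("anon_fault_alloc (all sizes): %lu\n", total);
	return 0;
}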

-v6:
* collect reviewed-by tags for patches 2/4, 3/4 and 4/4, Ryan;
* move back to static array by using MAX_PTRS_PER_PTE, Ryan;
* move to for_each_possible_cpu to handle cpu hotplug, Ryan;
* other minor cleanups according to Ryan;

-v5:
* rename anon_alloc to anon_fault_alloc, Barry/Ryan;
* add anon_fault_fallback_charge, Ryan;
* move to dynamic alloc_percpu as powerpc's PMD_ORDER is not const,
kernel test robot;
* make anon_fault_alloc and anon_fault_fallback more consistent
with thp_fault_alloc and thp_fault_fallback, Ryan;
* handle cpu hotplug properly, Ryan;
* add docs for new sysfs nodes and ABI, Andrew.
link:
https://lore.kernel.org/linux-mm/[email protected]/

-v4:
* Many thanks to David and Ryan for your patience and valuable insights
throughout the numerous renaming efforts!
* Guard the case order > PMD_ORDER in count func rather than in callers,
Ryan;
* Add swpout counters;
* Add a helper DEFINE_MTHP_STAT_ATTR to avoid code duplication for various
counters;
link:
https://lore.kernel.org/linux-mm/[email protected]/

-v3:
https://lore.kernel.org/linux-mm/[email protected]/

Barry Song (4):
mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback
counters
mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters
mm: add docs for per-order mTHP counters and transhuge_page ABI
mm: correct the docs for thp_fault_alloc and thp_fault_fallback

.../sys-kernel-mm-transparent-hugepage | 17 ++++++
Documentation/admin-guide/mm/transhuge.rst | 32 ++++++++++-
include/linux/huge_mm.h | 23 ++++++++
mm/huge_memory.c | 56 +++++++++++++++++++
mm/memory.c | 5 ++
mm/page_io.c | 1 +
mm/vmscan.c | 3 +
7 files changed, 135 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sys-kernel-mm-transparent-hugepage

--
2.34.1



2024-04-12 11:49:36

by Barry Song

Subject: [PATCH v6 1/4] mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters

From: Barry Song <[email protected]>

Profiling a system blindly with mTHP has become challenging due to the
lack of visibility into its operations. Presenting the success rate of
mTHP allocations appears to be a pressing need.

Recently, I've been experiencing significant difficulty debugging
performance improvements and regressions without these figures. It's
crucial for us to understand the true effectiveness of mTHP in real-world
scenarios, especially in systems with fragmented memory.

This patch establishes the framework for per-order mTHP
counters. It begins by introducing the anon_fault_alloc and
anon_fault_fallback counters. Additionally, to maintain consistency
with thp_fault_fallback_charge in /proc/vmstat, this patch also tracks
anon_fault_fallback_charge when mem_cgroup_charge fails for mTHP.
Incorporating additional counters should now be straightforward as well.
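
For illustration, a hypothetical future counter would only need an enum
entry, a sysfs attribute and a count_mthp_stat() call at the event site.
A rough sketch based on the code below; MTHP_STAT_ANON_SWPIN and its call
site are invented for illustration and are not part of this patch:

/* include/linux/huge_mm.h: extend the enum before __MTHP_STAT_COUNT */
enum mthp_stat_item {
	MTHP_STAT_ANON_FAULT_ALLOC,
	MTHP_STAT_ANON_FAULT_FALLBACK,
	MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
	MTHP_STAT_ANON_SWPIN,		/* new, hypothetical */
	__MTHP_STAT_COUNT
};

/*
 * mm/huge_memory.c: expose it under .../hugepages-<size>kB/stats and
 * add &anon_swpin_attr.attr to stats_attrs[]
 */
DEFINE_MTHP_STAT_ATTR(anon_swpin, MTHP_STAT_ANON_SWPIN);

/* count it where the event happens, e.g. on a large-folio swap-in path */
count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_SWPIN);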

Signed-off-by: Barry Song <[email protected]>
Cc: Chris Li <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Domenico Cerasuolo <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Cc: Yu Zhao <[email protected]>
---
include/linux/huge_mm.h | 21 +++++++++++++++++
mm/huge_memory.c | 52 +++++++++++++++++++++++++++++++++++++++++
mm/memory.c | 5 ++++
3 files changed, 78 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e896ca4760f6..d4fdb2641070 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -264,6 +264,27 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
enforce_sysfs, orders);
}

+enum mthp_stat_item {
+ MTHP_STAT_ANON_FAULT_ALLOC,
+ MTHP_STAT_ANON_FAULT_FALLBACK,
+ MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
+ __MTHP_STAT_COUNT
+};
+
+struct mthp_stat {
+ unsigned long stats[ilog2(MAX_PTRS_PER_PTE) + 1][__MTHP_STAT_COUNT];
+};
+
+DECLARE_PER_CPU(struct mthp_stat, mthp_stats);
+
+static inline void count_mthp_stat(int order, enum mthp_stat_item item)
+{
+ if (order <= 0 || order > PMD_ORDER)
+ return;
+
+ this_cpu_inc(mthp_stats.stats[order][item]);
+}
+
#define transparent_hugepage_use_zero_page() \
(transparent_hugepage_flags & \
(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index dc30139590e6..dfc38cc83a04 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -526,6 +526,48 @@ static const struct kobj_type thpsize_ktype = {
.sysfs_ops = &kobj_sysfs_ops,
};

+DEFINE_PER_CPU(struct mthp_stat, mthp_stats) = {{{0}}};
+
+static unsigned long sum_mthp_stat(int order, enum mthp_stat_item item)
+{
+ unsigned long sum = 0;
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ struct mthp_stat *this = &per_cpu(mthp_stats, cpu);
+
+ sum += this->stats[order][item];
+ }
+
+ return sum;
+}
+
+#define DEFINE_MTHP_STAT_ATTR(_name, _index) \
+static ssize_t _name##_show(struct kobject *kobj, \
+ struct kobj_attribute *attr, char *buf) \
+{ \
+ int order = to_thpsize(kobj)->order; \
+ \
+ return sysfs_emit(buf, "%lu\n", sum_mthp_stat(order, _index)); \
+} \
+static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
+
+DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
+DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
+
+static struct attribute *stats_attrs[] = {
+ &anon_fault_alloc_attr.attr,
+ &anon_fault_fallback_attr.attr,
+ &anon_fault_fallback_charge_attr.attr,
+ NULL,
+};
+
+static struct attribute_group stats_attr_group = {
+ .name = "stats",
+ .attrs = stats_attrs,
+};
+
static struct thpsize *thpsize_create(int order, struct kobject *parent)
{
unsigned long size = (PAGE_SIZE << order) / SZ_1K;
@@ -549,6 +591,12 @@ static struct thpsize *thpsize_create(int order, struct kobject *parent)
return ERR_PTR(ret);
}

+ ret = sysfs_create_group(&thpsize->kobj, &stats_attr_group);
+ if (ret) {
+ kobject_put(&thpsize->kobj);
+ return ERR_PTR(ret);
+ }
+
thpsize->order = order;
return thpsize;
}
@@ -880,6 +928,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
folio_put(folio);
count_vm_event(THP_FAULT_FALLBACK);
count_vm_event(THP_FAULT_FALLBACK_CHARGE);
+ count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK);
+ count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
return VM_FAULT_FALLBACK;
}
folio_throttle_swaprate(folio, gfp);
@@ -929,6 +979,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
mm_inc_nr_ptes(vma->vm_mm);
spin_unlock(vmf->ptl);
count_vm_event(THP_FAULT_ALLOC);
+ count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
}

@@ -1050,6 +1101,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
folio = vma_alloc_folio(gfp, HPAGE_PMD_ORDER, vma, haddr, true);
if (unlikely(!folio)) {
count_vm_event(THP_FAULT_FALLBACK);
+ count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK);
return VM_FAULT_FALLBACK;
}
return __do_huge_pmd_anonymous_page(vmf, &folio->page, gfp);
diff --git a/mm/memory.c b/mm/memory.c
index 649a547fe8e3..f31da2de19c6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4368,6 +4368,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
folio = vma_alloc_folio(gfp, order, vma, addr, true);
if (folio) {
if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
+ count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
folio_put(folio);
goto next;
}
@@ -4376,6 +4377,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
return folio;
}
next:
+ count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
order = next_order(&orders, order);
}

@@ -4485,6 +4487,9 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)

folio_ref_add(folio, nr_pages - 1);
add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
+#endif
folio_add_new_anon_rmap(folio, vma, addr);
folio_add_lru_vma(folio, vma);
setpte:
--
2.34.1


2024-04-12 11:49:51

by Barry Song

Subject: [PATCH v6 2/4] mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters

From: Barry Song <[email protected]>

This helps to display the fragmentation situation of the swapfile by
showing the proportion of large folios that were swapped out without
being split. So far, we only support non-split swapout for anon memory,
with the possibility of expanding to shmem in the future. So, we add
the "anon" prefix to the counter names.
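
As a usage note (illustrative, not part of the patch), the split ratio
for a given size can be derived from the two new counters, e.g. in a
monitoring helper:

/*
 * Illustrative helper: fraction of mTHP swapouts of one order that had
 * to be split. Inputs are the values read from anon_swpout and
 * anon_swpout_fallback of a single hugepages-<size>kB/stats directory.
 */
static double mthp_swpout_split_ratio(unsigned long swpout,
				      unsigned long swpout_fallback)
{
	unsigned long total = swpout + swpout_fallback;

	return total ? (double)swpout_fallback / total : 0.0;
}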

Signed-off-by: Barry Song <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Cc: Chris Li <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Domenico Cerasuolo <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Cc: Yu Zhao <[email protected]>
---
include/linux/huge_mm.h | 2 ++
mm/huge_memory.c | 4 ++++
mm/page_io.c | 1 +
mm/vmscan.c | 3 +++
4 files changed, 10 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index d4fdb2641070..7cd07b83a3d0 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -268,6 +268,8 @@ enum mthp_stat_item {
MTHP_STAT_ANON_FAULT_ALLOC,
MTHP_STAT_ANON_FAULT_FALLBACK,
MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
+ MTHP_STAT_ANON_SWPOUT,
+ MTHP_STAT_ANON_SWPOUT_FALLBACK,
__MTHP_STAT_COUNT
};

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index dfc38cc83a04..58f2c4745d80 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -555,11 +555,15 @@ static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
+DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
+DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);

static struct attribute *stats_attrs[] = {
&anon_fault_alloc_attr.attr,
&anon_fault_fallback_attr.attr,
&anon_fault_fallback_charge_attr.attr,
+ &anon_swpout_attr.attr,
+ &anon_swpout_fallback_attr.attr,
NULL,
};

diff --git a/mm/page_io.c b/mm/page_io.c
index a9a7c236aecc..46c603dddf04 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -217,6 +217,7 @@ static inline void count_swpout_vm_event(struct folio *folio)
count_memcg_folio_events(folio, THP_SWPOUT, 1);
count_vm_event(THP_SWPOUT);
}
+ count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_SWPOUT);
#endif
count_vm_events(PSWPOUT, folio_nr_pages(folio));
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bca2d9981c95..49bd94423961 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1231,6 +1231,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
goto activate_locked;
}
if (!add_to_swap(folio)) {
+ int __maybe_unused order = folio_order(folio);
+
if (!folio_test_large(folio))
goto activate_locked_split;
/* Fallback to swap normal pages */
@@ -1242,6 +1244,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
THP_SWPOUT_FALLBACK, 1);
count_vm_event(THP_SWPOUT_FALLBACK);
}
+ count_mthp_stat(order, MTHP_STAT_ANON_SWPOUT_FALLBACK);
#endif
if (!add_to_swap(folio))
goto activate_locked_split;
--
2.34.1


2024-04-12 11:50:20

by Barry Song

Subject: [PATCH v6 4/4] mm: correct the docs for thp_fault_alloc and thp_fault_fallback

From: Barry Song <[email protected]>

The documentation does not align with the code. In
__do_huge_pmd_anonymous_page(), THP_FAULT_FALLBACK is incremented when
mem_cgroup_charge() fails, despite the allocation succeeding, whereas
THP_FAULT_ALLOC is only incremented after a successful charge.
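
A condensed, illustrative sketch of the flow described above (the real
code lives in do_huge_pmd_anonymous_page() and
__do_huge_pmd_anonymous_page(), abbreviated here):

	folio = vma_alloc_folio(gfp, HPAGE_PMD_ORDER, vma, haddr, true);
	if (!folio) {
		count_vm_event(THP_FAULT_FALLBACK);	/* allocation failed */
		return VM_FAULT_FALLBACK;
	}
	if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
		/* allocation succeeded, but charging failed */
		count_vm_event(THP_FAULT_FALLBACK);
		count_vm_event(THP_FAULT_FALLBACK_CHARGE);
		return VM_FAULT_FALLBACK;
	}
	/* only counted after both allocation and charge have succeeded */
	count_vm_event(THP_FAULT_ALLOC);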

Signed-off-by: Barry Song <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Cc: Chris Li <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Domenico Cerasuolo <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: Jonathan Corbet <[email protected]>
---
Documentation/admin-guide/mm/transhuge.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index e0fe17affeb3..f82300b9193f 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -369,7 +369,7 @@ monitor how successfully the system is providing huge pages for use.

thp_fault_alloc
is incremented every time a huge page is successfully
- allocated to handle a page fault.
+ allocated and charged to handle a page fault.

thp_collapse_alloc
is incremented by khugepaged when it has found
@@ -377,7 +377,7 @@ thp_collapse_alloc
successfully allocated a new huge page to store the data.

thp_fault_fallback
- is incremented if a page fault fails to allocate
+ is incremented if a page fault fails to allocate or charge
a huge page and instead falls back to using small pages.

thp_fault_fallback_charge
--
2.34.1


2024-04-12 11:50:22

by Barry Song

Subject: [PATCH v6 3/4] mm: add docs for per-order mTHP counters and transhuge_page ABI

From: Barry Song <[email protected]>

This patch includes documentation for mTHP counters and an ABI file
for sys-kernel-mm-transparent-hugepage, which appears to have been
missing for some time.

Signed-off-by: Barry Song <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Cc: Chris Li <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Domenico Cerasuolo <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: Jonathan Corbet <[email protected]>
---
.../sys-kernel-mm-transparent-hugepage | 17 +++++++++++
Documentation/admin-guide/mm/transhuge.rst | 28 +++++++++++++++++++
2 files changed, 45 insertions(+)
create mode 100644 Documentation/ABI/testing/sys-kernel-mm-transparent-hugepage

diff --git a/Documentation/ABI/testing/sys-kernel-mm-transparent-hugepage b/Documentation/ABI/testing/sys-kernel-mm-transparent-hugepage
new file mode 100644
index 000000000000..33163eba5342
--- /dev/null
+++ b/Documentation/ABI/testing/sys-kernel-mm-transparent-hugepage
@@ -0,0 +1,17 @@
+What: /sys/kernel/mm/transparent_hugepage/
+Date: April 2024
+Contact: Linux memory management mailing list <[email protected]>
+Description:
+ /sys/kernel/mm/transparent_hugepage/ contains a number of files and
+ subdirectories,
+ - defrag
+ - enabled
+ - hpage_pmd_size
+ - khugepaged
+ - shmem_enabled
+ - use_zero_page
+ - subdirectories of the form hugepages-<size>kB, where <size>
+ is the page size of the hugepages supported by the kernel/CPU
+ combination.
+
+ See Documentation/admin-guide/mm/transhuge.rst for details.
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 04eb45a2f940..e0fe17affeb3 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -447,6 +447,34 @@ thp_swpout_fallback
Usually because failed to allocate some continuous swap space
for the huge page.

+In /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats, there are
+also individual counters for each huge page size, which can be utilized to
+monitor the system's effectiveness in providing huge pages for usage. Each
+counter has its own corresponding file.
+
+anon_fault_alloc
+ is incremented every time a huge page is successfully
+ allocated and charged to handle a page fault.
+
+anon_fault_fallback
+ is incremented if a page fault fails to allocate or charge
+ a huge page and instead falls back to using huge pages with
+ lower orders or small pages.
+
+anon_fault_fallback_charge
+ is incremented if a page fault fails to charge a huge page and
+ instead falls back to using huge pages with lower orders or
+ small pages even though the allocation was successful.
+
+anon_swpout
+ is incremented every time a huge page is swapped out in one
+ piece without splitting.
+
+anon_swpout_fallback
+ is incremented if a huge page has to be split before swapout.
+ Usually because failed to allocate some continuous swap space
+ for the huge page.
+
As the system ages, allocating huge pages may be expensive as the
system uses memory compaction to copy data around memory to free a
huge page for use. There are some counters in ``/proc/vmstat`` to help
--
2.34.1


2024-04-12 11:59:27

by Ryan Roberts

Subject: Re: [PATCH v6 1/4] mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters

On 12/04/2024 12:48, Barry Song wrote:
> From: Barry Song <[email protected]>
>
> Profiling a system blindly with mTHP has become challenging due to the
> lack of visibility into its operations. Presenting the success rate of
> mTHP allocations appears to be pressing need.
>
> Recently, I've been experiencing significant difficulty debugging
> performance improvements and regressions without these figures. It's
> crucial for us to understand the true effectiveness of mTHP in real-world
> scenarios, especially in systems with fragmented memory.
>
> This patch establishes the framework for per-order mTHP
> counters. It begins by introducing the anon_fault_alloc and
> anon_fault_fallback counters. Additionally, to maintain consistency
> with thp_fault_fallback_charge in /proc/vmstat, this patch also tracks
> anon_fault_fallback_charge when mem_cgroup_charge fails for mTHP.
> Incorporating additional counters should now be straightforward as well.
>
> Signed-off-by: Barry Song <[email protected]>
> Cc: Chris Li <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Cc: Domenico Cerasuolo <[email protected]>
> Cc: Kairui Song <[email protected]>
> Cc: Matthew Wilcox (Oracle) <[email protected]>
> Cc: Peter Xu <[email protected]>
> Cc: Ryan Roberts <[email protected]>
> Cc: Suren Baghdasaryan <[email protected]>
> Cc: Yosry Ahmed <[email protected]>
> Cc: Yu Zhao <[email protected]>

LGTM!

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/huge_mm.h | 21 +++++++++++++++++
> mm/huge_memory.c | 52 +++++++++++++++++++++++++++++++++++++++++
> mm/memory.c | 5 ++++
> 3 files changed, 78 insertions(+)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index e896ca4760f6..d4fdb2641070 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -264,6 +264,27 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> enforce_sysfs, orders);
> }
>
> +enum mthp_stat_item {
> + MTHP_STAT_ANON_FAULT_ALLOC,
> + MTHP_STAT_ANON_FAULT_FALLBACK,
> + MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
> + __MTHP_STAT_COUNT
> +};
> +
> +struct mthp_stat {
> + unsigned long stats[ilog2(MAX_PTRS_PER_PTE) + 1][__MTHP_STAT_COUNT];
> +};
> +
> +DECLARE_PER_CPU(struct mthp_stat, mthp_stats);
> +
> +static inline void count_mthp_stat(int order, enum mthp_stat_item item)
> +{
> + if (order <= 0 || order > PMD_ORDER)
> + return;
> +
> + this_cpu_inc(mthp_stats.stats[order][item]);
> +}
> +
> #define transparent_hugepage_use_zero_page() \
> (transparent_hugepage_flags & \
> (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index dc30139590e6..dfc38cc83a04 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -526,6 +526,48 @@ static const struct kobj_type thpsize_ktype = {
> .sysfs_ops = &kobj_sysfs_ops,
> };
>
> +DEFINE_PER_CPU(struct mthp_stat, mthp_stats) = {{{0}}};
> +
> +static unsigned long sum_mthp_stat(int order, enum mthp_stat_item item)
> +{
> + unsigned long sum = 0;
> + int cpu;
> +
> + for_each_possible_cpu(cpu) {
> + struct mthp_stat *this = &per_cpu(mthp_stats, cpu);
> +
> + sum += this->stats[order][item];
> + }
> +
> + return sum;
> +}
> +
> +#define DEFINE_MTHP_STAT_ATTR(_name, _index) \
> +static ssize_t _name##_show(struct kobject *kobj, \
> + struct kobj_attribute *attr, char *buf) \
> +{ \
> + int order = to_thpsize(kobj)->order; \
> + \
> + return sysfs_emit(buf, "%lu\n", sum_mthp_stat(order, _index)); \
> +} \
> +static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
> +
> +DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
> +DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
> +DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> +
> +static struct attribute *stats_attrs[] = {
> + &anon_fault_alloc_attr.attr,
> + &anon_fault_fallback_attr.attr,
> + &anon_fault_fallback_charge_attr.attr,
> + NULL,
> +};
> +
> +static struct attribute_group stats_attr_group = {
> + .name = "stats",
> + .attrs = stats_attrs,
> +};
> +
> static struct thpsize *thpsize_create(int order, struct kobject *parent)
> {
> unsigned long size = (PAGE_SIZE << order) / SZ_1K;
> @@ -549,6 +591,12 @@ static struct thpsize *thpsize_create(int order, struct kobject *parent)
> return ERR_PTR(ret);
> }
>
> + ret = sysfs_create_group(&thpsize->kobj, &stats_attr_group);
> + if (ret) {
> + kobject_put(&thpsize->kobj);
> + return ERR_PTR(ret);
> + }
> +
> thpsize->order = order;
> return thpsize;
> }
> @@ -880,6 +928,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
> folio_put(folio);
> count_vm_event(THP_FAULT_FALLBACK);
> count_vm_event(THP_FAULT_FALLBACK_CHARGE);
> + count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK);
> + count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> return VM_FAULT_FALLBACK;
> }
> folio_throttle_swaprate(folio, gfp);
> @@ -929,6 +979,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
> mm_inc_nr_ptes(vma->vm_mm);
> spin_unlock(vmf->ptl);
> count_vm_event(THP_FAULT_ALLOC);
> + count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
> count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
> }
>
> @@ -1050,6 +1101,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
> folio = vma_alloc_folio(gfp, HPAGE_PMD_ORDER, vma, haddr, true);
> if (unlikely(!folio)) {
> count_vm_event(THP_FAULT_FALLBACK);
> + count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK);
> return VM_FAULT_FALLBACK;
> }
> return __do_huge_pmd_anonymous_page(vmf, &folio->page, gfp);
> diff --git a/mm/memory.c b/mm/memory.c
> index 649a547fe8e3..f31da2de19c6 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4368,6 +4368,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> folio = vma_alloc_folio(gfp, order, vma, addr, true);
> if (folio) {
> if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
> + count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> folio_put(folio);
> goto next;
> }
> @@ -4376,6 +4377,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> return folio;
> }
> next:
> + count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
> order = next_order(&orders, order);
> }
>
> @@ -4485,6 +4487,9 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>
> folio_ref_add(folio, nr_pages - 1);
> add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> +#endif
> folio_add_new_anon_rmap(folio, vma, addr);
> folio_add_lru_vma(folio, vma);
> setpte:


2024-04-12 12:54:24

by David Hildenbrand

Subject: Re: [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters

On 12.04.24 13:48, Barry Song wrote:
> From: Barry Song <[email protected]>
>
> The patchset introduces a framework to facilitate mTHP counters, starting
> with the allocation and swap-out counters. Currently, only four new nodes
> are appended to the stats directory for each mTHP size.
>
> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> anon_fault_alloc
> anon_fault_fallback
> anon_fault_fallback_charge
> anon_swpout
> anon_swpout_fallback
>
> These nodes are crucial for us to monitor the fragmentation levels of
> both the buddy system and the swap partitions. In the future, we may
> consider adding additional nodes for further insights.
>
> -v6:
> * collect reviewed-by tags for patch2/4, 3/4, 4/4, Ryan;
> * move back to static array by using MAX_PTRS_PER_PTE, Ryan;
> * move to for_each_possible_cpu to handle cpu hotplug, Ryan;
> * other minor cleanups according to Ryan;

Please *really* do not send multiple versions of the same patch set on a
single day.

--
Cheers,

David / dhildenb


2024-04-12 13:16:49

by Barry Song

Subject: Re: [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters

On Sat, Apr 13, 2024 at 12:54 AM David Hildenbrand <[email protected]> wrote:
>
> On 12.04.24 13:48, Barry Song wrote:
> > From: Barry Song <[email protected]>
> >
> > The patchset introduces a framework to facilitate mTHP counters, starting
> > with the allocation and swap-out counters. Currently, only four new nodes
> > are appended to the stats directory for each mTHP size.
> >
> > /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> > anon_fault_alloc
> > anon_fault_fallback
> > anon_fault_fallback_charge
> > anon_swpout
> > anon_swpout_fallback
> >
> > These nodes are crucial for us to monitor the fragmentation levels of
> > both the buddy system and the swap partitions. In the future, we may
> > consider adding additional nodes for further insights.
> >
> > -v6:
> > * collect reviewed-by tags for patch2/4, 3/4, 4/4, Ryan;
> > * move back to static array by using MAX_PTRS_PER_PTE, Ryan;
> > * move to for_each_possible_cpu to handle cpu hotplug, Ryan;
> > * other minor cleanups according to Ryan;
>
> Please *really* not multiple versions of the same patch set on one a
> single day.

Ok. I will leave more time for you to review the older versions before moving
to a new version.

For v5->v6, it is quite a straightforward re-spin though I can understand
it might be a bit annoying if you got v6 while you were reading v5.

>
> --
> Cheers,
>
> David / dhildenb

Thanks
Barry

2024-04-16 08:10:47

by David Hildenbrand

Subject: Re: [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters

On 12.04.24 15:16, Barry Song wrote:
> On Sat, Apr 13, 2024 at 12:54 AM David Hildenbrand <[email protected]> wrote:
>>
>> On 12.04.24 13:48, Barry Song wrote:
>>> From: Barry Song <[email protected]>
>>>
>>> The patchset introduces a framework to facilitate mTHP counters, starting
>>> with the allocation and swap-out counters. Currently, only four new nodes
>>> are appended to the stats directory for each mTHP size.
>>>
>>> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
>>> anon_fault_alloc
>>> anon_fault_fallback
>>> anon_fault_fallback_charge
>>> anon_swpout
>>> anon_swpout_fallback
>>>
>>> These nodes are crucial for us to monitor the fragmentation levels of
>>> both the buddy system and the swap partitions. In the future, we may
>>> consider adding additional nodes for further insights.
>>>
>>> -v6:
>>> * collect reviewed-by tags for patch2/4, 3/4, 4/4, Ryan;
>>> * move back to static array by using MAX_PTRS_PER_PTE, Ryan;
>>> * move to for_each_possible_cpu to handle cpu hotplug, Ryan;
>>> * other minor cleanups according to Ryan;
>>
>> Please *really* not multiple versions of the same patch set on one a
>> single day.
>
> Ok. I will leave more time for you to review the older versions before moving
> to a new version.

Yes please. There is nothing gained from sending out stuff too fast,
besides mixing discussions, same questions/comments ... and effectively
more work for reviewers.

--
Cheers,

David / dhildenb


2024-04-16 08:12:23

by David Hildenbrand

Subject: Re: [PATCH v6 1/4] mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters

On 12.04.24 13:48, Barry Song wrote:
> From: Barry Song <[email protected]>
>
> Profiling a system blindly with mTHP has become challenging due to the
> lack of visibility into its operations. Presenting the success rate of
> mTHP allocations appears to be pressing need.
>
> Recently, I've been experiencing significant difficulty debugging
> performance improvements and regressions without these figures. It's
> crucial for us to understand the true effectiveness of mTHP in real-world
> scenarios, especially in systems with fragmented memory.
>
> This patch establishes the framework for per-order mTHP
> counters. It begins by introducing the anon_fault_alloc and
> anon_fault_fallback counters. Additionally, to maintain consistency
> with thp_fault_fallback_charge in /proc/vmstat, this patch also tracks
> anon_fault_fallback_charge when mem_cgroup_charge fails for mTHP.
> Incorporating additional counters should now be straightforward as well.
>
> Signed-off-by: Barry Song <[email protected]>
> Cc: Chris Li <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Cc: Domenico Cerasuolo <[email protected]>
> Cc: Kairui Song <[email protected]>
> Cc: Matthew Wilcox (Oracle) <[email protected]>
> Cc: Peter Xu <[email protected]>
> Cc: Ryan Roberts <[email protected]>
> Cc: Suren Baghdasaryan <[email protected]>
> Cc: Yosry Ahmed <[email protected]>
> Cc: Yu Zhao <[email protected]>
> ---

Acked-by: David Hildenbrand <[email protected]>

--
Cheers,

David / dhildenb


2024-04-16 08:14:41

by David Hildenbrand

Subject: Re: [PATCH v6 2/4] mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters

On 12.04.24 13:48, Barry Song wrote:
> From: Barry Song <[email protected]>
>
> This helps to display the fragmentation situation of the swapfile, knowing
> the proportion of how much we haven't split large folios. So far, we only
> support non-split swapout for anon memory, with the possibility of
> expanding to shmem in the future. So, we add the "anon" prefix to the
> counter names.
>
> Signed-off-by: Barry Song <[email protected]>
> Reviewed-by: Ryan Roberts <[email protected]>
> Cc: Chris Li <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Cc: Domenico Cerasuolo <[email protected]>
> Cc: Kairui Song <[email protected]>
> Cc: Matthew Wilcox (Oracle) <[email protected]>
> Cc: Peter Xu <[email protected]>
> Cc: Ryan Roberts <[email protected]>
> Cc: Suren Baghdasaryan <[email protected]>
> Cc: Yosry Ahmed <[email protected]>
> Cc: Yu Zhao <[email protected]>
> ---
> include/linux/huge_mm.h | 2 ++
> mm/huge_memory.c | 4 ++++
> mm/page_io.c | 1 +
> mm/vmscan.c | 3 +++
> 4 files changed, 10 insertions(+)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index d4fdb2641070..7cd07b83a3d0 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -268,6 +268,8 @@ enum mthp_stat_item {
> MTHP_STAT_ANON_FAULT_ALLOC,
> MTHP_STAT_ANON_FAULT_FALLBACK,
> MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
> + MTHP_STAT_ANON_SWPOUT,
> + MTHP_STAT_ANON_SWPOUT_FALLBACK,
> __MTHP_STAT_COUNT
> };
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index dfc38cc83a04..58f2c4745d80 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -555,11 +555,15 @@ static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
> DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
> DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
> DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> +DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
> +DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
>
> static struct attribute *stats_attrs[] = {
> &anon_fault_alloc_attr.attr,
> &anon_fault_fallback_attr.attr,
> &anon_fault_fallback_charge_attr.attr,
> + &anon_swpout_attr.attr,
> + &anon_swpout_fallback_attr.attr,
> NULL,
> };
>
> diff --git a/mm/page_io.c b/mm/page_io.c
> index a9a7c236aecc..46c603dddf04 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -217,6 +217,7 @@ static inline void count_swpout_vm_event(struct folio *folio)
> count_memcg_folio_events(folio, THP_SWPOUT, 1);
> count_vm_event(THP_SWPOUT);
> }
> + count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_SWPOUT);
> #endif
> count_vm_events(PSWPOUT, folio_nr_pages(folio));
> }
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bca2d9981c95..49bd94423961 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1231,6 +1231,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> goto activate_locked;
> }
> if (!add_to_swap(folio)) {
> + int __maybe_unused order = folio_order(folio);
> +
> if (!folio_test_large(folio))
> goto activate_locked_split;
> /* Fallback to swap normal pages */
> @@ -1242,6 +1244,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> THP_SWPOUT_FALLBACK, 1);
> count_vm_event(THP_SWPOUT_FALLBACK);
> }
> + count_mthp_stat(order, MTHP_STAT_ANON_SWPOUT_FALLBACK);

Why the temporary variable for order?

count_mthp_stat(folio_order(folio),
MTHP_STAT_ANON_SWPOUT_FALLBACK);

.. but now I do wonder if we want to pass the folio to count_mthp_stat() ?

Anyhow

Acked-by: David Hildenbrand <[email protected]>

--
Cheers,

David / dhildenb


2024-04-16 08:16:41

by David Hildenbrand

Subject: Re: [PATCH v6 4/4] mm: correct the docs for thp_fault_alloc and thp_fault_fallback

On 12.04.24 13:48, Barry Song wrote:
> From: Barry Song <[email protected]>
>
> The documentation does not align with the code. In
> __do_huge_pmd_anonymous_page(), THP_FAULT_FALLBACK is incremented when
> mem_cgroup_charge() fails, despite the allocation succeeding, whereas
> THP_FAULT_ALLOC is only incremented after a successful charge.
>
> Signed-off-by: Barry Song <[email protected]>
> Reviewed-by: Ryan Roberts <[email protected]>
> Cc: Chris Li <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Cc: Domenico Cerasuolo <[email protected]>
> Cc: Kairui Song <[email protected]>
> Cc: Matthew Wilcox (Oracle) <[email protected]>
> Cc: Peter Xu <[email protected]>
> Cc: Ryan Roberts <[email protected]>
> Cc: Suren Baghdasaryan <[email protected]>
> Cc: Yosry Ahmed <[email protected]>
> Cc: Yu Zhao <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> ---

Reviewed-by: David Hildenbrand <[email protected]>

--
Cheers,

David / dhildenb


2024-04-16 08:17:37

by Barry Song

Subject: Re: [PATCH v6 2/4] mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters

On Tue, Apr 16, 2024 at 8:14 PM David Hildenbrand <[email protected]> wrote:
>
> On 12.04.24 13:48, Barry Song wrote:
> > From: Barry Song <[email protected]>
> >
> > This helps to display the fragmentation situation of the swapfile, knowing
> > the proportion of how much we haven't split large folios. So far, we only
> > support non-split swapout for anon memory, with the possibility of
> > expanding to shmem in the future. So, we add the "anon" prefix to the
> > counter names.
> >
> > Signed-off-by: Barry Song <[email protected]>
> > Reviewed-by: Ryan Roberts <[email protected]>
> > Cc: Chris Li <[email protected]>
> > Cc: David Hildenbrand <[email protected]>
> > Cc: Domenico Cerasuolo <[email protected]>
> > Cc: Kairui Song <[email protected]>
> > Cc: Matthew Wilcox (Oracle) <[email protected]>
> > Cc: Peter Xu <[email protected]>
> > Cc: Ryan Roberts <[email protected]>
> > Cc: Suren Baghdasaryan <[email protected]>
> > Cc: Yosry Ahmed <[email protected]>
> > Cc: Yu Zhao <[email protected]>
> > ---
> > include/linux/huge_mm.h | 2 ++
> > mm/huge_memory.c | 4 ++++
> > mm/page_io.c | 1 +
> > mm/vmscan.c | 3 +++
> > 4 files changed, 10 insertions(+)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index d4fdb2641070..7cd07b83a3d0 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -268,6 +268,8 @@ enum mthp_stat_item {
> > MTHP_STAT_ANON_FAULT_ALLOC,
> > MTHP_STAT_ANON_FAULT_FALLBACK,
> > MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
> > + MTHP_STAT_ANON_SWPOUT,
> > + MTHP_STAT_ANON_SWPOUT_FALLBACK,
> > __MTHP_STAT_COUNT
> > };
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index dfc38cc83a04..58f2c4745d80 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -555,11 +555,15 @@ static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
> > DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
> > DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
> > DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> > +DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
> > +DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
> >
> > static struct attribute *stats_attrs[] = {
> > &anon_fault_alloc_attr.attr,
> > &anon_fault_fallback_attr.attr,
> > &anon_fault_fallback_charge_attr.attr,
> > + &anon_swpout_attr.attr,
> > + &anon_swpout_fallback_attr.attr,
> > NULL,
> > };
> >
> > diff --git a/mm/page_io.c b/mm/page_io.c
> > index a9a7c236aecc..46c603dddf04 100644
> > --- a/mm/page_io.c
> > +++ b/mm/page_io.c
> > @@ -217,6 +217,7 @@ static inline void count_swpout_vm_event(struct folio *folio)
> > count_memcg_folio_events(folio, THP_SWPOUT, 1);
> > count_vm_event(THP_SWPOUT);
> > }
> > + count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_SWPOUT);
> > #endif
> > count_vm_events(PSWPOUT, folio_nr_pages(folio));
> > }
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index bca2d9981c95..49bd94423961 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1231,6 +1231,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > goto activate_locked;
> > }
> > if (!add_to_swap(folio)) {
> > + int __maybe_unused order = folio_order(folio);
> > +
> > if (!folio_test_large(folio))
> > goto activate_locked_split;
> > /* Fallback to swap normal pages */
> > @@ -1242,6 +1244,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > THP_SWPOUT_FALLBACK, 1);
> > count_vm_event(THP_SWPOUT_FALLBACK);
> > }
> > + count_mthp_stat(order, MTHP_STAT_ANON_SWPOUT_FALLBACK);
>
> Why the temporary variable for order?
>
> count_mthp_stat(folio_order(order),
> MTHP_STAT_ANON_SWPOUT_FALLBACK);
>
> ... but now I do wonder if we want to pass the folio to count_mthp_stat() ?

Because we have called split_folio_to_list() before counting. That is also
why Ryan is using if (nr_pages >= HPAGE_PMD_NR) rather than pmd_mappable.
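
A rough sketch of what I mean (illustrative only; the actual counting in
the patch happens after the split attempt in shrink_folio_list()):

	int order = folio_order(folio);	/* e.g. 4 for a 64KiB mTHP */

	if (split_folio_to_list(folio, folio_list) == 0) {
		/*
		 * After a successful split, 'folio' is an order-0 page, so
		 * folio_order(folio) would now return 0 and the fallback
		 * would be credited to the wrong size.
		 */
		count_mthp_stat(order, MTHP_STAT_ANON_SWPOUT_FALLBACK);
	}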


>
> Anyhow
>
> Acked-by: David Hildenbrand <[email protected]>

thanks!

>
> --
> Cheers,
>
> David / dhildenb
>
Barry

2024-04-16 08:17:50

by David Hildenbrand

Subject: Re: [PATCH v6 2/4] mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters

On 16.04.24 10:14, David Hildenbrand wrote:
> On 12.04.24 13:48, Barry Song wrote:
>> From: Barry Song <[email protected]>
>>
>> This helps to display the fragmentation situation of the swapfile, knowing
>> the proportion of how much we haven't split large folios. So far, we only
>> support non-split swapout for anon memory, with the possibility of
>> expanding to shmem in the future. So, we add the "anon" prefix to the
>> counter names.
>>
>> Signed-off-by: Barry Song <[email protected]>
>> Reviewed-by: Ryan Roberts <[email protected]>
>> Cc: Chris Li <[email protected]>
>> Cc: David Hildenbrand <[email protected]>
>> Cc: Domenico Cerasuolo <[email protected]>
>> Cc: Kairui Song <[email protected]>
>> Cc: Matthew Wilcox (Oracle) <[email protected]>
>> Cc: Peter Xu <[email protected]>
>> Cc: Ryan Roberts <[email protected]>
>> Cc: Suren Baghdasaryan <[email protected]>
>> Cc: Yosry Ahmed <[email protected]>
>> Cc: Yu Zhao <[email protected]>
>> ---
>> include/linux/huge_mm.h | 2 ++
>> mm/huge_memory.c | 4 ++++
>> mm/page_io.c | 1 +
>> mm/vmscan.c | 3 +++
>> 4 files changed, 10 insertions(+)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index d4fdb2641070..7cd07b83a3d0 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -268,6 +268,8 @@ enum mthp_stat_item {
>> MTHP_STAT_ANON_FAULT_ALLOC,
>> MTHP_STAT_ANON_FAULT_FALLBACK,
>> MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
>> + MTHP_STAT_ANON_SWPOUT,
>> + MTHP_STAT_ANON_SWPOUT_FALLBACK,
>> __MTHP_STAT_COUNT
>> };
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index dfc38cc83a04..58f2c4745d80 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -555,11 +555,15 @@ static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
>> DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
>> DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
>> DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>> +DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
>> +DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
>>
>> static struct attribute *stats_attrs[] = {
>> &anon_fault_alloc_attr.attr,
>> &anon_fault_fallback_attr.attr,
>> &anon_fault_fallback_charge_attr.attr,
>> + &anon_swpout_attr.attr,
>> + &anon_swpout_fallback_attr.attr,
>> NULL,
>> };
>>
>> diff --git a/mm/page_io.c b/mm/page_io.c
>> index a9a7c236aecc..46c603dddf04 100644
>> --- a/mm/page_io.c
>> +++ b/mm/page_io.c
>> @@ -217,6 +217,7 @@ static inline void count_swpout_vm_event(struct folio *folio)
>> count_memcg_folio_events(folio, THP_SWPOUT, 1);
>> count_vm_event(THP_SWPOUT);
>> }
>> + count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_SWPOUT);
>> #endif
>> count_vm_events(PSWPOUT, folio_nr_pages(folio));
>> }
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index bca2d9981c95..49bd94423961 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1231,6 +1231,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>> goto activate_locked;
>> }
>> if (!add_to_swap(folio)) {
>> + int __maybe_unused order = folio_order(folio);
>> +
>> if (!folio_test_large(folio))
>> goto activate_locked_split;
>> /* Fallback to swap normal pages */
>> @@ -1242,6 +1244,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>> THP_SWPOUT_FALLBACK, 1);
>> count_vm_event(THP_SWPOUT_FALLBACK);
>> }
>> + count_mthp_stat(order, MTHP_STAT_ANON_SWPOUT_FALLBACK);
>
> Why the temporary variable for order?
>
> count_mthp_stat(folio_order(order),
> MTHP_STAT_ANON_SWPOUT_FALLBACK);
>
> ... but now I do wonder if we want to pass the folio to count_mthp_stat() ?

... and now realizing that that doesn't make sense if we fail to
allocate the folio in the first place. So all good.

--
Cheers,

David / dhildenb


2024-04-16 08:28:33

by David Hildenbrand

Subject: Re: [PATCH v6 3/4] mm: add docs for per-order mTHP counters and transhuge_page ABI

On 12.04.24 13:48, Barry Song wrote:
> From: Barry Song <[email protected]>
>
> This patch includes documentation for mTHP counters and an ABI file
> for sys-kernel-mm-transparent-hugepage, which appears to have been
> missing for some time.
>
> Signed-off-by: Barry Song <[email protected]>
> Reviewed-by: Ryan Roberts <[email protected]>
> Cc: Chris Li <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Cc: Domenico Cerasuolo <[email protected]>
> Cc: Kairui Song <[email protected]>
> Cc: Matthew Wilcox (Oracle) <[email protected]>
> Cc: Peter Xu <[email protected]>
> Cc: Ryan Roberts <[email protected]>
> Cc: Suren Baghdasaryan <[email protected]>
> Cc: Yosry Ahmed <[email protected]>
> Cc: Yu Zhao <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> ---

Reviewed-by: David Hildenbrand <[email protected]>

--
Cheers,

David / dhildenb