Anonymous pages have supported multi-size THP (mTHP) allocation since
commit 19eaf44954df, which allows THP to be configured per size through
the sysfs interface at
'/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled'.
However, anonymous shmem ignores the mTHP settings configured through
that interface and can only use PMD-mapped THP, which is not reasonable.
Many applications implement anonymous page sharing through
mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database workloads, so
users expect to apply a unified mTHP strategy to all anonymous pages,
including anonymous shared pages, in order to enjoy the benefits of
mTHP: for example, lower latency and less memory bloat than PMD-mapped
THP, and contiguous PTEs on the ARM architecture to reduce TLB misses.
The primary strategy is similar to the one used for anonymous mTHP.
Introduce a new interface
'/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled',
which accepts all the same values as the top-level
'/sys/kernel/mm/transparent_hugepage/shmem_enabled', plus an additional
"inherit" option. By default, all sizes are set to "never" except the
PMD size, which is set to "inherit". This preserves backward
compatibility with the top-level anonymous shmem setting, while also
allowing anonymous shmem to be enabled independently for each mTHP size.
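As a rough model of how a per-size value combines with the top-level value for anonymous shmem, here is a user-space sketch based on my reading of anon_shmem_allowable_huge_orders() later in this series. The enum, SKETCH_PMD_ORDER (assuming 4K base pages, so a 2M PMD is order 9), init_defaults() and effective_policy() are all hypothetical. "within_size" and "advise" still depend on i_size and madvise() at fault time; the sketch only reports which policy applies to a size.

```c
#include <assert.h>

/* Hypothetical names: one value per shmem_enabled keyword. */
enum shmem_policy {
	POLICY_NEVER,
	POLICY_ALWAYS,
	POLICY_WITHIN_SIZE,
	POLICY_ADVISE,
	POLICY_DENY,	/* top level only; per-size 'deny' is stored as never */
	POLICY_FORCE,	/* top level only; per-size 'force' is stored as always */
	POLICY_INHERIT,	/* per-size only */
};

/* Assumption: 4K base pages, so a 2M PMD is order 9. */
#define SKETCH_PMD_ORDER 9

/* Defaults: the PMD size inherits the top level, every other size is never. */
void init_defaults(enum shmem_policy per_size[SKETCH_PMD_ORDER + 1])
{
	for (int order = 0; order <= SKETCH_PMD_ORDER; order++)
		per_size[order] = order == SKETCH_PMD_ORDER ?
				  POLICY_INHERIT : POLICY_NEVER;
}

/* Policy that effectively applies to one size of anonymous shmem. */
enum shmem_policy effective_policy(enum shmem_policy top_level,
				   enum shmem_policy per_size)
{
	assert(top_level != POLICY_INHERIT);
	if (top_level == POLICY_DENY)
		return POLICY_NEVER;	/* 'deny' disables every size */
	if (top_level == POLICY_FORCE)	/* only inheriting sizes are forced */
		return per_size == POLICY_INHERIT ? POLICY_FORCE : POLICY_NEVER;
	return per_size == POLICY_INHERIT ? top_level : per_size;
}
```

With the defaults, setting the top level to "always" enables only the PMD size, while a 64K size must be enabled explicitly through its own shmem_enabled file.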
I used the page fault latency tool to measure the performance of 1G of
anonymous shmem with 32 threads on my machine (ARM64, 32 cores, 125G
memory):
base: mm-unstable
user-time    sys_time    faults_per_sec_per_cpu    faults_per_sec
0.04s        3.10s       83516.416                 2669684.890

mm-unstable + patchset, anon shmem mTHP disabled
user-time    sys_time    faults_per_sec_per_cpu    faults_per_sec
0.02s        3.14s       82936.359                 2630746.027

mm-unstable + patchset, anon shmem 64K mTHP enabled
user-time    sys_time    faults_per_sec_per_cpu    faults_per_sec
0.08s        0.31s       678630.231                17082522.495
The data shows that the patchset has minimal impact when mTHP is not
enabled (some fluctuation was observed during testing). With 64K mTHP
enabled, page fault performance improves significantly: faults_per_sec
increases by roughly 6.4x and sys_time drops from 3.10s to 0.31s.
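The relative changes in the three result tables can be checked with a few lines of arithmetic. This sketch simply copies the faults_per_sec and sys_time figures reported above; the helper names ratio() and print_summary() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* faults_per_sec and sys_time values copied from the three runs above. */
static const double base_fps = 2669684.890;	/* mm-unstable */
static const double off_fps = 2630746.027;	/* + patchset, mTHP disabled */
static const double mthp_fps = 17082522.495;	/* + patchset, 64K mTHP */
static const double base_sys = 3.10;
static const double mthp_sys = 0.31;

/* Ratio a / b; b must be non-zero. */
double ratio(double a, double b)
{
	assert(b != 0.0);
	return a / b;
}

/* Print the speedups and the relative change with mTHP disabled. */
void print_summary(void)
{
	printf("64K mTHP faults_per_sec speedup: %.2fx\n",
	       ratio(mthp_fps, base_fps));
	printf("64K mTHP sys_time reduction:     %.2fx\n",
	       ratio(base_sys, mthp_sys));
	printf("mTHP disabled vs. base:          %+.2f%%\n",
	       (ratio(off_fps, base_fps) - 1.0) * 100.0);
}
```

print_summary() reports a faults_per_sec speedup of about 6.4x, a sys_time reduction of about 10x, and a change of about -1.5% for the mTHP-disabled run, which is within the run-to-run fluctuation mentioned above.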
TODO:
- Support mTHP for tmpfs (?).
- Avoid splitting large folios when shared memory is swapped out.
- Support swapping in large folios for shared memory.
Changes from v1:
- Drop the patch that re-arranges the position of highest_order() and
next_order(), per Ryan.
- Modify finish_fault() to fix the VA alignment issue, per Ryan and
David.
- Fix some build issues, reported by Lance and the kernel test robot.
- Update some commit messages.
Changes from RFC:
- Rebase the patch set against the new mm-unstable branch, per Lance.
- Add a new patch to export highest_order() and next_order().
- Add a new patch to align mTHP size in shmem_get_unmapped_area().
- Handle the uffd case and the VMA limits case when building mappings
for large folios in finish_fault(), per Ryan.
- Remove unnecessary 'order' variable in patch 3, per Kefeng.
- Keep the anon shmem counter names consistent.
- Modify the strategy to support mTHP for anonymous shmem, discussed with
Ryan and David.
- Add reviewed tag from Barry.
- Update the commit message.
Baolin Wang (7):
mm: memory: extend finish_fault() to support large folio
mm: shmem: add an 'order' parameter for shmem_alloc_hugefolio()
mm: shmem: add THP validation for PMD-mapped THP related statistics
mm: shmem: add multi-size THP sysfs interface for anonymous shmem
mm: shmem: add mTHP support for anonymous shmem
mm: shmem: add mTHP size alignment in shmem_get_unmapped_area
mm: shmem: add mTHP counters for anonymous shmem
Documentation/admin-guide/mm/transhuge.rst | 29 ++
include/linux/huge_mm.h | 23 ++
mm/huge_memory.c | 17 +-
mm/memory.c | 58 +++-
mm/shmem.c | 338 ++++++++++++++++++---
5 files changed, 403 insertions(+), 62 deletions(-)
--
2.39.3
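In preparation for mTHP support, only update the PMD-mapped THP
statistics (THP_FILE_ALLOC, THP_FILE_FALLBACK and THP_FILE_FALLBACK_CHARGE)
when the folio involved is actually PMD-sized, so that mTHP allocations
do not distort these counters.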
Signed-off-by: Baolin Wang <[email protected]>
Reviewed-by: Barry Song <[email protected]>
---
mm/shmem.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index e4483c4596a8..a383ea9a89a5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1661,7 +1661,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
return ERR_PTR(-E2BIG);
folio = shmem_alloc_hugefolio(gfp, info, index, HPAGE_PMD_ORDER);
- if (!folio)
+ if (!folio && pages == HPAGE_PMD_NR)
count_vm_event(THP_FILE_FALLBACK);
} else {
pages = 1;
@@ -1679,7 +1679,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
if (xa_find(&mapping->i_pages, &index,
index + pages - 1, XA_PRESENT)) {
error = -EEXIST;
- } else if (huge) {
+ } else if (pages == HPAGE_PMD_NR) {
count_vm_event(THP_FILE_FALLBACK);
count_vm_event(THP_FILE_FALLBACK_CHARGE);
}
@@ -2045,7 +2045,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
folio = shmem_alloc_and_add_folio(huge_gfp,
inode, index, fault_mm, true);
if (!IS_ERR(folio)) {
- count_vm_event(THP_FILE_ALLOC);
+ if (folio_test_pmd_mappable(folio))
+ count_vm_event(THP_FILE_ALLOC);
goto alloced;
}
if (PTR_ERR(folio) == -EEXIST)
--
2.39.3
Commit 19eaf44954df added multi-size THP (mTHP) for anonymous pages,
which allows THP to be configured through the sysfs interface at
'/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled'.
However, anonymous shared pages ignore the anonymous mTHP rules
configured through that interface and can only use PMD-mapped THP,
which is not reasonable. Users expect to apply the mTHP rules to all
anonymous pages, including anonymous shared pages, in order to enjoy
the benefits of mTHP: for example, lower latency and less memory bloat
than PMD-mapped THP, and contiguous PTEs on the ARM architecture to
reduce TLB misses.
The primary strategy is similar to the one used for anonymous mTHP.
Introduce a new interface
'/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled',
which accepts all the same values as the top-level
'/sys/kernel/mm/transparent_hugepage/shmem_enabled', plus an additional
"inherit" option. By default, all sizes are set to "never" except the
PMD size, which is set to "inherit". This preserves backward
compatibility with the top-level anonymous shmem setting, while also
allowing anonymous shmem to be enabled independently for each mTHP size.
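In this patch, shmem_alloc_and_add_folio() walks the allowed orders from the largest down using highest_order() and next_order(), falling back to the next smaller order when an allocation fails. The user-space sketch below re-creates that bitmask walk. sketch_highest_order() and sketch_next_order() are hypothetical re-implementations of the two kernel helpers (highest set bit, or -1 for an empty mask; clear the previous order and return the next highest), and alloc_with_fallback() and only_small_orders() are hypothetical stand-ins for the allocation loop and the page allocator.

```c
#include <assert.h>

/* Index of the highest set bit, or -1 if none (like fls_long() - 1). */
int sketch_highest_order(unsigned long orders)
{
	int order = -1;

	while (orders) {
		orders >>= 1;
		order++;
	}
	return order;
}

/* Clear @prev from *orders and return the next highest remaining order. */
int sketch_next_order(unsigned long *orders, int prev)
{
	*orders &= ~(1UL << prev);
	return sketch_highest_order(*orders);
}

/* Hypothetical allocator hook: returns nonzero if @order succeeds. */
typedef int (*try_alloc_fn)(int order);

/* Example allocator that pretends every order above 4 (64K) fails. */
int only_small_orders(int order)
{
	return order <= 4;
}

/*
 * Try each enabled order from largest to smallest; return the order that
 * succeeded, or -1 so the caller can fall back to a single page.
 */
int alloc_with_fallback(unsigned long orders, try_alloc_fn try_alloc)
{
	int order = sketch_highest_order(orders);

	while (orders) {
		if (try_alloc(order))
			return order;
		order = sketch_next_order(&orders, order);
	}
	return -1;
}
```

For example, with orders 9 (2M) and 4 (64K) enabled and the order-9 allocation failing, alloc_with_fallback() returns 4; in the kernel, each PMD-sized failure along the way is what bumps THP_FILE_FALLBACK.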
Signed-off-by: Baolin Wang <[email protected]>
---
include/linux/huge_mm.h | 10 +++
mm/shmem.c | 179 +++++++++++++++++++++++++++++++++-------
2 files changed, 161 insertions(+), 28 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 1fce6fee7766..b5339210268d 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -583,6 +583,16 @@ static inline bool thp_migration_supported(void)
{
return false;
}
+
+static inline int highest_order(unsigned long orders)
+{
+ return 0;
+}
+
+static inline int next_order(unsigned long *orders, int prev)
+{
+ return 0;
+}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
static inline int split_folio_to_list_to_order(struct folio *folio,
diff --git a/mm/shmem.c b/mm/shmem.c
index 59cc26d44344..b50ddf013e37 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1611,6 +1611,106 @@ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
return result;
}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static unsigned long anon_shmem_allowable_huge_orders(struct inode *inode,
+ struct vm_area_struct *vma, pgoff_t index,
+ bool global_huge)
+{
+ unsigned long mask = READ_ONCE(huge_anon_shmem_orders_always);
+ unsigned long within_size_orders = READ_ONCE(huge_anon_shmem_orders_within_size);
+ unsigned long vm_flags = vma->vm_flags;
+ /*
+ * Check all the (large) orders below HPAGE_PMD_ORDER + 1 that
+ * are enabled for this vma.
+ */
+ unsigned long orders = BIT(PMD_ORDER + 1) - 1;
+ loff_t i_size;
+ int order;
+
+ if ((vm_flags & VM_NOHUGEPAGE) ||
+ test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+ return 0;
+
+ /* If the hardware/firmware marked hugepage support disabled. */
+ if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+ return 0;
+
+ /*
+ * Following the 'deny' semantics of the top level, force the huge
+ * option off from all mounts.
+ */
+ if (shmem_huge == SHMEM_HUGE_DENY)
+ return 0;
+ /*
+ * Only allow inherit orders if the top-level value is 'force', which
+ * means non-PMD sized THP can not override 'huge' mount option now.
+ */
+ if (shmem_huge == SHMEM_HUGE_FORCE)
+ return READ_ONCE(huge_anon_shmem_orders_inherit);
+
+ /* Allow mTHP that will be fully within i_size. */
+ order = highest_order(within_size_orders);
+ while (within_size_orders) {
+ /* Compare the order-aligned end with i_size; keep @index unchanged. */
+ i_size = round_up(i_size_read(inode), PAGE_SIZE);
+ if (i_size >> PAGE_SHIFT >= round_up(index + 1, 1 << order)) {
+ mask |= within_size_orders;
+ break;
+ }
+
+ order = next_order(&within_size_orders, order);
+ }
+
+ if (vm_flags & VM_HUGEPAGE)
+ mask |= READ_ONCE(huge_anon_shmem_orders_madvise);
+
+ if (global_huge)
+ mask |= READ_ONCE(huge_anon_shmem_orders_inherit);
+
+ return orders & mask;
+}
+
+static unsigned long anon_shmem_suitable_orders(struct inode *inode, struct vm_fault *vmf,
+ struct address_space *mapping, pgoff_t index,
+ unsigned long orders)
+{
+ struct vm_area_struct *vma = vmf->vma;
+ unsigned long pages;
+ int order;
+
+ orders = thp_vma_suitable_orders(vma, vmf->address, orders);
+ if (!orders)
+ return 0;
+
+ /* Find the highest order that can be added into the page cache */
+ order = highest_order(orders);
+ while (orders) {
+ pages = 1UL << order;
+ index = round_down(index, pages);
+ if (!xa_find(&mapping->i_pages, &index,
+ index + pages - 1, XA_PRESENT))
+ break;
+ order = next_order(&orders, order);
+ }
+
+ return orders;
+}
+#else
+static unsigned long anon_shmem_allowable_huge_orders(struct inode *inode,
+ struct vm_area_struct *vma, pgoff_t index,
+ bool global_huge)
+{
+ return 0;
+}
+
+static unsigned long anon_shmem_suitable_orders(struct inode *inode, struct vm_fault *vmf,
+ struct address_space *mapping, pgoff_t index,
+ unsigned long orders)
+{
+ return 0;
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
static struct folio *shmem_alloc_hugefolio(gfp_t gfp,
struct shmem_inode_info *info, pgoff_t index, int order)
{
@@ -1639,38 +1739,55 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
return (struct folio *)page;
}
-static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
- struct inode *inode, pgoff_t index,
- struct mm_struct *fault_mm, bool huge)
+static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
+ gfp_t gfp, struct inode *inode, pgoff_t index,
+ struct mm_struct *fault_mm, bool huge, unsigned long orders)
{
struct address_space *mapping = inode->i_mapping;
struct shmem_inode_info *info = SHMEM_I(inode);
- struct folio *folio;
+ struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
+ unsigned long suitable_orders;
+ struct folio *folio = NULL;
long pages;
- int error;
+ int error, order;
if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
huge = false;
- if (huge) {
- pages = HPAGE_PMD_NR;
- index = round_down(index, HPAGE_PMD_NR);
+ if (huge || orders > 0) {
+ if (vma && vma_is_anon_shmem(vma) && orders) {
+ suitable_orders = anon_shmem_suitable_orders(inode, vmf,
+ mapping, index, orders);
+ } else {
+ pages = HPAGE_PMD_NR;
+ suitable_orders = BIT(HPAGE_PMD_ORDER);
+ index = round_down(index, HPAGE_PMD_NR);
- /*
- * Check for conflict before waiting on a huge allocation.
- * Conflict might be that a huge page has just been allocated
- * and added to page cache by a racing thread, or that there
- * is already at least one small page in the huge extent.
- * Be careful to retry when appropriate, but not forever!
- * Elsewhere -EEXIST would be the right code, but not here.
- */
- if (xa_find(&mapping->i_pages, &index,
+ /*
+ * Check for conflict before waiting on a huge allocation.
+ * Conflict might be that a huge page has just been allocated
+ * and added to page cache by a racing thread, or that there
+ * is already at least one small page in the huge extent.
+ * Be careful to retry when appropriate, but not forever!
+ * Elsewhere -EEXIST would be the right code, but not here.
+ */
+ if (xa_find(&mapping->i_pages, &index,
index + HPAGE_PMD_NR - 1, XA_PRESENT))
- return ERR_PTR(-E2BIG);
+ return ERR_PTR(-E2BIG);
+ }
- folio = shmem_alloc_hugefolio(gfp, info, index, HPAGE_PMD_ORDER);
- if (!folio && pages == HPAGE_PMD_NR)
- count_vm_event(THP_FILE_FALLBACK);
+ order = highest_order(suitable_orders);
+ while (suitable_orders) {
+ pages = 1 << order;
+ index = round_down(index, pages);
+ folio = shmem_alloc_hugefolio(gfp, info, index, order);
+ if (folio)
+ goto allocated;
+
+ if (pages == HPAGE_PMD_NR)
+ count_vm_event(THP_FILE_FALLBACK);
+ order = next_order(&suitable_orders, order);
+ }
} else {
pages = 1;
folio = shmem_alloc_folio(gfp, info, index);
@@ -1678,6 +1795,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
if (!folio)
return ERR_PTR(-ENOMEM);
+allocated:
__folio_set_locked(folio);
__folio_set_swapbacked(folio);
@@ -1972,7 +2090,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
struct mm_struct *fault_mm;
struct folio *folio;
int error;
- bool alloced;
+ bool alloced, huge;
+ unsigned long orders = 0;
if (WARN_ON_ONCE(!shmem_mapping(inode->i_mapping)))
return -EINVAL;
@@ -2044,14 +2163,18 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
return 0;
}
- if (shmem_is_huge(inode, index, false, fault_mm,
- vma ? vma->vm_flags : 0)) {
+ huge = shmem_is_huge(inode, index, false, fault_mm,
+ vma ? vma->vm_flags : 0);
+ /* Find hugepage orders that are allowed for anonymous shmem. */
+ if (vma && vma_is_anon_shmem(vma))
+ orders = anon_shmem_allowable_huge_orders(inode, vma, index, huge);
+ if (huge || orders > 0) {
gfp_t huge_gfp;
huge_gfp = vma_thp_gfp_mask(vma);
huge_gfp = limit_gfp_mask(huge_gfp, gfp);
- folio = shmem_alloc_and_add_folio(huge_gfp,
- inode, index, fault_mm, true);
+ folio = shmem_alloc_and_add_folio(vmf, huge_gfp,
+ inode, index, fault_mm, true, orders);
if (!IS_ERR(folio)) {
if (folio_test_pmd_mappable(folio))
count_vm_event(THP_FILE_ALLOC);
@@ -2061,7 +2184,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
goto repeat;
}
- folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm, false);
+ folio = shmem_alloc_and_add_folio(vmf, gfp, inode, index, fault_mm, false, 0);
if (IS_ERR(folio)) {
error = PTR_ERR(folio);
if (error == -EEXIST)
@@ -2072,7 +2195,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
alloced:
alloced = true;
- if (folio_test_pmd_mappable(folio) &&
+ if (folio_test_large(folio) &&
DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE) <
folio_next_index(folio) - 1) {
struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
--
2.39.3
Add support to finish_fault() for mapping an entire large folio at
once, as preparation for multi-size THP allocation of anonymous shmem
pages in the following patches.
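The core of the change is a bounds check: the whole folio can only be mapped if the pages before and after the faulting page also fall inside the VMA. The sketch below mirrors that check and the start-address computation under the assumption of 4K pages; folio_fits_in_vma() and SKETCH_PAGE_SIZE are hypothetical names, while idx, vma_off and nr_pages correspond to the variables in finish_fault().

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K base pages */

/*
 * @idx: index of the faulting page inside the folio
 * @vma_off: offset (in pages) of the faulting page inside the VMA
 * @vma_pages: VMA length in pages
 * Returns true and sets *start to the address of the folio's first page
 * if the whole folio lies inside the VMA; false means "map one page".
 */
bool folio_fits_in_vma(unsigned long fault_addr, unsigned long idx,
		       unsigned long nr_pages, unsigned long vma_off,
		       unsigned long vma_pages, unsigned long *start)
{
	assert(idx < nr_pages);
	if (vma_off < idx || vma_off + (nr_pages - idx) > vma_pages)
		return false;
	*start = fault_addr - idx * SKETCH_PAGE_SIZE;
	return true;
}
```

For a 64K folio (16 pages) faulted at its fourth page (idx 3) in a 16-page VMA starting at 0x10000000, the folio fits and mapping starts at 0x10000000; if the VMA were one page shorter, finish_fault() would fall back to a single PTE.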
Signed-off-by: Baolin Wang <[email protected]>
---
mm/memory.c | 58 ++++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 48 insertions(+), 10 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index eea6e4984eae..f5ffe012556c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4747,9 +4747,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
struct page *page;
+ struct folio *folio;
vm_fault_t ret;
bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
!(vma->vm_flags & VM_SHARED);
+ int type, nr_pages, i;
+ unsigned long addr = vmf->address;
/* Did we COW the page? */
if (is_cow)
@@ -4780,24 +4783,59 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
return VM_FAULT_OOM;
}
+ folio = page_folio(page);
+ nr_pages = folio_nr_pages(folio);
+
+ /*
+ * Use per-page fault to maintain the uffd semantics; the same
+ * approach also applies to non-anonymous-shmem faults to avoid
+ * inflating the RSS of the process.
+ */
+ if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
+ nr_pages = 1;
+ } else if (nr_pages > 1) {
+ pgoff_t idx = folio_page_idx(folio, page);
+ /* The page offset of vmf->address within the VMA. */
+ pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
+
+ /*
+ * Fall back to per-page fault in case the folio in the page
+ * cache extends beyond the VMA limits.
+ */
+ if (unlikely(vma_off < idx ||
+ vma_off + (nr_pages - idx) > vma_pages(vma))) {
+ nr_pages = 1;
+ } else {
+ /* Now we can set mappings for the whole large folio. */
+ addr = vmf->address - idx * PAGE_SIZE;
+ page = &folio->page;
+ }
+ }
+
vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
- vmf->address, &vmf->ptl);
+ addr, &vmf->ptl);
if (!vmf->pte)
return VM_FAULT_NOPAGE;
/* Re-check under ptl */
- if (likely(!vmf_pte_changed(vmf))) {
- struct folio *folio = page_folio(page);
- int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
-
- set_pte_range(vmf, folio, page, 1, vmf->address);
- add_mm_counter(vma->vm_mm, type, 1);
- ret = 0;
- } else {
- update_mmu_tlb(vma, vmf->address, vmf->pte);
+ if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
+ update_mmu_tlb(vma, addr, vmf->pte);
+ ret = VM_FAULT_NOPAGE;
+ goto unlock;
+ } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
+ for (i = 0; i < nr_pages; i++)
+ update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);
ret = VM_FAULT_NOPAGE;
+ goto unlock;
}
+ folio_ref_add(folio, nr_pages - 1);
+ set_pte_range(vmf, folio, page, nr_pages, addr);
+ type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
+ add_mm_counter(vma->vm_mm, type, nr_pages);
+ ret = 0;
+
+unlock:
pte_unmap_unlock(vmf->pte, vmf->ptl);
return ret;
}
--
2.39.3
Even when top-level huge page allocation is turned off, anonymous shmem
can still use mTHP if it is enabled through the sysfs interface at
'/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled'.
Therefore, align the address returned by shmem_get_unmapped_area() to
the largest mTHP size enabled for anonymous shmem, instead of only to
the PMD size.
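The alignment itself follows the existing PMD logic with HPAGE_PMD_SIZE replaced by hpage_size = PAGE_SIZE << order: request an area inflated by hpage_size - PAGE_SIZE, then move forward inside it until the address has the same offset modulo hpage_size as the file offset. This user-space sketch assumes 4K pages; sketch_hpage_size() and sketch_align_inflated() are hypothetical names, not kernel functions.

```c
#include <assert.h>

#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE (1UL << SK_PAGE_SHIFT)	/* assumption: 4K pages */

/* Size of the largest enabled mTHP, as computed in this patch. */
unsigned long sketch_hpage_size(int order)
{
	assert(order >= 0);
	return SK_PAGE_SIZE << order;
}

/*
 * @inflated_addr is the start of an area of len + hpage_size - PAGE_SIZE
 * bytes. Return the address inside it whose offset modulo @hpage_size
 * matches the file offset of @pgoff.
 */
unsigned long sketch_align_inflated(unsigned long inflated_addr,
				    unsigned long pgoff,
				    unsigned long hpage_size)
{
	unsigned long offset = (pgoff << SK_PAGE_SHIFT) & (hpage_size - 1);
	unsigned long inflated_offset = inflated_addr & (hpage_size - 1);

	/* Unsigned wrap-around here is undone by the addition below. */
	inflated_addr += offset - inflated_offset;
	if (inflated_offset > offset)
		inflated_addr += hpage_size;
	return inflated_addr;
}
```

With only 64K mTHP enabled (order 4), an inflated area starting at 0x40013000 for pgoff 0 yields 0x40020000, which is 64K-aligned and still within the first hpage_size - PAGE_SIZE bytes of the inflated area.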
Signed-off-by: Baolin Wang <[email protected]>
Tested-by: Lance Yang <[email protected]>
---
mm/shmem.c | 36 +++++++++++++++++++++++++++---------
1 file changed, 27 insertions(+), 9 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index b50ddf013e37..8b020ff09c72 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2404,6 +2404,7 @@ unsigned long shmem_get_unmapped_area(struct file *file,
unsigned long inflated_len;
unsigned long inflated_addr;
unsigned long inflated_offset;
+ unsigned long hpage_size;
if (len > TASK_SIZE)
return -ENOMEM;
@@ -2422,8 +2423,6 @@ unsigned long shmem_get_unmapped_area(struct file *file,
if (shmem_huge == SHMEM_HUGE_DENY)
return addr;
- if (len < HPAGE_PMD_SIZE)
- return addr;
if (flags & MAP_FIXED)
return addr;
/*
@@ -2435,8 +2434,11 @@ unsigned long shmem_get_unmapped_area(struct file *file,
if (uaddr == addr)
return addr;
+ hpage_size = HPAGE_PMD_SIZE;
if (shmem_huge != SHMEM_HUGE_FORCE) {
struct super_block *sb;
+ unsigned long __maybe_unused hpage_orders;
+ int order = 0;
if (file) {
VM_BUG_ON(file->f_op != &shmem_file_operations);
@@ -2449,18 +2451,34 @@ unsigned long shmem_get_unmapped_area(struct file *file,
if (IS_ERR(shm_mnt))
return addr;
sb = shm_mnt->mnt_sb;
+
+ /*
+ * Find the highest mTHP order used for anonymous shmem to
+ * provide a suitable alignment address.
+ */
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ hpage_orders = READ_ONCE(huge_anon_shmem_orders_always);
+ hpage_orders |= READ_ONCE(huge_anon_shmem_orders_within_size);
+ hpage_orders |= READ_ONCE(huge_anon_shmem_orders_madvise);
+ hpage_orders |= READ_ONCE(huge_anon_shmem_orders_inherit);
+ order = hpage_orders ? highest_order(hpage_orders) : 0;
+ hpage_size = order ? PAGE_SIZE << order : HPAGE_PMD_SIZE;
+#endif
}
- if (SHMEM_SB(sb)->huge == SHMEM_HUGE_NEVER)
+ if (SHMEM_SB(sb)->huge == SHMEM_HUGE_NEVER && !order)
return addr;
}
- offset = (pgoff << PAGE_SHIFT) & (HPAGE_PMD_SIZE-1);
- if (offset && offset + len < 2 * HPAGE_PMD_SIZE)
+ if (len < hpage_size)
+ return addr;
+
+ offset = (pgoff << PAGE_SHIFT) & (hpage_size - 1);
+ if (offset && offset + len < 2 * hpage_size)
return addr;
- if ((addr & (HPAGE_PMD_SIZE-1)) == offset)
+ if ((addr & (hpage_size - 1)) == offset)
return addr;
- inflated_len = len + HPAGE_PMD_SIZE - PAGE_SIZE;
+ inflated_len = len + hpage_size - PAGE_SIZE;
if (inflated_len > TASK_SIZE)
return addr;
if (inflated_len < len)
@@ -2473,10 +2491,10 @@ unsigned long shmem_get_unmapped_area(struct file *file,
if (inflated_addr & ~PAGE_MASK)
return addr;
- inflated_offset = inflated_addr & (HPAGE_PMD_SIZE-1);
+ inflated_offset = inflated_addr & (hpage_size - 1);
inflated_addr += offset - inflated_offset;
if (inflated_offset > offset)
- inflated_addr += HPAGE_PMD_SIZE;
+ inflated_addr += hpage_size;
if (inflated_addr > TASK_SIZE - len)
return addr;
--
2.39.3
Add a new parameter to shmem_alloc_hugefolio() to specify the huge page
order, in preparation for mTHP support.
Signed-off-by: Baolin Wang <[email protected]>
---
mm/shmem.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index fa2a0ed97507..e4483c4596a8 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1604,14 +1604,14 @@ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
}
static struct folio *shmem_alloc_hugefolio(gfp_t gfp,
- struct shmem_inode_info *info, pgoff_t index)
+ struct shmem_inode_info *info, pgoff_t index, int order)
{
struct mempolicy *mpol;
pgoff_t ilx;
struct page *page;
- mpol = shmem_get_pgoff_policy(info, index, HPAGE_PMD_ORDER, &ilx);
- page = alloc_pages_mpol(gfp, HPAGE_PMD_ORDER, mpol, ilx, numa_node_id());
+ mpol = shmem_get_pgoff_policy(info, index, order, &ilx);
+ page = alloc_pages_mpol(gfp, order, mpol, ilx, numa_node_id());
mpol_cond_put(mpol);
return page_rmappable_folio(page);
@@ -1660,7 +1660,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
index + HPAGE_PMD_NR - 1, XA_PRESENT))
return ERR_PTR(-E2BIG);
- folio = shmem_alloc_hugefolio(gfp, info, index);
+ folio = shmem_alloc_hugefolio(gfp, info, index, HPAGE_PMD_ORDER);
if (!folio)
count_vm_event(THP_FILE_FALLBACK);
} else {
--
2.39.3
To support mTHP for anonymous shmem, add a new sysfs interface
'shmem_enabled' in each '/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/'
directory to control whether shmem is enabled for that mTHP size. It
accepts values similar to the top-level 'shmem_enabled': "always",
"inherit" (inherit the top-level setting), "within_size", "advise",
"never", "deny" and "force". These values have the same semantics as at
the top level, except that 'deny' is equivalent to 'never' and 'force'
is equivalent to 'always', to keep compatibility.
By default, PMD-sized hugepages have shmem_enabled="inherit" and all
other hugepage sizes have shmem_enabled="never".
In addition, when the top-level value is 'force', only PMD-sized
hugepages may be set to "inherit"; any other such configuration fails.
Conversely, the top level can only be set to 'force' while the PMD size
is the only one set to "inherit". This prevents non-PMD-sized THP from
overriding the global huge allocation policy.
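The store handler keeps one bitmap per policy and moves a size's bit into exactly one of them. The user-space sketch below models that behaviour without the spinlock; struct shmem_orders, streq_nl() (a simplified stand-in for sysfs_streq() that ignores a trailing newline) and sketch_store() are hypothetical, and SK_PMD_ORDER assumes 4K pages. It accepts "advise", the keyword printed by thpsize_shmem_enabled_show() and listed in transhuge.rst.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define SK_PMD_ORDER 9	/* assumption: 4K pages, 2M PMD */

/* Hypothetical user-space stand-ins for the per-policy order bitmaps. */
struct shmem_orders {
	unsigned long always, inherit, within_size, advise;
};

/* Compare like sysfs_streq(): a trailing newline on @buf is ignored. */
static bool streq_nl(const char *buf, const char *s)
{
	size_t n = strlen(s);

	return strncmp(buf, s, n) == 0 &&
	       (buf[n] == '\0' || (buf[n] == '\n' && buf[n + 1] == '\0'));
}

/* Move @order into exactly one bitmap, or none for "never"/"deny". */
int sketch_store(struct shmem_orders *o, int order, const char *buf,
		 bool top_level_force)
{
	unsigned long bit = 1UL << order;
	unsigned long *target;

	if (streq_nl(buf, "always") || streq_nl(buf, "force")) {
		target = &o->always;
	} else if (streq_nl(buf, "inherit")) {
		/* Non-PMD sizes may not inherit a top-level 'force'. */
		if (top_level_force && order != SK_PMD_ORDER)
			return -EINVAL;
		target = &o->inherit;
	} else if (streq_nl(buf, "within_size")) {
		target = &o->within_size;
	} else if (streq_nl(buf, "advise")) {
		target = &o->advise;
	} else if (streq_nl(buf, "never") || streq_nl(buf, "deny")) {
		target = NULL;
	} else {
		return -EINVAL;
	}

	o->always &= ~bit;
	o->inherit &= ~bit;
	o->within_size &= ~bit;
	o->advise &= ~bit;
	if (target)
		*target |= bit;
	return 0;
}
```

Writing "always" and then "deny" for the 64K size (order 4) leaves that bit clear in all four bitmaps, which the show handler reports as "[never] [deny]".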
Signed-off-by: Baolin Wang <[email protected]>
---
Documentation/admin-guide/mm/transhuge.rst | 29 +++++++
include/linux/huge_mm.h | 10 +++
mm/huge_memory.c | 11 +--
mm/shmem.c | 96 ++++++++++++++++++++++
4 files changed, 138 insertions(+), 8 deletions(-)
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 076443cc10a6..a28496e15bdb 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -332,6 +332,35 @@ deny
force
Force the huge option on for all - very useful for testing;
+Anonymous shmem can also use "multi-size THP" (mTHP) by adding a new sysfs knob
+to control mTHP allocation: /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled.
+Its value for each mTHP is essentially consistent with the global setting, except
+for the addition of 'inherit' to ensure compatibility with the global settings.
+always
+ Attempt to allocate <size> huge pages every time we need a new page;
+
+inherit
+ Inherit the top-level "shmem_enabled" value. By default, PMD-sized hugepages
+ have enabled="inherit" and all other hugepage sizes have enabled="never";
+
+never
+ Do not allocate <size> huge pages;
+
+within_size
+ Only allocate a <size> huge page if it will be fully within i_size.
+ Also respect fadvise()/madvise() hints;
+
+advise
+ Only allocate <size> huge pages if requested with fadvise()/madvise();
+
+deny
+ Has the same semantics as 'never'; the mTHP allocation policy is
+ currently only used for anonymous shmem and does not override tmpfs.
+
+force
+ Has the same semantics as 'always'; the mTHP allocation policy is
+ currently only used for anonymous shmem and does not override tmpfs.
+
Need of application restart
===========================
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 017cee864080..1fce6fee7766 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -6,6 +6,7 @@
#include <linux/mm_types.h>
#include <linux/fs.h> /* only for vma_is_dax() */
+#include <linux/kobject.h>
vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf);
int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
@@ -63,6 +64,7 @@ ssize_t single_hugepage_flag_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf,
enum transparent_hugepage_flag flag);
extern struct kobj_attribute shmem_enabled_attr;
+extern struct kobj_attribute thpsize_shmem_enabled_attr;
/*
* Mask of all large folio orders supported for anonymous THP; all orders up to
@@ -265,6 +267,14 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
}
+struct thpsize {
+ struct kobject kobj;
+ struct list_head node;
+ int order;
+};
+
+#define to_thpsize(kobj) container_of(kobj, struct thpsize, kobj)
+
enum mthp_stat_item {
MTHP_STAT_ANON_FAULT_ALLOC,
MTHP_STAT_ANON_FAULT_FALLBACK,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9efb6fefc391..d3080a8843f2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -449,14 +449,6 @@ static void thpsize_release(struct kobject *kobj);
static DEFINE_SPINLOCK(huge_anon_orders_lock);
static LIST_HEAD(thpsize_list);
-struct thpsize {
- struct kobject kobj;
- struct list_head node;
- int order;
-};
-
-#define to_thpsize(kobj) container_of(kobj, struct thpsize, kobj)
-
static ssize_t thpsize_enabled_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
{
@@ -517,6 +509,9 @@ static struct kobj_attribute thpsize_enabled_attr =
static struct attribute *thpsize_attrs[] = {
&thpsize_enabled_attr.attr,
+#ifdef CONFIG_SHMEM
+ &thpsize_shmem_enabled_attr.attr,
+#endif
NULL,
};
diff --git a/mm/shmem.c b/mm/shmem.c
index a383ea9a89a5..59cc26d44344 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -131,6 +131,14 @@ struct shmem_options {
#define SHMEM_SEEN_QUOTA 32
};
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static unsigned long huge_anon_shmem_orders_always __read_mostly;
+static unsigned long huge_anon_shmem_orders_madvise __read_mostly;
+static unsigned long huge_anon_shmem_orders_inherit __read_mostly;
+static unsigned long huge_anon_shmem_orders_within_size __read_mostly;
+static DEFINE_SPINLOCK(huge_anon_shmem_orders_lock);
+#endif
+
#ifdef CONFIG_TMPFS
static unsigned long shmem_default_max_blocks(void)
{
@@ -4687,6 +4695,12 @@ void __init shmem_init(void)
SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
else
shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
+
+ /*
+ * Default to setting PMD-sized THP to inherit the global setting and
+ * disable all other multi-size THPs, when anonymous shmem uses mTHP.
+ */
+ huge_anon_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
#endif
return;
@@ -4746,6 +4760,11 @@ static ssize_t shmem_enabled_store(struct kobject *kobj,
huge != SHMEM_HUGE_NEVER && huge != SHMEM_HUGE_DENY)
return -EINVAL;
+ /* Do not override huge allocation policy with non-PMD sized mTHP */
+ if (huge == SHMEM_HUGE_FORCE &&
+ huge_anon_shmem_orders_inherit != BIT(HPAGE_PMD_ORDER))
+ return -EINVAL;
+
shmem_huge = huge;
if (shmem_huge > SHMEM_HUGE_DENY)
SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
@@ -4753,6 +4772,83 @@ static ssize_t shmem_enabled_store(struct kobject *kobj,
}
struct kobj_attribute shmem_enabled_attr = __ATTR_RW(shmem_enabled);
+
+static ssize_t thpsize_shmem_enabled_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ int order = to_thpsize(kobj)->order;
+ const char *output;
+
+ if (test_bit(order, &huge_anon_shmem_orders_always))
+ output = "[always] inherit within_size advise never deny [force]";
+ else if (test_bit(order, &huge_anon_shmem_orders_inherit))
+ output = "always [inherit] within_size advise never deny force";
+ else if (test_bit(order, &huge_anon_shmem_orders_within_size))
+ output = "always inherit [within_size] advise never deny force";
+ else if (test_bit(order, &huge_anon_shmem_orders_madvise))
+ output = "always inherit within_size [advise] never deny force";
+ else
+ output = "always inherit within_size advise [never] [deny] force";
+
+ return sysfs_emit(buf, "%s\n", output);
+}
+
+static ssize_t thpsize_shmem_enabled_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ int order = to_thpsize(kobj)->order;
+ ssize_t ret = count;
+
+ if (sysfs_streq(buf, "always") || sysfs_streq(buf, "force")) {
+ spin_lock(&huge_anon_shmem_orders_lock);
+ clear_bit(order, &huge_anon_shmem_orders_inherit);
+ clear_bit(order, &huge_anon_shmem_orders_madvise);
+ clear_bit(order, &huge_anon_shmem_orders_within_size);
+ set_bit(order, &huge_anon_shmem_orders_always);
+ spin_unlock(&huge_anon_shmem_orders_lock);
+ } else if (sysfs_streq(buf, "inherit")) {
+ /* Do not override huge allocation policy with non-PMD sized mTHP */
+ if (shmem_huge == SHMEM_HUGE_FORCE &&
+ order != HPAGE_PMD_ORDER)
+ return -EINVAL;
+
+ spin_lock(&huge_anon_shmem_orders_lock);
+ clear_bit(order, &huge_anon_shmem_orders_always);
+ clear_bit(order, &huge_anon_shmem_orders_madvise);
+ clear_bit(order, &huge_anon_shmem_orders_within_size);
+ set_bit(order, &huge_anon_shmem_orders_inherit);
+ spin_unlock(&huge_anon_shmem_orders_lock);
+ } else if (sysfs_streq(buf, "within_size")) {
+ spin_lock(&huge_anon_shmem_orders_lock);
+ clear_bit(order, &huge_anon_shmem_orders_always);
+ clear_bit(order, &huge_anon_shmem_orders_inherit);
+ clear_bit(order, &huge_anon_shmem_orders_madvise);
+ set_bit(order, &huge_anon_shmem_orders_within_size);
+ spin_unlock(&huge_anon_shmem_orders_lock);
+ } else if (sysfs_streq(buf, "advise")) {
+ spin_lock(&huge_anon_shmem_orders_lock);
+ clear_bit(order, &huge_anon_shmem_orders_always);
+ clear_bit(order, &huge_anon_shmem_orders_inherit);
+ clear_bit(order, &huge_anon_shmem_orders_within_size);
+ set_bit(order, &huge_anon_shmem_orders_madvise);
+ spin_unlock(&huge_anon_shmem_orders_lock);
+ } else if (sysfs_streq(buf, "never") || sysfs_streq(buf, "deny")) {
+ spin_lock(&huge_anon_shmem_orders_lock);
+ clear_bit(order, &huge_anon_shmem_orders_always);
+ clear_bit(order, &huge_anon_shmem_orders_inherit);
+ clear_bit(order, &huge_anon_shmem_orders_within_size);
+ clear_bit(order, &huge_anon_shmem_orders_madvise);
+ spin_unlock(&huge_anon_shmem_orders_lock);
+ } else {
+ ret = -EINVAL;
+ }
+
+ return ret;
+}
+
+struct kobj_attribute thpsize_shmem_enabled_attr =
+ __ATTR(shmem_enabled, 0644, thpsize_shmem_enabled_show, thpsize_shmem_enabled_store);
#endif /* CONFIG_TRANSPARENT_HUGEPAGE && CONFIG_SYSFS */
#else /* !CONFIG_SHMEM */
--
2.39.3
Add per-size mTHP counters for anonymous shmem: file_alloc,
file_fallback and file_fallback_charge.
Signed-off-by: Baolin Wang <[email protected]>
---
include/linux/huge_mm.h | 3 +++
mm/huge_memory.c | 6 ++++++
mm/shmem.c | 18 +++++++++++++++---
3 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b5339210268d..e162498fef82 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -281,6 +281,9 @@ enum mthp_stat_item {
MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
MTHP_STAT_ANON_SWPOUT,
MTHP_STAT_ANON_SWPOUT_FALLBACK,
+ MTHP_STAT_FILE_ALLOC,
+ MTHP_STAT_FILE_FALLBACK,
+ MTHP_STAT_FILE_FALLBACK_CHARGE,
__MTHP_STAT_COUNT
};
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d3080a8843f2..fcda6ae604f6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -555,6 +555,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
+DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
static struct attribute *stats_attrs[] = {
&anon_fault_alloc_attr.attr,
@@ -562,6 +565,9 @@ static struct attribute *stats_attrs[] = {
&anon_fault_fallback_charge_attr.attr,
&anon_swpout_attr.attr,
&anon_swpout_fallback_attr.attr,
+ &file_alloc_attr.attr,
+ &file_fallback_attr.attr,
+ &file_fallback_charge_attr.attr,
NULL,
};
diff --git a/mm/shmem.c b/mm/shmem.c
index 8b020ff09c72..fd2cb2e73a21 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1786,6 +1786,9 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
if (pages == HPAGE_PMD_NR)
count_vm_event(THP_FILE_FALLBACK);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
+#endif
order = next_order(&suitable_orders, order);
}
} else {
@@ -1805,9 +1808,15 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
if (xa_find(&mapping->i_pages, &index,
index + pages - 1, XA_PRESENT)) {
error = -EEXIST;
- } else if (pages == HPAGE_PMD_NR) {
- count_vm_event(THP_FILE_FALLBACK);
- count_vm_event(THP_FILE_FALLBACK_CHARGE);
+ } else if (pages > 1) {
+ if (pages == HPAGE_PMD_NR) {
+ count_vm_event(THP_FILE_FALLBACK);
+ count_vm_event(THP_FILE_FALLBACK_CHARGE);
+ }
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE);
+#endif
}
goto unlock;
}
@@ -2178,6 +2187,9 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
if (!IS_ERR(folio)) {
if (folio_test_pmd_mappable(folio))
count_vm_event(THP_FILE_ALLOC);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
+#endif
goto alloced;
}
if (PTR_ERR(folio) == -EEXIST)
--
2.39.3
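The per-order counters above are bumped while shmem_alloc_and_add_folio() walks the suitable orders from highest to lowest. The following userspace sketch models that walk, assuming highest_order() returns the index of the highest set bit (or -1 for an empty mask) and next_order() clears the previous order and looks again. try_alloc_order(), max_free_order and the stats array are hypothetical stand-ins for shmem_alloc_hugefolio() and the per-CPU mTHP counters. __builtin_clzl() assumes GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ORDERS 10	/* assumption: PMD order 9 with 4 KiB pages */

enum { STAT_FILE_ALLOC, STAT_FILE_FALLBACK, NR_STATS };

/* Hypothetical per-order counters (the kernel keeps these per CPU). */
static unsigned long stats[NR_ORDERS][NR_STATS];

/* Hypothetical fragmentation model: largest order the allocator can satisfy. */
static int max_free_order = NR_ORDERS - 1;

/* Index of the highest set bit, or -1 if no bits are set. */
static int highest_order(unsigned long orders)
{
	if (!orders)
		return -1;
	return (int)(8 * sizeof(unsigned long)) - 1 - __builtin_clzl(orders);
}

/* Drop 'prev' from the mask and return the next-highest remaining order. */
static int next_order(unsigned long *orders, int prev)
{
	*orders &= ~(1UL << prev);
	return highest_order(*orders);
}

/* Stand-in for shmem_alloc_hugefolio(): succeeds if the order is available. */
static bool try_alloc_order(int order)
{
	return order <= max_free_order;
}

/* Returns the allocated order, or -1 meaning "fall back to order-0". */
static int alloc_with_fallback(unsigned long suitable_orders)
{
	int order = highest_order(suitable_orders);

	while (suitable_orders) {
		assert(order >= 0 && order < NR_ORDERS);
		if (try_alloc_order(order)) {
			stats[order][STAT_FILE_ALLOC]++;
			return order;
		}
		/* Each failed order is charged its own fallback event. */
		stats[order][STAT_FILE_FALLBACK]++;
		order = next_order(&suitable_orders, order);
	}
	return -1;
}
```

With orders {9, 6, 4, 2} and only order-4 blocks free, the sketch records one fallback each for orders 9 and 6 and one allocation for order 4.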
Hi Baolin,
On Mon, May 13, 2024 at 1:08 PM Baolin Wang
<[email protected]> wrote:
>
> Commit 19eaf44954df adds multi-size THP (mTHP) for anonymous pages, which
> allows THP to be configured through the sysfs interface located at
> '/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/enabled'.
>
> However, anonymous shared pages ignore the anonymous mTHP rule
> configured through the sysfs interface and can only use PMD-mapped
> THP, which is not reasonable. Users expect to apply the mTHP rule to
> all anonymous pages, including anonymous shared pages, in order to
> enjoy the benefits of mTHP: for example, lower latency than PMD-mapped THP,
> smaller memory bloat than PMD-mapped THP, and contiguous PTEs on ARM
> to reduce TLB misses.
>
> The primary strategy is similar to supporting anonymous mTHP. Introduce
> a new interface '/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/shmem_enabled',
> which can take all the same values as the top-level
> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', plus a new
> "inherit" option. By default, all sizes are set to "never" except
> the PMD size, which is set to "inherit". This ensures backward compatibility
> with the top-level anonymous shmem_enabled setting, while also allowing
> independent control of anonymous shmem for each mTHP size.
>
> Signed-off-by: Baolin Wang <[email protected]>
> ---
> include/linux/huge_mm.h | 10 +++
> mm/shmem.c | 179 +++++++++++++++++++++++++++++++++-------
> 2 files changed, 161 insertions(+), 28 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 1fce6fee7766..b5339210268d 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -583,6 +583,16 @@ static inline bool thp_migration_supported(void)
> {
> return false;
> }
> +
> +static inline int highest_order(unsigned long orders)
> +{
> + return 0;
> +}
> +
> +static inline int next_order(unsigned long *orders, int prev)
> +{
> + return 0;
> +}
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> static inline int split_folio_to_list_to_order(struct folio *folio,
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 59cc26d44344..b50ddf013e37 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1611,6 +1611,106 @@ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
> return result;
> }
>
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +static unsigned long anon_shmem_allowable_huge_orders(struct inode *inode,
> + struct vm_area_struct *vma, pgoff_t index,
> + bool global_huge)
> +{
> + unsigned long mask = READ_ONCE(huge_anon_shmem_orders_always);
> + unsigned long within_size_orders = READ_ONCE(huge_anon_shmem_orders_within_size);
> + unsigned long vm_flags = vma->vm_flags;
> + /*
> + * Check all the (large) orders below HPAGE_PMD_ORDER + 1 that
> + * are enabled for this vma.
> + */
> + unsigned long orders = BIT(PMD_ORDER + 1) - 1;
> + loff_t i_size;
> + int order;
> +
> + if ((vm_flags & VM_NOHUGEPAGE) ||
> + test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> + return 0;
> +
> + /* If the hardware/firmware marked hugepage support disabled. */
> + if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
> + return 0;
> +
> + /*
> + * Following the 'deny' semantics of the top level, force the huge
> + * option off from all mounts.
> + */
> + if (shmem_huge == SHMEM_HUGE_DENY)
> + return 0;
> + /*
> + * Only allow inherit orders if the top-level value is 'force', which
> + * means non-PMD sized THP can not override 'huge' mount option now.
> + */
> + if (shmem_huge == SHMEM_HUGE_FORCE)
> + return READ_ONCE(huge_anon_shmem_orders_inherit);
> +
> + /* Allow mTHP that will be fully within i_size. */
> + order = highest_order(within_size_orders);
> + while (within_size_orders) {
> + index = round_up(index + 1, order);
> + i_size = round_up(i_size_read(inode), PAGE_SIZE);
> + if (i_size >> PAGE_SHIFT >= index) {
> + mask |= within_size_orders;
> + break;
> + }
> +
> + order = next_order(&within_size_orders, order);
> + }
> +
> + if (vm_flags & VM_HUGEPAGE)
> + mask |= READ_ONCE(huge_anon_shmem_orders_madvise);
> +
> + if (global_huge)
> + mask |= READ_ONCE(huge_anon_shmem_orders_inherit);
> +
> + return orders & mask;
> +}
> +
> +static unsigned long anon_shmem_suitable_orders(struct inode *inode, struct vm_fault *vmf,
> + struct address_space *mapping, pgoff_t index,
> + unsigned long orders)
> +{
> + struct vm_area_struct *vma = vmf->vma;
> + unsigned long pages;
> + int order;
> +
> + orders = thp_vma_suitable_orders(vma, vmf->address, orders);
> + if (!orders)
> + return 0;
> +
> + /* Find the highest order that can add into the page cache */
> + order = highest_order(orders);
> + while (orders) {
> + pages = 1UL << order;
> + index = round_down(index, pages);
> + if (!xa_find(&mapping->i_pages, &index,
> + index + pages - 1, XA_PRESENT))
> + break;
> + order = next_order(&orders, order);
> + }
> +
> + return orders;
> +}
> +#else
> +static unsigned long anon_shmem_allowable_huge_orders(struct inode *inode,
> + struct vm_area_struct *vma, pgoff_t index,
> + bool global_huge)
> +{
> + return 0;
> +}
> +
> +static unsigned long anon_shmem_suitable_orders(struct inode *inode, struct vm_fault *vmf,
> + struct address_space *mapping, pgoff_t index,
> + unsigned long orders)
> +{
> + return 0;
> +}
> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> +
> static struct folio *shmem_alloc_hugefolio(gfp_t gfp,
> struct shmem_inode_info *info, pgoff_t index, int order)
> {
> @@ -1639,38 +1739,55 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
> return (struct folio *)page;
> }
>
> -static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
> - struct inode *inode, pgoff_t index,
> - struct mm_struct *fault_mm, bool huge)
> +static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
> + gfp_t gfp, struct inode *inode, pgoff_t index,
> + struct mm_struct *fault_mm, bool huge, unsigned long orders)
IMO, it might be cleaner to drop the 'huge' parameter and just set 'orders' to
BIT(HPAGE_PMD_ORDER); then we only need the 'orders' check :)
Something like:
	if (orders > 0) {
		if (vma && vma_is_anon_shmem(vma)) {
			...
		} else if (orders & BIT(HPAGE_PMD_ORDER)) {
			...
		}
	}
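The orders-only dispatch suggested above can be sketched in userspace as follows. Callers that only want PMD-sized THP pass BIT(HPAGE_PMD_ORDER), anonymous shmem passes its allowed per-size orders, and an empty mask means order-0. HPAGE_PMD_ORDER = 9 (4 KiB pages), the enum and both helper names are assumptions for illustration only, and build_orders() is just one possible way for the caller to fold 'huge' into the mask.

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_ORDER 9	/* assumption: 4 KiB base pages, 2 MiB PMD */
#define BIT(n) (1UL << (n))

/* Hypothetical labels for the three allocation paths. */
enum alloc_path { PATH_ORDER0, PATH_ANON_MTHP, PATH_PMD_THP };

/*
 * Hypothetical caller side: anonymous shmem uses its per-size mask (which
 * already includes the "inherit" orders when the top-level policy is huge);
 * everything else maps the old 'huge' flag onto the PMD order.
 */
static unsigned long build_orders(bool huge, bool anon_shmem, unsigned long anon_orders)
{
	if (anon_shmem)
		return anon_orders;
	return huge ? BIT(HPAGE_PMD_ORDER) : 0;
}

/* Callee side: only the 'orders' mask is checked. */
static enum alloc_path pick_path(unsigned long orders, bool anon_shmem)
{
	if (orders > 0) {
		if (anon_shmem)
			return PATH_ANON_MTHP;
		if (orders & BIT(HPAGE_PMD_ORDER))
			return PATH_PMD_THP;
	}
	return PATH_ORDER0;
}
```

In this sketch anonymous shmem with an empty per-size mask goes straight to order-0; whether that is the intended behaviour when the top-level setting says "huge" is an assumption.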
> {
> struct address_space *mapping = inode->i_mapping;
> struct shmem_inode_info *info = SHMEM_I(inode);
> - struct folio *folio;
> + struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
> + unsigned long suitable_orders;
> + struct folio *folio = NULL;
> long pages;
> - int error;
> + int error, order;
>
> if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> huge = false;
Currently, if THP is disabled, 'huge' falls back to order-0, but 'orders'
does not, IIUC. How about making both consistent when THP is disabled?
	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
		huge = false;
		orders = 0;
	}
Thanks,
Lance
>
> - if (huge) {
> - pages = HPAGE_PMD_NR;
> - index = round_down(index, HPAGE_PMD_NR);
> + if (huge || orders > 0) {
> + if (vma && vma_is_anon_shmem(vma) && orders) {
> + suitable_orders = anon_shmem_suitable_orders(inode, vmf,
> + mapping, index, orders);
> + } else {
> + pages = HPAGE_PMD_NR;
> + suitable_orders = BIT(HPAGE_PMD_ORDER);
> + index = round_down(index, HPAGE_PMD_NR);
>
> - /*
> - * Check for conflict before waiting on a huge allocation.
> - * Conflict might be that a huge page has just been allocated
> - * and added to page cache by a racing thread, or that there
> - * is already at least one small page in the huge extent.
> - * Be careful to retry when appropriate, but not forever!
> - * Elsewhere -EEXIST would be the right code, but not here.
> - */
> - if (xa_find(&mapping->i_pages, &index,
> + /*
> + * Check for conflict before waiting on a huge allocation.
> + * Conflict might be that a huge page has just been allocated
> + * and added to page cache by a racing thread, or that there
> + * is already at least one small page in the huge extent.
> + * Be careful to retry when appropriate, but not forever!
> + * Elsewhere -EEXIST would be the right code, but not here.
> + */
> + if (xa_find(&mapping->i_pages, &index,
> index + HPAGE_PMD_NR - 1, XA_PRESENT))
> - return ERR_PTR(-E2BIG);
> + return ERR_PTR(-E2BIG);
> + }
>
> - folio = shmem_alloc_hugefolio(gfp, info, index, HPAGE_PMD_ORDER);
> - if (!folio && pages == HPAGE_PMD_NR)
> - count_vm_event(THP_FILE_FALLBACK);
> + order = highest_order(suitable_orders);
> + while (suitable_orders) {
> + pages = 1 << order;
> + index = round_down(index, pages);
> + folio = shmem_alloc_hugefolio(gfp, info, index, order);
> + if (folio)
> + goto allocated;
> +
> + if (pages == HPAGE_PMD_NR)
> + count_vm_event(THP_FILE_FALLBACK);
> + order = next_order(&suitable_orders, order);
> + }
> } else {
> pages = 1;
> folio = shmem_alloc_folio(gfp, info, index);
> @@ -1678,6 +1795,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
> if (!folio)
> return ERR_PTR(-ENOMEM);
>
> +allocated:
> __folio_set_locked(folio);
> __folio_set_swapbacked(folio);
>
> @@ -1972,7 +2090,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> struct mm_struct *fault_mm;
> struct folio *folio;
> int error;
> - bool alloced;
> + bool alloced, huge;
> + unsigned long orders = 0;
>
> if (WARN_ON_ONCE(!shmem_mapping(inode->i_mapping)))
> return -EINVAL;
> @@ -2044,14 +2163,18 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> return 0;
> }
>
> - if (shmem_is_huge(inode, index, false, fault_mm,
> - vma ? vma->vm_flags : 0)) {
> + huge = shmem_is_huge(inode, index, false, fault_mm,
> + vma ? vma->vm_flags : 0);
> + /* Find hugepage orders that are allowed for anonymous shmem. */
> + if (vma && vma_is_anon_shmem(vma))
> + orders = anon_shmem_allowable_huge_orders(inode, vma, index, huge);
> + if (huge || orders > 0) {
> gfp_t huge_gfp;
>
> huge_gfp = vma_thp_gfp_mask(vma);
> huge_gfp = limit_gfp_mask(huge_gfp, gfp);
> - folio = shmem_alloc_and_add_folio(huge_gfp,
> - inode, index, fault_mm, true);
> + folio = shmem_alloc_and_add_folio(vmf, huge_gfp,
> + inode, index, fault_mm, true, orders);
> if (!IS_ERR(folio)) {
> if (folio_test_pmd_mappable(folio))
> count_vm_event(THP_FILE_ALLOC);
> @@ -2061,7 +2184,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> goto repeat;
> }
>
> - folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm, false);
> + folio = shmem_alloc_and_add_folio(vmf, gfp, inode, index, fault_mm, false, 0);
> if (IS_ERR(folio)) {
> error = PTR_ERR(folio);
> if (error == -EEXIST)
> @@ -2072,7 +2195,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>
> alloced:
> alloced = true;
> - if (folio_test_pmd_mappable(folio) &&
> + if (folio_test_large(folio) &&
> DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE) <
> folio_next_index(folio) - 1) {
> struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> --
> 2.39.3
>
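The two helpers in the quoted patch, anon_shmem_allowable_huge_orders() (its within_size loop) and anon_shmem_suitable_orders(), can be modelled in userspace as follows. within_size_orders_at() keeps the within_size orders whose naturally aligned folio around 'index' ends at or before i_size. first_free_orders() keeps the highest order whose aligned range holds no page-cache entries, plus all lower orders. Page-cache occupancy is a hypothetical bool array instead of xa_find(), PAGE_SIZE is assumed to be 4096, and __builtin_clzl() assumes GCC or Clang. The sketch aligns to the folio size in pages (1UL << order) and computes a fresh aligned index for each order. It assumes that is the intent: the quoted loop passes 'order' itself to round_up(), which expects a power-of-two size in the same units as 'index', and it overwrites 'index' on every iteration.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096UL	/* assumption: 4 KiB base pages */

/* Round x up/down to a multiple of a power-of-two 'size'. */
#define ROUND_UP_POW2(x, size)   (((x) + (size) - 1) & ~((size) - 1))
#define ROUND_DOWN_POW2(x, size) ((x) & ~((size) - 1))

static int highest_order(unsigned long orders)
{
	return orders ? (int)(8 * sizeof(unsigned long)) - 1 - __builtin_clzl(orders) : -1;
}

static int next_order(unsigned long *orders, int prev)
{
	*orders &= ~(1UL << prev);
	return highest_order(*orders);
}

/* Keep the within_size orders whose aligned folio at 'index' fits in i_size. */
static unsigned long within_size_orders_at(unsigned long orders, unsigned long index,
					   unsigned long i_size)
{
	unsigned long size_pages = ROUND_UP_POW2(i_size, PAGE_SIZE) / PAGE_SIZE;
	int order = highest_order(orders);

	while (orders) {
		unsigned long end = ROUND_UP_POW2(index + 1, 1UL << order);

		if (end <= size_pages)
			return orders;	/* this order and every lower one fit */
		order = next_order(&orders, order);
	}
	return 0;
}

/* Hypothetical page cache: present[i] is true if index i is populated. */
static unsigned long first_free_orders(unsigned long orders, unsigned long index,
				       const bool *present, unsigned long nr_indices)
{
	int order = highest_order(orders);

	while (orders) {
		unsigned long pages = 1UL << order;
		unsigned long start = ROUND_DOWN_POW2(index, pages);
		bool busy = false;

		/* Indices beyond the array are treated as empty. */
		for (unsigned long i = start; i < start + pages && i < nr_indices; i++)
			busy |= present[i];
		if (!busy)
			return orders;	/* aligned ranges nest, so lower orders are free too */
		order = next_order(&orders, order);
	}
	return 0;
}
```

For a 40 KiB file (10 pages), a 16-page folio at index 5 would end at page 16 and overrun i_size, whereas a 4-page folio ends at page 8 and fits.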
Hi Baolin,
On Mon, May 13, 2024 at 1:08 PM Baolin Wang
<[email protected]> wrote:
>
> Add mTHP counters for anonymous shmem.
>
> Signed-off-by: Baolin Wang <[email protected]>
> ---
> include/linux/huge_mm.h | 3 +++
> mm/huge_memory.c | 6 ++++++
> mm/shmem.c | 18 +++++++++++++++---
> 3 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index b5339210268d..e162498fef82 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -281,6 +281,9 @@ enum mthp_stat_item {
> MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
> MTHP_STAT_ANON_SWPOUT,
> MTHP_STAT_ANON_SWPOUT_FALLBACK,
> + MTHP_STAT_FILE_ALLOC,
> + MTHP_STAT_FILE_FALLBACK,
> + MTHP_STAT_FILE_FALLBACK_CHARGE,
> __MTHP_STAT_COUNT
> };
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d3080a8843f2..fcda6ae604f6 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -555,6 +555,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
> DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
> DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
> +DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
> +DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
> +DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
>
> static struct attribute *stats_attrs[] = {
> &anon_fault_alloc_attr.attr,
> @@ -562,6 +565,9 @@ static struct attribute *stats_attrs[] = {
> &anon_fault_fallback_charge_attr.attr,
> &anon_swpout_attr.attr,
> &anon_swpout_fallback_attr.attr,
> + &file_alloc_attr.attr,
> + &file_fallback_attr.attr,
> + &file_fallback_charge_attr.attr,
> NULL,
> };
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 8b020ff09c72..fd2cb2e73a21 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1786,6 +1786,9 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
>
> if (pages == HPAGE_PMD_NR)
> count_vm_event(THP_FILE_FALLBACK);
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
> +#endif
Seems like we don't need these conditional compilation directives here.
The THP_FILE_FALLBACK above would already cause a compilation error if
CONFIG_TRANSPARENT_HUGEPAGE is not defined, so we don't need to
worry about that :)
See THP_FILE_FALLBACK in include/linux/vm_event_item.h.
> order = next_order(&suitable_orders, order);
> }
> } else {
> @@ -1805,9 +1808,15 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
> if (xa_find(&mapping->i_pages, &index,
> index + pages - 1, XA_PRESENT)) {
> error = -EEXIST;
> - } else if (pages == HPAGE_PMD_NR) {
> - count_vm_event(THP_FILE_FALLBACK);
> - count_vm_event(THP_FILE_FALLBACK_CHARGE);
> + } else if (pages > 1) {
> + if (pages == HPAGE_PMD_NR) {
> + count_vm_event(THP_FILE_FALLBACK);
> + count_vm_event(THP_FILE_FALLBACK_CHARGE);
> + }
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
> + count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE);
> +#endif
As above.
> }
> goto unlock;
> }
> @@ -2178,6 +2187,9 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> if (!IS_ERR(folio)) {
> if (folio_test_pmd_mappable(folio))
> count_vm_event(THP_FILE_ALLOC);
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
> +#endif
As above.
Perhaps we should define MTHP_STAT_FILE_ALLOC and friends the same
way as THP_FILE_ALLOC, i.e. as '({ BUILD_BUG(); 0; })' if
CONFIG_TRANSPARENT_HUGEPAGE is not defined.
Something like:
#ifndef CONFIG_TRANSPARENT_HUGEPAGE
#define MTHP_STAT_FILE_ALLOC ({ BUILD_BUG(); 0; })
...
#endif
Thanks,
Lance
> goto alloced;
> }
> if (PTR_ERR(folio) == -EEXIST)
> --
> 2.39.3
>
On 2024/5/14 21:36, Lance Yang wrote:
> Hi Baolin,
>
> On Mon, May 13, 2024 at 1:08 PM Baolin Wang
> <[email protected]> wrote:
>>
>> Commit 19eaf44954df adds multi-size THP (mTHP) for anonymous pages, which
>> allows THP to be configured through the sysfs interface located at
>> '/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/enabled'.
>>
>> However, anonymous shared pages ignore the anonymous mTHP rule
>> configured through the sysfs interface and can only use PMD-mapped
>> THP, which is not reasonable. Users expect to apply the mTHP rule to
>> all anonymous pages, including anonymous shared pages, in order to
>> enjoy the benefits of mTHP: for example, lower latency than PMD-mapped THP,
>> smaller memory bloat than PMD-mapped THP, and contiguous PTEs on ARM
>> to reduce TLB misses.
>>
>> The primary strategy is similar to supporting anonymous mTHP. Introduce
>> a new interface '/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/shmem_enabled',
>> which can take all the same values as the top-level
>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', plus a new
>> "inherit" option. By default, all sizes are set to "never" except
>> the PMD size, which is set to "inherit". This ensures backward compatibility
>> with the top-level anonymous shmem_enabled setting, while also allowing
>> independent control of anonymous shmem for each mTHP size.
>>
>> Signed-off-by: Baolin Wang <[email protected]>
>> ---
>> include/linux/huge_mm.h | 10 +++
>> mm/shmem.c | 179 +++++++++++++++++++++++++++++++++-------
>> 2 files changed, 161 insertions(+), 28 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 1fce6fee7766..b5339210268d 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -583,6 +583,16 @@ static inline bool thp_migration_supported(void)
>> {
>> return false;
>> }
>> +
>> +static inline int highest_order(unsigned long orders)
>> +{
>> + return 0;
>> +}
>> +
>> +static inline int next_order(unsigned long *orders, int prev)
>> +{
>> + return 0;
>> +}
>> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>
>> static inline int split_folio_to_list_to_order(struct folio *folio,
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 59cc26d44344..b50ddf013e37 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -1611,6 +1611,106 @@ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
>> return result;
>> }
>>
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +static unsigned long anon_shmem_allowable_huge_orders(struct inode *inode,
>> + struct vm_area_struct *vma, pgoff_t index,
>> + bool global_huge)
>> +{
>> + unsigned long mask = READ_ONCE(huge_anon_shmem_orders_always);
>> + unsigned long within_size_orders = READ_ONCE(huge_anon_shmem_orders_within_size);
>> + unsigned long vm_flags = vma->vm_flags;
>> + /*
>> + * Check all the (large) orders below HPAGE_PMD_ORDER + 1 that
>> + * are enabled for this vma.
>> + */
>> + unsigned long orders = BIT(PMD_ORDER + 1) - 1;
>> + loff_t i_size;
>> + int order;
>> +
>> + if ((vm_flags & VM_NOHUGEPAGE) ||
>> + test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>> + return 0;
>> +
>> + /* If the hardware/firmware marked hugepage support disabled. */
>> + if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
>> + return 0;
>> +
>> + /*
>> + * Following the 'deny' semantics of the top level, force the huge
>> + * option off from all mounts.
>> + */
>> + if (shmem_huge == SHMEM_HUGE_DENY)
>> + return 0;
>> + /*
>> + * Only allow inherit orders if the top-level value is 'force', which
>> + * means non-PMD sized THP can not override 'huge' mount option now.
>> + */
>> + if (shmem_huge == SHMEM_HUGE_FORCE)
>> + return READ_ONCE(huge_anon_shmem_orders_inherit);
>> +
>> + /* Allow mTHP that will be fully within i_size. */
>> + order = highest_order(within_size_orders);
>> + while (within_size_orders) {
>> + index = round_up(index + 1, order);
>> + i_size = round_up(i_size_read(inode), PAGE_SIZE);
>> + if (i_size >> PAGE_SHIFT >= index) {
>> + mask |= within_size_orders;
>> + break;
>> + }
>> +
>> + order = next_order(&within_size_orders, order);
>> + }
>> +
>> + if (vm_flags & VM_HUGEPAGE)
>> + mask |= READ_ONCE(huge_anon_shmem_orders_madvise);
>> +
>> + if (global_huge)
>> + mask |= READ_ONCE(huge_anon_shmem_orders_inherit);
>> +
>> + return orders & mask;
>> +}
>> +
>> +static unsigned long anon_shmem_suitable_orders(struct inode *inode, struct vm_fault *vmf,
>> + struct address_space *mapping, pgoff_t index,
>> + unsigned long orders)
>> +{
>> + struct vm_area_struct *vma = vmf->vma;
>> + unsigned long pages;
>> + int order;
>> +
>> + orders = thp_vma_suitable_orders(vma, vmf->address, orders);
>> + if (!orders)
>> + return 0;
>> +
>> + /* Find the highest order that can add into the page cache */
>> + order = highest_order(orders);
>> + while (orders) {
>> + pages = 1UL << order;
>> + index = round_down(index, pages);
>> + if (!xa_find(&mapping->i_pages, &index,
>> + index + pages - 1, XA_PRESENT))
>> + break;
>> + order = next_order(&orders, order);
>> + }
>> +
>> + return orders;
>> +}
>> +#else
>> +static unsigned long anon_shmem_allowable_huge_orders(struct inode *inode,
>> + struct vm_area_struct *vma, pgoff_t index,
>> + bool global_huge)
>> +{
>> + return 0;
>> +}
>> +
>> +static unsigned long anon_shmem_suitable_orders(struct inode *inode, struct vm_fault *vmf,
>> + struct address_space *mapping, pgoff_t index,
>> + unsigned long orders)
>> +{
>> + return 0;
>> +}
>> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>> +
>> static struct folio *shmem_alloc_hugefolio(gfp_t gfp,
>> struct shmem_inode_info *info, pgoff_t index, int order)
>> {
>> @@ -1639,38 +1739,55 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
>> return (struct folio *)page;
>> }
>>
>> -static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
>> - struct inode *inode, pgoff_t index,
>> - struct mm_struct *fault_mm, bool huge)
>> +static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
>> + gfp_t gfp, struct inode *inode, pgoff_t index,
>> + struct mm_struct *fault_mm, bool huge, unsigned long orders)
>
> IMO, it might be cleaner to drop the 'huge' parameter and just set 'orders' to
> BIT(HPAGE_PMD_ORDER); then we only need the 'orders' check :)
>
> Something like:
>
> 	if (orders > 0) {
> 		if (vma && vma_is_anon_shmem(vma)) {
> 			...
> 		} else if (orders & BIT(HPAGE_PMD_ORDER)) {
> 			...
> 		}
> 	}
Yes, that looks better.
>> {
>> struct address_space *mapping = inode->i_mapping;
>> struct shmem_inode_info *info = SHMEM_I(inode);
>> - struct folio *folio;
>> + struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
>> + unsigned long suitable_orders;
>> + struct folio *folio = NULL;
>> long pages;
>> - int error;
>> + int error, order;
>>
>> if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
>> huge = false;
>
> Currently, if THP is disabled, 'huge' falls back to order-0, but 'orders'
> does not, IIUC. How about making both consistent when THP is disabled?
>
> 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> 		huge = false;
> 		orders = 0;
> 	}
If THP is disabled, 'orders' is always 0, so there is no need to reset it.
On 2024/5/14 22:49, Lance Yang wrote:
> Hi Baolin,
>
> On Mon, May 13, 2024 at 1:08 PM Baolin Wang
> <[email protected]> wrote:
>>
>> Add mTHP counters for anonymous shmem.
>>
>> Signed-off-by: Baolin Wang <[email protected]>
>> ---
>> include/linux/huge_mm.h | 3 +++
>> mm/huge_memory.c | 6 ++++++
>> mm/shmem.c | 18 +++++++++++++++---
>> 3 files changed, 24 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index b5339210268d..e162498fef82 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -281,6 +281,9 @@ enum mthp_stat_item {
>> MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
>> MTHP_STAT_ANON_SWPOUT,
>> MTHP_STAT_ANON_SWPOUT_FALLBACK,
>> + MTHP_STAT_FILE_ALLOC,
>> + MTHP_STAT_FILE_FALLBACK,
>> + MTHP_STAT_FILE_FALLBACK_CHARGE,
>> __MTHP_STAT_COUNT
>> };
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index d3080a8843f2..fcda6ae604f6 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -555,6 +555,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
>> DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>> DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
>> DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
>> +DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
>> +DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
>> +DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
>>
>> static struct attribute *stats_attrs[] = {
>> &anon_fault_alloc_attr.attr,
>> @@ -562,6 +565,9 @@ static struct attribute *stats_attrs[] = {
>> &anon_fault_fallback_charge_attr.attr,
>> &anon_swpout_attr.attr,
>> &anon_swpout_fallback_attr.attr,
>> + &file_alloc_attr.attr,
>> + &file_fallback_attr.attr,
>> + &file_fallback_charge_attr.attr,
>> NULL,
>> };
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 8b020ff09c72..fd2cb2e73a21 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -1786,6 +1786,9 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
>>
>> if (pages == HPAGE_PMD_NR)
>> count_vm_event(THP_FILE_FALLBACK);
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> + count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
>> +#endif
>
> Seems like we don't need these conditional compilation directives here.
>
> The THP_FILE_FALLBACK above would already cause a compilation error if
> CONFIG_TRANSPARENT_HUGEPAGE is not defined, so we don't need to
> worry about that :)
>
> See THP_FILE_FALLBACK in include/linux/vm_event_item.h.
>
>> order = next_order(&suitable_orders, order);
>> }
>> } else {
>> @@ -1805,9 +1808,15 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
>> if (xa_find(&mapping->i_pages, &index,
>> index + pages - 1, XA_PRESENT)) {
>> error = -EEXIST;
>> - } else if (pages == HPAGE_PMD_NR) {
>> - count_vm_event(THP_FILE_FALLBACK);
>> - count_vm_event(THP_FILE_FALLBACK_CHARGE);
>> + } else if (pages > 1) {
>> + if (pages == HPAGE_PMD_NR) {
>> + count_vm_event(THP_FILE_FALLBACK);
>> + count_vm_event(THP_FILE_FALLBACK_CHARGE);
>> + }
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> + count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
>> + count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE);
>> +#endif
>
> As above.
>
>> }
>> goto unlock;
>> }
>> @@ -2178,6 +2187,9 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>> if (!IS_ERR(folio)) {
>> if (folio_test_pmd_mappable(folio))
>> count_vm_event(THP_FILE_ALLOC);
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> + count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
>> +#endif
>
> As above.
>
> Perhaps we should define MTHP_STAT_FILE_ALLOC and friends the same
> way as THP_FILE_ALLOC, i.e. as '({ BUILD_BUG(); 0; })' if
> CONFIG_TRANSPARENT_HUGEPAGE is not defined.
>
> Something like:
>
> #ifndef CONFIG_TRANSPARENT_HUGEPAGE
> #define MTHP_STAT_FILE_ALLOC ({ BUILD_BUG(); 0; })
> ...
> #endif
This is not enough: we would also need to define a dummy count_mthp_stat()
for when CONFIG_TRANSPARENT_HUGEPAGE is not enabled. I also went back and
forth on this before, but adding the #ifdef guards seemed relatively
simple :)
Thanks for reviewing.
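The dummy-function approach mentioned above can be illustrated with a small userspace sketch. If count_mthp_stat() has an empty inline stub when CONFIG_TRANSPARENT_HUGEPAGE is off, call sites need no #ifdef guards. The CONFIG_ macro, PMD_ORDER and the global stats array are hypothetical stand-ins for Kconfig and the kernel's per-CPU counters. Ignoring order-0 and orders above PMD_ORDER is an assumption about the helper's behaviour.

```c
#include <assert.h>

/* Hypothetical stand-in for the Kconfig symbol; remove it to build the stub. */
#define CONFIG_TRANSPARENT_HUGEPAGE 1

#define PMD_ORDER 9	/* assumption: 4 KiB pages */

enum mthp_stat_item {
	MTHP_STAT_FILE_ALLOC,
	MTHP_STAT_FILE_FALLBACK,
	MTHP_STAT_FILE_FALLBACK_CHARGE,
	__MTHP_STAT_COUNT
};

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/* Hypothetical global counters (the kernel uses per-CPU ones). */
static unsigned long mthp_stats[PMD_ORDER + 1][__MTHP_STAT_COUNT];

static inline void count_mthp_stat(int order, enum mthp_stat_item item)
{
	if (order <= 0 || order > PMD_ORDER)
		return;		/* only large orders up to PMD are tracked */
	mthp_stats[order][item]++;
}
#else
/* Dummy stub: compiles to nothing, so callers need no #ifdef. */
static inline void count_mthp_stat(int order, enum mthp_stat_item item)
{
	(void)order;
	(void)item;
}
#endif

/* A call site written without any conditional compilation. */
static void account_file_fallback(int order)
{
	count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
	count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK_CHARGE);
}
```

The trade-off is one extra stub in the header versus #ifdef blocks at every call site.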