2023-01-19 21:33:01

by Sidhartha Kumar

Subject: [PATCH 0/9] convert hugetlb fault functions to folios

============== OVERVIEW ===========================
This series converts the hugetlb page faulting functions to operate on
folios. These include hugetlb_no_page(), hugetlb_wp(),
copy_hugetlb_page_range(), and hugetlb_mcopy_atomic_pte().

patches 1-3:
- convert prerequisite helper functions to folios.
patch 4:
- add a folio variable to the hugetlb fault functions and start
a partial conversion to folios.
patch 5:
- fully convert hugetlb fault functions to folios.
patches 6-8:
- convert three functions to take in a folio rather than a page as
all callers now use folios.
patch 9:
- update documentation that references alloc_huge_page

============== TEST COVERAGE ============================

Linux Test Project Hugetlb Test Fault Coverage

[opc@sidhakum-devel kernel (master)]$ sudo ./funccount 'hugetlb_no_page'
FUNC COUNT
hugetlb_no_page 7796

[opc@sidhakum-devel kernel (master)]$ sudo ./funccount 'hugetlb_wp'
FUNC COUNT
hugetlb_wp 4623

Using fallocate commands to create files on hugetlbfs

[opc@sidhakum-devel kernel (master)]$ sudo ./funccount 'hugetlbfs*'
FUNC COUNT
hugetlbfs_fallocate 1
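
For reference, the fallocate exercise is minimal; a roughly equivalent C
sketch (the file path and length here are assumptions, not the exact
commands used) is:

	/* Preallocate one 2MB huge page in a file on a hugetlbfs mount. */
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("/dev/hugepages/hugefile", O_CREAT | O_RDWR, 0644);

		if (fd < 0)
			return 1;
		if (fallocate(fd, 0, 0, 2UL * 1024 * 1024) < 0)
			return 1;
		close(fd);
		return 0;
	}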


Userfaultfd selftest
./userfaultfd hugetlb 256 50 /dev/hugepages/hugefile

[opc@sidhakum-devel kernel (master)]$ sudo ./funccount 'hugetlb_mcopy_atomic_pte'
FUNC COUNT
hugetlb_mcopy_atomic_pte 6240


[opc@sidhakum-devel kernel (master)]$ sudo ./funccount 'copy_hugetlb_page_range'
FUNC COUNT
copy_hugetlb_page_range 3

============== PERFORMANCE ============================
Using bpftrace to track the time (in nanoseconds) spent in the fault
functions over 10 rounds of the LTP hugetlb tests:

pre-patch:

@hugetlb_wp_nsecs:
[256, 512) 3675 |@@@@@@@@@@ |
[512, 1K) 18875 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1K, 2K) 1366 |@@@ |
[2K, 4K) 77 | |
[4K, 8K) 12 | |
[8K, 16K) 10 | |
[16K, 32K) 2 | |
[32K, 64K) 0 | |
[64K, 128K) 0 | |
[128K, 256K) 0 | |
[256K, 512K) 0 | |
[512K, 1M) 0 | |
[1M, 2M) 0 | |
[2M, 4M) 0 | |
[4M, 8M) 0 | |
[8M, 16M) 0 | |
[16M, 32M) 0 | |
[32M, 64M) 0 | |
[64M, 128M) 0 | |
[128M, 256M) 0 | |
[256M, 512M) 0 | |
[512M, 1G) 0 | |
[1G, 2G) 0 | |
[2G, 4G) 1 | |

@hugetlb_no_page_nsecs:
[64, 128) 1 | |
[128, 256) 0 | |
[256, 512) 67 | |
[512, 1K) 66 | |
[1K, 2K) 65 | |
[2K, 4K) 198 | |
[4K, 8K) 97 | |
[8K, 16K) 3 | |
[16K, 32K) 4 | |
[32K, 64K) 678 | |
[64K, 128K) 3401 |@ |
[128K, 256K) 96746 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[256K, 512K) 107 | |
[512K, 1M) 0 | |
[1M, 2M) 1 | |
[2M, 4M) 0 | |
[4M, 8M) 1 | |
[8M, 16M) 0 | |
[16M, 32M) 0 | |
[32M, 64M) 0 | |
[64M, 128M) 0 | |
[128M, 256M) 0 | |
[256M, 512M) 0 | |
[512M, 1G) 0 | |
[1G, 2G) 0 | |
[2G, 4G) 283 | |


post-patch:

@hugetlb_wp_nsecs:
[256, 512) 7640 |@@@@@@@@@@@@@@@@@ |
[512, 1K) 23314 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1K, 2K) 1171 |@@ |
[2K, 4K) 60 | |
[4K, 8K) 14 | |
[8K, 16K) 29 | |
[16K, 32K) 2 | |

@hugetlb_no_page_nsecs:
[256, 512) 64 | |
[512, 1K) 75 | |
[1K, 2K) 70 | |
[2K, 4K) 209 | |
[4K, 8K) 73 | |
[8K, 16K) 5 | |
[16K, 32K) 1 | |
[32K, 64K) 2562 |@ |
[64K, 128K) 7084 |@@@ |
[128K, 256K) 99292 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[256K, 512K) 66 | |
[512K, 1M) 0 | |
[1M, 2M) 1 | |
[2M, 4M) 0 | |
[4M, 8M) 0 | |
[8M, 16M) 0 | |
[16M, 32M) 0 | |
[32M, 64M) 0 | |
[64M, 128M) 0 | |
[128M, 256M) 0 | |
[256M, 512M) 0 | |
[512M, 1G) 0 | |
[1G, 2G) 0 | |
[2G, 4G) 428 | |



rebased on 01/18/2023

Sidhartha Kumar (9):
mm/hugetlb: convert hugetlb_install_page to folios
mm/hugetlb: convert hugetlbfs_pagecache_present() to folios
mm/hugetlb: convert putback_active_hugepage to take in a folio
mm/rmap: change hugepage_add_new_anon_rmap to take in a folio
mm/hugetlb: convert alloc_huge_page to alloc_hugetlb_folio
mm/hugetlb: convert restore_reserve_on_error to take in a folio
mm/hugetlb: convert hugetlb_add_to_page_cache to take in a folio
mm/hugetlb: convert hugetlb_wp() to take in a folio
Documentation/mm: update hugetlbfs documentation to mention
alloc_hugetlb_folio

Documentation/mm/hugetlbfs_reserv.rst | 21 +-
.../zh_CN/mm/hugetlbfs_reserv.rst | 14 +-
fs/hugetlbfs/inode.c | 38 +--
include/linux/hugetlb.h | 16 +-
include/linux/rmap.h | 2 +-
mm/hugetlb.c | 249 +++++++++---------
mm/mempolicy.c | 6 +-
mm/migrate.c | 8 +-
mm/rmap.c | 6 +-
9 files changed, 179 insertions(+), 181 deletions(-)

--
2.39.0


2023-01-19 21:33:09

by Sidhartha Kumar

Subject: [PATCH 7/9] mm/hugetlb: convert hugetlb_add_to_page_cache to take in a folio

Every caller of hugetlb_add_to_page_cache() now passes in &folio->page, so
change the function to take in a folio directly and clean up the call
sites.

Signed-off-by: Sidhartha Kumar <[email protected]>
---
fs/hugetlbfs/inode.c | 2 +-
include/linux/hugetlb.h | 2 +-
mm/hugetlb.c | 9 ++++-----
3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 880ebc58f330..76e146bdcba0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -869,7 +869,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
}
clear_huge_page(&folio->page, addr, pages_per_huge_page(h));
__folio_mark_uptodate(folio);
- error = hugetlb_add_to_page_cache(&folio->page, mapping, index);
+ error = hugetlb_add_to_page_cache(folio, mapping, index);
if (unlikely(error)) {
restore_reserve_on_error(h, &pseudo_vma, addr, folio);
folio_put(folio);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e76be7e8df2c..2dcae9bb4495 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -722,7 +722,7 @@ struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
nodemask_t *nmask, gfp_t gfp_mask);
struct folio *alloc_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
-int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
+int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
pgoff_t idx);
void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
unsigned long address, struct folio *folio);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9568d49c12d6..4ab3eda6db18 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5663,10 +5663,9 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
return folio != NULL;
}

-int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
+int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
pgoff_t idx)
{
- struct folio *folio = page_folio(page);
struct inode *inode = mapping->host;
struct hstate *h = hstate_inode(inode);
int err;
@@ -5678,7 +5677,7 @@ int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
__folio_clear_locked(folio);
return err;
}
- ClearHPageRestoreReserve(page);
+ folio_clear_hugetlb_restore_reserve(folio);

/*
* mark folio dirty so that it will not be removed from cache/file
@@ -5837,7 +5836,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
new_page = true;

if (vma->vm_flags & VM_MAYSHARE) {
- int err = hugetlb_add_to_page_cache(&folio->page, mapping, idx);
+ int err = hugetlb_add_to_page_cache(folio, mapping, idx);
if (err) {
/*
* err can't be -EEXIST which implies someone
@@ -6269,7 +6268,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
* hugetlb_fault_mutex_table that here must be hold by
* the caller.
*/
- ret = hugetlb_add_to_page_cache(&folio->page, mapping, idx);
+ ret = hugetlb_add_to_page_cache(folio, mapping, idx);
if (ret)
goto out_release_nounlock;
page_in_pagecache = true;
--
2.39.0

2023-01-19 21:33:10

by Sidhartha Kumar

Subject: [PATCH 2/9] mm/hugetlb: convert hugetlbfs_pagecache_present() to folios

Convert hugetlbfs_pagecache_present() to use folios internally by
replacing a call to find_get_page() with filemap_get_folio().

Signed-off-by: Sidhartha Kumar <[email protected]>
---
mm/hugetlb.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 849206e94742..04cbdf5025a5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5653,15 +5653,15 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
{
struct address_space *mapping;
pgoff_t idx;
- struct page *page;
+ struct folio *folio;

mapping = vma->vm_file->f_mapping;
idx = vma_hugecache_offset(h, vma, address);

- page = find_get_page(mapping, idx);
- if (page)
- put_page(page);
- return page != NULL;
+ folio = filemap_get_folio(mapping, idx);
+ if (folio)
+ folio_put(folio);
+ return folio != NULL;
}

int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
--
2.39.0

2023-01-19 21:34:20

by Sidhartha Kumar

Subject: [PATCH 6/9] mm/hugetlb: convert restore_reserve_on_error to take in a folio

Every caller of restore_reserve_on_error() now passes in &folio->page, so
change the function to take in a folio directly and clean up the call
sites.

Signed-off-by: Sidhartha Kumar <[email protected]>
---
fs/hugetlbfs/inode.c | 2 +-
include/linux/hugetlb.h | 2 +-
mm/hugetlb.c | 21 ++++++++++-----------
3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index e2f8951103be..880ebc58f330 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -871,7 +871,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
__folio_mark_uptodate(folio);
error = hugetlb_add_to_page_cache(&folio->page, mapping, index);
if (unlikely(error)) {
- restore_reserve_on_error(h, &pseudo_vma, addr, &folio->page);
+ restore_reserve_on_error(h, &pseudo_vma, addr, folio);
folio_put(folio);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
goto out;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8f054d81e63b..e76be7e8df2c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -725,7 +725,7 @@ struct folio *alloc_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *v
int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t idx);
void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
- unsigned long address, struct page *page);
+ unsigned long address, struct folio *folio);

/* arch callback */
int __init __alloc_bootmem_huge_page(struct hstate *h, int nid);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 46467e80716f..9568d49c12d6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2802,9 +2802,9 @@ static long vma_del_reservation(struct hstate *h,
* and the hugetlb mutex should remain held when calling this routine.
*
* It handles two specific cases:
- * 1) A reservation was in place and the page consumed the reservation.
- * HPageRestoreReserve is set in the page.
- * 2) No reservation was in place for the page, so HPageRestoreReserve is
+ * 1) A reservation was in place and the folio consumed the reservation.
+ * hugetlb_restore_reserve is set in the folio.
+ * 2) No reservation was in place for the page, so hugetlb_restore_reserve is
* not set. However, alloc_hugetlb_folio always updates the reserve map.
*
* In case 1, free_huge_page later in the error path will increment the
@@ -2817,9 +2817,8 @@ static long vma_del_reservation(struct hstate *h,
* In case 2, simply undo reserve map modifications done by alloc_hugetlb_folio.
*/
void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
- unsigned long address, struct page *page)
+ unsigned long address, struct folio *folio)
{
- struct folio *folio = page_folio(page);
long rc = vma_needs_reservation(h, vma, address);

if (folio_test_hugetlb_restore_reserve(folio)) {
@@ -5102,7 +5101,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
entry = huge_ptep_get(src_pte);
if (!pte_same(src_pte_old, entry)) {
restore_reserve_on_error(h, dst_vma, addr,
- &new_folio->page);
+ new_folio);
folio_put(new_folio);
/* huge_ptep of dst_pte won't change as in child */
goto again;
@@ -5633,7 +5632,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
* unshare)
*/
if (new_folio != page_folio(old_page))
- restore_reserve_on_error(h, vma, haddr, &new_folio->page);
+ restore_reserve_on_error(h, vma, haddr, new_folio);
folio_put(new_folio);
out_release_old:
put_page(old_page);
@@ -5847,7 +5846,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
* to the page cache. So it's safe to call
* restore_reserve_on_error() here.
*/
- restore_reserve_on_error(h, vma, haddr, &folio->page);
+ restore_reserve_on_error(h, vma, haddr, folio);
folio_put(folio);
goto out;
}
@@ -5948,7 +5947,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
spin_unlock(ptl);
backout_unlocked:
if (new_page && !new_pagecache_page)
- restore_reserve_on_error(h, vma, haddr, &folio->page);
+ restore_reserve_on_error(h, vma, haddr, folio);

folio_unlock(folio);
folio_put(folio);
@@ -6211,7 +6210,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
/* Free the allocated folio which may have
* consumed a reservation.
*/
- restore_reserve_on_error(h, dst_vma, dst_addr, &folio->page);
+ restore_reserve_on_error(h, dst_vma, dst_addr, folio);
folio_put(folio);

/* Allocate a temporary folio to hold the copied
@@ -6339,7 +6338,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
folio_unlock(folio);
out_release_nounlock:
if (!page_in_pagecache)
- restore_reserve_on_error(h, dst_vma, dst_addr, &folio->page);
+ restore_reserve_on_error(h, dst_vma, dst_addr, folio);
folio_put(folio);
goto out;
}
--
2.39.0

2023-01-19 21:54:03

by Sidhartha Kumar

Subject: [PATCH 9/9] Documentation/mm: update hugetlbfs documentation to mention alloc_hugetlb_folio

alloc_huge_page() has been renamed to alloc_hugetlb_folio(), so update the
documentation that mentions alloc_huge_page(). Subpool information has also
moved from page->private to a dedicated field in struct folio; reflect this
in the documentation.

Signed-off-by: Sidhartha Kumar <[email protected]>
---
Documentation/mm/hugetlbfs_reserv.rst | 21 ++++++++++---------
.../zh_CN/mm/hugetlbfs_reserv.rst | 14 ++++++-------
2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/Documentation/mm/hugetlbfs_reserv.rst b/Documentation/mm/hugetlbfs_reserv.rst
index f143954e0d05..611728c49bff 100644
--- a/Documentation/mm/hugetlbfs_reserv.rst
+++ b/Documentation/mm/hugetlbfs_reserv.rst
@@ -181,14 +181,14 @@ Consuming Reservations/Allocating a Huge Page

Reservations are consumed when huge pages associated with the reservations
are allocated and instantiated in the corresponding mapping. The allocation
-is performed within the routine alloc_huge_page()::
+is performed within the routine alloc_hugetlb_folio()::

- struct page *alloc_huge_page(struct vm_area_struct *vma,
+ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve)

-alloc_huge_page is passed a VMA pointer and a virtual address, so it can
+alloc_hugetlb_folio is passed a VMA pointer and a virtual address, so it can
consult the reservation map to determine if a reservation exists. In addition,
-alloc_huge_page takes the argument avoid_reserve which indicates reserves
+alloc_hugetlb_folio takes the argument avoid_reserve which indicates reserves
should not be used even if it appears they have been set aside for the
specified address. The avoid_reserve argument is most often used in the case
of Copy on Write and Page Migration where additional copies of an existing
@@ -208,7 +208,8 @@ a reservation for the allocation. After determining whether a reservation
exists and can be used for the allocation, the routine dequeue_huge_page_vma()
is called. This routine takes two arguments related to reservations:

-- avoid_reserve, this is the same value/argument passed to alloc_huge_page()
+- avoid_reserve, this is the same value/argument passed to
+ alloc_hugetlb_folio().
- chg, even though this argument is of type long only the values 0 or 1 are
passed to dequeue_huge_page_vma. If the value is 0, it indicates a
reservation exists (see the section "Memory Policy and Reservations" for
@@ -233,9 +234,9 @@ the scope reservations. Even if a surplus page is allocated, the same
reservation based adjustments as above will be made: SetPagePrivate(page) and
resv_huge_pages--.

-After obtaining a new huge page, (page)->private is set to the value of
-the subpool associated with the page if it exists. This will be used for
-subpool accounting when the page is freed.
+After obtaining a new hugetlb folio, (folio)->_hugetlb_subpool is set to the
+value of the subpool associated with the page if it exists. This will be used
+for subpool accounting when the folio is freed.

The routine vma_commit_reservation() is then called to adjust the reserve
map based on the consumption of the reservation. In general, this involves
@@ -246,8 +247,8 @@ was no reservation in a shared mapping or this was a private mapping a new
entry must be created.

It is possible that the reserve map could have been changed between the call
-to vma_needs_reservation() at the beginning of alloc_huge_page() and the
-call to vma_commit_reservation() after the page was allocated. This would
+to vma_needs_reservation() at the beginning of alloc_hugetlb_folio() and the
+call to vma_commit_reservation() after the folio was allocated. This would
be possible if hugetlb_reserve_pages was called for the same page in a shared
mapping. In such cases, the reservation count and subpool free page count
will be off by one. This rare condition can be identified by comparing the
diff --git a/Documentation/translations/zh_CN/mm/hugetlbfs_reserv.rst b/Documentation/translations/zh_CN/mm/hugetlbfs_reserv.rst
index 752e5696cd47..826a50c47389 100644
--- a/Documentation/translations/zh_CN/mm/hugetlbfs_reserv.rst
+++ b/Documentation/translations/zh_CN/mm/hugetlbfs_reserv.rst
@@ -142,14 +142,14 @@ HPAGE_RESV_OWNER标志被设置,以表明该VMA拥有预留。
消耗预留/分配一个巨页
===========================

-当与预留相关的巨页在相应的映射中被分配和实例化时,预留就被消耗了。该分配是在函数alloc_huge_page()
+当与预留相关的巨页在相应的映射中被分配和实例化时,预留就被消耗了。该分配是在函数alloc_hugetlb_folio()
中进行的::

- struct page *alloc_huge_page(struct vm_area_struct *vma,
+ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve)

-alloc_huge_page被传递给一个VMA指针和一个虚拟地址,因此它可以查阅预留映射以确定是否存在预留。
-此外,alloc_huge_page需要一个参数avoid_reserve,该参数表示即使看起来已经为指定的地址预留了
+alloc_hugetlb_folio被传递给一个VMA指针和一个虚拟地址,因此它可以查阅预留映射以确定是否存在预留。
+此外,alloc_hugetlb_folio需要一个参数avoid_reserve,该参数表示即使看起来已经为指定的地址预留了
预留,也不应该使用预留。avoid_reserve参数最常被用于写时拷贝和页面迁移的情况下,即现有页面的额
外拷贝被分配。

@@ -162,7 +162,7 @@ vma_needs_reservation()返回的值通常为0或1。如果该地址存在预留
确定预留是否存在并可用于分配后,调用dequeue_huge_page_vma()函数。这个函数需要两个与预留有关
的参数:

-- avoid_reserve,这是传递给alloc_huge_page()的同一个值/参数。
+- avoid_reserve,这是传递给alloc_hugetlb_folio()的同一个值/参数。
- chg,尽管这个参数的类型是long,但只有0或1的值被传递给dequeue_huge_page_vma。如果该值为0,
则表明存在预留(关于可能的问题,请参见 “预留和内存策略” 一节)。如果值
为1,则表示不存在预留,如果可能的话,必须从全局空闲池中取出该页。
@@ -179,7 +179,7 @@ free_huge_pages的值被递减。如果有一个与该页相关的预留,将
的剩余巨页和超额分配的问题。即使分配了一个多余的页面,也会进行与上面一样的基于预留的调整:
SetPagePrivate(page) 和 resv_huge_pages--.

-在获得一个新的巨页后,(page)->private被设置为与该页面相关的子池的值,如果它存在的话。当页
+在获得一个新的巨页后,(folio)->_hugetlb_subpool被设置为与该页面相关的子池的值,如果它存在的话。当页
面被释放时,这将被用于子池的计数。

然后调用函数vma_commit_reservation(),根据预留的消耗情况调整预留映射。一般来说,这涉及
@@ -199,7 +199,7 @@ SetPagePrivate(page)和resv_huge_pages-。
已经存在,所以不做任何改变。然而,如果共享映射中没有预留,或者这是一个私有映射,则必须创建
一个新的条目。

-在alloc_huge_page()开始调用vma_needs_reservation()和页面分配后调用
+在alloc_hugetlb_folio()开始调用vma_needs_reservation()和页面分配后调用
vma_commit_reservation()之间,预留映射有可能被改变。如果hugetlb_reserve_pages在共
享映射中为同一页面被调用,这将是可能的。在这种情况下,预留计数和子池空闲页计数会有一个偏差。
这种罕见的情况可以通过比较vma_needs_reservation和vma_commit_reservation的返回值来
--
2.39.0

2023-01-19 21:56:29

by Sidhartha Kumar

Subject: [PATCH 5/9] mm/hugetlb: convert alloc_huge_page to alloc_hugetlb_folio

Change alloc_huge_page() to alloc_hugetlb_folio() by changing all callers
to handle the function's new folio return type. In this conversion,
alloc_huge_page_vma() is also changed to alloc_hugetlb_folio_vma(). Many of
the &folio->page uses added here are cleaned up in subsequent patches.

Signed-off-by: Sidhartha Kumar <[email protected]>
---
fs/hugetlbfs/inode.c | 38 +++++++-------
include/linux/hugetlb.h | 8 +--
mm/hugetlb.c | 113 +++++++++++++++++-----------------------
mm/mempolicy.c | 6 ++-
4 files changed, 76 insertions(+), 89 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 48f1a8ad2243..e2f8951103be 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -819,7 +819,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
* This is supposed to be the vaddr where the page is being
* faulted in, but we have no vaddr here.
*/
- struct page *page;
+ struct folio *folio;
unsigned long addr;

cond_resched();
@@ -844,48 +844,48 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
mutex_lock(&hugetlb_fault_mutex_table[hash]);

/* See if already present in mapping to avoid alloc/free */
- page = find_get_page(mapping, index);
- if (page) {
- put_page(page);
+ folio = filemap_get_folio(mapping, index);
+ if (folio) {
+ folio_put(folio);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
hugetlb_drop_vma_policy(&pseudo_vma);
continue;
}

/*
- * Allocate page without setting the avoid_reserve argument.
+ * Allocate folio without setting the avoid_reserve argument.
* There certainly are no reserves associated with the
* pseudo_vma. However, there could be shared mappings with
* reserves for the file at the inode level. If we fallocate
- * pages in these areas, we need to consume the reserves
+ * folios in these areas, we need to consume the reserves
* to keep reservation accounting consistent.
*/
- page = alloc_huge_page(&pseudo_vma, addr, 0);
+ folio = alloc_hugetlb_folio(&pseudo_vma, addr, 0);
hugetlb_drop_vma_policy(&pseudo_vma);
- if (IS_ERR(page)) {
+ if (IS_ERR(folio)) {
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
- error = PTR_ERR(page);
+ error = PTR_ERR(folio);
goto out;
}
- clear_huge_page(page, addr, pages_per_huge_page(h));
- __SetPageUptodate(page);
- error = hugetlb_add_to_page_cache(page, mapping, index);
+ clear_huge_page(&folio->page, addr, pages_per_huge_page(h));
+ __folio_mark_uptodate(folio);
+ error = hugetlb_add_to_page_cache(&folio->page, mapping, index);
if (unlikely(error)) {
- restore_reserve_on_error(h, &pseudo_vma, addr, page);
- put_page(page);
+ restore_reserve_on_error(h, &pseudo_vma, addr, &folio->page);
+ folio_put(folio);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
goto out;
}

mutex_unlock(&hugetlb_fault_mutex_table[hash]);

- SetHPageMigratable(page);
+ folio_set_hugetlb_migratable(folio);
/*
- * unlock_page because locked by hugetlb_add_to_page_cache()
- * put_page() due to reference from alloc_huge_page()
+ * folio_unlock because locked by hugetlb_add_to_page_cache()
+ * folio_put() due to reference from alloc_hugetlb_folio()
*/
- unlock_page(page);
- put_page(page);
+ folio_unlock(folio);
+ folio_put(folio);
}

if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index f88c832bdfa4..8f054d81e63b 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -716,11 +716,11 @@ struct huge_bootmem_page {
};

int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
-struct page *alloc_huge_page(struct vm_area_struct *vma,
+struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
nodemask_t *nmask, gfp_t gfp_mask);
-struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
+struct folio *alloc_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t idx);
@@ -1032,7 +1032,7 @@ static inline int isolate_or_dissolve_huge_page(struct page *page,
return -ENOMEM;
}

-static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
+static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr,
int avoid_reserve)
{
@@ -1046,7 +1046,7 @@ alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
return NULL;
}

-static inline struct page *alloc_huge_page_vma(struct hstate *h,
+static inline struct folio *alloc_hugetlb_folio_vma(struct hstate *h,
struct vm_area_struct *vma,
unsigned long address)
{
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6f25055c3ba5..46467e80716f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2493,7 +2493,7 @@ struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
}

/* mempolicy aware migration callback */
-struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
+struct folio *alloc_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address)
{
struct mempolicy *mpol;
@@ -2507,7 +2507,7 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
folio = alloc_hugetlb_folio_nodemask(h, node, nodemask, gfp_mask);
mpol_cond_put(mpol);

- return &folio->page;
+ return folio;
}

/*
@@ -2798,14 +2798,14 @@ static long vma_del_reservation(struct hstate *h,

/*
* This routine is called to restore reservation information on error paths.
- * It should ONLY be called for pages allocated via alloc_huge_page(), and
- * the hugetlb mutex should remain held when calling this routine.
+ * It should ONLY be called for folios allocated via alloc_hugetlb_folio(),
+ * and the hugetlb mutex should remain held when calling this routine.
*
* It handles two specific cases:
* 1) A reservation was in place and the page consumed the reservation.
* HPageRestoreReserve is set in the page.
* 2) No reservation was in place for the page, so HPageRestoreReserve is
- * not set. However, alloc_huge_page always updates the reserve map.
+ * not set. However, alloc_hugetlb_folio always updates the reserve map.
*
* In case 1, free_huge_page later in the error path will increment the
* global reserve count. But, free_huge_page does not have enough context
@@ -2814,7 +2814,7 @@ static long vma_del_reservation(struct hstate *h,
* reserve count adjustments to be made by free_huge_page. Make sure the
* reserve map indicates there is a reservation present.
*
- * In case 2, simply undo reserve map modifications done by alloc_huge_page.
+ * In case 2, simply undo reserve map modifications done by alloc_hugetlb_folio.
*/
void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
unsigned long address, struct page *page)
@@ -2844,8 +2844,8 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
if (!rc) {
/*
* This indicates there is an entry in the reserve map
- * not added by alloc_huge_page. We know it was added
- * before the alloc_huge_page call, otherwise
+ * not added by alloc_hugetlb_folio. We know it was added
+ * before the alloc_hugetlb_folio call, otherwise
* hugetlb_restore_reserve would be set on the folio.
* Remove the entry so that a subsequent allocation
* does not consume a reservation.
@@ -3014,7 +3014,7 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
return ret;
}

-struct page *alloc_huge_page(struct vm_area_struct *vma,
+struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve)
{
struct hugepage_subpool *spool = subpool_vma(vma);
@@ -3130,7 +3130,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
pages_per_huge_page(h), folio);
}
- return &folio->page;
+ return folio;

out_uncharge_cgroup:
hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg);
@@ -5080,34 +5080,34 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
} else if (page_try_dup_anon_rmap(ptepage, true,
src_vma)) {
pte_t src_pte_old = entry;
- struct page *new;
+ struct folio *new_folio;

spin_unlock(src_ptl);
spin_unlock(dst_ptl);
/* Do not use reserve as it's private owned */
- new = alloc_huge_page(dst_vma, addr, 1);
- if (IS_ERR(new)) {
+ new_folio = alloc_hugetlb_folio(dst_vma, addr, 1);
+ if (IS_ERR(new_folio)) {
put_page(ptepage);
- ret = PTR_ERR(new);
+ ret = PTR_ERR(new_folio);
break;
}
- copy_user_huge_page(new, ptepage, addr, dst_vma,
+ copy_user_huge_page(&new_folio->page, ptepage, addr, dst_vma,
npages);
put_page(ptepage);

- /* Install the new huge page if src pte stable */
+ /* Install the new hugetlb folio if src pte stable */
dst_ptl = huge_pte_lock(h, dst, dst_pte);
src_ptl = huge_pte_lockptr(h, src, src_pte);
spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
entry = huge_ptep_get(src_pte);
if (!pte_same(src_pte_old, entry)) {
restore_reserve_on_error(h, dst_vma, addr,
- new);
- put_page(new);
+ &new_folio->page);
+ folio_put(new_folio);
/* huge_ptep of dst_pte won't change as in child */
goto again;
}
- hugetlb_install_folio(dst_vma, dst_pte, addr, page_folio(new));
+ hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio);
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
continue;
@@ -5478,8 +5478,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
const bool unshare = flags & FAULT_FLAG_UNSHARE;
pte_t pte;
struct hstate *h = hstate_vma(vma);
- struct page *old_page, *new_page;
- struct folio *new_folio = NULL;
+ struct page *old_page;
+ struct folio *new_folio;
int outside_reserve = 0;
vm_fault_t ret = 0;
unsigned long haddr = address & huge_page_mask(h);
@@ -5540,9 +5540,9 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
* be acquired again before returning to the caller, as expected.
*/
spin_unlock(ptl);
- new_page = alloc_huge_page(vma, haddr, outside_reserve);
+ new_folio = alloc_hugetlb_folio(vma, haddr, outside_reserve);

- if (IS_ERR(new_page)) {
+ if (IS_ERR(new_folio)) {
/*
* If a process owning a MAP_PRIVATE mapping fails to COW,
* it is due to references held by a child and an insufficient
@@ -5587,13 +5587,9 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
return 0;
}

- ret = vmf_error(PTR_ERR(new_page));
+ ret = vmf_error(PTR_ERR(new_folio));
goto out_release_old;
}
-
- if (new_page)
- new_folio = page_folio(new_page);
-
/*
* When the original hugepage is shared one, it does not have
* anon_vma prepared.
@@ -5627,7 +5623,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
make_huge_pte(vma, &new_folio->page, !unshare));
folio_set_hugetlb_migratable(new_folio);
/* Make the old page be freed below */
- new_page = old_page;
+ new_folio = page_folio(old_page);
}
spin_unlock(ptl);
mmu_notifier_invalidate_range_end(&range);
@@ -5636,7 +5632,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
* No restore in case of successful pagetable update (Break COW or
* unshare)
*/
- if (new_page != old_page)
+ if (new_folio != page_folio(old_page))
restore_reserve_on_error(h, vma, haddr, &new_folio->page);
folio_put(new_folio);
out_release_old:
@@ -5759,8 +5755,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
vm_fault_t ret = VM_FAULT_SIGBUS;
int anon_rmap = 0;
unsigned long size;
- struct page *page;
- struct folio *folio = NULL;
+ struct folio *folio;
pte_t new_pte;
spinlock_t *ptl;
unsigned long haddr = address & huge_page_mask(h);
@@ -5784,8 +5779,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
* before we get page_table_lock.
*/
new_page = false;
- page = find_lock_page(mapping, idx);
- if (!page) {
+ folio = filemap_lock_folio(mapping, idx);
+ if (!folio) {
size = i_size_read(mapping->host) >> huge_page_shift(h);
if (idx >= size)
goto out;
@@ -5818,8 +5813,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
VM_UFFD_MISSING);
}

- page = alloc_huge_page(vma, haddr, 0);
- if (IS_ERR(page)) {
+ folio = alloc_hugetlb_folio(vma, haddr, 0);
+ if (IS_ERR(folio)) {
/*
* Returning error will result in faulting task being
* sent SIGBUS. The hugetlb fault mutex prevents two
@@ -5833,15 +5828,11 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
* sure there really is no pte entry.
*/
if (hugetlb_pte_stable(h, mm, ptep, old_pte))
- ret = vmf_error(PTR_ERR(page));
+ ret = vmf_error(PTR_ERR(folio));
else
ret = 0;
goto out;
}
-
- if (page)
- folio = page_folio(page);
-
clear_huge_page(&folio->page, address, pages_per_huge_page(h));
__folio_mark_uptodate(folio);
new_page = true;
@@ -5870,7 +5861,6 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
anon_rmap = 1;
}
} else {
- folio = page_folio(page);
/*
* If memory error occurs between mmap() and fault, some process
* don't have hwpoisoned swap entry for errored virtual address.
@@ -6185,15 +6175,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
pte_t _dst_pte;
spinlock_t *ptl;
int ret = -ENOMEM;
- struct page *page;
- struct folio *folio = NULL;
+ struct folio *folio;
int writable;
bool page_in_pagecache = false;

if (is_continue) {
ret = -EFAULT;
- page = find_lock_page(mapping, idx);
- if (!page)
+ folio = filemap_lock_folio(mapping, idx);
+ if (!folio)
goto out;
page_in_pagecache = true;
} else if (!*pagep) {
@@ -6206,34 +6195,34 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
goto out;
}

- page = alloc_huge_page(dst_vma, dst_addr, 0);
- if (IS_ERR(page)) {
+ folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+ if (IS_ERR(folio)) {
ret = -ENOMEM;
goto out;
}

- ret = copy_huge_page_from_user(page,
+ ret = copy_huge_page_from_user(&folio->page,
(const void __user *) src_addr,
pages_per_huge_page(h), false);

/* fallback to copy_from_user outside mmap_lock */
if (unlikely(ret)) {
ret = -ENOENT;
- /* Free the allocated page which may have
+ /* Free the allocated folio which may have
* consumed a reservation.
*/
- restore_reserve_on_error(h, dst_vma, dst_addr, page);
- put_page(page);
+ restore_reserve_on_error(h, dst_vma, dst_addr, &folio->page);
+ folio_put(folio);

- /* Allocate a temporary page to hold the copied
+ /* Allocate a temporary folio to hold the copied
* contents.
*/
- page = alloc_huge_page_vma(h, dst_vma, dst_addr);
- if (!page) {
+ folio = alloc_hugetlb_folio_vma(h, dst_vma, dst_addr);
+ if (!folio) {
ret = -ENOMEM;
goto out;
}
- *pagep = page;
+ *pagep = &folio->page;
/* Set the outparam pagep and return to the caller to
* copy the contents outside the lock. Don't free the
* page.
@@ -6249,22 +6238,18 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
goto out;
}

- page = alloc_huge_page(dst_vma, dst_addr, 0);
- if (IS_ERR(page)) {
+ folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+ if (IS_ERR(folio)) {
put_page(*pagep);
ret = -ENOMEM;
*pagep = NULL;
goto out;
}
- copy_user_huge_page(page, *pagep, dst_addr, dst_vma,
+ copy_user_huge_page(&folio->page, *pagep, dst_addr, dst_vma,
pages_per_huge_page(h));
put_page(*pagep);
*pagep = NULL;
}
-
- if (page)
- folio = page_folio(page);
-
/*
* The memory barrier inside __folio_mark_uptodate makes sure that
* preceding stores to the page contents become visible before
@@ -6887,7 +6872,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
/*
* pages in this range were added to the reserve
* map between region_chg and region_add. This
- * indicates a race with alloc_huge_page. Adjust
+ * indicates a race with alloc_hugetlb_folio. Adjust
* the subpool and reserve counts modified above
* based on the difference.
*/
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index fd99d303e34f..945b41c245a5 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1218,9 +1218,11 @@ static struct page *new_page(struct page *page, unsigned long start)
break;
}

- if (folio_test_hugetlb(src))
- return alloc_huge_page_vma(page_hstate(&src->page),
+ if (folio_test_hugetlb(src)) {
+ dst = alloc_hugetlb_folio_vma(folio_hstate(src),
vma, address);
+ return &dst->page;
+ }

if (folio_test_large(src))
gfp = GFP_TRANSHUGE;
--
2.39.0

2023-01-19 22:00:03

by Sidhartha Kumar

Subject: [PATCH 8/9] mm/hugetlb: convert hugetlb_wp() to take in a folio

Change the pagecache_page argument of hugetlb_wp() to pagecache_folio and
replace a call to find_lock_page() with filemap_lock_folio().

Signed-off-by: Sidhartha Kumar <[email protected]>
---
mm/hugetlb.c | 34 +++++++++++++++++-----------------
1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4ab3eda6db18..20127271b64c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5472,7 +5472,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
*/
static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, pte_t *ptep, unsigned int flags,
- struct page *pagecache_page, spinlock_t *ptl)
+ struct folio *pagecache_folio, spinlock_t *ptl)
{
const bool unshare = flags & FAULT_FLAG_UNSHARE;
pte_t pte;
@@ -5529,7 +5529,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
* of the full address range.
*/
if (is_vma_resv_set(vma, HPAGE_RESV_OWNER) &&
- old_page != pagecache_page)
+ page_folio(old_page) != pagecache_folio)
outside_reserve = 1;

get_page(old_page);
@@ -5923,7 +5923,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
hugetlb_count_add(pages_per_huge_page(h), mm);
if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
/* Optimization, do the COW without a second fault */
- ret = hugetlb_wp(mm, vma, address, ptep, flags, &folio->page, ptl);
+ ret = hugetlb_wp(mm, vma, address, ptep, flags, folio, ptl);
}

spin_unlock(ptl);
@@ -5986,7 +5986,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
u32 hash;
pgoff_t idx;
struct page *page = NULL;
- struct page *pagecache_page = NULL;
+ struct folio *pagecache_folio = NULL;
struct hstate *h = hstate_vma(vma);
struct address_space *mapping;
int need_wait_lock = 0;
@@ -6068,7 +6068,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
/* Just decrements count, does not deallocate */
vma_end_reservation(h, vma, haddr);

- pagecache_page = find_lock_page(mapping, idx);
+ pagecache_folio = filemap_lock_folio(mapping, idx);
}

ptl = huge_pte_lock(h, mm, ptep);
@@ -6088,9 +6088,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
};

spin_unlock(ptl);
- if (pagecache_page) {
- unlock_page(pagecache_page);
- put_page(pagecache_page);
+ if (pagecache_folio) {
+ folio_unlock(pagecache_folio);
+ folio_put(pagecache_folio);
}
hugetlb_vma_unlock_read(vma);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
@@ -6099,22 +6099,22 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,

/*
* hugetlb_wp() requires page locks of pte_page(entry) and
- * pagecache_page, so here we need take the former one
- * when page != pagecache_page or !pagecache_page.
+ * pagecache_folio, so here we need take the former one
+ * when page != pagecache_folio or !pagecache_folio.
*/
page = pte_page(entry);
- if (page != pagecache_page)
+ if (page_folio(page) != pagecache_folio)
if (!trylock_page(page)) {
need_wait_lock = 1;
goto out_ptl;
}

- get_page(page);
+ folio_get(pagecache_folio);

if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
if (!huge_pte_write(entry)) {
ret = hugetlb_wp(mm, vma, address, ptep, flags,
- pagecache_page, ptl);
+ pagecache_folio, ptl);
goto out_put_page;
} else if (likely(flags & FAULT_FLAG_WRITE)) {
entry = huge_pte_mkdirty(entry);
@@ -6125,15 +6125,15 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
flags & FAULT_FLAG_WRITE))
update_mmu_cache(vma, haddr, ptep);
out_put_page:
- if (page != pagecache_page)
+ if (page_folio(page) != pagecache_folio)
unlock_page(page);
put_page(page);
out_ptl:
spin_unlock(ptl);

- if (pagecache_page) {
- unlock_page(pagecache_page);
- put_page(pagecache_page);
+ if (pagecache_folio) {
+ folio_unlock(pagecache_folio);
+ folio_put(pagecache_folio);
}
out_mutex:
hugetlb_vma_unlock_read(vma);
--
2.39.0

2023-01-20 06:16:32

by Matthew Wilcox

Subject: Re: [PATCH 2/9] mm/hugetlb: convert hugetlbfs_pagecache_present() to folios

On Thu, Jan 19, 2023 at 01:14:39PM -0800, Sidhartha Kumar wrote:
> +++ b/mm/hugetlb.c
> @@ -5653,15 +5653,15 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
> {
> struct address_space *mapping;
> pgoff_t idx;
> - struct page *page;
> + struct folio *folio;
>
> mapping = vma->vm_file->f_mapping;
> idx = vma_hugecache_offset(h, vma, address);
>
> - page = find_get_page(mapping, idx);
> - if (page)
> - put_page(page);
> - return page != NULL;
> + folio = filemap_get_folio(mapping, idx);
> + if (folio)
> + folio_put(folio);
> + return folio != NULL;
> }

Seems to me this function could be ...

struct address_space *mapping = vma->vm_file->f_mapping;
pgoff_t index = vma_hugecache_offset(h, vma, address);
bool present;

rcu_read_lock();
present = page_cache_next_miss(mapping, index, 1) != index;
rcu_read_unlock();

return present;

No need to get/drop a refcount on the folio. It's a bit similar to
filemap_range_has_page(), but the API is wrong. Maybe there's room
for a little refactoring here.

2023-01-20 23:48:09

by Mike Kravetz

Subject: Re: [PATCH 2/9] mm/hugetlb: convert hugetlbfs_pagecache_present() to folios

On 01/20/23 05:43, Matthew Wilcox wrote:
> On Thu, Jan 19, 2023 at 01:14:39PM -0800, Sidhartha Kumar wrote:
> > +++ b/mm/hugetlb.c
> > @@ -5653,15 +5653,15 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
> > {
> > struct address_space *mapping;
> > pgoff_t idx;
> > - struct page *page;
> > + struct folio *folio;
> >
> > mapping = vma->vm_file->f_mapping;
> > idx = vma_hugecache_offset(h, vma, address);
> >
> > - page = find_get_page(mapping, idx);
> > - if (page)
> > - put_page(page);
> > - return page != NULL;
> > + folio = filemap_get_folio(mapping, idx);
> > + if (folio)
> > + folio_put(folio);
> > + return folio != NULL;
> > }
>
> Seems to me this function could be ...
>
> struct address_space *mapping = vma->vm_file->f_mapping;
> pgoff_t index = vma_hugecache_offset(h, vma, address);
> bool present;
>
> rcu_read_lock();
> present = page_cache_next_miss(mapping, index, 1) != index;
> rcu_read_unlock();
>
> return present;
>
> No need to get/drop a refcount on the folio. It's a bit similar to
> filemap_range_has_page(), but the API is wrong. Maybe there's room
> for a little refactoring here.

Thanks Matthew, I did not know those APIs were available. Perhaps just
use page_cache_next_miss as suggested above for now.

FYI - There is the same pattern in hugetlbfs_fallocate()

/* See if already present in mapping to avoid alloc/free */
folio = filemap_get_folio(mapping, index);
if (folio) {
folio_put(folio);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
hugetlb_drop_vma_policy(&pseudo_vma);
continue;
}
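
A hypothetical sketch (not a tested patch) of reusing page_cache_next_miss()
for that check in hugetlbfs_fallocate(), which would drop the reference
get/put there as well:

	bool present;

	/* See if already present in mapping to avoid alloc/free */
	rcu_read_lock();
	present = page_cache_next_miss(mapping, index, 1) != index;
	rcu_read_unlock();
	if (present) {
		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
		hugetlb_drop_vma_policy(&pseudo_vma);
		continue;
	}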

--
Mike Kravetz

2023-01-24 15:24:30

by Gerald Schaefer

Subject: Re: [PATCH 8/9] mm/hugetlb: convert hugetlb_wp() to take in a folio

On Thu, 19 Jan 2023 13:14:45 -0800
Sidhartha Kumar <[email protected]> wrote:

[...]
> page = pte_page(entry);
> - if (page != pagecache_page)
> + if (page_folio(page) != pagecache_folio)
> if (!trylock_page(page)) {
> need_wait_lock = 1;
> goto out_ptl;
> }
>
> - get_page(page);
> + folio_get(pagecache_folio);
>

We get a kernel crash on s390 in the mprotect testcase from the libhugetlbfs
testsuite, starting with next-20230120, bisected to this commit.

We get here with pagecache_folio == NULL, and crash in folio_get().
It doesn't seem right to replace the get_page() with folio_get()
here, the matching put_page() at out_put_page: also wasn't changed
correspondingly. Also, pagecache_folio == NULL seems to be a valid
case here, on all architectures.

Reverting this folio_get() to get_page() fixes the crash. Not sure
though if I missed something. I think you only want to replace
pagecache_page with pagecache_folio, like in the rest of the commit,
and not page -> pagecache_folio for this get_page().

2023-01-24 19:11:22

by Sidhartha Kumar

Subject: Re: [PATCH 8/9] mm/hugetlb: convert hugetlb_wp() to take in a folio

On 1/24/23 7:23 AM, Gerald Schaefer wrote:
> On Thu, 19 Jan 2023 13:14:45 -0800
> Sidhartha Kumar <[email protected]> wrote:
>
> [...]
>> page = pte_page(entry);
>> - if (page != pagecache_page)
>> + if (page_folio(page) != pagecache_folio)
>> if (!trylock_page(page)) {
>> need_wait_lock = 1;
>> goto out_ptl;
>> }
>>
>> - get_page(page);
>> + folio_get(pagecache_folio);
>>
>
> We get a kernel crash on s390 in mprotect testcase from libhugetlbfs
> testsuite, starting with next-20230120, bisected to this commit.
>
> We get here with pagecache_folio == NULL, and crash in folio_get().
> It doesn´t seem right to replace the get_page() with folio_get()
> here, the matching put_page() at out_put_page: also wasn't changed
> correspondingly. Also, pagecache_folio == NULL seems to be a valid
> case here, on all architectures.
>
> Reverting this folio_get() to get_page() fixes the crash. Not sure
> though if I missed something. I think you only want to replace
> pagecache_page with pagecache_folio, like in the rest of the commit,
> and not page -> pagecache_folio for this get_page().
Ya, that get_page(page) line should have stayed as it was before, since
pagecache_folio replaces instances of pagecache_page. Thanks for catching
this; I'll fix this change in a v2 of this patch series.
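
For reference, a sketch of what the v2 correction would presumably look
like (take the reference on the faulting page itself, since pagecache_folio
can be NULL here):

	page = pte_page(entry);
	if (page_folio(page) != pagecache_folio)
		if (!trylock_page(page)) {
			need_wait_lock = 1;
			goto out_ptl;
		}

	/* reference the faulting page, not pagecache_folio */
	get_page(page);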

Thanks,
Sidhartha Kumar