2019-06-20 17:28:26

by Song Liu

Subject: [PATCH v4 0/6] Enable THP for text section of non-shmem files

Changes v3 => v4:

1. Put the logic to drop THP from pagecache in a separate function (Rik).
2. Move the function to drop THP from pagecache to exit_mmap().
3. Revise confusing commit log 6/6.

Changes v2 => v3:
1. Removed the limitation (cannot write to file with THP) by truncating
whole file during sys_open (see 6/6);
2. Fixed a VM_BUG_ON_PAGE() in filemap_fault() (see 2/6);
3. Split function rename to a separate patch (Rik);
4. Updated condition in hugepage_vma_check() (Rik).

Changes v1 => v2:
1. Fixed a missing mem_cgroup_commit_charge() for non-shmem case.

This set follows up on the discussion at LSF/MM 2019. The motivation is to
put the text section of an application in THP, which reduces the iTLB miss
rate and improves performance. Both Facebook and Oracle showed strong
interest in this feature.

To make reviews easier, this set aims for a minimal viable product. The
current version of the work does not change any file system specific
code. This comes with some limitations (discussed later).

This set enables an application to "hugify" its text section by simply
running something like:

madvise((void *)0x600000, 0x800000, MADV_HUGEPAGE);

Before this call, the /proc/<pid>/maps looks like:

00400000-074d0000 r-xp 00000000 00:27 2006927 app

After this call, part of the text section is split out and mapped to
THP:

00400000-00425000 r-xp 00000000 00:27 2006927 app
00600000-00e00000 r-xp 00200000 00:27 2006927 app <<< on THP
00e00000-074d0000 r-xp 00a00000 00:27 2006927 app
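
As a rough illustration of the call above, the following C sketch trims an
address range to whole huge pages before calling madvise(). The 2 MiB huge
page size and the helper names hugify_range() and hugify_text() are
assumptions made for this example; they are not part of the patch set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: PMD-sized huge pages are 2 MiB, as on x86-64. */
#define HPAGE_SIZE ((uintptr_t)2 << 20)

/*
 * Shrink [start, end) to the largest sub-range made of whole huge pages.
 * Returns 0 and fills *out_start / *out_len, or -1 if no huge page fits.
 */
int hugify_range(uintptr_t start, uintptr_t end,
                 uintptr_t *out_start, size_t *out_len)
{
    uintptr_t s = (start + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1); /* round up */
    uintptr_t e = end & ~(HPAGE_SIZE - 1);                      /* round down */

    if (e <= s)
        return -1;
    *out_start = s;
    *out_len = (size_t)(e - s);
    return 0;
}

/* Hypothetical wrapper: advise the aligned part of a text mapping. */
int hugify_text(uintptr_t start, uintptr_t end)
{
    uintptr_t s;
    size_t len;

    if (hugify_range(start, end, &s, &len) != 0)
        return -1;
    assert(s % HPAGE_SIZE == 0 && len % HPAGE_SIZE == 0);
    return madvise((void *)s, len, MADV_HUGEPAGE);
}
```

madvise() only marks the range as a candidate; whether and when the pages
are actually collapsed is up to khugepaged.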

Limitations:

1. This only works for the text section (VMAs with VM_DENYWRITE).
2. Original limitation #2 is removed in v3.

We gated this feature with an experimental config, READ_ONLY_THP_FOR_FS.
Once we get better support on the write path, we can remove the config and
enable it by default.

Tested cases:
1. Tested with btrfs and ext4.
2. Tested with a real-world application (a memcache-like caching service).
3. Tested with "THP aware uprobe":
https://patchwork.kernel.org/project/linux-mm/list/?series=131339

Please share your comments and suggestions on this.

Thanks!

Song Liu (6):
filemap: check compound_head(page)->mapping in filemap_fault()
filemap: update offset check in filemap_fault()
mm,thp: stats for file backed THP
khugepaged: rename collapse_shmem() and khugepaged_scan_shmem()
mm,thp: add read-only THP support for (non-shmem) FS
mm,thp: avoid writes to file with THP in pagecache

fs/inode.c | 3 ++
fs/proc/meminfo.c | 4 ++
include/linux/fs.h | 31 ++++++++++++
include/linux/mmzone.h | 2 +
mm/Kconfig | 11 +++++
mm/filemap.c | 9 ++--
mm/khugepaged.c | 104 +++++++++++++++++++++++++++++++++--------
mm/mmap.c | 14 ++++++
mm/rmap.c | 12 +++--
mm/vmstat.c | 2 +
10 files changed, 164 insertions(+), 28 deletions(-)

--
2.17.1


2019-06-20 17:29:41

by Song Liu

Subject: [PATCH v4 6/6] mm,thp: avoid writes to file with THP in pagecache

In the previous patch, an application could put part of its text section in
THP via madvise(). These THPs are protected from writes while the
application is still running (ETXTBSY). However, after the application
exits, the file is available for writes.

This patch avoids writes to file THP by dropping page cache for the file
when the last vma with VM_DENYWRITE is removed. A new counter nr_thps is
added to struct address_space. In exit_mmap(), if nr_thps is non-zero, we
drop page cache for the whole file.

Signed-off-by: Song Liu <[email protected]>
---
fs/inode.c | 3 +++
include/linux/fs.h | 31 +++++++++++++++++++++++++++++++
mm/filemap.c | 1 +
mm/khugepaged.c | 4 +++-
mm/mmap.c | 14 ++++++++++++++
5 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/fs/inode.c b/fs/inode.c
index df6542ec3b88..518113a4e219 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -181,6 +181,9 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
mapping->flags = 0;
mapping->wb_err = 0;
atomic_set(&mapping->i_mmap_writable, 0);
+#ifdef CONFIG_READ_ONLY_THP_FOR_FS
+ atomic_set(&mapping->nr_thps, 0);
+#endif
mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
mapping->private_data = NULL;
mapping->writeback_index = 0;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f7fdfe93e25d..3edf4ee42eee 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -444,6 +444,10 @@ struct address_space {
struct xarray i_pages;
gfp_t gfp_mask;
atomic_t i_mmap_writable;
+#ifdef CONFIG_READ_ONLY_THP_FOR_FS
+ /* number of thp, only for non-shmem files */
+ atomic_t nr_thps;
+#endif
struct rb_root_cached i_mmap;
struct rw_semaphore i_mmap_rwsem;
unsigned long nrpages;
@@ -2790,6 +2794,33 @@ static inline errseq_t filemap_sample_wb_err(struct address_space *mapping)
return errseq_sample(&mapping->wb_err);
}

+static inline int filemap_nr_thps(struct address_space *mapping)
+{
+#ifdef CONFIG_READ_ONLY_THP_FOR_FS
+ return atomic_read(&mapping->nr_thps);
+#else
+ return 0;
+#endif
+}
+
+static inline void filemap_nr_thps_inc(struct address_space *mapping)
+{
+#ifdef CONFIG_READ_ONLY_THP_FOR_FS
+ atomic_inc(&mapping->nr_thps);
+#else
+ WARN_ON_ONCE(1);
+#endif
+}
+
+static inline void filemap_nr_thps_dec(struct address_space *mapping)
+{
+#ifdef CONFIG_READ_ONLY_THP_FOR_FS
+ atomic_dec(&mapping->nr_thps);
+#else
+ WARN_ON_ONCE(1);
+#endif
+}
+
extern int vfs_fsync_range(struct file *file, loff_t start, loff_t end,
int datasync);
extern int vfs_fsync(struct file *file, int datasync);
diff --git a/mm/filemap.c b/mm/filemap.c
index e79ceccdc6df..a8e86c136381 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -205,6 +205,7 @@ static void unaccount_page_cache_page(struct address_space *mapping,
__dec_node_page_state(page, NR_SHMEM_THPS);
} else if (PageTransHuge(page)) {
__dec_node_page_state(page, NR_FILE_THPS);
+ filemap_nr_thps_dec(mapping);
}

/*
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fbcff5a1d65a..17ebe9da56ce 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1500,8 +1500,10 @@ static void collapse_file(struct vm_area_struct *vma,

if (is_shmem)
__inc_node_page_state(new_page, NR_SHMEM_THPS);
- else
+ else {
__inc_node_page_state(new_page, NR_FILE_THPS);
+ filemap_nr_thps_inc(mapping);
+ }

if (nr_none) {
struct zone *zone = page_zone(new_page);
diff --git a/mm/mmap.c b/mm/mmap.c
index 7e8c3e8ae75f..8094ce028d74 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3088,6 +3088,18 @@ int vm_brk(unsigned long addr, unsigned long len)
}
EXPORT_SYMBOL(vm_brk);

+static inline void release_file_thp(struct vm_area_struct *vma)
+{
+#ifdef CONFIG_READ_ONLY_THP_FOR_FS
+ struct file *file = vma->vm_file;
+
+ if (file && (vma->vm_flags & VM_DENYWRITE) &&
+ atomic_read(&file_inode(file)->i_writecount) == 0 &&
+ filemap_nr_thps(file_inode(file)->i_mapping))
+ truncate_pagecache(file_inode(file), 0);
+#endif
+}
+
/* Release all mmaps. */
void exit_mmap(struct mm_struct *mm)
{
@@ -3153,6 +3165,8 @@ void exit_mmap(struct mm_struct *mm)
while (vma) {
if (vma->vm_flags & VM_ACCOUNT)
nr_accounted += vma_pages(vma);
+
+ release_file_thp(vma);
vma = remove_vma(vma);
}
vm_unacct_memory(nr_accounted);
--
2.17.1

2019-06-20 17:29:52

by Song Liu

Subject: [PATCH v4 3/6] mm,thp: stats for file backed THP

In preparation for non-shmem THP, this patch adds two stats, NR_FILE_THPS
and NR_FILE_PMDMAPPED, and exposes them in /proc/meminfo.

Acked-by: Rik van Riel <[email protected]>
Signed-off-by: Song Liu <[email protected]>
---
fs/proc/meminfo.c | 4 ++++
include/linux/mmzone.h | 2 ++
mm/vmstat.c | 2 ++
3 files changed, 8 insertions(+)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 568d90e17c17..bac395fc11f9 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -136,6 +136,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
global_node_page_state(NR_SHMEM_THPS) * HPAGE_PMD_NR);
show_val_kb(m, "ShmemPmdMapped: ",
global_node_page_state(NR_SHMEM_PMDMAPPED) * HPAGE_PMD_NR);
+ show_val_kb(m, "FileHugePages: ",
+ global_node_page_state(NR_FILE_THPS) * HPAGE_PMD_NR);
+ show_val_kb(m, "FilePmdMapped: ",
+ global_node_page_state(NR_FILE_PMDMAPPED) * HPAGE_PMD_NR);
#endif

#ifdef CONFIG_CMA
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 70394cabaf4e..827f9b777938 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -234,6 +234,8 @@ enum node_stat_item {
NR_SHMEM, /* shmem pages (included tmpfs/GEM pages) */
NR_SHMEM_THPS,
NR_SHMEM_PMDMAPPED,
+ NR_FILE_THPS,
+ NR_FILE_PMDMAPPED,
NR_ANON_THPS,
NR_UNSTABLE_NFS, /* NFS unstable pages */
NR_VMSCAN_WRITE,
diff --git a/mm/vmstat.c b/mm/vmstat.c
index fd7e16ca6996..6afc892a148a 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1158,6 +1158,8 @@ const char * const vmstat_text[] = {
"nr_shmem",
"nr_shmem_hugepages",
"nr_shmem_pmdmapped",
+ "nr_file_hugepages",
+ "nr_file_pmdmapped",
"nr_anon_transparent_hugepages",
"nr_unstable",
"nr_vmscan_write",
--
2.17.1
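
For readers who want to watch the new counters from user space, here is a
minimal C sketch that reads one field from /proc/meminfo. The field names
FileHugePages and FilePmdMapped come from the patch above; the helper names
meminfo_field_kb() and read_meminfo_kb() are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/meminfo line of the form "Name:   <value> kB".
 * Returns 1 and stores the value in *kb if the line is for `name`, else 0.
 */
int meminfo_field_kb(const char *line, const char *name, unsigned long *kb)
{
    size_t n = strlen(name);

    /* Require an exact name match followed directly by a colon. */
    if (strncmp(line, name, n) != 0 || line[n] != ':')
        return 0;
    return sscanf(line + n + 1, "%lu", kb) == 1;
}

/* Scan /proc/meminfo for `name`; returns 0 on success, -1 if not found. */
int read_meminfo_kb(const char *name, unsigned long *kb)
{
    char line[256];
    FILE *f = fopen("/proc/meminfo", "r");
    int found = 0;

    if (!f)
        return -1;
    while (!found && fgets(line, sizeof(line), f))
        found = meminfo_field_kb(line, name, kb);
    fclose(f);
    return found ? 0 : -1;
}
```

On a kernel without this series, read_meminfo_kb("FileHugePages", &kb)
would be expected to return -1 because the field does not exist.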

2019-06-20 17:29:55

by Song Liu

Subject: [PATCH v4 1/6] filemap: check compound_head(page)->mapping in filemap_fault()

Currently, filemap_fault() avoids a race condition with truncate by
checking page->mapping == mapping. This does not work for compound
pages. This patch lets it check compound_head(page)->mapping instead.

Acked-by: Rik van Riel <[email protected]>
Signed-off-by: Song Liu <[email protected]>
---
mm/filemap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index df2006ba0cfa..f5b79a43946d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2517,7 +2517,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
goto out_retry;

/* Did it get truncated? */
- if (unlikely(page->mapping != mapping)) {
+ if (unlikely(compound_head(page)->mapping != mapping)) {
unlock_page(page);
put_page(page);
goto retry_find;
--
2.17.1

2019-06-20 17:29:59

by Song Liu

Subject: [PATCH v4 4/6] khugepaged: rename collapse_shmem() and khugepaged_scan_shmem()

The next patch will add khugepaged support for non-shmem files. This patch
renames these two functions to reflect the new functionality:

collapse_shmem() => collapse_file()
khugepaged_scan_shmem() => khugepaged_scan_file()

Acked-by: Rik van Riel <[email protected]>
Signed-off-by: Song Liu <[email protected]>
---
mm/khugepaged.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 0f7419938008..dde8e45552b3 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1287,7 +1287,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
}

/**
- * collapse_shmem - collapse small tmpfs/shmem pages into huge one.
+ * collapse_file - collapse small tmpfs/shmem pages into huge one.
*
* Basic scheme is simple, details are more complex:
* - allocate and lock a new huge page;
@@ -1304,10 +1304,11 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
* + restore gaps in the page cache;
* + unlock and free huge page;
*/
-static void collapse_shmem(struct mm_struct *mm,
+static void collapse_file(struct vm_area_struct *vma,
struct address_space *mapping, pgoff_t start,
struct page **hpage, int node)
{
+ struct mm_struct *mm = vma->vm_mm;
gfp_t gfp;
struct page *new_page;
struct mem_cgroup *memcg;
@@ -1563,7 +1564,7 @@ static void collapse_shmem(struct mm_struct *mm,
/* TODO: tracepoints */
}

-static void khugepaged_scan_shmem(struct mm_struct *mm,
+static void khugepaged_scan_file(struct vm_area_struct *vma,
struct address_space *mapping,
pgoff_t start, struct page **hpage)
{
@@ -1631,14 +1632,14 @@ static void khugepaged_scan_shmem(struct mm_struct *mm,
result = SCAN_EXCEED_NONE_PTE;
} else {
node = khugepaged_find_target_node();
- collapse_shmem(mm, mapping, start, hpage, node);
+ collapse_file(vma, mapping, start, hpage, node);
}
}

/* TODO: tracepoints */
}
#else
-static void khugepaged_scan_shmem(struct mm_struct *mm,
+static void khugepaged_scan_file(struct vm_area_struct *vma,
struct address_space *mapping,
pgoff_t start, struct page **hpage)
{
@@ -1722,7 +1723,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
file = get_file(vma->vm_file);
up_read(&mm->mmap_sem);
ret = 1;
- khugepaged_scan_shmem(mm, file->f_mapping,
+ khugepaged_scan_file(vma, file->f_mapping,
pgoff, hpage);
fput(file);
} else {
--
2.17.1

2019-06-20 17:30:04

by Song Liu

Subject: [PATCH v4 5/6] mm,thp: add read-only THP support for (non-shmem) FS

This patch is (hopefully) the first step to enable THP for non-shmem
filesystems.

This patch enables an application to put part of its text sections to THP
via madvise, for example:

madvise((void *)0x600000, 0x200000, MADV_HUGEPAGE);

We tried to reuse the logic for THP on tmpfs.

Currently, write is not supported for non-shmem THP. khugepaged will only
process vma with VM_DENYWRITE. The next patch will handle writes, which
would only happen when the vma with VM_DENYWRITE is unmapped.

An EXPERIMENTAL config, READ_ONLY_THP_FOR_FS, is added to gate this
feature.

Signed-off-by: Song Liu <[email protected]>
---
mm/Kconfig | 11 ++++++
mm/filemap.c | 4 +--
mm/khugepaged.c | 91 +++++++++++++++++++++++++++++++++++++++++--------
mm/rmap.c | 12 ++++---
4 files changed, 97 insertions(+), 21 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index f0c76ba47695..546d45d9bdab 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -762,6 +762,17 @@ config GUP_BENCHMARK

See tools/testing/selftests/vm/gup_benchmark.c

+config READ_ONLY_THP_FOR_FS
+ bool "Read-only THP for filesystems (EXPERIMENTAL)"
+ depends on TRANSPARENT_HUGE_PAGECACHE && SHMEM
+
+ help
+ Allow khugepaged to put read-only file-backed pages in THP.
+
+ This is marked experimental because it makes files with thp in
+ the page cache read-only. To overwrite the file, it need to be
+ truncated or removed first.
+
config ARCH_HAS_PTE_SPECIAL
bool

diff --git a/mm/filemap.c b/mm/filemap.c
index 5f072a113535..e79ceccdc6df 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -203,8 +203,8 @@ static void unaccount_page_cache_page(struct address_space *mapping,
__mod_node_page_state(page_pgdat(page), NR_SHMEM, -nr);
if (PageTransHuge(page))
__dec_node_page_state(page, NR_SHMEM_THPS);
- } else {
- VM_BUG_ON_PAGE(PageTransHuge(page), page);
+ } else if (PageTransHuge(page)) {
+ __dec_node_page_state(page, NR_FILE_THPS);
}

/*
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index dde8e45552b3..fbcff5a1d65a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -48,6 +48,7 @@ enum scan_result {
SCAN_CGROUP_CHARGE_FAIL,
SCAN_EXCEED_SWAP_PTE,
SCAN_TRUNCATED,
+ SCAN_PAGE_HAS_PRIVATE,
};

#define CREATE_TRACE_POINTS
@@ -404,7 +405,11 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
(vm_flags & VM_NOHUGEPAGE) ||
test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
return false;
- if (shmem_file(vma->vm_file)) {
+
+ if (shmem_file(vma->vm_file) ||
+ (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
+ vma->vm_file &&
+ (vm_flags & VM_DENYWRITE))) {
if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE))
return false;
return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
@@ -456,8 +461,9 @@ int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
unsigned long hstart, hend;

/*
- * khugepaged does not yet work on non-shmem files or special
- * mappings. And file-private shmem THP is not supported.
+ * khugepaged only supports read-only files for non-shmem files.
+ * khugepaged does not yet work on special mappings. And
+ * file-private shmem THP is not supported.
*/
if (!hugepage_vma_check(vma, vm_flags))
return 0;
@@ -1287,12 +1293,12 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
}

/**
- * collapse_file - collapse small tmpfs/shmem pages into huge one.
+ * collapse_file - collapse filemap/tmpfs/shmem pages into huge one.
*
* Basic scheme is simple, details are more complex:
* - allocate and lock a new huge page;
* - scan page cache replacing old pages with the new one
- * + swap in pages if necessary;
+ * + swap/gup in pages if necessary;
* + fill in gaps;
* + keep old pages around in case rollback is required;
* - if replacing succeeds:
@@ -1316,7 +1322,11 @@ static void collapse_file(struct vm_area_struct *vma,
LIST_HEAD(pagelist);
XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
int nr_none = 0, result = SCAN_SUCCEED;
+ bool is_shmem = shmem_file(vma->vm_file);

+#ifndef CONFIG_READ_ONLY_THP_FOR_FS
+ VM_BUG_ON(!is_shmem);
+#endif
VM_BUG_ON(start & (HPAGE_PMD_NR - 1));

/* Only allocate from the target node */
@@ -1348,7 +1358,8 @@ static void collapse_file(struct vm_area_struct *vma,
} while (1);

__SetPageLocked(new_page);
- __SetPageSwapBacked(new_page);
+ if (is_shmem)
+ __SetPageSwapBacked(new_page);
new_page->index = start;
new_page->mapping = mapping;

@@ -1363,7 +1374,7 @@ static void collapse_file(struct vm_area_struct *vma,
struct page *page = xas_next(&xas);

VM_BUG_ON(index != xas.xa_index);
- if (!page) {
+ if (is_shmem && !page) {
/*
* Stop if extent has been truncated or hole-punched,
* and is now completely empty.
@@ -1384,7 +1395,7 @@ static void collapse_file(struct vm_area_struct *vma,
continue;
}

- if (xa_is_value(page) || !PageUptodate(page)) {
+ if (is_shmem && (xa_is_value(page) || !PageUptodate(page))) {
xas_unlock_irq(&xas);
/* swap in or instantiate fallocated page */
if (shmem_getpage(mapping->host, index, &page,
@@ -1392,6 +1403,24 @@ static void collapse_file(struct vm_area_struct *vma,
result = SCAN_FAIL;
goto xa_unlocked;
}
+ } else if (!page || xa_is_value(page)) {
+ unsigned long vaddr;
+
+ VM_BUG_ON(is_shmem);
+
+ vaddr = vma->vm_start +
+ ((index - vma->vm_pgoff) << PAGE_SHIFT);
+ xas_unlock_irq(&xas);
+ if (get_user_pages_remote(NULL, mm, vaddr, 1,
+ FOLL_FORCE, &page, NULL, NULL) != 1) {
+ result = SCAN_FAIL;
+ goto xa_unlocked;
+ }
+ lru_add_drain();
+ lock_page(page);
+ } else if (!PageUptodate(page) || PageDirty(page)) {
+ result = SCAN_FAIL;
+ goto xa_locked;
} else if (trylock_page(page)) {
get_page(page);
xas_unlock_irq(&xas);
@@ -1426,6 +1455,12 @@ static void collapse_file(struct vm_area_struct *vma,
goto out_unlock;
}

+ if (page_has_private(page) &&
+ !try_to_release_page(page, GFP_KERNEL)) {
+ result = SCAN_PAGE_HAS_PRIVATE;
+ break;
+ }
+
if (page_mapped(page))
unmap_mapping_pages(mapping, index, 1, false);

@@ -1463,12 +1498,18 @@ static void collapse_file(struct vm_area_struct *vma,
goto xa_unlocked;
}

- __inc_node_page_state(new_page, NR_SHMEM_THPS);
+ if (is_shmem)
+ __inc_node_page_state(new_page, NR_SHMEM_THPS);
+ else
+ __inc_node_page_state(new_page, NR_FILE_THPS);
+
if (nr_none) {
struct zone *zone = page_zone(new_page);

__mod_node_page_state(zone->zone_pgdat, NR_FILE_PAGES, nr_none);
- __mod_node_page_state(zone->zone_pgdat, NR_SHMEM, nr_none);
+ if (is_shmem)
+ __mod_node_page_state(zone->zone_pgdat,
+ NR_SHMEM, nr_none);
}

xa_locked:
@@ -1506,10 +1547,15 @@ static void collapse_file(struct vm_area_struct *vma,

SetPageUptodate(new_page);
page_ref_add(new_page, HPAGE_PMD_NR - 1);
- set_page_dirty(new_page);
mem_cgroup_commit_charge(new_page, memcg, false, true);
+
+ if (is_shmem) {
+ set_page_dirty(new_page);
+ lru_cache_add_anon(new_page);
+ } else {
+ lru_cache_add_file(new_page);
+ }
count_memcg_events(memcg, THP_COLLAPSE_ALLOC, 1);
- lru_cache_add_anon(new_page);

/*
* Remove pte page tables, so we can re-fault the page as huge.
@@ -1524,7 +1570,9 @@ static void collapse_file(struct vm_area_struct *vma,
/* Something went wrong: roll back page cache changes */
xas_lock_irq(&xas);
mapping->nrpages -= nr_none;
- shmem_uncharge(mapping->host, nr_none);
+
+ if (is_shmem)
+ shmem_uncharge(mapping->host, nr_none);

xas_set(&xas, start);
xas_for_each(&xas, page, end - 1) {
@@ -1607,6 +1655,17 @@ static void khugepaged_scan_file(struct vm_area_struct *vma,
break;
}

+ if (page_has_private(page) && trylock_page(page)) {
+ int ret;
+
+ ret = try_to_release_page(page, GFP_KERNEL);
+ unlock_page(page);
+ if (!ret) {
+ result = SCAN_PAGE_HAS_PRIVATE;
+ break;
+ }
+ }
+
if (page_count(page) != 1 + page_mapcount(page)) {
result = SCAN_PAGE_COUNT;
break;
@@ -1714,11 +1773,13 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (shmem_file(vma->vm_file)) {
+ if (vma->vm_file) {
struct file *file;
pgoff_t pgoff = linear_page_index(vma,
khugepaged_scan.address);
- if (!shmem_huge_enabled(vma))
+
+ if (shmem_file(vma->vm_file)
+ && !shmem_huge_enabled(vma))
goto skip;
file = get_file(vma->vm_file);
up_read(&mm->mmap_sem);
diff --git a/mm/rmap.c b/mm/rmap.c
index e5dfe2ae6b0d..87cfa2c19eda 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1192,8 +1192,10 @@ void page_add_file_rmap(struct page *page, bool compound)
}
if (!atomic_inc_and_test(compound_mapcount_ptr(page)))
goto out;
- VM_BUG_ON_PAGE(!PageSwapBacked(page), page);
- __inc_node_page_state(page, NR_SHMEM_PMDMAPPED);
+ if (PageSwapBacked(page))
+ __inc_node_page_state(page, NR_SHMEM_PMDMAPPED);
+ else
+ __inc_node_page_state(page, NR_FILE_PMDMAPPED);
} else {
if (PageTransCompound(page) && page_mapping(page)) {
VM_WARN_ON_ONCE(!PageLocked(page));
@@ -1232,8 +1234,10 @@ static void page_remove_file_rmap(struct page *page, bool compound)
}
if (!atomic_add_negative(-1, compound_mapcount_ptr(page)))
goto out;
- VM_BUG_ON_PAGE(!PageSwapBacked(page), page);
- __dec_node_page_state(page, NR_SHMEM_PMDMAPPED);
+ if (PageSwapBacked(page))
+ __dec_node_page_state(page, NR_SHMEM_PMDMAPPED);
+ else
+ __dec_node_page_state(page, NR_FILE_PMDMAPPED);
} else {
if (!atomic_add_negative(-1, &page->_mapcount))
goto out;
--
2.17.1
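
The alignment test in hugepage_vma_check() and the vaddr computation in
collapse_file() from the patch above can be modeled in user space as
follows. This is a sketch only: it assumes 4 KiB base pages and 2 MiB huge
pages, and the model_* names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: 4 KiB base pages and 2 MiB PMD huge pages (x86-64). */
#define MODEL_PAGE_SHIFT   12
#define MODEL_HPAGE_PMD_NR 512UL /* base pages per huge page */

/*
 * Mirrors the IS_ALIGNED() test in hugepage_vma_check(): the virtual page
 * number and the file page offset must be congruent modulo HPAGE_PMD_NR,
 * so that huge-page boundaries in the file and in the mapping line up.
 */
bool model_vma_thp_aligned(uint64_t vm_start, uint64_t vm_pgoff)
{
    uint64_t delta = (vm_start >> MODEL_PAGE_SHIFT) - vm_pgoff;

    return (delta % MODEL_HPAGE_PMD_NR) == 0;
}

/*
 * Mirrors the vaddr computation in collapse_file(): translate a page cache
 * index back to the user virtual address that maps it in this VMA.
 */
uint64_t model_index_to_vaddr(uint64_t vm_start, uint64_t vm_pgoff,
                              uint64_t index)
{
    return vm_start + ((index - vm_pgoff) << MODEL_PAGE_SHIFT);
}
```

With the mapping from the cover letter (start 0x600000, file offset
0x200000, that is vm_pgoff 0x200), model_vma_thp_aligned() returns true.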

2019-06-20 17:30:37

by Song Liu

Subject: [PATCH v4 2/6] filemap: update offset check in filemap_fault()

With THP, the current offset check:

VM_BUG_ON_PAGE(page->index != offset, page);

is no longer accurate. Update it to:

VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page);

Acked-by: Rik van Riel <[email protected]>
Signed-off-by: Song Liu <[email protected]>
---
mm/filemap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index f5b79a43946d..5f072a113535 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2522,7 +2522,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
put_page(page);
goto retry_find;
}
- VM_BUG_ON_PAGE(page->index != offset, page);
+ VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page);

/*
* We have a locked page in the page cache, now we need to check
--
2.17.1
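
To see why the offset check above needs page_to_pgoff(), consider a toy
model of a compound page. The struct and the model_* helpers below are
assumptions made for illustration; they only approximate how the kernel
derives the file offset of a THP subpage (head index plus subpage
position) and are not the kernel's definitions.

```c
#include <stddef.h>

/* Toy stand-in for struct page; only the fields this sketch needs. */
struct model_page {
    unsigned long index;     /* file offset in pages; set on the head only */
    struct model_page *head; /* points to itself for a head or small page */
};

/* What the old check compared: the raw index field of the faulting page. */
unsigned long model_raw_index(const struct model_page *page)
{
    return page->index;
}

/* Approximation of page_to_pgoff() for a THP subpage. */
unsigned long model_page_to_pgoff(const struct model_page *page)
{
    return page->head->index + (unsigned long)(page - page->head);
}
```

In this model a tail page's own index field is a placeholder, so comparing
it with the faulting offset would trigger the VM_BUG_ON_PAGE() even though
the page is the right one.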

2019-06-20 17:35:15

by Rik van Riel

Subject: Re: [PATCH v4 5/6] mm,thp: add read-only THP support for (non-shmem) FS

On Thu, 2019-06-20 at 10:27 -0700, Song Liu wrote:
> This patch is (hopefully) the first step to enable THP for non-shmem
> filesystems.
>
> This patch enables an application to put part of its text sections to THP
> via madvise, for example:
>
> madvise((void *)0x600000, 0x200000, MADV_HUGEPAGE);
>
> We tried to reuse the logic for THP on tmpfs.
>
> Currently, write is not supported for non-shmem THP. khugepaged will only
> process vma with VM_DENYWRITE. The next patch will handle writes, which
> would only happen when the vma with VM_DENYWRITE is unmapped.
>
> An EXPERIMENTAL config, READ_ONLY_THP_FOR_FS, is added to gate this
> feature.
>
> Signed-off-by: Song Liu <[email protected]>

Acked-by: Rik van Riel <[email protected]>

(I suppose I should have sent this out last night,
while I was posting questions about patch 6)

2019-06-20 17:43:11

by Rik van Riel

Subject: Re: [PATCH v4 6/6] mm,thp: avoid writes to file with THP in pagecache

On Thu, 2019-06-20 at 10:27 -0700, Song Liu wrote:

> +++ b/mm/mmap.c
> @@ -3088,6 +3088,18 @@ int vm_brk(unsigned long addr, unsigned long len)
> }
> EXPORT_SYMBOL(vm_brk);
>
> +static inline void release_file_thp(struct vm_area_struct *vma)
> +{
> +#ifdef CONFIG_READ_ONLY_THP_FOR_FS
> + struct file *file = vma->vm_file;
> +
> + if (file && (vma->vm_flags & VM_DENYWRITE) &&
> + atomic_read(&file_inode(file)->i_writecount) == 0 &&
> + filemap_nr_thps(file_inode(file)->i_mapping))
> + truncate_pagecache(file_inode(file), 0);
> +#endif
> +}
> +
> /* Release all mmaps. */
> void exit_mmap(struct mm_struct *mm)
> {
> @@ -3153,6 +3165,8 @@ void exit_mmap(struct mm_struct *mm)
> while (vma) {
> if (vma->vm_flags & VM_ACCOUNT)
> nr_accounted += vma_pages(vma);
> +
> + release_file_thp(vma);
> vma = remove_vma(vma);
> }
> vm_unacct_memory(nr_accounted);

I like how you make the file accessible again to other
users, but am somewhat unsure about the mechanism used.

First, if multiple processes have the same file mmapped,
do you really want to blow away the page cache?

Secondly, by hooking into exit_mmap, you miss making
files writable again that get unmapped through munmap.

Would it be better to blow away the page cache when
the last mmap user unmaps it?

The page->mapping->i_mmap interval tree will be empty
when nobody has the file mmap()d.

Alternatively, open() could check whether the file is
currently mmaped, and blow away the page cache then.
That would leave the page cache intact if the same file
gets execve()d several times in a row without any writes
in-between, which seems like it might be a relatively
common case.
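
The trade-off between dropping the page cache at exit and dropping it at
open() can be illustrated with a small user-space model. This is not kernel
code: the struct, the policy enum, and all model_* functions are
hypothetical, the exit-time policy is simplified (it ignores the
i_writecount check in the patch), and the model assumes khugepaged
collapses one THP each time the file is executed with a cold cache.

```c
#include <stdbool.h>

/* Toy model of one file's state; not kernel code. */
struct model_file {
    int nr_mmaps;    /* processes currently mapping the file */
    int nr_thps;     /* huge pages in the page cache */
    int cache_drops; /* how many times the page cache was dropped */
};

enum model_policy { DROP_ON_EXIT, DROP_ON_OPEN_FOR_WRITE };

static void model_drop_cache(struct model_file *f)
{
    if (f->nr_thps > 0) {
        f->nr_thps = 0;
        f->cache_drops++;
    }
}

/* execve(): map the file; assume one THP gets collapsed if none exists. */
void model_exec(struct model_file *f)
{
    f->nr_mmaps++;
    if (f->nr_thps == 0)
        f->nr_thps = 1;
}

/* Process exit: unmap; the exit-time policy drops the cache here. */
void model_exit(struct model_file *f, enum model_policy p)
{
    f->nr_mmaps--;
    if (p == DROP_ON_EXIT)
        model_drop_cache(f);
}

/*
 * open() for write: refused while the file is mapped; the open-time policy
 * drops the cache here. Returns true if the file is now free of THPs.
 */
bool model_open_for_write(struct model_file *f, enum model_policy p)
{
    if (f->nr_mmaps > 0)
        return false;
    if (p == DROP_ON_OPEN_FOR_WRITE)
        model_drop_cache(f);
    return f->nr_thps == 0;
}
```

Running exec, exit, exec, exit, open-for-write through both policies, the
exit-time policy drops the cache twice while the open-time policy drops it
once, which matches the repeated-execve() case described above.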