2020-11-11 20:48:49

by Zi Yan

Subject: [RFC PATCH 0/6] Split huge pages to any lower order pages.

From: Zi Yan <[email protected]>

Hi all,

With Matthew's THP in pagecache patches[1], we will be able to handle any size
pagecache THPs, but currently split_huge_page can only split a THP to order-0
pages. This can easily erase the benefit of having pagecache THPs when
operations like truncate want to keep pages larger than order-0. In
response, here are the patches to add support for splitting a THP to any lower
order pages. In addition, this patchset prepares for my PUD THP patchset[2],
since splitting a PUD THP to multiple PMD THPs can be handled by the
split_huge_page_to_list_to_order function added by this patchset, which avoids
a lot of the redundant code that replicating split_huge_page for PUD THPs
would otherwise require.

The patchset is on top of Matthew's pagecache/next tree[3].

To ease testing of the split_huge_page functions, I added a new debugfs
interface at <debugfs>/split_huge_pages_in_range_pid, so developers can split
the THPs in a given virtual address range of the process with the given pid by
writing "<pid>,<vaddr_start>,<vaddr_end>,<to_order>" to the interface. I also
added a new test program to test 1) splitting PMD THPs, 2) splitting pagecache
THPs to any lower order, and 3) truncating a pagecache THP to a page with a
lower order.

Suggestions and comments are welcome. Thanks.


[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/
[3] https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/next

Zi Yan (6):
mm: huge_memory: add new debugfs interface to trigger split huge page
on any page range.
mm: memcg: make memcg huge page split support any order split.
mm: page_owner: add support for splitting to any order in split
page_owner.
mm: thp: add support for split huge page to any lower order pages.
mm: truncate: split thp to a non-zero order if possible.
mm: huge_memory: enable debugfs to split huge pages to any order.

include/linux/huge_mm.h | 8 +
include/linux/memcontrol.h | 5 +-
include/linux/page_owner.h | 7 +-
mm/huge_memory.c | 177 ++++++++++--
mm/internal.h | 1 +
mm/memcontrol.c | 4 +-
mm/migrate.c | 2 +-
mm/page_alloc.c | 2 +-
mm/page_owner.c | 6 +-
mm/swap.c | 1 -
mm/truncate.c | 22 +-
tools/testing/selftests/vm/Makefile | 1 +
.../selftests/vm/split_huge_page_test.c | 255 ++++++++++++++++++
13 files changed, 453 insertions(+), 38 deletions(-)
create mode 100644 tools/testing/selftests/vm/split_huge_page_test.c

--
2.28.0


2020-11-11 20:49:04

by Zi Yan

Subject: [RFC PATCH 5/6] mm: truncate: split thp to a non-zero order if possible.

From: Zi Yan <[email protected]>

To minimize the number of pages after a truncation, when truncating a
THP, we do not need to split it all the way down to order-0. The THP has
at most three parts: the part before the truncated range, the part to be
truncated, and the part remaining at the end. Use the non-zero minimum
of their sizes to decide what order to split the THP to.

Signed-off-by: Zi Yan <[email protected]>
---
mm/truncate.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 20bd17538ec2..6d8e3c6115bc 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -237,7 +237,7 @@ int truncate_inode_page(struct address_space *mapping, struct page *page)
bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
{
loff_t pos = page_offset(page);
- unsigned int offset, length;
+ unsigned int offset, length, left, min_subpage_size = PAGE_SIZE;

if (pos < start)
offset = start - pos;
@@ -248,6 +248,7 @@ bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
length = length - offset;
else
length = end + 1 - pos - offset;
+ left = thp_size(page) - offset - length;

wait_on_page_writeback(page);
if (length == thp_size(page)) {
@@ -267,7 +268,24 @@ bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
do_invalidatepage(page, offset, length);
if (!PageTransHuge(page))
return true;
- return split_huge_page(page) == 0;
+
+ /*
+ * find the non-zero minimum of offset, length, and left and use it to
+ * decide the new order of the page after split
+ */
+ if (offset && left)
+ min_subpage_size = min_t(unsigned int,
+ min_t(unsigned int, offset, length),
+ left);
+ else if (!offset)
+ min_subpage_size = min_t(unsigned int, length, left);
+ else /* !left */
+ min_subpage_size = min_t(unsigned int, length, offset);
+
+ min_subpage_size = max_t(unsigned int, PAGE_SIZE, min_subpage_size);
+
+ return split_huge_page_to_list_to_order(page, NULL,
+ ilog2(min_subpage_size/PAGE_SIZE)) == 0;
}

/*
--
2.28.0

2020-11-11 20:49:17

by Zi Yan

Subject: [RFC PATCH 6/6] mm: huge_memory: enable debugfs to split huge pages to any order.

From: Zi Yan <[email protected]>

This enables the debugfs interface to test split_huge_page_to_list_to_order
for pagecache THPs. Also add test cases that exercise
split_huge_page_to_list_to_order via both debugfs and file truncation.

Signed-off-by: Zi Yan <[email protected]>
---
mm/huge_memory.c | 13 +--
.../selftests/vm/split_huge_page_test.c | 102 +++++++++++++++++-
2 files changed, 105 insertions(+), 10 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 88f50da40c9b..b7470607a08b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2974,7 +2974,7 @@ static ssize_t split_huge_pages_in_range_pid_write(struct file *file,
static DEFINE_MUTEX(mutex);
ssize_t ret;
char input_buf[80]; /* hold pid, start_vaddr, end_vaddr */
- int pid;
+ int pid, to_order = 0;
unsigned long vaddr_start, vaddr_end, addr;
nodemask_t task_nodes;
struct mm_struct *mm;
@@ -2990,8 +2990,9 @@ static ssize_t split_huge_pages_in_range_pid_write(struct file *file,
goto out;

input_buf[80] = '\0';
- ret = sscanf(input_buf, "%d,%lx,%lx", &pid, &vaddr_start, &vaddr_end);
- if (ret != 3) {
+ ret = sscanf(input_buf, "%d,%lx,%lx,%d", &pid, &vaddr_start, &vaddr_end, &to_order);
+ /* cannot split to order-1 THP, which is not possible */
+ if ((ret != 3 && ret != 4) || to_order == 1) {
ret = -EINVAL;
goto out;
}
@@ -2999,8 +3000,8 @@ static ssize_t split_huge_pages_in_range_pid_write(struct file *file,
vaddr_end &= PAGE_MASK;

ret = strlen(input_buf);
- pr_debug("split huge pages in pid: %d, vaddr: [%lx - %lx]\n",
- pid, vaddr_start, vaddr_end);
+ pr_debug("split huge pages in pid: %d, vaddr: [%lx - %lx], to order: %d\n",
+ pid, vaddr_start, vaddr_end, to_order);

mm = find_mm_struct(pid, &task_nodes);
if (IS_ERR(mm)) {
@@ -3038,7 +3039,7 @@ static ssize_t split_huge_pages_in_range_pid_write(struct file *file,
addr += page_size(page) - PAGE_SIZE;

/* reset addr if split fails */
- if (split_huge_page(page))
+ if (split_huge_page_to_list_to_order(page, NULL, to_order))
addr -= (page_size(page) - PAGE_SIZE);

unlock_page(page);
diff --git a/tools/testing/selftests/vm/split_huge_page_test.c b/tools/testing/selftests/vm/split_huge_page_test.c
index c8a32ae9e13a..bcbc5a9d327c 100644
--- a/tools/testing/selftests/vm/split_huge_page_test.c
+++ b/tools/testing/selftests/vm/split_huge_page_test.c
@@ -16,6 +16,7 @@
#include <sys/wait.h>
#include <malloc.h>
#include <stdbool.h>
+#include <time.h>

#define PAGE_4KB (4096UL)
#define PAGE_2MB (512UL*PAGE_4KB)
@@ -31,6 +32,7 @@

#define SPLIT_DEBUGFS "/sys/kernel/debug/split_huge_pages_in_range_pid"
#define SMAP_PATH "/proc/self/smaps"
+#define THP_FS_PATH "/mnt/thp_fs"
#define INPUT_MAX 80

static int write_file(const char *path, const char *buf, size_t buflen)
@@ -50,13 +52,13 @@ static int write_file(const char *path, const char *buf, size_t buflen)
return (unsigned int) numwritten;
}

-static void write_debugfs(int pid, uint64_t vaddr_start, uint64_t vaddr_end)
+static void write_debugfs(int pid, uint64_t vaddr_start, uint64_t vaddr_end, int order)
{
char input[INPUT_MAX];
int ret;

- ret = snprintf(input, INPUT_MAX, "%d,%lx,%lx", pid, vaddr_start,
- vaddr_end);
+ ret = snprintf(input, INPUT_MAX, "%d,%lx,%lx,%d", pid, vaddr_start,
+ vaddr_end, order);
if (ret >= INPUT_MAX) {
printf("%s: Debugfs input is too long\n", __func__);
exit(EXIT_FAILURE);
@@ -139,7 +141,7 @@ void split_pmd_thp(void)
}

/* split all possible huge pages */
- write_debugfs(getpid(), (uint64_t)one_page, (uint64_t)one_page + len);
+ write_debugfs(getpid(), (uint64_t)one_page, (uint64_t)one_page + len, 0);

*one_page = 0;

@@ -153,9 +155,101 @@ void split_pmd_thp(void)
free(one_page);
}

+void create_pagecache_thp_and_fd(size_t fd_size, int *fd, char **addr)
+{
+ const char testfile[] = THP_FS_PATH "/test";
+ size_t i;
+ int dummy;
+
+ srand(time(NULL));
+
+ *fd = open(testfile, O_CREAT | O_RDWR, 0664);
+
+ for (i = 0; i < fd_size; i++) {
+ unsigned char byte = rand();
+
+ write(*fd, &byte, sizeof(byte));
+ }
+ close(*fd);
+ sync();
+ *fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
+ if (*fd == -1) {
+ perror("open drop_caches");
+ exit(EXIT_FAILURE);
+ }
+ if (write(*fd, "3", 1) != 1) {
+ perror("write to drop_caches");
+ exit(EXIT_FAILURE);
+ }
+ close(*fd);
+
+ *fd = open(testfile, O_RDWR);
+
+ *addr = mmap(NULL, fd_size, PROT_READ|PROT_WRITE, MAP_SHARED, *fd, 0);
+ if (*addr == (char *)-1) {
+ perror("cannot mmap");
+ exit(1);
+ }
+ madvise(*addr, fd_size, MADV_HUGEPAGE);
+
+ for (size_t i = 0; i < fd_size; i++)
+ dummy += *(*addr + i);
+}
+
+void split_thp_in_pagecache_to_order(int order)
+{
+ int fd;
+ char *addr;
+ size_t fd_size = 2 * PAGE_2MB, i;
+
+ create_pagecache_thp_and_fd(fd_size, &fd, &addr);
+
+ printf("split %ld kB pagecache page to order %d ... ", fd_size >> 10, order);
+ write_debugfs(getpid(), (uint64_t)addr, (uint64_t)addr + fd_size, order);
+
+ for (i = 0; i < fd_size; i++)
+ *(addr + i) = (char)i;
+
+ close(fd);
+ printf("done\n");
+}
+
+void truncate_thp_in_pagecache_to_order(int order)
+{
+ int fd;
+ char *addr;
+ size_t fd_size = 2 * PAGE_2MB, i;
+
+ create_pagecache_thp_and_fd(fd_size, &fd, &addr);
+
+ printf("truncate %ld kB pagecache page to size %lu kB ... ", fd_size >> 10, 4UL << order);
+ ftruncate(fd, PAGE_4KB << order);
+
+ for (i = 0; i < (PAGE_4KB << order); i++)
+ *(addr + i) = (char)i;
+
+ close(fd);
+ printf("done\n");
+}
+
int main(int argc, char **argv)
{
+ int i;
+
+ if (geteuid() != 0) {
+ printf("Please run the benchmark as root\n");
+ exit(EXIT_FAILURE);
+ }
+
split_pmd_thp();

+ for (i = 8; i >= 0; i--)
+ if (i != 1)
+ split_thp_in_pagecache_to_order(i);
+
+ for (i = 8; i >= 0; i--)
+ if (i != 1)
+ truncate_thp_in_pagecache_to_order(i);
+
return 0;
}
--
2.28.0

2020-11-11 20:49:22

by Zi Yan

Subject: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

From: Zi Yan <[email protected]>

Add a new_order parameter to set the new page order in page owner.
This prepares for upcoming changes that support splitting a huge page
to any lower order.

Signed-off-by: Zi Yan <[email protected]>
---
include/linux/page_owner.h | 7 ++++---
mm/huge_memory.c | 2 +-
mm/page_alloc.c | 2 +-
mm/page_owner.c | 6 +++---
4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
index 3468794f83d2..215cbb159568 100644
--- a/include/linux/page_owner.h
+++ b/include/linux/page_owner.h
@@ -31,10 +31,11 @@ static inline void set_page_owner(struct page *page,
__set_page_owner(page, order, gfp_mask);
}

-static inline void split_page_owner(struct page *page, unsigned int nr)
+static inline void split_page_owner(struct page *page, unsigned int nr,
+ unsigned int new_order)
{
if (static_branch_unlikely(&page_owner_inited))
- __split_page_owner(page, nr);
+ __split_page_owner(page, nr, new_order);
}
static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
{
@@ -60,7 +61,7 @@ static inline void set_page_owner(struct page *page,
{
}
static inline void split_page_owner(struct page *page,
- unsigned int order)
+ unsigned int nr, unsigned int new_order)
{
}
static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f599f5b9bf7f..8b7d771ee962 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2459,7 +2459,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,

ClearPageCompound(head);

- split_page_owner(head, nr);
+ split_page_owner(head, nr, 1);

/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d77220615fd5..a9eead0e091a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3284,7 +3284,7 @@ void split_page(struct page *page, unsigned int order)

for (i = 1; i < (1 << order); i++)
set_page_refcounted(page + i);
- split_page_owner(page, 1 << order);
+ split_page_owner(page, 1 << order, 1);
}
EXPORT_SYMBOL_GPL(split_page);

diff --git a/mm/page_owner.c b/mm/page_owner.c
index b735a8eafcdb..2b7f7e9056dc 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -204,7 +204,7 @@ void __set_page_owner_migrate_reason(struct page *page, int reason)
page_owner->last_migrate_reason = reason;
}

-void __split_page_owner(struct page *page, unsigned int nr)
+void __split_page_owner(struct page *page, unsigned int nr, unsigned int new_order)
{
int i;
struct page_ext *page_ext = lookup_page_ext(page);
@@ -213,9 +213,9 @@ void __split_page_owner(struct page *page, unsigned int nr)
if (unlikely(!page_ext))
return;

- for (i = 0; i < nr; i++) {
+ for (i = 0; i < nr; i += (1 << new_order)) {
page_owner = get_page_owner(page_ext);
- page_owner->order = 0;
+ page_owner->order = new_order;
page_ext = page_ext_next(page_ext);
}
}
--
2.28.0

2020-11-11 20:55:43

by Zi Yan

Subject: [RFC PATCH 2/6] mm: memcg: make memcg huge page split support any order split.

From: Zi Yan <[email protected]>

It reads thp_nr_pages and splits the memcg information at the provided
new_nr granularity. This prepares for upcoming changes that support
splitting a huge page to any lower order.

Signed-off-by: Zi Yan <[email protected]>
---
include/linux/memcontrol.h | 5 +++--
mm/huge_memory.c | 2 +-
mm/memcontrol.c | 4 ++--
3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0f4dd7829fb2..b3bac79ceed6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1105,7 +1105,7 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
}

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-void mem_cgroup_split_huge_fixup(struct page *head);
+void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr);
#endif

#else /* CONFIG_MEMCG */
@@ -1451,7 +1451,8 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
return 0;
}

-static inline void mem_cgroup_split_huge_fixup(struct page *head)
+static inline void mem_cgroup_split_huge_fixup(struct page *head,
+ unsigned int new_nr)
{
}

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c4fead5ead31..f599f5b9bf7f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2429,7 +2429,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
lruvec = mem_cgroup_page_lruvec(head, pgdat);

/* complete memcg works before add pages to LRU */
- mem_cgroup_split_huge_fixup(head);
+ mem_cgroup_split_huge_fixup(head, 1);

if (PageAnon(head) && PageSwapCache(head)) {
swp_entry_t entry = { .val = page_private(head) };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 33f632689cee..e9705ba6bbcc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3247,7 +3247,7 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
* Because tail pages are not marked as "used", set it. We're under
* pgdat->lru_lock and migration entries setup in all page mappings.
*/
-void mem_cgroup_split_huge_fixup(struct page *head)
+void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr)
{
struct mem_cgroup *memcg = page_memcg(head);
int i;
@@ -3255,7 +3255,7 @@ void mem_cgroup_split_huge_fixup(struct page *head)
if (mem_cgroup_disabled())
return;

- for (i = 1; i < thp_nr_pages(head); i++) {
+ for (i = new_nr; i < thp_nr_pages(head); i += new_nr) {
css_get(&memcg->css);
head[i].memcg_data = (unsigned long)memcg;
}
--
2.28.0

2020-11-12 17:59:16

by Ralph Campbell

Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.


On 11/11/20 12:40 PM, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> It adds a new_order parameter to set new page order in page owner.
> It prepares for upcoming changes to support split huge page to any lower
> order.
>
> Signed-off-by: Zi Yan <[email protected]>

Except for a minor fix below, you can add:
Reviewed-by: Ralph Campbell <[email protected]>

> ---
> include/linux/page_owner.h | 7 ++++---
> mm/huge_memory.c | 2 +-
> mm/page_alloc.c | 2 +-
> mm/page_owner.c | 6 +++---
> 4 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
> index 3468794f83d2..215cbb159568 100644
> --- a/include/linux/page_owner.h
> +++ b/include/linux/page_owner.h
> @@ -31,10 +31,11 @@ static inline void set_page_owner(struct page *page,
> __set_page_owner(page, order, gfp_mask);
> }
>
> -static inline void split_page_owner(struct page *page, unsigned int nr)
> +static inline void split_page_owner(struct page *page, unsigned int nr,
> + unsigned int new_order)
> {
> if (static_branch_unlikely(&page_owner_inited))
> - __split_page_owner(page, nr);
> + __split_page_owner(page, nr, new_order);
> }
> static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
> {
> @@ -60,7 +61,7 @@ static inline void set_page_owner(struct page *page,
> {
> }
> static inline void split_page_owner(struct page *page,
> - unsigned int order)
> + unsigned int nr, unsigned int new_order)
> {
> }
> static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index f599f5b9bf7f..8b7d771ee962 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2459,7 +2459,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>
> ClearPageCompound(head);
>
> - split_page_owner(head, nr);
> + split_page_owner(head, nr, 1);

Shouldn't this be 0, not 1?
(new_order not new_nr).

> /* See comment in __split_huge_page_tail() */
> if (PageAnon(head)) {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d77220615fd5..a9eead0e091a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3284,7 +3284,7 @@ void split_page(struct page *page, unsigned int order)
>
> for (i = 1; i < (1 << order); i++)
> set_page_refcounted(page + i);
> - split_page_owner(page, 1 << order);
> + split_page_owner(page, 1 << order, 1);

Ditto, 0.

> }
> EXPORT_SYMBOL_GPL(split_page);

> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index b735a8eafcdb..2b7f7e9056dc 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -204,7 +204,7 @@ void __set_page_owner_migrate_reason(struct page *page, int reason)
> page_owner->last_migrate_reason = reason;
> }
>
> -void __split_page_owner(struct page *page, unsigned int nr)
> +void __split_page_owner(struct page *page, unsigned int nr, unsigned int new_order)
> {
> int i;
> struct page_ext *page_ext = lookup_page_ext(page);
> @@ -213,9 +213,9 @@ void __split_page_owner(struct page *page, unsigned int nr)
> if (unlikely(!page_ext))
> return;
>
> - for (i = 0; i < nr; i++) {
> + for (i = 0; i < nr; i += (1 << new_order)) {
> page_owner = get_page_owner(page_ext);
> - page_owner->order = 0;
> + page_owner->order = new_order;
> page_ext = page_ext_next(page_ext);
> }
> }
>

2020-11-12 18:00:27

by Ralph Campbell

Subject: Re: [RFC PATCH 2/6] mm: memcg: make memcg huge page split support any order split.


On 11/11/20 12:40 PM, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> It reads thp_nr_pages and splits to provided new_nr. It prepares for
> upcoming changes to support split huge page to any lower order.
>
> Signed-off-by: Zi Yan <[email protected]>

Looks OK to me.
Reviewed-by: Ralph Campbell <[email protected]>

> ---
> include/linux/memcontrol.h | 5 +++--
> mm/huge_memory.c | 2 +-
> mm/memcontrol.c | 4 ++--
> 3 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 0f4dd7829fb2..b3bac79ceed6 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1105,7 +1105,7 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
> }
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -void mem_cgroup_split_huge_fixup(struct page *head);
> +void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr);
> #endif
>
> #else /* CONFIG_MEMCG */
> @@ -1451,7 +1451,8 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
> return 0;
> }
>
> -static inline void mem_cgroup_split_huge_fixup(struct page *head)
> +static inline void mem_cgroup_split_huge_fixup(struct page *head,
> + unsigned int new_nr)
> {
> }
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c4fead5ead31..f599f5b9bf7f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2429,7 +2429,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> lruvec = mem_cgroup_page_lruvec(head, pgdat);
>
> /* complete memcg works before add pages to LRU */
> - mem_cgroup_split_huge_fixup(head);
> + mem_cgroup_split_huge_fixup(head, 1);
>
> if (PageAnon(head) && PageSwapCache(head)) {
> swp_entry_t entry = { .val = page_private(head) };
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 33f632689cee..e9705ba6bbcc 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3247,7 +3247,7 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
> * Because tail pages are not marked as "used", set it. We're under
> * pgdat->lru_lock and migration entries setup in all page mappings.
> */
> -void mem_cgroup_split_huge_fixup(struct page *head)
> +void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr)
> {
> struct mem_cgroup *memcg = page_memcg(head);
> int i;
> @@ -3255,7 +3255,7 @@ void mem_cgroup_split_huge_fixup(struct page *head)
> if (mem_cgroup_disabled())
> return;
>
> - for (i = 1; i < thp_nr_pages(head); i++) {
> + for (i = new_nr; i < thp_nr_pages(head); i += new_nr) {
> css_get(&memcg->css);
> head[i].memcg_data = (unsigned long)memcg;
> }
>

2020-11-12 18:02:39

by Zi Yan

Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On 12 Nov 2020, at 12:57, Ralph Campbell wrote:

> On 11/11/20 12:40 PM, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> It adds a new_order parameter to set new page order in page owner.
>> It prepares for upcoming changes to support split huge page to any lower
>> order.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>
> Except for a minor fix below, you can add:
> Reviewed-by: Ralph Campbell <[email protected]>

Thanks.

>
>> ---
>> include/linux/page_owner.h | 7 ++++---
>> mm/huge_memory.c | 2 +-
>> mm/page_alloc.c | 2 +-
>> mm/page_owner.c | 6 +++---
>> 4 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
>> index 3468794f83d2..215cbb159568 100644
>> --- a/include/linux/page_owner.h
>> +++ b/include/linux/page_owner.h
>> @@ -31,10 +31,11 @@ static inline void set_page_owner(struct page *page,
>> __set_page_owner(page, order, gfp_mask);
>> }
>> -static inline void split_page_owner(struct page *page, unsigned int nr)
>> +static inline void split_page_owner(struct page *page, unsigned int nr,
>> + unsigned int new_order)
>> {
>> if (static_branch_unlikely(&page_owner_inited))
>> - __split_page_owner(page, nr);
>> + __split_page_owner(page, nr, new_order);
>> }
>> static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
>> {
>> @@ -60,7 +61,7 @@ static inline void set_page_owner(struct page *page,
>> {
>> }
>> static inline void split_page_owner(struct page *page,
>> - unsigned int order)
>> + unsigned int nr, unsigned int new_order)
>> {
>> }
>> static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index f599f5b9bf7f..8b7d771ee962 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2459,7 +2459,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>> ClearPageCompound(head);
>> - split_page_owner(head, nr);
>> + split_page_owner(head, nr, 1);
>
> Shouldn't this be 0, not 1?
> (new_order not new_nr).
>

Yes, I forgot to fix the call site after I change the function signature. Thanks.

>> /* See comment in __split_huge_page_tail() */
>> if (PageAnon(head)) {
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index d77220615fd5..a9eead0e091a 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3284,7 +3284,7 @@ void split_page(struct page *page, unsigned int order)
>> for (i = 1; i < (1 << order); i++)
>> set_page_refcounted(page + i);
>> - split_page_owner(page, 1 << order);
>> + split_page_owner(page, 1 << order, 1);
>
> Ditto, 0.
>

Sure, will fix this too.



Best Regards,
Yan Zi



2020-11-12 18:02:56

by Zi Yan

Subject: Re: [RFC PATCH 2/6] mm: memcg: make memcg huge page split support any order split.

On 12 Nov 2020, at 12:58, Ralph Campbell wrote:

> On 11/11/20 12:40 PM, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> It reads thp_nr_pages and splits to provided new_nr. It prepares for
>> upcoming changes to support split huge page to any lower order.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>
> Looks OK to me.
> Reviewed-by: Ralph Campbell <[email protected]>

Thanks.


Best Regards,
Yan Zi



2020-11-12 22:12:00

by Ralph Campbell

Subject: Re: [RFC PATCH 5/6] mm: truncate: split thp to a non-zero order if possible.


On 11/11/20 12:40 PM, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> To minimize the number of pages after a truncation, when truncating a
> THP, we do not need to split it all the way down to order-0. The THP has
> at most three parts, the part before offset, the part to be truncated,
> the part left at the end. Use the non-zero minimum of them to decide
> what order we split the THP to.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> mm/truncate.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 20bd17538ec2..6d8e3c6115bc 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -237,7 +237,7 @@ int truncate_inode_page(struct address_space *mapping, struct page *page)
> bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
> {
> loff_t pos = page_offset(page);
> - unsigned int offset, length;
> + unsigned int offset, length, left, min_subpage_size = PAGE_SIZE;

Maybe use "remaining" instead of "left" since I think of the latter as the length of the
left side (offset).

> if (pos < start)
> offset = start - pos;
> @@ -248,6 +248,7 @@ bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
> length = length - offset;
> else
> length = end + 1 - pos - offset;
> + left = thp_size(page) - offset - length;
>
> wait_on_page_writeback(page);
> if (length == thp_size(page)) {
> @@ -267,7 +268,24 @@ bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
> do_invalidatepage(page, offset, length);
> if (!PageTransHuge(page))
> return true;
> - return split_huge_page(page) == 0;
> +
> + /*
> + * find the non-zero minimum of offset, length, and left and use it to
> + * decide the new order of the page after split
> + */
> + if (offset && left)
> + min_subpage_size = min_t(unsigned int,
> + min_t(unsigned int, offset, length),
> + left);
> + else if (!offset)
> + min_subpage_size = min_t(unsigned int, length, left);
> + else /* !left */
> + min_subpage_size = min_t(unsigned int, length, offset);
> +
> + min_subpage_size = max_t(unsigned int, PAGE_SIZE, min_subpage_size);
> +
> + return split_huge_page_to_list_to_order(page, NULL,
> + ilog2(min_subpage_size/PAGE_SIZE)) == 0;
> }

What if "min_subpage_size" is 1/2 the THP but offset isn't aligned to 1/2?
Splitting the page in half wouldn't result in a page that could be freed
but maybe splitting to 1/4 would (assuming the THP is at least 8x PAGE_SIZE).

2020-11-12 22:41:35

by Zi Yan

Subject: Re: [RFC PATCH 5/6] mm: truncate: split thp to a non-zero order if possible.

On 12 Nov 2020, at 17:08, Ralph Campbell wrote:

> On 11/11/20 12:40 PM, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> To minimize the number of pages after a truncation, when truncating a
>> THP, we do not need to split it all the way down to order-0. The THP has
>> at most three parts, the part before offset, the part to be truncated,
>> the part left at the end. Use the non-zero minimum of them to decide
>> what order we split the THP to.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> mm/truncate.c | 22 ++++++++++++++++++++--
>> 1 file changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/truncate.c b/mm/truncate.c
>> index 20bd17538ec2..6d8e3c6115bc 100644
>> --- a/mm/truncate.c
>> +++ b/mm/truncate.c
>> @@ -237,7 +237,7 @@ int truncate_inode_page(struct address_space *mapping, struct page *page)
>> bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
>> {
>> loff_t pos = page_offset(page);
>> - unsigned int offset, length;
>> + unsigned int offset, length, left, min_subpage_size = PAGE_SIZE;
>
> Maybe use "remaining" instead of "left" since I think of the latter as the length of the
> left side (offset).

Sure. Will change the name.

>
>> if (pos < start)
>> offset = start - pos;
>> @@ -248,6 +248,7 @@ bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
>> length = length - offset;
>> else
>> length = end + 1 - pos - offset;
>> + left = thp_size(page) - offset - length;
>> wait_on_page_writeback(page);
>> if (length == thp_size(page)) {
>> @@ -267,7 +268,24 @@ bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
>> do_invalidatepage(page, offset, length);
>> if (!PageTransHuge(page))
>> return true;
>> - return split_huge_page(page) == 0;
>> +
>> + /*
>> + * find the non-zero minimum of offset, length, and left and use it to
>> + * decide the new order of the page after split
>> + */
>> + if (offset && left)
>> + min_subpage_size = min_t(unsigned int,
>> + min_t(unsigned int, offset, length),
>> + left);
>> + else if (!offset)
>> + min_subpage_size = min_t(unsigned int, length, left);
>> + else /* !left */
>> + min_subpage_size = min_t(unsigned int, length, offset);
>> +
>> + min_subpage_size = max_t(unsigned int, PAGE_SIZE, min_subpage_size);
>> +
>> + return split_huge_page_to_list_to_order(page, NULL,
>> + ilog2(min_subpage_size/PAGE_SIZE)) == 0;
>> }
>
> What if "min_subpage_size" is 1/2 the THP but offset isn't aligned to 1/2?
> Splitting the page in half wouldn't result in a page that could be freed
> but maybe splitting to 1/4 would (assuming the THP is at least 8x PAGE_SIZE).

Is it possible? The whole THP is divided into three parts: offset, length, and
remaining (renamed from left). If offset is not aligned to 1/2, it is either
greater than 1/2 or smaller than 1/2. If it is the former, length and remaining
will both be smaller than 1/2, so min_subpage_size cannot be 1/2. If it is the
latter, min_subpage_size cannot be 1/2 either, because min_subpage_size is the
smallest non-zero value of offset, length, and remaining. Let me know if I miss
anything.


Best Regards,
Yan Zi



2020-11-14 00:20:34

by Roman Gushchin

Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Wed, Nov 11, 2020 at 03:40:05PM -0500, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> It adds a new_order parameter to set new page order in page owner.
> It prepares for upcoming changes to support split huge page to any lower
> order.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> include/linux/page_owner.h | 7 ++++---
> mm/huge_memory.c | 2 +-
> mm/page_alloc.c | 2 +-
> mm/page_owner.c | 6 +++---
> 4 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
> index 3468794f83d2..215cbb159568 100644
> --- a/include/linux/page_owner.h
> +++ b/include/linux/page_owner.h
> @@ -31,10 +31,11 @@ static inline void set_page_owner(struct page *page,
> __set_page_owner(page, order, gfp_mask);
> }
>
> -static inline void split_page_owner(struct page *page, unsigned int nr)
> +static inline void split_page_owner(struct page *page, unsigned int nr,
> + unsigned int new_order)
> {
> if (static_branch_unlikely(&page_owner_inited))
> - __split_page_owner(page, nr);
> + __split_page_owner(page, nr, new_order);
> }
> static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
> {
> @@ -60,7 +61,7 @@ static inline void set_page_owner(struct page *page,
> {
> }
> static inline void split_page_owner(struct page *page,
> - unsigned int order)
> + unsigned int nr, unsigned int new_order)

With the addition of the new argument it's a bit hard to understand
what the function is supposed to do. It seems like nr == page_order(page),
is it right? Maybe we can pass old_order and new_order? Or just the page
and the new order?

> {
> }
> static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index f599f5b9bf7f..8b7d771ee962 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2459,7 +2459,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>
> ClearPageCompound(head);
>
> - split_page_owner(head, nr);
> + split_page_owner(head, nr, 1);
>
> /* See comment in __split_huge_page_tail() */
> if (PageAnon(head)) {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d77220615fd5..a9eead0e091a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3284,7 +3284,7 @@ void split_page(struct page *page, unsigned int order)
>
> for (i = 1; i < (1 << order); i++)
> set_page_refcounted(page + i);
> - split_page_owner(page, 1 << order);
> + split_page_owner(page, 1 << order, 1);
> }
> EXPORT_SYMBOL_GPL(split_page);
>
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index b735a8eafcdb..2b7f7e9056dc 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -204,7 +204,7 @@ void __set_page_owner_migrate_reason(struct page *page, int reason)
> page_owner->last_migrate_reason = reason;
> }
>
> -void __split_page_owner(struct page *page, unsigned int nr)
> +void __split_page_owner(struct page *page, unsigned int nr, unsigned int new_order)
> {
> int i;
> struct page_ext *page_ext = lookup_page_ext(page);
> @@ -213,9 +213,9 @@ void __split_page_owner(struct page *page, unsigned int nr)
> if (unlikely(!page_ext))
> return;
>
> - for (i = 0; i < nr; i++) {
> + for (i = 0; i < nr; i += (1 << new_order)) {
> page_owner = get_page_owner(page_ext);
> - page_owner->order = 0;
> + page_owner->order = new_order;
> page_ext = page_ext_next(page_ext);

I believe there cannot be any leftovers because nr is always a power of 2.
Is that true? Converting the nr argument to an order (if possible) would make it obvious.

Other than that the patch looks good to me.

Thanks!

2020-11-14 00:26:08

by Roman Gushchin

[permalink] [raw]
Subject: Re: [RFC PATCH 2/6] mm: memcg: make memcg huge page split support any order split.

On Wed, Nov 11, 2020 at 03:40:04PM -0500, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> It reads thp_nr_pages and splits to provided new_nr. It prepares for
> upcoming changes to support split huge page to any lower order.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> include/linux/memcontrol.h | 5 +++--
> mm/huge_memory.c | 2 +-
> mm/memcontrol.c | 4 ++--
> 3 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 0f4dd7829fb2..b3bac79ceed6 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1105,7 +1105,7 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
> }
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -void mem_cgroup_split_huge_fixup(struct page *head);
> +void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr);
> #endif
>
> #else /* CONFIG_MEMCG */
> @@ -1451,7 +1451,8 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
> return 0;
> }
>
> -static inline void mem_cgroup_split_huge_fixup(struct page *head)
> +static inline void mem_cgroup_split_huge_fixup(struct page *head,
> + unsigned int new_nr)
> {
> }
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c4fead5ead31..f599f5b9bf7f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2429,7 +2429,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> lruvec = mem_cgroup_page_lruvec(head, pgdat);
>
> /* complete memcg works before add pages to LRU */
> - mem_cgroup_split_huge_fixup(head);
> + mem_cgroup_split_huge_fixup(head, 1);
>
> if (PageAnon(head) && PageSwapCache(head)) {
> swp_entry_t entry = { .val = page_private(head) };
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 33f632689cee..e9705ba6bbcc 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3247,7 +3247,7 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
> * Because tail pages are not marked as "used", set it. We're under
> * pgdat->lru_lock and migration entries setup in all page mappings.
> */
> -void mem_cgroup_split_huge_fixup(struct page *head)
> +void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr)

I'd go with unsigned int new_order, then it's obvious that we can split
the original page without any leftovers.

Other than that the patch looks good!
Acked-by: Roman Gushchin <[email protected]>

Thanks!

> {
> struct mem_cgroup *memcg = page_memcg(head);
> int i;
> @@ -3255,7 +3255,7 @@ void mem_cgroup_split_huge_fixup(struct page *head)
> if (mem_cgroup_disabled())
> return;
>
> - for (i = 1; i < thp_nr_pages(head); i++) {
> + for (i = new_nr; i < thp_nr_pages(head); i += new_nr) {
> css_get(&memcg->css);
> head[i].memcg_data = (unsigned long)memcg;
> }
> --
> 2.28.0
>

2020-11-14 01:02:06

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 2/6] mm: memcg: make memcg huge page split support any order split.

On 13 Nov 2020, at 19:23, Roman Gushchin wrote:

> On Wed, Nov 11, 2020 at 03:40:04PM -0500, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> It reads thp_nr_pages and splits to provided new_nr. It prepares for
>> upcoming changes to support split huge page to any lower order.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> include/linux/memcontrol.h | 5 +++--
>> mm/huge_memory.c | 2 +-
>> mm/memcontrol.c | 4 ++--
>> 3 files changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 0f4dd7829fb2..b3bac79ceed6 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -1105,7 +1105,7 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
>> }
>>
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> -void mem_cgroup_split_huge_fixup(struct page *head);
>> +void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr);
>> #endif
>>
>> #else /* CONFIG_MEMCG */
>> @@ -1451,7 +1451,8 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
>> return 0;
>> }
>>
>> -static inline void mem_cgroup_split_huge_fixup(struct page *head)
>> +static inline void mem_cgroup_split_huge_fixup(struct page *head,
>> + unsigned int new_nr)
>> {
>> }
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index c4fead5ead31..f599f5b9bf7f 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2429,7 +2429,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>> lruvec = mem_cgroup_page_lruvec(head, pgdat);
>>
>> /* complete memcg works before add pages to LRU */
>> - mem_cgroup_split_huge_fixup(head);
>> + mem_cgroup_split_huge_fixup(head, 1);
>>
>> if (PageAnon(head) && PageSwapCache(head)) {
>> swp_entry_t entry = { .val = page_private(head) };
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 33f632689cee..e9705ba6bbcc 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3247,7 +3247,7 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
>> * Because tail pages are not marked as "used", set it. We're under
>> * pgdat->lru_lock and migration entries setup in all page mappings.
>> */
>> -void mem_cgroup_split_huge_fixup(struct page *head)
>> +void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr)
>
> I'd go with unsigned int new_order, then it's obvious that we can split
> the original page without any leftovers.

Makes sense. Will change it.

>
> Other than that the patch looks good!
> Acked-by: Roman Gushchin <[email protected]>

Thanks.

>> {
>> struct mem_cgroup *memcg = page_memcg(head);
>> int i;
>> @@ -3255,7 +3255,7 @@ void mem_cgroup_split_huge_fixup(struct page *head)
>> if (mem_cgroup_disabled())
>> return;
>>
>> - for (i = 1; i < thp_nr_pages(head); i++) {
>> + for (i = new_nr; i < thp_nr_pages(head); i += new_nr) {
>> css_get(&memcg->css);
>> head[i].memcg_data = (unsigned long)memcg;
>> }
>> --
>> 2.28.0
>>



Best Regards,
Yan Zi



2020-11-14 01:20:34

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On 13 Nov 2020, at 19:15, Roman Gushchin wrote:

> On Wed, Nov 11, 2020 at 03:40:05PM -0500, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> It adds a new_order parameter to set new page order in page owner.
>> It prepares for upcoming changes to support split huge page to any
>> lower
>> order.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> include/linux/page_owner.h | 7 ++++---
>> mm/huge_memory.c | 2 +-
>> mm/page_alloc.c | 2 +-
>> mm/page_owner.c | 6 +++---
>> 4 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
>> index 3468794f83d2..215cbb159568 100644
>> --- a/include/linux/page_owner.h
>> +++ b/include/linux/page_owner.h
>> @@ -31,10 +31,11 @@ static inline void set_page_owner(struct page
>> *page,
>> __set_page_owner(page, order, gfp_mask);
>> }
>>
>> -static inline void split_page_owner(struct page *page, unsigned int
>> nr)
>> +static inline void split_page_owner(struct page *page, unsigned int
>> nr,
>> + unsigned int new_order)
>> {
>> if (static_branch_unlikely(&page_owner_inited))
>> - __split_page_owner(page, nr);
>> + __split_page_owner(page, nr, new_order);
>> }
>> static inline void copy_page_owner(struct page *oldpage, struct page
>> *newpage)
>> {
>> @@ -60,7 +61,7 @@ static inline void set_page_owner(struct page
>> *page,
>> {
>> }
>> static inline void split_page_owner(struct page *page,
>> - unsigned int order)
>> + unsigned int nr, unsigned int new_order)
>
> With the addition of the new argument it's a bit hard to understand
> what the function is supposed to do. It seems like nr ==
> page_order(page),
> is it right? Maybe we can pass old_order and new_order? Or just the
> page
> and the new order?

Yeah, it is a bit confusing. Please see more below.

>
>> {
>> }
>> static inline void copy_page_owner(struct page *oldpage, struct page
>> *newpage)
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index f599f5b9bf7f..8b7d771ee962 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2459,7 +2459,7 @@ static void __split_huge_page(struct page
>> *page, struct list_head *list,
>>
>> ClearPageCompound(head);
>>
>> - split_page_owner(head, nr);
>> + split_page_owner(head, nr, 1);
>>
>> /* See comment in __split_huge_page_tail() */
>> if (PageAnon(head)) {
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index d77220615fd5..a9eead0e091a 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3284,7 +3284,7 @@ void split_page(struct page *page, unsigned int
>> order)
>>
>> for (i = 1; i < (1 << order); i++)
>> set_page_refcounted(page + i);
>> - split_page_owner(page, 1 << order);
>> + split_page_owner(page, 1 << order, 1);
>> }
>> EXPORT_SYMBOL_GPL(split_page);
>>
>> diff --git a/mm/page_owner.c b/mm/page_owner.c
>> index b735a8eafcdb..2b7f7e9056dc 100644
>> --- a/mm/page_owner.c
>> +++ b/mm/page_owner.c
>> @@ -204,7 +204,7 @@ void __set_page_owner_migrate_reason(struct page
>> *page, int reason)
>> page_owner->last_migrate_reason = reason;
>> }
>>
>> -void __split_page_owner(struct page *page, unsigned int nr)
>> +void __split_page_owner(struct page *page, unsigned int nr, unsigned
>> int new_order)
>> {
>> int i;
>> struct page_ext *page_ext = lookup_page_ext(page);
>> @@ -213,9 +213,9 @@ void __split_page_owner(struct page *page,
>> unsigned int nr)
>> if (unlikely(!page_ext))
>> return;
>>
>> - for (i = 0; i < nr; i++) {
>> + for (i = 0; i < nr; i += (1 << new_order)) {
>> page_owner = get_page_owner(page_ext);
>> - page_owner->order = 0;
>> + page_owner->order = new_order;
>> page_ext = page_ext_next(page_ext);
>
> I believe there cannot be any leftovers because nr is always a power
> of 2.
> Is it true? Converting nr argument to order (if it's possible) will
> make it obvious.

Right. nr = thp_nr_pages(head), which is a power of 2. There would not be any
leftover.

Matthew recently converted split_page_owner to take nr instead of order.[1]
But I am not sure why, since it seems to me that the two call sites
(__split_huge_page in mm/huge_memory.c and split_page in mm/page_alloc.c)
can pass the order information.


[1]https://lore.kernel.org/linux-mm/[email protected]/



Best Regards,
Yan Zi

2020-11-14 01:42:22

by Roman Gushchin

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Fri, Nov 13, 2020 at 08:08:58PM -0500, Zi Yan wrote:
> On 13 Nov 2020, at 19:15, Roman Gushchin wrote:
>
> > On Wed, Nov 11, 2020 at 03:40:05PM -0500, Zi Yan wrote:
> > > From: Zi Yan <[email protected]>
> > >
> > > It adds a new_order parameter to set new page order in page owner.
> > > It prepares for upcoming changes to support split huge page to any
> > > lower
> > > order.
> > >
> > > Signed-off-by: Zi Yan <[email protected]>
> > > ---
> > > include/linux/page_owner.h | 7 ++++---
> > > mm/huge_memory.c | 2 +-
> > > mm/page_alloc.c | 2 +-
> > > mm/page_owner.c | 6 +++---
> > > 4 files changed, 9 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
> > > index 3468794f83d2..215cbb159568 100644
> > > --- a/include/linux/page_owner.h
> > > +++ b/include/linux/page_owner.h
> > > @@ -31,10 +31,11 @@ static inline void set_page_owner(struct page
> > > *page,
> > > __set_page_owner(page, order, gfp_mask);
> > > }
> > >
> > > -static inline void split_page_owner(struct page *page, unsigned int
> > > nr)
> > > +static inline void split_page_owner(struct page *page, unsigned int
> > > nr,
> > > + unsigned int new_order)
> > > {
> > > if (static_branch_unlikely(&page_owner_inited))
> > > - __split_page_owner(page, nr);
> > > + __split_page_owner(page, nr, new_order);
> > > }
> > > static inline void copy_page_owner(struct page *oldpage, struct
> > > page *newpage)
> > > {
> > > @@ -60,7 +61,7 @@ static inline void set_page_owner(struct page
> > > *page,
> > > {
> > > }
> > > static inline void split_page_owner(struct page *page,
> > > - unsigned int order)
> > > + unsigned int nr, unsigned int new_order)
> >
> > With the addition of the new argument it's a bit hard to understand
> > what the function is supposed to do. It seems like nr ==
> > page_order(page),
> > is it right? Maybe we can pass old_order and new_order? Or just the page
> > and the new order?
>
> Yeah, it is a bit confusing. Please see more below.
>
> >
> > > {
> > > }
> > > static inline void copy_page_owner(struct page *oldpage, struct
> > > page *newpage)
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index f599f5b9bf7f..8b7d771ee962 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -2459,7 +2459,7 @@ static void __split_huge_page(struct page
> > > *page, struct list_head *list,
> > >
> > > ClearPageCompound(head);
> > >
> > > - split_page_owner(head, nr);
> > > + split_page_owner(head, nr, 1);
> > >
> > > /* See comment in __split_huge_page_tail() */
> > > if (PageAnon(head)) {
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index d77220615fd5..a9eead0e091a 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -3284,7 +3284,7 @@ void split_page(struct page *page, unsigned
> > > int order)
> > >
> > > for (i = 1; i < (1 << order); i++)
> > > set_page_refcounted(page + i);
> > > - split_page_owner(page, 1 << order);
> > > + split_page_owner(page, 1 << order, 1);
> > > }
> > > EXPORT_SYMBOL_GPL(split_page);
> > >
> > > diff --git a/mm/page_owner.c b/mm/page_owner.c
> > > index b735a8eafcdb..2b7f7e9056dc 100644
> > > --- a/mm/page_owner.c
> > > +++ b/mm/page_owner.c
> > > @@ -204,7 +204,7 @@ void __set_page_owner_migrate_reason(struct page
> > > *page, int reason)
> > > page_owner->last_migrate_reason = reason;
> > > }
> > >
> > > -void __split_page_owner(struct page *page, unsigned int nr)
> > > +void __split_page_owner(struct page *page, unsigned int nr,
> > > unsigned int new_order)
> > > {
> > > int i;
> > > struct page_ext *page_ext = lookup_page_ext(page);
> > > @@ -213,9 +213,9 @@ void __split_page_owner(struct page *page,
> > > unsigned int nr)
> > > if (unlikely(!page_ext))
> > > return;
> > >
> > > - for (i = 0; i < nr; i++) {
> > > + for (i = 0; i < nr; i += (1 << new_order)) {
> > > page_owner = get_page_owner(page_ext);
> > > - page_owner->order = 0;
> > > + page_owner->order = new_order;
> > > page_ext = page_ext_next(page_ext);
> >
> > I believe there cannot be any leftovers because nr is always a power of
> > 2.
> > Is it true? Converting nr argument to order (if it's possible) will make
> > it obvious.
>
> Right. nr = thp_nr_pages(head), which is a power of 2. There would not be
> any
> leftover.
>
> Matthew recently converted split_page_owner to take nr instead of order.[1]
> But I am not
> sure why, since it seems to me that two call sites (__split_huge_page in
> mm/huge_memory.c and split_page in mm/page_alloc.c) can pass the order
> information.

Yeah, I'm not sure why either. Maybe Matthew has some input here?
You can also pass new_nr, but IMO orders look so much better here.

Thanks!

2020-11-16 16:28:00

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Wed, Nov 11, 2020 at 03:40:05PM -0500, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> It adds a new_order parameter to set new page order in page owner.
> It prepares for upcoming changes to support split huge page to any lower
> order.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> include/linux/page_owner.h | 7 ++++---
> mm/huge_memory.c | 2 +-
> mm/page_alloc.c | 2 +-
> mm/page_owner.c | 6 +++---
> 4 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
> index 3468794f83d2..215cbb159568 100644
> --- a/include/linux/page_owner.h
> +++ b/include/linux/page_owner.h
> @@ -31,10 +31,11 @@ static inline void set_page_owner(struct page *page,
> __set_page_owner(page, order, gfp_mask);
> }
>
> -static inline void split_page_owner(struct page *page, unsigned int nr)
> +static inline void split_page_owner(struct page *page, unsigned int nr,
> + unsigned int new_order)
> {
> if (static_branch_unlikely(&page_owner_inited))
> - __split_page_owner(page, nr);
> + __split_page_owner(page, nr, new_order);

Hm. Where do you correct __split_page_owner() declaration. I don't see it.

--
Kirill A. Shutemov

2020-11-16 20:49:12

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On 16 Nov 2020, at 11:25, Kirill A. Shutemov wrote:

> On Wed, Nov 11, 2020 at 03:40:05PM -0500, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> It adds a new_order parameter to set new page order in page owner.
>> It prepares for upcoming changes to support split huge page to any lower
>> order.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> include/linux/page_owner.h | 7 ++++---
>> mm/huge_memory.c | 2 +-
>> mm/page_alloc.c | 2 +-
>> mm/page_owner.c | 6 +++---
>> 4 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
>> index 3468794f83d2..215cbb159568 100644
>> --- a/include/linux/page_owner.h
>> +++ b/include/linux/page_owner.h
>> @@ -31,10 +31,11 @@ static inline void set_page_owner(struct page *page,
>> __set_page_owner(page, order, gfp_mask);
>> }
>>
>> -static inline void split_page_owner(struct page *page, unsigned int nr)
>> +static inline void split_page_owner(struct page *page, unsigned int nr,
>> + unsigned int new_order)
>> {
>> if (static_branch_unlikely(&page_owner_inited))
>> - __split_page_owner(page, nr);
>> + __split_page_owner(page, nr, new_order);
>
> Hm. Where do you correct __split_page_owner() declaration. I don't see it.

I missed it. Will add it. Thanks.


Best Regards,
Yan Zi



2020-11-17 21:09:16

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Fri, Nov 13, 2020 at 05:38:01PM -0800, Roman Gushchin wrote:
> On Fri, Nov 13, 2020 at 08:08:58PM -0500, Zi Yan wrote:
> > Matthew recently converted split_page_owner to take nr instead of order.[1]
> > But I am not
> > sure why, since it seems to me that two call sites (__split_huge_page in
> > mm/huge_memory.c and split_page in mm/page_alloc.c) can pass the order
> > information.
>
> Yeah, I'm not sure why too. Maybe Matthew has some input here?
> You can also pass new_nr, but IMO orders look so much better here.

If only I'd written that information in the changelog ... oh wait, I did!

mm/page_owner: change split_page_owner to take a count

The implementation of split_page_owner() prefers a count rather than the
old order of the page. When we support a variable size THP, we won't
have the order at this point, but we will have the number of pages.
So change the interface to what the caller and callee would prefer.

2020-11-17 21:14:06

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On 17 Nov 2020, at 16:05, Matthew Wilcox wrote:

> On Fri, Nov 13, 2020 at 05:38:01PM -0800, Roman Gushchin wrote:
>> On Fri, Nov 13, 2020 at 08:08:58PM -0500, Zi Yan wrote:
>>> Matthew recently converted split_page_owner to take nr instead of order.[1]
>>> But I am not
>>> sure why, since it seems to me that two call sites (__split_huge_page in
>>> mm/huge_memory.c and split_page in mm/page_alloc.c) can pass the order
>>> information.
>>
>> Yeah, I'm not sure why too. Maybe Matthew has some input here?
>> You can also pass new_nr, but IMO orders look so much better here.
>
> If only I'd written that information in the changelog ... oh wait, I did!
>
> mm/page_owner: change split_page_owner to take a count
>
> The implementation of split_page_owner() prefers a count rather than the
> old order of the page. When we support a variable size THP, we won't
> have the order at this point, but we will have the number of pages.
> So change the interface to what the caller and callee would prefer.

There are two callers, split_page in mm/page_alloc.c and __split_huge_page in
mm/huge_memory.c. The former has the page order. The latter has the page order
information before __split_huge_page_tail is called, so we can do
old_order = thp_order(head) instead of nr = thp_nr_pages(head) and use old_order.
What am I missing there?

Thanks.


Best Regards,
Yan Zi



2020-11-17 21:15:22

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Wed, Nov 11, 2020 at 03:40:05PM -0500, Zi Yan wrote:
> - for (i = 0; i < nr; i++) {
> + for (i = 0; i < nr; i += (1 << new_order)) {
> page_owner = get_page_owner(page_ext);
> - page_owner->order = 0;
> + page_owner->order = new_order;
> page_ext = page_ext_next(page_ext);
> }

This doesn't do what you're hoping it will. It's going to set ->order to
new_order for the first N pages instead of every 1/N pages.

You'll need to do something like

page_ext = lookup_page_ext(page + i);

or add a new page_ext_add(page_ext, 1 << new_order);

2020-11-17 21:15:45

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On 17 Nov 2020, at 16:10, Matthew Wilcox wrote:

> On Wed, Nov 11, 2020 at 03:40:05PM -0500, Zi Yan wrote:
>> - for (i = 0; i < nr; i++) {
>> + for (i = 0; i < nr; i += (1 << new_order)) {
>> page_owner = get_page_owner(page_ext);
>> - page_owner->order = 0;
>> + page_owner->order = new_order;
>> page_ext = page_ext_next(page_ext);
>> }
>
> This doesn't do what you're hoping it will. It's going to set ->order to
> new_order for the first N pages instead of every 1/N pages.
>
> You'll need to do something like
>
> page_ext = lookup_page_ext(page + i);

Will use this. Thanks.

>
> or add a new page_ext_add(page_ext, 1 << new_order);



Best Regards,
Yan Zi



2020-11-17 21:28:22

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Tue, Nov 17, 2020 at 04:12:03PM -0500, Zi Yan wrote:
> On 17 Nov 2020, at 16:05, Matthew Wilcox wrote:
>
> > On Fri, Nov 13, 2020 at 05:38:01PM -0800, Roman Gushchin wrote:
> >> On Fri, Nov 13, 2020 at 08:08:58PM -0500, Zi Yan wrote:
> >>> Matthew recently converted split_page_owner to take nr instead of order.[1]
> >>> But I am not
> >>> sure why, since it seems to me that two call sites (__split_huge_page in
> >>> mm/huge_memory.c and split_page in mm/page_alloc.c) can pass the order
> >>> information.
> >>
> >> Yeah, I'm not sure why too. Maybe Matthew has some input here?
> >> You can also pass new_nr, but IMO orders look so much better here.
> >
> > If only I'd written that information in the changelog ... oh wait, I did!
> >
> > mm/page_owner: change split_page_owner to take a count
> >
> > The implementation of split_page_owner() prefers a count rather than the
> > old order of the page. When we support a variable size THP, we won't
> > have the order at this point, but we will have the number of pages.
> > So change the interface to what the caller and callee would prefer.
>
> There are two callers, split_page in mm/page_alloc.c and __split_huge_page in
> mm/huge_memory.c. The former has the page order. The latter has the page order
> information before __split_huge_page_tail is called, so we can do
> old_order = thp_order(head) instead of nr = thp_nr_page(head) and use old_order.
> What am I missing there?

Sure, we could also do that. But what I wrote was true at the time I
wrote it.

2020-11-17 21:29:51

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On 17 Nov 2020, at 16:22, Matthew Wilcox wrote:

> On Tue, Nov 17, 2020 at 04:12:03PM -0500, Zi Yan wrote:
>> On 17 Nov 2020, at 16:05, Matthew Wilcox wrote:
>>
>>> On Fri, Nov 13, 2020 at 05:38:01PM -0800, Roman Gushchin wrote:
>>>> On Fri, Nov 13, 2020 at 08:08:58PM -0500, Zi Yan wrote:
>>>>> Matthew recently converted split_page_owner to take nr instead of order.[1]
>>>>> But I am not
>>>>> sure why, since it seems to me that two call sites (__split_huge_page in
>>>>> mm/huge_memory.c and split_page in mm/page_alloc.c) can pass the order
>>>>> information.
>>>>
>>>> Yeah, I'm not sure why too. Maybe Matthew has some input here?
>>>> You can also pass new_nr, but IMO orders look so much better here.
>>>
>>> If only I'd written that information in the changelog ... oh wait, I did!
>>>
>>> mm/page_owner: change split_page_owner to take a count
>>>
>>> The implementation of split_page_owner() prefers a count rather than the
>>> old order of the page. When we support a variable size THP, we won't
>>> have the order at this point, but we will have the number of pages.
>>> So change the interface to what the caller and callee would prefer.
>>
>> There are two callers, split_page in mm/page_alloc.c and __split_huge_page in
>> mm/huge_memory.c. The former has the page order. The latter has the page order
>> information before __split_huge_page_tail is called, so we can do
>> old_order = thp_order(head) instead of nr = thp_nr_page(head) and use old_order.
>> What am I missing there?
>
> Sure, we could also do that. But what I wrote was true at the time I
> wrote it.

Got it. Thanks. Will change it to use old_order to make split_page_owner parameters
look more consistent.


Best Regards,
Yan Zi



2020-11-17 21:39:29

by Roman Gushchin

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Tue, Nov 17, 2020 at 09:22:55PM +0000, Matthew Wilcox wrote:
> On Tue, Nov 17, 2020 at 04:12:03PM -0500, Zi Yan wrote:
> > On 17 Nov 2020, at 16:05, Matthew Wilcox wrote:
> >
> > > On Fri, Nov 13, 2020 at 05:38:01PM -0800, Roman Gushchin wrote:
> > >> On Fri, Nov 13, 2020 at 08:08:58PM -0500, Zi Yan wrote:
> > >>> Matthew recently converted split_page_owner to take nr instead of order.[1]
> > >>> But I am not
> > >>> sure why, since it seems to me that two call sites (__split_huge_page in
> > >>> mm/huge_memory.c and split_page in mm/page_alloc.c) can pass the order
> > >>> information.
> > >>
> > >> Yeah, I'm not sure why too. Maybe Matthew has some input here?
> > >> You can also pass new_nr, but IMO orders look so much better here.
> > >
> > > If only I'd written that information in the changelog ... oh wait, I did!
> > >
> > > mm/page_owner: change split_page_owner to take a count
> > >
> > > The implementation of split_page_owner() prefers a count rather than the
> > > old order of the page. When we support a variable size THP, we won't
> > > have the order at this point, but we will have the number of pages.
> > > So change the interface to what the caller and callee would prefer.
> >
> > There are two callers, split_page in mm/page_alloc.c and __split_huge_page in
> > mm/huge_memory.c. The former has the page order. The latter has the page order
> > information before __split_huge_page_tail is called, so we can do
> > old_order = thp_order(head) instead of nr = thp_nr_page(head) and use old_order.
> > What am I missing there?
>
> Sure, we could also do that. But what I wrote was true at the time I
> wrote it.

Sure, I was asking whether you're ok with going back to orders or whether there are
better ideas. I'm sorry if it wasn't clear and sounded different.

It just seems to me that when a function takes nr and order (as in Zi's last version),
I'd expect nr to be a number of pages of the given order, or something like that.
So I'd avoid mixing them. Orders are slightly better when nr is always a power of two;
it's just more obvious from looking at the code.

Thanks!

2020-11-17 21:46:25

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Tue, Nov 17, 2020 at 01:35:37PM -0800, Roman Gushchin wrote:
> Sure, I was asking whether you're ok with going back to orders or whether there are
> better ideas. I'm sorry if it wasn't clear and sounded different.
>
> It just seems to me that if a function takes both nr and order (as in Zi's last version),
> I'd expect nr to be the number of pages of the given order, or something like that.
> So I'd avoid mixing them. Orders are slightly better when nr is always a power of two;
> it's just more obvious from looking at the code.

I think it's awkward no matter which way round we do it.

If we pass old_order, new_order then we create extra work for both caller
and callee.

If we pass old_nr, new_order, it looks weird for humans.

At the end of the day, I'm not that invested in which we do.