2020-11-11 20:48:49

by Zi Yan

Subject: [RFC PATCH 0/6] Split huge pages to any lower order pages.

From: Zi Yan <[email protected]>

Hi all,

With Matthew's THP in pagecache patches[1], we will be able to handle any size
pagecache THPs, but currently split_huge_page can only split a THP to order-0
pages. This can easily erase the benefit of having pagecache THPs when
operations like truncate want to keep pages larger than order-0. In
response, here are the patches to add support for splitting a THP to any lower
order pages. In addition, this patchset prepares for my PUD THP patchset[2],
since splitting a PUD THP to multiple PMD THPs can be handled by the
split_huge_page_to_list_to_order function added by this patchset, which avoids
a lot of the redundant code that replicating split_huge_page for PUD THPs
would otherwise require.

The patchset is on top of Matthew's pagecache/next tree[3].

To ease testing of the split_huge_page functions, I added a new debugfs
interface at <debugfs>/split_huge_pages_in_range_pid, so developers can split
the THPs in a given virtual address range of the process with the given pid by
writing "<pid>,<vaddr_start>,<vaddr_end>,<to_order>" to the interface. I also
added a new test program to test 1) splitting PMD THPs, 2) splitting pagecache
THPs to any lower order, and 3) truncating a pagecache THP to a page with a
lower order.

Suggestions and comments are welcome. Thanks.


[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/
[3] https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/next

Zi Yan (6):
mm: huge_memory: add new debugfs interface to trigger split huge page
on any page range.
mm: memcg: make memcg huge page split support any order split.
mm: page_owner: add support for splitting to any order in split
page_owner.
mm: thp: add support for split huge page to any lower order pages.
mm: truncate: split thp to a non-zero order if possible.
mm: huge_memory: enable debugfs to split huge pages to any order.

include/linux/huge_mm.h | 8 +
include/linux/memcontrol.h | 5 +-
include/linux/page_owner.h | 7 +-
mm/huge_memory.c | 177 ++++++++++--
mm/internal.h | 1 +
mm/memcontrol.c | 4 +-
mm/migrate.c | 2 +-
mm/page_alloc.c | 2 +-
mm/page_owner.c | 6 +-
mm/swap.c | 1 -
mm/truncate.c | 22 +-
tools/testing/selftests/vm/Makefile | 1 +
.../selftests/vm/split_huge_page_test.c | 255 ++++++++++++++++++
13 files changed, 453 insertions(+), 38 deletions(-)
create mode 100644 tools/testing/selftests/vm/split_huge_page_test.c

--
2.28.0


2020-11-11 20:49:04

by Zi Yan

Subject: [RFC PATCH 5/6] mm: truncate: split thp to a non-zero order if possible.

From: Zi Yan <[email protected]>

To minimize the number of pages after a truncation, when truncating a
THP, we do not need to split it all the way down to order-0. The THP has
at most three parts: the part before the truncated range, the part to be
truncated, and the part remaining at the end. Use the non-zero minimum
of their sizes to decide what order to split the THP to.

Signed-off-by: Zi Yan <[email protected]>
---
mm/truncate.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 20bd17538ec2..6d8e3c6115bc 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -237,7 +237,7 @@ int truncate_inode_page(struct address_space *mapping, struct page *page)
bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
{
loff_t pos = page_offset(page);
- unsigned int offset, length;
+ unsigned int offset, length, left, min_subpage_size = PAGE_SIZE;

if (pos < start)
offset = start - pos;
@@ -248,6 +248,7 @@ bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
length = length - offset;
else
length = end + 1 - pos - offset;
+ left = thp_size(page) - offset - length;

wait_on_page_writeback(page);
if (length == thp_size(page)) {
@@ -267,7 +268,24 @@ bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
do_invalidatepage(page, offset, length);
if (!PageTransHuge(page))
return true;
- return split_huge_page(page) == 0;
+
+ /*
+ * find the non-zero minimum of offset, length, and left and use it to
+ * decide the new order of the page after split
+ */
+ if (offset && left)
+ min_subpage_size = min_t(unsigned int,
+ min_t(unsigned int, offset, length),
+ left);
+ else if (!offset)
+ min_subpage_size = min_t(unsigned int, length, left);
+ else /* !left */
+ min_subpage_size = min_t(unsigned int, length, offset);
+
+ min_subpage_size = max_t(unsigned int, PAGE_SIZE, min_subpage_size);
+
+ return split_huge_page_to_list_to_order(page, NULL,
+ ilog2(min_subpage_size/PAGE_SIZE)) == 0;
}

/*
--
2.28.0

2020-11-11 20:49:17

by Zi Yan

Subject: [RFC PATCH 6/6] mm: huge_memory: enable debugfs to split huge pages to any order.

From: Zi Yan <[email protected]>

This enables the debugfs interface to test split_huge_page_to_list_to_order
for pagecache THPs. Also add test cases that exercise
split_huge_page_to_list_to_order via both debugfs and file truncation.

Signed-off-by: Zi Yan <[email protected]>
---
mm/huge_memory.c | 13 +--
.../selftests/vm/split_huge_page_test.c | 102 +++++++++++++++++-
2 files changed, 105 insertions(+), 10 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 88f50da40c9b..b7470607a08b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2974,7 +2974,7 @@ static ssize_t split_huge_pages_in_range_pid_write(struct file *file,
static DEFINE_MUTEX(mutex);
ssize_t ret;
char input_buf[80]; /* hold pid, start_vaddr, end_vaddr */
- int pid;
+ int pid, to_order = 0;
unsigned long vaddr_start, vaddr_end, addr;
nodemask_t task_nodes;
struct mm_struct *mm;
@@ -2990,8 +2990,9 @@ static ssize_t split_huge_pages_in_range_pid_write(struct file *file,
goto out;

input_buf[80] = '\0';
- ret = sscanf(input_buf, "%d,%lx,%lx", &pid, &vaddr_start, &vaddr_end);
- if (ret != 3) {
+ ret = sscanf(input_buf, "%d,%lx,%lx,%d", &pid, &vaddr_start, &vaddr_end, &to_order);
+ /* cannot split to order-1 THP, which is not possible */
+ if ((ret != 3 && ret != 4) || to_order == 1) {
ret = -EINVAL;
goto out;
}
@@ -2999,8 +3000,8 @@ static ssize_t split_huge_pages_in_range_pid_write(struct file *file,
vaddr_end &= PAGE_MASK;

ret = strlen(input_buf);
- pr_debug("split huge pages in pid: %d, vaddr: [%lx - %lx]\n",
- pid, vaddr_start, vaddr_end);
+ pr_debug("split huge pages in pid: %d, vaddr: [%lx - %lx], to order: %d\n",
+ pid, vaddr_start, vaddr_end, to_order);

mm = find_mm_struct(pid, &task_nodes);
if (IS_ERR(mm)) {
@@ -3038,7 +3039,7 @@ static ssize_t split_huge_pages_in_range_pid_write(struct file *file,
addr += page_size(page) - PAGE_SIZE;

/* reset addr if split fails */
- if (split_huge_page(page))
+ if (split_huge_page_to_list_to_order(page, NULL, to_order))
addr -= (page_size(page) - PAGE_SIZE);

unlock_page(page);
diff --git a/tools/testing/selftests/vm/split_huge_page_test.c b/tools/testing/selftests/vm/split_huge_page_test.c
index c8a32ae9e13a..bcbc5a9d327c 100644
--- a/tools/testing/selftests/vm/split_huge_page_test.c
+++ b/tools/testing/selftests/vm/split_huge_page_test.c
@@ -16,6 +16,7 @@
#include <sys/wait.h>
#include <malloc.h>
#include <stdbool.h>
+#include <time.h>

#define PAGE_4KB (4096UL)
#define PAGE_2MB (512UL*PAGE_4KB)
@@ -31,6 +32,7 @@

#define SPLIT_DEBUGFS "/sys/kernel/debug/split_huge_pages_in_range_pid"
#define SMAP_PATH "/proc/self/smaps"
+#define THP_FS_PATH "/mnt/thp_fs"
#define INPUT_MAX 80

static int write_file(const char *path, const char *buf, size_t buflen)
@@ -50,13 +52,13 @@ static int write_file(const char *path, const char *buf, size_t buflen)
return (unsigned int) numwritten;
}

-static void write_debugfs(int pid, uint64_t vaddr_start, uint64_t vaddr_end)
+static void write_debugfs(int pid, uint64_t vaddr_start, uint64_t vaddr_end, int order)
{
char input[INPUT_MAX];
int ret;

- ret = snprintf(input, INPUT_MAX, "%d,%lx,%lx", pid, vaddr_start,
- vaddr_end);
+ ret = snprintf(input, INPUT_MAX, "%d,%lx,%lx,%d", pid, vaddr_start,
+ vaddr_end, order);
if (ret >= INPUT_MAX) {
printf("%s: Debugfs input is too long\n", __func__);
exit(EXIT_FAILURE);
@@ -139,7 +141,7 @@ void split_pmd_thp(void)
}

/* split all possible huge pages */
- write_debugfs(getpid(), (uint64_t)one_page, (uint64_t)one_page + len);
+ write_debugfs(getpid(), (uint64_t)one_page, (uint64_t)one_page + len, 0);

*one_page = 0;

@@ -153,9 +155,101 @@ void split_pmd_thp(void)
free(one_page);
}

+void create_pagecache_thp_and_fd(size_t fd_size, int *fd, char **addr)
+{
+ const char testfile[] = THP_FS_PATH "/test";
+ size_t i;
+ int dummy;
+
+ srand(time(NULL));
+
+ *fd = open(testfile, O_CREAT | O_RDWR, 0664);
+
+ for (i = 0; i < fd_size; i++) {
+ unsigned char byte = rand();
+
+ write(*fd, &byte, sizeof(byte));
+ }
+ close(*fd);
+ sync();
+ *fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
+ if (*fd == -1) {
+ perror("open drop_caches");
+ exit(EXIT_FAILURE);
+ }
+ if (write(*fd, "3", 1) != 1) {
+ perror("write to drop_caches");
+ exit(EXIT_FAILURE);
+ }
+ close(*fd);
+
+ *fd = open(testfile, O_RDWR);
+
+ *addr = mmap(NULL, fd_size, PROT_READ|PROT_WRITE, MAP_SHARED, *fd, 0);
+ if (*addr == (char *)-1) {
+ perror("cannot mmap");
+ exit(1);
+ }
+ madvise(*addr, fd_size, MADV_HUGEPAGE);
+
+ for (size_t i = 0; i < fd_size; i++)
+ dummy += *(*addr + i);
+}
+
+void split_thp_in_pagecache_to_order(int order)
+{
+ int fd;
+ char *addr;
+ size_t fd_size = 2 * PAGE_2MB, i;
+
+ create_pagecache_thp_and_fd(fd_size, &fd, &addr);
+
+ printf("split %ld kB pagecache page to order %d ... ", fd_size >> 10, order);
+ write_debugfs(getpid(), (uint64_t)addr, (uint64_t)addr + fd_size, order);
+
+ for (i = 0; i < fd_size; i++)
+ *(addr + i) = (char)i;
+
+ close(fd);
+ printf("done\n");
+}
+
+void truncate_thp_in_pagecache_to_order(int order)
+{
+ int fd;
+ char *addr;
+ size_t fd_size = 2 * PAGE_2MB, i;
+
+ create_pagecache_thp_and_fd(fd_size, &fd, &addr);
+
+ printf("truncate %ld kB pagecache page to size %lu kB ... ", fd_size >> 10, 4UL << order);
+ ftruncate(fd, PAGE_4KB << order);
+
+ for (i = 0; i < (PAGE_4KB << order); i++)
+ *(addr + i) = (char)i;
+
+ close(fd);
+ printf("done\n");
+}
+
int main(int argc, char **argv)
{
+ int i;
+
+ if (geteuid() != 0) {
+ printf("Please run the benchmark as root\n");
+ exit(EXIT_FAILURE);
+ }
+
split_pmd_thp();

+ for (i = 8; i >= 0; i--)
+ if (i != 1)
+ split_thp_in_pagecache_to_order(i);
+
+ for (i = 8; i >= 0; i--)
+ if (i != 1)
+ truncate_thp_in_pagecache_to_order(i);
+
return 0;
}
--
2.28.0

2020-11-11 20:49:22

by Zi Yan

Subject: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

From: Zi Yan <[email protected]>

Add a new_order parameter to set the new page order in page owner.
This prepares for upcoming changes that support splitting a huge page
to any lower order.

Signed-off-by: Zi Yan <[email protected]>
---
include/linux/page_owner.h | 7 ++++---
mm/huge_memory.c | 2 +-
mm/page_alloc.c | 2 +-
mm/page_owner.c | 6 +++---
4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
index 3468794f83d2..215cbb159568 100644
--- a/include/linux/page_owner.h
+++ b/include/linux/page_owner.h
@@ -31,10 +31,11 @@ static inline void set_page_owner(struct page *page,
__set_page_owner(page, order, gfp_mask);
}

-static inline void split_page_owner(struct page *page, unsigned int nr)
+static inline void split_page_owner(struct page *page, unsigned int nr,
+ unsigned int new_order)
{
if (static_branch_unlikely(&page_owner_inited))
- __split_page_owner(page, nr);
+ __split_page_owner(page, nr, new_order);
}
static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
{
@@ -60,7 +61,7 @@ static inline void set_page_owner(struct page *page,
{
}
static inline void split_page_owner(struct page *page,
- unsigned int order)
+ unsigned int nr, unsigned int new_order)
{
}
static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f599f5b9bf7f..8b7d771ee962 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2459,7 +2459,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,

ClearPageCompound(head);

- split_page_owner(head, nr);
+ split_page_owner(head, nr, 1);

/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d77220615fd5..a9eead0e091a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3284,7 +3284,7 @@ void split_page(struct page *page, unsigned int order)

for (i = 1; i < (1 << order); i++)
set_page_refcounted(page + i);
- split_page_owner(page, 1 << order);
+ split_page_owner(page, 1 << order, 1);
}
EXPORT_SYMBOL_GPL(split_page);

diff --git a/mm/page_owner.c b/mm/page_owner.c
index b735a8eafcdb..2b7f7e9056dc 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -204,7 +204,7 @@ void __set_page_owner_migrate_reason(struct page *page, int reason)
page_owner->last_migrate_reason = reason;
}

-void __split_page_owner(struct page *page, unsigned int nr)
+void __split_page_owner(struct page *page, unsigned int nr, unsigned int new_order)
{
int i;
struct page_ext *page_ext = lookup_page_ext(page);
@@ -213,9 +213,9 @@ void __split_page_owner(struct page *page, unsigned int nr)
if (unlikely(!page_ext))
return;

- for (i = 0; i < nr; i++) {
+ for (i = 0; i < nr; i += (1 << new_order)) {
page_owner = get_page_owner(page_ext);
- page_owner->order = 0;
+ page_owner->order = new_order;
page_ext = page_ext_next(page_ext);
}
}
--
2.28.0

2020-11-11 20:55:43

by Zi Yan

Subject: [RFC PATCH 2/6] mm: memcg: make memcg huge page split support any order split.

From: Zi Yan <[email protected]>

It reads thp_nr_pages and splits the memcg information at the provided
new_nr granularity. This prepares for upcoming changes that support
splitting a huge page to any lower order.

Signed-off-by: Zi Yan <[email protected]>
---
include/linux/memcontrol.h | 5 +++--
mm/huge_memory.c | 2 +-
mm/memcontrol.c | 4 ++--
3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0f4dd7829fb2..b3bac79ceed6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1105,7 +1105,7 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
}

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-void mem_cgroup_split_huge_fixup(struct page *head);
+void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr);
#endif

#else /* CONFIG_MEMCG */
@@ -1451,7 +1451,8 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
return 0;
}

-static inline void mem_cgroup_split_huge_fixup(struct page *head)
+static inline void mem_cgroup_split_huge_fixup(struct page *head,
+ unsigned int new_nr)
{
}

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c4fead5ead31..f599f5b9bf7f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2429,7 +2429,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
lruvec = mem_cgroup_page_lruvec(head, pgdat);

/* complete memcg works before add pages to LRU */
- mem_cgroup_split_huge_fixup(head);
+ mem_cgroup_split_huge_fixup(head, 1);

if (PageAnon(head) && PageSwapCache(head)) {
swp_entry_t entry = { .val = page_private(head) };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 33f632689cee..e9705ba6bbcc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3247,7 +3247,7 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
* Because tail pages are not marked as "used", set it. We're under
* pgdat->lru_lock and migration entries setup in all page mappings.
*/
-void mem_cgroup_split_huge_fixup(struct page *head)
+void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr)
{
struct mem_cgroup *memcg = page_memcg(head);
int i;
@@ -3255,7 +3255,7 @@ void mem_cgroup_split_huge_fixup(struct page *head)
if (mem_cgroup_disabled())
return;

- for (i = 1; i < thp_nr_pages(head); i++) {
+ for (i = new_nr; i < thp_nr_pages(head); i += new_nr) {
css_get(&memcg->css);
head[i].memcg_data = (unsigned long)memcg;
}
--
2.28.0

2020-11-12 17:59:16

by Ralph Campbell

Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.


On 11/11/20 12:40 PM, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> It adds a new_order parameter to set new page order in page owner.
> It prepares for upcoming changes to support split huge page to any lower
> order.
>
> Signed-off-by: Zi Yan <[email protected]>

Except for a minor fix below, you can add:
Reviewed-by: Ralph Campbell <[email protected]>

> ---
> include/linux/page_owner.h | 7 ++++---
> mm/huge_memory.c | 2 +-
> mm/page_alloc.c | 2 +-
> mm/page_owner.c | 6 +++---
> 4 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
> index 3468794f83d2..215cbb159568 100644
> --- a/include/linux/page_owner.h
> +++ b/include/linux/page_owner.h
> @@ -31,10 +31,11 @@ static inline void set_page_owner(struct page *page,
> __set_page_owner(page, order, gfp_mask);
> }
>
> -static inline void split_page_owner(struct page *page, unsigned int nr)
> +static inline void split_page_owner(struct page *page, unsigned int nr,
> + unsigned int new_order)
> {
> if (static_branch_unlikely(&page_owner_inited))
> - __split_page_owner(page, nr);
> + __split_page_owner(page, nr, new_order);
> }
> static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
> {
> @@ -60,7 +61,7 @@ static inline void set_page_owner(struct page *page,
> {
> }
> static inline void split_page_owner(struct page *page,
> - unsigned int order)
> + unsigned int nr, unsigned int new_order)
> {
> }
> static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index f599f5b9bf7f..8b7d771ee962 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2459,7 +2459,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>
> ClearPageCompound(head);
>
> - split_page_owner(head, nr);
> + split_page_owner(head, nr, 1);

Shouldn't this be 0, not 1?
(new_order not new_nr).

> /* See comment in __split_huge_page_tail() */
> if (PageAnon(head)) {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d77220615fd5..a9eead0e091a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3284,7 +3284,7 @@ void split_page(struct page *page, unsigned int order)
>
> for (i = 1; i < (1 << order); i++)
> set_page_refcounted(page + i);
> - split_page_owner(page, 1 << order);
> + split_page_owner(page, 1 << order, 1);

Ditto, 0.

> }
> EXPORT_SYMBOL_GPL(split_page);

> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index b735a8eafcdb..2b7f7e9056dc 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -204,7 +204,7 @@ void __set_page_owner_migrate_reason(struct page *page, int reason)
> page_owner->last_migrate_reason = reason;
> }
>
> -void __split_page_owner(struct page *page, unsigned int nr)
> +void __split_page_owner(struct page *page, unsigned int nr, unsigned int new_order)
> {
> int i;
> struct page_ext *page_ext = lookup_page_ext(page);
> @@ -213,9 +213,9 @@ void __split_page_owner(struct page *page, unsigned int nr)
> if (unlikely(!page_ext))
> return;
>
> - for (i = 0; i < nr; i++) {
> + for (i = 0; i < nr; i += (1 << new_order)) {
> page_owner = get_page_owner(page_ext);
> - page_owner->order = 0;
> + page_owner->order = new_order;
> page_ext = page_ext_next(page_ext);
> }
> }
>

2020-11-12 18:00:27

by Ralph Campbell

Subject: Re: [RFC PATCH 2/6] mm: memcg: make memcg huge page split support any order split.


On 11/11/20 12:40 PM, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> It reads thp_nr_pages and splits to provided new_nr. It prepares for
> upcoming changes to support split huge page to any lower order.
>
> Signed-off-by: Zi Yan <[email protected]>

Looks OK to me.
Reviewed-by: Ralph Campbell <[email protected]>

> ---
> include/linux/memcontrol.h | 5 +++--
> mm/huge_memory.c | 2 +-
> mm/memcontrol.c | 4 ++--
> 3 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 0f4dd7829fb2..b3bac79ceed6 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1105,7 +1105,7 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
> }
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -void mem_cgroup_split_huge_fixup(struct page *head);
> +void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr);
> #endif
>
> #else /* CONFIG_MEMCG */
> @@ -1451,7 +1451,8 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
> return 0;
> }
>
> -static inline void mem_cgroup_split_huge_fixup(struct page *head)
> +static inline void mem_cgroup_split_huge_fixup(struct page *head,
> + unsigned int new_nr)
> {
> }
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c4fead5ead31..f599f5b9bf7f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2429,7 +2429,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> lruvec = mem_cgroup_page_lruvec(head, pgdat);
>
> /* complete memcg works before add pages to LRU */
> - mem_cgroup_split_huge_fixup(head);
> + mem_cgroup_split_huge_fixup(head, 1);
>
> if (PageAnon(head) && PageSwapCache(head)) {
> swp_entry_t entry = { .val = page_private(head) };
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 33f632689cee..e9705ba6bbcc 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3247,7 +3247,7 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
> * Because tail pages are not marked as "used", set it. We're under
> * pgdat->lru_lock and migration entries setup in all page mappings.
> */
> -void mem_cgroup_split_huge_fixup(struct page *head)
> +void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr)
> {
> struct mem_cgroup *memcg = page_memcg(head);
> int i;
> @@ -3255,7 +3255,7 @@ void mem_cgroup_split_huge_fixup(struct page *head)
> if (mem_cgroup_disabled())
> return;
>
> - for (i = 1; i < thp_nr_pages(head); i++) {
> + for (i = new_nr; i < thp_nr_pages(head); i += new_nr) {
> css_get(&memcg->css);
> head[i].memcg_data = (unsigned long)memcg;
> }
>

2020-11-12 18:02:39

by Zi Yan

Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On 12 Nov 2020, at 12:57, Ralph Campbell wrote:

> On 11/11/20 12:40 PM, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> It adds a new_order parameter to set new page order in page owner.
>> It prepares for upcoming changes to support split huge page to any lower
>> order.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>
> Except for a minor fix below, you can add:
> Reviewed-by: Ralph Campbell <[email protected]>

Thanks.

>
>> ---
>> include/linux/page_owner.h | 7 ++++---
>> mm/huge_memory.c | 2 +-
>> mm/page_alloc.c | 2 +-
>> mm/page_owner.c | 6 +++---
>> 4 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
>> index 3468794f83d2..215cbb159568 100644
>> --- a/include/linux/page_owner.h
>> +++ b/include/linux/page_owner.h
>> @@ -31,10 +31,11 @@ static inline void set_page_owner(struct page *page,
>> __set_page_owner(page, order, gfp_mask);
>> }
>> -static inline void split_page_owner(struct page *page, unsigned int nr)
>> +static inline void split_page_owner(struct page *page, unsigned int nr,
>> + unsigned int new_order)
>> {
>> if (static_branch_unlikely(&page_owner_inited))
>> - __split_page_owner(page, nr);
>> + __split_page_owner(page, nr, new_order);
>> }
>> static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
>> {
>> @@ -60,7 +61,7 @@ static inline void set_page_owner(struct page *page,
>> {
>> }
>> static inline void split_page_owner(struct page *page,
>> - unsigned int order)
>> + unsigned int nr, unsigned int new_order)
>> {
>> }
>> static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index f599f5b9bf7f..8b7d771ee962 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2459,7 +2459,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>> ClearPageCompound(head);
>> - split_page_owner(head, nr);
>> + split_page_owner(head, nr, 1);
>
> Shouldn't this be 0, not 1?
> (new_order not new_nr).
>

Yes, I forgot to fix the call site after I change the function signature. Thanks.

>> /* See comment in __split_huge_page_tail() */
>> if (PageAnon(head)) {
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index d77220615fd5..a9eead0e091a 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3284,7 +3284,7 @@ void split_page(struct page *page, unsigned int order)
>> for (i = 1; i < (1 << order); i++)
>> set_page_refcounted(page + i);
>> - split_page_owner(page, 1 << order);
>> + split_page_owner(page, 1 << order, 1);
>
> Ditto, 0.
>

Sure, will fix this too.



Best Regards,
Yan Zi



2020-11-12 18:02:56

by Zi Yan

Subject: Re: [RFC PATCH 2/6] mm: memcg: make memcg huge page split support any order split.

On 12 Nov 2020, at 12:58, Ralph Campbell wrote:

> On 11/11/20 12:40 PM, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> It reads thp_nr_pages and splits to provided new_nr. It prepares for
>> upcoming changes to support split huge page to any lower order.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>
> Looks OK to me.
> Reviewed-by: Ralph Campbell <[email protected]>

Thanks.


Best Regards,
Yan Zi



2020-11-12 22:12:00

by Ralph Campbell

Subject: Re: [RFC PATCH 5/6] mm: truncate: split thp to a non-zero order if possible.


On 11/11/20 12:40 PM, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> To minimize the number of pages after a truncation, when truncating a
> THP, we do not need to split it all the way down to order-0. The THP has
> at most three parts, the part before offset, the part to be truncated,
> the part left at the end. Use the non-zero minimum of them to decide
> what order we split the THP to.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> mm/truncate.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 20bd17538ec2..6d8e3c6115bc 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -237,7 +237,7 @@ int truncate_inode_page(struct address_space *mapping, struct page *page)
> bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
> {
> loff_t pos = page_offset(page);
> - unsigned int offset, length;
> + unsigned int offset, length, left, min_subpage_size = PAGE_SIZE;

Maybe use "remaining" instead of "left" since I think of the latter as the length of the
left side (offset).

> if (pos < start)
> offset = start - pos;
> @@ -248,6 +248,7 @@ bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
> length = length - offset;
> else
> length = end + 1 - pos - offset;
> + left = thp_size(page) - offset - length;
>
> wait_on_page_writeback(page);
> if (length == thp_size(page)) {
> @@ -267,7 +268,24 @@ bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
> do_invalidatepage(page, offset, length);
> if (!PageTransHuge(page))
> return true;
> - return split_huge_page(page) == 0;
> +
> + /*
> + * find the non-zero minimum of offset, length, and left and use it to
> + * decide the new order of the page after split
> + */
> + if (offset && left)
> + min_subpage_size = min_t(unsigned int,
> + min_t(unsigned int, offset, length),
> + left);
> + else if (!offset)
> + min_subpage_size = min_t(unsigned int, length, left);
> + else /* !left */
> + min_subpage_size = min_t(unsigned int, length, offset);
> +
> + min_subpage_size = max_t(unsigned int, PAGE_SIZE, min_subpage_size);
> +
> + return split_huge_page_to_list_to_order(page, NULL,
> + ilog2(min_subpage_size/PAGE_SIZE)) == 0;
> }

What if "min_subpage_size" is 1/2 the THP but offset isn't aligned to 1/2?
Splitting the page in half wouldn't result in a page that could be freed
but maybe splitting to 1/4 would (assuming the THP is at least 8x PAGE_SIZE).

2020-11-12 22:41:35

by Zi Yan

Subject: Re: [RFC PATCH 5/6] mm: truncate: split thp to a non-zero order if possible.

On 12 Nov 2020, at 17:08, Ralph Campbell wrote:

> On 11/11/20 12:40 PM, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> To minimize the number of pages after a truncation, when truncating a
>> THP, we do not need to split it all the way down to order-0. The THP has
>> at most three parts, the part before offset, the part to be truncated,
>> the part left at the end. Use the non-zero minimum of them to decide
>> what order we split the THP to.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> mm/truncate.c | 22 ++++++++++++++++++++--
>> 1 file changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/truncate.c b/mm/truncate.c
>> index 20bd17538ec2..6d8e3c6115bc 100644
>> --- a/mm/truncate.c
>> +++ b/mm/truncate.c
>> @@ -237,7 +237,7 @@ int truncate_inode_page(struct address_space *mapping, struct page *page)
>> bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
>> {
>> loff_t pos = page_offset(page);
>> - unsigned int offset, length;
>> + unsigned int offset, length, left, min_subpage_size = PAGE_SIZE;
>
> Maybe use "remaining" instead of "left" since I think of the latter as the length of the
> left side (offset).

Sure. Will change the name.

>
>> if (pos < start)
>> offset = start - pos;
>> @@ -248,6 +248,7 @@ bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
>> length = length - offset;
>> else
>> length = end + 1 - pos - offset;
>> + left = thp_size(page) - offset - length;
>> wait_on_page_writeback(page);
>> if (length == thp_size(page)) {
>> @@ -267,7 +268,24 @@ bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
>> do_invalidatepage(page, offset, length);
>> if (!PageTransHuge(page))
>> return true;
>> - return split_huge_page(page) == 0;
>> +
>> + /*
>> + * find the non-zero minimum of offset, length, and left and use it to
>> + * decide the new order of the page after split
>> + */
>> + if (offset && left)
>> + min_subpage_size = min_t(unsigned int,
>> + min_t(unsigned int, offset, length),
>> + left);
>> + else if (!offset)
>> + min_subpage_size = min_t(unsigned int, length, left);
>> + else /* !left */
>> + min_subpage_size = min_t(unsigned int, length, offset);
>> +
>> + min_subpage_size = max_t(unsigned int, PAGE_SIZE, min_subpage_size);
>> +
>> + return split_huge_page_to_list_to_order(page, NULL,
>> + ilog2(min_subpage_size/PAGE_SIZE)) == 0;
>> }
>
> What if "min_subpage_size" is 1/2 the THP but offset isn't aligned to 1/2?
> Splitting the page in half wouldn't result in a page that could be freed
> but maybe splitting to 1/4 would (assuming the THP is at least 8x PAGE_SIZE).

Is it possible? The whole THP is divided into three parts: offset, length, and
remaining (renamed from left). If offset is not aligned to 1/2, it is either
greater than 1/2 or smaller than 1/2. If it is the former, length and remaining
will both be smaller than 1/2, so min_subpage_size cannot be 1/2. If it is the
latter, min_subpage_size cannot be 1/2 either, because min_subpage_size is the
smallest non-zero value of offset, length, and remaining. Let me know if I miss
anything.


Best Regards,
Yan Zi



2020-11-14 00:20:34

by Roman Gushchin

Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Wed, Nov 11, 2020 at 03:40:05PM -0500, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> It adds a new_order parameter to set new page order in page owner.
> It prepares for upcoming changes to support split huge page to any lower
> order.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> include/linux/page_owner.h | 7 ++++---
> mm/huge_memory.c | 2 +-
> mm/page_alloc.c | 2 +-
> mm/page_owner.c | 6 +++---
> 4 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
> index 3468794f83d2..215cbb159568 100644
> --- a/include/linux/page_owner.h
> +++ b/include/linux/page_owner.h
> @@ -31,10 +31,11 @@ static inline void set_page_owner(struct page *page,
> __set_page_owner(page, order, gfp_mask);
> }
>
> -static inline void split_page_owner(struct page *page, unsigned int nr)
> +static inline void split_page_owner(struct page *page, unsigned int nr,
> + unsigned int new_order)
> {
> if (static_branch_unlikely(&page_owner_inited))
> - __split_page_owner(page, nr);
> + __split_page_owner(page, nr, new_order);
> }
> static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
> {
> @@ -60,7 +61,7 @@ static inline void set_page_owner(struct page *page,
> {
> }
> static inline void split_page_owner(struct page *page,
> - unsigned int order)
> + unsigned int nr, unsigned int new_order)

With the addition of the new argument it's a bit hard to understand
what the function is supposed to do. It seems like nr == page_order(page),
is it right? Maybe we can pass old_order and new_order? Or just the page
and the new order?

> {
> }
> static inline void copy_page_owner(struct page *oldpage, struct page *newpage)
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index f599f5b9bf7f..8b7d771ee962 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2459,7 +2459,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>
> ClearPageCompound(head);
>
> - split_page_owner(head, nr);
> + split_page_owner(head, nr, 1);
>
> /* See comment in __split_huge_page_tail() */
> if (PageAnon(head)) {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d77220615fd5..a9eead0e091a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3284,7 +3284,7 @@ void split_page(struct page *page, unsigned int order)
>
> for (i = 1; i < (1 << order); i++)
> set_page_refcounted(page + i);
> - split_page_owner(page, 1 << order);
> + split_page_owner(page, 1 << order, 1);
> }
> EXPORT_SYMBOL_GPL(split_page);
>
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index b735a8eafcdb..2b7f7e9056dc 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -204,7 +204,7 @@ void __set_page_owner_migrate_reason(struct page *page, int reason)
> page_owner->last_migrate_reason = reason;
> }
>
> -void __split_page_owner(struct page *page, unsigned int nr)
> +void __split_page_owner(struct page *page, unsigned int nr, unsigned int new_order)
> {
> int i;
> struct page_ext *page_ext = lookup_page_ext(page);
> @@ -213,9 +213,9 @@ void __split_page_owner(struct page *page, unsigned int nr)
> if (unlikely(!page_ext))
> return;
>
> - for (i = 0; i < nr; i++) {
> + for (i = 0; i < nr; i += (1 << new_order)) {
> page_owner = get_page_owner(page_ext);
> - page_owner->order = 0;
> + page_owner->order = new_order;
> page_ext = page_ext_next(page_ext);

I believe there cannot be any leftovers because nr is always a power of 2.
Is that true? Converting the nr argument to an order (if possible) would make it obvious.

Other than that the patch looks good to me.

Thanks!

2020-11-14 00:26:08

by Roman Gushchin

[permalink] [raw]
Subject: Re: [RFC PATCH 2/6] mm: memcg: make memcg huge page split support any order split.

On Wed, Nov 11, 2020 at 03:40:04PM -0500, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> It reads thp_nr_pages and splits to provided new_nr. It prepares for
> upcoming changes to support split huge page to any lower order.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> include/linux/memcontrol.h | 5 +++--
> mm/huge_memory.c | 2 +-
> mm/memcontrol.c | 4 ++--
> 3 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 0f4dd7829fb2..b3bac79ceed6 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1105,7 +1105,7 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
> }
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -void mem_cgroup_split_huge_fixup(struct page *head);
> +void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr);
> #endif
>
> #else /* CONFIG_MEMCG */
> @@ -1451,7 +1451,8 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
> return 0;
> }
>
> -static inline void mem_cgroup_split_huge_fixup(struct page *head)
> +static inline void mem_cgroup_split_huge_fixup(struct page *head,
> + unsigned int new_nr)
> {
> }
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c4fead5ead31..f599f5b9bf7f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2429,7 +2429,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> lruvec = mem_cgroup_page_lruvec(head, pgdat);
>
> /* complete memcg works before add pages to LRU */
> - mem_cgroup_split_huge_fixup(head);
> + mem_cgroup_split_huge_fixup(head, 1);
>
> if (PageAnon(head) && PageSwapCache(head)) {
> swp_entry_t entry = { .val = page_private(head) };
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 33f632689cee..e9705ba6bbcc 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3247,7 +3247,7 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
> * Because tail pages are not marked as "used", set it. We're under
> * pgdat->lru_lock and migration entries setup in all page mappings.
> */
> -void mem_cgroup_split_huge_fixup(struct page *head)
> +void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr)

I'd go with unsigned int new_order, then it's obvious that we can split
the original page without any leftovers.

Other than that the patch looks good!
Acked-by: Roman Gushchin <[email protected]>

Thanks!

> {
> struct mem_cgroup *memcg = page_memcg(head);
> int i;
> @@ -3255,7 +3255,7 @@ void mem_cgroup_split_huge_fixup(struct page *head)
> if (mem_cgroup_disabled())
> return;
>
> - for (i = 1; i < thp_nr_pages(head); i++) {
> + for (i = new_nr; i < thp_nr_pages(head); i += new_nr) {
> css_get(&memcg->css);
> head[i].memcg_data = (unsigned long)memcg;
> }
> --
> 2.28.0
>

2020-11-14 01:02:06

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 2/6] mm: memcg: make memcg huge page split support any order split.

On 13 Nov 2020, at 19:23, Roman Gushchin wrote:

> On Wed, Nov 11, 2020 at 03:40:04PM -0500, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> It reads thp_nr_pages and splits to provided new_nr. It prepares for
>> upcoming changes to support split huge page to any lower order.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> include/linux/memcontrol.h | 5 +++--
>> mm/huge_memory.c | 2 +-
>> mm/memcontrol.c | 4 ++--
>> 3 files changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 0f4dd7829fb2..b3bac79ceed6 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -1105,7 +1105,7 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
>> }
>>
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> -void mem_cgroup_split_huge_fixup(struct page *head);
>> +void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr);
>> #endif
>>
>> #else /* CONFIG_MEMCG */
>> @@ -1451,7 +1451,8 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
>> return 0;
>> }
>>
>> -static inline void mem_cgroup_split_huge_fixup(struct page *head)
>> +static inline void mem_cgroup_split_huge_fixup(struct page *head,
>> + unsigned int new_nr)
>> {
>> }
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index c4fead5ead31..f599f5b9bf7f 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2429,7 +2429,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>> lruvec = mem_cgroup_page_lruvec(head, pgdat);
>>
>> /* complete memcg works before add pages to LRU */
>> - mem_cgroup_split_huge_fixup(head);
>> + mem_cgroup_split_huge_fixup(head, 1);
>>
>> if (PageAnon(head) && PageSwapCache(head)) {
>> swp_entry_t entry = { .val = page_private(head) };
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 33f632689cee..e9705ba6bbcc 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3247,7 +3247,7 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
>> * Because tail pages are not marked as "used", set it. We're under
>> * pgdat->lru_lock and migration entries setup in all page mappings.
>> */
>> -void mem_cgroup_split_huge_fixup(struct page *head)
>> +void mem_cgroup_split_huge_fixup(struct page *head, unsigned int new_nr)
>
> I'd go with unsigned int new_order, then it's obvious that we can split
> the original page without any leftovers.

Makes sense. Will change it.

>
> Other than that the patch looks good!
> Acked-by: Roman Gushchin <[email protected]>

Thanks.

>> {
>> struct mem_cgroup *memcg = page_memcg(head);
>> int i;
>> @@ -3255,7 +3255,7 @@ void mem_cgroup_split_huge_fixup(struct page *head)
>> if (mem_cgroup_disabled())
>> return;
>>
>> - for (i = 1; i < thp_nr_pages(head); i++) {
>> + for (i = new_nr; i < thp_nr_pages(head); i += new_nr) {
>> css_get(&memcg->css);
>> head[i].memcg_data = (unsigned long)memcg;
>> }
>> --
>> 2.28.0
>>



Best Regards,
Yan Zi



2020-11-14 01:20:34

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On 13 Nov 2020, at 19:15, Roman Gushchin wrote:

> On Wed, Nov 11, 2020 at 03:40:05PM -0500, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> It adds a new_order parameter to set new page order in page owner.
>> It prepares for upcoming changes to support split huge page to any
>> lower
>> order.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> include/linux/page_owner.h | 7 ++++---
>> mm/huge_memory.c | 2 +-
>> mm/page_alloc.c | 2 +-
>> mm/page_owner.c | 6 +++---
>> 4 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
>> index 3468794f83d2..215cbb159568 100644
>> --- a/include/linux/page_owner.h
>> +++ b/include/linux/page_owner.h
>> @@ -31,10 +31,11 @@ static inline void set_page_owner(struct page
>> *page,
>> __set_page_owner(page, order, gfp_mask);
>> }
>>
>> -static inline void split_page_owner(struct page *page, unsigned int
>> nr)
>> +static inline void split_page_owner(struct page *page, unsigned int
>> nr,
>> + unsigned int new_order)
>> {
>> if (static_branch_unlikely(&page_owner_inited))
>> - __split_page_owner(page, nr);
>> + __split_page_owner(page, nr, new_order);
>> }
>> static inline void copy_page_owner(struct page *oldpage, struct page
>> *newpage)
>> {
>> @@ -60,7 +61,7 @@ static inline void set_page_owner(struct page
>> *page,
>> {
>> }
>> static inline void split_page_owner(struct page *page,
>> - unsigned int order)
>> + unsigned int nr, unsigned int new_order)
>
> With the addition of the new argument it's a bit hard to understand
> what the function is supposed to do. It seems like nr ==
> page_order(page),
> is it right? Maybe we can pass old_order and new_order? Or just the
> page
> and the new order?

Yeah, it is a bit confusing. Please see more below.

>
>> {
>> }
>> static inline void copy_page_owner(struct page *oldpage, struct page
>> *newpage)
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index f599f5b9bf7f..8b7d771ee962 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2459,7 +2459,7 @@ static void __split_huge_page(struct page
>> *page, struct list_head *list,
>>
>> ClearPageCompound(head);
>>
>> - split_page_owner(head, nr);
>> + split_page_owner(head, nr, 1);
>>
>> /* See comment in __split_huge_page_tail() */
>> if (PageAnon(head)) {
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index d77220615fd5..a9eead0e091a 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3284,7 +3284,7 @@ void split_page(struct page *page, unsigned int
>> order)
>>
>> for (i = 1; i < (1 << order); i++)
>> set_page_refcounted(page + i);
>> - split_page_owner(page, 1 << order);
>> + split_page_owner(page, 1 << order, 1);
>> }
>> EXPORT_SYMBOL_GPL(split_page);
>>
>> diff --git a/mm/page_owner.c b/mm/page_owner.c
>> index b735a8eafcdb..2b7f7e9056dc 100644
>> --- a/mm/page_owner.c
>> +++ b/mm/page_owner.c
>> @@ -204,7 +204,7 @@ void __set_page_owner_migrate_reason(struct page
>> *page, int reason)
>> page_owner->last_migrate_reason = reason;
>> }
>>
>> -void __split_page_owner(struct page *page, unsigned int nr)
>> +void __split_page_owner(struct page *page, unsigned int nr, unsigned
>> int new_order)
>> {
>> int i;
>> struct page_ext *page_ext = lookup_page_ext(page);
>> @@ -213,9 +213,9 @@ void __split_page_owner(struct page *page,
>> unsigned int nr)
>> if (unlikely(!page_ext))
>> return;
>>
>> - for (i = 0; i < nr; i++) {
>> + for (i = 0; i < nr; i += (1 << new_order)) {
>> page_owner = get_page_owner(page_ext);
>> - page_owner->order = 0;
>> + page_owner->order = new_order;
>> page_ext = page_ext_next(page_ext);
>
> I believe there cannot be any leftovers because nr is always a power
> of 2.
> Is it true? Converting nr argument to order (if it's possible) will
> make it obvious.

Right. nr = thp_nr_pages(head), which is a power of 2. There would not be any
leftover.

Matthew recently converted split_page_owner to take nr instead of order.[1]
But I am not sure why, since it seems to me that the two call sites
(__split_huge_page in mm/huge_memory.c and split_page in mm/page_alloc.c)
can pass the order information.


[1]https://lore.kernel.org/linux-mm/[email protected]/



Best Regards,
Yan Zi

2020-11-14 01:42:22

by Roman Gushchin

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Fri, Nov 13, 2020 at 08:08:58PM -0500, Zi Yan wrote:
> On 13 Nov 2020, at 19:15, Roman Gushchin wrote:
>
> > On Wed, Nov 11, 2020 at 03:40:05PM -0500, Zi Yan wrote:
> > > From: Zi Yan <[email protected]>
> > >
> > > It adds a new_order parameter to set new page order in page owner.
> > > It prepares for upcoming changes to support split huge page to any
> > > lower
> > > order.
> > >
> > > Signed-off-by: Zi Yan <[email protected]>
> > > ---
> > > include/linux/page_owner.h | 7 ++++---
> > > mm/huge_memory.c | 2 +-
> > > mm/page_alloc.c | 2 +-
> > > mm/page_owner.c | 6 +++---
> > > 4 files changed, 9 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
> > > index 3468794f83d2..215cbb159568 100644
> > > --- a/include/linux/page_owner.h
> > > +++ b/include/linux/page_owner.h
> > > @@ -31,10 +31,11 @@ static inline void set_page_owner(struct page
> > > *page,
> > > __set_page_owner(page, order, gfp_mask);
> > > }
> > >
> > > -static inline void split_page_owner(struct page *page, unsigned int
> > > nr)
> > > +static inline void split_page_owner(struct page *page, unsigned int
> > > nr,
> > > + unsigned int new_order)
> > > {
> > > if (static_branch_unlikely(&page_owner_inited))
> > > - __split_page_owner(page, nr);
> > > + __split_page_owner(page, nr, new_order);
> > > }
> > > static inline void copy_page_owner(struct page *oldpage, struct
> > > page *newpage)
> > > {
> > > @@ -60,7 +61,7 @@ static inline void set_page_owner(struct page
> > > *page,
> > > {
> > > }
> > > static inline void split_page_owner(struct page *page,
> > > - unsigned int order)
> > > + unsigned int nr, unsigned int new_order)
> >
> > With the addition of the new argument it's a bit hard to understand
> > what the function is supposed to do. It seems like nr ==
> > page_order(page),
> > is it right? Maybe we can pass old_order and new_order? Or just the page
> > and the new order?
>
> Yeah, it is a bit confusing. Please see more below.
>
> >
> > > {
> > > }
> > > static inline void copy_page_owner(struct page *oldpage, struct
> > > page *newpage)
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index f599f5b9bf7f..8b7d771ee962 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -2459,7 +2459,7 @@ static void __split_huge_page(struct page
> > > *page, struct list_head *list,
> > >
> > > ClearPageCompound(head);
> > >
> > > - split_page_owner(head, nr);
> > > + split_page_owner(head, nr, 1);
> > >
> > > /* See comment in __split_huge_page_tail() */
> > > if (PageAnon(head)) {
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index d77220615fd5..a9eead0e091a 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -3284,7 +3284,7 @@ void split_page(struct page *page, unsigned
> > > int order)
> > >
> > > for (i = 1; i < (1 << order); i++)
> > > set_page_refcounted(page + i);
> > > - split_page_owner(page, 1 << order);
> > > + split_page_owner(page, 1 << order, 1);
> > > }
> > > EXPORT_SYMBOL_GPL(split_page);
> > >
> > > diff --git a/mm/page_owner.c b/mm/page_owner.c
> > > index b735a8eafcdb..2b7f7e9056dc 100644
> > > --- a/mm/page_owner.c
> > > +++ b/mm/page_owner.c
> > > @@ -204,7 +204,7 @@ void __set_page_owner_migrate_reason(struct page
> > > *page, int reason)
> > > page_owner->last_migrate_reason = reason;
> > > }
> > >
> > > -void __split_page_owner(struct page *page, unsigned int nr)
> > > +void __split_page_owner(struct page *page, unsigned int nr,
> > > unsigned int new_order)
> > > {
> > > int i;
> > > struct page_ext *page_ext = lookup_page_ext(page);
> > > @@ -213,9 +213,9 @@ void __split_page_owner(struct page *page,
> > > unsigned int nr)
> > > if (unlikely(!page_ext))
> > > return;
> > >
> > > - for (i = 0; i < nr; i++) {
> > > + for (i = 0; i < nr; i += (1 << new_order)) {
> > > page_owner = get_page_owner(page_ext);
> > > - page_owner->order = 0;
> > > + page_owner->order = new_order;
> > > page_ext = page_ext_next(page_ext);
> >
> > I believe there cannot be any leftovers because nr is always a power of
> > 2.
> > Is it true? Converting nr argument to order (if it's possible) will make
> > it obvious.
>
> Right. nr = thp_nr_pages(head), which is a power of 2. There would not be
> any
> leftover.
>
> Matthew recently converted split_page_owner to take nr instead of order.[1]
> But I am not
> sure why, since it seems to me that two call sites (__split_huge_page in
> mm/huge_memory.c and split_page in mm/page_alloc.c) can pass the order
> information.

Yeah, I'm not sure why either. Maybe Matthew has some input here?
You can also pass new_nr, but IMO orders look so much better here.

Thanks!

2020-11-16 16:28:00

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Wed, Nov 11, 2020 at 03:40:05PM -0500, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> It adds a new_order parameter to set new page order in page owner.
> It prepares for upcoming changes to support split huge page to any lower
> order.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> include/linux/page_owner.h | 7 ++++---
> mm/huge_memory.c | 2 +-
> mm/page_alloc.c | 2 +-
> mm/page_owner.c | 6 +++---
> 4 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
> index 3468794f83d2..215cbb159568 100644
> --- a/include/linux/page_owner.h
> +++ b/include/linux/page_owner.h
> @@ -31,10 +31,11 @@ static inline void set_page_owner(struct page *page,
> __set_page_owner(page, order, gfp_mask);
> }
>
> -static inline void split_page_owner(struct page *page, unsigned int nr)
> +static inline void split_page_owner(struct page *page, unsigned int nr,
> + unsigned int new_order)
> {
> if (static_branch_unlikely(&page_owner_inited))
> - __split_page_owner(page, nr);
> + __split_page_owner(page, nr, new_order);

Hm. Where do you correct __split_page_owner() declaration. I don't see it.

--
Kirill A. Shutemov

2020-11-16 20:49:12

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On 16 Nov 2020, at 11:25, Kirill A. Shutemov wrote:

> On Wed, Nov 11, 2020 at 03:40:05PM -0500, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> It adds a new_order parameter to set new page order in page owner.
>> It prepares for upcoming changes to support split huge page to any lower
>> order.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> include/linux/page_owner.h | 7 ++++---
>> mm/huge_memory.c | 2 +-
>> mm/page_alloc.c | 2 +-
>> mm/page_owner.c | 6 +++---
>> 4 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
>> index 3468794f83d2..215cbb159568 100644
>> --- a/include/linux/page_owner.h
>> +++ b/include/linux/page_owner.h
>> @@ -31,10 +31,11 @@ static inline void set_page_owner(struct page *page,
>> __set_page_owner(page, order, gfp_mask);
>> }
>>
>> -static inline void split_page_owner(struct page *page, unsigned int nr)
>> +static inline void split_page_owner(struct page *page, unsigned int nr,
>> + unsigned int new_order)
>> {
>> if (static_branch_unlikely(&page_owner_inited))
>> - __split_page_owner(page, nr);
>> + __split_page_owner(page, nr, new_order);
>
> Hm. Where do you correct __split_page_owner() declaration. I don't see it.

I missed it. Will add it. Thanks.


Best Regards,
Yan Zi



2020-11-17 21:09:16

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Fri, Nov 13, 2020 at 05:38:01PM -0800, Roman Gushchin wrote:
> On Fri, Nov 13, 2020 at 08:08:58PM -0500, Zi Yan wrote:
> > Matthew recently converted split_page_owner to take nr instead of order.[1]
> > But I am not
> > sure why, since it seems to me that two call sites (__split_huge_page in
> > mm/huge_memory.c and split_page in mm/page_alloc.c) can pass the order
> > information.
>
> Yeah, I'm not sure why too. Maybe Matthew has some input here?
> You can also pass new_nr, but IMO orders look so much better here.

If only I'd written that information in the changelog ... oh wait, I did!

mm/page_owner: change split_page_owner to take a count

The implementation of split_page_owner() prefers a count rather than the
old order of the page. When we support a variable size THP, we won't
have the order at this point, but we will have the number of pages.
So change the interface to what the caller and callee would prefer.

2020-11-17 21:14:06

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On 17 Nov 2020, at 16:05, Matthew Wilcox wrote:

> On Fri, Nov 13, 2020 at 05:38:01PM -0800, Roman Gushchin wrote:
>> On Fri, Nov 13, 2020 at 08:08:58PM -0500, Zi Yan wrote:
>>> Matthew recently converted split_page_owner to take nr instead of order.[1]
>>> But I am not
>>> sure why, since it seems to me that two call sites (__split_huge_page in
>>> mm/huge_memory.c and split_page in mm/page_alloc.c) can pass the order
>>> information.
>>
>> Yeah, I'm not sure why too. Maybe Matthew has some input here?
>> You can also pass new_nr, but IMO orders look so much better here.
>
> If only I'd written that information in the changelog ... oh wait, I did!
>
> mm/page_owner: change split_page_owner to take a count
>
> The implementation of split_page_owner() prefers a count rather than the
> old order of the page. When we support a variable size THP, we won't
> have the order at this point, but we will have the number of pages.
> So change the interface to what the caller and callee would prefer.

There are two callers, split_page in mm/page_alloc.c and __split_huge_page in
mm/huge_memory.c. The former has the page order. The latter has the page order
information before __split_huge_page_tail is called, so we can do
old_order = thp_order(head) instead of nr = thp_nr_pages(head) and use old_order.
What am I missing there?

Thanks.


Best Regards,
Yan Zi



2020-11-17 21:15:22

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Wed, Nov 11, 2020 at 03:40:05PM -0500, Zi Yan wrote:
> - for (i = 0; i < nr; i++) {
> + for (i = 0; i < nr; i += (1 << new_order)) {
> page_owner = get_page_owner(page_ext);
> - page_owner->order = 0;
> + page_owner->order = new_order;
> page_ext = page_ext_next(page_ext);
> }

This doesn't do what you're hoping it will. It's going to set ->order to
new_order for the first N pages instead of every 1/N pages.

You'll need to do something like

page_ext = lookup_page_ext(page + i);

or add a new page_ext_add(page_ext, 1 << new_order);

2020-11-17 21:15:45

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On 17 Nov 2020, at 16:10, Matthew Wilcox wrote:

> On Wed, Nov 11, 2020 at 03:40:05PM -0500, Zi Yan wrote:
>> - for (i = 0; i < nr; i++) {
>> + for (i = 0; i < nr; i += (1 << new_order)) {
>> page_owner = get_page_owner(page_ext);
>> - page_owner->order = 0;
>> + page_owner->order = new_order;
>> page_ext = page_ext_next(page_ext);
>> }
>
> This doesn't do what you're hoping it will. It's going to set ->order to
> new_order for the first N pages instead of every 1/N pages.
>
> You'll need to do something like
>
> page_ext = lookup_page_ext(page + i);

Will use this. Thanks.

>
> or add a new page_ext_add(page_ext, 1 << new_order);



Best Regards,
Yan Zi



2020-11-17 21:28:22

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Tue, Nov 17, 2020 at 04:12:03PM -0500, Zi Yan wrote:
> On 17 Nov 2020, at 16:05, Matthew Wilcox wrote:
>
> > On Fri, Nov 13, 2020 at 05:38:01PM -0800, Roman Gushchin wrote:
> >> On Fri, Nov 13, 2020 at 08:08:58PM -0500, Zi Yan wrote:
> >>> Matthew recently converted split_page_owner to take nr instead of order.[1]
> >>> But I am not
> >>> sure why, since it seems to me that two call sites (__split_huge_page in
> >>> mm/huge_memory.c and split_page in mm/page_alloc.c) can pass the order
> >>> information.
> >>
> >> Yeah, I'm not sure why too. Maybe Matthew has some input here?
> >> You can also pass new_nr, but IMO orders look so much better here.
> >
> > If only I'd written that information in the changelog ... oh wait, I did!
> >
> > mm/page_owner: change split_page_owner to take a count
> >
> > The implementation of split_page_owner() prefers a count rather than the
> > old order of the page. When we support a variable size THP, we won't
> > have the order at this point, but we will have the number of pages.
> > So change the interface to what the caller and callee would prefer.
>
> There are two callers, split_page in mm/page_alloc.c and __split_huge_page in
> mm/huge_memory.c. The former has the page order. The latter has the page order
> information before __split_huge_page_tail is called, so we can do
> old_order = thp_order(head) instead of nr = thp_nr_page(head) and use old_order.
> What am I missing there?

Sure, we could also do that. But what I wrote was true at the time I
wrote it.

2020-11-17 21:29:51

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On 17 Nov 2020, at 16:22, Matthew Wilcox wrote:

> On Tue, Nov 17, 2020 at 04:12:03PM -0500, Zi Yan wrote:
>> On 17 Nov 2020, at 16:05, Matthew Wilcox wrote:
>>
>>> On Fri, Nov 13, 2020 at 05:38:01PM -0800, Roman Gushchin wrote:
>>>> On Fri, Nov 13, 2020 at 08:08:58PM -0500, Zi Yan wrote:
>>>>> Matthew recently converted split_page_owner to take nr instead of order.[1]
>>>>> But I am not
>>>>> sure why, since it seems to me that two call sites (__split_huge_page in
>>>>> mm/huge_memory.c and split_page in mm/page_alloc.c) can pass the order
>>>>> information.
>>>>
>>>> Yeah, I'm not sure why too. Maybe Matthew has some input here?
>>>> You can also pass new_nr, but IMO orders look so much better here.
>>>
>>> If only I'd written that information in the changelog ... oh wait, I did!
>>>
>>> mm/page_owner: change split_page_owner to take a count
>>>
>>> The implementation of split_page_owner() prefers a count rather than the
>>> old order of the page. When we support a variable size THP, we won't
>>> have the order at this point, but we will have the number of pages.
>>> So change the interface to what the caller and callee would prefer.
>>
>> There are two callers, split_page in mm/page_alloc.c and __split_huge_page in
>> mm/huge_memory.c. The former has the page order. The latter has the page order
>> information before __split_huge_page_tail is called, so we can do
>> old_order = thp_order(head) instead of nr = thp_nr_page(head) and use old_order.
>> What am I missing there?
>
> Sure, we could also do that. But what I wrote was true at the time I
> wrote it.

Got it. Thanks. Will change it to use old_order to make split_page_owner parameters
look more consistent.


Best Regards,
Yan Zi



2020-11-17 21:39:29

by Roman Gushchin

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Tue, Nov 17, 2020 at 09:22:55PM +0000, Matthew Wilcox wrote:
> On Tue, Nov 17, 2020 at 04:12:03PM -0500, Zi Yan wrote:
> > On 17 Nov 2020, at 16:05, Matthew Wilcox wrote:
> >
> > > On Fri, Nov 13, 2020 at 05:38:01PM -0800, Roman Gushchin wrote:
> > >> On Fri, Nov 13, 2020 at 08:08:58PM -0500, Zi Yan wrote:
> > >>> Matthew recently converted split_page_owner to take nr instead of order.[1]
> > >>> But I am not
> > >>> sure why, since it seems to me that two call sites (__split_huge_page in
> > >>> mm/huge_memory.c and split_page in mm/page_alloc.c) can pass the order
> > >>> information.
> > >>
> > >> Yeah, I'm not sure why too. Maybe Matthew has some input here?
> > >> You can also pass new_nr, but IMO orders look so much better here.
> > >
> > > If only I'd written that information in the changelog ... oh wait, I did!
> > >
> > > mm/page_owner: change split_page_owner to take a count
> > >
> > > The implementation of split_page_owner() prefers a count rather than the
> > > old order of the page. When we support a variable size THP, we won't
> > > have the order at this point, but we will have the number of pages.
> > > So change the interface to what the caller and callee would prefer.
> >
> > There are two callers, split_page in mm/page_alloc.c and __split_huge_page in
> > mm/huge_memory.c. The former has the page order. The latter has the page order
> > information before __split_huge_page_tail is called, so we can do
> > old_order = thp_order(head) instead of nr = thp_nr_page(head) and use old_order.
> > What am I missing there?
>
> Sure, we could also do that. But what I wrote was true at the time I
> wrote it.

Sure, I was asking whether you're ok with going back to orders or whether there are
better ideas. I'm sorry if it wasn't clear and sounded different.

It just seems to me that when a function takes nr and order (as in Zi's last version),
I'd expect nr to be a number of pages of the given order, or something like that.
So I'd avoid mixing them. Orders are slightly better when nr is always a power of two;
it's just more obvious from looking at the code.

Thanks!

2020-11-17 21:46:25

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] mm: page_owner: add support for splitting to any order in split page_owner.

On Tue, Nov 17, 2020 at 01:35:37PM -0800, Roman Gushchin wrote:
> Sure, I was asking whether you're ok with going back to orders or whether there are
> better ideas. I'm sorry if it wasn't clear and sounded different.
>
> It just seems to me that if a function takes both nr and order (as in Zi's last version),
> I'd expect nr to be the number of pages of the given order, or something like that.
> So I'd avoid mixing them. Orders are slightly better when nr is always a power of two;
> it's just more obvious from looking at the code.

I think it's awkward no matter which way round we do it.

If we pass old_order, new_order then we create extra work for both caller
and callee.

If we pass old_nr, new_order, it looks weird for humans.

At the end of the day, I'm not that invested in which we do.