2023-04-14 19:50:58

by Tarun Sahu

[permalink] [raw]
Subject: [PATCH] mm/folio: Avoid special handling for order value 0 in folio_set_order

folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order
folio does not have any tail page to set order. folio->_folio_nr_pages is
set to 0 for order 0 in folio_set_order. It is required because
_folio_nr_pages overlapped with page->mapping and leaving it non zero
caused "bad page" error while freeing gigantic hugepages. This was fixed in
Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic
pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA
pages to CMA") now explicitly clear page->mapping and hence we won't see
the bad page error even if _folio_nr_pages remains unset. Also the order 0
folios are not supposed to call folio_set_order, So now we can get rid of
folio_set_order(folio, 0) from hugetlb code path to clear the confusion.

The patch also moves _folio_set_head and folio_set_order calls in
__prep_compound_gigantic_folio() such that we avoid clearing them in the
error path.

Testing: I have run LTP tests, which all passes. and also I have written
the test in LTP which tests the bug caused by compound_nr and page->mapping
overlapping.

https://lore.kernel.org/all/[email protected]/

Running on older kernel ( < 5.10-rc7) with the above bug this fails while
on newer kernel and, also with this patch it passes.

Signed-off-by: Tarun Sahu <[email protected]>
---
mm/hugetlb.c | 9 +++------
mm/internal.h | 8 ++------
2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f16b25b1a6b9..e2540269c1dc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
set_page_refcounted(p);
}

- folio_set_order(folio, 0);
__folio_clear_head(folio);
}

@@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
struct page *p;

__folio_clear_reserved(folio);
- __folio_set_head(folio);
- /* we rely on prep_new_hugetlb_folio to set the destructor */
- folio_set_order(folio, order);
for (i = 0; i < nr_pages; i++) {
p = folio_page(folio, i);

@@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
if (i != 0)
set_compound_head(p, &folio->page);
}
+ __folio_set_head(folio);
+ /* we rely on prep_new_hugetlb_folio to set the destructor */
+ folio_set_order(folio, order);
atomic_set(&folio->_entire_mapcount, -1);
atomic_set(&folio->_nr_pages_mapped, 0);
atomic_set(&folio->_pincount, 0);
@@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
p = folio_page(folio, j);
__ClearPageReserved(p);
}
- folio_set_order(folio, 0);
- __folio_clear_head(folio);
return false;
}

diff --git a/mm/internal.h b/mm/internal.h
index 18cda26b8a92..0d96a3bc1d58 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -425,16 +425,12 @@ int split_free_page(struct page *free_page,
*/
static inline void folio_set_order(struct folio *folio, unsigned int order)
{
- if (WARN_ON_ONCE(!folio_test_large(folio)))
+ if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
return;

folio->_folio_order = order;
#ifdef CONFIG_64BIT
- /*
- * When hugetlb dissolves a folio, we need to clear the tail
- * page, rather than setting nr_pages to 1.
- */
- folio->_folio_nr_pages = order ? 1U << order : 0;
+ folio->_folio_nr_pages = 1U << order;
#endif
}

--
2.31.1


2023-04-14 20:14:03

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] mm/folio: Avoid special handling for order value 0 in folio_set_order

On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote:
> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order
> folio does not have any tail page to set order.

I think you're missing the point of how folio_set_order() is used.
When splitting a large folio, we need to zero out the folio_nr_pages
in the tail, so it does have a tail page, and that tail page needs to
be zeroed. We even assert that there is a tail page:

if (WARN_ON_ONCE(!folio_test_large(folio)))
return;

Or maybe you need to explain yourself better.

> folio->_folio_nr_pages is
> set to 0 for order 0 in folio_set_order. It is required because
> _folio_nr_pages overlapped with page->mapping and leaving it non zero
> caused "bad page" error while freeing gigantic hugepages. This was fixed in
> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic
> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA
> pages to CMA") now explicitly clear page->mapping and hence we won't see
> the bad page error even if _folio_nr_pages remains unset. Also the order 0
> folios are not supposed to call folio_set_order, So now we can get rid of
> folio_set_order(folio, 0) from hugetlb code path to clear the confusion.

... this is all very confusing.

> The patch also moves _folio_set_head and folio_set_order calls in
> __prep_compound_gigantic_folio() such that we avoid clearing them in the
> error path.

But don't we need those bits set while we operate on the folio to set it
up? It makes me nervous if we don't have those bits set because we can
end up with speculative references that point to a head page while that
page is not marked as a head page. It may not be a problem, but I want
to see some air-tight analysis of that.

> Testing: I have run LTP tests, which all passes. and also I have written
> the test in LTP which tests the bug caused by compound_nr and page->mapping
> overlapping.
>
> https://lore.kernel.org/all/[email protected]/
>
> Running on older kernel ( < 5.10-rc7) with the above bug this fails while
> on newer kernel and, also with this patch it passes.
>
> Signed-off-by: Tarun Sahu <[email protected]>
> ---
> mm/hugetlb.c | 9 +++------
> mm/internal.h | 8 ++------
> 2 files changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f16b25b1a6b9..e2540269c1dc 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
> set_page_refcounted(p);
> }
>
> - folio_set_order(folio, 0);
> __folio_clear_head(folio);
> }
>
> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> struct page *p;
>
> __folio_clear_reserved(folio);
> - __folio_set_head(folio);
> - /* we rely on prep_new_hugetlb_folio to set the destructor */
> - folio_set_order(folio, order);
> for (i = 0; i < nr_pages; i++) {
> p = folio_page(folio, i);
>
> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> if (i != 0)
> set_compound_head(p, &folio->page);
> }
> + __folio_set_head(folio);
> + /* we rely on prep_new_hugetlb_folio to set the destructor */
> + folio_set_order(folio, order);
> atomic_set(&folio->_entire_mapcount, -1);
> atomic_set(&folio->_nr_pages_mapped, 0);
> atomic_set(&folio->_pincount, 0);
> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> p = folio_page(folio, j);
> __ClearPageReserved(p);
> }
> - folio_set_order(folio, 0);
> - __folio_clear_head(folio);
> return false;
> }
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 18cda26b8a92..0d96a3bc1d58 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page,
> */
> static inline void folio_set_order(struct folio *folio, unsigned int order)
> {
> - if (WARN_ON_ONCE(!folio_test_large(folio)))
> + if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
> return;
>
> folio->_folio_order = order;
> #ifdef CONFIG_64BIT
> - /*
> - * When hugetlb dissolves a folio, we need to clear the tail
> - * page, rather than setting nr_pages to 1.
> - */
> - folio->_folio_nr_pages = order ? 1U << order : 0;
> + folio->_folio_nr_pages = 1U << order;
> #endif
> }
>
> --
> 2.31.1
>

2023-04-14 21:41:54

by Sidhartha Kumar

[permalink] [raw]
Subject: Re: [PATCH] mm/folio: Avoid special handling for order value 0 in folio_set_order

On 4/14/23 12:48 PM, Tarun Sahu wrote:
> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order
> folio does not have any tail page to set order. folio->_folio_nr_pages is
> set to 0 for order 0 in folio_set_order. It is required because

In the previous discussion of this function, Mike mentioned having
folio_set_order() be used for non-zero orders and adding a
folio_clear_order() that is used to set order to 0. This could be done
to reduce confusion.

> _folio_nr_pages overlapped with page->mapping and leaving it non zero
> caused "bad page" error while freeing gigantic hugepages. This was fixed in
> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic
> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA
> pages to CMA") now explicitly clear page->mapping and hence we won't see
> the bad page error even if _folio_nr_pages remains unset. Also the order 0
> folios are not supposed to call folio_set_order, So now we can get rid of
> folio_set_order(folio, 0) from hugetlb code path to clear the confusion.
>
> The patch also moves _folio_set_head and folio_set_order calls in
> __prep_compound_gigantic_folio() such that we avoid clearing them in the
> error path.
>
> Testing: I have run LTP tests, which all passes. and also I have written
> the test in LTP which tests the bug caused by compound_nr and page->mapping
> overlapping.
>
> https://lore.kernel.org/all/[email protected]/
>
> Running on older kernel ( < 5.10-rc7) with the above bug this fails while
> on newer kernel and, also with this patch it passes.
>
> Signed-off-by: Tarun Sahu <[email protected]>
> ---
> mm/hugetlb.c | 9 +++------
> mm/internal.h | 8 ++------
> 2 files changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f16b25b1a6b9..e2540269c1dc 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
> set_page_refcounted(p);
> }
>
> - folio_set_order(folio, 0);
> __folio_clear_head(folio);
> }
>
> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> struct page *p;
>
> __folio_clear_reserved(folio);
> - __folio_set_head(folio);
> - /* we rely on prep_new_hugetlb_folio to set the destructor */
> - folio_set_order(folio, order);
> for (i = 0; i < nr_pages; i++) {
> p = folio_page(folio, i);
>
> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> if (i != 0)
> set_compound_head(p, &folio->page);
> }

calling set_compound_head() for the tail page before the folio has the
head flag set could seem misleading. At this point order is not set as
well so it is not clear that the folio is a compound page folio.

> + __folio_set_head(folio);
> + /* we rely on prep_new_hugetlb_folio to set the destructor */
> + folio_set_order(folio, order);
> atomic_set(&folio->_entire_mapcount, -1);
> atomic_set(&folio->_nr_pages_mapped, 0);
> atomic_set(&folio->_pincount, 0);
> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> p = folio_page(folio, j);
> __ClearPageReserved(p);
> }
> - folio_set_order(folio, 0);
> - __folio_clear_head(folio);
> return false;
> }
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 18cda26b8a92..0d96a3bc1d58 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page,
> */
> static inline void folio_set_order(struct folio *folio, unsigned int order)
> {
> - if (WARN_ON_ONCE(!folio_test_large(folio)))
> + if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
> return;
>
> folio->_folio_order = order;
> #ifdef CONFIG_64BIT
> - /*
> - * When hugetlb dissolves a folio, we need to clear the tail
> - * page, rather than setting nr_pages to 1.
> - */
> - folio->_folio_nr_pages = order ? 1U << order : 0;
> + folio->_folio_nr_pages = 1U << order;
> #endif
> }
>

2023-04-18 09:06:25

by Tarun Sahu

[permalink] [raw]
Subject: Re: [PATCH] mm/folio: Avoid special handling for order value 0 in folio_set_order

Matthew Wilcox <[email protected]> writes:

Hi Mathew,
Thanks for reviewing. please find my comments inline.

> On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote:
>> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order
>> folio does not have any tail page to set order.
>
> I think you're missing the point of how folio_set_order() is used.
> When splitting a large folio, we need to zero out the folio_nr_pages
> in the tail, so it does have a tail page, and that tail page needs to
> be zeroed. We even assert that there is a tail page:
>
> if (WARN_ON_ONCE(!folio_test_large(folio)))
> return;
>
> Or maybe you need to explain yourself better.
>

Yes, I understand, folio_set_order(order, 0) is called to clear out
tail pages folio_order/folio_nr_pages. With this patch, I am trying
to convey two things:-
1. It is not necessary to clear out these field if page->mapping is
being explicitly updated. I explain this below [EXP].

2. folio_set_order(order, 0) now currently being used to clear
folio_order and folio_nr_pages which is ok. But looking at
folio_set_order(folio, 0) is confusing as setting order 0 implies that
only 1 page in folio. and folio_order and folio_nr_pages are part of
first tail page. IIRC, there was a discussion to use folio_clear_order
to avoid such confusion. But if above point 1 deemed to be correct, there
will not be any need of this too.

**[EXP]**
IIUC, during splitting, page->mapping is updated
explicitly for tail pages. There is no code path I see, where
folio_set_order(order, 0) or set_compound_order(head, 0) is called
except below two places.

1. __destroy_compound_gigantic_folio
Here, in past there was a problem when struct page used to have
compound_nr field which used to overlap with page->mapping. So
while freeing, it was necessary to explicitly clear out
compound_nr. Which was taken care by Commit ba9c1201beaa
("mm/hugetlb: clear compound_nr before freeing gigantic pages").

But after, Commit a01f43901cfb ("hugetlb: be sure to free demoted CMA
pages to CMA"), page->mapping has explicitly been cleared out for
all tail pages.

for (i = 1; i < nr_pages; i++) {
p = folio_page(folio, i);
p->mapping = NULL; <======== (Here)
clear_compound_head(p);
if (!demote)
set_page_refcounted(p);
}
folio_set_order(folio, 0); <== this line can be removed.

2. __prep_compound_gigantic_folio
Here, folio_set_order(folio, 0) is called in error path only.
which can be avoided if we call folio_set_order(folio, order)
after the for loop.
I am new to memory allocators. But as far as I could understood by
looking at past discussion around this function [1][2], During RCU
grace period there could be a race condition causing ref count
inflation. But IIUC, that doesn't have any dependency on newly
allocated gigantic page except that the ref count might be taken
by folio_ref_try_add_rcu for the same page/s which will cause
prep_compound_gigantic_folio to fail. So IMHO, it will be ok to
move __folio_set_head and folio_set_order after the for loop.
Here, Just for reference, below I copy pasted the *for loop*,
from before, I am moving these two calls to after this.

for (i = 0; i < nr_pages; i++) {
p = folio_page(folio, i);

if (i != 0) /* head page cleared above */
__ClearPageReserved(p);
if (!demote) {
if (!page_ref_freeze(p, 1)) {
pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
goto out_error;
}
} else {
VM_BUG_ON_PAGE(page_count(p), p);
}
if (i != 0)
set_compound_head(p, &folio->page);
}

I also tested it with triggering demotion of gigantic hugepages to PMD
hugepages.

$ echo 5 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
$ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
5
$ cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
0
$ echo 1 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote
$ cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
512

I am quite new to field. Please correct me if I understood it
differently than it is. Also if I didn't consider other code path
for its consideration.

[1] https://lore.kernel.org/linux-mm/CAG48ez23q0Jy9cuVnwAe7t_fdhMk2S7N5Hdi-GLcCeq5bsfLxw@mail.gmail.com/
[2] https://lore.kernel.org/all/[email protected]/T/#u

>> folio->_folio_nr_pages is
>> set to 0 for order 0 in folio_set_order. It is required because
>> _folio_nr_pages overlapped with page->mapping and leaving it non zero
>> caused "bad page" error while freeing gigantic hugepages. This was fixed in
>> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic
>> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA
>> pages to CMA") now explicitly clear page->mapping and hence we won't see
>> the bad page error even if _folio_nr_pages remains unset. Also the order 0
>> folios are not supposed to call folio_set_order, So now we can get rid of
>> folio_set_order(folio, 0) from hugetlb code path to clear the confusion.
>
> ... this is all very confusing.
>

Sorry, for this. Lemme know if above explanation [EXP] is clear.

>> The patch also moves _folio_set_head and folio_set_order calls in
>> __prep_compound_gigantic_folio() such that we avoid clearing them in the
>> error path.
>
> But don't we need those bits set while we operate on the folio to set it
> up? It makes me nervous if we don't have those bits set because we can
> end up with speculative references that point to a head page while that
> page is not marked as a head page. It may not be a problem, but I want
> to see some air-tight analysis of that.
>


>> Testing: I have run LTP tests, which all passes. and also I have written
>> the test in LTP which tests the bug caused by compound_nr and page->mapping
>> overlapping.
>>
>> https://lore.kernel.org/all/[email protected]/
>>
>> Running on older kernel ( < 5.10-rc7) with the above bug this fails while
>> on newer kernel and, also with this patch it passes.
>>
>> Signed-off-by: Tarun Sahu <[email protected]>
>> ---
>> mm/hugetlb.c | 9 +++------
>> mm/internal.h | 8 ++------
>> 2 files changed, 5 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index f16b25b1a6b9..e2540269c1dc 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
>> set_page_refcounted(p);
>> }
>>
>> - folio_set_order(folio, 0);
>> __folio_clear_head(folio);
>> }
>>
>> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>> struct page *p;
>>
>> __folio_clear_reserved(folio);
>> - __folio_set_head(folio);
>> - /* we rely on prep_new_hugetlb_folio to set the destructor */
>> - folio_set_order(folio, order);
>> for (i = 0; i < nr_pages; i++) {
>> p = folio_page(folio, i);
>>
>> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>> if (i != 0)
>> set_compound_head(p, &folio->page);
>> }
>> + __folio_set_head(folio);
>> + /* we rely on prep_new_hugetlb_folio to set the destructor */
>> + folio_set_order(folio, order);
>> atomic_set(&folio->_entire_mapcount, -1);
>> atomic_set(&folio->_nr_pages_mapped, 0);
>> atomic_set(&folio->_pincount, 0);
>> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>> p = folio_page(folio, j);
>> __ClearPageReserved(p);
>> }
>> - folio_set_order(folio, 0);
>> - __folio_clear_head(folio);
>> return false;
>> }
>>
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 18cda26b8a92..0d96a3bc1d58 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page,
>> */
>> static inline void folio_set_order(struct folio *folio, unsigned int order)
>> {
>> - if (WARN_ON_ONCE(!folio_test_large(folio)))
>> + if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
>> return;
>>
>> folio->_folio_order = order;
>> #ifdef CONFIG_64BIT
>> - /*
>> - * When hugetlb dissolves a folio, we need to clear the tail
>> - * page, rather than setting nr_pages to 1.
>> - */
>> - folio->_folio_nr_pages = order ? 1U << order : 0;
>> + folio->_folio_nr_pages = 1U << order;
>> #endif
>> }
>>
>> --
>> 2.31.1
>>

2023-04-18 09:30:22

by Tarun Sahu

[permalink] [raw]
Subject: Re: [PATCH] mm/folio: Avoid special handling for order value 0 in folio_set_order

Hi Sidhartha,

Thanks for your inputs, please find my comments inline

> On 4/14/23 12:48 PM, Tarun Sahu wrote:
>> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order
>> folio does not have any tail page to set order. folio->_folio_nr_pages is
>> set to 0 for order 0 in folio_set_order. It is required because
>
> In the previous discussion of this function, Mike mentioned having
> folio_set_order() be used for non-zero orders and adding a
> folio_clear_order() that is used to set order to 0. This could be done
> to reduce confusion.
>
Yes, I agree, I replied to Mathew reply to this thread, Lemme know your
thought on this. In this patch, I proposed that there won't be need of
folio_clear_order if folio_set_order(folio, 0) is not needed with minor
changes in code path.

>> _folio_nr_pages overlapped with page->mapping and leaving it non zero
>> caused "bad page" error while freeing gigantic hugepages. This was fixed in
>> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic
>> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA
>> pages to CMA") now explicitly clear page->mapping and hence we won't see
>> the bad page error even if _folio_nr_pages remains unset. Also the order 0
>> folios are not supposed to call folio_set_order, So now we can get rid of
>> folio_set_order(folio, 0) from hugetlb code path to clear the confusion.
>>
>> The patch also moves _folio_set_head and folio_set_order calls in
>> __prep_compound_gigantic_folio() such that we avoid clearing them in the
>> error path.
>>
>> Testing: I have run LTP tests, which all passes. and also I have written
>> the test in LTP which tests the bug caused by compound_nr and page->mapping
>> overlapping.
>>
>> https://lore.kernel.org/all/[email protected]/
>>
>> Running on older kernel ( < 5.10-rc7) with the above bug this fails while
>> on newer kernel and, also with this patch it passes.
>>
>> Signed-off-by: Tarun Sahu <[email protected]>
>> ---
>> mm/hugetlb.c | 9 +++------
>> mm/internal.h | 8 ++------
>> 2 files changed, 5 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index f16b25b1a6b9..e2540269c1dc 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
>> set_page_refcounted(p);
>> }
>>
>> - folio_set_order(folio, 0);
>> __folio_clear_head(folio);
>> }
>>
>> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>> struct page *p;
>>
>> __folio_clear_reserved(folio);
>> - __folio_set_head(folio);
>> - /* we rely on prep_new_hugetlb_folio to set the destructor */
>> - folio_set_order(folio, order);
>> for (i = 0; i < nr_pages; i++) {
>> p = folio_page(folio, i);
>>
>> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>> if (i != 0)
>> set_compound_head(p, &folio->page);
>> }
>
> calling set_compound_head() for the tail page before the folio has the
> head flag set could seem misleading. At this point order is not set as
> well so it is not clear that the folio is a compound page folio.
>
Yeah, I agree, But they are part of same call. I can avoid moving
__folio_set_head. And I think, It wont mislead if I avoid moving
__folio_set_head. Below function has the similar path.

void prep_compound_page(struct page *page, unsigned int order)
{
int i;
int nr_pages = 1 << order;

__SetPageHead(page);
for (i = 1; i < nr_pages; i++)
prep_compound_tail(page, i);

prep_compound_head(page, order);
}

Lemme know you thoughts.


~Tarun

>> + __folio_set_head(folio);
>> + /* we rely on prep_new_hugetlb_folio to set the destructor */
>> + folio_set_order(folio, order);
>> atomic_set(&folio->_entire_mapcount, -1);
>> atomic_set(&folio->_nr_pages_mapped, 0);
>> atomic_set(&folio->_pincount, 0);
>> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>> p = folio_page(folio, j);
>> __ClearPageReserved(p);
>> }
>> - folio_set_order(folio, 0);
>> - __folio_clear_head(folio);
>> return false;
>> }
>>
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 18cda26b8a92..0d96a3bc1d58 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page,
>> */
>> static inline void folio_set_order(struct folio *folio, unsigned int order)
>> {
>> - if (WARN_ON_ONCE(!folio_test_large(folio)))
>> + if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
>> return;
>>
>> folio->_folio_order = order;
>> #ifdef CONFIG_64BIT
>> - /*
>> - * When hugetlb dissolves a folio, we need to clear the tail
>> - * page, rather than setting nr_pages to 1.
>> - */
>> - folio->_folio_nr_pages = order ? 1U << order : 0;
>> + folio->_folio_nr_pages = 1U << order;
>> #endif
>> }
>>

2023-04-18 09:32:03

by Tarun Sahu

[permalink] [raw]
Subject: Re: [PATCH] mm/folio: Avoid special handling for order value 0 in folio_set_order

Tarun Sahu <[email protected]> writes:

> Hi Sidhartha,
>
> Thanks for your inputs, please find my comments inline
>
>> On 4/14/23 12:48 PM, Tarun Sahu wrote:
>>> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order
>>> folio does not have any tail page to set order. folio->_folio_nr_pages is
>>> set to 0 for order 0 in folio_set_order. It is required because
>>
>> In the previous discussion of this function, Mike mentioned having
>> folio_set_order() be used for non-zero orders and adding a
>> folio_clear_order() that is used to set order to 0. This could be done
>> to reduce confusion.
>>
> Yes, I agree, I replied to Mathew reply to this thread, Lemme know your
> thought on this. In this patch, I proposed that there won't be need of
> folio_clear_order if folio_set_order(folio, 0) is not needed with minor
> changes in code path.
>
>>> _folio_nr_pages overlapped with page->mapping and leaving it non zero
>>> caused "bad page" error while freeing gigantic hugepages. This was fixed in
>>> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic
>>> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA
>>> pages to CMA") now explicitly clear page->mapping and hence we won't see
>>> the bad page error even if _folio_nr_pages remains unset. Also the order 0
>>> folios are not supposed to call folio_set_order, So now we can get rid of
>>> folio_set_order(folio, 0) from hugetlb code path to clear the confusion.
>>>
>>> The patch also moves _folio_set_head and folio_set_order calls in
>>> __prep_compound_gigantic_folio() such that we avoid clearing them in the
>>> error path.
>>>
>>> Testing: I have run LTP tests, which all passes. and also I have written
>>> the test in LTP which tests the bug caused by compound_nr and page->mapping
>>> overlapping.
>>>
>>> https://lore.kernel.org/all/[email protected]/
>>>
>>> Running on older kernel ( < 5.10-rc7) with the above bug this fails while
>>> on newer kernel and, also with this patch it passes.
>>>
>>> Signed-off-by: Tarun Sahu <[email protected]>
>>> ---
>>> mm/hugetlb.c | 9 +++------
>>> mm/internal.h | 8 ++------
>>> 2 files changed, 5 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index f16b25b1a6b9..e2540269c1dc 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
>>> set_page_refcounted(p);
>>> }
>>>
>>> - folio_set_order(folio, 0);
>>> __folio_clear_head(folio);
>>> }
>>>
>>> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>>> struct page *p;
>>>
>>> __folio_clear_reserved(folio);
>>> - __folio_set_head(folio);
>>> - /* we rely on prep_new_hugetlb_folio to set the destructor */
>>> - folio_set_order(folio, order);
>>> for (i = 0; i < nr_pages; i++) {
>>> p = folio_page(folio, i);
>>>
>>> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>>> if (i != 0)
>>> set_compound_head(p, &folio->page);
>>> }
>>
>> calling set_compound_head() for the tail page before the folio has the
>> head flag set could seem misleading. At this point order is not set as
>> well so it is not clear that the folio is a compound page folio.
>>
> Yeah, I agree, But they are part of same call. I can avoid moving
> __folio_set_head. And I think, It wont mislead if I avoid moving
> __folio_set_head. Below function has the similar path.

Apologies, Here, I mixed the sentences, I want to say
It won't mislead if we avoid moving __folio_set_head, but move only
folio_set_order. Below function does the same.
>
> void prep_compound_page(struct page *page, unsigned int order)
> {
> int i;
> int nr_pages = 1 << order;
>
> __SetPageHead(page);
> for (i = 1; i < nr_pages; i++)
> prep_compound_tail(page, i);
>
> prep_compound_head(page, order);
> }
>
> Lemme know you thoughts.
>
>
> ~Tarun
>
>>> + __folio_set_head(folio);
>>> + /* we rely on prep_new_hugetlb_folio to set the destructor */
>>> + folio_set_order(folio, order);
>>> atomic_set(&folio->_entire_mapcount, -1);
>>> atomic_set(&folio->_nr_pages_mapped, 0);
>>> atomic_set(&folio->_pincount, 0);
>>> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>>> p = folio_page(folio, j);
>>> __ClearPageReserved(p);
>>> }
>>> - folio_set_order(folio, 0);
>>> - __folio_clear_head(folio);
>>> return false;
>>> }
>>>
>>> diff --git a/mm/internal.h b/mm/internal.h
>>> index 18cda26b8a92..0d96a3bc1d58 100644
>>> --- a/mm/internal.h
>>> +++ b/mm/internal.h
>>> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page,
>>> */
>>> static inline void folio_set_order(struct folio *folio, unsigned int order)
>>> {
>>> - if (WARN_ON_ONCE(!folio_test_large(folio)))
>>> + if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
>>> return;
>>>
>>> folio->_folio_order = order;
>>> #ifdef CONFIG_64BIT
>>> - /*
>>> - * When hugetlb dissolves a folio, we need to clear the tail
>>> - * page, rather than setting nr_pages to 1.
>>> - */
>>> - folio->_folio_nr_pages = order ? 1U << order : 0;
>>> + folio->_folio_nr_pages = 1U << order;
>>> #endif
>>> }
>>>

2023-04-18 19:02:24

by Mike Kravetz

[permalink] [raw]
Subject: Re: [PATCH] mm/folio: Avoid special handling for order value 0 in folio_set_order

On 04/14/23 21:12, Matthew Wilcox wrote:
> On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote:
> > folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order
> > folio does not have any tail page to set order.
>
> I think you're missing the point of how folio_set_order() is used.
> When splitting a large folio, we need to zero out the folio_nr_pages
> in the tail, so it does have a tail page, and that tail page needs to
> be zeroed. We even assert that there is a tail page:
>
> if (WARN_ON_ONCE(!folio_test_large(folio)))
> return;
>
> Or maybe you need to explain yourself better.
>
> > folio->_folio_nr_pages is
> > set to 0 for order 0 in folio_set_order. It is required because
> > _folio_nr_pages overlapped with page->mapping and leaving it non zero
> > caused "bad page" error while freeing gigantic hugepages. This was fixed in
> > Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic
> > pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA
> > pages to CMA") now explicitly clear page->mapping and hence we won't see
> > the bad page error even if _folio_nr_pages remains unset. Also the order 0
> > folios are not supposed to call folio_set_order, So now we can get rid of
> > folio_set_order(folio, 0) from hugetlb code path to clear the confusion.
>
> ... this is all very confusing.
>
> > The patch also moves _folio_set_head and folio_set_order calls in
> > __prep_compound_gigantic_folio() such that we avoid clearing them in the
> > error path.
>
> But don't we need those bits set while we operate on the folio to set it
> up? It makes me nervous if we don't have those bits set because we can
> end up with speculative references that point to a head page while that
> page is not marked as a head page. It may not be a problem, but I want
> to see some air-tight analysis of that.

I am fairly certain we are 'safe'. Here is code before setting up the
pointer to the head page.

* In the case of demote, the ref count will be zero.
*/
if (!demote) {
if (!page_ref_freeze(p, 1)) {
pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
goto out_error;
}
} else {
VM_BUG_ON_PAGE(page_count(p), p);
}
if (i != 0)
set_compound_head(p, &folio->page);

So, before setting the pointer to head page ref count will be zero.

I 'think' it would actually be better to move the calls to _folio_set_head and
folio_set_order in __prep_compound_gigantic_folio() as suggested here. Why?
In the current code, the ref count on the 'head page' is still 1 (or more)
while those calls are made. So, someone could take a speculative ref on the
page BEFORE the tail pages are set up.

TBH, I do not have much of an opinion about potential confusion surrounding
folio_set_compound_order(folio, 0). IIUC, hugetlb gigantic page setup is the
only place outside the page allocation code that sets up compound pages/large
folios. So, it is going to be a bit 'special'. As mentioned, when this was
originally discussed I suggested folio_clear_order(). I would be happy with
either.
--
Mike Kravetz

2023-04-24 15:43:32

by Tarun Sahu

[permalink] [raw]
Subject: Re: [PATCH] mm/folio: Avoid special handling for order value 0 in folio_set_order


Hi Mike,


Mike Kravetz <[email protected]> writes:

> On 04/14/23 21:12, Matthew Wilcox wrote:
>> On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote:
>> > folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order
>> > folio does not have any tail page to set order.
>>
>> I think you're missing the point of how folio_set_order() is used.
>> When splitting a large folio, we need to zero out the folio_nr_pages
>> in the tail, so it does have a tail page, and that tail page needs to
>> be zeroed. We even assert that there is a tail page:
>>
>> if (WARN_ON_ONCE(!folio_test_large(folio)))
>> return;
>>
>> Or maybe you need to explain yourself better.
>>
>> > folio->_folio_nr_pages is
>> > set to 0 for order 0 in folio_set_order. It is required because
>> > _folio_nr_pages overlapped with page->mapping and leaving it non zero
>> > caused "bad page" error while freeing gigantic hugepages. This was fixed in
>> > Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic
>> > pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA
>> > pages to CMA") now explicitly clear page->mapping and hence we won't see
>> > the bad page error even if _folio_nr_pages remains unset. Also the order 0
>> > folios are not supposed to call folio_set_order, So now we can get rid of
>> > folio_set_order(folio, 0) from hugetlb code path to clear the confusion.
>>
>> ... this is all very confusing.
>>
>> > The patch also moves _folio_set_head and folio_set_order calls in
>> > __prep_compound_gigantic_folio() such that we avoid clearing them in the
>> > error path.
>>
>> But don't we need those bits set while we operate on the folio to set it
>> up? It makes me nervous if we don't have those bits set because we can
>> end up with speculative references that point to a head page while that
>> page is not marked as a head page. It may not be a problem, but I want
>> to see some air-tight analysis of that.
>
> I am fairly certain we are 'safe'. Here is code before setting up the
> pointer to the head page.
>
> * In the case of demote, the ref count will be zero.
> */
> if (!demote) {
> if (!page_ref_freeze(p, 1)) {
> pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
> goto out_error;
> }
> } else {
> VM_BUG_ON_PAGE(page_count(p), p);
> }
> if (i != 0)
> set_compound_head(p, &folio->page);
>
> So, before setting the pointer to head page ref count will be zero.
>
> I 'think' it would actually be better to move the calls to _folio_set_head and
> folio_set_order in __prep_compound_gigantic_folio() as suggested here. Why?
> In the current code, the ref count on the 'head page' is still 1 (or more)
> while those calls are made. So, someone could take a speculative ref on the
> page BEFORE the tail pages are set up.
>

Thanks, for confirming the correctness of moving these calls. Also I
didn't look at it this way while moving them. Thanks for the comment.
I will update the commit msg and send the v2.

~Tarun

> TBH, I do not have much of an opinion about potential confusion surrounding
> folio_set_compound_order(folio, 0). IIUC, hugetlb gigantic page setup is the
> only place outside the page allocation code that sets up compound pages/large
> folios. So, it is going to be a bit 'special'. As mentioned, when this was
> originally discussed I suggested folio_clear_order(). I would be happy with
> either.


> --
> Mike Kravetz

2023-04-24 15:51:23

by Tarun Sahu

[permalink] [raw]
Subject: Re: [PATCH] mm/folio: Avoid special handling for order value 0 in folio_set_order

Hi, Mathew,

I am not sure If I was clear about my intention behind the patch.
Here, I attempt to answer it again. Please lemme know if you agree.

Matthew Wilcox <[email protected]> writes:

> On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote:
>> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order
>> folio does not have any tail page to set order.
>
> I think you're missing the point of how folio_set_order() is used.
> When splitting a large folio, we need to zero out the folio_nr_pages
> in the tail, so it does have a tail page, and that tail page needs to
> be zeroed. We even assert that there is a tail page:
>
> if (WARN_ON_ONCE(!folio_test_large(folio)))
> return;
>
> Or maybe you need to explain yourself better.
>

In the upstream, I don't see folio_set_order(folio, 0) being called in
splitting path. IIUC, we had to zero out _folio_nr_pages during freeing
gigantic folio as described by Commit ba9c1201beaa
("mm/hugetlb: clear compound_nr before freeing gigantic pages").

I agree that folio_set_order(folio, 0) is called with folio having
tail pages. But I meant only that, in general, it is just confusing to
have setting the folio order to 0.

With this patch, I would like to draw attention to the point that there
is no need to call folio_set_order(folio, 0) anymore to zero out
_folio_order and _folio_nr_pages.

In past, it was needed because page->mapping used to overlap with
_folio_nr_pages and _folio_order. So if these fields were left
uncleared during freeing gigantic hugepages, they were causing
"BUG: bad page state" due to non-zero page->mapping. Now, After
Commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
CMA") page->mapping has explicitly been cleared out for tail pages.
Also, _folio_order and _folio_nr_pages no longer overlaps with page->mapping.

struct page {
...
struct address_space * mapping; /* 24 8 */
...
}

struct folio {
...
union {
struct {
long unsigned int _flags_1; /* 64 8 */
long unsigned int _head_1; /* 72 8 */
unsigned char _folio_dtor; /* 80 1 */
unsigned char _folio_order; /* 81 1 */

/* XXX 2 bytes hole, try to pack */

atomic_t _entire_mapcount; /* 84 4 */
atomic_t _nr_pages_mapped; /* 88 4 */
atomic_t _pincount; /* 92 4 */
unsigned int _folio_nr_pages; /* 96 4 */
}; /* 64 40 */
struct page __page_1 __attribute__((__aligned__(8))); /* 64 64 */
}
...
}

So, folio_set_order(folio, 0) can be removed from freeing gigantic
folio path (__destroy_compound_gigantic_folio).

Another place, where folio_set_order(folio, 0) is called inside
__prep_compound_gigantic_folio during error path. Here,
folio_set_order(folio, 0) can also be removed if we move
folio_set_order(folio, order) after for loop. Also, Mike confirmed that
it is safe to move the call.

~Tarun

>> folio->_folio_nr_pages is
>> set to 0 for order 0 in folio_set_order. It is required because
>> _folio_nr_pages overlapped with page->mapping and leaving it non zero
>> caused "bad page" error while freeing gigantic hugepages. This was fixed in
>> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic
>> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA
>> pages to CMA") now explicitly clear page->mapping and hence we won't see
>> the bad page error even if _folio_nr_pages remains unset. Also the order 0
>> folios are not supposed to call folio_set_order, So now we can get rid of
>> folio_set_order(folio, 0) from hugetlb code path to clear the confusion.
>
> ... this is all very confusing.
>
>> The patch also moves _folio_set_head and folio_set_order calls in
>> __prep_compound_gigantic_folio() such that we avoid clearing them in the
>> error path.
>
> But don't we need those bits set while we operate on the folio to set it
> up? It makes me nervous if we don't have those bits set because we can
> end up with speculative references that point to a head page while that
> page is not marked as a head page. It may not be a problem, but I want
> to see some air-tight analysis of that.
>
>> Testing: I have run LTP tests, which all passes. and also I have written
>> the test in LTP which tests the bug caused by compound_nr and page->mapping
>> overlapping.
>>
>> https://lore.kernel.org/all/[email protected]/
>>
>> Running on older kernel ( < 5.10-rc7) with the above bug this fails while
>> on newer kernel and, also with this patch it passes.
>>
>> Signed-off-by: Tarun Sahu <[email protected]>
>> ---
>> mm/hugetlb.c | 9 +++------
>> mm/internal.h | 8 ++------
>> 2 files changed, 5 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index f16b25b1a6b9..e2540269c1dc 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
>> set_page_refcounted(p);
>> }
>>
>> - folio_set_order(folio, 0);
>> __folio_clear_head(folio);
>> }
>>
>> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>> struct page *p;
>>
>> __folio_clear_reserved(folio);
>> - __folio_set_head(folio);
>> - /* we rely on prep_new_hugetlb_folio to set the destructor */
>> - folio_set_order(folio, order);
>> for (i = 0; i < nr_pages; i++) {
>> p = folio_page(folio, i);
>>
>> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>> if (i != 0)
>> set_compound_head(p, &folio->page);
>> }
>> + __folio_set_head(folio);
>> + /* we rely on prep_new_hugetlb_folio to set the destructor */
>> + folio_set_order(folio, order);
>> atomic_set(&folio->_entire_mapcount, -1);
>> atomic_set(&folio->_nr_pages_mapped, 0);
>> atomic_set(&folio->_pincount, 0);
>> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>> p = folio_page(folio, j);
>> __ClearPageReserved(p);
>> }
>> - folio_set_order(folio, 0);
>> - __folio_clear_head(folio);
>> return false;
>> }
>>
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 18cda26b8a92..0d96a3bc1d58 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page,
>> */
>> static inline void folio_set_order(struct folio *folio, unsigned int order)
>> {
>> - if (WARN_ON_ONCE(!folio_test_large(folio)))
>> + if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
>> return;
>>
>> folio->_folio_order = order;
>> #ifdef CONFIG_64BIT
>> - /*
>> - * When hugetlb dissolves a folio, we need to clear the tail
>> - * page, rather than setting nr_pages to 1.
>> - */
>> - folio->_folio_nr_pages = order ? 1U << order : 0;
>> + folio->_folio_nr_pages = 1U << order;
>> #endif
>> }
>>
>> --
>> 2.31.1
>>