2023-05-15 17:34:20

by Tarun Sahu

Subject: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order

folio_set_order(folio, 0) is called at two places in the kernel:
__destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
In both cases it is used to clear out folio->_folio_nr_pages and
folio->_folio_order.

For __destroy_compound_gigantic_folio:
In the past, folio_set_order(folio, 0) was needed because page->mapping
used to overlap with _folio_nr_pages and _folio_order, so if these fields
were left uncleared when freeing gigantic hugepages, the non-zero
page->mapping triggered "BUG: bad page state". Since commit a01f43901cfb
("hugetlb: be sure to free demoted CMA pages to CMA"), page->mapping is
explicitly cleared for tail pages, and _folio_order and _folio_nr_pages
no longer overlap with page->mapping.

struct page {
	...
	struct address_space * mapping;                  /* 24  8 */
	...
}

struct folio {
	...
	union {
		struct {
			long unsigned int _flags_1;      /* 64  8 */
			long unsigned int _head_1;       /* 72  8 */
			unsigned char _folio_dtor;       /* 80  1 */
			unsigned char _folio_order;      /* 81  1 */

			/* XXX 2 bytes hole, try to pack */

			atomic_t _entire_mapcount;       /* 84  4 */
			atomic_t _nr_pages_mapped;       /* 88  4 */
			atomic_t _pincount;              /* 92  4 */
			unsigned int _folio_nr_pages;    /* 96  4 */
		};                                       /* 64 40 */
		struct page __page_1 __attribute__((__aligned__(8))); /* 64 64 */
	}
	...
}

So folio_set_order(folio, 0) can be removed from the gigantic folio
freeing path (__destroy_compound_gigantic_folio).

The other call site is the error path of __prep_compound_gigantic_folio.
There, folio_set_order(folio, 0) can also be removed if
folio_set_order(folio, order) is moved after the for loop.

The patch also moves the __folio_set_head call in
__prep_compound_gigantic_folio() so that it does not have to be undone in
the error path.

Also, as Mike pointed out:
"It would actually be better to move the calls _folio_set_head and
folio_set_order in __prep_compound_gigantic_folio() as suggested here. Why?
In the current code, the ref count on the 'head page' is still 1 (or more)
while those calls are made. So, someone could take a speculative ref on the
page BEFORE the tail pages are set up."
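
For illustration, a speculative reference is typically taken with a
get_page_unless_zero() pattern (e.g. in page cache or GUP-fast lookups).
The helper name below is hypothetical; the sketch only shows why the head
flag and order should not become visible before the tail pages are ready:

static bool try_speculative_ref(struct page *page)
{
	/*
	 * The reference is taken on whatever state the page is in right
	 * now: if PG_head and the folio order were already set while the
	 * tail pages are still being prepared, the holder of this ref
	 * can observe a half-initialized compound page.
	 */
	return get_page_unless_zero(page);
}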

With this, folio_set_order(folio, 0) is no longer needed. It also removes
the confusion of the folio order being "set to 0", since the _folio_order
field is part of the first tail page.
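
For reference, this is roughly how the order is read back (a simplified
sketch; the exact upstream folio_order() helper may differ slightly). An
order-0 folio never reads _folio_order at all, because the field lives in
the first tail page:

static inline unsigned int folio_order(struct folio *folio)
{
	/* A single-page folio has no tail page, hence no _folio_order. */
	if (!folio_test_large(folio))
		return 0;
	return folio->_folio_order;
}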

Testing: I have run the LTP tests, which all pass. I have also written an
LTP test that exercises the bug caused by compound_nr and page->mapping
overlapping:

https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/hugetlb/hugemmap/hugemmap32.c

On older kernels (< 5.10-rc7) that still have the above bug, this test
fails; on newer kernels, and also with this patch applied, it passes.

Signed-off-by: Tarun Sahu <[email protected]>
---
mm/hugetlb.c | 9 +++------
mm/internal.h | 8 ++------
2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f154019e6b84..607553445855 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
set_page_refcounted(p);
}

- folio_set_order(folio, 0);
__folio_clear_head(folio);
}

@@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
struct page *p;

__folio_clear_reserved(folio);
- __folio_set_head(folio);
- /* we rely on prep_new_hugetlb_folio to set the destructor */
- folio_set_order(folio, order);
for (i = 0; i < nr_pages; i++) {
p = folio_page(folio, i);

@@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
if (i != 0)
set_compound_head(p, &folio->page);
}
+ __folio_set_head(folio);
+ /* we rely on prep_new_hugetlb_folio to set the destructor */
+ folio_set_order(folio, order);
atomic_set(&folio->_entire_mapcount, -1);
atomic_set(&folio->_nr_pages_mapped, 0);
atomic_set(&folio->_pincount, 0);
@@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
p = folio_page(folio, j);
__ClearPageReserved(p);
}
- folio_set_order(folio, 0);
- __folio_clear_head(folio);
return false;
}

diff --git a/mm/internal.h b/mm/internal.h
index 68410c6d97ac..c59fe08c5b39 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -425,16 +425,12 @@ int split_free_page(struct page *free_page,
*/
static inline void folio_set_order(struct folio *folio, unsigned int order)
{
- if (WARN_ON_ONCE(!folio_test_large(folio)))
+ if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
return;

folio->_folio_order = order;
#ifdef CONFIG_64BIT
- /*
- * When hugetlb dissolves a folio, we need to clear the tail
- * page, rather than setting nr_pages to 1.
- */
- folio->_folio_nr_pages = order ? 1U << order : 0;
+ folio->_folio_nr_pages = 1U << order;
#endif
}

--
2.31.1



2023-05-15 17:35:22

by Tarun Sahu

Subject: Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order


Changes from v1:
- Changed the patch description. Added comment from Mike.

~Tarun

Tarun Sahu <[email protected]> writes:

> folio_set_order(folio, 0) is used in kernel at two places
> __destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
> Currently, It is called to clear out the folio->_folio_nr_pages and
> folio->_folio_order.
>
> For __destroy_compound_gigantic_folio:
> In past, folio_set_order(folio, 0) was needed because page->mapping used
> to overlap with _folio_nr_pages and _folio_order. So if these fields were
> left uncleared during freeing gigantic hugepages, they were causing
> "BUG: bad page state" due to non-zero page->mapping. Now, After
> Commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
> CMA") page->mapping has explicitly been cleared out for tail pages. Also,
> _folio_order and _folio_nr_pages no longer overlaps with page->mapping.
>
> struct page {
> ...
> struct address_space * mapping; /* 24 8 */
> ...
> }
>
> struct folio {
> ...
> union {
> struct {
> long unsigned int _flags_1; /* 64 8 */
> long unsigned int _head_1; /* 72 8 */
> unsigned char _folio_dtor; /* 80 1 */
> unsigned char _folio_order; /* 81 1 */
>
> /* XXX 2 bytes hole, try to pack */
>
> atomic_t _entire_mapcount; /* 84 4 */
> atomic_t _nr_pages_mapped; /* 88 4 */
> atomic_t _pincount; /* 92 4 */
> unsigned int _folio_nr_pages; /* 96 4 */
> }; /* 64 40 */
> struct page __page_1 __attribute__((__aligned__(8))); /* 64 64 */
> }
> ...
> }
>
> So, folio_set_order(folio, 0) can be removed from freeing gigantic
> folio path (__destroy_compound_gigantic_folio).
>
> Another place, folio_set_order(folio, 0) is called inside
> __prep_compound_gigantic_folio during error path. Here,
> folio_set_order(folio, 0) can also be removed if we move
> folio_set_order(folio, order) after for loop.
>
> The patch also moves _folio_set_head call in __prep_compound_gigantic_folio()
> such that we avoid clearing them in the error path.
>
> Also, as Mike pointed out:
> "It would actually be better to move the calls _folio_set_head and
> folio_set_order in __prep_compound_gigantic_folio() as suggested here. Why?
> In the current code, the ref count on the 'head page' is still 1 (or more)
> while those calls are made. So, someone could take a speculative ref on the
> page BEFORE the tail pages are set up."
>
> This way, folio_set_order(folio, 0) is no more needed. And it will also
> helps removing the confusion of folio order being set to 0 (as _folio_order
> field is part of first tail page).
>
> Testing: I have run LTP tests, which all passes. and also I have written
> the test in LTP which tests the bug caused by compound_nr and page->mapping
> overlapping.
>
> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/hugetlb/hugemmap/hugemmap32.c
>
> Running on older kernel ( < 5.10-rc7) with the above bug this fails while
> on newer kernel and, also with this patch it passes.
>
> Signed-off-by: Tarun Sahu <[email protected]>
> ---
> mm/hugetlb.c | 9 +++------
> mm/internal.h | 8 ++------
> 2 files changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f154019e6b84..607553445855 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
> set_page_refcounted(p);
> }
>
> - folio_set_order(folio, 0);
> __folio_clear_head(folio);
> }
>
> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> struct page *p;
>
> __folio_clear_reserved(folio);
> - __folio_set_head(folio);
> - /* we rely on prep_new_hugetlb_folio to set the destructor */
> - folio_set_order(folio, order);
> for (i = 0; i < nr_pages; i++) {
> p = folio_page(folio, i);
>
> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> if (i != 0)
> set_compound_head(p, &folio->page);
> }
> + __folio_set_head(folio);
> + /* we rely on prep_new_hugetlb_folio to set the destructor */
> + folio_set_order(folio, order);
> atomic_set(&folio->_entire_mapcount, -1);
> atomic_set(&folio->_nr_pages_mapped, 0);
> atomic_set(&folio->_pincount, 0);
> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> p = folio_page(folio, j);
> __ClearPageReserved(p);
> }
> - folio_set_order(folio, 0);
> - __folio_clear_head(folio);
> return false;
> }
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 68410c6d97ac..c59fe08c5b39 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page,
> */
> static inline void folio_set_order(struct folio *folio, unsigned int order)
> {
> - if (WARN_ON_ONCE(!folio_test_large(folio)))
> + if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
> return;
>
> folio->_folio_order = order;
> #ifdef CONFIG_64BIT
> - /*
> - * When hugetlb dissolves a folio, we need to clear the tail
> - * page, rather than setting nr_pages to 1.
> - */
> - folio->_folio_nr_pages = order ? 1U << order : 0;
> + folio->_folio_nr_pages = 1U << order;
> #endif
> }
>
> --
> 2.31.1

2023-05-22 05:57:08

by Tarun Sahu

Subject: Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order

Hi,

This is a gentle reminder; please let me know if any information or
changes are needed from my end.

Thanks
Tarun

Tarun Sahu <[email protected]> writes:

> folio_set_order(folio, 0) is used in kernel at two places
> __destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
> Currently, It is called to clear out the folio->_folio_nr_pages and
> folio->_folio_order.
>
> For __destroy_compound_gigantic_folio:
> In past, folio_set_order(folio, 0) was needed because page->mapping used
> to overlap with _folio_nr_pages and _folio_order. So if these fields were
> left uncleared during freeing gigantic hugepages, they were causing
> "BUG: bad page state" due to non-zero page->mapping. Now, After
> Commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
> CMA") page->mapping has explicitly been cleared out for tail pages. Also,
> _folio_order and _folio_nr_pages no longer overlaps with page->mapping.
>
> struct page {
> ...
> struct address_space * mapping; /* 24 8 */
> ...
> }
>
> struct folio {
> ...
> union {
> struct {
> long unsigned int _flags_1; /* 64 8 */
> long unsigned int _head_1; /* 72 8 */
> unsigned char _folio_dtor; /* 80 1 */
> unsigned char _folio_order; /* 81 1 */
>
> /* XXX 2 bytes hole, try to pack */
>
> atomic_t _entire_mapcount; /* 84 4 */
> atomic_t _nr_pages_mapped; /* 88 4 */
> atomic_t _pincount; /* 92 4 */
> unsigned int _folio_nr_pages; /* 96 4 */
> }; /* 64 40 */
> struct page __page_1 __attribute__((__aligned__(8))); /* 64 64 */
> }
> ...
> }
>
> So, folio_set_order(folio, 0) can be removed from freeing gigantic
> folio path (__destroy_compound_gigantic_folio).
>
> Another place, folio_set_order(folio, 0) is called inside
> __prep_compound_gigantic_folio during error path. Here,
> folio_set_order(folio, 0) can also be removed if we move
> folio_set_order(folio, order) after for loop.
>
> The patch also moves _folio_set_head call in __prep_compound_gigantic_folio()
> such that we avoid clearing them in the error path.
>
> Also, as Mike pointed out:
> "It would actually be better to move the calls _folio_set_head and
> folio_set_order in __prep_compound_gigantic_folio() as suggested here. Why?
> In the current code, the ref count on the 'head page' is still 1 (or more)
> while those calls are made. So, someone could take a speculative ref on the
> page BEFORE the tail pages are set up."
>
> This way, folio_set_order(folio, 0) is no more needed. And it will also
> helps removing the confusion of folio order being set to 0 (as _folio_order
> field is part of first tail page).
>
> Testing: I have run LTP tests, which all passes. and also I have written
> the test in LTP which tests the bug caused by compound_nr and page->mapping
> overlapping.
>
> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/hugetlb/hugemmap/hugemmap32.c
>
> Running on older kernel ( < 5.10-rc7) with the above bug this fails while
> on newer kernel and, also with this patch it passes.
>
> Signed-off-by: Tarun Sahu <[email protected]>
> ---
> mm/hugetlb.c | 9 +++------
> mm/internal.h | 8 ++------
> 2 files changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f154019e6b84..607553445855 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
> set_page_refcounted(p);
> }
>
> - folio_set_order(folio, 0);
> __folio_clear_head(folio);
> }
>
> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> struct page *p;
>
> __folio_clear_reserved(folio);
> - __folio_set_head(folio);
> - /* we rely on prep_new_hugetlb_folio to set the destructor */
> - folio_set_order(folio, order);
> for (i = 0; i < nr_pages; i++) {
> p = folio_page(folio, i);
>
> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> if (i != 0)
> set_compound_head(p, &folio->page);
> }
> + __folio_set_head(folio);
> + /* we rely on prep_new_hugetlb_folio to set the destructor */
> + folio_set_order(folio, order);
> atomic_set(&folio->_entire_mapcount, -1);
> atomic_set(&folio->_nr_pages_mapped, 0);
> atomic_set(&folio->_pincount, 0);
> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> p = folio_page(folio, j);
> __ClearPageReserved(p);
> }
> - folio_set_order(folio, 0);
> - __folio_clear_head(folio);
> return false;
> }
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 68410c6d97ac..c59fe08c5b39 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page,
> */
> static inline void folio_set_order(struct folio *folio, unsigned int order)
> {
> - if (WARN_ON_ONCE(!folio_test_large(folio)))
> + if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
> return;
>
> folio->_folio_order = order;
> #ifdef CONFIG_64BIT
> - /*
> - * When hugetlb dissolves a folio, we need to clear the tail
> - * page, rather than setting nr_pages to 1.
> - */
> - folio->_folio_nr_pages = order ? 1U << order : 0;
> + folio->_folio_nr_pages = 1U << order;
> #endif
> }
>
> --
> 2.31.1

2023-06-06 16:41:01

by Mike Kravetz

Subject: Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order

On 06/06/23 10:32, Tarun Sahu wrote:
>
> Hi Mike,
>
> Thanks for your inputs.
> I wanted to know if you find it okay, Can I send it again adding your Reviewed-by?

Hi Tarun,

Just a few more comments/questions.

On 05/15/23 22:38, Tarun Sahu wrote:
> folio_set_order(folio, 0) is used in kernel at two places
> __destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
> Currently, It is called to clear out the folio->_folio_nr_pages and
> folio->_folio_order.
>
> For __destroy_compound_gigantic_folio:
> In past, folio_set_order(folio, 0) was needed because page->mapping used
> to overlap with _folio_nr_pages and _folio_order. So if these fields were
> left uncleared during freeing gigantic hugepages, they were causing
> "BUG: bad page state" due to non-zero page->mapping. Now, After
> Commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
> CMA") page->mapping has explicitly been cleared out for tail pages. Also,
> _folio_order and _folio_nr_pages no longer overlaps with page->mapping.

I believe the same logic/reasoning as above also applies to
__prep_compound_gigantic_folio.
Why?
In __prep_compound_gigantic_folio we only call folio_set_order(folio, 0)
in the case of error. If __prep_compound_gigantic_folio fails, the caller
will then call free_gigantic_folio() on the "gigantic page". However, it is
not really a gigantic folio at this point in time, and we are simply
calling cma_release() or free_contig_range().
The end result is that I do not believe the existing call to
folio_set_order(folio, 0) in __prep_compound_gigantic_folio is actually
required. ???
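
For context, free_gigantic_folio() essentially just hands the contiguous
range back to CMA or the page allocator; a simplified sketch (the upstream
function may differ in detail):

static void free_gigantic_folio(struct folio *folio, unsigned int order)
{
#ifdef CONFIG_CMA
	int nid = folio_nid(folio);

	/* If the range came from the hugetlb CMA area, return it there. */
	if (cma_release(hugetlb_cma[nid], &folio->page, 1 << order))
		return;
#endif
	/* Otherwise give the contiguous range back to the page allocator. */
	free_contig_range(folio_pfn(folio), 1 << order);
}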

If my reasoning above is correct, then we could just have one patch to
remove the folio_set_order(folio, 0) calls and remove special casing for
order 0 in folio_set_order.

However, I still believe your restructuring of __prep_compound_gigantic_folio
is of value. I do not believe there is an issue as questioned by Matthew; my
reasoning has been stated previously. We could make changes like the following
to retain the same order of operations in __prep_compound_gigantic_folio and
avoid Matthew's question entirely. Totally untested.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ea24718db4af..a54fee663cb1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1950,10 +1950,8 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
int nr_pages = 1 << order;
struct page *p;

- __folio_clear_reserved(folio);
- __folio_set_head(folio);
/* we rely on prep_new_hugetlb_folio to set the destructor */
- folio_set_order(folio, order);
+
for (i = 0; i < nr_pages; i++) {
p = folio_page(folio, i);

@@ -1969,7 +1967,7 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
* on the head page when they need know if put_page() is needed
* after get_user_pages().
*/
- if (i != 0) /* head page cleared above */
+ if (i != 0) /* head page cleared below */
__ClearPageReserved(p);
/*
* Subtle and very unlikely
@@ -1996,8 +1994,14 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
} else {
VM_BUG_ON_PAGE(page_count(p), p);
}
- if (i != 0)
+
+ if (i == 0) {
+ __folio_clear_reserved(folio);
+ __folio_set_head(folio);
+ folio_set_order(folio, order);
+ } else {
set_compound_head(p, &folio->page);
+ }
}
atomic_set(&folio->_entire_mapcount, -1);
atomic_set(&folio->_nr_pages_mapped, 0);
@@ -2017,7 +2021,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
p = folio_page(folio, j);
__ClearPageReserved(p);
}
- folio_set_order(folio, 0);
__folio_clear_head(folio);
return false;
}


>
> struct page {
> ...
> struct address_space * mapping; /* 24 8 */
> ...
> }
>
> struct folio {
> ...
> union {
> struct {
> long unsigned int _flags_1; /* 64 8 */
> long unsigned int _head_1; /* 72 8 */
> unsigned char _folio_dtor; /* 80 1 */
> unsigned char _folio_order; /* 81 1 */
>
> /* XXX 2 bytes hole, try to pack */
>
> atomic_t _entire_mapcount; /* 84 4 */
> atomic_t _nr_pages_mapped; /* 88 4 */
> atomic_t _pincount; /* 92 4 */
> unsigned int _folio_nr_pages; /* 96 4 */
> }; /* 64 40 */
> struct page __page_1 __attribute__((__aligned__(8))); /* 64 64 */
> }
> ...
> }

I do not think the copy of page/folio definitions adds much value to the
commit message.

--
Mike Kravetz

2023-06-08 10:17:39

by Tarun Sahu

Subject: Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order

Hi Mike,

Please find my comments inline.

Mike Kravetz <[email protected]> writes:

> On 06/06/23 10:32, Tarun Sahu wrote:
>>
>> Hi Mike,
>>
>> Thanks for your inputs.
>> I wanted to know if you find it okay, Can I send it again adding your Reviewed-by?
>
> Hi Tarun,
>
> Just a few more comments/questions.
>
> On 05/15/23 22:38, Tarun Sahu wrote:
>> folio_set_order(folio, 0) is used in kernel at two places
>> __destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
>> Currently, It is called to clear out the folio->_folio_nr_pages and
>> folio->_folio_order.
>>
>> For __destroy_compound_gigantic_folio:
>> In past, folio_set_order(folio, 0) was needed because page->mapping used
>> to overlap with _folio_nr_pages and _folio_order. So if these fields were
>> left uncleared during freeing gigantic hugepages, they were causing
>> "BUG: bad page state" due to non-zero page->mapping. Now, After
>> Commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
>> CMA") page->mapping has explicitly been cleared out for tail pages. Also,
>> _folio_order and _folio_nr_pages no longer overlaps with page->mapping.
>
> I believe the same logic/reasoning as above also applies to
> __prep_compound_gigantic_folio.
> Why?
> In __prep_compound_gigantic_folio we only call folio_set_order(folio, 0)
> in the case of error. If __prep_compound_gigantic_folio fails, the caller
> will then call free_gigantic_folio() on the "gigantic page". However, it is
> not really a gigantic at this point in time, and we are simply calling
> cma_release() or free_contig_range().
> The end result is that I do not believe the existing call to
> folio_set_order(folio, 0) in __prep_compound_gigantic_folio is actually
> required. ???
No, there is a difference. IIUC, __destroy_compound_gigantic_folio
explicitly resets page->mapping for each page of the compound page, which
makes sure that even if some field of struct page/folio overlaps with
page->mapping in the future, it won't cause a `BUG: bad page state` error.
But if we just remove folio_set_order(folio, 0) from
__prep_compound_gigantic_folio without moving folio_set_order(folio, order),
it adds the maintenance overhead of tracking whether _folio_order overlaps
with page->mapping every time the struct page fields change, since an
overlap would leave page->mapping non-zero. IMHO, to avoid that,
folio_set_order(folio, order) should be moved after all the error checks on
the tail pages are done, so _folio_order is set on success and never set in
the error path (which is the original proposal). For __folio_set_head, I
agree with the way you suggested below.

WDYT?
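
A simplified sketch of the loop in __destroy_compound_gigantic_folio that
the above refers to (abridged; details in the actual function may vary):

	for (i = 1; i < nr_pages; i++) {
		p = folio_page(folio, i);
		p->mapping = NULL;	/* explicit clear since a01f43901cfb */
		clear_compound_head(p);
		if (!demote)
			set_page_refcounted(p);
	}
	__folio_clear_head(folio);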

>
> If my reasoning above is correct, then we could just have one patch to
> remove the folio_set_order(folio, 0) calls and remove special casing for
> order 0 in folio_set_order.
>
> However, I still believe your restructuring of __prep_compound_gigantic_folio,
> is of value. I do not believe there is an issue as questioned by Matthew. My
> reasoning has been stated previously. We could make changes like the following
> to retain the same order of operations in __prep_compound_gigantic_folio and
> totally avoid Matthew's question. Totally untested.
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ea24718db4af..a54fee663cb1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1950,10 +1950,8 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> int nr_pages = 1 << order;
> struct page *p;
>
> - __folio_clear_reserved(folio);
> - __folio_set_head(folio);
> /* we rely on prep_new_hugetlb_folio to set the destructor */
> - folio_set_order(folio, order);
> +
> for (i = 0; i < nr_pages; i++) {
> p = folio_page(folio, i);
>
> @@ -1969,7 +1967,7 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> * on the head page when they need know if put_page() is needed
> * after get_user_pages().
> */
> - if (i != 0) /* head page cleared above */
> + if (i != 0) /* head page cleared below */
> __ClearPageReserved(p);
> /*
> * Subtle and very unlikely
> @@ -1996,8 +1994,14 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> } else {
> VM_BUG_ON_PAGE(page_count(p), p);
> }
> - if (i != 0)
> +
> + if (i == 0) {
> + __folio_clear_reserved(folio);
> + __folio_set_head(folio);
> + folio_set_order(folio, order);
With __folio_set_head, I agree with this, but it does not feel right for
folio_set_order, as per my reasoning above. WDYT?

> + } else {
> set_compound_head(p, &folio->page);
> + }
> }
> atomic_set(&folio->_entire_mapcount, -1);
> atomic_set(&folio->_nr_pages_mapped, 0);
> @@ -2017,7 +2021,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> p = folio_page(folio, j);
> __ClearPageReserved(p);
> }
> - folio_set_order(folio, 0);
> __folio_clear_head(folio);
> return false;
> }
>
>
>>
>> struct page {
>> ...
>> struct address_space * mapping; /* 24 8 */
>> ...
>> }
>>
>> struct folio {
>> ...
>> union {
>> struct {
>> long unsigned int _flags_1; /* 64 8 */
>> long unsigned int _head_1; /* 72 8 */
>> unsigned char _folio_dtor; /* 80 1 */
>> unsigned char _folio_order; /* 81 1 */
>>
>> /* XXX 2 bytes hole, try to pack */
>>
>> atomic_t _entire_mapcount; /* 84 4 */
>> atomic_t _nr_pages_mapped; /* 88 4 */
>> atomic_t _pincount; /* 92 4 */
>> unsigned int _folio_nr_pages; /* 96 4 */
>> }; /* 64 40 */
>> struct page __page_1 __attribute__((__aligned__(8))); /* 64 64 */
>> }
>> ...
>> }
>
> I do not think the copy of page/folio definitions adds much value to the
> commit message.
Yeah, will remove it.
>
> --
> Mike Kravetz

2023-06-09 00:14:55

by Mike Kravetz

Subject: Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order

On 06/08/23 15:33, Tarun Sahu wrote:
> Hi Mike,
>
> Please find my comments inline.
>
> Mike Kravetz <[email protected]> writes:
>
> > On 06/06/23 10:32, Tarun Sahu wrote:
> >>
> >> Hi Mike,
> >>
> >> Thanks for your inputs.
> >> I wanted to know if you find it okay, Can I send it again adding your Reviewed-by?
> >
> > Hi Tarun,
> >
> > Just a few more comments/questions.
> >
> > On 05/15/23 22:38, Tarun Sahu wrote:
> >> folio_set_order(folio, 0) is used in kernel at two places
> >> __destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
> >> Currently, It is called to clear out the folio->_folio_nr_pages and
> >> folio->_folio_order.
> >>
> >> For __destroy_compound_gigantic_folio:
> >> In past, folio_set_order(folio, 0) was needed because page->mapping used
> >> to overlap with _folio_nr_pages and _folio_order. So if these fields were
> >> left uncleared during freeing gigantic hugepages, they were causing
> >> "BUG: bad page state" due to non-zero page->mapping. Now, After
> >> Commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
> >> CMA") page->mapping has explicitly been cleared out for tail pages. Also,
> >> _folio_order and _folio_nr_pages no longer overlaps with page->mapping.
> >
> > I believe the same logic/reasoning as above also applies to
> > __prep_compound_gigantic_folio.
> > Why?
> > In __prep_compound_gigantic_folio we only call folio_set_order(folio, 0)
> > in the case of error. If __prep_compound_gigantic_folio fails, the caller
> > will then call free_gigantic_folio() on the "gigantic page". However, it is
> > not really a gigantic at this point in time, and we are simply calling
> > cma_release() or free_contig_range().
> > The end result is that I do not believe the existing call to
> > folio_set_order(folio, 0) in __prep_compound_gigantic_folio is actually
> > required. ???
> No, there is a difference. IIUC, __destroy_compound_gigantic_folio
> explicitly reset page->mapping for each page of compound page which
> makes sure, even if in future some fields of struct page/folio overlaps
> with page->mapping, it won't cause `BUG: bad page state` error. But If we
> just remove folio_set_order(folio, 0) from __prep_compound_gigantic_folio
> without moving folio_set_order(folio, order), this will cause extra
> maintenance overhead to track if page->_folio_order overlaps with
> page->mapping everytime struct page fields are changed. As in case of
> overlapping page->mapping will be non-zero. IMHO, To avoid it,
> moving the folio_set_order(folio, order) after all error checks are
> done on tail pages. So, _folio_order is either set on success and not
> set in case of error. (which is the original proposal). But for
> folio_set_head, I agree the way you suggested below.
>
> WDYT?

Right. It is more 'future proof' to only set folio order on success as
done in your original patch.

> >
> > If my reasoning above is correct, then we could just have one patch to
> > remove the folio_set_order(folio, 0) calls and remove special casing for
> > order 0 in folio_set_order.
> >
> > However, I still believe your restructuring of __prep_compound_gigantic_folio,
> > is of value. I do not believe there is an issue as questioned by Matthew. My
> > reasoning has been stated previously. We could make changes like the following
> > to retain the same order of operations in __prep_compound_gigantic_folio and
> > totally avoid Matthew's question. Totally untested.
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index ea24718db4af..a54fee663cb1 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1950,10 +1950,8 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> > int nr_pages = 1 << order;
> > struct page *p;
> >
> > - __folio_clear_reserved(folio);
> > - __folio_set_head(folio);
> > /* we rely on prep_new_hugetlb_folio to set the destructor */
> > - folio_set_order(folio, order);
> > +
> > for (i = 0; i < nr_pages; i++) {
> > p = folio_page(folio, i);
> >
> > @@ -1969,7 +1967,7 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> > * on the head page when they need know if put_page() is needed
> > * after get_user_pages().
> > */
> > - if (i != 0) /* head page cleared above */
> > + if (i != 0) /* head page cleared below */
> > __ClearPageReserved(p);
> > /*
> > * Subtle and very unlikely
> > @@ -1996,8 +1994,14 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> > } else {
> > VM_BUG_ON_PAGE(page_count(p), p);
> > }
> > - if (i != 0)
> > +
> > + if (i == 0) {
> > + __folio_clear_reserved(folio);
> > + __folio_set_head(folio);
> > + folio_set_order(folio, order);
> With folio_set_head, I agree to this, But does not feel good with
> folio_set_order as per my above reasoning. WDYT?

Agree with your reasoning. We should just move __folio_set_head and
folio_set_order after the loop as you originally suggested.

>
> > + } else {
> > set_compound_head(p, &folio->page);
> > + }
> > }
> > atomic_set(&folio->_entire_mapcount, -1);
> > atomic_set(&folio->_nr_pages_mapped, 0);
> > @@ -2017,7 +2021,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> > p = folio_page(folio, j);
> > __ClearPageReserved(p);
> > }
> > - folio_set_order(folio, 0);
> > __folio_clear_head(folio);
> > return false;
> > }
> >
> >
> >>
> >> struct page {
> >> ...
> >> struct address_space * mapping; /* 24 8 */
> >> ...
> >> }
> >>
> >> struct folio {
> >> ...
> >> union {
> >> struct {
> >> long unsigned int _flags_1; /* 64 8 */
> >> long unsigned int _head_1; /* 72 8 */
> >> unsigned char _folio_dtor; /* 80 1 */
> >> unsigned char _folio_order; /* 81 1 */
> >>
> >> /* XXX 2 bytes hole, try to pack */
> >>
> >> atomic_t _entire_mapcount; /* 84 4 */
> >> atomic_t _nr_pages_mapped; /* 88 4 */
> >> atomic_t _pincount; /* 92 4 */
> >> unsigned int _folio_nr_pages; /* 96 4 */
> >> }; /* 64 40 */
> >> struct page __page_1 __attribute__((__aligned__(8))); /* 64 64 */
> >> }
> >> ...
> >> }
> >
> > I do not think the copy of page/folio definitions adds much value to the
> > commit message.
> Yeah, Will remove it.
> >

I think we are finally on the same page. I am good with this v2 patch.
The only change needed is to update the commit message to remove the
struct definitions.
--
Mike Kravetz