2023-05-15 17:33:16

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order

On Mon, May 15, 2023 at 10:38:09PM +0530, Tarun Sahu wrote:
> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> struct page *p;
>
> __folio_clear_reserved(folio);
> - __folio_set_head(folio);
> - /* we rely on prep_new_hugetlb_folio to set the destructor */
> - folio_set_order(folio, order);
> for (i = 0; i < nr_pages; i++) {
> p = folio_page(folio, i);
>
> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> if (i != 0)
> set_compound_head(p, &folio->page);
> }
> + __folio_set_head(folio);
> + /* we rely on prep_new_hugetlb_folio to set the destructor */
> + folio_set_order(folio, order);

This makes me nervous, as I said before. This means that
compound_head(tail) can temporarily point to a page which is not marked
as a head page. That's different from prep_compound_page(). You need to
come up with some good argumentation for why this is safe, and no amount
of testing you do can replace it -- any race in this area will be subtle.


2023-05-15 18:49:18

by Mike Kravetz

[permalink] [raw]
Subject: Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order

On 05/15/23 18:16, Matthew Wilcox wrote:
> On Mon, May 15, 2023 at 10:38:09PM +0530, Tarun Sahu wrote:
> > @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> > struct page *p;
> >
> > __folio_clear_reserved(folio);
> > - __folio_set_head(folio);
> > - /* we rely on prep_new_hugetlb_folio to set the destructor */
> > - folio_set_order(folio, order);
> > for (i = 0; i < nr_pages; i++) {
> > p = folio_page(folio, i);
> >
> > @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> > if (i != 0)
> > set_compound_head(p, &folio->page);
> > }
> > + __folio_set_head(folio);
> > + /* we rely on prep_new_hugetlb_folio to set the destructor */
> > + folio_set_order(folio, order);
>
> This makes me nervous, as I said before. This means that
> compound_head(tail) can temporarily point to a page which is not marked
> as a head page. That's different from prep_compound_page(). You need to
> come up with some good argumentation for why this is safe, and no amount
> of testing you do can replace it -- any race in this area will be subtle.

I added comments supporting this approach in the first version of the patch.
My argument was that this is actually safer than the existing code. That is
because we freeze the page (ref count 0) before setting compound_head(tail).
So, nobody should be taking any speculative refs on those tail pages.

In the existing code, we set the compound page order in the head before
freezing the head or any tail pages. Therefore, speculative refs can be
taken on any of the pages while in this state.

If we want prep_compound_gigantic_folio to work like prep_compound_page
we would need to take two passes through the pages. In the first pass,
freeze all the pages and in the second set up the compound page.
--
Mike Kravetz

2023-05-16 13:58:06

by Tarun Sahu

[permalink] [raw]
Subject: Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order

Hi Mathew,

Matthew Wilcox <[email protected]> writes:

> On Mon, May 15, 2023 at 10:38:09PM +0530, Tarun Sahu wrote:
>> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>> struct page *p;
>>
>> __folio_clear_reserved(folio);
>> - __folio_set_head(folio);
>> - /* we rely on prep_new_hugetlb_folio to set the destructor */
>> - folio_set_order(folio, order);
>> for (i = 0; i < nr_pages; i++) {
>> p = folio_page(folio, i);
>>
>> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>> if (i != 0)
>> set_compound_head(p, &folio->page);
>> }
>> + __folio_set_head(folio);
>> + /* we rely on prep_new_hugetlb_folio to set the destructor */
>> + folio_set_order(folio, order);
>
> This makes me nervous, as I said before. This means that
> compound_head(tail) can temporarily point to a page which is not marked
> as a head page. That's different from prep_compound_page(). You need to
> come up with some good argumentation for why this is safe, and no amount
> of testing you do can replace it -- any race in this area will be subtle.

IIUC, I am certain that it is safe to move these calls and agree with what
Mike said. Here is my reasoning:

When we get pages from CMA allocator for gigantic folio, page refcount
for each pages is 1.
page_cache_get_speculative (now folio_try_get_rcu) can take reference to
any of these pages before prep_compound_gigantic_folio explicitly freeze
refcount of these pages. With this race condition there are 2 possible situation.

...
if (!demote) {
if (!page_ref_freeze(p, 1)) {
pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
goto out_error;
}
} else {
VM_BUG_ON_PAGE(page_count(p), p);
}
if (i != 0)
set_compound_head(p, &folio->page);
}
...

1. In the current code, before freezing refcount of nth (hence, n+th)
tail page, folio_try_get_rcu might try to take nth tail page reference,
so refcount will be increased of the nth tail page not the head page
(as compound head is not yet set for nth tail page). and once this
happens, nth iteration of loop will cause error and
prep_compound_gigantic_folio will fail.

So, setting the PG_head at the starting of for-loop or at the end won't
have any difference to this flow.

2. If reference for the head page is taken by folio_try_get_rcu before
freezing it, prep_compound_gigantic_page will fail, but before PG_head
and folio_order of head page is cleared in error path, the caller of
folio_try_get_rcu path will find that this page is head page and might
try to operate on its tail pages while these tail pages are invalid.

Hence, It will be safer if we call __folio_set_head and folio_set_order
after freezing the tail page refcount.

~Tarun

2023-06-03 00:53:50

by Mike Kravetz

[permalink] [raw]
Subject: Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order

On 05/15/23 10:45, Mike Kravetz wrote:
> On 05/15/23 18:16, Matthew Wilcox wrote:
> > On Mon, May 15, 2023 at 10:38:09PM +0530, Tarun Sahu wrote:
> > > @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> > > struct page *p;
> > >
> > > __folio_clear_reserved(folio);
> > > - __folio_set_head(folio);
> > > - /* we rely on prep_new_hugetlb_folio to set the destructor */
> > > - folio_set_order(folio, order);
> > > for (i = 0; i < nr_pages; i++) {
> > > p = folio_page(folio, i);
> > >
> > > @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> > > if (i != 0)
> > > set_compound_head(p, &folio->page);
> > > }
> > > + __folio_set_head(folio);
> > > + /* we rely on prep_new_hugetlb_folio to set the destructor */
> > > + folio_set_order(folio, order);
> >
> > This makes me nervous, as I said before. This means that
> > compound_head(tail) can temporarily point to a page which is not marked
> > as a head page. That's different from prep_compound_page(). You need to
> > come up with some good argumentation for why this is safe, and no amount
> > of testing you do can replace it -- any race in this area will be subtle.

We could continue to set up the head page first as in the current code,
but we need to move the freezing of that page outside the loop. That is
better then the existing code, however I am not sure if it is any better
than what is proposed here. I still believe my reasoning below as to
why this proposal is better then the existing code is correct.

Also, that 'folio_set_order(folio, 0)' only exists in the error path of
the current code. I am not sure if it is actually needed. Why? Right
after returning an error, the pages associated with the gigantic page
will be freed. This is similar to the reason why it can be removed in
__destroy_compound_gigantic_folio.

> I added comments supporting this approach in the first version of the patch.
> My argument was that this is actually safer than the existing code. That is
> because we freeze the page (ref count 0) before setting compound_head(tail).
> So, nobody should be taking any speculative refs on those tail pages.
>
> In the existing code, we set the compound page order in the head before
> freezing the head or any tail pages. Therefore, speculative refs can be
> taken on any of the pages while in this state.
>
> If we want prep_compound_gigantic_folio to work like prep_compound_page
> we would need to take two passes through the pages. In the first pass,
> freeze all the pages and in the second set up the compound page.

--
Mike Kravetz