2021-07-10 00:27:03

by Mike Kravetz

Subject: [PATCH 0/3] hugetlb: fix potential ref counting races

When Muchun Song brought up a potential issue with hugetlb ref counting[1],
I started looking closer at the code. hugetlbfs is the only code with its
own specialized compound page destructor that takes special action when ref
counts drop to zero. Potential races exist in this unique handling of ref
counts. The following patches address these races when creating and
destroying hugetlb pages.
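To make the point about the custom destructor concrete, here is a simplified userspace model. It is not kernel code: struct model_page, model_get_page(), model_put_page() and the two destructor stand-ins are hypothetical names chosen for illustration. What it shows is that whoever drops the last reference runs whatever destructor is installed at that moment, so an unexpected extra reference moves that work into someone else's context.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a compound page; not the kernel's struct page. */
struct model_page {
	atomic_int refcount;
	void (*dtor)(struct model_page *page);
	bool freed_as_hugetlb;
	bool freed_as_normal;
};

/* Stand-in for a hugetlb-specific destructor (e.g. return to the pool). */
static void hugetlb_dtor(struct model_page *page)
{
	page->freed_as_hugetlb = true;
}

/* Stand-in for the generic compound page destructor. */
static void normal_dtor(struct model_page *page)
{
	page->freed_as_normal = true;
}

/* Take an extra reference, e.g. a racing speculative lookup. */
static void model_get_page(struct model_page *page)
{
	atomic_fetch_add(&page->refcount, 1);
}

/*
 * Drop a reference. Whoever drops the LAST reference runs the destructor
 * installed at that moment, regardless of who set the page up.
 */
static void model_put_page(struct model_page *page)
{
	int old = atomic_fetch_sub(&page->refcount, 1);

	assert(old > 0);	/* dropping below zero would be a bug */
	if (old == 1)
		page->dtor(page);
}
```

With one racing reference, the owner's put is no longer the final one; the destructor runs only when the racing holder drops its reference.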

These potential races have likely existed since the creation of
hugetlbfs. They certainly have been around for more than 10 years.
However, I am unaware of anyone actually hitting these races. It is
VERY unlikely that anyone will actually hit them, but they do
exist.

I could not think of an easy (or difficult) way to force these races.
Therefore, testing consisted of adding code to randomly increase ref
counts in strategic places. In this way, I was able to exercise all the
race handling code paths.
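The testing approach described above (randomly inflating ref counts in strategic places) might look roughly like the following userspace sketch. debug_inflate_ref() and prep_page() are hypothetical; rand() decides whether to take an extra reference just before code that expects a count of exactly 1, so a percent of 100 always forces the error path and a percent of 0 never does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a page reference count (not struct page). */
struct model_page {
	atomic_int refcount;
};

/*
 * Hypothetical test-only hook: with probability percent/100, take an
 * extra reference, simulating a speculative lookup racing with us.
 */
static void debug_inflate_ref(struct model_page *page, int percent)
{
	assert(percent >= 0 && percent <= 100);
	if (rand() % 100 < percent)
		atomic_fetch_add(&page->refcount, 1);
}

/*
 * Code under test: a freshly allocated page should hold exactly one
 * reference, so atomically change the count from 1 to 0. Returns false
 * (the error path) if someone else holds a reference. A real debug hook
 * would drop the injected reference later; this sketch leaves it in
 * place so a caller can observe it.
 */
static bool prep_page(struct model_page *page, int inject_percent)
{
	int expected = 1;

	debug_inflate_ref(page, inject_percent);	/* the "strategic place" */
	return atomic_compare_exchange_strong(&page->refcount, &expected, 0);
}
```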

[1] https://lore.kernel.org/linux-mm/CAMZfGtVMn3daKrJwZMaVOGOaJU+B4dS--x_oPmGQMD=c=QNGEg@mail.gmail.com/

Mike Kravetz (3):
hugetlb: simplify prep_compound_gigantic_page ref count racing code
hugetlb: drop ref count earlier after page allocation
hugetlb: before freeing hugetlb page set dtor to appropriate value

mm/hugetlb.c | 137 ++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 104 insertions(+), 33 deletions(-)

--
2.31.1


2021-07-10 00:27:38

by Mike Kravetz

Subject: [PATCH 1/3] hugetlb: simplify prep_compound_gigantic_page ref count racing code

Code in prep_compound_gigantic_page waits for an RCU grace period if it
notices a temporarily inflated ref count on a tail page. This was done to
handle the identified potential race with speculative page cache
references, which could only last for an RCU grace period. This is overly
complicated, as this situation is VERY unlikely to ever happen. Instead,
just quickly return an error.

Also, print the warning only in prep_compound_gigantic_page instead of
in its multiple callers.
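The resulting split of responsibilities, as in the alloc_fresh_huge_page hunk of this patch, can be modeled like this: the callee is the only place that reports the failure, while the caller frees the pages, retries once, and otherwise returns quietly. The model_* functions, the warnings counter, and the inflated[] array that scripts the race are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings;	/* counts warnings so the behaviour can be checked */

/*
 * Hypothetical stand-in for prep_compound_gigantic_page(): the single
 * place that warns. "inflated" simulates an unexpected extra reference.
 */
static bool model_prep_gigantic(bool inflated)
{
	if (inflated) {
		fprintf(stderr, "HugeTLB page can not be used due to unexpected inflated ref count\n");
		warnings++;
		return false;
	}
	return true;
}

/*
 * Hypothetical stand-in for the caller: on failure, retry ONCE, then
 * give up without a second warning. inflated[i] scripts attempt i.
 */
static bool model_alloc_fresh(const bool inflated[2])
{
	bool retry = false;
	int attempt = 0;

again:
	assert(attempt < 2);
	if (!model_prep_gigantic(inflated[attempt])) {
		/* The pages would be freed here (e.g. free_gigantic_page()). */
		if (!retry) {
			retry = true;
			attempt++;
			goto again;
		}
		return false;	/* callee already warned */
	}
	return true;
}
```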

Signed-off-by: Mike Kravetz <[email protected]>
---
mm/hugetlb.c | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 924553aa8f78..e59ebba63da7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1657,16 +1657,12 @@ static bool prep_compound_gigantic_page(struct page *page, unsigned int order)
* cache adding could take a ref on a 'to be' tail page.
* We need to respect any increased ref count, and only set
* the ref count to zero if count is currently 1. If count
- * is not 1, we call synchronize_rcu in the hope that a rcu
- * grace period will cause ref count to drop and then retry.
- * If count is still inflated on retry we return an error and
- * must discard the pages.
+ * is not 1, we return an error and caller must discard the
+ * pages.
*/
if (!page_ref_freeze(p, 1)) {
- pr_info("HugeTLB unexpected inflated ref count on freshly allocated page\n");
- synchronize_rcu();
- if (!page_ref_freeze(p, 1))
- goto out_error;
+ pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
+ goto out_error;
}
set_page_count(p, 0);
set_compound_head(p, page);
@@ -1830,7 +1826,6 @@ static struct page *alloc_fresh_huge_page(struct hstate *h,
retry = true;
goto retry;
}
- pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
return NULL;
}
}
@@ -2828,8 +2823,8 @@ static void __init gather_bootmem_prealloc(void)
prep_new_huge_page(h, page, page_to_nid(page));
put_page(page); /* add to the hugepage allocator */
} else {
+ /* VERY unlikely inflated ref count on a tail page */
free_gigantic_page(page, huge_page_order(h));
- pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
}

/*
--
2.31.1

2021-07-13 06:33:28

by Muchun Song

Subject: Re: [External] [PATCH 1/3] hugetlb: simplify prep_compound_gigantic_page ref count racing code

On Sat, Jul 10, 2021 at 8:25 AM Mike Kravetz <[email protected]> wrote:
>
> Code in prep_compound_gigantic_page waits for a rcu grace period if it
> notices a temporarily inflated ref count on a tail page. This was due
> to the identified potential race with speculative page cache references
> which could only last for a rcu grace period. This is overly complicated
> as this situation is VERY unlikely to ever happen. Instead, just quickly
> return an error.

Right. The race is very, very small. IMHO, not complicating
the code is the right thing to do.

>
> Also, only print a warning in prep_compound_gigantic_page instead of
> multiple callers.
>
> Signed-off-by: Mike Kravetz <[email protected]>
> ---
> mm/hugetlb.c | 15 +++++----------
> 1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 924553aa8f78..e59ebba63da7 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1657,16 +1657,12 @@ static bool prep_compound_gigantic_page(struct page *page, unsigned int order)
> * cache adding could take a ref on a 'to be' tail page.
> * We need to respect any increased ref count, and only set
> * the ref count to zero if count is currently 1. If count
> - * is not 1, we call synchronize_rcu in the hope that a rcu
> - * grace period will cause ref count to drop and then retry.
> - * If count is still inflated on retry we return an error and
> - * must discard the pages.
> + * is not 1, we return an error and caller must discard the
> + * pages.

Shall we add more details about why we discard the pages?

Thanks.

> */
> if (!page_ref_freeze(p, 1)) {
> - pr_info("HugeTLB unexpected inflated ref count on freshly allocated page\n");
> - synchronize_rcu();
> - if (!page_ref_freeze(p, 1))
> - goto out_error;
> + pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
> + goto out_error;
> }
> set_page_count(p, 0);
> set_compound_head(p, page);
> @@ -1830,7 +1826,6 @@ static struct page *alloc_fresh_huge_page(struct hstate *h,
> retry = true;
> goto retry;
> }
> - pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
> return NULL;
> }
> }
> @@ -2828,8 +2823,8 @@ static void __init gather_bootmem_prealloc(void)
> prep_new_huge_page(h, page, page_to_nid(page));
> put_page(page); /* add to the hugepage allocator */
> } else {
> + /* VERY unlikely inflated ref count on a tail page */
> free_gigantic_page(page, huge_page_order(h));
> - pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
> }
>
> /*
> --
> 2.31.1
>

2021-07-27 23:43:17

by Mike Kravetz

Subject: Re: [PATCH 0/3] hugetlb: fix potential ref counting races

Any additional comments on these patches/this approach?

The first patch addressing this issue actually went into the 5.14 merge
window as commit 7118fc2906e2 ("hugetlb: address ref count racing in
prep_compound_gigantic_page").

All this code is very tricky and subtle. It addresses potential issues
discovered by code analysis. I do not believe the races have ever been
experienced in practice. If anyone has suggestions for a simpler or
alternative approach, I would love to hear them.
--
Mike Kravetz


2021-07-28 04:06:08

by Muchun Song

Subject: Re: [PATCH 0/3] hugetlb: fix potential ref counting races

On Wed, Jul 28, 2021 at 7:41 AM Mike Kravetz <[email protected]> wrote:
>
> Any additional comments on these patches/this approach?
>
> The first patch addressing this issue actually went into the 5.14 merge
> window as commit 7118fc2906e2 ("hugetlb: address ref count racing in
> prep_compound_gigantic_page").
>
> All this code is very tricky and subtle. It addresses potential issues
> discovered by code analysis. I do not believe the races have ever been
> experienced in practice. If anyone has suggestions for a simpler or
> alternative approach, I would love to hear them.

Hi Mike,

I agree with you that this code is very tricky and subtle. I have looked
at this patch set, and I cannot figure out a better solution for these
races.

--
Thanks,
Muchun
