On Tue, Sep 05, 2023 at 02:44:00PM -0700, Mike Kravetz wrote:
> Currently, vmemmap optimization of hugetlb pages is performed before the
> hugetlb flag (previously hugetlb destructor) is set identifying it as a
> hugetlb folio. This means there is a window of time where an ordinary
> folio does not have all associated vmemmap present. The core mm only
> expects vmemmap to be potentially optimized for hugetlb and device dax.
> This can cause problems in code such as memory error handling that may
> want to write to tail struct pages.
>
> There is only one call to perform hugetlb vmemmap optimization today.
> To fix this issue, simply set the hugetlb flag before that call.
>
> There was a similar issue in the free hugetlb path that was previously
> addressed. The two routines that optimize or restore hugetlb vmemmap
> should only be passed hugetlb folios/pages. To catch any callers not
> following this rule, add VM_WARN_ON calls to the routines. In the
> hugetlb free code paths, some calls could be made to restore vmemmap
> after clearing the hugetlb flag. This was 'safe' as in these cases
> vmemmap was already present and the call was a NOOP. However, for
> consistency these calls where eliminated so that we can add the
> VM_WARN_ON checks.
>
> Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
> Signed-off-by: Mike Kravetz <[email protected]>
I saw that VM_WARN_ON_ONCE() in hugetlb_vmemmap_restore is triggered when
memory_failure() is called on a free hugetlb page with vmemmap optimization
disabled (the warning is not triggered if vmemmap optimization is enabled).
I think that we need check folio_test_hugetlb() before dissolve_free_huge_page()
calls hugetlb_vmemmap_restore_folio().
Could you consider adding some diff like below?
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2312,15 +2312,16 @@ int dissolve_free_huge_page(struct page *page)
* Attempt to allocate vmemmmap here so that we can take
* appropriate action on failure.
*/
- rc = hugetlb_vmemmap_restore_folio(h, folio);
- if (!rc) {
- update_and_free_hugetlb_folio(h, folio, false);
- } else {
- spin_lock_irq(&hugetlb_lock);
- add_hugetlb_folio(h, folio, false);
- h->max_huge_pages++;
- spin_unlock_irq(&hugetlb_lock);
+ if (folio_test_hugetlb(folio)) {
+ rc = hugetlb_vmemmap_restore_folio(h, folio);
+ if (rc) {
+ spin_lock_irq(&hugetlb_lock);
+ add_hugetlb_folio(h, folio, false);
+ h->max_huge_pages++;
+ goto out;
+ }
}
+ update_and_free_hugetlb_folio(h, folio, false);
return rc;
}
Thanks,
Naoya Horiguchi
On 10/13/23 21:58, Naoya Horiguchi wrote:
> On Tue, Sep 05, 2023 at 02:44:00PM -0700, Mike Kravetz wrote:
> > Currently, vmemmap optimization of hugetlb pages is performed before the
> > hugetlb flag (previously hugetlb destructor) is set identifying it as a
> > hugetlb folio. This means there is a window of time where an ordinary
> > folio does not have all associated vmemmap present. The core mm only
> > expects vmemmap to be potentially optimized for hugetlb and device dax.
> > This can cause problems in code such as memory error handling that may
> > want to write to tail struct pages.
> >
> > There is only one call to perform hugetlb vmemmap optimization today.
> > To fix this issue, simply set the hugetlb flag before that call.
> >
> > There was a similar issue in the free hugetlb path that was previously
> > addressed. The two routines that optimize or restore hugetlb vmemmap
> > should only be passed hugetlb folios/pages. To catch any callers not
> > following this rule, add VM_WARN_ON calls to the routines. In the
> > hugetlb free code paths, some calls could be made to restore vmemmap
> > after clearing the hugetlb flag. This was 'safe' as in these cases
> > vmemmap was already present and the call was a NOOP. However, for
> > consistency these calls where eliminated so that we can add the
> > VM_WARN_ON checks.
> >
> > Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
> > Signed-off-by: Mike Kravetz <[email protected]>
>
> I saw that VM_WARN_ON_ONCE() in hugetlb_vmemmap_restore is triggered when
> memory_failure() is called on a free hugetlb page with vmemmap optimization
> disabled (the warning is not triggered if vmemmap optimization is enabled).
> I think that we need check folio_test_hugetlb() before dissolve_free_huge_page()
> calls hugetlb_vmemmap_restore_folio().
>
> Could you consider adding some diff like below?
Thanks! That case was indeed overlooked.
Andrew, this patch is currently in mm-stable. How would you like to update?
- A new version of the patch
- An patch to the original patch
- Something else
--
Mike Kravetz
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2312,15 +2312,16 @@ int dissolve_free_huge_page(struct page *page)
> * Attempt to allocate vmemmmap here so that we can take
> * appropriate action on failure.
> */
> - rc = hugetlb_vmemmap_restore_folio(h, folio);
> - if (!rc) {
> - update_and_free_hugetlb_folio(h, folio, false);
> - } else {
> - spin_lock_irq(&hugetlb_lock);
> - add_hugetlb_folio(h, folio, false);
> - h->max_huge_pages++;
> - spin_unlock_irq(&hugetlb_lock);
> + if (folio_test_hugetlb(folio)) {
> + rc = hugetlb_vmemmap_restore_folio(h, folio);
> + if (rc) {
> + spin_lock_irq(&hugetlb_lock);
> + add_hugetlb_folio(h, folio, false);
> + h->max_huge_pages++;
> + goto out;
> + }
> }
> + update_and_free_hugetlb_folio(h, folio, false);
>
> return rc;
> }
>
>
> Thanks,
> Naoya Horiguchi
On Fri, 13 Oct 2023 14:43:56 -0700 Mike Kravetz <[email protected]> wrote:
> > Could you consider adding some diff like below?
>
> Thanks! That case was indeed overlooked.
>
> Andrew, this patch is currently in mm-stable. How would you like to update?
> - A new version of the patch
> - An patch to the original patch
> - Something else
I guess just a fixup against what's currently in mm-stable, please. It
happens sometimes.
Please let's get the Fixes: accurate. I just had to rebase mm-stable
to drop Huang Ying's pcp auto-tuning series. I'm now seeing
d8f5f7e445f0 hugetlb: set hugetlb page flag before optimizing vmemmap
which I think is what it used to be, so the rebasing shouldn't affect
this patch's hash.
On 10/13/23 21:58, Naoya Horiguchi wrote:
> On Tue, Sep 05, 2023 at 02:44:00PM -0700, Mike Kravetz wrote:
> >
> > Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
> > Signed-off-by: Mike Kravetz <[email protected]>
>
> I saw that VM_WARN_ON_ONCE() in hugetlb_vmemmap_restore is triggered when
> memory_failure() is called on a free hugetlb page with vmemmap optimization
> disabled (the warning is not triggered if vmemmap optimization is enabled).
> I think that we need check folio_test_hugetlb() before dissolve_free_huge_page()
> calls hugetlb_vmemmap_restore_folio().
>
> Could you consider adding some diff like below?
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2312,15 +2312,16 @@ int dissolve_free_huge_page(struct page *page)
> * Attempt to allocate vmemmmap here so that we can take
> * appropriate action on failure.
> */
> - rc = hugetlb_vmemmap_restore_folio(h, folio);
> - if (!rc) {
> - update_and_free_hugetlb_folio(h, folio, false);
> - } else {
> - spin_lock_irq(&hugetlb_lock);
> - add_hugetlb_folio(h, folio, false);
> - h->max_huge_pages++;
> - spin_unlock_irq(&hugetlb_lock);
> + if (folio_test_hugetlb(folio)) {
> + rc = hugetlb_vmemmap_restore_folio(h, folio);
> + if (rc) {
> + spin_lock_irq(&hugetlb_lock);
> + add_hugetlb_folio(h, folio, false);
> + h->max_huge_pages++;
> + goto out;
> + }
> }
> + update_and_free_hugetlb_folio(h, folio, false);
>
> return rc;
> }
>
Hi Naoya,
I believe we need to set 'rc = 0' in the !folio_test_hugetlb(). I put
together the following patch based on mm-stable. Please take a look.
From f19fbfab324d7d17de4a1e814f95ee655950c58e Mon Sep 17 00:00:00 2001
From: Mike Kravetz <[email protected]>
Date: Mon, 16 Oct 2023 19:55:49 -0700
Subject: [PATCH] hugetlb: check for hugetlb folio before vmemmap_restore
In commit d8f5f7e445f0 ("hugetlb: set hugetlb page flag before
optimizing vmemmap") checks were added to print a warning if
hugetlb_vmemmap_restore was called on a non-hugetlb page. This
was mostly due to ordering issues in the hugetlb page set up and
tear down sequencees. One place missed was the routine
dissolve_free_huge_page. Naoya Horiguchi noted: "I saw that
VM_WARN_ON_ONCE() in hugetlb_vmemmap_restore is triggered when
memory_failure() is called on a free hugetlb page with vmemmap
optimization disabled (the warning is not triggered if vmemmap
optimization is enabled). I think that we need check
folio_test_hugetlb() before dissolve_free_huge_page() calls
hugetlb_vmemmap_restore_folio()."
Perform the check as suggested by Naoya.
Fixes: d8f5f7e445f0 ("hugetlb: set hugetlb page flag before optimizing vmemmap")
Suggested-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Mike Kravetz <[email protected]>
---
mm/hugetlb.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 36b40bc9ac25..13736cbb2c19 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2290,17 +2290,23 @@ int dissolve_free_huge_page(struct page *page)
* need to adjust max_huge_pages if the page is not freed.
* Attempt to allocate vmemmmap here so that we can take
* appropriate action on failure.
+ *
+ * The folio_test_hugetlb check here is because
+ * remove_hugetlb_folio will clear hugetlb folio flag for
+ * non-vmemmap optimized hugetlb folios.
*/
- rc = hugetlb_vmemmap_restore(h, &folio->page);
- if (!rc) {
- update_and_free_hugetlb_folio(h, folio, false);
- } else {
- spin_lock_irq(&hugetlb_lock);
- add_hugetlb_folio(h, folio, false);
- h->max_huge_pages++;
- spin_unlock_irq(&hugetlb_lock);
- }
+ if (folio_test_hugetlb(folio)) {
+ rc = hugetlb_vmemmap_restore(h, &folio->page);
+ if (rc) {
+ spin_lock_irq(&hugetlb_lock);
+ add_hugetlb_folio(h, folio, false);
+ h->max_huge_pages++;
+ goto out;
+ }
+ } else
+ rc = 0;
+ update_and_free_hugetlb_folio(h, folio, false);
return rc;
}
out:
--
2.41.0
On Mon, Oct 16, 2023 at 08:21:40PM -0700, Mike Kravetz wrote:
> On 10/13/23 21:58, Naoya Horiguchi wrote:
> > On Tue, Sep 05, 2023 at 02:44:00PM -0700, Mike Kravetz wrote:
> > >
> > > Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
> > > Signed-off-by: Mike Kravetz <[email protected]>
> >
> > I saw that VM_WARN_ON_ONCE() in hugetlb_vmemmap_restore is triggered when
> > memory_failure() is called on a free hugetlb page with vmemmap optimization
> > disabled (the warning is not triggered if vmemmap optimization is enabled).
> > I think that we need check folio_test_hugetlb() before dissolve_free_huge_page()
> > calls hugetlb_vmemmap_restore_folio().
> >
> > Could you consider adding some diff like below?
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -2312,15 +2312,16 @@ int dissolve_free_huge_page(struct page *page)
> > * Attempt to allocate vmemmmap here so that we can take
> > * appropriate action on failure.
> > */
> > - rc = hugetlb_vmemmap_restore_folio(h, folio);
> > - if (!rc) {
> > - update_and_free_hugetlb_folio(h, folio, false);
> > - } else {
> > - spin_lock_irq(&hugetlb_lock);
> > - add_hugetlb_folio(h, folio, false);
> > - h->max_huge_pages++;
> > - spin_unlock_irq(&hugetlb_lock);
> > + if (folio_test_hugetlb(folio)) {
> > + rc = hugetlb_vmemmap_restore_folio(h, folio);
> > + if (rc) {
> > + spin_lock_irq(&hugetlb_lock);
> > + add_hugetlb_folio(h, folio, false);
> > + h->max_huge_pages++;
> > + goto out;
> > + }
> > }
> > + update_and_free_hugetlb_folio(h, folio, false);
> >
> > return rc;
> > }
> >
>
> Hi Naoya,
>
> I believe we need to set 'rc = 0' in the !folio_test_hugetlb(). I put
> together the following patch based on mm-stable. Please take a look.
Hi Mike, the patch looks good to me, and I confirmed that my test programs
showed no new problem on latest mm-stable + this patch.
Thank you very much.
Naoya Horiguchi
>
> From f19fbfab324d7d17de4a1e814f95ee655950c58e Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <[email protected]>
> Date: Mon, 16 Oct 2023 19:55:49 -0700
> Subject: [PATCH] hugetlb: check for hugetlb folio before vmemmap_restore
>
> In commit d8f5f7e445f0 ("hugetlb: set hugetlb page flag before
> optimizing vmemmap") checks were added to print a warning if
> hugetlb_vmemmap_restore was called on a non-hugetlb page. This
> was mostly due to ordering issues in the hugetlb page set up and
> tear down sequencees. One place missed was the routine
> dissolve_free_huge_page. Naoya Horiguchi noted: "I saw that
> VM_WARN_ON_ONCE() in hugetlb_vmemmap_restore is triggered when
> memory_failure() is called on a free hugetlb page with vmemmap
> optimization disabled (the warning is not triggered if vmemmap
> optimization is enabled). I think that we need check
> folio_test_hugetlb() before dissolve_free_huge_page() calls
> hugetlb_vmemmap_restore_folio()."
>
> Perform the check as suggested by Naoya.
>
> Fixes: d8f5f7e445f0 ("hugetlb: set hugetlb page flag before optimizing vmemmap")
> Suggested-by: Naoya Horiguchi <[email protected]>
> Signed-off-by: Mike Kravetz <[email protected]>
> ---
> mm/hugetlb.c | 24 +++++++++++++++---------
> 1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 36b40bc9ac25..13736cbb2c19 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2290,17 +2290,23 @@ int dissolve_free_huge_page(struct page *page)
> * need to adjust max_huge_pages if the page is not freed.
> * Attempt to allocate vmemmmap here so that we can take
> * appropriate action on failure.
> + *
> + * The folio_test_hugetlb check here is because
> + * remove_hugetlb_folio will clear hugetlb folio flag for
> + * non-vmemmap optimized hugetlb folios.
> */
> - rc = hugetlb_vmemmap_restore(h, &folio->page);
> - if (!rc) {
> - update_and_free_hugetlb_folio(h, folio, false);
> - } else {
> - spin_lock_irq(&hugetlb_lock);
> - add_hugetlb_folio(h, folio, false);
> - h->max_huge_pages++;
> - spin_unlock_irq(&hugetlb_lock);
> - }
> + if (folio_test_hugetlb(folio)) {
> + rc = hugetlb_vmemmap_restore(h, &folio->page);
> + if (rc) {
> + spin_lock_irq(&hugetlb_lock);
> + add_hugetlb_folio(h, folio, false);
> + h->max_huge_pages++;
> + goto out;
> + }
> + } else
> + rc = 0;
>
> + update_and_free_hugetlb_folio(h, folio, false);
> return rc;
> }
> out:
> --
> 2.41.0
>
On 10/18/23 10:58, Naoya Horiguchi wrote:
> On Mon, Oct 16, 2023 at 08:21:40PM -0700, Mike Kravetz wrote:
> > On 10/13/23 21:58, Naoya Horiguchi wrote:
> > > On Tue, Sep 05, 2023 at 02:44:00PM -0700, Mike Kravetz wrote:
> > Hi Naoya,
> >
> > I believe we need to set 'rc = 0' in the !folio_test_hugetlb(). I put
> > together the following patch based on mm-stable. Please take a look.
>
> Hi Mike, the patch looks good to me, and I confirmed that my test programs
> showed no new problem on latest mm-stable + this patch.
>
Thank you for testing!
My patch could be simpler/more elegant by eliminating that else clause
and just returning 0 after update_and_free_hugetlb_folio. Oh well, it
is functionally the same.
--
Mike Kravetz
> Thank you very much.
> Naoya Horiguchi
>
> >
> > From f19fbfab324d7d17de4a1e814f95ee655950c58e Mon Sep 17 00:00:00 2001
> > From: Mike Kravetz <[email protected]>
> > Date: Mon, 16 Oct 2023 19:55:49 -0700
> > Subject: [PATCH] hugetlb: check for hugetlb folio before vmemmap_restore
> >
> > In commit d8f5f7e445f0 ("hugetlb: set hugetlb page flag before
> > optimizing vmemmap") checks were added to print a warning if
> > hugetlb_vmemmap_restore was called on a non-hugetlb page. This
> > was mostly due to ordering issues in the hugetlb page set up and
> > tear down sequencees. One place missed was the routine
> > dissolve_free_huge_page. Naoya Horiguchi noted: "I saw that
> > VM_WARN_ON_ONCE() in hugetlb_vmemmap_restore is triggered when
> > memory_failure() is called on a free hugetlb page with vmemmap
> > optimization disabled (the warning is not triggered if vmemmap
> > optimization is enabled). I think that we need check
> > folio_test_hugetlb() before dissolve_free_huge_page() calls
> > hugetlb_vmemmap_restore_folio()."
> >
> > Perform the check as suggested by Naoya.
> >
> > Fixes: d8f5f7e445f0 ("hugetlb: set hugetlb page flag before optimizing vmemmap")
> > Suggested-by: Naoya Horiguchi <[email protected]>
> > Signed-off-by: Mike Kravetz <[email protected]>
> > ---
> > mm/hugetlb.c | 24 +++++++++++++++---------
> > 1 file changed, 15 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 36b40bc9ac25..13736cbb2c19 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -2290,17 +2290,23 @@ int dissolve_free_huge_page(struct page *page)
> > * need to adjust max_huge_pages if the page is not freed.
> > * Attempt to allocate vmemmmap here so that we can take
> > * appropriate action on failure.
> > + *
> > + * The folio_test_hugetlb check here is because
> > + * remove_hugetlb_folio will clear hugetlb folio flag for
> > + * non-vmemmap optimized hugetlb folios.
> > */
> > - rc = hugetlb_vmemmap_restore(h, &folio->page);
> > - if (!rc) {
> > - update_and_free_hugetlb_folio(h, folio, false);
> > - } else {
> > - spin_lock_irq(&hugetlb_lock);
> > - add_hugetlb_folio(h, folio, false);
> > - h->max_huge_pages++;
> > - spin_unlock_irq(&hugetlb_lock);
> > - }
> > + if (folio_test_hugetlb(folio)) {
> > + rc = hugetlb_vmemmap_restore(h, &folio->page);
> > + if (rc) {
> > + spin_lock_irq(&hugetlb_lock);
> > + add_hugetlb_folio(h, folio, false);
> > + h->max_huge_pages++;
> > + goto out;
> > + }
> > + } else
> > + rc = 0;
> >
> > + update_and_free_hugetlb_folio(h, folio, false);
> > return rc;
> > }
> > out:
> > --
> > 2.41.0