HugeTLB pages have a struct page optimization where struct pages for tail
pages are freed. However, when HugeTLB pages are destroyed, the memory for
struct pages (vmemmap) needs to be allocated again.

Currently, the __GFP_NORETRY flag is used to allocate the memory for vmemmap,
but given that this flag makes very little effort to actually reclaim
memory, returning huge pages back to the system can be a problem. Let's
use __GFP_RETRY_MAYFAIL instead. This flag also performs graceful
reclaim without causing OOMs, but at least it may perform a few retries,
and will fail only when there is genuinely little unused memory
in the system.

Signed-off-by: Pasha Tatashin <[email protected]>
Suggested-by: David Rientjes <[email protected]>
---
mm/hugetlb_vmemmap.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index a559037cce00..c4226d2af7cc 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -475,9 +475,12 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
 	 * the range is mapped to the page which @vmemmap_reuse is mapped to.
 	 * When a HugeTLB page is freed to the buddy allocator, previously
 	 * discarded vmemmap pages must be allocated and remapping.
+	 *
+	 * Use __GFP_RETRY_MAYFAIL to fail only when there is genuinely little
+	 * unused memory in the system.
 	 */
 	ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, vmemmap_reuse,
-				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
+				  GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE);
 	if (!ret) {
 		ClearHPageVmemmapOptimized(head);
 		static_branch_dec(&hugetlb_optimize_vmemmap_key);
--
2.40.0.577.gac1e443424-goog
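
To put the cost of restoring vmemmap in perspective, here is a rough
user-space sketch of the arithmetic. It assumes 4 KiB base pages, a 64-byte
struct page, and one vmemmap page kept per HugeTLB page; none of these
values come from the patch, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for a typical 64-bit configuration, not taken from the patch. */
#define BASE_PAGE_SIZE         4096UL /* 4 KiB base page */
#define STRUCT_PAGE_SIZE       64UL   /* assumed sizeof(struct page) */
#define RESERVED_VMEMMAP_PAGES 1UL    /* assumed vmemmap pages kept per huge page */

/* Base pages needed to hold the struct pages that describe one huge page. */
static unsigned long vmemmap_pages_total(unsigned long huge_page_size)
{
	assert(huge_page_size % BASE_PAGE_SIZE == 0);

	unsigned long nr_struct_pages = huge_page_size / BASE_PAGE_SIZE;
	unsigned long bytes = nr_struct_pages * STRUCT_PAGE_SIZE;

	/* Round up to whole base pages. */
	return (bytes + BASE_PAGE_SIZE - 1) / BASE_PAGE_SIZE;
}

/* vmemmap pages that must be allocated again before the huge page is freed. */
static unsigned long vmemmap_pages_to_restore(unsigned long huge_page_size)
{
	unsigned long total = vmemmap_pages_total(huge_page_size);

	assert(total > RESERVED_VMEMMAP_PAGES);
	return total - RESERVED_VMEMMAP_PAGES;
}
```

Under these assumptions, vmemmap_pages_to_restore(2UL << 20) gives the
number of base pages that have to be allocated before a 2 MiB page, which
spans 512 base pages, can go back to the buddy allocator.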
On Wed, 12 Apr 2023, Pasha Tatashin wrote:
> HugeTLB pages have a struct page optimization where struct pages for tail
> pages are freed. However, when HugeTLB pages are destroyed, the memory for
> struct pages (vmemmap) needs to be allocated again.
>
> Currently, the __GFP_NORETRY flag is used to allocate the memory for vmemmap,
> but given that this flag makes very little effort to actually reclaim
> memory, returning huge pages back to the system can be a problem. Let's
> use __GFP_RETRY_MAYFAIL instead. This flag also performs graceful
> reclaim without causing OOMs, but at least it may perform a few retries,
> and will fail only when there is genuinely little unused memory
> in the system.
>
Thanks Pasha, this definitely makes sense. We want to free the hugetlb
page back to the system so it would be a shame to have to strand it in the
hugetlb pool because we can't allocate the tail pages (we want to free
more memory than we're allocating).
> Signed-off-by: Pasha Tatashin <[email protected]>
> Suggested-by: David Rientjes <[email protected]>
> ---
> mm/hugetlb_vmemmap.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index a559037cce00..c4226d2af7cc 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -475,9 +475,12 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
> * the range is mapped to the page which @vmemmap_reuse is mapped to.
> * When a HugeTLB page is freed to the buddy allocator, previously
> * discarded vmemmap pages must be allocated and remapping.
> + *
> + * Use __GFP_RETRY_MAYFAIL to fail only when there is genuinely little
> + * unused memory in the system.
> */
> ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, vmemmap_reuse,
> - GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
> + GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE);
> if (!ret) {
> ClearHPageVmemmapOptimized(head);
> static_branch_dec(&hugetlb_optimize_vmemmap_key);
The behavior of __GFP_RETRY_MAYFAIL is different for high-order memory (at
least larger than PAGE_ALLOC_COSTLY_ORDER). The order that we're
allocating would depend on the implementation of alloc_vmemmap_page_list()
so likely best to move the gfp mask to that function.
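
A minimal user-space model of the suggestion above: the helper that picks the
allocation order also owns the gfp mask, so the two cannot drift apart. The
mock_* names, the flag bit values, and the malloc-backed allocator are
hypothetical stand-ins, not kernel code; only the value of
PAGE_ALLOC_COSTLY_ORDER matches the kernel constant.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define PAGE_ALLOC_COSTLY_ORDER 3      /* same value as the kernel constant */
#define MOCK_PAGE_SIZE          4096UL /* assumed base page size */

/* Hypothetical stand-ins; these bit values are NOT the kernel's. */
#define MOCK_GFP_KERNEL        0x1u
#define MOCK_GFP_RETRY_MAYFAIL 0x2u
#define MOCK_GFP_THISNODE      0x4u

/* The helper allocates single pages, so the order is fixed next to the mask. */
#define VMEMMAP_ALLOC_ORDER 0u

/* User-space stand-in for a page allocator call; the mask is ignored here. */
static void *mock_alloc_pages(unsigned int gfp_mask, unsigned int order)
{
	(void)gfp_mask;
	return malloc(MOCK_PAGE_SIZE << order);
}

/* Fill pages[0..nr_pages-1]; on failure free what was allocated. */
static int mock_alloc_vmemmap_page_list(void **pages, size_t nr_pages)
{
	const unsigned int gfp_mask = MOCK_GFP_KERNEL | MOCK_GFP_RETRY_MAYFAIL |
				      MOCK_GFP_THISNODE;
	size_t i;

	/* The retry flag is chosen with a non-costly order in mind. */
	assert(VMEMMAP_ALLOC_ORDER <= PAGE_ALLOC_COSTLY_ORDER);

	for (i = 0; i < nr_pages; i++) {
		pages[i] = mock_alloc_pages(gfp_mask, VMEMMAP_ALLOC_ORDER);
		if (!pages[i])
			goto out_free;
	}
	return 0;

out_free:
	/* Release pages[i-1] down to pages[0]. */
	while (i--)
		free(pages[i]);
	return -ENOMEM;
}
```

With the mask defined inside the helper, callers no longer pass one in, and
anyone changing the allocation order is forced to look at the retry flag in
the same place.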
On 04/12/23 10:54, David Rientjes wrote:
> On Wed, 12 Apr 2023, Pasha Tatashin wrote:
>
> > HugeTLB pages have a struct page optimization where struct pages for tail
> > pages are freed. However, when HugeTLB pages are destroyed, the memory for
> > struct pages (vmemmap) needs to be allocated again.
> >
> > Currently, the __GFP_NORETRY flag is used to allocate the memory for vmemmap,
> > but given that this flag makes very little effort to actually reclaim
> > memory, returning huge pages back to the system can be a problem. Let's
> > use __GFP_RETRY_MAYFAIL instead. This flag also performs graceful
> > reclaim without causing OOMs, but at least it may perform a few retries,
> > and will fail only when there is genuinely little unused memory
> > in the system.
> >
>
> Thanks Pasha, this definitely makes sense. We want to free the hugetlb
> page back to the system so it would be a shame to have to strand it in the
> hugetlb pool because we can't allocate the tail pages (we want to free
> more memory than we're allocating).
Agree.
The hugetlb vmemmap freeing series went through more than 20 revisions
before being merged. One issue with much discussion was the need to
allocate vmemmap pages when hugetlb pages were returned to buddy.
It looks like the current set of GFP flags was suggested here:
https://lore.kernel.org/linux-mm/[email protected]/
Although, it was also mentioned that __GFP_RETRY_MAYFAIL could be used
instead of __GFP_NORETRY here:
https://lore.kernel.org/linux-mm/[email protected]/
Adding Michal on Cc: since these were his suggestions.
>
> > Signed-off-by: Pasha Tatashin <[email protected]>
> > Suggested-by: David Rientjes <[email protected]>
> > ---
> > mm/hugetlb_vmemmap.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index a559037cce00..c4226d2af7cc 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -475,9 +475,12 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
> > * the range is mapped to the page which @vmemmap_reuse is mapped to.
> > * When a HugeTLB page is freed to the buddy allocator, previously
> > * discarded vmemmap pages must be allocated and remapping.
> > + *
> > + * Use __GFP_RETRY_MAYFAIL to fail only when there is genuinely little
> > + * unused memory in the system.
> > */
> > ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, vmemmap_reuse,
> > - GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
> > + GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE);
> > if (!ret) {
> > ClearHPageVmemmapOptimized(head);
> > static_branch_dec(&hugetlb_optimize_vmemmap_key);
>
> The behavior of __GFP_RETRY_MAYFAIL is different for high-order memory (at
> least larger than PAGE_ALLOC_COSTLY_ORDER). The order that we're
> allocating would depend on the implementation of alloc_vmemmap_page_list()
> so likely best to move the gfp mask to that function.
Good point.
--
Mike Kravetz
On Wed, Apr 12, 2023 at 1:54 PM David Rientjes <[email protected]> wrote:
>
> On Wed, 12 Apr 2023, Pasha Tatashin wrote:
>
> > HugeTLB pages have a struct page optimization where struct pages for tail
> > pages are freed. However, when HugeTLB pages are destroyed, the memory for
> > struct pages (vmemmap) needs to be allocated again.
> >
> > Currently, the __GFP_NORETRY flag is used to allocate the memory for vmemmap,
> > but given that this flag makes very little effort to actually reclaim
> > memory, returning huge pages back to the system can be a problem. Let's
> > use __GFP_RETRY_MAYFAIL instead. This flag also performs graceful
> > reclaim without causing OOMs, but at least it may perform a few retries,
> > and will fail only when there is genuinely little unused memory
> > in the system.
> >
>
> Thanks Pasha, this definitely makes sense. We want to free the hugetlb
> page back to the system so it would be a shame to have to strand it in the
> hugetlb pool because we can't allocate the tail pages (we want to free
> more memory than we're allocating).
>
> > Signed-off-by: Pasha Tatashin <[email protected]>
> > Suggested-by: David Rientjes <[email protected]>
> > ---
> > mm/hugetlb_vmemmap.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index a559037cce00..c4226d2af7cc 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -475,9 +475,12 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
> > * the range is mapped to the page which @vmemmap_reuse is mapped to.
> > * When a HugeTLB page is freed to the buddy allocator, previously
> > * discarded vmemmap pages must be allocated and remapping.
> > + *
> > + * Use __GFP_RETRY_MAYFAIL to fail only when there is genuinely little
> > + * unused memory in the system.
> > */
> > ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, vmemmap_reuse,
> > - GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
> > + GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE);
> > if (!ret) {
> > ClearHPageVmemmapOptimized(head);
> > static_branch_dec(&hugetlb_optimize_vmemmap_key);
>
> The behavior of __GFP_RETRY_MAYFAIL is different for high-order memory (at
> least larger than PAGE_ALLOC_COSTLY_ORDER). The order that we're
> allocating would depend on the implementation of alloc_vmemmap_page_list()
> so likely best to move the gfp mask to that function.
Thank you, David. This makes sense; I will send the 2nd version soon.
Pasha
On Wed, Apr 12, 2023 at 3:57 PM Mike Kravetz <[email protected]> wrote:
>
> On 04/12/23 10:54, David Rientjes wrote:
> > On Wed, 12 Apr 2023, Pasha Tatashin wrote:
> >
> > > HugeTLB pages have a struct page optimization where struct pages for tail
> > > pages are freed. However, when HugeTLB pages are destroyed, the memory for
> > > struct pages (vmemmap) needs to be allocated again.
> > >
> > > Currently, the __GFP_NORETRY flag is used to allocate the memory for vmemmap,
> > > but given that this flag makes very little effort to actually reclaim
> > > memory, returning huge pages back to the system can be a problem. Let's
> > > use __GFP_RETRY_MAYFAIL instead. This flag also performs graceful
> > > reclaim without causing OOMs, but at least it may perform a few retries,
> > > and will fail only when there is genuinely little unused memory
> > > in the system.
> > >
> >
> > Thanks Pasha, this definitely makes sense. We want to free the hugetlb
> > page back to the system so it would be a shame to have to strand it in the
> > hugetlb pool because we can't allocate the tail pages (we want to free
> > more memory than we're allocating).
>
> Agree.
>
> The hugetlb vmemmap freeing series went through more than 20 revisions
> before being merged. One issue with much discussion was the need to
> allocate vmemmap pages when hugetlb pages were returned to buddy.
>
> It looks like the current set of GFP flags was suggested here:
> https://lore.kernel.org/linux-mm/[email protected]/
>
> Although, it was also mentioned that __GFP_RETRY_MAYFAIL could be used
> instead of __GFP_NORETRY here:
> https://lore.kernel.org/linux-mm/[email protected]/
>
> Adding Michal on Cc: since these were his suggestions.
Thank you for the background, Mike. I have sent the 2nd version and
added Michal to that patch.
Pasha