2024-01-09 19:32:12

by Yang Shi

[permalink] [raw]
Subject: [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack

From: Yang Shi <[email protected]>

We avoid allocating THP for temporary stack, even tnough
khugepaged_enter_vma() is called for stack VMAs, it actualy returns
false. So no need to call it in the first place at all.

Signed-off-by: Yang Shi <[email protected]>
---
mm/mmap.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index b78e83d351d2..2ff79b1d1564 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2046,7 +2046,6 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
}
}
anon_vma_unlock_write(vma->anon_vma);
- khugepaged_enter_vma(vma, vma->vm_flags);
mas_destroy(&mas);
validate_mm(mm);
return error;
@@ -2140,7 +2139,6 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
}
}
anon_vma_unlock_write(vma->anon_vma);
- khugepaged_enter_vma(vma, vma->vm_flags);
mas_destroy(&mas);
validate_mm(mm);
return error;
--
2.41.0



2024-01-09 19:33:10

by Yang Shi

[permalink] [raw]
Subject: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE

From: Yang Shi <[email protected]>

The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
boundaries") incured regression for stress-ng pthread benchmark [1].
It is because THP get allocated to pthread's stack area much more possible
than before. Pthread's stack area is allocated by mmap without VM_GROWSDOWN
or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.

The MAP_STACK flag is used to mark the stack area, but it is a no-op on
Linux. Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
THP for such stack area.

With this change the stack area looks like:

fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
Size: 8192 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 12 kB
Pss: 12 kB
Pss_Dirty: 12 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 12 kB
Referenced: 12 kB
Anonymous: 12 kB
KSM: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 0
VmFlags: rd wr mr mw me ac nh

The "nh" flag is set.

[1] https://lore.kernel.org/linux-mm/[email protected]/

Reported-by: kernel test robot <[email protected]>
Tested-by: Oliver Sang <[email protected]>
Cc: Yin Fengwei <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: Huang, Ying <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
include/linux/mman.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index 40d94411d492..dc7048824be8 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
_calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
+ _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
arch_calc_vm_flag_bits(flags);
}

--
2.41.0


2024-01-10 01:36:14

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack



On 2023/12/21 14:59, Yang Shi wrote:
> From: Yang Shi <[email protected]>
>
> We avoid allocating THP for temporary stack, even tnough
> khugepaged_enter_vma() is called for stack VMAs, it actualy returns
> false. So no need to call it in the first place at all.
>
> Signed-off-by: Yang Shi <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>

> ---
> mm/mmap.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index b78e83d351d2..2ff79b1d1564 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2046,7 +2046,6 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> }
> }
> anon_vma_unlock_write(vma->anon_vma);
> - khugepaged_enter_vma(vma, vma->vm_flags);
> mas_destroy(&mas);
> validate_mm(mm);
> return error;
> @@ -2140,7 +2139,6 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> }
> }
> anon_vma_unlock_write(vma->anon_vma);
> - khugepaged_enter_vma(vma, vma->vm_flags);
> mas_destroy(&mas);
> validate_mm(mm);
> return error;

2024-01-10 01:37:02

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE



On 2023/12/21 14:59, Yang Shi wrote:
> From: Yang Shi <[email protected]>
>
> The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> boundaries") incured regression for stress-ng pthread benchmark [1].
> It is because THP get allocated to pthread's stack area much more possible
> than before. Pthread's stack area is allocated by mmap without VM_GROWSDOWN
> or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
>
> The MAP_STACK flag is used to mark the stack area, but it is a no-op on
> Linux. Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
> THP for such stack area.
>
> With this change the stack area looks like:
>
> fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
> Size: 8192 kB
> KernelPageSize: 4 kB
> MMUPageSize: 4 kB
> Rss: 12 kB
> Pss: 12 kB
> Pss_Dirty: 12 kB
> Shared_Clean: 0 kB
> Shared_Dirty: 0 kB
> Private_Clean: 0 kB
> Private_Dirty: 12 kB
> Referenced: 12 kB
> Anonymous: 12 kB
> KSM: 0 kB
> LazyFree: 0 kB
> AnonHugePages: 0 kB
> ShmemPmdMapped: 0 kB
> FilePmdMapped: 0 kB
> Shared_Hugetlb: 0 kB
> Private_Hugetlb: 0 kB
> Swap: 0 kB
> SwapPss: 0 kB
> Locked: 0 kB
> THPeligible: 0
> VmFlags: rd wr mr mw me ac nh
>
> The "nh" flag is set.
>
> [1] https://lore.kernel.org/linux-mm/[email protected]/
>
> Reported-by: kernel test robot <[email protected]>
> Tested-by: Oliver Sang <[email protected]>
> Cc: Yin Fengwei <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Christopher Lameter <[email protected]>
> Cc: Huang, Ying <[email protected]>
> Signed-off-by: Yang Shi <[email protected]>

Reviewed-by: Yin Fengwei <[email protected]>

> ---
> include/linux/mman.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 40d94411d492..dc7048824be8 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
> _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
> _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
> + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
> arch_calc_vm_flag_bits(flags);
> }
>

2024-01-15 05:52:41

by Huang, Ying

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack

Yang Shi <[email protected]> writes:

> From: Yang Shi <[email protected]>
>
> We avoid allocating THP for temporary stack, even tnough
~~~~~~
though?

--
Best Regards,
Huang, Ying

> khugepaged_enter_vma() is called for stack VMAs, it actualy returns
> false. So no need to call it in the first place at all.
>
> Signed-off-by: Yang Shi <[email protected]>
> ---
> mm/mmap.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index b78e83d351d2..2ff79b1d1564 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2046,7 +2046,6 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> }
> }
> anon_vma_unlock_write(vma->anon_vma);
> - khugepaged_enter_vma(vma, vma->vm_flags);
> mas_destroy(&mas);
> validate_mm(mm);
> return error;
> @@ -2140,7 +2139,6 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> }
> }
> anon_vma_unlock_write(vma->anon_vma);
> - khugepaged_enter_vma(vma, vma->vm_flags);
> mas_destroy(&mas);
> validate_mm(mm);
> return error;

2024-01-16 19:23:07

by Zach O'Keefe

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE

Thanks Yang,

Should this be marked for stable? Given how easily it is for pthreads
to allocate hugepages w/o this change, it can easily cause memory
bloat on larger systems and/or users with high thread counts. I don't
think that will be welcomed, and seems odd that just 6.7 should suffer
this.

Thanks,
Zach

On Tue, Jan 9, 2024 at 5:36 PM Yin Fengwei <[email protected]> wrote:
>
>
>
> On 2023/12/21 14:59, Yang Shi wrote:
> > From: Yang Shi <[email protected]>
> >
> > The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> > boundaries") incured regression for stress-ng pthread benchmark [1].
> > It is because THP get allocated to pthread's stack area much more possible
> > than before. Pthread's stack area is allocated by mmap without VM_GROWSDOWN
> > or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
> >
> > The MAP_STACK flag is used to mark the stack area, but it is a no-op on
> > Linux. Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
> > THP for such stack area.
> >
> > With this change the stack area looks like:
> >
> > fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
> > Size: 8192 kB
> > KernelPageSize: 4 kB
> > MMUPageSize: 4 kB
> > Rss: 12 kB
> > Pss: 12 kB
> > Pss_Dirty: 12 kB
> > Shared_Clean: 0 kB
> > Shared_Dirty: 0 kB
> > Private_Clean: 0 kB
> > Private_Dirty: 12 kB
> > Referenced: 12 kB
> > Anonymous: 12 kB
> > KSM: 0 kB
> > LazyFree: 0 kB
> > AnonHugePages: 0 kB
> > ShmemPmdMapped: 0 kB
> > FilePmdMapped: 0 kB
> > Shared_Hugetlb: 0 kB
> > Private_Hugetlb: 0 kB
> > Swap: 0 kB
> > SwapPss: 0 kB
> > Locked: 0 kB
> > THPeligible: 0
> > VmFlags: rd wr mr mw me ac nh
> >
> > The "nh" flag is set.
> >
> > [1] https://lore.kernel.org/linux-mm/[email protected]/
> >
> > Reported-by: kernel test robot <[email protected]>
> > Tested-by: Oliver Sang <[email protected]>
> > Cc: Yin Fengwei <[email protected]>
> > Cc: Rik van Riel <[email protected]>
> > Cc: Matthew Wilcox <[email protected]>
> > Cc: Christopher Lameter <[email protected]>
> > Cc: Huang, Ying <[email protected]>
> > Signed-off-by: Yang Shi <[email protected]>
>
> Reviewed-by: Yin Fengwei <[email protected]>
>
> > ---
> > include/linux/mman.h | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 40d94411d492..dc7048824be8 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
> > _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
> > _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
> > + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
> > arch_calc_vm_flag_bits(flags);
> > }
> >
>

2024-01-16 22:22:03

by Yang Shi

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE

On Tue, Jan 16, 2024 at 11:22 AM Zach O'Keefe <[email protected]> wrote:
>
> Thanks Yang,
>
> Should this be marked for stable? Given how easily it is for pthreads
> to allocate hugepages w/o this change, it can easily cause memory
> bloat on larger systems and/or users with high thread counts. I don't
> think that will be welcomed, and seems odd that just 6.7 should suffer
> this.

Thanks for the suggestion, fine to me.

>
> Thanks,
> Zach
>
> On Tue, Jan 9, 2024 at 5:36 PM Yin Fengwei <[email protected]> wrote:
> >
> >
> >
> > On 2023/12/21 14:59, Yang Shi wrote:
> > > From: Yang Shi <[email protected]>
> > >
> > > The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> > > boundaries") incured regression for stress-ng pthread benchmark [1].
> > > It is because THP get allocated to pthread's stack area much more possible
> > > than before. Pthread's stack area is allocated by mmap without VM_GROWSDOWN
> > > or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
> > >
> > > The MAP_STACK flag is used to mark the stack area, but it is a no-op on
> > > Linux. Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
> > > THP for such stack area.
> > >
> > > With this change the stack area looks like:
> > >
> > > fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
> > > Size: 8192 kB
> > > KernelPageSize: 4 kB
> > > MMUPageSize: 4 kB
> > > Rss: 12 kB
> > > Pss: 12 kB
> > > Pss_Dirty: 12 kB
> > > Shared_Clean: 0 kB
> > > Shared_Dirty: 0 kB
> > > Private_Clean: 0 kB
> > > Private_Dirty: 12 kB
> > > Referenced: 12 kB
> > > Anonymous: 12 kB
> > > KSM: 0 kB
> > > LazyFree: 0 kB
> > > AnonHugePages: 0 kB
> > > ShmemPmdMapped: 0 kB
> > > FilePmdMapped: 0 kB
> > > Shared_Hugetlb: 0 kB
> > > Private_Hugetlb: 0 kB
> > > Swap: 0 kB
> > > SwapPss: 0 kB
> > > Locked: 0 kB
> > > THPeligible: 0
> > > VmFlags: rd wr mr mw me ac nh
> > >
> > > The "nh" flag is set.
> > >
> > > [1] https://lore.kernel.org/linux-mm/[email protected]/
> > >
> > > Reported-by: kernel test robot <[email protected]>
> > > Tested-by: Oliver Sang <[email protected]>
> > > Cc: Yin Fengwei <[email protected]>
> > > Cc: Rik van Riel <[email protected]>
> > > Cc: Matthew Wilcox <[email protected]>
> > > Cc: Christopher Lameter <[email protected]>
> > > Cc: Huang, Ying <[email protected]>
> > > Signed-off-by: Yang Shi <[email protected]>
> >
> > Reviewed-by: Yin Fengwei <[email protected]>
> >
> > > ---
> > > include/linux/mman.h | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > index 40d94411d492..dc7048824be8 100644
> > > --- a/include/linux/mman.h
> > > +++ b/include/linux/mman.h
> > > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > > return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
> > > _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
> > > _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
> > > + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
> > > arch_calc_vm_flag_bits(flags);
> > > }
> > >
> >

2024-01-16 22:29:09

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE

On Tue, 16 Jan 2024 12:57:41 -0800 Yang Shi <[email protected]> wrote:

> > Should this be marked for stable? Given how easily it is for pthreads
> > to allocate hugepages w/o this change, it can easily cause memory
> > bloat on larger systems and/or users with high thread counts. I don't
> > think that will be welcomed, and seems odd that just 6.7 should suffer
> > this.
>
> Thanks for the suggestion, fine to me.
>

Thanks, added, along with

Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")


2024-01-16 22:34:38

by Yang Shi

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack

On Sun, Jan 14, 2024 at 9:52 PM Huang, Ying <[email protected]> wrote:
>
> Yang Shi <[email protected]> writes:
>
> > From: Yang Shi <[email protected]>
> >
> > We avoid allocating THP for temporary stack, even tnough
> ~~~~~~
> though?

Yeah, it is a typo. Thanks for noticing this.

>
> --
> Best Regards,
> Huang, Ying
>
> > khugepaged_enter_vma() is called for stack VMAs, it actualy returns
> > false. So no need to call it in the first place at all.
> >
> > Signed-off-by: Yang Shi <[email protected]>
> > ---
> > mm/mmap.c | 2 --
> > 1 file changed, 2 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index b78e83d351d2..2ff79b1d1564 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2046,7 +2046,6 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> > }
> > }
> > anon_vma_unlock_write(vma->anon_vma);
> > - khugepaged_enter_vma(vma, vma->vm_flags);
> > mas_destroy(&mas);
> > validate_mm(mm);
> > return error;
> > @@ -2140,7 +2139,6 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> > }
> > }
> > anon_vma_unlock_write(vma->anon_vma);
> > - khugepaged_enter_vma(vma, vma->vm_flags);
> > mas_destroy(&mas);
> > validate_mm(mm);
> > return error;

2024-01-31 07:55:59

by Florian Weimer

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE

* Yang Shi:

> From: Yang Shi <[email protected]>
>
> The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> boundaries") incured regression for stress-ng pthread benchmark [1].
> It is because THP get allocated to pthread's stack area much more possible
> than before. Pthread's stack area is allocated by mmap without VM_GROWSDOWN
> or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
>
> The MAP_STACK flag is used to mark the stack area, but it is a no-op on
> Linux. Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
> THP for such stack area.

Doesn't this introduce a regression in the other direction, where
workloads expect to use a hugepage TLB entry for the stack?

It's seems an odd approach to fixing the stress-ng regression. Isn't it
very much coding to the benchmark?

Thanks,
Florian


2024-01-31 18:47:57

by Yang Shi

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE

On Tue, Jan 30, 2024 at 11:53 PM Florian Weimer <[email protected]> wrote:
>
> * Yang Shi:
>
> > From: Yang Shi <[email protected]>
> >
> > The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> > boundaries") incured regression for stress-ng pthread benchmark [1].
> > It is because THP get allocated to pthread's stack area much more possible
> > than before. Pthread's stack area is allocated by mmap without VM_GROWSDOWN
> > or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
> >
> > The MAP_STACK flag is used to mark the stack area, but it is a no-op on
> > Linux. Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
> > THP for such stack area.
>
> Doesn't this introduce a regression in the other direction, where
> workloads expect to use a hugepage TLB entry for the stack?

Maybe, it is theoretically possible. But AFAICT, the real life
workloads performance usually gets hurt if THP is used for stack.
Willy has an example:

https://lore.kernel.org/linux-mm/[email protected]/#t

And avoiding THP on stack is not new, VM_GROWSDOWN | VM_GROWSUP areas
have been applied before, this patch just extends this to MAP_STACK.

>
> It's seems an odd approach to fixing the stress-ng regression. Isn't it
> very much coding to the benchmark?
>
> Thanks,
> Florian
>

2024-02-01 15:34:31

by Florian Weimer

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE

* Yang Shi:

> On Tue, Jan 30, 2024 at 11:53 PM Florian Weimer <[email protected]> wrote:
>>
>> * Yang Shi:
>>
>> > From: Yang Shi <[email protected]>
>> >
>> > The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
>> > boundaries") incured regression for stress-ng pthread benchmark [1].
>> > It is because THP get allocated to pthread's stack area much more possible
>> > than before. Pthread's stack area is allocated by mmap without VM_GROWSDOWN
>> > or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
>> >
>> > The MAP_STACK flag is used to mark the stack area, but it is a no-op on
>> > Linux. Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
>> > THP for such stack area.
>>
>> Doesn't this introduce a regression in the other direction, where
>> workloads expect to use a hugepage TLB entry for the stack?
>
> Maybe, it is theoretically possible. But AFAICT, the real life
> workloads performance usually gets hurt if THP is used for stack.
> Willy has an example:
>
> https://lore.kernel.org/linux-mm/[email protected]/#t
>
> And avoiding THP on stack is not new, VM_GROWSDOWN | VM_GROWSUP areas
> have been applied before, this patch just extends this to MAP_STACK.

If it's *always* beneficial then we should help it along in glibc as
well. We've started to offer a tunable in response to this observation
(also paper over in OpenJDK):

Make thread stacks not use huge pages
<https://bugs.openjdk.org/browse/JDK-8303215>

But this is specifically about RSS usage, and not directly about
reducing TLB misses etc.

Thanks,
Florian


2024-02-01 19:00:52

by Yang Shi

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE

On Thu, Feb 1, 2024 at 7:34 AM Florian Weimer <[email protected]> wrote:
>
> * Yang Shi:
>
> > On Tue, Jan 30, 2024 at 11:53 PM Florian Weimer <fweimer@redhatcom> wrote:
> >>
> >> * Yang Shi:
> >>
> >> > From: Yang Shi <[email protected]>
> >> >
> >> > The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> >> > boundaries") incured regression for stress-ng pthread benchmark [1].
> >> > It is because THP get allocated to pthread's stack area much more possible
> >> > than before. Pthread's stack area is allocated by mmap without VM_GROWSDOWN
> >> > or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
> >> >
> >> > The MAP_STACK flag is used to mark the stack area, but it is a no-op on
> >> > Linux. Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
> >> > THP for such stack area.
> >>
> >> Doesn't this introduce a regression in the other direction, where
> >> workloads expect to use a hugepage TLB entry for the stack?
> >
> > Maybe, it is theoretically possible. But AFAICT, the real life
> > workloads performance usually gets hurt if THP is used for stack.
> > Willy has an example:
> >
> > https://lore.kernel.org/linux-mm/[email protected]/#t
> >
> > And avoiding THP on stack is not new, VM_GROWSDOWN | VM_GROWSUP areas
> > have been applied before, this patch just extends this to MAP_STACK.
>
> If it's *always* beneficial then we should help it along in glibc as
> well. We've started to offer a tunable in response to this observation
> (also paper over in OpenJDK):
>
> Make thread stacks not use huge pages
> <https://bugs.openjdk.org/browse/JDK-8303215>
>
> But this is specifically about RSS usage, and not directly about
> reducing TLB misses etc.

Thanks for the data point. Out of curiosity, what mmap flags are used
by JVM to indicate a stack? MAP_STACK? If so it should get
VM_NOHUGEPAGE due to this patch (of course, on older kernel
MADV_NOHUGEPAGE must be called by JVM).

Letting others, for example, glibc, call MADV_NOHUGEPAGE explicitly on
stack area is fine too, but it may take some time to get there...

>
> Thanks,
> Florian
>