2019-04-23 16:44:35

by Yang Shi

[permalink] [raw]
Subject: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility

The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
vma") introduced THPeligible bit for processes' smaps. But, when checking
the eligibility for shmem vma, __transparent_hugepage_enabled() is
called to override the result from shmem_huge_enabled(). It may result
in the anonymous vma's THP flag override shmem's. For example, running a
simple test which create THP for shmem, but with anonymous THP disabled,
when reading the process's smaps, it may show:

7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
Size: 4096 kB
...
[snip]
...
ShmemPmdMapped: 4096 kB
...
[snip]
...
THPeligible: 0

And, /proc/meminfo does show THP allocated and PMD mapped too:

ShmemHugePages: 4096 kB
ShmemPmdMapped: 4096 kB

This doesn't make too much sense. The anonymous THP flag should not
intervene shmem THP. Calling shmem_huge_enabled() with checking
MMF_DISABLE_THP sounds good enough. And, we could skip stack and
dax vma check since we already checked if the vma is shmem already.

Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma")
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
v2: Check VM_NOHUGEPAGE per Michal Hocko

mm/huge_memory.c | 4 ++--
mm/shmem.c | 3 +++
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 165ea46..5881e82 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -67,8 +67,8 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma)
{
if (vma_is_anonymous(vma))
return __transparent_hugepage_enabled(vma);
- if (vma_is_shmem(vma) && shmem_huge_enabled(vma))
- return __transparent_hugepage_enabled(vma);
+ if (vma_is_shmem(vma))
+ return shmem_huge_enabled(vma);

return false;
}
diff --git a/mm/shmem.c b/mm/shmem.c
index 2275a0f..6f09a31 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3873,6 +3873,9 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
loff_t i_size;
pgoff_t off;

+ if ((vma->vm_flags & VM_NOHUGEPAGE) ||
+ test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+ return false;
if (shmem_huge == SHMEM_HUGE_FORCE)
return true;
if (shmem_huge == SHMEM_HUGE_DENY)
--
1.8.3.1


2019-04-23 17:54:08

by Michal Hocko

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility

On Wed 24-04-19 00:43:01, Yang Shi wrote:
> The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
> vma") introduced THPeligible bit for processes' smaps. But, when checking
> the eligibility for shmem vma, __transparent_hugepage_enabled() is
> called to override the result from shmem_huge_enabled(). It may result
> in the anonymous vma's THP flag override shmem's. For example, running a
> simple test which create THP for shmem, but with anonymous THP disabled,
> when reading the process's smaps, it may show:
>
> 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
> Size: 4096 kB
> ...
> [snip]
> ...
> ShmemPmdMapped: 4096 kB
> ...
> [snip]
> ...
> THPeligible: 0
>
> And, /proc/meminfo does show THP allocated and PMD mapped too:
>
> ShmemHugePages: 4096 kB
> ShmemPmdMapped: 4096 kB
>
> This doesn't make too much sense. The anonymous THP flag should not
> intervene shmem THP. Calling shmem_huge_enabled() with checking
> MMF_DISABLE_THP sounds good enough. And, we could skip stack and
> dax vma check since we already checked if the vma is shmem already.

Kirill, can we get a confirmation that this is really intended behavior
rather than an omission please? Is this documented? What is a global
knob to simply disable THP system wise?

I have to say that the THP tuning API is one giant mess :/

Btw. this patch also seem to fix khugepaged behavior because it previously
ignored both VM_NOHUGEPAGE and MMF_DISABLE_THP.

> Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma")
> Cc: Michal Hocko <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: Kirill A. Shutemov <[email protected]>
> Signed-off-by: Yang Shi <[email protected]>
> ---
> v2: Check VM_NOHUGEPAGE per Michal Hocko
>
> mm/huge_memory.c | 4 ++--
> mm/shmem.c | 3 +++
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 165ea46..5881e82 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -67,8 +67,8 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma)
> {
> if (vma_is_anonymous(vma))
> return __transparent_hugepage_enabled(vma);
> - if (vma_is_shmem(vma) && shmem_huge_enabled(vma))
> - return __transparent_hugepage_enabled(vma);
> + if (vma_is_shmem(vma))
> + return shmem_huge_enabled(vma);
>
> return false;
> }
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 2275a0f..6f09a31 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3873,6 +3873,9 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
> loff_t i_size;
> pgoff_t off;
>
> + if ((vma->vm_flags & VM_NOHUGEPAGE) ||
> + test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> + return false;
> if (shmem_huge == SHMEM_HUGE_FORCE)
> return true;
> if (shmem_huge == SHMEM_HUGE_DENY)
> --
> 1.8.3.1
>

--
Michal Hocko
SUSE Labs

2019-04-23 18:35:37

by Yang Shi

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility



On 4/23/19 10:52 AM, Michal Hocko wrote:
> On Wed 24-04-19 00:43:01, Yang Shi wrote:
>> The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
>> vma") introduced THPeligible bit for processes' smaps. But, when checking
>> the eligibility for shmem vma, __transparent_hugepage_enabled() is
>> called to override the result from shmem_huge_enabled(). It may result
>> in the anonymous vma's THP flag override shmem's. For example, running a
>> simple test which create THP for shmem, but with anonymous THP disabled,
>> when reading the process's smaps, it may show:
>>
>> 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
>> Size: 4096 kB
>> ...
>> [snip]
>> ...
>> ShmemPmdMapped: 4096 kB
>> ...
>> [snip]
>> ...
>> THPeligible: 0
>>
>> And, /proc/meminfo does show THP allocated and PMD mapped too:
>>
>> ShmemHugePages: 4096 kB
>> ShmemPmdMapped: 4096 kB
>>
>> This doesn't make too much sense. The anonymous THP flag should not
>> intervene shmem THP. Calling shmem_huge_enabled() with checking
>> MMF_DISABLE_THP sounds good enough. And, we could skip stack and
>> dax vma check since we already checked if the vma is shmem already.
> Kirill, can we get a confirmation that this is really intended behavior
> rather than an omission please? Is this documented? What is a global
> knob to simply disable THP system wise?
>
> I have to say that the THP tuning API is one giant mess :/
>
> Btw. this patch also seem to fix khugepaged behavior because it previously
> ignored both VM_NOHUGEPAGE and MMF_DISABLE_THP.

Aha, I didn't notice this. It looks we need separate the patch to fix
that khugepaged problem for both 5.1-rc and LTS.

>
>> Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma")
>> Cc: Michal Hocko <[email protected]>
>> Cc: Vlastimil Babka <[email protected]>
>> Cc: David Rientjes <[email protected]>
>> Cc: Kirill A. Shutemov <[email protected]>
>> Signed-off-by: Yang Shi <[email protected]>
>> ---
>> v2: Check VM_NOHUGEPAGE per Michal Hocko
>>
>> mm/huge_memory.c | 4 ++--
>> mm/shmem.c | 3 +++
>> 2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 165ea46..5881e82 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -67,8 +67,8 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>> {
>> if (vma_is_anonymous(vma))
>> return __transparent_hugepage_enabled(vma);
>> - if (vma_is_shmem(vma) && shmem_huge_enabled(vma))
>> - return __transparent_hugepage_enabled(vma);
>> + if (vma_is_shmem(vma))
>> + return shmem_huge_enabled(vma);
>>
>> return false;
>> }
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 2275a0f..6f09a31 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -3873,6 +3873,9 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
>> loff_t i_size;
>> pgoff_t off;
>>
>> + if ((vma->vm_flags & VM_NOHUGEPAGE) ||
>> + test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>> + return false;
>> if (shmem_huge == SHMEM_HUGE_FORCE)
>> return true;
>> if (shmem_huge == SHMEM_HUGE_DENY)
>> --
>> 1.8.3.1
>>

2019-04-24 00:25:21

by Yang Shi

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility



On 4/23/19 11:34 AM, Yang Shi wrote:
>
>
> On 4/23/19 10:52 AM, Michal Hocko wrote:
>> On Wed 24-04-19 00:43:01, Yang Shi wrote:
>>> The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for
>>> each
>>> vma") introduced THPeligible bit for processes' smaps. But, when
>>> checking
>>> the eligibility for shmem vma, __transparent_hugepage_enabled() is
>>> called to override the result from shmem_huge_enabled().  It may result
>>> in the anonymous vma's THP flag override shmem's.  For example,
>>> running a
>>> simple test which create THP for shmem, but with anonymous THP
>>> disabled,
>>> when reading the process's smaps, it may show:
>>>
>>> 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
>>> Size:               4096 kB
>>> ...
>>> [snip]
>>> ...
>>> ShmemPmdMapped:     4096 kB
>>> ...
>>> [snip]
>>> ...
>>> THPeligible:    0
>>>
>>> And, /proc/meminfo does show THP allocated and PMD mapped too:
>>>
>>> ShmemHugePages:     4096 kB
>>> ShmemPmdMapped:     4096 kB
>>>
>>> This doesn't make too much sense.  The anonymous THP flag should not
>>> intervene shmem THP.  Calling shmem_huge_enabled() with checking
>>> MMF_DISABLE_THP sounds good enough.  And, we could skip stack and
>>> dax vma check since we already checked if the vma is shmem already.
>> Kirill, can we get a confirmation that this is really intended behavior
>> rather than an omission please? Is this documented? What is a global
>> knob to simply disable THP system wise?
>>
>> I have to say that the THP tuning API is one giant mess :/
>>
>> Btw. this patch also seem to fix khugepaged behavior because it
>> previously
>> ignored both VM_NOHUGEPAGE and MMF_DISABLE_THP.

Second look shows this is not ignored. hugepage_vma_check() would check
this for both anonymous vma and shmem vma before scanning. It is called
before shmem_huge_enabled().

>
> Aha, I didn't notice this. It looks we need separate the patch to fix
> that khugepaged problem for both 5.1-rc and LTS.
>
>>
>>> Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
>>> vma")
>>> Cc: Michal Hocko <[email protected]>
>>> Cc: Vlastimil Babka <[email protected]>
>>> Cc: David Rientjes <[email protected]>
>>> Cc: Kirill A. Shutemov <[email protected]>
>>> Signed-off-by: Yang Shi <[email protected]>
>>> ---
>>> v2: Check VM_NOHUGEPAGE per Michal Hocko
>>>
>>>   mm/huge_memory.c | 4 ++--
>>>   mm/shmem.c       | 3 +++
>>>   2 files changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 165ea46..5881e82 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -67,8 +67,8 @@ bool transparent_hugepage_enabled(struct
>>> vm_area_struct *vma)
>>>   {
>>>       if (vma_is_anonymous(vma))
>>>           return __transparent_hugepage_enabled(vma);
>>> -    if (vma_is_shmem(vma) && shmem_huge_enabled(vma))
>>> -        return __transparent_hugepage_enabled(vma);
>>> +    if (vma_is_shmem(vma))
>>> +        return shmem_huge_enabled(vma);
>>>         return false;
>>>   }
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 2275a0f..6f09a31 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -3873,6 +3873,9 @@ bool shmem_huge_enabled(struct vm_area_struct
>>> *vma)
>>>       loff_t i_size;
>>>       pgoff_t off;
>>>   +    if ((vma->vm_flags & VM_NOHUGEPAGE) ||
>>> +        test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>>> +        return false;
>>>       if (shmem_huge == SHMEM_HUGE_FORCE)
>>>           return true;
>>>       if (shmem_huge == SHMEM_HUGE_DENY)
>>> --
>>> 1.8.3.1
>>>
>

2019-04-24 08:07:38

by Michal Hocko

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility

On Tue 23-04-19 17:22:36, Yang Shi wrote:
>
>
> On 4/23/19 11:34 AM, Yang Shi wrote:
> >
> >
> > On 4/23/19 10:52 AM, Michal Hocko wrote:
> > > On Wed 24-04-19 00:43:01, Yang Shi wrote:
> > > > The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility
> > > > for each
> > > > vma") introduced THPeligible bit for processes' smaps. But, when
> > > > checking
> > > > the eligibility for shmem vma, __transparent_hugepage_enabled() is
> > > > called to override the result from shmem_huge_enabled().? It may result
> > > > in the anonymous vma's THP flag override shmem's.? For example,
> > > > running a
> > > > simple test which create THP for shmem, but with anonymous THP
> > > > disabled,
> > > > when reading the process's smaps, it may show:
> > > >
> > > > 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
> > > > Size:?????????????? 4096 kB
> > > > ...
> > > > [snip]
> > > > ...
> > > > ShmemPmdMapped:???? 4096 kB
> > > > ...
> > > > [snip]
> > > > ...
> > > > THPeligible:??? 0
> > > >
> > > > And, /proc/meminfo does show THP allocated and PMD mapped too:
> > > >
> > > > ShmemHugePages:???? 4096 kB
> > > > ShmemPmdMapped:???? 4096 kB
> > > >
> > > > This doesn't make too much sense.? The anonymous THP flag should not
> > > > intervene shmem THP.? Calling shmem_huge_enabled() with checking
> > > > MMF_DISABLE_THP sounds good enough.? And, we could skip stack and
> > > > dax vma check since we already checked if the vma is shmem already.
> > > Kirill, can we get a confirmation that this is really intended behavior
> > > rather than an omission please? Is this documented? What is a global
> > > knob to simply disable THP system wise?
> > >
> > > I have to say that the THP tuning API is one giant mess :/
> > >
> > > Btw. this patch also seem to fix khugepaged behavior because it
> > > previously
> > > ignored both VM_NOHUGEPAGE and MMF_DISABLE_THP.
>
> Second look shows this is not ignored. hugepage_vma_check() would check this
> for both anonymous vma and shmem vma before scanning. It is called before
> shmem_huge_enabled().

Right. I have missed the earlier check. The main question remains
though.

--
Michal Hocko
SUSE Labs

2019-04-28 19:14:33

by Yang Shi

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility



On 4/23/19 10:52 AM, Michal Hocko wrote:
> On Wed 24-04-19 00:43:01, Yang Shi wrote:
>> The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
>> vma") introduced THPeligible bit for processes' smaps. But, when checking
>> the eligibility for shmem vma, __transparent_hugepage_enabled() is
>> called to override the result from shmem_huge_enabled(). It may result
>> in the anonymous vma's THP flag override shmem's. For example, running a
>> simple test which create THP for shmem, but with anonymous THP disabled,
>> when reading the process's smaps, it may show:
>>
>> 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
>> Size: 4096 kB
>> ...
>> [snip]
>> ...
>> ShmemPmdMapped: 4096 kB
>> ...
>> [snip]
>> ...
>> THPeligible: 0
>>
>> And, /proc/meminfo does show THP allocated and PMD mapped too:
>>
>> ShmemHugePages: 4096 kB
>> ShmemPmdMapped: 4096 kB
>>
>> This doesn't make too much sense. The anonymous THP flag should not
>> intervene shmem THP. Calling shmem_huge_enabled() with checking
>> MMF_DISABLE_THP sounds good enough. And, we could skip stack and
>> dax vma check since we already checked if the vma is shmem already.
> Kirill, can we get a confirmation that this is really intended behavior
> rather than an omission please? Is this documented? What is a global
> knob to simply disable THP system wise?

Hi Kirill,

Ping. Any comment?

Thanks,
Yang

>
> I have to say that the THP tuning API is one giant mess :/
>
> Btw. this patch also seem to fix khugepaged behavior because it previously
> ignored both VM_NOHUGEPAGE and MMF_DISABLE_THP.
>
>> Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma")
>> Cc: Michal Hocko <[email protected]>
>> Cc: Vlastimil Babka <[email protected]>
>> Cc: David Rientjes <[email protected]>
>> Cc: Kirill A. Shutemov <[email protected]>
>> Signed-off-by: Yang Shi <[email protected]>
>> ---
>> v2: Check VM_NOHUGEPAGE per Michal Hocko
>>
>> mm/huge_memory.c | 4 ++--
>> mm/shmem.c | 3 +++
>> 2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 165ea46..5881e82 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -67,8 +67,8 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>> {
>> if (vma_is_anonymous(vma))
>> return __transparent_hugepage_enabled(vma);
>> - if (vma_is_shmem(vma) && shmem_huge_enabled(vma))
>> - return __transparent_hugepage_enabled(vma);
>> + if (vma_is_shmem(vma))
>> + return shmem_huge_enabled(vma);
>>
>> return false;
>> }
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 2275a0f..6f09a31 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -3873,6 +3873,9 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
>> loff_t i_size;
>> pgoff_t off;
>>
>> + if ((vma->vm_flags & VM_NOHUGEPAGE) ||
>> + test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>> + return false;
>> if (shmem_huge == SHMEM_HUGE_FORCE)
>> return true;
>> if (shmem_huge == SHMEM_HUGE_DENY)
>> --
>> 1.8.3.1
>>

2019-05-06 23:39:00

by Yang Shi

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility



On 4/28/19 12:13 PM, Yang Shi wrote:
>
>
> On 4/23/19 10:52 AM, Michal Hocko wrote:
>> On Wed 24-04-19 00:43:01, Yang Shi wrote:
>>> The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for
>>> each
>>> vma") introduced THPeligible bit for processes' smaps. But, when
>>> checking
>>> the eligibility for shmem vma, __transparent_hugepage_enabled() is
>>> called to override the result from shmem_huge_enabled().  It may result
>>> in the anonymous vma's THP flag override shmem's.  For example,
>>> running a
>>> simple test which create THP for shmem, but with anonymous THP
>>> disabled,
>>> when reading the process's smaps, it may show:
>>>
>>> 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
>>> Size:               4096 kB
>>> ...
>>> [snip]
>>> ...
>>> ShmemPmdMapped:     4096 kB
>>> ...
>>> [snip]
>>> ...
>>> THPeligible:    0
>>>
>>> And, /proc/meminfo does show THP allocated and PMD mapped too:
>>>
>>> ShmemHugePages:     4096 kB
>>> ShmemPmdMapped:     4096 kB
>>>
>>> This doesn't make too much sense.  The anonymous THP flag should not
>>> intervene shmem THP.  Calling shmem_huge_enabled() with checking
>>> MMF_DISABLE_THP sounds good enough.  And, we could skip stack and
>>> dax vma check since we already checked if the vma is shmem already.
>> Kirill, can we get a confirmation that this is really intended behavior
>> rather than an omission please? Is this documented? What is a global
>> knob to simply disable THP system wise?
>
> Hi Kirill,
>
> Ping. Any comment?

Talked with Kirill at LSFMM, it sounds this is kind of intended behavior
according to him. But, we all agree it looks inconsistent.

So, we may have two options:
    - Just fix the false negative issue as what the patch does
    - Change the behavior to make it more consistent

I'm not sure whether anyone relies on the behavior explicitly or
implicitly or not.

If we would like to change the behavior, I may consider to take a step
further to refactor the code a little bit to use huge_fault() to handle
THP fault instead of falling back to handle_pte_fault() in the current
implementation. This may make adding THP for other filesystems easier.

>
> Thanks,
> Yang
>
>>
>> I have to say that the THP tuning API is one giant mess :/
>>
>> Btw. this patch also seem to fix khugepaged behavior because it
>> previously
>> ignored both VM_NOHUGEPAGE and MMF_DISABLE_THP.
>>
>>> Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
>>> vma")
>>> Cc: Michal Hocko <[email protected]>
>>> Cc: Vlastimil Babka <[email protected]>
>>> Cc: David Rientjes <[email protected]>
>>> Cc: Kirill A. Shutemov <[email protected]>
>>> Signed-off-by: Yang Shi <[email protected]>
>>> ---
>>> v2: Check VM_NOHUGEPAGE per Michal Hocko
>>>
>>>   mm/huge_memory.c | 4 ++--
>>>   mm/shmem.c       | 3 +++
>>>   2 files changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 165ea46..5881e82 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -67,8 +67,8 @@ bool transparent_hugepage_enabled(struct
>>> vm_area_struct *vma)
>>>   {
>>>       if (vma_is_anonymous(vma))
>>>           return __transparent_hugepage_enabled(vma);
>>> -    if (vma_is_shmem(vma) && shmem_huge_enabled(vma))
>>> -        return __transparent_hugepage_enabled(vma);
>>> +    if (vma_is_shmem(vma))
>>> +        return shmem_huge_enabled(vma);
>>>         return false;
>>>   }
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 2275a0f..6f09a31 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -3873,6 +3873,9 @@ bool shmem_huge_enabled(struct vm_area_struct
>>> *vma)
>>>       loff_t i_size;
>>>       pgoff_t off;
>>>   +    if ((vma->vm_flags & VM_NOHUGEPAGE) ||
>>> +        test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>>> +        return false;
>>>       if (shmem_huge == SHMEM_HUGE_FORCE)
>>>           return true;
>>>       if (shmem_huge == SHMEM_HUGE_DENY)
>>> --
>>> 1.8.3.1
>>>
>

2019-05-07 10:48:24

by Michal Hocko

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility


[Hmm, I thought, Hugh was CCed]

On Mon 06-05-19 16:37:42, Yang Shi wrote:
>
>
> On 4/28/19 12:13 PM, Yang Shi wrote:
> >
> >
> > On 4/23/19 10:52 AM, Michal Hocko wrote:
> > > On Wed 24-04-19 00:43:01, Yang Shi wrote:
> > > > The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility
> > > > for each
> > > > vma") introduced THPeligible bit for processes' smaps. But, when
> > > > checking
> > > > the eligibility for shmem vma, __transparent_hugepage_enabled() is
> > > > called to override the result from shmem_huge_enabled().? It may result
> > > > in the anonymous vma's THP flag override shmem's.? For example,
> > > > running a
> > > > simple test which create THP for shmem, but with anonymous THP
> > > > disabled,
> > > > when reading the process's smaps, it may show:
> > > >
> > > > 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
> > > > Size:?????????????? 4096 kB
> > > > ...
> > > > [snip]
> > > > ...
> > > > ShmemPmdMapped:???? 4096 kB
> > > > ...
> > > > [snip]
> > > > ...
> > > > THPeligible:??? 0
> > > >
> > > > And, /proc/meminfo does show THP allocated and PMD mapped too:
> > > >
> > > > ShmemHugePages:???? 4096 kB
> > > > ShmemPmdMapped:???? 4096 kB
> > > >
> > > > This doesn't make too much sense.? The anonymous THP flag should not
> > > > intervene shmem THP.? Calling shmem_huge_enabled() with checking
> > > > MMF_DISABLE_THP sounds good enough.? And, we could skip stack and
> > > > dax vma check since we already checked if the vma is shmem already.
> > > Kirill, can we get a confirmation that this is really intended behavior
> > > rather than an omission please? Is this documented? What is a global
> > > knob to simply disable THP system wise?
> >
> > Hi Kirill,
> >
> > Ping. Any comment?
>
> Talked with Kirill at LSFMM, it sounds this is kind of intended behavior
> according to him. But, we all agree it looks inconsistent.
>
> So, we may have two options:
> ??? - Just fix the false negative issue as what the patch does
> ??? - Change the behavior to make it more consistent
>
> I'm not sure whether anyone relies on the behavior explicitly or implicitly
> or not.

Well, I would be certainly more happy with a more consistent behavior.
Talked to Hugh at LSFMM about this and he finds treating shmem objects
separately from the anonymous memory. And that is already the case
partially when each mount point might have its own setup. So the primary
question is whether we need a one global knob to controll all THP
allocations. One argument to have that is that it might be helpful to
for an admin to simply disable source of THP at a single place rather
than crawling over all shmem mount points and remount them. Especially
in environments where shmem points are mounted in a container by a
non-root. Why would somebody wanted something like that? One example
would be to temporarily workaround high order allocations issues which
we have seen non trivial amount of in the past and we are likely not at
the end of the tunel.

That being said I would be in favor of treating the global sysfs knob to
be global for all THP allocations. I will not push back on that if there
is a general consensus that shmem and fs in general are a different
class of objects and a single global control is not desirable for
whatever reasons.

Kirill, Hugh othe folks?
--
Michal Hocko
SUSE Labs

2019-05-07 17:12:32

by Yang Shi

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility



On 5/7/19 3:47 AM, Michal Hocko wrote:
> [Hmm, I thought, Hugh was CCed]
>
> On Mon 06-05-19 16:37:42, Yang Shi wrote:
>>
>> On 4/28/19 12:13 PM, Yang Shi wrote:
>>>
>>> On 4/23/19 10:52 AM, Michal Hocko wrote:
>>>> On Wed 24-04-19 00:43:01, Yang Shi wrote:
>>>>> The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility
>>>>> for each
>>>>> vma") introduced THPeligible bit for processes' smaps. But, when
>>>>> checking
>>>>> the eligibility for shmem vma, __transparent_hugepage_enabled() is
>>>>> called to override the result from shmem_huge_enabled().  It may result
>>>>> in the anonymous vma's THP flag override shmem's.  For example,
>>>>> running a
>>>>> simple test which create THP for shmem, but with anonymous THP
>>>>> disabled,
>>>>> when reading the process's smaps, it may show:
>>>>>
>>>>> 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
>>>>> Size:               4096 kB
>>>>> ...
>>>>> [snip]
>>>>> ...
>>>>> ShmemPmdMapped:     4096 kB
>>>>> ...
>>>>> [snip]
>>>>> ...
>>>>> THPeligible:    0
>>>>>
>>>>> And, /proc/meminfo does show THP allocated and PMD mapped too:
>>>>>
>>>>> ShmemHugePages:     4096 kB
>>>>> ShmemPmdMapped:     4096 kB
>>>>>
>>>>> This doesn't make too much sense.  The anonymous THP flag should not
>>>>> intervene shmem THP.  Calling shmem_huge_enabled() with checking
>>>>> MMF_DISABLE_THP sounds good enough.  And, we could skip stack and
>>>>> dax vma check since we already checked if the vma is shmem already.
>>>> Kirill, can we get a confirmation that this is really intended behavior
>>>> rather than an omission please? Is this documented? What is a global
>>>> knob to simply disable THP system wise?
>>> Hi Kirill,
>>>
>>> Ping. Any comment?
>> Talked with Kirill at LSFMM, it sounds this is kind of intended behavior
>> according to him. But, we all agree it looks inconsistent.
>>
>> So, we may have two options:
>>     - Just fix the false negative issue as what the patch does
>>     - Change the behavior to make it more consistent
>>
>> I'm not sure whether anyone relies on the behavior explicitly or implicitly
>> or not.
> Well, I would be certainly more happy with a more consistent behavior.
> Talked to Hugh at LSFMM about this and he finds treating shmem objects
> separately from the anonymous memory. And that is already the case
> partially when each mount point might have its own setup. So the primary
> question is whether we need a one global knob to controll all THP
> allocations. One argument to have that is that it might be helpful to
> for an admin to simply disable source of THP at a single place rather
> than crawling over all shmem mount points and remount them. Especially
> in environments where shmem points are mounted in a container by a
> non-root. Why would somebody wanted something like that? One example
> would be to temporarily workaround high order allocations issues which
> we have seen non trivial amount of in the past and we are likely not at
> the end of the tunel.

Shmem has a global control for such use. Setting shmem_enabled to
"force" or "deny" would enable or disable THP for shmem globally,
including non-fs objects, i.e. memfd, SYS V shmem, etc.

>
> That being said I would be in favor of treating the global sysfs knob to
> be global for all THP allocations. I will not push back on that if there
> is a general consensus that shmem and fs in general are a different
> class of objects and a single global control is not desirable for
> whatever reasons.

OK, we need more inputs from Kirill, Hugh and other folks.

>
> Kirill, Hugh othe folks?

2019-06-06 20:18:17

by Yang Shi

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility



On 5/7/19 10:10 AM, Yang Shi wrote:
>
>
> On 5/7/19 3:47 AM, Michal Hocko wrote:
>> [Hmm, I thought, Hugh was CCed]
>>
>> On Mon 06-05-19 16:37:42, Yang Shi wrote:
>>>
>>> On 4/28/19 12:13 PM, Yang Shi wrote:
>>>>
>>>> On 4/23/19 10:52 AM, Michal Hocko wrote:
>>>>> On Wed 24-04-19 00:43:01, Yang Shi wrote:
>>>>>> The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility
>>>>>> for each
>>>>>> vma") introduced THPeligible bit for processes' smaps. But, when
>>>>>> checking
>>>>>> the eligibility for shmem vma, __transparent_hugepage_enabled() is
>>>>>> called to override the result from shmem_huge_enabled().  It may
>>>>>> result
>>>>>> in the anonymous vma's THP flag override shmem's.  For example,
>>>>>> running a
>>>>>> simple test which create THP for shmem, but with anonymous THP
>>>>>> disabled,
>>>>>> when reading the process's smaps, it may show:
>>>>>>
>>>>>> 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
>>>>>> Size:               4096 kB
>>>>>> ...
>>>>>> [snip]
>>>>>> ...
>>>>>> ShmemPmdMapped:     4096 kB
>>>>>> ...
>>>>>> [snip]
>>>>>> ...
>>>>>> THPeligible:    0
>>>>>>
>>>>>> And, /proc/meminfo does show THP allocated and PMD mapped too:
>>>>>>
>>>>>> ShmemHugePages:     4096 kB
>>>>>> ShmemPmdMapped:     4096 kB
>>>>>>
>>>>>> This doesn't make too much sense.  The anonymous THP flag should not
>>>>>> intervene shmem THP.  Calling shmem_huge_enabled() with checking
>>>>>> MMF_DISABLE_THP sounds good enough.  And, we could skip stack and
>>>>>> dax vma check since we already checked if the vma is shmem already.
>>>>> Kirill, can we get a confirmation that this is really intended
>>>>> behavior
>>>>> rather than an omission please? Is this documented? What is a global
>>>>> knob to simply disable THP system wise?
>>>> Hi Kirill,
>>>>
>>>> Ping. Any comment?
>>> Talked with Kirill at LSFMM, it sounds this is kind of intended
>>> behavior
>>> according to him. But, we all agree it looks inconsistent.
>>>
>>> So, we may have two options:
>>>      - Just fix the false negative issue as what the patch does
>>>      - Change the behavior to make it more consistent
>>>
>>> I'm not sure whether anyone relies on the behavior explicitly or
>>> implicitly
>>> or not.
>> Well, I would be certainly more happy with a more consistent behavior.
>> Talked to Hugh at LSFMM about this and he finds treating shmem objects
>> separately from the anonymous memory. And that is already the case
>> partially when each mount point might have its own setup. So the primary
>> question is whether we need a one global knob to controll all THP
>> allocations. One argument to have that is that it might be helpful to
>> for an admin to simply disable source of THP at a single place rather
>> than crawling over all shmem mount points and remount them. Especially
>> in environments where shmem points are mounted in a container by a
>> non-root. Why would somebody wanted something like that? One example
>> would be to temporarily workaround high order allocations issues which
>> we have seen non trivial amount of in the past and we are likely not at
>> the end of the tunel.
>
> Shmem has a global control for such use. Setting shmem_enabled to
> "force" or "deny" would enable or disable THP for shmem globally,
> including non-fs objects, i.e. memfd, SYS V shmem, etc.
>
>>
>> That being said I would be in favor of treating the global sysfs knob to
>> be global for all THP allocations. I will not push back on that if there
>> is a general consensus that shmem and fs in general are a different
>> class of objects and a single global control is not desirable for
>> whatever reasons.
>
> OK, we need more inputs from Kirill, Hugh and other folks.

[Forgot cc to mailing lists]

Hi guys,

How should we move forward for this one? Make the sysfs knob
(/sys/kernel/mm/transparent_hugepage/enabled) to be global for both
anonymous and tmpfs? Or just treat shmem objects separately from anon
memory then fix the false-negative of THP eligibility by this patch?

>
>>
>> Kirill, Hugh othe folks?
>

2019-06-07 11:56:26

by Hugh Dickins

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility

On Thu, 6 Jun 2019, Yang Shi wrote:
> On 5/7/19 10:10 AM, Yang Shi wrote:
> > On 5/7/19 3:47 AM, Michal Hocko wrote:
> > > [Hmm, I thought, Hugh was CCed]
> > >
> > > On Mon 06-05-19 16:37:42, Yang Shi wrote:
> > > >
> > > > On 4/28/19 12:13 PM, Yang Shi wrote:
> > > > >
> > > > > On 4/23/19 10:52 AM, Michal Hocko wrote:
> > > > > > On Wed 24-04-19 00:43:01, Yang Shi wrote:
> > > > > > > The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility
> > > > > > > for each
> > > > > > > vma") introduced THPeligible bit for processes' smaps. But, when
> > > > > > > checking
> > > > > > > the eligibility for shmem vma, __transparent_hugepage_enabled()
> > > > > > > is
> > > > > > > called to override the result from shmem_huge_enabled().  It may
> > > > > > > result
> > > > > > > in the anonymous vma's THP flag override shmem's.  For example,
> > > > > > > running a
> > > > > > > simple test which create THP for shmem, but with anonymous THP
> > > > > > > disabled,
> > > > > > > when reading the process's smaps, it may show:
> > > > > > >
> > > > > > > 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
> > > > > > > Size:               4096 kB
> > > > > > > ...
> > > > > > > [snip]
> > > > > > > ...
> > > > > > > ShmemPmdMapped:     4096 kB
> > > > > > > ...
> > > > > > > [snip]
> > > > > > > ...
> > > > > > > THPeligible:    0
> > > > > > >
> > > > > > > And, /proc/meminfo does show THP allocated and PMD mapped too:
> > > > > > >
> > > > > > > ShmemHugePages:     4096 kB
> > > > > > > ShmemPmdMapped:     4096 kB
> > > > > > >
> > > > > > > This doesn't make too much sense.  The anonymous THP flag should
> > > > > > > not
> > > > > > > intervene shmem THP.  Calling shmem_huge_enabled() with checking
> > > > > > > MMF_DISABLE_THP sounds good enough.  And, we could skip stack and
> > > > > > > dax vma check since we already checked if the vma is shmem
> > > > > > > already.
> > > > > > Kirill, can we get a confirmation that this is really intended
> > > > > > behavior
> > > > > > rather than an omission please? Is this documented? What is a
> > > > > > global
> > > > > > knob to simply disable THP system wise?
> > > > > Hi Kirill,
> > > > >
> > > > > Ping. Any comment?
> > > > Talked with Kirill at LSFMM, it sounds this is kind of intended
> > > > behavior
> > > > according to him. But, we all agree it looks inconsistent.
> > > >
> > > > So, we may have two options:
> > > >      - Just fix the false negative issue as what the patch does
> > > >      - Change the behavior to make it more consistent
> > > >
> > > > I'm not sure whether anyone relies on the behavior explicitly or
> > > > implicitly
> > > > or not.
> > > Well, I would be certainly more happy with a more consistent behavior.
> > > Talked to Hugh at LSFMM about this and he finds treating shmem objects
> > > separately from the anonymous memory. And that is already the case
> > > partially when each mount point might have its own setup. So the primary
> > > question is whether we need a one global knob to controll all THP
> > > allocations. One argument to have that is that it might be helpful to
> > > for an admin to simply disable source of THP at a single place rather
> > > than crawling over all shmem mount points and remount them. Especially
> > > in environments where shmem points are mounted in a container by a
> > > non-root. Why would somebody wanted something like that? One example
> > > would be to temporarily workaround high order allocations issues which
> > > we have seen non trivial amount of in the past and we are likely not at
> > > the end of the tunel.
> >
> > Shmem has a global control for such use. Setting shmem_enabled to "force"
> > or "deny" would enable or disable THP for shmem globally, including non-fs
> > objects, i.e. memfd, SYS V shmem, etc.
> >
> > >
> > > That being said I would be in favor of treating the global sysfs knob to
> > > be global for all THP allocations. I will not push back on that if there
> > > is a general consensus that shmem and fs in general are a different
> > > class of objects and a single global control is not desirable for
> > > whatever reasons.
> >
> > OK, we need more inputs from Kirill, Hugh and other folks.
>
> [Forgot cc to mailing lists]
>
> Hi guys,
>
> How should we move forward for this one? Make the sysfs knob
> (/sys/kernel/mm/transparent_hugepage/enabled) to be global for both anonymous
> and tmpfs? Or just treat shmem objects separately from anon memory then fix
> the false-negative of THP eligibility by this patch?

Sorry for not getting back to you sooner on this.

I don't like to drive design by smaps. I agree with the word "mess" used
several times of THP tunings in this thread, but it's too easy to make
that mess worse by unnecessary changes, so I'm very cautious here.

The addition of "THPeligible" without an "Anon" in its name was
unfortunate. I suppose we're two releases too late to change that.

Applying process (PR_SET_THP_DISABLE) and mm (MADV_*HUGEPAGE)
limitations to shared filesystem objects doesn't work all that well.

I recommend that you continue to treat shmem objects separately from
anon memory, and just make the smaps "THPeligible" more often accurate.

Is your v2 patch earlier in this thread the best for that?
No answer tonight, I'll re-examine later in the day.

Hugh

2019-06-07 14:27:24

by Michal Hocko

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility

On Fri 07-06-19 03:57:18, Hugh Dickins wrote:
[...]
> The addition of "THPeligible" without an "Anon" in its name was
> unfortunate. I suppose we're two releases too late to change that.

Well, I do not really see any reason why THPeligible should be Anon
specific at all. Even if ...

> Applying process (PR_SET_THP_DISABLE) and mm (MADV_*HUGEPAGE)
> limitations to shared filesystem objects doesn't work all that well.

... this is what we are going with then it is really important to have a
single place to query the eligibility IMHO.

> I recommend that you continue to treat shmem objects separately from
> anon memory, and just make the smaps "THPeligible" more often accurate.

Agreed on this.

--
Michal Hocko
SUSE Labs

2019-06-07 19:03:51

by Yang Shi

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility



On 6/7/19 3:57 AM, Hugh Dickins wrote:
> On Thu, 6 Jun 2019, Yang Shi wrote:
>> On 5/7/19 10:10 AM, Yang Shi wrote:
>>> On 5/7/19 3:47 AM, Michal Hocko wrote:
>>>> [Hmm, I thought, Hugh was CCed]
>>>>
>>>> On Mon 06-05-19 16:37:42, Yang Shi wrote:
>>>>> On 4/28/19 12:13 PM, Yang Shi wrote:
>>>>>> On 4/23/19 10:52 AM, Michal Hocko wrote:
>>>>>>> On Wed 24-04-19 00:43:01, Yang Shi wrote:
>>>>>>>> The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility
>>>>>>>> for each
>>>>>>>> vma") introduced THPeligible bit for processes' smaps. But, when
>>>>>>>> checking
>>>>>>>> the eligibility for shmem vma, __transparent_hugepage_enabled()
>>>>>>>> is
>>>>>>>> called to override the result from shmem_huge_enabled().  It may
>>>>>>>> result
>>>>>>>> in the anonymous vma's THP flag override shmem's.  For example,
>>>>>>>> running a
>>>>>>>> simple test which create THP for shmem, but with anonymous THP
>>>>>>>> disabled,
>>>>>>>> when reading the process's smaps, it may show:
>>>>>>>>
>>>>>>>> 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
>>>>>>>> Size:               4096 kB
>>>>>>>> ...
>>>>>>>> [snip]
>>>>>>>> ...
>>>>>>>> ShmemPmdMapped:     4096 kB
>>>>>>>> ...
>>>>>>>> [snip]
>>>>>>>> ...
>>>>>>>> THPeligible:    0
>>>>>>>>
>>>>>>>> And, /proc/meminfo does show THP allocated and PMD mapped too:
>>>>>>>>
>>>>>>>> ShmemHugePages:     4096 kB
>>>>>>>> ShmemPmdMapped:     4096 kB
>>>>>>>>
>>>>>>>> This doesn't make too much sense.  The anonymous THP flag should
>>>>>>>> not
>>>>>>>> intervene shmem THP.  Calling shmem_huge_enabled() with checking
>>>>>>>> MMF_DISABLE_THP sounds good enough.  And, we could skip stack and
>>>>>>>> dax vma check since we already checked if the vma is shmem
>>>>>>>> already.
>>>>>>> Kirill, can we get a confirmation that this is really intended
>>>>>>> behavior
>>>>>>> rather than an omission please? Is this documented? What is a
>>>>>>> global
>>>>>>> knob to simply disable THP system wise?
>>>>>> Hi Kirill,
>>>>>>
>>>>>> Ping. Any comment?
>>>>> Talked with Kirill at LSFMM, it sounds this is kind of intended
>>>>> behavior
>>>>> according to him. But, we all agree it looks inconsistent.
>>>>>
>>>>> So, we may have two options:
>>>>>      - Just fix the false negative issue as what the patch does
>>>>>      - Change the behavior to make it more consistent
>>>>>
>>>>> I'm not sure whether anyone relies on the behavior explicitly or
>>>>> implicitly
>>>>> or not.
>>>> Well, I would be certainly more happy with a more consistent behavior.
>>>> Talked to Hugh at LSFMM about this and he finds treating shmem objects
>>>> separately from the anonymous memory. And that is already the case
>>>> partially when each mount point might have its own setup. So the primary
>>>> question is whether we need a one global knob to controll all THP
>>>> allocations. One argument to have that is that it might be helpful to
>>>> for an admin to simply disable source of THP at a single place rather
>>>> than crawling over all shmem mount points and remount them. Especially
>>>> in environments where shmem points are mounted in a container by a
>>>> non-root. Why would somebody wanted something like that? One example
>>>> would be to temporarily workaround high order allocations issues which
>>>> we have seen non trivial amount of in the past and we are likely not at
>>>> the end of the tunel.
>>> Shmem has a global control for such use. Setting shmem_enabled to "force"
>>> or "deny" would enable or disable THP for shmem globally, including non-fs
>>> objects, i.e. memfd, SYS V shmem, etc.
>>>
>>>> That being said I would be in favor of treating the global sysfs knob to
>>>> be global for all THP allocations. I will not push back on that if there
>>>> is a general consensus that shmem and fs in general are a different
>>>> class of objects and a single global control is not desirable for
>>>> whatever reasons.
>>> OK, we need more inputs from Kirill, Hugh and other folks.
>> [Forgot cc to mailing lists]
>>
>> Hi guys,
>>
>> How should we move forward for this one? Make the sysfs knob
>> (/sys/kernel/mm/transparent_hugepage/enabled) to be global for both anonymous
>> and tmpfs? Or just treat shmem objects separately from anon memory then fix
>> the false-negative of THP eligibility by this patch?
> Sorry for not getting back to you sooner on this.
>
> I don't like to drive design by smaps. I agree with the word "mess" used
> several times of THP tunings in this thread, but it's too easy to make
> that mess worse by unnecessary changes, so I'm very cautious here.
>
> The addition of "THPeligible" without an "Anon" in its name was
> unfortunate. I suppose we're two releases too late to change that.

The smaps shows it is anon vma or shmem vma for the most cases.

>
> Applying process (PR_SET_THP_DISABLE) and mm (MADV_*HUGEPAGE)
> limitations to shared filesystem objects doesn't work all that well.

The THP eligibility indicator is per vma, it just reports whether THP is
eligible for a specific vma. So, I'm supposed it should keep consistent
with MMF_DISABLE_THP and MADV_*HUGEPAGE setting.

The current implementation in shmem and kuhugepaged also checks these.

>
> I recommend that you continue to treat shmem objects separately from
> anon memory, and just make the smaps "THPeligible" more often accurate.
>
> Is your v2 patch earlier in this thread the best for that?

The v2 patch treats shmem objects separately from anon memory and it
makes the "THPeligible" more often accurate.

> No answer tonight, I'll re-examine later in the day.
>
> Hugh

2019-06-08 04:32:22

by Hugh Dickins

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility

On Wed, 24 Apr 2019, Yang Shi wrote:

> The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
> vma") introduced THPeligible bit for processes' smaps. But, when checking
> the eligibility for shmem vma, __transparent_hugepage_enabled() is
> called to override the result from shmem_huge_enabled(). It may result
> in the anonymous vma's THP flag override shmem's. For example, running a
> simple test which create THP for shmem, but with anonymous THP disabled,
> when reading the process's smaps, it may show:
>
> 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
> Size: 4096 kB
> ...
> [snip]
> ...
> ShmemPmdMapped: 4096 kB
> ...
> [snip]
> ...
> THPeligible: 0
>
> And, /proc/meminfo does show THP allocated and PMD mapped too:
>
> ShmemHugePages: 4096 kB
> ShmemPmdMapped: 4096 kB
>
> This doesn't make too much sense. The anonymous THP flag should not
> intervene shmem THP. Calling shmem_huge_enabled() with checking
> MMF_DISABLE_THP sounds good enough. And, we could skip stack and
> dax vma check since we already checked if the vma is shmem already.
>
> Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma")
> Cc: Michal Hocko <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: Kirill A. Shutemov <[email protected]>
> Signed-off-by: Yang Shi <[email protected]>
> ---
> v2: Check VM_NOHUGEPAGE per Michal Hocko
>
> mm/huge_memory.c | 4 ++--
> mm/shmem.c | 3 +++
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 165ea46..5881e82 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -67,8 +67,8 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma)
> {
> if (vma_is_anonymous(vma))
> return __transparent_hugepage_enabled(vma);
> - if (vma_is_shmem(vma) && shmem_huge_enabled(vma))
> - return __transparent_hugepage_enabled(vma);
> + if (vma_is_shmem(vma))
> + return shmem_huge_enabled(vma);
>
> return false;
> }
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 2275a0f..6f09a31 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3873,6 +3873,9 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
> loff_t i_size;
> pgoff_t off;
>
> + if ((vma->vm_flags & VM_NOHUGEPAGE) ||
> + test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> + return false;

Yes, that is correct; and correctly placed. But a little more is needed:
see how mm/memory.c's transhuge_vma_suitable() will only allow a pmd to
be used instead of a pte if the vma offset and size permit. smaps should
not report a shmem vma as THPeligible if its offset or size prevent it.

And I see that should also be fixed on anon vmas: at present smaps
reports even a 4kB anon vma as THPeligible, which is not right.
Maybe a test like transhuge_vma_suitable() can be added into
transparent_hugepage_enabled(), to handle anon and shmem together.
I say "like transhuge_vma_suitable()", because that function needs
an address, which here you don't have.

The anon offset situation is interesting: usually anon vm_pgoff is
initialized to fit with its vm_start, so the anon offset check passes;
but I wonder what happens after mremap to a different address - does
transhuge_vma_suitable() then prevent the use of pmds where they could
actually be used? Not a Number#1 priority to investigate or fix here!
but a curiosity someone might want to look into.

> if (shmem_huge == SHMEM_HUGE_FORCE)
> return true;
> if (shmem_huge == SHMEM_HUGE_DENY)
> --
> 1.8.3.1


Even with your changes
ShmemPmdMapped: 4096 kB
THPeligible: 0
will easily be seen: THPeligible reflects whether a huge page can be
allocated and mapped by pmd in that vma; but if something else already
allocated the huge page earlier, it will be mapped by pmd in this vma
if offset and size allow, whatever THPeligible says. We could change
transhuge_vma_suitable() to force ptes in that case, but it would be
a silly change, just to make what smaps shows easier to explain.

Hugh

2019-06-10 17:33:57

by Yang Shi

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility



On 6/7/19 8:58 PM, Hugh Dickins wrote:
> On Wed, 24 Apr 2019, Yang Shi wrote:
>
>> The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
>> vma") introduced THPeligible bit for processes' smaps. But, when checking
>> the eligibility for shmem vma, __transparent_hugepage_enabled() is
>> called to override the result from shmem_huge_enabled(). It may result
>> in the anonymous vma's THP flag override shmem's. For example, running a
>> simple test which create THP for shmem, but with anonymous THP disabled,
>> when reading the process's smaps, it may show:
>>
>> 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
>> Size: 4096 kB
>> ...
>> [snip]
>> ...
>> ShmemPmdMapped: 4096 kB
>> ...
>> [snip]
>> ...
>> THPeligible: 0
>>
>> And, /proc/meminfo does show THP allocated and PMD mapped too:
>>
>> ShmemHugePages: 4096 kB
>> ShmemPmdMapped: 4096 kB
>>
>> This doesn't make too much sense. The anonymous THP flag should not
>> intervene shmem THP. Calling shmem_huge_enabled() with checking
>> MMF_DISABLE_THP sounds good enough. And, we could skip stack and
>> dax vma check since we already checked if the vma is shmem already.
>>
>> Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma")
>> Cc: Michal Hocko <[email protected]>
>> Cc: Vlastimil Babka <[email protected]>
>> Cc: David Rientjes <[email protected]>
>> Cc: Kirill A. Shutemov <[email protected]>
>> Signed-off-by: Yang Shi <[email protected]>
>> ---
>> v2: Check VM_NOHUGEPAGE per Michal Hocko
>>
>> mm/huge_memory.c | 4 ++--
>> mm/shmem.c | 3 +++
>> 2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 165ea46..5881e82 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -67,8 +67,8 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>> {
>> if (vma_is_anonymous(vma))
>> return __transparent_hugepage_enabled(vma);
>> - if (vma_is_shmem(vma) && shmem_huge_enabled(vma))
>> - return __transparent_hugepage_enabled(vma);
>> + if (vma_is_shmem(vma))
>> + return shmem_huge_enabled(vma);
>>
>> return false;
>> }
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 2275a0f..6f09a31 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -3873,6 +3873,9 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
>> loff_t i_size;
>> pgoff_t off;
>>
>> + if ((vma->vm_flags & VM_NOHUGEPAGE) ||
>> + test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>> + return false;
> Yes, that is correct; and correctly placed. But a little more is needed:
> see how mm/memory.c's transhuge_vma_suitable() will only allow a pmd to
> be used instead of a pte if the vma offset and size permit. smaps should
> not report a shmem vma as THPeligible if its offset or size prevent it.
>
> And I see that should also be fixed on anon vmas: at present smaps
> reports even a 4kB anon vma as THPeligible, which is not right.
> Maybe a test like transhuge_vma_suitable() can be added into
> transparent_hugepage_enabled(), to handle anon and shmem together.
> I say "like transhuge_vma_suitable()", because that function needs
> an address, which here you don't have.

Thanks for the remind. Since we don't have an address I'm supposed we
just need check if the vma's size is big enough or not other than other
alignment check.

And, I'm wondering whether we could reuse transhuge_vma_suitable() by
passing in an impossible address, i.e. -1 since it is not a valid
userspace address. It can be used as and indicator that this call is
from THPeligible context.

>
> The anon offset situation is interesting: usually anon vm_pgoff is
> initialized to fit with its vm_start, so the anon offset check passes;
> but I wonder what happens after mremap to a different address - does
> transhuge_vma_suitable() then prevent the use of pmds where they could
> actually be used? Not a Number#1 priority to investigate or fix here!
> but a curiosity someone might want to look into.

Will mark on my TODO list.

>
>> if (shmem_huge == SHMEM_HUGE_FORCE)
>> return true;
>> if (shmem_huge == SHMEM_HUGE_DENY)
>> --
>> 1.8.3.1
>
> Even with your changes
> ShmemPmdMapped: 4096 kB
> THPeligible: 0
> will easily be seen: THPeligible reflects whether a huge page can be
> allocated and mapped by pmd in that vma; but if something else already
> allocated the huge page earlier, it will be mapped by pmd in this vma
> if offset and size allow, whatever THPeligible says. We could change
> transhuge_vma_suitable() to force ptes in that case, but it would be
> a silly change, just to make what smaps shows easier to explain.

Where did this come from? From the commit log? If so it is the example
for the wrong smap output. If that case really happens, I think we could
document it since THPeligible should just show the current status.

>
> Hugh

2019-06-12 18:47:09

by Hugh Dickins

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility

On Mon, 10 Jun 2019, Yang Shi wrote:
> On 6/7/19 8:58 PM, Hugh Dickins wrote:
> > Yes, that is correct; and correctly placed. But a little more is needed:
> > see how mm/memory.c's transhuge_vma_suitable() will only allow a pmd to
> > be used instead of a pte if the vma offset and size permit. smaps should
> > not report a shmem vma as THPeligible if its offset or size prevent it.
> >
> > And I see that should also be fixed on anon vmas: at present smaps
> > reports even a 4kB anon vma as THPeligible, which is not right.
> > Maybe a test like transhuge_vma_suitable() can be added into
> > transparent_hugepage_enabled(), to handle anon and shmem together.
> > I say "like transhuge_vma_suitable()", because that function needs
> > an address, which here you don't have.
>
> Thanks for the remind. Since we don't have an address I'm supposed we just
> need check if the vma's size is big enough or not other than other alignment
> check.
>
> And, I'm wondering whether we could reuse transhuge_vma_suitable() by passing
> in an impossible address, i.e. -1 since it is not a valid userspace address.
> It can be used as and indicator that this call is from THPeligible context.

Perhaps, but sounds like it will abuse and uglify transhuge_vma_suitable()
just for smaps. Would passing transhuge_vma_suitable() the address
((vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE)
give the the correct answer in all cases?

> >
> > The anon offset situation is interesting: usually anon vm_pgoff is
> > initialized to fit with its vm_start, so the anon offset check passes;
> > but I wonder what happens after mremap to a different address - does
> > transhuge_vma_suitable() then prevent the use of pmds where they could
> > actually be used? Not a Number#1 priority to investigate or fix here!
> > but a curiosity someone might want to look into.
>
> Will mark on my TODO list.
>
> > Even with your changes
> > ShmemPmdMapped: 4096 kB
> > THPeligible: 0
> > will easily be seen: THPeligible reflects whether a huge page can be
> > allocated and mapped by pmd in that vma; but if something else already
> > allocated the huge page earlier, it will be mapped by pmd in this vma
> > if offset and size allow, whatever THPeligible says. We could change
> > transhuge_vma_suitable() to force ptes in that case, but it would be
> > a silly change, just to make what smaps shows easier to explain.
>
> Where did this come from? From the commit log? If so it is the example for
> the wrong smap output. If that case really happens, I think we could document
> it since THPeligible should just show the current status.

Please read again what I explained there: it's not necessarily an example
of wrong smaps output, it's reasonable smaps output for a reasonable case.

Yes, maybe Documentation/filesystems/proc.txt should explain "THPeligble"
a little better - "eligible for allocating THP pages" rather than just
"eligible for THP pages" would be good enough? we don't want to write
a book about the various cases.

Oh, and the "THPeligible" output lines up very nicely there in proc.txt:
could the actual alignment of that 0 or 1 be fixed in smaps itself too?

Thanks,
Hugh

2019-06-12 19:59:58

by Yang Shi

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: thp: fix false negative of shmem vma's THP eligibility



On 6/12/19 11:44 AM, Hugh Dickins wrote:
> On Mon, 10 Jun 2019, Yang Shi wrote:
>> On 6/7/19 8:58 PM, Hugh Dickins wrote:
>>> Yes, that is correct; and correctly placed. But a little more is needed:
>>> see how mm/memory.c's transhuge_vma_suitable() will only allow a pmd to
>>> be used instead of a pte if the vma offset and size permit. smaps should
>>> not report a shmem vma as THPeligible if its offset or size prevent it.
>>>
>>> And I see that should also be fixed on anon vmas: at present smaps
>>> reports even a 4kB anon vma as THPeligible, which is not right.
>>> Maybe a test like transhuge_vma_suitable() can be added into
>>> transparent_hugepage_enabled(), to handle anon and shmem together.
>>> I say "like transhuge_vma_suitable()", because that function needs
>>> an address, which here you don't have.
>> Thanks for the remind. Since we don't have an address I'm supposed we just
>> need check if the vma's size is big enough or not other than other alignment
>> check.
>>
>> And, I'm wondering whether we could reuse transhuge_vma_suitable() by passing
>> in an impossible address, i.e. -1 since it is not a valid userspace address.
>> It can be used as and indicator that this call is from THPeligible context.
> Perhaps, but sounds like it will abuse and uglify transhuge_vma_suitable()
> just for smaps. Would passing transhuge_vma_suitable() the address
> ((vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE)
> give the the correct answer in all cases?

Yes, it looks better.

>
>>> The anon offset situation is interesting: usually anon vm_pgoff is
>>> initialized to fit with its vm_start, so the anon offset check passes;
>>> but I wonder what happens after mremap to a different address - does
>>> transhuge_vma_suitable() then prevent the use of pmds where they could
>>> actually be used? Not a Number#1 priority to investigate or fix here!
>>> but a curiosity someone might want to look into.
>> Will mark on my TODO list.
>>
>>> Even with your changes
>>> ShmemPmdMapped: 4096 kB
>>> THPeligible: 0
>>> will easily be seen: THPeligible reflects whether a huge page can be
>>> allocated and mapped by pmd in that vma; but if something else already
>>> allocated the huge page earlier, it will be mapped by pmd in this vma
>>> if offset and size allow, whatever THPeligible says. We could change
>>> transhuge_vma_suitable() to force ptes in that case, but it would be
>>> a silly change, just to make what smaps shows easier to explain.
>> Where did this come from? From the commit log? If so it is the example for
>> the wrong smap output. If that case really happens, I think we could document
>> it since THPeligible should just show the current status.
> Please read again what I explained there: it's not necessarily an example
> of wrong smaps output, it's reasonable smaps output for a reasonable case.
>
> Yes, maybe Documentation/filesystems/proc.txt should explain "THPeligble"
> a little better - "eligible for allocating THP pages" rather than just
> "eligible for THP pages" would be good enough? we don't want to write
> a book about the various cases.

Yes, I agree.

>
> Oh, and the "THPeligible" output lines up very nicely there in proc.txt:
> could the actual alignment of that 0 or 1 be fixed in smaps itself too?

Sure.

Thanks,
Yang

>
> Thanks,
> Hugh