2020-04-02 02:02:44

by Huang, Ying

Subject: [PATCH -V2] /proc/PID/smaps: Add PMD migration entry parsing

From: Huang Ying <[email protected]>

Currently, when /proc/PID/smaps is read, PMD migration entries in the page
table are simply ignored. To improve the accuracy of /proc/PID/smaps, parse
and account for them.

Before the patch, for a fully populated 400 MB anonymous VMA, THP pages under
migration are sometimes not counted, as follows.

7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0
Size: 409600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 407552 kB
Pss: 407552 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 407552 kB
Referenced: 301056 kB
Anonymous: 407552 kB
LazyFree: 0 kB
AnonHugePages: 405504 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 1
VmFlags: rd wr mr mw me ac

After the patch, the output is always:

7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0
Size: 409600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 409600 kB
Pss: 409600 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 409600 kB
Referenced: 294912 kB
Anonymous: 409600 kB
LazyFree: 0 kB
AnonHugePages: 407552 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 1
VmFlags: rd wr mr mw me ac

Signed-off-by: "Huang, Ying" <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Yang Shi <[email protected]>
---

v2:

- Use thp_migration_supported() in the condition to reduce code size if THP
migration isn't enabled.

- Replace VM_BUG_ON() with VM_WARN_ON_ONCE(); it's not necessary to nuke the
kernel for this.

---
fs/proc/task_mmu.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 8d382d4ec067..9c72f9ce2dd8 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -546,10 +546,19 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
 	struct mem_size_stats *mss = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	bool locked = !!(vma->vm_flags & VM_LOCKED);
-	struct page *page;
+	struct page *page = NULL;

-	/* FOLL_DUMP will return -EFAULT on huge zero page */
-	page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
+	if (pmd_present(*pmd)) {
+		/* FOLL_DUMP will return -EFAULT on huge zero page */
+		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
+	} else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
+		swp_entry_t entry = pmd_to_swp_entry(*pmd);
+
+		if (is_migration_entry(entry))
+			page = migration_entry_to_page(entry);
+		else
+			VM_WARN_ON_ONCE(1);
+	}
 	if (IS_ERR_OR_NULL(page))
 		return;
 	if (PageAnon(page))
@@ -578,8 +587,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,

 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
-		if (pmd_present(*pmd))
-			smaps_pmd_entry(pmd, addr, walk);
+		smaps_pmd_entry(pmd, addr, walk);
 		spin_unlock(ptl);
 		goto out;
 	}
--
2.25.0
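
Editorial note: in the "before" output in the changelog, Rss and AnonHugePages are each 2048 kB lower than in the "after" output. The sketch below works through that arithmetic in userspace C. It assumes a PMD-mapped THP is 2048 kB (the usual x86-64 value with 4 kB base pages); the helper names smaps_field_kb() and missing_thps() are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: one PMD-mapped THP is 2048 kB (x86-64, 4 kB base pages). */
#define THP_SIZE_KB 2048L

/*
 * Parse one smaps line such as "Rss: 407552 kB".
 * Returns the value in kB, or -1 if the line does not carry `field`.
 */
long smaps_field_kb(const char *line, const char *field)
{
	size_t n;
	long kb;

	assert(line != NULL && field != NULL);
	n = strlen(field);
	/* The field name must be followed directly by ':'. */
	if (strncmp(line, field, n) != 0 || line[n] != ':')
		return -1;
	if (sscanf(line + n + 1, "%ld", &kb) != 1)
		return -1;
	return kb;
}

/* Number of whole THPs that account for the gap between Size and Rss. */
long missing_thps(long size_kb, long rss_kb)
{
	return (size_kb - rss_kb) / THP_SIZE_KB;
}
```

With the values from the changelog, missing_thps(409600, 407552) is 1, which is consistent with a single huge page being under migration at the moment smaps was read.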


2020-04-02 06:45:26

by Michal Hocko

Subject: Re: [PATCH -V2] /proc/PID/smaps: Add PMD migration entry parsing

On Thu 02-04-20 10:00:31, Huang, Ying wrote:
> From: Huang Ying <[email protected]>
>
> Now, when read /proc/PID/smaps, the PMD migration entry in page table is simply
> ignored. To improve the accuracy of /proc/PID/smaps, its parsing and processing
> is added.
>
> Before the patch, for a fully populated 400 MB anonymous VMA, sometimes some THP
> pages under migration may be lost as follows.

Interesting. How did you reproduce this?
[...]

> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 8d382d4ec067..9c72f9ce2dd8 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -546,10 +546,19 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
> struct mem_size_stats *mss = walk->private;
> struct vm_area_struct *vma = walk->vma;
> bool locked = !!(vma->vm_flags & VM_LOCKED);
> - struct page *page;
> + struct page *page = NULL;
>
> - /* FOLL_DUMP will return -EFAULT on huge zero page */
> - page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
> + if (pmd_present(*pmd)) {
> + /* FOLL_DUMP will return -EFAULT on huge zero page */
> + page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
> + } else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
> + swp_entry_t entry = pmd_to_swp_entry(*pmd);
> +
> + if (is_migration_entry(entry))
> + page = migration_entry_to_page(entry);
> + else
> + VM_WARN_ON_ONCE(1);

Could you explain why we need this WARN_ON? I haven't really checked
the swap support for THP, but can't we have normal swap pmd entries?

> + }
> if (IS_ERR_OR_NULL(page))
> return;
> if (PageAnon(page))
> @@ -578,8 +587,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>
> ptl = pmd_trans_huge_lock(pmd, vma);
> if (ptl) {
> - if (pmd_present(*pmd))
> - smaps_pmd_entry(pmd, addr, walk);
> + smaps_pmd_entry(pmd, addr, walk);
> spin_unlock(ptl);
> goto out;
> }
> --
> 2.25.0

--
Michal Hocko
SUSE Labs

2020-04-02 07:44:50

by Michal Hocko

Subject: Re: [PATCH -V2] /proc/PID/smaps: Add PMD migration entry parsing

On Thu 02-04-20 15:03:23, Huang, Ying wrote:
> Michal Hocko <[email protected]> writes:
>
> > On Thu 02-04-20 10:00:31, Huang, Ying wrote:
> >> From: Huang Ying <[email protected]>
> >>
> >> Now, when read /proc/PID/smaps, the PMD migration entry in page table is simply
> >> ignored. To improve the accuracy of /proc/PID/smaps, its parsing and processing
> >> is added.
> >>
> >> Before the patch, for a fully populated 400 MB anonymous VMA, sometimes some THP
> >> pages under migration may be lost as follows.
> >
> > Interesting. How did you reproduce this?
> > [...]
>
> I run the pmbench in background to eat memory, then run
> `/usr/bin/migratepages` and `cat /proc/PID/smaps` every second. The
> issue can be reproduced within 60 seconds.

Please add that information to the changelog. I was probably too
optimistic about the migration duration because I found it highly
unlikely to be visible. I was clearly wrong here.

> >> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> >> index 8d382d4ec067..9c72f9ce2dd8 100644
> >> --- a/fs/proc/task_mmu.c
> >> +++ b/fs/proc/task_mmu.c
> >> @@ -546,10 +546,19 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
> >> struct mem_size_stats *mss = walk->private;
> >> struct vm_area_struct *vma = walk->vma;
> >> bool locked = !!(vma->vm_flags & VM_LOCKED);
> >> - struct page *page;
> >> + struct page *page = NULL;
> >>
> >> - /* FOLL_DUMP will return -EFAULT on huge zero page */
> >> - page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
> >> + if (pmd_present(*pmd)) {
> >> + /* FOLL_DUMP will return -EFAULT on huge zero page */
> >> + page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
> >> + } else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
> >> + swp_entry_t entry = pmd_to_swp_entry(*pmd);
> >> +
> >> + if (is_migration_entry(entry))
> >> + page = migration_entry_to_page(entry);
> >> + else
> >> + VM_WARN_ON_ONCE(1);
> >
> > Could you explain why do we need this WARN_ON? I haven't really checked
> > the swap support for THP but cannot we have normal swap pmd entries?
>
> I have some patches to add the swap pmd entry support, but they haven't
> been merged yet.
>
> Similar checks are for all THP migration code paths, so I follow the
> same style.

I haven't checked other migration code paths but what is the reason to
add the warning here? Even if this shouldn't happen, smaps is perfectly
fine to ignore that situation, no?
--
Michal Hocko
SUSE Labs
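
Editorial note: the reproducer described above polls /proc/PID/smaps while pages are being migrated. A minimal sketch of the polling side, assuming the goal is only to sum the Rss lines of a process, could look like the following. The function names smaps_sum_kb() and pid_rss_kb() are hypothetical; this is not the script used in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sum every "<field>: <n> kB" line found in an smaps-formatted stream.
 * Returns the total in kB, or -1 if the field never appears.
 */
long smaps_sum_kb(FILE *fp, const char *field)
{
	char line[512];
	size_t n;
	long total = 0, kb;
	int seen = 0;

	assert(fp != NULL && field != NULL);
	n = strlen(field);
	while (fgets(line, sizeof(line), fp)) {
		/* Skip VMA header lines and other fields. */
		if (strncmp(line, field, n) != 0 || line[n] != ':')
			continue;
		if (sscanf(line + n + 1, "%ld", &kb) == 1) {
			total += kb;
			seen = 1;
		}
	}
	return seen ? total : -1;
}

/* Open /proc/<pid>/smaps and return the summed Rss in kB, or -1 on error. */
long pid_rss_kb(int pid)
{
	char path[64];
	FILE *fp;
	long rss;

	snprintf(path, sizeof(path), "/proc/%d/smaps", pid);
	fp = fopen(path, "r");
	if (!fp)
		return -1;
	rss = smaps_sum_kb(fp, "Rss");
	fclose(fp);
	return rss;
}
```

Calling pid_rss_kb() once per second while migratepages runs against the same process, and comparing the result with the expected resident size, should show a transient dip on an unpatched kernel if a PMD migration entry is present at the time of the read.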

2020-04-02 08:22:19

by Michal Hocko

Subject: Re: [PATCH -V2] /proc/PID/smaps: Add PMD migration entry parsing

On Thu 02-04-20 16:10:29, Huang, Ying wrote:
> Michal Hocko <[email protected]> writes:
>
> > On Thu 02-04-20 15:03:23, Huang, Ying wrote:
[...]
> >> > Could you explain why do we need this WARN_ON? I haven't really checked
> >> > the swap support for THP but cannot we have normal swap pmd entries?
> >>
> >> I have some patches to add the swap pmd entry support, but they haven't
> >> been merged yet.
> >>
> >> Similar checks are for all THP migration code paths, so I follow the
> >> same style.
> >
> > I haven't checked other migration code paths but what is the reason to
> > add the warning here? Even if this shouldn't happen, smaps is perfectly
> > fine to ignore that situation, no?
>
> Yes. smaps itself is perfectly fine to ignore it. I think this is used
> to find bugs in other code paths such as THP migration related.

Please do not add new warnings without a good and strong reason. As a
matter of fact, there are people running with panic_on_warn, and each
warning is fatal for them. Please also note that this is a user-triggerable
path, and that requires even more care.

--
Michal Hocko
SUSE Labs

2020-04-02 09:00:40

by Michal Hocko

Subject: Re: [PATCH -V2] /proc/PID/smaps: Add PMD migration entry parsing

On Thu 02-04-20 11:29:09, Konstantin Khlebnikov wrote:
>
>
> On 02/04/2020 11.21, Michal Hocko wrote:
> > On Thu 02-04-20 16:10:29, Huang, Ying wrote:
> > > Michal Hocko <[email protected]> writes:
> > >
> > > > On Thu 02-04-20 15:03:23, Huang, Ying wrote:
> > [...]
> > > > > > Could you explain why do we need this WARN_ON? I haven't really checked
> > > > > > the swap support for THP but cannot we have normal swap pmd entries?
> > > > >
> > > > > I have some patches to add the swap pmd entry support, but they haven't
> > > > > been merged yet.
> > > > >
> > > > > Similar checks are for all THP migration code paths, so I follow the
> > > > > same style.
> > > >
> > > > I haven't checked other migration code paths but what is the reason to
> > > > add the warning here? Even if this shouldn't happen, smaps is perfectly
> > > > fine to ignore that situation, no?
> > >
> > > Yes. smaps itself is perfectly fine to ignore it. I think this is used
> > > to find bugs in other code paths such as THP migration related.
> >
> > Please do not add new warnings without a good an strong reasons. As a
> > matter of fact there are people running with panic_on_warn and each
> > warning is fatal for them. Please also note that this is a user trigable
> > path and that requires even more care.
> >
>
> But this should not happen and if it does we'll never know without debug.

The migration path which already deals with this will notice, right?
Those are paths which really care about consistency.

> VM_WARN_ON checks something only if build with CONFIG_DEBUG_VM=y.
>
> Anybody who runs debug kernels with panic_on_warn shouldn't expect much stability =)

That doesn't mean we should be adding warnings here and there
willy-nilly.

--
Michal Hocko
SUSE Labs
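
Editorial note: the disagreement above turns on two properties of VM_WARN_ON_ONCE(): it is only active in CONFIG_DEBUG_VM=y builds, and it reports at most once per call site. The userspace model below illustrates both properties. MODEL_WARN_ON_ONCE, DEBUG_CHECKS, warnings_emitted and handle_unexpected_entry() are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for CONFIG_DEBUG_VM: build with -DDEBUG_CHECKS=0
 * to compile the check out entirely.
 */
#ifndef DEBUG_CHECKS
#define DEBUG_CHECKS 1
#endif

/* Counts how many warnings were actually printed. */
int warnings_emitted;

#if DEBUG_CHECKS
/* Report the first time `cond` is true at this call site, then stay quiet. */
#define MODEL_WARN_ON_ONCE(cond) \
	do { \
		static int warned; \
		if ((cond) && !warned) { \
			warned = 1; \
			warnings_emitted++; \
			fprintf(stderr, "WARNING at %s:%d\n", \
				__FILE__, __LINE__); \
		} \
	} while (0)
#else
/* Checks disabled: the condition is type-checked but never evaluated. */
#define MODEL_WARN_ON_ONCE(cond) do { (void)sizeof(cond); } while (0)
#endif

/* Hypothetical caller that hits an entry type it does not expect. */
void handle_unexpected_entry(void)
{
	MODEL_WARN_ON_ONCE(1);
}
```

In this model, calling handle_unexpected_entry() any number of times prints one warning. In the kernel, with panic_on_warn set, the first warning already panics the machine, so the "once" behaviour does not help in that configuration.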

2020-04-02 09:04:05

by Konstantin Khlebnikov

Subject: Re: [PATCH -V2] /proc/PID/smaps: Add PMD migration entry parsing



On 02/04/2020 11.21, Michal Hocko wrote:
> On Thu 02-04-20 16:10:29, Huang, Ying wrote:
>> Michal Hocko <[email protected]> writes:
>>
>>> On Thu 02-04-20 15:03:23, Huang, Ying wrote:
> [...]
>>>>> Could you explain why do we need this WARN_ON? I haven't really checked
>>>>> the swap support for THP but cannot we have normal swap pmd entries?
>>>>
>>>> I have some patches to add the swap pmd entry support, but they haven't
>>>> been merged yet.
>>>>
>>>> Similar checks are for all THP migration code paths, so I follow the
>>>> same style.
>>>
>>> I haven't checked other migration code paths but what is the reason to
>>> add the warning here? Even if this shouldn't happen, smaps is perfectly
>>> fine to ignore that situation, no?
>>
>> Yes. smaps itself is perfectly fine to ignore it. I think this is used
>> to find bugs in other code paths such as THP migration related.
>
> Please do not add new warnings without a good an strong reasons. As a
> matter of fact there are people running with panic_on_warn and each
> warning is fatal for them. Please also note that this is a user trigable
> path and that requires even more care.
>

But this should not happen, and if it does we'll never know without debug.
VM_WARN_ON checks something only if built with CONFIG_DEBUG_VM=y.

Anybody who runs debug kernels with panic_on_warn shouldn't expect much stability =)

2020-04-02 13:05:17

by Kirill A. Shutemov

Subject: Re: [PATCH -V2] /proc/PID/smaps: Add PMD migration entry parsing

On Thu, Apr 02, 2020 at 10:00:31AM +0800, Huang, Ying wrote:
> From: Huang Ying <[email protected]>
>
> Now, when read /proc/PID/smaps, the PMD migration entry in page table is simply
> ignored. To improve the accuracy of /proc/PID/smaps, its parsing and processing
> is added.
>
> Before the patch, for a fully populated 400 MB anonymous VMA, sometimes some THP
> pages under migration may be lost as follows.
>
> 7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0
> Size: 409600 kB
> KernelPageSize: 4 kB
> MMUPageSize: 4 kB
> Rss: 407552 kB
> Pss: 407552 kB
> Shared_Clean: 0 kB
> Shared_Dirty: 0 kB
> Private_Clean: 0 kB
> Private_Dirty: 407552 kB
> Referenced: 301056 kB
> Anonymous: 407552 kB
> LazyFree: 0 kB
> AnonHugePages: 405504 kB
> ShmemPmdMapped: 0 kB
> FilePmdMapped: 0 kB

The alignment makes me triggered.

Andrew, could you please apply this patch:

http://lore.kernel.org/r/[email protected]

--
Kirill A. Shutemov

2020-04-02 15:54:27

by Yang Shi

Subject: Re: [PATCH -V2] /proc/PID/smaps: Add PMD migration entry parsing



On 4/2/20 12:44 AM, Michal Hocko wrote:
> On Thu 02-04-20 15:03:23, Huang, Ying wrote:
>> Michal Hocko <[email protected]> writes:
>>
>>> On Thu 02-04-20 10:00:31, Huang, Ying wrote:
>>>> From: Huang Ying <[email protected]>
>>>>
>>>> Now, when read /proc/PID/smaps, the PMD migration entry in page table is simply
>>>> ignored. To improve the accuracy of /proc/PID/smaps, its parsing and processing
>>>> is added.
>>>>
>>>> Before the patch, for a fully populated 400 MB anonymous VMA, sometimes some THP
>>>> pages under migration may be lost as follows.
>>> Interesting. How did you reproduce this?
>>> [...]
>> I run the pmbench in background to eat memory, then run
>> `/usr/bin/migratepages` and `cat /proc/PID/smaps` every second. The
>> issue can be reproduced within 60 seconds.
> Please add that information to the changelog. I was probably too
> optimistic about the migration duration because I found it highly
> unlikely to be visible. I was clearly wrong here.

I believe that depends on how many processes share the page. If it is
not shared, then it should take just dozens of microseconds in my
test, FYI.

>
>>>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>>>> index 8d382d4ec067..9c72f9ce2dd8 100644
>>>> --- a/fs/proc/task_mmu.c
>>>> +++ b/fs/proc/task_mmu.c
>>>> @@ -546,10 +546,19 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>>>> struct mem_size_stats *mss = walk->private;
>>>> struct vm_area_struct *vma = walk->vma;
>>>> bool locked = !!(vma->vm_flags & VM_LOCKED);
>>>> - struct page *page;
>>>> + struct page *page = NULL;
>>>>
>>>> - /* FOLL_DUMP will return -EFAULT on huge zero page */
>>>> - page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>>>> + if (pmd_present(*pmd)) {
>>>> + /* FOLL_DUMP will return -EFAULT on huge zero page */
>>>> + page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>>>> + } else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
>>>> + swp_entry_t entry = pmd_to_swp_entry(*pmd);
>>>> +
>>>> + if (is_migration_entry(entry))
>>>> + page = migration_entry_to_page(entry);
>>>> + else
>>>> + VM_WARN_ON_ONCE(1);
>>> Could you explain why do we need this WARN_ON? I haven't really checked
>>> the swap support for THP but cannot we have normal swap pmd entries?
>> I have some patches to add the swap pmd entry support, but they haven't
>> been merged yet.
>>
>> Similar checks are for all THP migration code paths, so I follow the
>> same style.
> I haven't checked other migration code paths but what is the reason to
> add the warning here? Even if this shouldn't happen, smaps is perfectly
> fine to ignore that situation, no?