2020-08-30 20:59:41

by Hugh Dickins

Subject: [PATCH 0/5] mm: fixes to past from future testing

Here's a set of independent fixes against 5.9-rc2: prompted by
testing Alex Shi's "warning on !memcg" and lru_lock series, but
I think fit for 5.9 - though maybe only the first for stable.

[PATCH 1/5] ksm: reinstate memcg charge on copied pages
[PATCH 2/5] mm: migration of hugetlbfs page skip memcg
[PATCH 3/5] shmem: shmem_writepage() split unlikely i915 THP
[PATCH 4/5] mm: fix check_move_unevictable_pages() on THP
[PATCH 5/5] mlock: fix unevictable_pgs event counts on THP

mm/ksm.c     |  4 ++++
mm/migrate.c |  3 ++-
mm/mlock.c   | 24 +++++++++++++++---------
mm/shmem.c   | 10 +++++++++-
mm/swap.c    |  6 +++---
mm/vmscan.c  | 10 ++++++++--
6 files changed, 41 insertions(+), 16 deletions(-)


2020-08-30 21:02:45

by Hugh Dickins

Subject: [PATCH 1/5] ksm: reinstate memcg charge on copied pages

In 5.8 some instances of memcg charging in do_swap_page() and unuse_pte()
were removed, on the understanding that swap cache is now already charged
at those points; but a case was missed, when ksm_might_need_to_copy() has
decided it must allocate a substitute page: such pages were never charged.
Fix it inside ksm_might_need_to_copy().

This was discovered by Alex Shi's prospective commit "mm/memcg: warning
on !memcg after readahead page charged".

But there is another surprise: this also fixes some rarer uncharged
PageAnon cases, when KSM is configured in, but has never been activated.
ksm_might_need_to_copy()'s anon_vma->root and linear_page_index() check
sometimes catches a case which would need to have been copied if KSM
were turned on. Or that's my optimistic interpretation (of my own old
code), but it leaves some doubt as to whether everything is working as
intended there - might it hint at rare anon ptes which rmap cannot find?
A question not easily answered: put in the fix for missed memcg charges.
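
For reference, the check in question reads roughly like this (paraphrased
from mm/ksm.c of that era and trimmed, so treat it as a sketch rather than
the exact source):

	struct anon_vma *anon_vma = page_anon_vma(page);

	if (PageKsm(page)) {
		if (page_stable_node(page) && !(ksm_run & KSM_RUN_UNMERGE))
			return page;	/* no need to copy it */
	} else if (!anon_vma) {
		return page;		/* no need to copy it */
	} else if (anon_vma->root == vma->anon_vma->root &&
		   page->index == linear_page_index(vma, address)) {
		return page;		/* still no need to copy it */
	}
	/* otherwise fall through: allocate a substitute page, which this
	 * patch now also charges to the memcg */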

Fixes: 4c6355b25e8b ("mm: memcontrol: charge swapin pages on instantiation")
Signed-off-by: Hugh Dickins <[email protected]>
Cc: [email protected] # v5.8
---

mm/ksm.c | 4 ++++
1 file changed, 4 insertions(+)

--- 5.9-rc2/mm/ksm.c 2020-08-16 17:32:50.645506940 -0700
+++ linux/mm/ksm.c 2020-08-28 17:42:07.967278385 -0700
@@ -2582,6 +2582,10 @@ struct page *ksm_might_need_to_copy(stru
return page; /* let do_swap_page report the error */

new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
+ if (new_page && mem_cgroup_charge(new_page, vma->vm_mm, GFP_KERNEL)) {
+ put_page(new_page);
+ new_page = NULL;
+ }
if (new_page) {
copy_user_highpage(new_page, page, address, vma);

2020-08-30 21:05:42

by Hugh Dickins

Subject: [PATCH 2/5] mm: migration of hugetlbfs page skip memcg

hugetlbfs pages do not participate in memcg: so although they do find
most of migrate_page_states() useful, it would be better if they did
not call into mem_cgroup_migrate() - where Qian Cai reported that LTP's
move_pages12 triggers the warning in Alex Shi's prospective commit
"mm/memcg: warning on !memcg after readahead page charged".

Signed-off-by: Hugh Dickins <[email protected]>
---
This fixes a likely future warning, but is just a cleanup right now.
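
Note that PageHuge() is true only for hugetlbfs compound pages, not for
anon/shmem THPs, so those still get their mem_cgroup_migrate(). Roughly,
paraphrased from mm/hugetlb.c (a sketch, not the exact source):

	int PageHuge(struct page *page)
	{
		if (!PageCompound(page))
			return 0;

		page = compound_head(page);
		return page[1].compound_dtor == HUGETLB_PAGE_DTOR;
	}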

mm/migrate.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- 5.9-rc2/mm/migrate.c 2020-08-16 17:32:50.665507048 -0700
+++ linux/mm/migrate.c 2020-08-28 17:42:07.967278385 -0700
@@ -668,7 +668,8 @@ void migrate_page_states(struct page *ne

copy_page_owner(page, newpage);

- mem_cgroup_migrate(page, newpage);
+ if (!PageHuge(page))
+ mem_cgroup_migrate(page, newpage);
}
EXPORT_SYMBOL(migrate_page_states);

2020-08-30 21:09:24

by Hugh Dickins

Subject: [PATCH 4/5] mm: fix check_move_unevictable_pages() on THP

check_move_unevictable_pages() is used in making unevictable shmem pages
evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
i915/gem check_release_pagevec(). Those may pass down subpages of a huge
page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".

That does not crash or warn at present, but the accounting of vmstats
unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
scanned being incremented on each subpage, rescued only on the head
(since tails already appear evictable once the head has been updated).

5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
established that vm_events in general (and unevictable_pgs_rescued in
particular) should count every subpage: so follow that precedent here.

Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
(to check page->mem_cgroup is always set), no problem: skip the tails
before calling it, and add thp_nr_pages() to vmstats on the head.
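
Here thp_nr_pages() is the usual helper: 1 for an ordinary page,
HPAGE_PMD_NR for a THP head. Roughly, paraphrased from the v5.9 THP
headers (details trimmed):

	static inline int thp_nr_pages(struct page *page)
	{
		if (PageHead(page))
			return HPAGE_PMD_NR;
		return 1;
	}

So adding it on the head, after skipping the tails, counts every subpage.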

Signed-off-by: Hugh Dickins <[email protected]>
---
Nothing here worth going to stable, since it's just a testing config
that is fixed, whose event numbers are not very important; but this
will be needed before Alex Shi's warning, and might as well go in now.

The callers of check_move_unevictable_pages() could be optimized,
to skip over tails: but Matthew Wilcox has other changes in flight
there, so let's skip the optimization for now.

mm/vmscan.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

--- 5.9-rc2/mm/vmscan.c 2020-08-16 17:32:50.721507348 -0700
+++ linux/mm/vmscan.c 2020-08-28 17:47:10.595580876 -0700
@@ -4260,8 +4260,14 @@ void check_move_unevictable_pages(struct
for (i = 0; i < pvec->nr; i++) {
struct page *page = pvec->pages[i];
struct pglist_data *pagepgdat = page_pgdat(page);
+ int nr_pages;
+
+ if (PageTransTail(page))
+ continue;
+
+ nr_pages = thp_nr_pages(page);
+ pgscanned += nr_pages;

- pgscanned++;
if (pagepgdat != pgdat) {
if (pgdat)
spin_unlock_irq(&pgdat->lru_lock);
@@ -4280,7 +4286,7 @@ void check_move_unevictable_pages(struct
ClearPageUnevictable(page);
del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
add_page_to_lru_list(page, lruvec, lru);
- pgrescued++;
+ pgrescued += nr_pages;
}
}

2020-08-30 21:13:22

by Hugh Dickins

Subject: [PATCH 5/5] mlock: fix unevictable_pgs event counts on THP

5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
established that vm_events should count every subpage of a THP,
including unevictable_pgs_culled and unevictable_pgs_rescued; but
lru_cache_add_inactive_or_unevictable() was not doing so for
unevictable_pgs_mlocked, and mm/mlock.c was not doing so for
unevictable_pgs mlocked, munlocked, cleared and stranded.

Fix them; but THPs don't go the pagevec way in mlock.c,
so no fixes needed on that path.
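
The conversion relies on count_vm_event(item) being just the delta == 1
case of count_vm_events(). Roughly, paraphrased from include/linux/vmstat.h:

	static inline void count_vm_event(enum vm_event_item item)
	{
		this_cpu_inc(vm_event_states.event[item]);
	}

	static inline void count_vm_events(enum vm_event_item item, long delta)
	{
		this_cpu_add(vm_event_states.event[item], delta);
	}

so count_vm_events(item, thp_nr_pages(page)) bumps the event once per subpage.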

Fixes: 5d91f31faf8e ("mm: swap: fix vmstats for huge page")
Signed-off-by: Hugh Dickins <[email protected]>
---
I've only checked UNEVICTABLEs: there may be more inconsistencies left.
The check_move_unevictable_pages() patch brought me to this one, but
this is more important because mlock works on all THPs, without needing
special testing "force". But, it's still just monotonically increasing
event counts, so not all that important.

mm/mlock.c | 24 +++++++++++++++---------
mm/swap.c | 6 +++---
2 files changed, 18 insertions(+), 12 deletions(-)

--- 5.9-rc2/mm/mlock.c 2020-08-16 17:32:50.665507048 -0700
+++ linux/mm/mlock.c 2020-08-28 17:42:07.975278411 -0700
@@ -58,11 +58,14 @@ EXPORT_SYMBOL(can_do_mlock);
*/
void clear_page_mlock(struct page *page)
{
+ int nr_pages;
+
if (!TestClearPageMlocked(page))
return;

- mod_zone_page_state(page_zone(page), NR_MLOCK, -thp_nr_pages(page));
- count_vm_event(UNEVICTABLE_PGCLEARED);
+ nr_pages = thp_nr_pages(page);
+ mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
+ count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
/*
* The previous TestClearPageMlocked() corresponds to the smp_mb()
* in __pagevec_lru_add_fn().
@@ -76,7 +79,7 @@ void clear_page_mlock(struct page *page)
* We lost the race. the page already moved to evictable list.
*/
if (PageUnevictable(page))
- count_vm_event(UNEVICTABLE_PGSTRANDED);
+ count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages);
}
}

@@ -93,9 +96,10 @@ void mlock_vma_page(struct page *page)
VM_BUG_ON_PAGE(PageCompound(page) && PageDoubleMap(page), page);

if (!TestSetPageMlocked(page)) {
- mod_zone_page_state(page_zone(page), NR_MLOCK,
- thp_nr_pages(page));
- count_vm_event(UNEVICTABLE_PGMLOCKED);
+ int nr_pages = thp_nr_pages(page);
+
+ mod_zone_page_state(page_zone(page), NR_MLOCK, nr_pages);
+ count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages);
if (!isolate_lru_page(page))
putback_lru_page(page);
}
@@ -138,7 +142,7 @@ static void __munlock_isolated_page(stru

/* Did try_to_unlock() succeed or punt? */
if (!PageMlocked(page))
- count_vm_event(UNEVICTABLE_PGMUNLOCKED);
+ count_vm_events(UNEVICTABLE_PGMUNLOCKED, thp_nr_pages(page));

putback_lru_page(page);
}
@@ -154,10 +158,12 @@ static void __munlock_isolated_page(stru
*/
static void __munlock_isolation_failed(struct page *page)
{
+ int nr_pages = thp_nr_pages(page);
+
if (PageUnevictable(page))
- __count_vm_event(UNEVICTABLE_PGSTRANDED);
+ __count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages);
else
- __count_vm_event(UNEVICTABLE_PGMUNLOCKED);
+ __count_vm_events(UNEVICTABLE_PGMUNLOCKED, nr_pages);
}

/**
--- 5.9-rc2/mm/swap.c 2020-08-16 17:32:50.709507284 -0700
+++ linux/mm/swap.c 2020-08-28 17:42:07.975278411 -0700
@@ -494,14 +494,14 @@ void lru_cache_add_inactive_or_unevictab

unevictable = (vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) == VM_LOCKED;
if (unlikely(unevictable) && !TestSetPageMlocked(page)) {
+ int nr_pages = thp_nr_pages(page);
/*
* We use the irq-unsafe __mod_zone_page_stat because this
* counter is not modified from interrupt context, and the pte
* lock is held(spinlock), which implies preemption disabled.
*/
- __mod_zone_page_state(page_zone(page), NR_MLOCK,
- thp_nr_pages(page));
- count_vm_event(UNEVICTABLE_PGMLOCKED);
+ __mod_zone_page_state(page_zone(page), NR_MLOCK, nr_pages);
+ count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages);
}
lru_cache_add(page);
}

2020-08-31 18:40:29

by Shakeel Butt

Subject: Re: [PATCH 1/5] ksm: reinstate memcg charge on copied pages

On Sun, Aug 30, 2020 at 1:59 PM Hugh Dickins <[email protected]> wrote:
>
> In 5.8 some instances of memcg charging in do_swap_page() and unuse_pte()
> were removed, on the understanding that swap cache is now already charged
> at those points; but a case was missed, when ksm_might_need_to_copy() has
> decided it must allocate a substitute page: such pages were never charged.
> Fix it inside ksm_might_need_to_copy().
>
> This was discovered by Alex Shi's prospective commit "mm/memcg: warning
> on !memcg after readahead page charged".
>
> But there is another surprise: this also fixes some rarer uncharged
> PageAnon cases, when KSM is configured in, but has never been activated.
> ksm_might_need_to_copy()'s anon_vma->root and linear_page_index() check
> sometimes catches a case which would need to have been copied if KSM
> were turned on. Or that's my optimistic interpretation (of my own old
> code), but it leaves some doubt as to whether everything is working as
> intended there - might it hint at rare anon ptes which rmap cannot find?
> A question not easily answered: put in the fix for missed memcg charges.
>
> Fixes: 4c6355b25e8b ("mm: memcontrol: charge swapin pages on instantiation")
> Signed-off-by: Hugh Dickins <[email protected]>
> Cc: [email protected] # v5.8

Reviewed-by: Shakeel Butt <[email protected]>

2020-08-31 18:41:02

by Shakeel Butt

Subject: Re: [PATCH 2/5] mm: migration of hugetlbfs page skip memcg

On Sun, Aug 30, 2020 at 2:01 PM Hugh Dickins <[email protected]> wrote:
>
> hugetlbfs pages do not participate in memcg: so although they do find
> most of migrate_page_states() useful, it would be better if they did
> not call into mem_cgroup_migrate() - where Qian Cai reported that LTP's
> move_pages12 triggers the warning in Alex Shi's prospective commit
> "mm/memcg: warning on !memcg after readahead page charged".
>
> Signed-off-by: Hugh Dickins <[email protected]>

Reviewed-by: Shakeel Butt <[email protected]>

2020-08-31 18:41:17

by Shakeel Butt

Subject: Re: [PATCH 4/5] mm: fix check_move_unevictable_pages() on THP

On Sun, Aug 30, 2020 at 2:08 PM Hugh Dickins <[email protected]> wrote:
>
> check_move_unevictable_pages() is used in making unevictable shmem pages
> evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
> i915/gem check_release_pagevec(). Those may pass down subpages of a huge
> page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".
>
> That does not crash or warn at present, but the accounting of vmstats
> unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
> scanned being incremented on each subpage, rescued only on the head
> (since tails already appear evictable once the head has been updated).
>
> 5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
> established that vm_events in general (and unevictable_pgs_rescued in
> particular) should count every subpage: so follow that precedent here.
>
> Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
> (to check page->mem_cgroup is always set), no problem: skip the tails
> before calling it, and add thp_nr_pages() to vmstats on the head.
>
> Signed-off-by: Hugh Dickins <[email protected]>

Thanks for catching this.

Reviewed-by: Shakeel Butt <[email protected]>

2020-08-31 18:41:39

by Shakeel Butt

Subject: Re: [PATCH 5/5] mlock: fix unevictable_pgs event counts on THP

On Sun, Aug 30, 2020 at 2:09 PM Hugh Dickins <[email protected]> wrote:
>
> 5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
> established that vm_events should count every subpage of a THP,
> including unevictable_pgs_culled and unevictable_pgs_rescued; but
> lru_cache_add_inactive_or_unevictable() was not doing so for
> unevictable_pgs_mlocked, and mm/mlock.c was not doing so for
> unevictable_pgs mlocked, munlocked, cleared and stranded.
>
> Fix them; but THPs don't go the pagevec way in mlock.c,
> so no fixes needed on that path.
>
> Fixes: 5d91f31faf8e ("mm: swap: fix vmstats for huge page")
> Signed-off-by: Hugh Dickins <[email protected]>

Reviewed-by: Shakeel Butt <[email protected]>

2020-09-01 02:08:33

by Alex Shi

Subject: Re: [PATCH 4/5] mm: fix check_move_unevictable_pages() on THP



On 2020/8/31 at 5:08 AM, Hugh Dickins wrote:
> check_move_unevictable_pages() is used in making unevictable shmem pages
> evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
> i915/gem check_release_pagevec(). Those may pass down subpages of a huge
> page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".
>
> That does not crash or warn at present, but the accounting of vmstats
> unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
> scanned being incremented on each subpage, rescued only on the head
> (since tails already appear evictable once the head has been updated).
>
> 5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
> established that vm_events in general (and unevictable_pgs_rescued in
> particular) should count every subpage: so follow that precedent here.
>
> Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
> (to check page->mem_cgroup is always set), no problem: skip the tails
> before calling it, and add thp_nr_pages() to vmstats on the head.
>
> Signed-off-by: Hugh Dickins <[email protected]>
> ---
> Nothing here worth going to stable, since it's just a testing config
> that is fixed, whose event numbers are not very important; but this
> will be needed before Alex Shi's warning, and might as well go in now.
>
> The callers of check_move_unevictable_pages() could be optimized,
> to skip over tails: but Matthew Wilcox has other changes in flight
> there, so let's skip the optimization for now.
>
> mm/vmscan.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> --- 5.9-rc2/mm/vmscan.c 2020-08-16 17:32:50.721507348 -0700
> +++ linux/mm/vmscan.c 2020-08-28 17:47:10.595580876 -0700
> @@ -4260,8 +4260,14 @@ void check_move_unevictable_pages(struct
> for (i = 0; i < pvec->nr; i++) {
> struct page *page = pvec->pages[i];
> struct pglist_data *pagepgdat = page_pgdat(page);
> + int nr_pages;
> +
> + if (PageTransTail(page))
> + continue;
> +
> + nr_pages = thp_nr_pages(page);
> + pgscanned += nr_pages;
>
> - pgscanned++;
> if (pagepgdat != pgdat) {
> if (pgdat)
> spin_unlock_irq(&pgdat->lru_lock);
> @@ -4280,7 +4286,7 @@ void check_move_unevictable_pages(struct
> ClearPageUnevictable(page);
> del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
> add_page_to_lru_list(page, lruvec, lru);

So, we might randomly del or add a THP tail page onto the lru?
That's interesting to know. :)

Thanks
Alex

> - pgrescued++;
> + pgrescued += nr_pages;
> }
> }
>
>

2020-09-01 02:31:23

by Alex Shi

Subject: Re: [PATCH 0/5] mm: fixes to past from future testing



On 2020/8/31 at 4:57 AM, Hugh Dickins wrote:
> Here's a set of independent fixes against 5.9-rc2: prompted by
> testing Alex Shi's "warning on !memcg" and lru_lock series, but
> I think fit for 5.9 - though maybe only the first for stable.
>
> [PATCH 1/5] ksm: reinstate memcg charge on copied pages
> [PATCH 2/5] mm: migration of hugetlbfs page skip memcg
> [PATCH 3/5] shmem: shmem_writepage() split unlikely i915 THP
> [PATCH 4/5] mm: fix check_move_unevictable_pages() on THP
> [PATCH 5/5] mlock: fix unevictable_pgs event counts on THP

Hi Hugh,

Thanks a lot for reporting and fixing these! All the fixes look fine to me.

BTW,
I assume you already rebased the lru_lock patchset on this, so I don't
need to redo the rebase again, do I? :)

Thanks
Alex

>
> mm/ksm.c     |  4 ++++
> mm/migrate.c |  3 ++-
> mm/mlock.c   | 24 +++++++++++++++---------
> mm/shmem.c   | 10 +++++++++-
> mm/swap.c    |  6 +++---
> mm/vmscan.c  | 10 ++++++++--
> 6 files changed, 41 insertions(+), 16 deletions(-)
>

2020-09-01 04:04:46

by Hugh Dickins

Subject: Re: [PATCH 4/5] mm: fix check_move_unevictable_pages() on THP

On Tue, 1 Sep 2020, Alex Shi wrote:
> On 2020/8/31 at 5:08 AM, Hugh Dickins wrote:
> > --- 5.9-rc2/mm/vmscan.c 2020-08-16 17:32:50.721507348 -0700
> > +++ linux/mm/vmscan.c 2020-08-28 17:47:10.595580876 -0700
> > @@ -4260,8 +4260,14 @@ void check_move_unevictable_pages(struct
> > for (i = 0; i < pvec->nr; i++) {
> > struct page *page = pvec->pages[i];
> > struct pglist_data *pagepgdat = page_pgdat(page);
> > + int nr_pages;
> > +
> > + if (PageTransTail(page))
> > + continue;
> > +
> > + nr_pages = thp_nr_pages(page);
> > + pgscanned += nr_pages;
> >
> > - pgscanned++;
> > if (pagepgdat != pgdat) {
> > if (pgdat)
> > spin_unlock_irq(&pgdat->lru_lock);
> > @@ -4280,7 +4286,7 @@ void check_move_unevictable_pages(struct
> > ClearPageUnevictable(page);
> > del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
> > add_page_to_lru_list(page, lruvec, lru);
>
> So, we might randomly del or add a THP tail page onto the lru?
> That's interesting to know. :)

No, it wasn't that interesting, I'd have been more concerned if it was
like that. Amusing idea though: because the act of adding a thp tail
to lru would clear the very bit that says it's a tail.

if (!PageLRU(page) || !PageUnevictable(page))
continue;

Let's see: PageLRU and PageUnevictable are both of the PF_HEAD type,
so permitted on tails, but always redirecting to head: so typically
PageLRU(page) would be true, because head on lru; but PageUnevictable(page)
would be false on tail, because head has already been moved to an evictable
lru by this same function. So it safely went the "continue" way, but
without incrementing pgrescued.
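
If it helps to see that in isolation, here is a little userspace model of
the head-redirection (illustrative only, nothing like the real page flags
implementation):

	#include <stdbool.h>
	#include <stdio.h>

	struct toy_page {
		bool lru;
		bool unevictable;
		struct toy_page *head;	/* NULL for a head, set for tails */
	};

	static struct toy_page *compound_head(struct toy_page *p)
	{
		return p->head ? p->head : p;
	}

	/* PF_HEAD-style flag tests: a tail redirects to its head */
	static bool PageLRU(struct toy_page *p)
	{
		return compound_head(p)->lru;
	}

	static bool PageUnevictable(struct toy_page *p)
	{
		return compound_head(p)->unevictable;
	}

	int main(void)
	{
		struct toy_page head = { .lru = true, .unevictable = true };
		struct toy_page tail = { .head = &head };

		/* the head has already been rescued to an evictable lru */
		head.unevictable = false;

		/* so the tail looks LRU but not unevictable: the old code
		 * took the "continue" path without counting it */
		printf("tail: lru=%d unevictable=%d\n",
		       PageLRU(&tail), PageUnevictable(&tail));
		return 0;
	}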

Hugh

>
> Thanks
> Alex
>
> > - pgrescued++;
> > + pgrescued += nr_pages;
> > }
> > }
> >

2020-09-01 04:18:46

by Hugh Dickins

Subject: Re: [PATCH 0/5] mm: fixes to past from future testing

On Tue, 1 Sep 2020, Alex Shi wrote:
> On 2020/8/31 at 4:57 AM, Hugh Dickins wrote:
> > Here's a set of independent fixes against 5.9-rc2: prompted by
> > testing Alex Shi's "warning on !memcg" and lru_lock series, but
> > I think fit for 5.9 - though maybe only the first for stable.
> >
> > [PATCH 1/5] ksm: reinstate memcg charge on copied pages
> > [PATCH 2/5] mm: migration of hugetlbfs page skip memcg
> > [PATCH 3/5] shmem: shmem_writepage() split unlikely i915 THP
> > [PATCH 4/5] mm: fix check_move_unevictable_pages() on THP
> > [PATCH 5/5] mlock: fix unevictable_pgs event counts on THP
>
> Hi Hugh,
>
> Thanks a lot for reporting and fixing these! All the fixes look fine to me.

Thanks for checking.

>
> BTW,
> I assume you already rebased the lru_lock patchset on this, so I don't
> need to redo the rebase again, do I? :)

That's right, no need for another posting: the only ones of yours
which don't apply cleanly on top of mine are 20/32 and 21/32,
touching check_move_unevictable_pages(); and they're easy enough
to resolve.

With my 5 fixes in, I'll advance to commenting on yours (but not today).

Hugh

2020-09-01 15:07:45

by Johannes Weiner

Subject: Re: [PATCH 1/5] ksm: reinstate memcg charge on copied pages

On Sun, Aug 30, 2020 at 01:59:35PM -0700, Hugh Dickins wrote:
> In 5.8 some instances of memcg charging in do_swap_page() and unuse_pte()
> were removed, on the understanding that swap cache is now already charged
> at those points; but a case was missed, when ksm_might_need_to_copy() has
> decided it must allocate a substitute page: such pages were never charged.
> Fix it inside ksm_might_need_to_copy().
>
> This was discovered by Alex Shi's prospective commit "mm/memcg: warning
> on !memcg after readahead page charged".
>
> But there is another surprise: this also fixes some rarer uncharged
> PageAnon cases, when KSM is configured in, but has never been activated.
> ksm_might_need_to_copy()'s anon_vma->root and linear_page_index() check
> sometimes catches a case which would need to have been copied if KSM
> were turned on. Or that's my optimistic interpretation (of my own old
> code), but it leaves some doubt as to whether everything is working as
> intended there - might it hint at rare anon ptes which rmap cannot find?
> A question not easily answered: put in the fix for missed memcg charges.
>
> Fixes: 4c6355b25e8b ("mm: memcontrol: charge swapin pages on instantiation")
> Signed-off-by: Hugh Dickins <[email protected]>
> Cc: [email protected] # v5.8

Acked-by: Johannes Weiner <[email protected]>

2020-09-01 15:07:47

by Johannes Weiner

Subject: Re: [PATCH 2/5] mm: migration of hugetlbfs page skip memcg

On Sun, Aug 30, 2020 at 02:01:30PM -0700, Hugh Dickins wrote:
> hugetlbfs pages do not participate in memcg: so although they do find
> most of migrate_page_states() useful, it would be better if they did
> not call into mem_cgroup_migrate() - where Qian Cai reported that LTP's
> move_pages12 triggers the warning in Alex Shi's prospective commit
> "mm/memcg: warning on !memcg after readahead page charged".
>
> Signed-off-by: Hugh Dickins <[email protected]>

Acked-by: Johannes Weiner <[email protected]>

2020-09-01 16:01:17

by Yang Shi

Subject: Re: [PATCH 4/5] mm: fix check_move_unevictable_pages() on THP

On Sun, Aug 30, 2020 at 2:08 PM Hugh Dickins <[email protected]> wrote:
>
> check_move_unevictable_pages() is used in making unevictable shmem pages
> evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
> i915/gem check_release_pagevec(). Those may pass down subpages of a huge
> page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".
>
> That does not crash or warn at present, but the accounting of vmstats
> unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
> scanned being incremented on each subpage, rescued only on the head
> (since tails already appear evictable once the head has been updated).
>
> 5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
> established that vm_events in general (and unevictable_pgs_rescued in
> particular) should count every subpage: so follow that precedent here.
>
> Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
> (to check page->mem_cgroup is always set), no problem: skip the tails
> before calling it, and add thp_nr_pages() to vmstats on the head.
>
> Signed-off-by: Hugh Dickins <[email protected]>

Acked-by: Yang Shi <[email protected]>

> ---
> Nothing here worth going to stable, since it's just a testing config
> that is fixed, whose event numbers are not very important; but this
> will be needed before Alex Shi's warning, and might as well go in now.
>
> The callers of check_move_unevictable_pages() could be optimized,
> to skip over tails: but Matthew Wilcox has other changes in flight
> there, so let's skip the optimization for now.
>
> mm/vmscan.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> --- 5.9-rc2/mm/vmscan.c 2020-08-16 17:32:50.721507348 -0700
> +++ linux/mm/vmscan.c 2020-08-28 17:47:10.595580876 -0700
> @@ -4260,8 +4260,14 @@ void check_move_unevictable_pages(struct
> for (i = 0; i < pvec->nr; i++) {
> struct page *page = pvec->pages[i];
> struct pglist_data *pagepgdat = page_pgdat(page);
> + int nr_pages;
> +
> + if (PageTransTail(page))
> + continue;
> +
> + nr_pages = thp_nr_pages(page);
> + pgscanned += nr_pages;
>
> - pgscanned++;
> if (pagepgdat != pgdat) {
> if (pgdat)
> spin_unlock_irq(&pgdat->lru_lock);
> @@ -4280,7 +4286,7 @@ void check_move_unevictable_pages(struct
> ClearPageUnevictable(page);
> del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
> add_page_to_lru_list(page, lruvec, lru);
> - pgrescued++;
> + pgrescued += nr_pages;
> }
> }
>
>

2020-09-01 16:06:28

by Yang Shi

Subject: Re: [PATCH 5/5] mlock: fix unevictable_pgs event counts on THP

On Sun, Aug 30, 2020 at 2:09 PM Hugh Dickins <[email protected]> wrote:
>
> 5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
> established that vm_events should count every subpage of a THP,
> including unevictable_pgs_culled and unevictable_pgs_rescued; but
> lru_cache_add_inactive_or_unevictable() was not doing so for
> unevictable_pgs_mlocked, and mm/mlock.c was not doing so for
> unevictable_pgs mlocked, munlocked, cleared and stranded.
>
> Fix them; but THPs don't go the pagevec way in mlock.c,
> so no fixes needed on that path.
>
> Fixes: 5d91f31faf8e ("mm: swap: fix vmstats for huge page")
> Signed-off-by: Hugh Dickins <[email protected]>

Acked-by: Yang Shi <[email protected]>

> ---
> I've only checked UNEVICTABLEs: there may be more inconsistencies left.
> The check_move_unevictable_pages() patch brought me to this one, but
> this is more important because mlock works on all THPs, without needing
> special testing "force". But, it's still just monotonically increasing
> event counts, so not all that important.
>
> mm/mlock.c | 24 +++++++++++++++---------
> mm/swap.c | 6 +++---
> 2 files changed, 18 insertions(+), 12 deletions(-)
>
> --- 5.9-rc2/mm/mlock.c 2020-08-16 17:32:50.665507048 -0700
> +++ linux/mm/mlock.c 2020-08-28 17:42:07.975278411 -0700
> @@ -58,11 +58,14 @@ EXPORT_SYMBOL(can_do_mlock);
> */
> void clear_page_mlock(struct page *page)
> {
> + int nr_pages;
> +
> if (!TestClearPageMlocked(page))
> return;
>
> - mod_zone_page_state(page_zone(page), NR_MLOCK, -thp_nr_pages(page));
> - count_vm_event(UNEVICTABLE_PGCLEARED);
> + nr_pages = thp_nr_pages(page);
> + mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
> + count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
> /*
> * The previous TestClearPageMlocked() corresponds to the smp_mb()
> * in __pagevec_lru_add_fn().
> @@ -76,7 +79,7 @@ void clear_page_mlock(struct page *page)
> * We lost the race. the page already moved to evictable list.
> */
> if (PageUnevictable(page))
> - count_vm_event(UNEVICTABLE_PGSTRANDED);
> + count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages);
> }
> }
>
> @@ -93,9 +96,10 @@ void mlock_vma_page(struct page *page)
> VM_BUG_ON_PAGE(PageCompound(page) && PageDoubleMap(page), page);
>
> if (!TestSetPageMlocked(page)) {
> - mod_zone_page_state(page_zone(page), NR_MLOCK,
> - thp_nr_pages(page));
> - count_vm_event(UNEVICTABLE_PGMLOCKED);
> + int nr_pages = thp_nr_pages(page);
> +
> + mod_zone_page_state(page_zone(page), NR_MLOCK, nr_pages);
> + count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages);
> if (!isolate_lru_page(page))
> putback_lru_page(page);
> }
> @@ -138,7 +142,7 @@ static void __munlock_isolated_page(stru
>
> /* Did try_to_unlock() succeed or punt? */
> if (!PageMlocked(page))
> - count_vm_event(UNEVICTABLE_PGMUNLOCKED);
> + count_vm_events(UNEVICTABLE_PGMUNLOCKED, thp_nr_pages(page));
>
> putback_lru_page(page);
> }
> @@ -154,10 +158,12 @@ static void __munlock_isolated_page(stru
> */
> static void __munlock_isolation_failed(struct page *page)
> {
> + int nr_pages = thp_nr_pages(page);
> +
> if (PageUnevictable(page))
> - __count_vm_event(UNEVICTABLE_PGSTRANDED);
> + __count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages);
> else
> - __count_vm_event(UNEVICTABLE_PGMUNLOCKED);
> + __count_vm_events(UNEVICTABLE_PGMUNLOCKED, nr_pages);
> }
>
> /**
> --- 5.9-rc2/mm/swap.c 2020-08-16 17:32:50.709507284 -0700
> +++ linux/mm/swap.c 2020-08-28 17:42:07.975278411 -0700
> @@ -494,14 +494,14 @@ void lru_cache_add_inactive_or_unevictab
>
> unevictable = (vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) == VM_LOCKED;
> if (unlikely(unevictable) && !TestSetPageMlocked(page)) {
> + int nr_pages = thp_nr_pages(page);
> /*
> * We use the irq-unsafe __mod_zone_page_stat because this
> * counter is not modified from interrupt context, and the pte
> * lock is held(spinlock), which implies preemption disabled.
> */
> - __mod_zone_page_state(page_zone(page), NR_MLOCK,
> - thp_nr_pages(page));
> - count_vm_event(UNEVICTABLE_PGMLOCKED);
> + __mod_zone_page_state(page_zone(page), NR_MLOCK, nr_pages);
> + count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages);
> }
> lru_cache_add(page);
> }
>