2021-07-07 20:43:15

by Hugh Dickins

Subject: [PATCH 1/4] mm/rmap: fix comments left over from recent changes

Parallel developments in mm/rmap.c have left behind some out-of-date
comments: try_to_migrate_one() also accepts TTU_SYNC (already commented
in try_to_migrate() itself), and try_to_migrate() returns nothing at all.

TTU_SPLIT_FREEZE has just been deleted, so reword the comment about it in
mm/huge_memory.c; and TTU_IGNORE_ACCESS was removed in 5.11, so delete
the "recently referenced" comment from try_to_unmap_one() (once upon a
time the comment was near the removed codeblock, but they drifted apart).

Signed-off-by: Hugh Dickins <[email protected]>
---
mm/huge_memory.c | 2 +-
mm/rmap.c | 7 +------
2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8b731d53e9f4..afff3ac87067 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2331,7 +2331,7 @@ static void remap_page(struct page *page, unsigned int nr)
{
int i;

- /* If TTU_SPLIT_FREEZE is ever extended to file, remove this check */
+ /* If unmap_page() uses try_to_migrate() on file, remove this check */
if (!PageAnon(page))
return;
if (PageTransHuge(page)) {
diff --git a/mm/rmap.c b/mm/rmap.c
index 37c24672125c..746013e282c3 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1439,8 +1439,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
while (page_vma_mapped_walk(&pvmw)) {
/*
* If the page is mlock()d, we cannot swap it out.
- * If it's recently referenced (perhaps page_referenced
- * skipped over this mm) then we should reactivate it.
*/
if (!(flags & TTU_IGNORE_MLOCK)) {
if (vma->vm_flags & VM_LOCKED) {
@@ -1687,8 +1685,7 @@ void try_to_unmap(struct page *page, enum ttu_flags flags)
* @arg: enum ttu_flags will be passed to this argument.
*
* If TTU_SPLIT_HUGE_PMD is specified any PMD mappings will be split into PTEs
- * containing migration entries. This and TTU_RMAP_LOCKED are the only supported
- * flags.
+ * containing migration entries.
*/
static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
unsigned long address, void *arg)
@@ -1928,8 +1925,6 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
*
* Tries to remove all the page table entries which are mapping this page and
* replace them with special swap entries. Caller must hold the page lock.
- *
- * If is successful, return true. Otherwise, false.
*/
void try_to_migrate(struct page *page, enum ttu_flags flags)
{
--
2.26.2


2021-07-07 20:43:17

by Hugh Dickins

Subject: [PATCH 2/4] mm/rmap: fix old bug: munlocking THP missed other mlocks

The kernel recovers in due course from missing Mlocked pages: but there
was no point in calling page_mlock() (formerly known as try_to_munlock())
on a THP, because nothing got done even when it was found to be mapped in
another VM_LOCKED vma.

It's true that we need to be careful: Mlocked accounting of pte-mapped
THPs is too difficult (so consistently avoided); but Mlocked accounting
of only-pmd-mapped THPs is supposed to work, even when multiple mappings
are mlocked and munlocked or munmapped. Refine the tests.

There is already a VM_BUG_ON_PAGE(PageDoubleMap) in page_mlock(), so
page_mlock_one() does not even have to worry about that complication.

(I said the kernel recovers: but would page reclaim be likely to split
THP before rediscovering that it's VM_LOCKED? I've not followed that up.)

Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
Signed-off-by: Hugh Dickins <[email protected]>
---
mm/rmap.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 746013e282c3..0e83c3be8568 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1442,8 +1442,9 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
*/
if (!(flags & TTU_IGNORE_MLOCK)) {
if (vma->vm_flags & VM_LOCKED) {
- /* PTE-mapped THP are never mlocked */
- if (!PageTransCompound(page)) {
+ /* PTE-mapped THP are never marked as mlocked */
+ if (!PageTransCompound(page) ||
+ (PageHead(page) && !PageDoubleMap(page))) {
/*
* Holding pte lock, we do *not* need
* mmap_lock here
@@ -1984,9 +1985,11 @@ static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
* munlock_vma_pages_range().
*/
if (vma->vm_flags & VM_LOCKED) {
- /* PTE-mapped THP are never mlocked */
- if (!PageTransCompound(page))
- mlock_vma_page(page);
+ /*
+ * PTE-mapped THP are never marked as mlocked, but
+ * this function is never called when PageDoubleMap().
+ */
+ mlock_vma_page(page);
page_vma_mapped_walk_done(&pvmw);
}

--
2.26.2
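
A minimal sketch of the page flag tests this version of the patch relies on.
The helper name below is hypothetical and only restates the hunk above; as
Kirill points out later in the thread, the anon THP case needs more care
than this.

/*
 * Hypothetical illustration, not part of the patch: the test applied by
 * the try_to_unmap_one() hunk above before calling mlock_vma_page().
 */
static inline bool thp_mlockable_here(struct page *page)
{
        if (!PageTransCompound(page))
                return true;            /* ordinary small page */
        /*
         * Compound page: only a pmd-mapped head with no simultaneous
         * pte mappings (no DoubleMap) is accounted as Mlocked.
         */
        return PageHead(page) && !PageDoubleMap(page);
}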

2021-07-07 20:43:33

by Hugh Dickins

Subject: [PATCH 4/4] mm/rmap: try_to_migrate() skip zone_device !device_private

I know nothing about zone_device pages and !device_private pages; but
if try_to_migrate_one() will do nothing for them, then it's better that
try_to_migrate() filter them first, than trawl through all their vmas.

Signed-off-by: Hugh Dickins <[email protected]>
---
mm/rmap.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 1235368f0628..795f9d5f8386 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1703,9 +1703,6 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
struct mmu_notifier_range range;
enum ttu_flags flags = (enum ttu_flags)(long)arg;

- if (is_zone_device_page(page) && !is_device_private_page(page))
- return true;
-
/*
* When racing against e.g. zap_pte_range() on another cpu,
* in between its ptep_get_and_clear_full() and page_remove_rmap(),
@@ -1944,6 +1941,9 @@ void try_to_migrate(struct page *page, enum ttu_flags flags)
TTU_SYNC)))
return;

+ if (is_zone_device_page(page) && !is_device_private_page(page))
+ return;
+
/*
* During exec, a temporary VMA is setup and later moved.
* The VMA is moved under the anon_vma lock but not the
--
2.26.2
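
For context, a heavily abridged sketch of try_to_migrate() as it looks with
the patch above applied: unsupported device pages are rejected before any
vma is walked. Everything except the filter and the final rmap walk is
elided, so this is an illustration rather than the real function body.

void try_to_migrate(struct page *page, enum ttu_flags flags)
{
        struct rmap_walk_control rwc = {
                .rmap_one = try_to_migrate_one,
                .arg = (void *)(long)flags,
        };

        /* ... TTU flag sanity checks elided ... */

        /*
         * Device pages other than device-private cannot be migrated here,
         * so bail out before trawling through every vma mapping the page.
         */
        if (is_zone_device_page(page) && !is_device_private_page(page))
                return;

        /* ... exec/anon_vma locking detail elided ... */
        rmap_walk(page, &rwc);
}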

2021-07-07 20:43:37

by Shakeel Butt

Subject: Re: [PATCH 1/4] mm/rmap: fix comments left over from recent changes

On Wed, Jul 7, 2021 at 1:06 PM Hugh Dickins <[email protected]> wrote:
>
> Parallel developments in mm/rmap.c have left behind some out-of-date
> comments: try_to_migrate_one() also accepts TTU_SYNC (already commented
> in try_to_migrate() itself), and try_to_migrate() returns nothing at all.
>
> TTU_SPLIT_FREEZE has just been deleted, so reword the comment about it in
> mm/huge_memory.c; and TTU_IGNORE_ACCESS was removed in 5.11, so delete
> the "recently referenced" comment from try_to_unmap_one() (once upon a
> time the comment was near the removed codeblock, but they drifted apart).
>
> Signed-off-by: Hugh Dickins <[email protected]>

Reviewed-by: Shakeel Butt <[email protected]>

2021-07-07 20:43:56

by Hugh Dickins

Subject: [PATCH 3/4] mm/rmap: fix new bug: premature return from page_mlock_one()

In the unlikely race case that page_mlock_one() finds VM_LOCKED has been
cleared by the time it got page table lock, page_vma_mapped_walk_done()
must be called before returning, either explicitly, or by a final call
to page_vma_mapped_walk() - otherwise the page table remains locked.

Fixes: cd62734ca60d ("mm/rmap: split try_to_munlock from try_to_unmap")
Signed-off-by: Hugh Dickins <[email protected]>
---
mm/rmap.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 0e83c3be8568..1235368f0628 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1990,14 +1990,13 @@ static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
* this function is never called when PageDoubleMap().
*/
mlock_vma_page(page);
+ /*
+ * No need to scan further once the page is marked
+ * as mlocked.
+ */
page_vma_mapped_walk_done(&pvmw);
+ return false;
}
-
- /*
- * no need to continue scanning other vma's if the page has
- * been locked.
- */
- return false;
}

return true;
--
2.26.2
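
The locking rule the commit message depends on, as a simplified standalone
sketch (hypothetical callback, not the real page_mlock_one()): once
page_vma_mapped_walk() has returned true, the page table lock is held, so
every early exit from the loop must go through page_vma_mapped_walk_done().

/*
 * Simplified sketch of an rmap_one callback, illustrating the contract
 * described above; not the actual page_mlock_one().
 */
static bool sketch_mlock_one(struct page *page, struct vm_area_struct *vma,
                             unsigned long address, void *arg)
{
        struct page_vma_mapped_walk pvmw = {
                .page = page,
                .vma = vma,
                .address = address,
        };

        while (page_vma_mapped_walk(&pvmw)) {
                /* the pte/pmd lock for pvmw is held at this point */
                if (vma->vm_flags & VM_LOCKED) {
                        mlock_vma_page(page);
                        /* drop the page table lock before bailing out */
                        page_vma_mapped_walk_done(&pvmw);
                        return false;   /* stop the rmap walk */
                }
                /* looping back lets page_vma_mapped_walk() drop it instead */
        }
        return true;
}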

2021-07-07 21:49:45

by Shakeel Butt

Subject: Re: [PATCH 2/4] mm/rmap: fix old bug: munlocking THP missed other mlocks

On Wed, Jul 7, 2021 at 1:08 PM Hugh Dickins <[email protected]> wrote:
>
> The kernel recovers in due course from missing Mlocked pages: but there
> was no point in calling page_mlock() (formerly known as try_to_munlock())
> on a THP, because nothing got done even when it was found to be mapped in
> another VM_LOCKED vma.
>
> It's true that we need to be careful: Mlocked accounting of pte-mapped
> THPs is too difficult (so consistently avoided); but Mlocked accounting
> of only-pmd-mapped THPs is supposed to work, even when multiple mappings
> are mlocked and munlocked or munmapped. Refine the tests.
>
> There is already a VM_BUG_ON_PAGE(PageDoubleMap) in page_mlock(), so
> page_mlock_one() does not even have to worry about that complication.
>
> (I said the kernel recovers: but would page reclaim be likely to split
> THP before rediscovering that it's VM_LOCKED? I've not followed that up.)

I think, yes, page reclaim will split the THP prematurely in this case.

>
> Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
> Signed-off-by: Hugh Dickins <[email protected]>

Reviewed-by: Shakeel Butt <[email protected]>

2021-07-07 21:53:21

by Yang Shi

Subject: Re: [PATCH 1/4] mm/rmap: fix comments left over from recent changes

On Wed, Jul 7, 2021 at 2:26 PM Yang Shi <[email protected]> wrote:
>
> On Wed, Jul 7, 2021 at 1:06 PM Hugh Dickins <[email protected]> wrote:
> >
> > Parallel developments in mm/rmap.c have left behind some out-of-date
> > comments: try_to_migrate_one() also accepts TTU_SYNC (already commented
> > in try_to_migrate() itself), and try_to_migrate() returns nothing at all.
> >
> > TTU_SPLIT_FREEZE has just been deleted, so reword the comment about it in
> > mm/huge_memory.c; and TTU_IGNORE_ACCESS was removed in 5.11, so delete
>
> I just realized this. Currently unmap_page() just unmaps file pages
> when splitting THP. But it seems this may cause some trouble for page
> cache speculative get for the below case IIUC. Am I missing something?
>
> CPU A                                 CPU B
> unmap_page()
>   ...
>   freeze refcount
>                                       find_get_page() ->
>                                         __page_cache_add_speculative() ->
>                                           VM_BUG_ON_PAGE(page_count(page) == 0, page);
>                                           //When CONFIG_TINY_RCU is enabled
>
>
> The race is acceptable; I think we could replace the VM_BUG_ON with
> page_ref_add_unless(), just like the !CONFIG_TINY_RCU case.

Please just disregard the above comment: I just found that CONFIG_TINY_RCU
is for UP only. Sorry for the noise.

>
>
> > the "recently referenced" comment from try_to_unmap_one() (once upon a
> > time the comment was near the removed codeblock, but they drifted apart).
> >
> > Signed-off-by: Hugh Dickins <[email protected]>
> > ---
> > mm/huge_memory.c | 2 +-
> > mm/rmap.c | 7 +------
> > 2 files changed, 2 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 8b731d53e9f4..afff3ac87067 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2331,7 +2331,7 @@ static void remap_page(struct page *page, unsigned int nr)
> > {
> > int i;
> >
> > - /* If TTU_SPLIT_FREEZE is ever extended to file, remove this check */
> > + /* If unmap_page() uses try_to_migrate() on file, remove this check */
> > if (!PageAnon(page))
> > return;
> > if (PageTransHuge(page)) {
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 37c24672125c..746013e282c3 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1439,8 +1439,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> > while (page_vma_mapped_walk(&pvmw)) {
> > /*
> > * If the page is mlock()d, we cannot swap it out.
> > - * If it's recently referenced (perhaps page_referenced
> > - * skipped over this mm) then we should reactivate it.
> > */
> > if (!(flags & TTU_IGNORE_MLOCK)) {
> > if (vma->vm_flags & VM_LOCKED) {
> > @@ -1687,8 +1685,7 @@ void try_to_unmap(struct page *page, enum ttu_flags flags)
> > * @arg: enum ttu_flags will be passed to this argument.
> > *
> > * If TTU_SPLIT_HUGE_PMD is specified any PMD mappings will be split into PTEs
> > - * containing migration entries. This and TTU_RMAP_LOCKED are the only supported
> > - * flags.
> > + * containing migration entries.
> > */
> > static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
> > unsigned long address, void *arg)
> > @@ -1928,8 +1925,6 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
> > *
> > * Tries to remove all the page table entries which are mapping this page and
> > * replace them with special swap entries. Caller must hold the page lock.
> > - *
> > - * If is successful, return true. Otherwise, false.
> > */
> > void try_to_migrate(struct page *page, enum ttu_flags flags)
> > {
> > --
> > 2.26.2
> >

2021-07-07 22:21:51

by Shakeel Butt

[permalink] [raw]
Subject: Re: [PATCH 4/4] mm/rmap: try_to_migrate() skip zone_device !device_private

On Wed, Jul 7, 2021 at 1:13 PM Hugh Dickins <[email protected]> wrote:
>
> I know nothing about zone_device pages and !device_private pages; but
> if try_to_migrate_one() will do nothing for them, then it's better that
> try_to_migrate() filter them first, than trawl through all their vmas.
>
> Signed-off-by: Hugh Dickins <[email protected]>

Reviewed-by: Shakeel Butt <[email protected]>

2021-07-07 22:36:31

by Yang Shi

Subject: Re: [PATCH 1/4] mm/rmap: fix comments left over from recent changes

On Wed, Jul 7, 2021 at 1:06 PM Hugh Dickins <[email protected]> wrote:
>
> Parallel developments in mm/rmap.c have left behind some out-of-date
> comments: try_to_migrate_one() also accepts TTU_SYNC (already commented
> in try_to_migrate() itself), and try_to_migrate() returns nothing at all.
>
> TTU_SPLIT_FREEZE has just been deleted, so reword the comment about it in
> mm/huge_memory.c; and TTU_IGNORE_ACCESS was removed in 5.11, so delete

I just realized this. Currently unmap_page() just unmaps file pages
when splitting THP. But it seems this may cause some trouble for page
cache speculative get for the below case IIUC. Am I missing something?

CPU A                                   CPU B
unmap_page()
  ...
  freeze refcount
                                        find_get_page() ->
                                          __page_cache_add_speculative() ->
                                            VM_BUG_ON_PAGE(page_count(page) == 0, page);
                                            //When CONFIG_TINY_RCU is enabled


The race is acceptable; I think we could replace the VM_BUG_ON with
page_ref_add_unless(), just like the !CONFIG_TINY_RCU case.


> the "recently referenced" comment from try_to_unmap_one() (once upon a
> time the comment was near the removed codeblock, but they drifted apart).
>
> Signed-off-by: Hugh Dickins <[email protected]>
> ---
> mm/huge_memory.c | 2 +-
> mm/rmap.c | 7 +------
> 2 files changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 8b731d53e9f4..afff3ac87067 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2331,7 +2331,7 @@ static void remap_page(struct page *page, unsigned int nr)
> {
> int i;
>
> - /* If TTU_SPLIT_FREEZE is ever extended to file, remove this check */
> + /* If unmap_page() uses try_to_migrate() on file, remove this check */
> if (!PageAnon(page))
> return;
> if (PageTransHuge(page)) {
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 37c24672125c..746013e282c3 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1439,8 +1439,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> while (page_vma_mapped_walk(&pvmw)) {
> /*
> * If the page is mlock()d, we cannot swap it out.
> - * If it's recently referenced (perhaps page_referenced
> - * skipped over this mm) then we should reactivate it.
> */
> if (!(flags & TTU_IGNORE_MLOCK)) {
> if (vma->vm_flags & VM_LOCKED) {
> @@ -1687,8 +1685,7 @@ void try_to_unmap(struct page *page, enum ttu_flags flags)
> * @arg: enum ttu_flags will be passed to this argument.
> *
> * If TTU_SPLIT_HUGE_PMD is specified any PMD mappings will be split into PTEs
> - * containing migration entries. This and TTU_RMAP_LOCKED are the only supported
> - * flags.
> + * containing migration entries.
> */
> static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
> unsigned long address, void *arg)
> @@ -1928,8 +1925,6 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
> *
> * Tries to remove all the page table entries which are mapping this page and
> * replace them with special swap entries. Caller must hold the page lock.
> - *
> - * If is successful, return true. Otherwise, false.
> */
> void try_to_migrate(struct page *page, enum ttu_flags flags)
> {
> --
> 2.26.2
>

2021-07-07 23:18:11

by Alistair Popple

Subject: Re: [PATCH 3/4] mm/rmap: fix new bug: premature return from page_mlock_one()

Thanks Hugh, evidently I missed this when adding the VM_LOCKED check back.

Reviewed-by: Alistair Popple <[email protected]>

On Thursday, 8 July 2021 6:11:24 AM AEST Hugh Dickins wrote:
> In the unlikely race case that page_mlock_one() finds VM_LOCKED has been
> cleared by the time it got page table lock, page_vma_mapped_walk_done()
> must be called before returning, either explicitly, or by a final call
> to page_vma_mapped_walk() - otherwise the page table remains locked.
>
> Fixes: cd62734ca60d ("mm/rmap: split try_to_munlock from try_to_unmap")
> Signed-off-by: Hugh Dickins <[email protected]>
> ---
> mm/rmap.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 0e83c3be8568..1235368f0628 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1990,14 +1990,13 @@ static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
> * this function is never called when PageDoubleMap().
> */
> mlock_vma_page(page);
> + /*
> + * No need to scan further once the page is marked
> + * as mlocked.
> + */
> page_vma_mapped_walk_done(&pvmw);
> + return false;
> }
> -
> - /*
> - * no need to continue scanning other vma's if the page has
> - * been locked.
> - */
> - return false;
> }
>
> return true;
>




2021-07-07 23:26:27

by Alistair Popple

Subject: Re: [PATCH 4/4] mm/rmap: try_to_migrate() skip zone_device !device_private

I know a bit about device private pages but not so much about other variants.
try_to_migrate_one() will work with device private pages, so in general that
check still looks good.

Reviewed-by: Alistair Popple <[email protected]>

On Thursday, 8 July 2021 6:13:33 AM AEST Hugh Dickins wrote:
> I know nothing about zone_device pages and !device_private pages; but
> if try_to_migrate_one() will do nothing for them, then it's better that
> try_to_migrate() filter them first, than trawl through all their vmas.
>
> Signed-off-by: Hugh Dickins <[email protected]>
> ---
> mm/rmap.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 1235368f0628..795f9d5f8386 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1703,9 +1703,6 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
> struct mmu_notifier_range range;
> enum ttu_flags flags = (enum ttu_flags)(long)arg;
>
> - if (is_zone_device_page(page) && !is_device_private_page(page))
> - return true;
> -
> /*
> * When racing against e.g. zap_pte_range() on another cpu,
> * in between its ptep_get_and_clear_full() and page_remove_rmap(),
> @@ -1944,6 +1941,9 @@ void try_to_migrate(struct page *page, enum ttu_flags flags)
> TTU_SYNC)))
> return;
>
> + if (is_zone_device_page(page) && !is_device_private_page(page))
> + return;
> +
> /*
> * During exec, a temporary VMA is setup and later moved.
> * The VMA is moved under the anon_vma lock but not the
>




2021-07-07 23:51:51

by Alistair Popple

Subject: Re: [PATCH 1/4] mm/rmap: fix comments left over from recent changes

On Thursday, 8 July 2021 6:06:17 AM AEST Hugh Dickins wrote:
> Parallel developments in mm/rmap.c have left behind some out-of-date
> comments: try_to_migrate_one() also accepts TTU_SYNC (already commented
> in try_to_migrate() itself), and try_to_migrate() returns nothing at all.
>
> TTU_SPLIT_FREEZE has just been deleted, so reword the comment about it in
> mm/huge_memory.c; and TTU_IGNORE_ACCESS was removed in 5.11, so delete
> the "recently referenced" comment from try_to_unmap_one() (once upon a
> time the comment was near the removed codeblock, but they drifted apart).
>
> Signed-off-by: Hugh Dickins <[email protected]>

Reviewed-by: Alistair Popple <[email protected]>

> ---
> mm/huge_memory.c | 2 +-
> mm/rmap.c | 7 +------
> 2 files changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 8b731d53e9f4..afff3ac87067 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2331,7 +2331,7 @@ static void remap_page(struct page *page, unsigned int nr)
> {
> int i;
>
> - /* If TTU_SPLIT_FREEZE is ever extended to file, remove this check */
> + /* If unmap_page() uses try_to_migrate() on file, remove this check */
> if (!PageAnon(page))
> return;
> if (PageTransHuge(page)) {
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 37c24672125c..746013e282c3 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1439,8 +1439,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> while (page_vma_mapped_walk(&pvmw)) {
> /*
> * If the page is mlock()d, we cannot swap it out.
> - * If it's recently referenced (perhaps page_referenced
> - * skipped over this mm) then we should reactivate it.
> */
> if (!(flags & TTU_IGNORE_MLOCK)) {
> if (vma->vm_flags & VM_LOCKED) {
> @@ -1687,8 +1685,7 @@ void try_to_unmap(struct page *page, enum ttu_flags flags)
> * @arg: enum ttu_flags will be passed to this argument.
> *
> * If TTU_SPLIT_HUGE_PMD is specified any PMD mappings will be split into PTEs
> - * containing migration entries. This and TTU_RMAP_LOCKED are the only supported
> - * flags.
> + * containing migration entries.
> */
> static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
> unsigned long address, void *arg)
> @@ -1928,8 +1925,6 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
> *
> * Tries to remove all the page table entries which are mapping this page and
> * replace them with special swap entries. Caller must hold the page lock.
> - *
> - * If is successful, return true. Otherwise, false.
> */
> void try_to_migrate(struct page *page, enum ttu_flags flags)
> {
>




2021-07-08 13:59:18

by Kirill A. Shutemov

Subject: Re: [PATCH 2/4] mm/rmap: fix old bug: munlocking THP missed other mlocks

On Wed, Jul 07, 2021 at 01:08:53PM -0700, Hugh Dickins wrote:
> The kernel recovers in due course from missing Mlocked pages: but there
> was no point in calling page_mlock() (formerly known as try_to_munlock())
> on a THP, because nothing got done even when it was found to be mapped in
> another VM_LOCKED vma.
>
> It's true that we need to be careful: Mlocked accounting of pte-mapped
> THPs is too difficult (so consistently avoided); but Mlocked accounting
> of only-pmd-mapped THPs is supposed to work, even when multiple mappings
> are mlocked and munlocked or munmapped. Refine the tests.

Well, it's true that it should be fine to mlock only-pmd-mapped THPs,
but the refined check doesn't guarantee that the page is not mapped with
PTEs. !PageDoubleMap(page) only guarantees that the page is not mapped
with both PMDs and PTEs at the same time. For anon pages, we clear the
flag when the last PMD mapping is gone and only PTEs are left.

Am I missing some detail here? Maybe we exclude anon pages somehow?
I don't see it.

--
Kirill A. Shutemov

2021-07-09 02:52:38

by Hugh Dickins

Subject: Re: [PATCH 2/4] mm/rmap: fix old bug: munlocking THP missed other mlocks

On Thu, 8 Jul 2021, Kirill A. Shutemov wrote:
> On Wed, Jul 07, 2021 at 01:08:53PM -0700, Hugh Dickins wrote:
> > The kernel recovers in due course from missing Mlocked pages: but there
> > was no point in calling page_mlock() (formerly known as try_to_munlock())
> > on a THP, because nothing got done even when it was found to be mapped in
> > another VM_LOCKED vma.
> >
> > It's true that we need to be careful: Mlocked accounting of pte-mapped
> > THPs is too difficult (so consistently avoided); but Mlocked accounting
> > of only-pmd-mapped THPs is supposed to work, even when multiple mappings
> > are mlocked and munlocked or munmapped. Refine the tests.
>
> Well, it's true that it should be fine to mlock only-pmd-mapped THPs,
> but the refined check doesn't guarantee that the page is not mapped with
> PTEs. !PageDoubleMap(page) only guarantees that the page is not mapped
> with both PMDs and PTEs at the same time. For anon pages, we clear the
> flag when the last PMD mapping is gone and only PTEs are left.
>
> Am I missing some detail here? Maybe we exclude anon pages somehow?
> I don't see it.

Yes, you're right, Kirill: thanks a lot for catching that.
PageDoubleMap: certainly not my favourite page flag!

And now that I've seen follow_trans_huge_pmd(), its comments, and its
goto skip_mlock on a PageAnon with compound_mapcount != 1, the right
fix for page_mlock() seems to be to skip over Anon THP altogether.

Here's a v2 of just this patch (others remain good): what do you think?

[PATCH v2 2/4] mm/rmap: fix old bug: munlocking THP missed other mlocks

The kernel recovers in due course from missing Mlocked pages: but there
was no point in calling page_mlock() (formerly known as try_to_munlock())
on a THP, because nothing got done even when it was found to be mapped in
another VM_LOCKED vma.

It's true that we need to be careful: Mlocked accounting of pte-mapped
THPs is too difficult (so consistently avoided); but Mlocked accounting
of only-pmd-mapped file THPs is supposed to work, even when multiple
mappings are mlocked and munlocked or munmapped. Refine the tests.

Many thanks to Kirill for reminding that PageDoubleMap cannot be relied on
to warn of pte mappings in the Anon THP case; and a scan of subpages does
not seem appropriate here. Note how follow_trans_huge_pmd() does not even
mark an Anon THP as mlocked when compound_mapcount != 1: multiple mlocking
of Anon THP is avoided, so simply return from page_mlock() in this case.

I said the kernel recovers: but would page reclaim be likely to split THP
before rediscovering that it's VM_LOCKED? Apparently so. I have worked
on a fix for that, but it's a different issue, and not something to rush.
Whereas page_mlock_one() could not be reviewed without fixing this first.

Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
Signed-off-by: Hugh Dickins <[email protected]>
---
mm/rmap.c | 42 +++++++++++++++++++++++++-----------------
1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 746013e282c3..f1d4edf9c696 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1440,20 +1440,20 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
/*
* If the page is mlock()d, we cannot swap it out.
*/
- if (!(flags & TTU_IGNORE_MLOCK)) {
- if (vma->vm_flags & VM_LOCKED) {
- /* PTE-mapped THP are never mlocked */
- if (!PageTransCompound(page)) {
- /*
- * Holding pte lock, we do *not* need
- * mmap_lock here
- */
- mlock_vma_page(page);
- }
- ret = false;
- page_vma_mapped_walk_done(&pvmw);
- break;
- }
+ if (!(flags & TTU_IGNORE_MLOCK) &&
+ (vma->vm_flags & VM_LOCKED)) {
+ /*
+ * PTE-mapped THP are never marked as mlocked: so do
+ * not set it on a DoubleMap THP, nor on an Anon THP
+ * (which may still be PTE-mapped after DoubleMap was
+ * cleared). But stop unmapping even in those cases.
+ */
+ if (!PageTransCompound(page) || (PageHead(page) &&
+ !PageDoubleMap(page) && !PageAnon(page)))
+ mlock_vma_page(page);
+ page_vma_mapped_walk_done(&pvmw);
+ ret = false;
+ break;
}

/* Unexpected PMD-mapped THP? */
@@ -1984,9 +1984,13 @@ static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
* munlock_vma_pages_range().
*/
if (vma->vm_flags & VM_LOCKED) {
- /* PTE-mapped THP are never mlocked */
- if (!PageTransCompound(page))
- mlock_vma_page(page);
+ /*
+ * PTE-mapped THP are never marked as mlocked; but
+ * this function is never called on a DoubleMap THP,
+ * nor on an Anon THP (which may still be PTE-mapped
+ * after DoubleMap was cleared).
+ */
+ mlock_vma_page(page);
page_vma_mapped_walk_done(&pvmw);
}

@@ -2020,6 +2024,10 @@ void page_mlock(struct page *page)
VM_BUG_ON_PAGE(!PageLocked(page) || PageLRU(page), page);
VM_BUG_ON_PAGE(PageCompound(page) && PageDoubleMap(page), page);

+ /* Anon THP are only marked as mlocked when singly mapped */
+ if (PageTransCompound(page) && PageAnon(page))
+ return;
+
rmap_walk(page, &rwc);
}

--
2.26.2
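
Pulling the v2 logic together, a hypothetical helper (illustration only, not
in the patch) showing which pages the rmap walkers above are now willing to
mark as Mlocked:

/*
 * Hypothetical helper, not in the patch: the cases distinguished by v2
 * when deciding whether mlock_vma_page() may be called from an rmap walk.
 */
static inline bool may_mark_mlocked(struct page *page)
{
        if (!PageTransCompound(page))
                return true;            /* ordinary small page */
        if (!PageHead(page))
                return false;           /* pte-mapped subpage: never Mlocked */
        if (PageDoubleMap(page))
                return false;           /* pmd- and pte-mapped at once */
        if (PageAnon(page))
                return false;           /* may still be pte-mapped after
                                           DoubleMap was cleared */
        return true;                    /* pmd-only-mapped file THP */
}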

2021-07-09 10:58:33

by Kirill A. Shutemov

Subject: Re: [PATCH 2/4] mm/rmap: fix old bug: munlocking THP missed other mlocks

On Thu, Jul 08, 2021 at 07:50:26PM -0700, Hugh Dickins wrote:
> Here's a v2 of just this patch (others remain good): what do you think?

Looks good to me. You can use my

Acked-by: Kirill A. Shutemov <[email protected]>

for the series.

--
Kirill A. Shutemov