2022-06-19 15:15:53

by Matthew Wilcox

Subject: [PATCH 3/3] mm: Clear page->private when splitting or migrating a page

In our efforts to remove uses of PG_private, we have found folios with
the private flag clear and folio->private not-NULL. That is the root
cause behind 642d51fb0775 ("ceph: check folio PG_private bit instead
of folio->private"). It can also affect a few other filesystems that
haven't yet reported a problem.

compaction_alloc() can return a page with uninitialised page->private,
and rather than checking all the callers of migrate_pages(), just zero
page->private after calling get_new_page(). Similarly, the tail pages
from split_huge_page() may also have an uninitialised page->private.

Reported-by: Xiubo Li <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/huge_memory.c | 1 +
mm/migrate.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f7248002dad9..9b31a50217b5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2377,6 +2377,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
page_tail);
page_tail->mapping = head->mapping;
page_tail->index = head->index + tail;
+ page_tail->private = NULL;

/* Page flags must be visible before we make the page non-compound. */
smp_wmb();
diff --git a/mm/migrate.c b/mm/migrate.c
index e51588e95f57..6c1ea61f39d8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1106,6 +1106,7 @@ static int unmap_and_move(new_page_t get_new_page,
if (!newpage)
return -ENOMEM;

+ newpage->private = 0;
rc = __unmap_and_move(page, newpage, force, mode);
if (rc == MIGRATEPAGE_SUCCESS)
set_page_owner_migrate_reason(newpage, reason);
--
2.35.1
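
Editorial sketch of the failure mode described in the commit message above. This is a userspace model, not kernel code: struct fake_page, fake_get_new_page() and fake_get_new_page_cleared() are hypothetical names invented for illustration. It assumes only what the message states, namely that the allocation callback can hand back a page whose private field still holds an old value while the private flag is clear, and that zeroing the field right after the callback returns avoids the mismatch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel's definition. */
struct fake_page {
	bool has_private_flag;	/* models the PG_private flag */
	unsigned long private;	/* models page->private */
};

/*
 * Models an allocation callback that recycles a page: the flag is
 * cleared, but the private field keeps whatever value it had before.
 */
static struct fake_page *fake_get_new_page(struct fake_page *recycled)
{
	recycled->has_private_flag = false;
	return recycled;	/* ->private is left stale */
}

/*
 * Models the approach taken in the patch: the caller zeroes ->private
 * immediately after the callback, instead of auditing every callback.
 */
static struct fake_page *fake_get_new_page_cleared(struct fake_page *recycled)
{
	struct fake_page *newpage = fake_get_new_page(recycled);

	if (!newpage)
		return NULL;

	newpage->private = 0;
	return newpage;
}
```

With the first helper, code that tests the private field instead of the flag sees a stale non-zero value; with the second, the flag and the field agree.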


2022-06-20 07:06:20

by Xiubo Li

Subject: Re: [PATCH 3/3] mm: Clear page->private when splitting or migrating a page


On 6/19/22 11:11 PM, Matthew Wilcox (Oracle) wrote:
> In our efforts to remove uses of PG_private, we have found folios with
> the private flag clear and folio->private not-NULL. That is the root
> cause behind 642d51fb0775 ("ceph: check folio PG_private bit instead
> of folio->private"). It can also affect a few other filesystems that
> haven't yet reported a problem.
>
> compaction_alloc() can return a page with uninitialised page->private,
> and rather than checking all the callers of migrate_pages(), just zero
> page->private after calling get_new_page(). Similarly, the tail pages
> from split_huge_page() may also have an uninitialised page->private.
>
> Reported-by: Xiubo Li <[email protected]>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> mm/huge_memory.c | 1 +
> mm/migrate.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index f7248002dad9..9b31a50217b5 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2377,6 +2377,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
> page_tail);
> page_tail->mapping = head->mapping;
> page_tail->index = head->index + tail;
> + page_tail->private = NULL;

This line produces a warning when compiling:

mm/huge_memory.c: In function ‘__split_huge_page_tail’:
mm/huge_memory.c:2380:21: warning: assignment to ‘long unsigned int’
from ‘void *’ makes integer from pointer without a cast [-Wint-conversion]
  page_tail->private = NULL;
                     ^
  AR      mm/built-in.a
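
Editorial sketch of why the compiler complains. The warning text shows that the private field of struct page is a 'long unsigned int', so assigning the pointer constant NULL to it is an implicit pointer-to-integer conversion. The struct and function below are hypothetical userspace stand-ins (demo_page and demo_clear_private() are not kernel names); assigning the integer 0, as the mm/migrate.c hunk of the same patch does, avoids the warning.

```c
#include <stddef.h>

/* Hypothetical stand-in: only the field type matters for this sketch. */
struct demo_page {
	unsigned long private;	/* integer-typed, as in the warning text */
};

static void demo_clear_private(struct demo_page *page)
{
	/*
	 * Writing "page->private = NULL;" here would trigger
	 * -Wint-conversion, because NULL is a pointer constant and the
	 * field is an integer. The integer constant 0 matches the type.
	 */
	page->private = 0;
}
```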


>
> /* Page flags must be visible before we make the page non-compound. */
> smp_wmb();
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e51588e95f57..6c1ea61f39d8 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1106,6 +1106,7 @@ static int unmap_and_move(new_page_t get_new_page,
> if (!newpage)
> return -ENOMEM;
>
> + newpage->private = 0;
> rc = __unmap_and_move(page, newpage, force, mode);
> if (rc == MIGRATEPAGE_SUCCESS)
> set_page_owner_migrate_reason(newpage, reason);

2022-06-21 00:18:30

by Xiubo Li

Subject: Re: [PATCH 3/3] mm: Clear page->private when splitting or migrating a page


On 6/19/22 11:11 PM, Matthew Wilcox (Oracle) wrote:
> In our efforts to remove uses of PG_private, we have found folios with
> the private flag clear and folio->private not-NULL. That is the root
> cause behind 642d51fb0775 ("ceph: check folio PG_private bit instead
> of folio->private"). It can also affect a few other filesystems that
> haven't yet reported a problem.
>
> compaction_alloc() can return a page with uninitialised page->private,
> and rather than checking all the callers of migrate_pages(), just zero
> page->private after calling get_new_page(). Similarly, the tail pages
> from split_huge_page() may also have an uninitialised page->private.
>
> Reported-by: Xiubo Li <[email protected]>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> mm/huge_memory.c | 1 +
> mm/migrate.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index f7248002dad9..9b31a50217b5 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2377,6 +2377,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
> page_tail);
> page_tail->mapping = head->mapping;
> page_tail->index = head->index + tail;
> + page_tail->private = NULL;
>
> /* Page flags must be visible before we make the page non-compound. */
> smp_wmb();
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e51588e95f57..6c1ea61f39d8 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1106,6 +1106,7 @@ static int unmap_and_move(new_page_t get_new_page,
> if (!newpage)
> return -ENOMEM;
>
> + newpage->private = 0;
> rc = __unmap_and_move(page, newpage, force, mode);
> if (rc == MIGRATEPAGE_SUCCESS)
> set_page_owner_migrate_reason(newpage, reason);

I tested this patch many times yesterday with my previous patch reverted,
and it has worked well for me so far. I will keep testing to see whether
any other cases can still cause the crash.

Tested-by: Xiubo Li <[email protected]>

-- Xiubo