2024-02-13 08:54:21

by Matthew Wilcox

Subject: Re: [PATCH] mm/huge_memory: fix swap entry values of tail pages of THP

On Tue, Feb 13, 2024 at 02:18:10PM +0530, Charan Teja Kalla wrote:
> An anon THP page is first added to swap cache before reclaiming it.
> Initially, each tail page contains the proper swap entry value(stored in
> ->private field) which is filled from add_to_swap_cache(). After
> migrating the THP page sitting on the swap cache, only the swap entry of
> the head page is filled(see folio_migrate_mapping()).
>
> Now when this page is tried to split(one case is when this page is again
> migrated, see migrate_pages()->try_split_thp()), the tail pages
> ->private is not stored with proper swap entry values. When this tail
> page is now try to be freed, as part of it delete_from_swap_cache() is
> called which operates on the wrong swap cache index and eventually
> replaces the wrong swap cache index with shadow/NULL value, frees the
> page.
>
> This leads to the state with a swap cache containing the freed page.
> This issue can manifest in many forms and the most common thing observed
> is the rcu stall during the swapin (see mapping_get_entry()).
>
> On the recent kernels, this issues is indirectly getting fixed with the
> series[1], to be specific[2].
>
> When tried to back port this series, it is observed many merge
> conflicts and also seems dependent on many other changes. As backporting
> to LTS branches is not a trivial one, the similar change from [2] is
> picked as a fix.
>
> [1] https://lore.kernel.org/all/[email protected]/
> [2] https://lore.kernel.org/all/[email protected]/

I am deeply confused by this commit message.

Are you saying there is a problem in current HEAD which this fixes, or
are you saying that this problem has already been fixed, and this patch
is for older kernels?

> Closes: https://lore.kernel.org/linux-mm/[email protected]/
> Fixes: 3417013e0d18 ("mm/migrate: Add folio_migrate_mapping()")
> Cc: <[email protected]> # see patch description, applicable to <=6.1
> Signed-off-by: Charan Teja Kalla <[email protected]>
> ---
> mm/huge_memory.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 5957794..cc5273f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2477,6 +2477,8 @@ static void __split_huge_page_tail(struct page *head, int tail,
> if (!folio_test_swapcache(page_folio(head))) {
> VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, page_tail);
> page_tail->private = 0;
> + } else {
> + set_page_private(page_tail, (unsigned long)head->private + tail);
> }
>
> /* Page flags must be visible before we make the page non-compound. */
> --
> 2.7.4
>


2024-02-13 09:10:37

by Charan Teja Kalla

Subject: Re: [PATCH] mm/huge_memory: fix swap entry values of tail pages of THP

Thanks Matthew!!

On 2/13/2024 2:24 PM, Matthew Wilcox wrote:
> I am deeply confused by this commit message.
>
> Are you saying there is a problem in current HEAD which this fixes, or
> are you saying that this problem has already been fixed, and this patch
> is for older kernels?

Sorry, I meant that this patch is __only for older kernels__. We are
seeing this issue on the 6.1 LTS kernel.

At least, I do not expect to see this issue at the HEAD of the
linux-next branch.

It seems the text below, from my commit message, did not make clear:
a) why this issue won't be seen on the latest kernel, and
b) the problems with backporting the respective patches to the LTS
branch.

"On the recent kernels, this issues is indirectly getting fixed with the
series[1], to be specific[2].

When tried to back port this series, it is observed many merge
conflicts and also seems dependent on many other changes. As backporting
to LTS branches is not a trivial one, the similar change from [2] is
picked as a fix.

[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/"

IOW, the couple of lines below ensure that the proper swap entry is
stored in the tail pages, which was somehow missed on the older kernels.

static void __split_huge_page_tail(struct folio *folio, int tail,
		struct lruvec *lruvec, struct list_head *list)
{
	.............
+	if (folio_test_swapcache(folio))
+		new_folio->swap.val = folio->swap.val + tail;
	.............
}
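
To make the arithmetic concrete, here is a rough user-space sketch of
the sequence described above. It is not kernel code: struct toy_page,
NR_SUBPAGES and the toy_*() helpers are made-up names for illustration,
and the model assumes only what the commit message says, i.e. that tail
page i carries "head entry + i" in ->private once the THP is in the swap
cache, and that migration carries over the head's value only.

```c
#include <assert.h>
#include <stddef.h>

/* Toy THP size, an assumption for illustration; real THPs are larger. */
#define NR_SUBPAGES 8

/* Hypothetical stand-in for struct page: only the ->private field. */
struct toy_page {
	unsigned long private;	/* swap entry value while in the swap cache */
};

/* Models adding a THP to the swap cache: subpage i gets base + i. */
static void toy_add_to_swap_cache(struct toy_page *pages, unsigned long base)
{
	for (size_t i = 0; i < NR_SUBPAGES; i++)
		pages[i].private = base + i;
}

/* Models the migration described above: only the head's value is copied. */
static void toy_migrate(struct toy_page *dst, const struct toy_page *src)
{
	dst[0].private = src[0].private;
	for (size_t i = 1; i < NR_SUBPAGES; i++)
		dst[i].private = 0;	/* tail entries are lost */
}

/* Models the fix applied at split time: rebuild the tail's entry. */
static void toy_split_fixup_tail(struct toy_page *pages, size_t tail)
{
	assert(tail > 0 && tail < NR_SUBPAGES);
	pages[tail].private = pages[0].private + tail;
}
```

In this model, a tail page that has gone through toy_migrate() has
->private == 0, so anything that derives a swap cache index from it
looks at the wrong slot. After toy_split_fixup_tail() the tail again
holds the same value it had before migration.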

Thanks.