2024-02-05 23:28:45

by Nhat Pham

[permalink] [raw]
Subject: [PATCH] mm/swap_state: update zswap LRU's protection range with the folio locked

Move the zswap LRU protection range update above the swap_read_folio()
call, and only when a new page is allocated. This is the case where
(z)swapin could happen, which is a signal that the zswap shrinker should
be more conservative with its reclaiming action.

It also prevents a race, in which folio migration can clear the
memcg_data of the now unlocked folio, resulting in a warning in the
inlined folio_lruvec() call.

Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]/
Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
Signed-off-by: Nhat Pham <[email protected]>
---
mm/swap_state.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index e671266ad772..7255c01a1e4e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -680,9 +680,10 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
/* The page was likely read above, so no need for plugging here */
folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
&page_allocated, false);
- if (unlikely(page_allocated))
+ if (unlikely(page_allocated)) {
+ zswap_folio_swapin(folio);
swap_read_folio(folio, false, NULL);
- zswap_folio_swapin(folio);
+ }
return folio;
}

@@ -855,9 +856,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
/* The folio was likely read above, so no need for plugging here */
folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
&page_allocated, false);
- if (unlikely(page_allocated))
+ if (unlikely(page_allocated)) {
+ zswap_folio_swapin(folio);
swap_read_folio(folio, false, NULL);
- zswap_folio_swapin(folio);
+ }
return folio;
}


base-commit: 91f3daa1765ee4e0c89987dc25f72c40f07af34d
--
2.39.3



2024-02-06 02:07:36

by Chengming Zhou

[permalink] [raw]
Subject: Re: [PATCH] mm/swap_state: update zswap LRU's protection range with the folio locked

On 2024/2/6 07:24, Nhat Pham wrote:
> Move the zswap LRU protection range update above the swap_read_folio()
> call, and only when a new page is allocated. This is the case where
> (z)swapin could happen, which is a signal that the zswap shrinker should
> be more conservative with its reclaiming action.
>
> It also prevents a race, in which folio migration can clear the
> memcg_data of the now unlocked folio, resulting in a warning in the
> inlined folio_lruvec() call.
>
> Reported-by: [email protected]
> Closes: https://lore.kernel.org/all/[email protected]/
> Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> Signed-off-by: Nhat Pham <[email protected]>

LGTM, thanks!

Reviewed-by: Chengming Zhou <[email protected]>

> ---
> mm/swap_state.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index e671266ad772..7255c01a1e4e 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -680,9 +680,10 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
> /* The page was likely read above, so no need for plugging here */
> folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
> &page_allocated, false);
> - if (unlikely(page_allocated))
> + if (unlikely(page_allocated)) {
> + zswap_folio_swapin(folio);
> swap_read_folio(folio, false, NULL);
> - zswap_folio_swapin(folio);
> + }
> return folio;
> }
>
> @@ -855,9 +856,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> /* The folio was likely read above, so no need for plugging here */
> folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
> &page_allocated, false);
> - if (unlikely(page_allocated))
> + if (unlikely(page_allocated)) {
> + zswap_folio_swapin(folio);
> swap_read_folio(folio, false, NULL);
> - zswap_folio_swapin(folio);
> + }
> return folio;
> }
>
>
> base-commit: 91f3daa1765ee4e0c89987dc25f72c40f07af34d

2024-02-06 15:23:59

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH] mm/swap_state: update zswap LRU's protection range with the folio locked

On Mon, Feb 05, 2024 at 03:24:42PM -0800, Nhat Pham wrote:
> Move the zswap LRU protection range update above the swap_read_folio()
> call, and only when a new page is allocated. This is the case where
> (z)swapin could happen, which is a signal that the zswap shrinker should
> be more conservative with its reclaiming action.
>
> It also prevents a race, in which folio migration can clear the
> memcg_data of the now unlocked folio, resulting in a warning in the
> inlined folio_lruvec() call.

The warning is the most probable outcome, and it will cause the update
to go against the root cgroup which is safe at least.

But AFAICS there is no ordering guarantee to rule out a UAF if the
lookup succeeds but the memcg and lruvec get freed before the update.

I think that part should be more prominent in the changelog. It's more
important than the first paragraph. Consider somebody scrolling
through the git log and trying to decide whether to backport or not;
it's helpful to describe the bug and its impact first thing, then put
the explanation of the fix after.

> Reported-by: [email protected]
> Closes: https://lore.kernel.org/all/[email protected]/
> Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> Signed-off-by: Nhat Pham <[email protected]>

Would it make sense to add

VM_WARN_ON_ONCE(!folio_test_locked(folio));

to zswap_folio_swapin() as well?

2024-02-06 17:31:25

by Nhat Pham

[permalink] [raw]
Subject: Re: [PATCH] mm/swap_state: update zswap LRU's protection range with the folio locked

On Tue, Feb 6, 2024 at 7:15 AM Johannes Weiner <[email protected]> wrote:
>
> On Mon, Feb 05, 2024 at 03:24:42PM -0800, Nhat Pham wrote:
> > Move the zswap LRU protection range update above the swap_read_folio()
> > call, and only when a new page is allocated. This is the case where
> > (z)swapin could happen, which is a signal that the zswap shrinker should
> > be more conservative with its reclaiming action.
> >
> > It also prevents a race, in which folio migration can clear the
> > memcg_data of the now unlocked folio, resulting in a warning in the
> > inlined folio_lruvec() call.
>
> The warning is the most probable outcome, and it will cause the update
> to go against the root cgroup which is safe at least.
>
> But AFAICS there is no ordering guarantee to rule out a UAF if the
> lookup succeeds but the memcg and lruvec get freed before the update.

Ah nice. I didn't consider that. IIUC, having the folio locked should
prevent this too. Based on the documentation:

* For a non-kmem folio any of the following ensures folio and memcg binding
* stability:
*
* - the folio lock

I'll rework the commit log to include this, and make this more prominent :)

>
> I think that part should be more prominent in the changelog. It's more
> important than the first paragraph. Consider somebody scrolling
> through the git log and trying to decide whether to backport or not;
> it's helpful to describe the bug and its impact first thing, then put
> the explanation of the fix after.
>
> > Reported-by: [email protected]
> > Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@googlecom/
> > Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> > Signed-off-by: Nhat Pham <[email protected]>
>
> Would it make sense to add
>
> VM_WARN_ON_ONCE(!folio_test_locked(folio));
>
> to zswap_folio_swapin() as well?