Once upon a time, all swapins counted toward memory pressure[1]. Then
Joonsoo introduced workingset detection for anonymous pages and we
gained the ability to distinguish hot from cold swapins[2][3]. But we
failed to update swap_readpage() accordingly, and now we account
partial memory pressure in the swapin path of cold memory.
Not for all situations - which adds more inconsistency: paths using
the conventional submit_bio() and lock_page() route will not see much
pressure - unless storage itself is heavily congested and the bio
submissions stall. ZRAM and ZSWAP do most of the work directly from
swap_readpage() and will see all swapins reflected as pressure.
Restore consistency by making all swapin stall accounting conditional
on the page actually being part of the workingset.
[1] commit 937790699be9 ("mm/page_io.c: annotate refault stalls from swap_readpage")
[2] commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU")
[3] commit cad8320b4b39 ("mm/swap: don't SetPageWorkingset unconditionally during swapin")
Reported-by: CGEL <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Yu Zhao <[email protected]>
---
mm/page_io.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/mm/page_io.c b/mm/page_io.c
index 61c792f916fa..f6296ee25014 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -359,6 +359,7 @@ int swap_readpage(struct page *page, bool synchronous)
struct bio *bio;
int ret = 0;
struct swap_info_struct *sis = page_swap_info(page);
+ bool workingset = PageWorkingset(page);
unsigned long pflags;
VM_BUG_ON_PAGE(!PageSwapCache(page) && !synchronous, page);
@@ -370,7 +371,8 @@ int swap_readpage(struct page *page, bool synchronous)
* or the submitting cgroup IO-throttled, submission can be a
* significant part of overall IO time.
*/
- psi_memstall_enter(&pflags);
+ if (workingset)
+ psi_memstall_enter(&pflags);
delayacct_swapin_start();
if (frontswap_load(page) == 0) {
@@ -431,7 +433,8 @@ int swap_readpage(struct page *page, bool synchronous)
bio_put(bio);
out:
- psi_memstall_leave(&pflags);
+ if (workingset)
+ psi_memstall_leave(&pflags);
delayacct_swapin_end();
return ret;
}
--
2.34.1
On Mon, 14 Feb 2022 16:49:21 -0500 Johannes Weiner <[email protected]> wrote:
> Once upon a time, all swapins counted toward memory pressure[1]. Then
> Joonsoo introduced workingset detection for anonymous pages and we
> gained the ability to distinguish hot from cold swapins[2][3]. But we
> failed to update swap_readpage() accordingly, and now we account
> partial memory pressure in the swapin path of cold memory.
>
> Not for all situations - which adds more inconsistency: paths using
> the conventional submit_bio() and lock_page() route will not see much
> pressure - unless storage itself is heavily congested and the bio
> submissions stall. ZRAM and ZSWAP do most of the work directly from
> swap_readpage() and will see all swapins reflected as pressure.
>
> Restore consistency by making all swapin stall accounting conditional
> on the page actually being part of the workingset.
Does this have any known runtime effects? If not, can we
hazard a guess?
On Mon, Feb 14, 2022 at 04:49:21PM -0500, Johannes Weiner wrote:
> Once upon a time, all swapins counted toward memory pressure[1]. Then
> Joonsoo introduced workingset detection for anonymous pages and we
> gained the ability to distinguish hot from cold swapins[2][3]. But we
> failed to update swap_readpage() accordingly, and now we account
> partial memory pressure in the swapin path of cold memory.
>
> Not for all situations - which adds more inconsistency: paths using
> the conventional submit_bio() and lock_page() route will not see much
> pressure - unless storage itself is heavily congested and the bio
> submissions stall. ZRAM and ZSWAP do most of the work directly from
> swap_readpage() and will see all swapins reflected as pressure.
>
> Restore consistency by making all swapin stall accounting conditional
> on the page actually being part of the workingset.
>
> [1] commit 937790699be9 ("mm/page_io.c: annotate refault stalls from swap_readpage")
> [2] commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU")
> [3] commit cad8320b4b39 ("mm/swap: don't SetPageWorkingset unconditionally during swapin")
>
> Reported-by: CGEL <[email protected]>
> Signed-off-by: Johannes Weiner <[email protected]>
Acked-by: Minchan Kim <[email protected]>
On Mon, Feb 14, 2022 at 02:48:05PM -0800, Andrew Morton wrote:
> On Mon, 14 Feb 2022 16:49:21 -0500 Johannes Weiner <[email protected]> wrote:
>
> > Once upon a time, all swapins counted toward memory pressure[1]. Then
> > Joonsoo introduced workingset detection for anonymous pages and we
> > gained the ability to distinguish hot from cold swapins[2][3]. But we
> > failed to update swap_readpage() accordingly, and now we account
> > partial memory pressure in the swapin path of cold memory.
> >
> > Not for all situations - which adds more inconsistency: paths using
> > the conventional submit_bio() and lock_page() route will not see much
> > pressure - unless storage itself is heavily congested and the bio
> > submissions stall. ZRAM and ZSWAP do most of the work directly from
> > swap_readpage() and will see all swapins reflected as pressure.
> >
> > Restore consistency by making all swapin stall accounting conditional
> > on the page actually being part of the workingset.
>
> Does this have any known runtime effects? If not, can we
> hazard a guess?
hm, how about this paragrah between "not for all situations" and
"restore consistency":
IOW, a workload doing cold swapins could see little to no pressure
reported with on-disk swap, but potentially high pressure with a zram
or zswap backend. That confuses any psi-based health monitoring, load
shedding, proactive reclaim, or userspace OOM killing schemes that
might be in place for the workload.