2024-01-02 17:54:01

by Kairui Song

Subject: [PATCH v2 0/9] swapin refactor for optimization and unified readahead

From: Kairui Song <[email protected]>

This series is rebased on the latest mm-stable to avoid conflicts.

This series tries to unify and clean up the swapin path, introduce minor
optimizations, and make both shmem and swapoff make use of the
SWP_SYNCHRONOUS_IO flag to skip readahead and the swapcache for better
performance (the exact check is sketched below).
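
For reference, the fast path this series builds on boils down to one
check, shown here as it appears in patch 5 of this series: a device
flagged SWP_SYNCHRONOUS_IO whose entry has a swap count of 1 can skip
both the swap cache and readahead:

	static inline bool swap_use_no_readahead(struct swap_info_struct *si,
						 swp_entry_t entry)
	{
		/* Synchronous, non-shared entries can be read in directly */
		return data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
		       __swap_count(entry) == 1;
	}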

1. Some benchmarks for dropping readahead and swapcache for shmem with ZRAM:

- Single file sequential read:
perf stat --repeat 20 dd if=/tmpfs/test of=/dev/null bs=1M count=8192
(/tmpfs/test is a zero-filled file, using brd as swap, 4G memcg limit)
Before: 22.248 +- 0.549 s
After: 22.021 +- 0.684 s (-1.1%)

- Random read stress test:
fio -name=tmpfs --numjobs=16 --directory=/tmpfs \
--size=256m --ioengine=mmap --rw=randread --random_distribution=random \
--time_based --ramp_time=1m --runtime=5m --group_reporting
(using brd as swap, 2G memcg limit)

Before: 1818MiB/s
After: 1888MiB/s (+3.85%)

- Zipf biased random read stress test:
fio -name=tmpfs --numjobs=16 --directory=/tmpfs \
--size=256m --ioengine=mmap --rw=randread --random_distribution=zipf:1.2 \
--time_based --ramp_time=1m --runtime=5m --group_reporting
(using brd as swap, 2G memcg limit)

Before: 31.1GiB/s
After: 32.3GiB/s (+3.86%)

Previously, shmem always used cluster readahead. It doesn't help much
even for a single-file sequential read, and for the random stress tests
the performance is better without it. In practice, due to memory and
swap fragmentation, cluster readahead is even less helpful for ZRAM.

2. A micro-benchmark that uses madvise to swap out 10G of zero-filled
data to ZRAM and then reads it back in shows a performance gain on the
swapin path (a sketch of such a benchmark follows the numbers below):

Before: 11143285 us
After: 10692644 us (+4.1%)
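
A minimal sketch of such a micro-benchmark (illustrative only, not the
exact tool used to produce the numbers above; assumes ZRAM is the
active swap device and MADV_PAGEOUT is available):

	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <sys/time.h>

	#define SIZE (10UL << 30)	/* 10G, matching the test above */

	int main(void)
	{
		char *buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		struct timeval t0, t1;
		unsigned long i, sum = 0;

		if (buf == MAP_FAILED)
			return 1;
		memset(buf, 0, SIZE);			/* zero-filled data */
		madvise(buf, SIZE, MADV_PAGEOUT);	/* swap it out */

		gettimeofday(&t0, NULL);
		for (i = 0; i < SIZE; i += 4096)	/* fault pages back in */
			sum += buf[i];
		gettimeofday(&t1, NULL);

		printf("%lu us (sum=%lu)\n",
		       (t1.tv_sec - t0.tv_sec) * 1000000UL +
		       (t1.tv_usec - t0.tv_usec), sum);
		return 0;
	}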

3. Swapping off a 10G ZRAM device:

Before:
time swapoff /dev/zram0
real 0m12.337s
user 0m0.001s
sys 0m12.329s

After:
time swapoff /dev/zram0
real 0m9.728s
user 0m0.001s
sys 0m9.719s

This also cleans up the swapin paths so that a per-swap-device readahead
policy can be applied to all of them, as sketched below.
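
Condensed from swapin_entry() in patch 5 below, every swapin path now
funnels through a single policy decision point, which is where such a
per-device policy would plug in:

	if (swap_use_no_readahead(swp_swap_info(entry), entry)) {
		/* Synchronous device: no readahead, no swapcache */
		folio = swapin_direct(entry, gfp_mask, vmf);
		cached = false;
	} else {
		mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
		if (swap_use_vma_readahead())
			folio = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf);
		else
			folio = swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
		mpol_cond_put(mpol);
		cached = true;
	}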

V1: https://lkml.org/lkml/2023/11/19/296
Update from V1:
- Rebased onto mm-unstable.
- Removed behaviour-changing patches, will submit in a separate series
later.
- Code style, naming and comment updates.
- Thanks to Chris Li for very detailed and helpful review of V1. Thanks
to Matthew Wilcox and Huang Ying for helpful suggestions.

Kairui Song (9):
mm/swapfile.c: add back some comment
mm/swap: move no readahead swapin code to a stand-alone helper
mm/swap: avoid doing extra unlock error checks for direct swapin
mm/swap: always account swapped in page into current memcg
mm/swap: introduce swapin_entry for unified readahead policy
mm/swap: also handle swapcache lookup in swapin_entry
mm/swap: avoid a duplicated swap cache lookup for SWP_SYNCHRONOUS_IO
mm/swap: introduce a helper for swapin without vmfault
swap, shmem: use new swapin helper and skip readahead conditionally

mm/memory.c | 74 +++++++-------------------
mm/shmem.c | 67 +++++++++++------------
mm/swap.h | 39 ++++++++++----
mm/swap_state.c | 138 +++++++++++++++++++++++++++++++++++++++++-------
mm/swapfile.c | 32 +++++------
5 files changed, 218 insertions(+), 132 deletions(-)

--
2.43.0



2024-01-02 17:54:24

by Kairui Song

Subject: [PATCH v2 2/9] mm/swap: move no readahead swapin code to a stand-alone helper

From: Kairui Song <[email protected]>

No feature change, simply move the routine to a standalone function so
it can be re-used later. The error path handling is copied from the
"out_page" label to keep the code change minimal for easier reviewing.

Signed-off-by: Kairui Song <[email protected]>
---
mm/memory.c | 32 ++++----------------------------
mm/swap.h | 8 ++++++++
mm/swap_state.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 59 insertions(+), 28 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index a0a50d3754f0..0165c8cad489 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3803,7 +3803,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
swp_entry_t entry;
pte_t pte;
vm_fault_t ret = 0;
- void *shadow = NULL;

if (!pte_unmap_same(vmf))
goto out;
@@ -3867,33 +3866,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
if (!folio) {
if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
__swap_count(entry) == 1) {
- /* skip swapcache */
- folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
- vma, vmf->address, false);
- page = &folio->page;
- if (folio) {
- __folio_set_locked(folio);
- __folio_set_swapbacked(folio);
-
- if (mem_cgroup_swapin_charge_folio(folio,
- vma->vm_mm, GFP_KERNEL,
- entry)) {
- ret = VM_FAULT_OOM;
- goto out_page;
- }
- mem_cgroup_swapin_uncharge_swap(entry);
-
- shadow = get_shadow_from_swap_cache(entry);
- if (shadow)
- workingset_refault(folio, shadow);
-
- folio_add_lru(folio);
-
- /* To provide entry to swap_read_folio() */
- folio->swap = entry;
- swap_read_folio(folio, true, NULL);
- folio->private = NULL;
- }
+ /* skip swapcache and readahead */
+ folio = swapin_direct(entry, GFP_HIGHUSER_MOVABLE, vmf);
+ if (folio)
+ page = &folio->page;
} else {
page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
vmf);
diff --git a/mm/swap.h b/mm/swap.h
index 758c46ca671e..83eab7b67e77 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -56,6 +56,8 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
struct mempolicy *mpol, pgoff_t ilx);
struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
struct vm_fault *vmf);
+struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
+ struct vm_fault *vmf);

static inline unsigned int folio_swap_flags(struct folio *folio)
{
@@ -86,6 +88,12 @@ static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
return NULL;
}

+struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
+ struct vm_fault *vmf)
+{
+ return NULL;
+}
+
static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
struct vm_fault *vmf)
{
diff --git a/mm/swap_state.c b/mm/swap_state.c
index e671266ad772..24cb93ed5081 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -861,6 +861,53 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
return folio;
}

+/**
+ * swapin_direct - swap in folios skipping swap cache and readahead
+ * @entry: swap entry of this memory
+ * @gfp_mask: memory allocation flags
+ * @vmf: fault information
+ *
+ * Returns the struct folio for entry and addr after the swap entry is read
+ * in.
+ */
+struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
+ struct vm_fault *vmf)
+{
+ struct vm_area_struct *vma = vmf->vma;
+ struct folio *folio;
+ void *shadow = NULL;
+
+ /* skip swapcache */
+ folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
+ vma, vmf->address, false);
+ if (folio) {
+ __folio_set_locked(folio);
+ __folio_set_swapbacked(folio);
+
+ if (mem_cgroup_swapin_charge_folio(folio,
+ vma->vm_mm, GFP_KERNEL,
+ entry)) {
+ folio_unlock(folio);
+ folio_put(folio);
+ return NULL;
+ }
+ mem_cgroup_swapin_uncharge_swap(entry);
+
+ shadow = get_shadow_from_swap_cache(entry);
+ if (shadow)
+ workingset_refault(folio, shadow);
+
+ folio_add_lru(folio);
+
+ /* To provide entry to swap_read_folio() */
+ folio->swap = entry;
+ swap_read_folio(folio, true, NULL);
+ folio->private = NULL;
+ }
+
+ return folio;
+}
+
/**
* swapin_readahead - swap in pages in hope we need them soon
* @entry: swap entry of this memory
--
2.43.0


2024-01-02 17:55:12

by Kairui Song

Subject: [PATCH v2 5/9] mm/swap: introduce swapin_entry for unified readahead policy

From: Kairui Song <[email protected]>

Introduce swapin_entry, which merges swapin_readahead and swapin_direct,
making it the main entry point for swapin, with a unified swapin policy.

This commit makes swapoff use this new helper; swapping off a 10G ZRAM
(lzo-rle) device is now faster since readahead is skipped.

Before:
time swapoff /dev/zram0
real 0m12.337s
user 0m0.001s
sys 0m12.329s

After:
time swapoff /dev/zram0
real 0m9.728s
user 0m0.001s
sys 0m9.719s

Signed-off-by: Kairui Song <[email protected]>
---
mm/memory.c | 21 +++++++--------------
mm/swap.h | 16 ++++------------
mm/swap_state.c | 49 +++++++++++++++++++++++++++++++++----------------
mm/swapfile.c | 7 ++-----
4 files changed, 46 insertions(+), 47 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 0165c8cad489..b56254a875f8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3801,6 +3801,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
rmap_t rmap_flags = RMAP_NONE;
bool exclusive = false;
swp_entry_t entry;
+ bool swapcached;
pte_t pte;
vm_fault_t ret = 0;

@@ -3864,21 +3865,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
swapcache = folio;

if (!folio) {
- if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
- __swap_count(entry) == 1) {
- /* skip swapcache and readahead */
- folio = swapin_direct(entry, GFP_HIGHUSER_MOVABLE, vmf);
- if (folio)
- page = &folio->page;
+ folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
+ vmf, &swapcached);
+ if (folio) {
+ page = folio_file_page(folio, swp_offset(entry));
+ if (swapcached)
+ swapcache = folio;
} else {
- page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
- vmf);
- if (page)
- folio = page_folio(page);
- swapcache = folio;
- }
-
- if (!folio) {
/*
* Back out if somebody else faulted in this pte
* while we released the pte lock.
diff --git a/mm/swap.h b/mm/swap.h
index 83eab7b67e77..502a2801f817 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -54,10 +54,8 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_flags,
bool skip_if_exists);
struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
struct mempolicy *mpol, pgoff_t ilx);
-struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
- struct vm_fault *vmf);
-struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
- struct vm_fault *vmf);
+struct folio *swapin_entry(swp_entry_t entry, gfp_t flag,
+ struct vm_fault *vmf, bool *swapcached);

static inline unsigned int folio_swap_flags(struct folio *folio)
{
@@ -88,14 +86,8 @@ static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
return NULL;
}

-struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
- struct vm_fault *vmf)
-{
- return NULL;
-}
-
-static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
- struct vm_fault *vmf)
+static inline struct folio *swapin_entry(swp_entry_t swp, gfp_t gfp_mask,
+ struct vm_fault *vmf, bool *swapcached)
{
return NULL;
}
diff --git a/mm/swap_state.c b/mm/swap_state.c
index d39c5369da21..66ff187aa5d3 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -316,6 +316,11 @@ void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
release_pages(pages, nr);
}

+static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_t entry)
+{
+ return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1;
+}
+
static inline bool swap_use_vma_readahead(void)
{
return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap);
@@ -870,8 +875,8 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
* Returns the struct folio for entry and addr after the swap entry is read
* in.
*/
-struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
- struct vm_fault *vmf)
+static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
+ struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
struct folio *folio;
@@ -908,33 +913,45 @@ struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
}

/**
- * swapin_readahead - swap in pages in hope we need them soon
+ * swapin_entry - swap in a page from swap entry
* @entry: swap entry of this memory
* @gfp_mask: memory allocation flags
* @vmf: fault information
+ * @swapcached: pointer to a bool used as indicator if the
+ * page is swapped in through swapcache.
*
* Returns the struct page for entry and addr, after queueing swapin.
*
- * It's a main entry function for swap readahead. By the configuration,
+ * It's a main entry function for swap in. By the configuration,
* it will read ahead blocks by cluster-based(ie, physical disk based)
- * or vma-based(ie, virtual address based on faulty address) readahead.
+ * or vma-based(ie, virtual address based on faulty address) readahead,
+ * or skip the readahead (ie, ramdisk based swap device).
*/
-struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
- struct vm_fault *vmf)
+struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
+ struct vm_fault *vmf, bool *swapcached)
{
struct mempolicy *mpol;
- pgoff_t ilx;
struct folio *folio;
+ pgoff_t ilx;
+ bool cached;

- mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
- folio = swap_use_vma_readahead() ?
- swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf) :
- swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
- mpol_cond_put(mpol);
+ if (swap_use_no_readahead(swp_swap_info(entry), entry)) {
+ folio = swapin_direct(entry, gfp_mask, vmf);
+ cached = false;
+ } else {
+ mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
+ if (swap_use_vma_readahead())
+ folio = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf);
+ else
+ folio = swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
+ mpol_cond_put(mpol);
+ cached = true;
+ }

- if (!folio)
- return NULL;
- return folio_file_page(folio, swp_offset(entry));
+ if (swapcached)
+ *swapcached = cached;
+
+ return folio;
}

#ifdef CONFIG_SYSFS
diff --git a/mm/swapfile.c b/mm/swapfile.c
index f7271504aa0a..ce4e6c10dce7 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1866,7 +1866,6 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,

folio = swap_cache_get_folio(entry, vma, addr);
if (!folio) {
- struct page *page;
struct vm_fault vmf = {
.vma = vma,
.address = addr,
@@ -1874,10 +1873,8 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
.pmd = pmd,
};

- page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
- &vmf);
- if (page)
- folio = page_folio(page);
+ folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
+ &vmf, NULL);
}
if (!folio) {
/*
--
2.43.0


2024-01-04 07:31:09

by Huang, Ying

Subject: Re: [PATCH v2 2/9] mm/swap: move no readahead swapin code to a stand-alone helper

Kairui Song <[email protected]> writes:

> From: Kairui Song <[email protected]>
>
> No feature change, simply move the routine to a standalone function so
> it can be re-used later. The error path handling is copied from the
> "out_page" label to keep the code change minimal for easier reviewing.
>
> Signed-off-by: Kairui Song <[email protected]>
> ---
> mm/memory.c | 32 ++++----------------------------
> mm/swap.h | 8 ++++++++
> mm/swap_state.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 59 insertions(+), 28 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index a0a50d3754f0..0165c8cad489 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3803,7 +3803,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> swp_entry_t entry;
> pte_t pte;
> vm_fault_t ret = 0;
> - void *shadow = NULL;
>
> if (!pte_unmap_same(vmf))
> goto out;
> @@ -3867,33 +3866,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> if (!folio) {
> if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
> __swap_count(entry) == 1) {
> - /* skip swapcache */
> - folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
> - vma, vmf->address, false);
> - page = &folio->page;
> - if (folio) {
> - __folio_set_locked(folio);
> - __folio_set_swapbacked(folio);
> -
> - if (mem_cgroup_swapin_charge_folio(folio,
> - vma->vm_mm, GFP_KERNEL,
> - entry)) {
> - ret = VM_FAULT_OOM;
> - goto out_page;
> - }
> - mem_cgroup_swapin_uncharge_swap(entry);
> -
> - shadow = get_shadow_from_swap_cache(entry);
> - if (shadow)
> - workingset_refault(folio, shadow);
> -
> - folio_add_lru(folio);
> -
> - /* To provide entry to swap_read_folio() */
> - folio->swap = entry;
> - swap_read_folio(folio, true, NULL);
> - folio->private = NULL;
> - }
> + /* skip swapcache and readahead */
> + folio = swapin_direct(entry, GFP_HIGHUSER_MOVABLE, vmf);
> + if (folio)
> + page = &folio->page;
> } else {
> page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
> vmf);
> diff --git a/mm/swap.h b/mm/swap.h
> index 758c46ca671e..83eab7b67e77 100644
> --- a/mm/swap.h
> +++ b/mm/swap.h
> @@ -56,6 +56,8 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
> struct mempolicy *mpol, pgoff_t ilx);
> struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
> struct vm_fault *vmf);
> +struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
> + struct vm_fault *vmf);
>
> static inline unsigned int folio_swap_flags(struct folio *folio)
> {
> @@ -86,6 +88,12 @@ static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
> return NULL;
> }
>
> +struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
> + struct vm_fault *vmf)
> +{
> + return NULL;
> +}
> +
> static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
> struct vm_fault *vmf)
> {
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index e671266ad772..24cb93ed5081 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -861,6 +861,53 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> return folio;
> }
>
> +/**
> + * swapin_direct - swap in folios skipping swap cache and readahead

swap in a folio ...

> + * @entry: swap entry of this memory
> + * @gfp_mask: memory allocation flags
> + * @vmf: fault information
> + *
> + * Returns the struct folio for entry and addr after the swap entry is read
> + * in.
> + */
> +struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
> + struct vm_fault *vmf)
> +{
> + struct vm_area_struct *vma = vmf->vma;
> + struct folio *folio;
> + void *shadow = NULL;
> +
> + /* skip swapcache */
> + folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,

You passed gfp_mask in, but don't use it.

> + vma, vmf->address, false);
> + if (folio) {
> + __folio_set_locked(folio);
> + __folio_set_swapbacked(folio);
> +
> + if (mem_cgroup_swapin_charge_folio(folio,
> + vma->vm_mm, GFP_KERNEL,
> + entry)) {
> + folio_unlock(folio);
> + folio_put(folio);
> + return NULL;
> + }
> + mem_cgroup_swapin_uncharge_swap(entry);
> +
> + shadow = get_shadow_from_swap_cache(entry);
> + if (shadow)
> + workingset_refault(folio, shadow);
> +
> + folio_add_lru(folio);
> +
> + /* To provide entry to swap_read_folio() */
> + folio->swap = entry;
> + swap_read_folio(folio, true, NULL);
> + folio->private = NULL;
> + }
> +
> + return folio;
> +}
> +
> /**
> * swapin_readahead - swap in pages in hope we need them soon
> * @entry: swap entry of this memory

--
Best Regards,
Huang, Ying

2024-01-05 07:30:28

by Huang, Ying

Subject: Re: [PATCH v2 5/9] mm/swap: introduce swapin_entry for unified readahead policy

Kairui Song <[email protected]> writes:

> From: Kairui Song <[email protected]>
>
> Introduce swapin_entry, which merges swapin_readahead and swapin_direct,
> making it the main entry point for swapin, with a unified swapin policy.
>
> This commit makes swapoff use this new helper; swapping off a 10G ZRAM
> (lzo-rle) device is now faster since readahead is skipped.
>
> Before:
> time swapoff /dev/zram0
> real 0m12.337s
> user 0m0.001s
> sys 0m12.329s
>
> After:
> time swapoff /dev/zram0
> real 0m9.728s
> user 0m0.001s
> sys 0m9.719s
>
> Signed-off-by: Kairui Song <[email protected]>
> ---
> mm/memory.c | 21 +++++++--------------
> mm/swap.h | 16 ++++------------
> mm/swap_state.c | 49 +++++++++++++++++++++++++++++++++----------------
> mm/swapfile.c | 7 ++-----
> 4 files changed, 46 insertions(+), 47 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 0165c8cad489..b56254a875f8 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3801,6 +3801,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> rmap_t rmap_flags = RMAP_NONE;
> bool exclusive = false;
> swp_entry_t entry;
> + bool swapcached;
> pte_t pte;
> vm_fault_t ret = 0;
>
> @@ -3864,21 +3865,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> swapcache = folio;
>
> if (!folio) {
> - if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
> - __swap_count(entry) == 1) {
> - /* skip swapcache and readahead */
> - folio = swapin_direct(entry, GFP_HIGHUSER_MOVABLE, vmf);
> - if (folio)
> - page = &folio->page;
> + folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
> + vmf, &swapcached);
> + if (folio) {
> + page = folio_file_page(folio, swp_offset(entry));
> + if (swapcached)
> + swapcache = folio;
> } else {
> - page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
> - vmf);
> - if (page)
> - folio = page_folio(page);
> - swapcache = folio;
> - }
> -
> - if (!folio) {
> /*
> * Back out if somebody else faulted in this pte
> * while we released the pte lock.
> diff --git a/mm/swap.h b/mm/swap.h
> index 83eab7b67e77..502a2801f817 100644
> --- a/mm/swap.h
> +++ b/mm/swap.h
> @@ -54,10 +54,8 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_flags,
> bool skip_if_exists);
> struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
> struct mempolicy *mpol, pgoff_t ilx);
> -struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
> - struct vm_fault *vmf);
> -struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
> - struct vm_fault *vmf);
> +struct folio *swapin_entry(swp_entry_t entry, gfp_t flag,
> + struct vm_fault *vmf, bool *swapcached);
>
> static inline unsigned int folio_swap_flags(struct folio *folio)
> {
> @@ -88,14 +86,8 @@ static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
> return NULL;
> }
>
> -struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
> - struct vm_fault *vmf)
> -{
> - return NULL;
> -}
> -
> -static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
> - struct vm_fault *vmf)
> +static inline struct folio *swapin_entry(swp_entry_t swp, gfp_t gfp_mask,
> + struct vm_fault *vmf, bool *swapcached)
> {
> return NULL;
> }
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index d39c5369da21..66ff187aa5d3 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -316,6 +316,11 @@ void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
> release_pages(pages, nr);
> }
>
> +static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_t entry)
> +{
> + return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1;
> +}
> +

It appears that there's only one caller of the function in the same
file? Why add a function?

> static inline bool swap_use_vma_readahead(void)
> {
> return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap);
> @@ -870,8 +875,8 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> * Returns the struct folio for entry and addr after the swap entry is read
> * in.
> */
> -struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
> - struct vm_fault *vmf)
> +static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
> + struct vm_fault *vmf)
> {
> struct vm_area_struct *vma = vmf->vma;
> struct folio *folio;
> @@ -908,33 +913,45 @@ struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
> }
>
> /**
> - * swapin_readahead - swap in pages in hope we need them soon
> + * swapin_entry - swap in a page from swap entry
> * @entry: swap entry of this memory
> * @gfp_mask: memory allocation flags
> * @vmf: fault information
> + * @swapcached: pointer to a bool used as indicator if the
> + * page is swapped in through swapcache.
> *
> * Returns the struct page for entry and addr, after queueing swapin.
> *
> - * It's a main entry function for swap readahead. By the configuration,
> + * It's a main entry function for swap in. By the configuration,
> * it will read ahead blocks by cluster-based(ie, physical disk based)
> - * or vma-based(ie, virtual address based on faulty address) readahead.
> + * or vma-based(ie, virtual address based on faulty address) readahead,
> + * or skip the readahead (ie, ramdisk based swap device).
> */
> -struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> - struct vm_fault *vmf)
> +struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
> + struct vm_fault *vmf, bool *swapcached)

May be better to use

struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
struct vm_fault *vmf, struct folio **swapcache)

In this way, we can reduce the number of source lines in the caller.
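
With that signature, the caller in do_swap_page() could shrink to
something like the following sketch (illustrative, based on the
suggestion above, not code from the posted series):

	folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE, vmf, &swapcache);
	if (folio)
		page = folio_file_page(folio, swp_offset(entry));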

> {
> struct mempolicy *mpol;
> - pgoff_t ilx;
> struct folio *folio;
> + pgoff_t ilx;
> + bool cached;
>
> - mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
> - folio = swap_use_vma_readahead() ?
> - swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf) :
> - swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
> - mpol_cond_put(mpol);
> + if (swap_use_no_readahead(swp_swap_info(entry), entry)) {
> + folio = swapin_direct(entry, gfp_mask, vmf);
> + cached = false;
> + } else {
> + mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
> + if (swap_use_vma_readahead())
> + folio = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf);
> + else
> + folio = swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
> + mpol_cond_put(mpol);
> + cached = true;
> + }
>
> - if (!folio)
> - return NULL;
> - return folio_file_page(folio, swp_offset(entry));
> + if (swapcached)
> + *swapcached = cached;
> +
> + return folio;
> }
>
> #ifdef CONFIG_SYSFS
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index f7271504aa0a..ce4e6c10dce7 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1866,7 +1866,6 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>
> folio = swap_cache_get_folio(entry, vma, addr);
> if (!folio) {
> - struct page *page;
> struct vm_fault vmf = {
> .vma = vma,
> .address = addr,
> @@ -1874,10 +1873,8 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> .pmd = pmd,
> };
>
> - page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
> - &vmf);
> - if (page)
> - folio = page_folio(page);
> + folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
> + &vmf, NULL);
> }
> if (!folio) {
> /*

--
Best Regards,
Huang, Ying

2024-01-05 07:43:35

by Kairui Song

Subject: Re: [PATCH v2 2/9] mm/swap: move no readahead swapin code to a stand-alone helper

On Thu, Jan 4, 2024 at 3:31 PM Huang, Ying <[email protected]> wrote:
>
> Kairui Song <[email protected]> writes:
>
> > From: Kairui Song <[email protected]>
> >
> > No feature change, simply move the routine to a standalone function so
> > it can be re-used later. The error path handling is copied from the
> > "out_page" label to keep the code change minimal for easier reviewing.
> >
> > Signed-off-by: Kairui Song <[email protected]>
> > ---
> > mm/memory.c | 32 ++++----------------------------
> > mm/swap.h | 8 ++++++++
> > mm/swap_state.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 59 insertions(+), 28 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index a0a50d3754f0..0165c8cad489 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3803,7 +3803,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > swp_entry_t entry;
> > pte_t pte;
> > vm_fault_t ret = 0;
> > - void *shadow = NULL;
> >
> > if (!pte_unmap_same(vmf))
> > goto out;
> > @@ -3867,33 +3866,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > if (!folio) {
> > if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
> > __swap_count(entry) == 1) {
> > - /* skip swapcache */
> > - folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
> > - vma, vmf->address, false);
> > - page = &folio->page;
> > - if (folio) {
> > - __folio_set_locked(folio);
> > - __folio_set_swapbacked(folio);
> > -
> > - if (mem_cgroup_swapin_charge_folio(folio,
> > - vma->vm_mm, GFP_KERNEL,
> > - entry)) {
> > - ret = VM_FAULT_OOM;
> > - goto out_page;
> > - }
> > - mem_cgroup_swapin_uncharge_swap(entry);
> > -
> > - shadow = get_shadow_from_swap_cache(entry);
> > - if (shadow)
> > - workingset_refault(folio, shadow);
> > -
> > - folio_add_lru(folio);
> > -
> > - /* To provide entry to swap_read_folio() */
> > - folio->swap = entry;
> > - swap_read_folio(folio, true, NULL);
> > - folio->private = NULL;
> > - }
> > + /* skip swapcache and readahead */
> > + folio = swapin_direct(entry, GFP_HIGHUSER_MOVABLE, vmf);
> > + if (folio)
> > + page = &folio->page;
> > } else {
> > page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
> > vmf);
> > diff --git a/mm/swap.h b/mm/swap.h
> > index 758c46ca671e..83eab7b67e77 100644
> > --- a/mm/swap.h
> > +++ b/mm/swap.h
> > @@ -56,6 +56,8 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
> > struct mempolicy *mpol, pgoff_t ilx);
> > struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
> > struct vm_fault *vmf);
> > +struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
> > + struct vm_fault *vmf);
> >
> > static inline unsigned int folio_swap_flags(struct folio *folio)
> > {
> > @@ -86,6 +88,12 @@ static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
> > return NULL;
> > }
> >
> > +struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
> > + struct vm_fault *vmf)
> > +{
> > + return NULL;
> > +}
> > +
> > static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
> > struct vm_fault *vmf)
> > {
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index e671266ad772..24cb93ed5081 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -861,6 +861,53 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> > return folio;
> > }
> >
> > +/**
> > + * swapin_direct - swap in folios skipping swap cache and readahead
>
> swap in a folio ...
>
> > + * @entry: swap entry of this memory
> > + * @gfp_mask: memory allocation flags
> > + * @vmf: fault information
> > + *
> > + * Returns the struct folio for entry and addr after the swap entry is read
> > + * in.
> > + */
> > +struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
> > + struct vm_fault *vmf)
> > +{
> > + struct vm_area_struct *vma = vmf->vma;
> > + struct folio *folio;
> > + void *shadow = NULL;
> > +
> > + /* skip swapcache */
> > + folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
>
> You passed gfp_mask in, but don't use it.
>

Thanks, will fix these.

> --
> Best Regards,
> Huang, Ying

2024-01-10 02:43:07

by Kairui Song

Subject: Re: [PATCH v2 5/9] mm/swap: introduce swapin_entry for unified readahead policy

On Fri, Jan 5, 2024 at 3:30 PM Huang, Ying <[email protected]> wrote:
>
> Kairui Song <[email protected]> writes:
>
> > From: Kairui Song <[email protected]>
> >
> > Introduce swapin_entry, which merges swapin_readahead and swapin_direct,
> > making it the main entry point for swapin, with a unified swapin policy.
> >
> > This commit makes swapoff use this new helper; swapping off a 10G ZRAM
> > (lzo-rle) device is now faster since readahead is skipped.
> >
> > Before:
> > time swapoff /dev/zram0
> > real 0m12.337s
> > user 0m0.001s
> > sys 0m12.329s
> >
> > After:
> > time swapoff /dev/zram0
> > real 0m9.728s
> > user 0m0.001s
> > sys 0m9.719s
> >
> > Signed-off-by: Kairui Song <[email protected]>
> > ---
> > mm/memory.c | 21 +++++++--------------
> > mm/swap.h | 16 ++++------------
> > mm/swap_state.c | 49 +++++++++++++++++++++++++++++++++----------------
> > mm/swapfile.c | 7 ++-----
> > 4 files changed, 46 insertions(+), 47 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 0165c8cad489..b56254a875f8 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3801,6 +3801,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > rmap_t rmap_flags = RMAP_NONE;
> > bool exclusive = false;
> > swp_entry_t entry;
> > + bool swapcached;
> > pte_t pte;
> > vm_fault_t ret = 0;
> >
> > @@ -3864,21 +3865,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > swapcache = folio;
> >
> > if (!folio) {
> > - if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
> > - __swap_count(entry) == 1) {
> > - /* skip swapcache and readahead */
> > - folio = swapin_direct(entry, GFP_HIGHUSER_MOVABLE, vmf);
> > - if (folio)
> > - page = &folio->page;
> > + folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
> > + vmf, &swapcached);
> > + if (folio) {
> > + page = folio_file_page(folio, swp_offset(entry));
> > + if (swapcached)
> > + swapcache = folio;
> > } else {
> > - page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
> > - vmf);
> > - if (page)
> > - folio = page_folio(page);
> > - swapcache = folio;
> > - }
> > -
> > - if (!folio) {
> > /*
> > * Back out if somebody else faulted in this pte
> > * while we released the pte lock.
> > diff --git a/mm/swap.h b/mm/swap.h
> > index 83eab7b67e77..502a2801f817 100644
> > --- a/mm/swap.h
> > +++ b/mm/swap.h
> > @@ -54,10 +54,8 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_flags,
> > bool skip_if_exists);
> > struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
> > struct mempolicy *mpol, pgoff_t ilx);
> > -struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
> > - struct vm_fault *vmf);
> > -struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
> > - struct vm_fault *vmf);
> > +struct folio *swapin_entry(swp_entry_t entry, gfp_t flag,
> > + struct vm_fault *vmf, bool *swapcached);
> >
> > static inline unsigned int folio_swap_flags(struct folio *folio)
> > {
> > @@ -88,14 +86,8 @@ static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
> > return NULL;
> > }
> >
> > -struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
> > - struct vm_fault *vmf)
> > -{
> > - return NULL;
> > -}
> > -
> > -static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
> > - struct vm_fault *vmf)
> > +static inline struct folio *swapin_entry(swp_entry_t swp, gfp_t gfp_mask,
> > + struct vm_fault *vmf, bool *swapcached)
> > {
> > return NULL;
> > }
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index d39c5369da21..66ff187aa5d3 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -316,6 +316,11 @@ void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
> > release_pages(pages, nr);
> > }
> >
> > +static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_t entry)
> > +{
> > + return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1;
> > +}
> > +

Hi Ying,

Thanks for the review.

>
> It appears that there's only one caller of the function in the same
> file? Why add a function?

A later patch will extend this checker function.
I can defer this change so it won't cause confusion for reviewers.

>
> > static inline bool swap_use_vma_readahead(void)
> > {
> > return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap);
> > @@ -870,8 +875,8 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> > * Returns the struct folio for entry and addr after the swap entry is read
> > * in.
> > */
> > -struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
> > - struct vm_fault *vmf)
> > +static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
> > + struct vm_fault *vmf)
> > {
> > struct vm_area_struct *vma = vmf->vma;
> > struct folio *folio;
> > @@ -908,33 +913,45 @@ struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
> > }
> >
> > /**
> > - * swapin_readahead - swap in pages in hope we need them soon
> > + * swapin_entry - swap in a page from swap entry
> > * @entry: swap entry of this memory
> > * @gfp_mask: memory allocation flags
> > * @vmf: fault information
> > + * @swapcached: pointer to a bool used as indicator if the
> > + * page is swapped in through swapcache.
> > *
> > * Returns the struct page for entry and addr, after queueing swapin.
> > *
> > - * It's a main entry function for swap readahead. By the configuration,
> > + * It's a main entry function for swap in. By the configuration,
> > * it will read ahead blocks by cluster-based(ie, physical disk based)
> > - * or vma-based(ie, virtual address based on faulty address) readahead.
> > + * or vma-based(ie, virtual address based on faulty address) readahead,
> > + * or skip the readahead (ie, ramdisk based swap device).
> > */
> > -struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> > - struct vm_fault *vmf)
> > +struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
> > + struct vm_fault *vmf, bool *swapcached)
>
> May be better to use
>
> struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
> struct vm_fault *vmf, struct folio **swapcache)
>
> In this way, we can reduce the number of source lines in the caller.

A following commit will rewrite this part to return an enum instead of
a bool, so this is just an intermediate change. And do_swap_page is the
only caller that can benefit from this; it's not helpful for
swapoff/shmem. I think we can just keep it this way here.
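
For context, a sketch of what such an enum-based return could look like
(the names here are purely illustrative, not from any posted series):

	/* Hypothetical: let swapin_entry() report which path was taken */
	enum swapin_result {
		SWAPIN_CACHED,	/* folio went through the swap cache */
		SWAPIN_BYPASS,	/* swap cache and readahead were skipped */
	};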