2024-04-23 17:04:30

by Kairui Song

Subject: [PATCH v2 0/8] mm/swap: optimize swap cache search space

From: Kairui Song <[email protected]>

Currently we use one swap_address_space for every 64M chunk to reduce lock
contention; this is like having a set of smaller swap files inside one
big swap file. But when doing a swap cache lookup or insert, we are
still using the offset into the whole large swap file. This is OK for
correctness, as the offset (key) is unique.

But the Xarray is specially optimized for small indexes: it creates the
radix tree levels lazily, just enough to fit the largest key
stored in one Xarray. So we are wasting tree nodes unnecessarily.

For a 64M chunk it should take at most 3 levels to contain everything.
But we are using the offset from the whole swap file, so the offset (key)
value will be way beyond 64M, and so will the tree depth.

Optimize this by reducing the swap cache search space to a 64M scope.
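
As a back-of-envelope sketch of the level math (my own illustration, not
code from the series; it assumes the default XA_CHUNK_SHIFT of 6 and 4K
pages):

#include <stdio.h>

/* Each xarray tree level resolves XA_CHUNK_SHIFT bits of the key. */
#define XA_CHUNK_SHIFT 6

static int xa_levels(int key_bits)
{
	return (key_bits + XA_CHUNK_SHIFT - 1) / XA_CHUNK_SHIFT;
}

int main(void)
{
	/* 64M chunk / 4K pages = 2^14 keys -> at most 3 levels. */
	printf("64M chunk: %d levels\n", xa_levels(14));
	/* 128G swap file / 4K pages = 2^25 keys -> 5 levels. */
	printf("128G file: %d levels\n", xa_levels(25));
	return 0;
}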

Tested with `time memhog 128G` inside an 8G memcg using 128G of swap (ramdisk
with SWP_SYNCHRONOUS_IO dropped; tested 3 times, results are stable. The
test result is similar but the improvement is smaller if SWP_SYNCHRONOUS_IO
is enabled, as the swap-out path can never skip the swap cache):

Before:
6.07user 250.74system 4:17.26elapsed 99%CPU (0avgtext+0avgdata 8373376maxresident)k
0inputs+0outputs (55major+33555018minor)pagefaults 0swaps

After (+1.8% faster):
6.08user 246.09system 4:12.58elapsed 99%CPU (0avgtext+0avgdata 8373248maxresident)k
0inputs+0outputs (54major+33555027minor)pagefaults 0swaps

Similar result with MySQL and sysbench using swap:
Before:
94055.61 qps

After (+0.8% faster):
94834.91 qps

There is also a very slight drop in radix tree node slab usage:
Before: 303952K
After: 302224K


For this series:

There are multiple places that expect a mixed type of pages (page cache or
swap cache), e.g. migration and huge page splitting. There are four helpers
for that:

- page_index
- page_file_offset
- folio_index
- folio_file_pos

To keep the code clean and compatible, this series first cleans up the
usage of page_index and page_file_offset, then converts folio_index
and folio_file_pos to be compatible with separate offsets.
Current users won't be affected. It also introduces two new helpers,
swap_cache_index and swap_dev_pos, for swap-internal usage:
swp_offset is replaced with swap_cache_index when used to retrieve a
folio from the swap cache, and swap_dev_pos is used when the device
position of a swap entry is needed.
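
A minimal user-space sketch of the difference between the two new helpers
(illustrative only; constants simplified from the series):

#include <stdio.h>

/* Simplified stand-ins for the kernel definitions used by this series. */
#define PAGE_SHIFT               12
#define SWAP_ADDRESS_SPACE_SHIFT 14
#define SWAP_ADDRESS_SPACE_MASK  ((1UL << SWAP_ADDRESS_SPACE_SHIFT) - 1)

int main(void)
{
	unsigned long offset = 0x5000; /* page offset within the swap device */

	/* swap_dev_pos: byte position on the swap device (patch 6/8). */
	printf("dev pos:     0x%lx\n", offset << PAGE_SHIFT);

	/* swap_cache_index: key within the entry's own 64M address
	 * space (patch 8/8); only the low 14 bits remain. */
	printf("cache index: 0x%lx\n", offset & SWAP_ADDRESS_SPACE_MASK);
	return 0;
}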

Ideally, in the future, we may want to reduce SWAP_ADDRESS_SPACE_SHIFT
from 14 to 12: the default Xarray chunk shift is 6, so 14-bit keys need
3-level trees while 12-bit keys need only 2; we pay a whole extra level
for just 2 extra bits. But the swap cache is based on the address_space
struct, and with 4 times more of that metadata sparsely distributed in
memory, it wastes more cachelines; the performance gain from this series
is almost canceled out according to my tests. So for now, just have a
cleaner separation of offsets and a smaller search space.

Patch 1/8 - 6/8: Clean up usage of page_index and page_file_offset
Patch 7/8: Convert folio_index and folio_file_pos to be compatible with
separate offsets.
Patch 8/8: Introduce swap_cache_index and use it when doing lookups in
the swap cache.

V1: https://lore.kernel.org/all/[email protected]/
Update from V1:
- Convert more users to use folio directly when possible [Matthew Wilcox]
- Rename swap_file_pos to swap_dev_pos [Huang, Ying]
- Update comments and commit message.
- Adjust headers and add a dummy function to fix a build error.

This series is part of the effort to reduce swap cache overhead, and to
ultimately remove SWP_SYNCHRONOUS_IO and unify swap cache usage, as
proposed before:
https://lore.kernel.org/lkml/[email protected]/

Kairui Song (8):
NFS: remove nfs_page_length and usage of page_index
nilfs2: drop usage of page_index
f2fs: drop usage of page_index
ceph: drop usage of page_index
cifs: drop usage of page_file_offset
mm/swap: get the swap file offset directly
mm: drop page_index/page_file_offset and convert swap helpers to use
folio
mm/swap: reduce swap cache search space

fs/ceph/dir.c | 2 +-
fs/ceph/inode.c | 2 +-
fs/f2fs/data.c | 5 ++---
fs/nfs/internal.h | 19 -------------------
fs/nilfs2/bmap.c | 3 +--
fs/smb/client/file.c | 2 +-
include/linux/mm.h | 13 -------------
include/linux/pagemap.h | 19 +++++++++----------
mm/huge_memory.c | 2 +-
mm/memcontrol.c | 2 +-
mm/mincore.c | 2 +-
mm/page_io.c | 6 +++---
mm/shmem.c | 2 +-
mm/swap.h | 24 ++++++++++++++++++++++++
mm/swap_state.c | 12 ++++++------
mm/swapfile.c | 17 +++++++++++------
16 files changed, 63 insertions(+), 69 deletions(-)

--
2.44.0



2024-04-23 17:05:11

by Kairui Song

Subject: [PATCH v2 3/8] f2fs: drop usage of page_index

From: Kairui Song <[email protected]>

page_index is only needed for mixed usage of page cache and swap cache;
for pure page cache usage, the caller can just use page->index instead.

It can't be a swap cache page here, so just drop it.

Signed-off-by: Kairui Song <[email protected]>
Cc: Chao Yu <[email protected]>
Cc: Jaegeuk Kim <[email protected]>
Cc: [email protected]
---
fs/f2fs/data.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index d9494b5fc7c1..12d5bbd18755 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2057,7 +2057,7 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page,
sector_t block_nr;
int ret = 0;

- block_in_file = (sector_t)page_index(page);
+ block_in_file = (sector_t)page->index;
last_block = block_in_file + nr_pages;
last_block_in_file = bytes_to_blks(inode,
f2fs_readpage_limit(inode) + blocksize - 1);
@@ -4086,8 +4086,7 @@ void f2fs_clear_page_cache_dirty_tag(struct page *page)
unsigned long flags;

xa_lock_irqsave(&mapping->i_pages, flags);
- __xa_clear_mark(&mapping->i_pages, page_index(page),
- PAGECACHE_TAG_DIRTY);
+ __xa_clear_mark(&mapping->i_pages, page->index, PAGECACHE_TAG_DIRTY);
xa_unlock_irqrestore(&mapping->i_pages, flags);
}

--
2.44.0


2024-04-23 17:05:46

by Kairui Song

Subject: [PATCH v2 5/8] cifs: drop usage of page_file_offset

From: Kairui Song <[email protected]>

page_file_offset is only needed for mixed usage of page cache and
swap cache; for pure page cache usage, the caller can just use
page_offset instead.

It can't be a swap cache page here, so just drop it and convert the
code to use a folio.

Signed-off-by: Kairui Song <[email protected]>
Cc: Steve French <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Paulo Alcantara <[email protected]>
Cc: Shyam Prasad N <[email protected]>
Cc: Bharath SM <[email protected]>
---
fs/smb/client/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 16aadce492b2..56ebc0b1444d 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -4749,7 +4749,7 @@ static int cifs_readpage_worker(struct file *file, struct page *page,
static int cifs_read_folio(struct file *file, struct folio *folio)
{
struct page *page = &folio->page;
- loff_t offset = page_file_offset(page);
+ loff_t offset = folio_pos(folio);
int rc = -EACCES;
unsigned int xid;

--
2.44.0


2024-04-23 17:06:26

by Kairui Song

Subject: [PATCH v2 4/8] ceph: drop usage of page_index

From: Kairui Song <[email protected]>

page_index is only needed for mixed usage of page cache and swap cache;
for pure page cache usage, the caller can just use page->index instead.

It can't be a swap cache page here, so just drop it.

Signed-off-by: Kairui Song <[email protected]>
Cc: Xiubo Li <[email protected]>
Cc: Ilya Dryomov <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: [email protected]
---
fs/ceph/dir.c | 2 +-
fs/ceph/inode.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 0e9f56eaba1e..570a9d634cc5 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -141,7 +141,7 @@ __dcache_find_get_entry(struct dentry *parent, u64 idx,
if (ptr_pos >= i_size_read(dir))
return NULL;

- if (!cache_ctl->page || ptr_pgoff != page_index(cache_ctl->page)) {
+ if (!cache_ctl->page || ptr_pgoff != cache_ctl->page->index) {
ceph_readdir_cache_release(cache_ctl);
cache_ctl->page = find_lock_page(&dir->i_data, ptr_pgoff);
if (!cache_ctl->page) {
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 7b2e77517f23..1f92d3faaa6b 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1861,7 +1861,7 @@ static int fill_readdir_cache(struct inode *dir, struct dentry *dn,
unsigned idx = ctl->index % nsize;
pgoff_t pgoff = ctl->index / nsize;

- if (!ctl->page || pgoff != page_index(ctl->page)) {
+ if (!ctl->page || pgoff != ctl->page->index) {
ceph_readdir_cache_release(ctl);
if (idx == 0)
ctl->page = grab_cache_page(&dir->i_data, pgoff);
--
2.44.0


2024-04-23 17:07:25

by Kairui Song

Subject: [PATCH v2 7/8] mm: drop page_index/page_file_offset and convert swap helpers to use folio

From: Kairui Song <[email protected]>

There are four helpers for retrieving the page index within its address
space, or the offset within a mapped file:

- page_index
- page_file_offset
- folio_index (equivalent of page_index)
- folio_file_pos (equivalent of page_file_offset)

They are only needed for mixed usage of swap cache & page cache (e.g.
migration, huge page splitting). Other users can just use folio->index
or folio_pos.

This commit drops page_index and page_file_offset, as we have eliminated
all of their users, and converts folio_index and folio_file_pos (which
were simply wrappers of page_index and page_file_offset, implemented in
a not very clean way) to use the folio internally.

After this commit, there will be only two helpers for users that may
encounter mixed usage of swap cache and page cache:

- folio_index (calls __folio_swap_cache_index for swap cache folio)
- folio_file_pos (calls __folio_swap_dev_pos for swap cache folio)

The index in the swap cache and the offset in the swap device are still
basically the same thing here, but will become different in the following
commits.

Signed-off-by: Kairui Song <[email protected]>
---
include/linux/mm.h | 13 -------------
include/linux/pagemap.h | 19 +++++++++----------
mm/swapfile.c | 13 +++++++++----
3 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0436b919f1c7..797480e76c9c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2245,19 +2245,6 @@ static inline void *folio_address(const struct folio *folio)
return page_address(&folio->page);
}

-extern pgoff_t __page_file_index(struct page *page);
-
-/*
- * Return the pagecache index of the passed page. Regular pagecache pages
- * use ->index whereas swapcache pages use swp_offset(->private)
- */
-static inline pgoff_t page_index(struct page *page)
-{
- if (unlikely(PageSwapCache(page)))
- return __page_file_index(page);
- return page->index;
-}
-
/*
* Return true only if the page has been allocated with
* ALLOC_NO_WATERMARKS and the low watermark was not
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 2df35e65557d..a7d025571ee6 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -780,7 +780,7 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
mapping_gfp_mask(mapping));
}

-#define swapcache_index(folio) __page_file_index(&(folio)->page)
+extern pgoff_t __folio_swap_cache_index(struct folio *folio);

/**
* folio_index - File index of a folio.
@@ -795,9 +795,9 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
*/
static inline pgoff_t folio_index(struct folio *folio)
{
- if (unlikely(folio_test_swapcache(folio)))
- return swapcache_index(folio);
- return folio->index;
+ if (unlikely(folio_test_swapcache(folio)))
+ return __folio_swap_cache_index(folio);
+ return folio->index;
}

/**
@@ -920,11 +920,6 @@ static inline loff_t page_offset(struct page *page)
return ((loff_t)page->index) << PAGE_SHIFT;
}

-static inline loff_t page_file_offset(struct page *page)
-{
- return ((loff_t)page_index(page)) << PAGE_SHIFT;
-}
-
/**
* folio_pos - Returns the byte position of this folio in its file.
* @folio: The folio.
@@ -934,6 +929,8 @@ static inline loff_t folio_pos(struct folio *folio)
return page_offset(&folio->page);
}

+extern loff_t __folio_swap_dev_pos(struct folio *folio);
+
/**
* folio_file_pos - Returns the byte position of this folio in its file.
* @folio: The folio.
@@ -943,7 +940,9 @@ static inline loff_t folio_pos(struct folio *folio)
*/
static inline loff_t folio_file_pos(struct folio *folio)
{
- return page_file_offset(&folio->page);
+ if (unlikely(folio_test_swapcache(folio)))
+ return __folio_swap_dev_pos(folio);
+ return ((loff_t)folio->index << PAGE_SHIFT);
}

/*
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 4919423cce76..2387f5e131d7 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3419,12 +3419,17 @@ struct address_space *swapcache_mapping(struct folio *folio)
}
EXPORT_SYMBOL_GPL(swapcache_mapping);

-pgoff_t __page_file_index(struct page *page)
+pgoff_t __folio_swap_cache_index(struct folio *folio)
{
- swp_entry_t swap = page_swap_entry(page);
- return swp_offset(swap);
+ return swp_offset(folio->swap);
}
-EXPORT_SYMBOL_GPL(__page_file_index);
+EXPORT_SYMBOL_GPL(__folio_swap_cache_index);
+
+loff_t __folio_swap_dev_pos(struct folio *folio)
+{
+ return swap_dev_pos(folio->swap);
+}
+EXPORT_SYMBOL_GPL(__folio_swap_dev_pos);

/*
* add_swap_count_continuation - called when a swap count is duplicated
--
2.44.0


2024-04-23 17:22:14

by Kairui Song

[permalink] [raw]
Subject: [PATCH v2 2/8] nilfs2: drop usage of page_index

From: Kairui Song <[email protected]>

page_index is only needed for mixed usage of page cache and swap cache;
for pure page cache usage, the caller can just use page->index instead.

It can't be a swap cache page here (it is attached to a buffer head),
so just drop it, and also convert the code to use a folio.

Signed-off-by: Kairui Song <[email protected]>
Cc: Ryusuke Konishi <[email protected]>
Cc: [email protected]
---
fs/nilfs2/bmap.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/nilfs2/bmap.c b/fs/nilfs2/bmap.c
index 383f0afa2cea..9f561afe864f 100644
--- a/fs/nilfs2/bmap.c
+++ b/fs/nilfs2/bmap.c
@@ -453,8 +453,7 @@ __u64 nilfs_bmap_data_get_key(const struct nilfs_bmap *bmap,
struct buffer_head *pbh;
__u64 key;

- key = page_index(bh->b_page) << (PAGE_SHIFT -
- bmap->b_inode->i_blkbits);
+ key = bh->b_folio->index << (PAGE_SHIFT - bmap->b_inode->i_blkbits);
for (pbh = page_buffers(bh->b_page); pbh != bh; pbh = pbh->b_this_page)
key++;

--
2.44.0


2024-04-23 17:27:25

by Kairui Song

Subject: [PATCH v2 6/8] mm/swap: get the swap file offset directly

From: Kairui Song <[email protected]>

folio_file_pos and page_file_offset are for mixed usage of swap cache
and page cache; it can't be a page cache page here, so introduce a new
helper to get the swap offset in the swap file directly.

We need to include swapops.h in mm/swap.h to ensure swp_offset is always
defined before use.

Signed-off-by: Kairui Song <[email protected]>
---
mm/page_io.c | 6 +++---
mm/swap.h | 9 +++++++++
2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index ae2b49055e43..615812cfe4ca 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -279,7 +279,7 @@ static void sio_write_complete(struct kiocb *iocb, long ret)
* be temporary.
*/
pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n",
- ret, page_file_offset(page));
+ ret, swap_dev_pos(page_swap_entry(page)));
for (p = 0; p < sio->pages; p++) {
page = sio->bvec[p].bv_page;
set_page_dirty(page);
@@ -298,7 +298,7 @@ static void swap_writepage_fs(struct folio *folio, struct writeback_control *wbc
struct swap_iocb *sio = NULL;
struct swap_info_struct *sis = swp_swap_info(folio->swap);
struct file *swap_file = sis->swap_file;
- loff_t pos = folio_file_pos(folio);
+ loff_t pos = swap_dev_pos(folio->swap);

count_swpout_vm_event(folio);
folio_start_writeback(folio);
@@ -429,7 +429,7 @@ static void swap_read_folio_fs(struct folio *folio, struct swap_iocb **plug)
{
struct swap_info_struct *sis = swp_swap_info(folio->swap);
struct swap_iocb *sio = NULL;
- loff_t pos = folio_file_pos(folio);
+ loff_t pos = swap_dev_pos(folio->swap);

if (plug)
sio = *plug;
diff --git a/mm/swap.h b/mm/swap.h
index fc2f6ade7f80..82023ab93205 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -5,6 +5,7 @@
struct mempolicy;

#ifdef CONFIG_SWAP
+#include <linux/swapops.h> /* for swp_offset */
#include <linux/blk_types.h> /* for bio_end_io_t */

/* linux/mm/page_io.c */
@@ -31,6 +32,14 @@ extern struct address_space *swapper_spaces[];
(&swapper_spaces[swp_type(entry)][swp_offset(entry) \
>> SWAP_ADDRESS_SPACE_SHIFT])

+/*
+ * Return the swap device position of the swap entry.
+ */
+static inline loff_t swap_dev_pos(swp_entry_t entry)
+{
+ return ((loff_t)swp_offset(entry)) << PAGE_SHIFT;
+}
+
void show_swap_cache_info(void);
bool add_to_swap(struct folio *folio);
void *get_shadow_from_swap_cache(swp_entry_t entry);
--
2.44.0


2024-04-23 17:28:43

by Kairui Song

Subject: [PATCH v2 8/8] mm/swap: reduce swap cache search space

From: Kairui Song <[email protected]>

Currently we use one swap_address_space for every 64M chunk to reduce lock
contention; this is like having a set of smaller swap files inside one
big swap file. But when doing a swap cache lookup or insert, we are
still using the offset into the whole large swap file. This is OK for
correctness, as the offset (key) is unique.

But the Xarray is specially optimized for small indexes: it creates the
radix tree levels lazily, just enough to fit the largest key
stored in one Xarray. So we are wasting tree nodes unnecessarily.

For a 64M chunk it should take at most 3 levels to contain everything.
But we are using the offset from the whole swap file, so the offset (key)
value will be way beyond 64M, and so will the tree depth.

Optimize this by using a new helper swap_cache_index to get a swap
entry's unique offset in its own 64M swap_address_space.

I see a ~1% performance gain in benchmarks and in an actual workload
under high memory pressure.

Tested with `time memhog 128G` inside an 8G memcg using 128G of swap (ramdisk
with SWP_SYNCHRONOUS_IO dropped; tested 3 times, results are stable. The
test result is similar but the improvement is smaller if SWP_SYNCHRONOUS_IO
is enabled, as the swap-out path can never skip the swap cache):

Before:
6.07user 250.74system 4:17.26elapsed 99%CPU (0avgtext+0avgdata 8373376maxresident)k
0inputs+0outputs (55major+33555018minor)pagefaults 0swaps

After (1.8% faster):
6.08user 246.09system 4:12.58elapsed 99%CPU (0avgtext+0avgdata 8373248maxresident)k
0inputs+0outputs (54major+33555027minor)pagefaults 0swaps

Similar result with MySQL and sysbench using swap:
Before:
94055.61 qps

After (0.8% faster):
94834.91 qps

Radix tree slab usage is also very slightly lower.

Signed-off-by: Kairui Song <[email protected]>
---
mm/huge_memory.c | 2 +-
mm/memcontrol.c | 2 +-
mm/mincore.c | 2 +-
mm/shmem.c | 2 +-
mm/swap.h | 15 +++++++++++++++
mm/swap_state.c | 12 ++++++------
mm/swapfile.c | 6 +++---
7 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9859aa4f7553..1208d60792f0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2903,7 +2903,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
split_page_memcg(head, order, new_order);

if (folio_test_anon(folio) && folio_test_swapcache(folio)) {
- offset = swp_offset(folio->swap);
+ offset = swap_cache_index(folio->swap);
swap_cache = swap_address_space(folio->swap);
xa_lock(&swap_cache->i_pages);
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fabce2b50c69..04d7be7f30dc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5934,7 +5934,7 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
* Because swap_cache_get_folio() updates some statistics counter,
* we call find_get_page() with swapper_space directly.
*/
- page = find_get_page(swap_address_space(ent), swp_offset(ent));
+ page = find_get_page(swap_address_space(ent), swap_cache_index(ent));
entry->val = ent.val;

return page;
diff --git a/mm/mincore.c b/mm/mincore.c
index dad3622cc963..e31cf1bde614 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -139,7 +139,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
} else {
#ifdef CONFIG_SWAP
*vec = mincore_page(swap_address_space(entry),
- swp_offset(entry));
+ swap_cache_index(entry));
#else
WARN_ON(1);
*vec = 1;
diff --git a/mm/shmem.c b/mm/shmem.c
index 0aad0d9a621b..cbe33ab52a73 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1762,7 +1762,7 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,

old = *foliop;
entry = old->swap;
- swap_index = swp_offset(entry);
+ swap_index = swap_cache_index(entry);
swap_mapping = swap_address_space(entry);

/*
diff --git a/mm/swap.h b/mm/swap.h
index 82023ab93205..93e3e1b58a7f 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -27,6 +27,7 @@ void __swap_writepage(struct folio *folio, struct writeback_control *wbc);
/* One swap address space for each 64M swap space */
#define SWAP_ADDRESS_SPACE_SHIFT 14
#define SWAP_ADDRESS_SPACE_PAGES (1 << SWAP_ADDRESS_SPACE_SHIFT)
+#define SWAP_ADDRESS_SPACE_MASK (BIT(SWAP_ADDRESS_SPACE_SHIFT) - 1)
extern struct address_space *swapper_spaces[];
#define swap_address_space(entry) \
(&swapper_spaces[swp_type(entry)][swp_offset(entry) \
@@ -40,6 +41,15 @@ static inline loff_t swap_dev_pos(swp_entry_t entry)
return ((loff_t)swp_offset(entry)) << PAGE_SHIFT;
}

+/*
+ * Return the swap cache index of the swap entry.
+ */
+static inline pgoff_t swap_cache_index(swp_entry_t entry)
+{
+ BUILD_BUG_ON((SWP_OFFSET_MASK | SWAP_ADDRESS_SPACE_MASK) != SWP_OFFSET_MASK);
+ return swp_offset(entry) & SWAP_ADDRESS_SPACE_MASK;
+}
+
void show_swap_cache_info(void);
bool add_to_swap(struct folio *folio);
void *get_shadow_from_swap_cache(swp_entry_t entry);
@@ -86,6 +96,11 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
return NULL;
}

+static inline pgoff_t swap_cache_index(swp_entry_t entry)
+{
+ return 0;
+}
+
static inline void show_swap_cache_info(void)
{
}
diff --git a/mm/swap_state.c b/mm/swap_state.c
index bfc7e8c58a6d..9dbb54c72770 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -72,7 +72,7 @@ void show_swap_cache_info(void)
void *get_shadow_from_swap_cache(swp_entry_t entry)
{
struct address_space *address_space = swap_address_space(entry);
- pgoff_t idx = swp_offset(entry);
+ pgoff_t idx = swap_cache_index(entry);
struct page *page;

page = xa_load(&address_space->i_pages, idx);
@@ -89,7 +89,7 @@ int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
gfp_t gfp, void **shadowp)
{
struct address_space *address_space = swap_address_space(entry);
- pgoff_t idx = swp_offset(entry);
+ pgoff_t idx = swap_cache_index(entry);
XA_STATE_ORDER(xas, &address_space->i_pages, idx, folio_order(folio));
unsigned long i, nr = folio_nr_pages(folio);
void *old;
@@ -144,7 +144,7 @@ void __delete_from_swap_cache(struct folio *folio,
struct address_space *address_space = swap_address_space(entry);
int i;
long nr = folio_nr_pages(folio);
- pgoff_t idx = swp_offset(entry);
+ pgoff_t idx = swap_cache_index(entry);
XA_STATE(xas, &address_space->i_pages, idx);

xas_set_update(&xas, workingset_update_node);
@@ -350,7 +350,7 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
{
struct folio *folio;

- folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry));
+ folio = filemap_get_folio(swap_address_space(entry), swap_cache_index(entry));
if (!IS_ERR(folio)) {
bool vma_ra = swap_use_vma_readahead();
bool readahead;
@@ -420,7 +420,7 @@ struct folio *filemap_get_incore_folio(struct address_space *mapping,
si = get_swap_device(swp);
if (!si)
return ERR_PTR(-ENOENT);
- index = swp_offset(swp);
+ index = swap_cache_index(swp);
folio = filemap_get_folio(swap_address_space(swp), index);
put_swap_device(si);
return folio;
@@ -447,7 +447,7 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
* that would confuse statistics.
*/
folio = filemap_get_folio(swap_address_space(entry),
- swp_offset(entry));
+ swap_cache_index(entry));
if (!IS_ERR(folio))
goto got_folio;

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2387f5e131d7..3d838659f55f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -138,7 +138,7 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
struct folio *folio;
int ret = 0;

- folio = filemap_get_folio(swap_address_space(entry), offset);
+ folio = filemap_get_folio(swap_address_space(entry), swap_cache_index(entry));
if (IS_ERR(folio))
return 0;
/*
@@ -2110,7 +2110,7 @@ static int try_to_unuse(unsigned int type)
(i = find_next_to_unuse(si, i)) != 0) {

entry = swp_entry(type, i);
- folio = filemap_get_folio(swap_address_space(entry), i);
+ folio = filemap_get_folio(swap_address_space(entry), swap_cache_index(entry));
if (IS_ERR(folio))
continue;

@@ -3421,7 +3421,7 @@ EXPORT_SYMBOL_GPL(swapcache_mapping);

pgoff_t __folio_swap_cache_index(struct folio *folio)
{
- return swp_offset(folio->swap);
+ return swap_cache_index(folio->swap);
}
EXPORT_SYMBOL_GPL(__folio_swap_cache_index);

--
2.44.0


2024-04-23 18:48:35

by Ryusuke Konishi

Subject: Re: [PATCH v2 2/8] nilfs2: drop usage of page_index

On Wed, Apr 24, 2024 at 2:04 AM Kairui Song wrote:
>
> From: Kairui Song <[email protected]>
>
> page_index is only for mixed usage of page cache and swap cache, for
> pure page cache usage, the caller can just use page->index instead.
>
> It can't be a swap cache page here (being part of buffer head),
> so just drop it, also convert it to use folio.
>
> Signed-off-by: Kairui Song <[email protected]>
> Cc: Ryusuke Konishi <[email protected]>
> Cc: [email protected]
> ---
> fs/nilfs2/bmap.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/nilfs2/bmap.c b/fs/nilfs2/bmap.c
> index 383f0afa2cea..9f561afe864f 100644
> --- a/fs/nilfs2/bmap.c
> +++ b/fs/nilfs2/bmap.c
> @@ -453,8 +453,7 @@ __u64 nilfs_bmap_data_get_key(const struct nilfs_bmap *bmap,
> struct buffer_head *pbh;
> __u64 key;
>
> - key = page_index(bh->b_page) << (PAGE_SHIFT -
> - bmap->b_inode->i_blkbits);
> + key = bh->b_folio->index << (PAGE_SHIFT - bmap->b_inode->i_blkbits);
> for (pbh = page_buffers(bh->b_page); pbh != bh; pbh = pbh->b_this_page)
> key++;

This conversion mixes the use of page and folio within the function.
Would you like to take the opportunity to convert
"page_buffers(bh->b_page)" to "folio_buffers(bh->b_folio)" as well?

Thanks,
Ryusuke Konishi

2024-04-24 00:13:38

by Xiubo Li

Subject: Re: [PATCH v2 4/8] ceph: drop usage of page_index


On 4/24/24 01:03, Kairui Song wrote:
> From: Kairui Song <[email protected]>
>
> page_index is needed for mixed usage of page cache and swap cache,
> for pure page cache usage, the caller can just use page->index instead.
>
> It can't be a swap cache page here, so just drop it.
>
> Signed-off-by: Kairui Song <[email protected]>
> Cc: Xiubo Li <[email protected]>
> Cc: Ilya Dryomov <[email protected]>
> Cc: Jeff Layton <[email protected]>
> Cc: [email protected]
> ---
> fs/ceph/dir.c | 2 +-
> fs/ceph/inode.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index 0e9f56eaba1e..570a9d634cc5 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -141,7 +141,7 @@ __dcache_find_get_entry(struct dentry *parent, u64 idx,
> if (ptr_pos >= i_size_read(dir))
> return NULL;
>
> - if (!cache_ctl->page || ptr_pgoff != page_index(cache_ctl->page)) {
> + if (!cache_ctl->page || ptr_pgoff != cache_ctl->page->index) {
> ceph_readdir_cache_release(cache_ctl);
> cache_ctl->page = find_lock_page(&dir->i_data, ptr_pgoff);
> if (!cache_ctl->page) {
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index 7b2e77517f23..1f92d3faaa6b 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -1861,7 +1861,7 @@ static int fill_readdir_cache(struct inode *dir, struct dentry *dn,
> unsigned idx = ctl->index % nsize;
> pgoff_t pgoff = ctl->index / nsize;
>
> - if (!ctl->page || pgoff != page_index(ctl->page)) {
> + if (!ctl->page || pgoff != ctl->page->index) {
> ceph_readdir_cache_release(ctl);
> if (idx == 0)
> ctl->page = grab_cache_page(&dir->i_data, pgoff);

LGTM.

Reviewed-by: Xiubo Li <[email protected]>



2024-04-24 01:57:49

by Kairui Song

Subject: Re: [PATCH v2 2/8] nilfs2: drop usage of page_index

On Wed, Apr 24, 2024 at 2:51 AM Ryusuke Konishi
<[email protected]> wrote:
>
> On Wed, Apr 24, 2024 at 2:04 AM Kairui Song wrote:
> >
> > From: Kairui Song <[email protected]>
> >
> > page_index is only for mixed usage of page cache and swap cache, for
> > pure page cache usage, the caller can just use page->index instead.
> >
> > It can't be a swap cache page here (being part of buffer head),
> > so just drop it, also convert it to use folio.
> >
> > Signed-off-by: Kairui Song <[email protected]>
> > Cc: Ryusuke Konishi <[email protected]>
> > Cc: [email protected]
> > ---
> > fs/nilfs2/bmap.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/fs/nilfs2/bmap.c b/fs/nilfs2/bmap.c
> > index 383f0afa2cea..9f561afe864f 100644
> > --- a/fs/nilfs2/bmap.c
> > +++ b/fs/nilfs2/bmap.c
> > @@ -453,8 +453,7 @@ __u64 nilfs_bmap_data_get_key(const struct nilfs_bmap *bmap,
> > struct buffer_head *pbh;
> > __u64 key;
> >
> > - key = page_index(bh->b_page) << (PAGE_SHIFT -
> > - bmap->b_inode->i_blkbits);
> > + key = bh->b_folio->index << (PAGE_SHIFT - bmap->b_inode->i_blkbits);
> > for (pbh = page_buffers(bh->b_page); pbh != bh; pbh = pbh->b_this_page)
> > key++;
>
> This conversion mixes the use of page and folio within the function.
> Would you like to take the opportunity to convert
> "page_buffers(bh->b_page)" to "folio_buffers(bh->b_folio)" as well?

OK, will update this part.

2024-04-24 03:59:59

by Matthew Wilcox

Subject: Re: [PATCH v2 3/8] f2fs: drop usage of page_index

On Wed, Apr 24, 2024 at 01:03:34AM +0800, Kairui Song wrote:
> @@ -4086,8 +4086,7 @@ void f2fs_clear_page_cache_dirty_tag(struct page *page)
> unsigned long flags;
>
> xa_lock_irqsave(&mapping->i_pages, flags);
> - __xa_clear_mark(&mapping->i_pages, page_index(page),
> - PAGECACHE_TAG_DIRTY);
> + __xa_clear_mark(&mapping->i_pages, page->index, PAGECACHE_TAG_DIRTY);
> xa_unlock_irqrestore(&mapping->i_pages, flags);
> }

I just sent a patch which is going to conflict with this:

https://lore.kernel.org/linux-mm/[email protected]/

Chao Yu, Jaegeuk Kim; what are your plans for converting f2fs to use
folios? This is getting quite urgent.

2024-04-24 08:22:20

by Matthew Wilcox

Subject: Re: [PATCH v2 7/8] mm: drop page_index/page_file_offset and convert swap helpers to use folio

On Wed, Apr 24, 2024 at 10:17:04AM +0800, Huang, Ying wrote:
> Kairui Song <[email protected]> writes:
> > static inline loff_t folio_file_pos(struct folio *folio)
> > {
> > - return page_file_offset(&folio->page);
> > + if (unlikely(folio_test_swapcache(folio)))
> > + return __folio_swap_dev_pos(folio);
> > + return ((loff_t)folio->index << PAGE_SHIFT);
>
> This still looks confusing for me. The function returns the byte
> position of the folio in its file. But we returns the swap device
> position of the folio.
>
> Tried to search folio_file_pos() usage. The 2 usage in page_io.c is
> swap specific, we can use swap_dev_pos() directly.
>
> There are also other file system users (NFS and AFS) of
> folio_file_pos(), I don't know why they need to work with swap
> cache. Cced file system maintainers for help.

Time for a history lesson!

In d56b4ddf7781 (2012) we introduced page_file_index() and
page_file_mapping() to support swap-over-NFS. Writes to the swapfile went
through ->direct_IO but reads went through ->readpage. So NFS was changed
to remove direct references to page->mapping and page->index because
those aren't right for anon pages (or shmem pages being swapped out).

In e1209d3a7a67 (2022), we stopped using ->readpage in favour of using
->swap_rw. Now we don't need to use page_file_*(); we get the swap_file
and ki_pos directly in the swap_iocb. But there are still relics in NFS
that nobody has dared rip out. And there are all the copy-and-pasted
filesystems that use page_file_* because they don't know any better.

We should delete page_file_*() and folio_file_*(). They shouldn't be
needed any more.

2024-04-28 03:18:01

by Chao Yu

Subject: Re: [PATCH v2 3/8] f2fs: drop usage of page_index

On 2024/4/24 6:58, Matthew Wilcox wrote:
> On Wed, Apr 24, 2024 at 01:03:34AM +0800, Kairui Song wrote:
>> @@ -4086,8 +4086,7 @@ void f2fs_clear_page_cache_dirty_tag(struct page *page)
>> unsigned long flags;
>>
>> xa_lock_irqsave(&mapping->i_pages, flags);
>> - __xa_clear_mark(&mapping->i_pages, page_index(page),
>> - PAGECACHE_TAG_DIRTY);
>> + __xa_clear_mark(&mapping->i_pages, page->index, PAGECACHE_TAG_DIRTY);
>> xa_unlock_irqrestore(&mapping->i_pages, flags);
>> }
>
> I just sent a patch which is going to conflict with this:
>
> https://lore.kernel.org/linux-mm/[email protected]/
>
> Chao Yu, Jaegeuk Kim; what are your plans for converting f2fs to use

Hi Matthew,

I've converted .read_folio and .readahead of f2fs to use folios with the
patchset below, and will take a look at how to support and enable large
folios...

https://lore.kernel.org/linux-f2fs-devel/[email protected]/

Thanks,

> folios? This is getting quite urgent.