2023-08-07 11:48:45

by Qi Zheng

Subject: [PATCH v4 00/48] use refcount+RCU method to implement lockless slab shrink

Hi all,

1. Background
=============

We previously implemented lockless slab shrink with SRCU [1], but the kernel
test robot then reported a -88.8% regression in the stress-ng.ramfs.ops_per_sec
test case [2], so we reverted it [3].

This patch series aims to re-implement the lockless slab shrink using the
refcount+RCU method proposed by Dave Chinner [4].

[1]. https://lore.kernel.org/lkml/[email protected]/
[2]. https://lore.kernel.org/lkml/[email protected]/
[3]. https://lore.kernel.org/all/[email protected]/
[4]. https://lore.kernel.org/lkml/[email protected]/

2. Implementation
=================

Currently, the shrinker instances can be divided into the following three types:

a) global shrinker instance statically defined in the kernel, such as
workingset_shadow_shrinker.

b) global shrinker instance statically defined in the kernel modules, such as
mmu_shrinker in x86.

c) shrinker instance embedded in other structures.

For case a, the memory of the shrinker instance is never freed. For case b, the
memory of the shrinker instance is freed after synchronize_rcu() when the
module is unloaded. For case c, the memory of the shrinker instance is freed
along with the structure it is embedded in.

In preparation for implementing lockless slab shrink, we need to dynamically
allocate the shrinker instances in case c, so that their memory can be freed
independently by calling kfree_rcu().

This patchset adds the following new APIs for dynamically allocating a shrinker,
and adds a private_data field to struct shrinker to record, and later retrieve,
the structure in which the shrinker used to be embedded.

1. shrinker_alloc()
2. shrinker_register()
3. shrinker_free()
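To make the role of private_data concrete, here is a minimal userspace sketch (not kernel code). struct toy_shrinker, toy_shrinker_alloc() and the two owner structures are hypothetical stand-ins, and container_of() is a simplified reimplementation without the kernel's type checking. It shows why callbacks that used container_of() on an embedded shrinker have to switch to private_data once the shrinker is allocated separately, which is the pattern the case c conversions follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced stand-in for struct shrinker. */
struct toy_shrinker {
	unsigned long (*count_objects)(struct toy_shrinker *s);
	void *private_data;	/* back-pointer to the owning structure */
};

/* Old layout: the shrinker is embedded in its owner (case c). */
struct owner_embedded {
	unsigned long nr_cached;
	struct toy_shrinker shrinker;
};

/* New layout: the owner only holds a pointer to a separate allocation. */
struct owner_dynamic {
	unsigned long nr_cached;
	struct toy_shrinker *shrinker;
};

static unsigned long count_embedded(struct toy_shrinker *s)
{
	/* Valid only because s lives inside struct owner_embedded. */
	struct owner_embedded *o = container_of(s, struct owner_embedded, shrinker);

	return o->nr_cached;
}

static unsigned long count_dynamic(struct toy_shrinker *s)
{
	/* container_of() would compute a bogus address here. */
	struct owner_dynamic *o = s->private_data;

	return o->nr_cached;
}

/* Hypothetical analogue of shrinker_alloc(): a zeroed, separate allocation. */
static struct toy_shrinker *toy_shrinker_alloc(void)
{
	return calloc(1, sizeof(struct toy_shrinker));
}
```

With the separate allocation, the shrinker's memory no longer shares a lifetime with its owner, which is what allows it to be freed on its own after an RCU grace period.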

In order to simplify the shrinker-related APIs and make shrinkers more
independent of other kernel mechanisms, this patchset uses the above APIs to
convert all shrinkers (including cases a and b) to be dynamically allocated, and
then removes all the existing APIs. This also has another advantage, as pointed
out by Dave Chinner:

```
The other advantage of this is that it will break all the existing out of tree
code and third party modules using the old API and will no longer work with a
kernel using lockless slab shrinkers. They need to break (both at the source and
binary levels) to stop bad things from happening due to using unconverted
shrinkers in the new setup.
```

We then free the shrinker by calling call_rcu(), and use rcu_read_{lock,unlock}()
to guarantee that the shrinker instance remains valid while it is being
accessed. The shrinker::refcount mechanism ensures that the shrinker will not be
run again after unregistration, so the structure that holds the pointer to the
shrinker instance can be freed safely without waiting for the RCU read-side
critical section to end.

This way, once slab shrink is lockless, the unregistration path does not need to
block waiting for RCU read-side critical sections to finish.
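The lifetime rules above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel implementation: all toy_* names are hypothetical, C11 atomics stand in for refcount_t (the try-get mirrors refcount_inc_not_zero() semantics), a flag polled with sched_yield() stands in for a struct completion, and RCU itself is not modelled. In the kernel, rcu_read_lock() is what keeps the shrinker memory valid while the reference is being taken.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a shrinker's lifetime state. */
struct toy_shrinker {
	atomic_uint refcount;	/* 1 while registered, +1 per running user */
	atomic_bool done;	/* stands in for a struct completion */
};

static void toy_shrinker_register(struct toy_shrinker *s)
{
	atomic_init(&s->done, false);
	atomic_init(&s->refcount, 1);	/* initial reference owned by registration */
}

/* Take a reference unless the count has already dropped to zero. */
static bool toy_shrinker_try_get(struct toy_shrinker *s)
{
	unsigned int old = atomic_load(&s->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&s->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void toy_shrinker_put(struct toy_shrinker *s)
{
	/* Whoever drops the last reference wakes the unregistering task. */
	if (atomic_fetch_sub(&s->refcount, 1) == 1)
		atomic_store(&s->done, true);
}

/* One reclaim pass over a single shrinker. */
static unsigned long toy_shrink_one(struct toy_shrinker *s, unsigned long nr)
{
	if (!toy_shrinker_try_get(s))
		return 0;	/* unregistration has started: skip it */
	/* Kernel analogue: drop the RCU read lock, run the scan, retake it. */
	toy_shrinker_put(s);
	return nr;		/* pretend the scan freed nr objects */
}

/* Drop the registration reference and wait for in-flight users. */
static void toy_shrinker_unregister(struct toy_shrinker *s)
{
	toy_shrinker_put(s);
	while (!atomic_load(&s->done))
		sched_yield();	/* the kernel would sleep on the completion */
}
```

A reclaim pass that loses the race with unregistration simply skips the shrinker, and unregistration waits only for passes that already hold a reference, not for unrelated RCU readers. After toy_shrinker_unregister() returns, the memory could be handed to an RCU-deferred free.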

PATCH 1: move some shrinker-related function declarations to mm/internal.h
PATCH 2: move shrinker-related code into a separate file
PATCH 3: remove redundant shrinker_rwsem in debugfs operations
PATCH 4: add infrastructure for dynamically allocating shrinker
PATCH 5 ~ 22: dynamically allocate the shrinker instances in case a and b
PATCH 23 ~ 41: dynamically allocate the shrinker instances in case c
PATCH 42: remove old APIs
PATCH 43: introduce pool_shrink_rwsem to implement private synchronize_shrinkers()
PATCH 44: add a secondary array for shrinker_info::{map, nr_deferred}
PATCH 45 ~ 46: implement the lockless slab shrink
PATCH 47 ~ 48: convert shrinker_rwsem to mutex

3. Testing
==========

3.1 slab shrink stress test
---------------------------

We can reproduce the down_read_trylock() hotspot through the following script:

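```
#!/bin/bash
# Requires the cgroup v1 memory controller mounted at /sys/fs/cgroup/memory.
# Usage: $0 test | $0 touch <start> <end>

DIR="/root/shrinker/memcg/mnt"

# Create memory cgroups test/0 .. test/$1 under a parent limited to 4G,
# plus one mount point directory per cgroup.
do_create()
{
	mkdir -p /sys/fs/cgroup/memory/test
	echo 4G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
	for i in $(seq 0 "$1"); do
		mkdir -p /sys/fs/cgroup/memory/test/$i
		echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs
		mkdir -p "$DIR/$i"
	done
}

# Mount a separate tmpfs instance on each directory in [$1, $2].
do_mount()
{
	for i in $(seq "$1" "$2"); do
		mount -t tmpfs "$i" "$DIR/$i"
	done
}

# For each cgroup in [$1, $2], move this shell into it and write a 1 MiB
# file in the background, so the memory is charged to that cgroup.
do_touch()
{
	for i in $(seq "$1" "$2"); do
		echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs
		dd if=/dev/zero of="$DIR/$i/file$i" bs=1M count=1 &
	done
}

case "$1" in
touch)
	do_touch "$2" "$3"
	;;
test)
	do_create 4000
	do_mount 0 4000
	do_touch 0 3000
	;;
*)
	exit 1
	;;
esac
```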

Save the script above, run it first with the "test" argument and then with the
"touch" argument (which takes a start and an end cgroup index), and then use the
following perf command to view the hotspots:

```
perf top -U -F 999
```

1) Before applying this patchset:

```
40.44%  [kernel]  [k] down_read_trylock
17.59%  [kernel]  [k] up_read
13.64%  [kernel]  [k] pv_native_safe_halt
11.90%  [kernel]  [k] shrink_slab
 8.21%  [kernel]  [k] idr_find
 2.71%  [kernel]  [k] _find_next_bit
 1.36%  [kernel]  [k] shrink_node
 0.81%  [kernel]  [k] shrink_lruvec
 0.80%  [kernel]  [k] __radix_tree_lookup
 0.50%  [kernel]  [k] do_shrink_slab
 0.21%  [kernel]  [k] list_lru_count_one
 0.16%  [kernel]  [k] mem_cgroup_iter
```

2) After applying this patchset:

```
60.17%  [kernel]  [k] shrink_slab
20.42%  [kernel]  [k] pv_native_safe_halt
 3.03%  [kernel]  [k] do_shrink_slab
 2.73%  [kernel]  [k] shrink_node
 2.27%  [kernel]  [k] shrink_lruvec
 2.00%  [kernel]  [k] __rcu_read_unlock
 1.92%  [kernel]  [k] mem_cgroup_iter
 0.98%  [kernel]  [k] __rcu_read_lock
 0.91%  [kernel]  [k] osq_lock
 0.63%  [kernel]  [k] mem_cgroup_calculate_protection
 0.55%  [kernel]  [k] shrinker_put
 0.46%  [kernel]  [k] list_lru_count_one
```

The top hotspot is now shrink_slab itself, and down_read_trylock() and up_read()
no longer appear in the profile, which is what we expect.

3.2 registration and unregistration stress test
-----------------------------------------------

Run the following command to test:

```
stress-ng --timeout 60 --times --verify --metrics-brief --ramfs 9 &
```

1) Before applying this patchset:

```
setting to a 60 second run per stressor
dispatching hogs: 9 ramfs
stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
ramfs            735238     60.00     12.37    363.70     12253.05        1955.08
for a 60.01s run time:
   1440.27s available CPU time
     12.36s user time   (  0.86%)
    363.70s system time ( 25.25%)
    376.06s total time  ( 26.11%)
load average: 10.79 4.47 1.69
passed: 9: ramfs (9)
failed: 0
skipped: 0
successful run completed in 60.01s (1 min, 0.01 secs)
```

2) After applying this patchset:

```
setting to a 60 second run per stressor
dispatching hogs: 9 ramfs
stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
ramfs            746698     60.00     12.45    376.16     12444.02        1921.47
for a 60.01s run time:
   1440.28s available CPU time
     12.44s user time   (  0.86%)
    376.16s system time ( 26.12%)
    388.60s total time  ( 26.98%)
load average: 9.01 3.85 1.49
passed: 9: ramfs (9)
failed: 0
skipped: 0
successful run completed in 60.01s (1 min, 0.01 secs)
```

The throughput is essentially unchanged: 12444.02 bogo ops/s (real time) after
the patchset versus 12253.05 before.

This series is based on next-20230807.

Comments and suggestions are welcome.

Thanks,
Qi

Changelog in v3 -> v4:
- [PATCH v3 01/49] has been merged, so drop it.
- fix wrong return values in v3 patches 15, 16, 22, 27, 28, 29, 34 and 40
(pointed out by Damien Le Moal)
- fix uninitialized variable in [PATCH v3 04/49]
(pointed out by Simon Horman)
- fix typo in [PATCH v3 05/49] (pointed out by Simon Horman)
- rebase onto next-20230807.

Changelog in v2 -> v3:
- add the patch that [PATCH v3 07/49] depends on
- move some shrinker-related function declarations to mm/internal.h
(suggested by Muchun Song)
- combine shrinker_free_non_registered() and shrinker_unregister() into
shrinker_free() (suggested by Dave Chinner)
- add missing __init and fix return value in bch_btree_cache_alloc()
(pointed out by Muchun Song)
- remove unnecessary WARN_ON() (pointed out by Steven Price)
- go back to using completion to implement lockless slab shrink
(pointed out by Dave Chinner)
- collect Acked-bys and Reviewed-bys
- rebase onto next-20230726.

Changelog in v1 -> v2:
- implement the new APIs and convert all shrinkers to use them.
(suggested by Dave Chinner)
- fix UAF in PATCH [05/29] (pointed out by Steven Price)
- add a secondary array for shrinker_info::{map, nr_deferred}
- re-implement the lockless slab shrink
(Since unifying the processing of global and memcg slab shrink would require
modifying the startup sequence (as I mentioned in https://lore.kernel.org/lkml/[email protected]/),
I finally chose to process them separately.)
- collect Acked-bys

Qi Zheng (48):
mm: move some shrinker-related function declarations to mm/internal.h
mm: vmscan: move shrinker-related code into a separate file
mm: shrinker: remove redundant shrinker_rwsem in debugfs operations
mm: shrinker: add infrastructure for dynamically allocating shrinker
kvm: mmu: dynamically allocate the x86-mmu shrinker
binder: dynamically allocate the android-binder shrinker
drm/ttm: dynamically allocate the drm-ttm_pool shrinker
xenbus/backend: dynamically allocate the xen-backend shrinker
erofs: dynamically allocate the erofs-shrinker
f2fs: dynamically allocate the f2fs-shrinker
gfs2: dynamically allocate the gfs2-glock shrinker
gfs2: dynamically allocate the gfs2-qd shrinker
NFSv4.2: dynamically allocate the nfs-xattr shrinkers
nfs: dynamically allocate the nfs-acl shrinker
nfsd: dynamically allocate the nfsd-filecache shrinker
quota: dynamically allocate the dquota-cache shrinker
ubifs: dynamically allocate the ubifs-slab shrinker
rcu: dynamically allocate the rcu-lazy shrinker
rcu: dynamically allocate the rcu-kfree shrinker
mm: thp: dynamically allocate the thp-related shrinkers
sunrpc: dynamically allocate the sunrpc_cred shrinker
mm: workingset: dynamically allocate the mm-shadow shrinker
drm/i915: dynamically allocate the i915_gem_mm shrinker
drm/msm: dynamically allocate the drm-msm_gem shrinker
drm/panfrost: dynamically allocate the drm-panfrost shrinker
dm: dynamically allocate the dm-bufio shrinker
dm zoned: dynamically allocate the dm-zoned-meta shrinker
md/raid5: dynamically allocate the md-raid5 shrinker
bcache: dynamically allocate the md-bcache shrinker
vmw_balloon: dynamically allocate the vmw-balloon shrinker
virtio_balloon: dynamically allocate the virtio-balloon shrinker
mbcache: dynamically allocate the mbcache shrinker
ext4: dynamically allocate the ext4-es shrinker
jbd2,ext4: dynamically allocate the jbd2-journal shrinker
nfsd: dynamically allocate the nfsd-client shrinker
nfsd: dynamically allocate the nfsd-reply shrinker
xfs: dynamically allocate the xfs-buf shrinker
xfs: dynamically allocate the xfs-inodegc shrinker
xfs: dynamically allocate the xfs-qm shrinker
zsmalloc: dynamically allocate the mm-zspool shrinker
fs: super: dynamically allocate the s_shrink
mm: shrinker: remove old APIs
drm/ttm: introduce pool_shrink_rwsem
mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}
mm: shrinker: make global slab shrink lockless
mm: shrinker: make memcg slab shrink lockless
mm: shrinker: hold write lock to reparent shrinker nr_deferred
mm: shrinker: convert shrinker_rwsem to mutex

arch/x86/kvm/mmu/mmu.c | 18 +-
drivers/android/binder_alloc.c | 31 +-
drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 30 +-
drivers/gpu/drm/i915/i915_drv.h | 2 +-
drivers/gpu/drm/msm/msm_drv.c | 4 +-
drivers/gpu/drm/msm/msm_drv.h | 4 +-
drivers/gpu/drm/msm/msm_gem_shrinker.c | 34 +-
drivers/gpu/drm/panfrost/panfrost_device.h | 2 +-
drivers/gpu/drm/panfrost/panfrost_drv.c | 6 +-
drivers/gpu/drm/panfrost/panfrost_gem.h | 2 +-
.../gpu/drm/panfrost/panfrost_gem_shrinker.c | 30 +-
drivers/gpu/drm/ttm/ttm_pool.c | 38 +-
drivers/md/bcache/bcache.h | 2 +-
drivers/md/bcache/btree.c | 27 +-
drivers/md/bcache/sysfs.c | 3 +-
drivers/md/dm-bufio.c | 28 +-
drivers/md/dm-cache-metadata.c | 2 +-
drivers/md/dm-zoned-metadata.c | 29 +-
drivers/md/raid5.c | 26 +-
drivers/md/raid5.h | 2 +-
drivers/misc/vmw_balloon.c | 38 +-
drivers/virtio/virtio_balloon.c | 25 +-
drivers/xen/xenbus/xenbus_probe_backend.c | 18 +-
fs/btrfs/super.c | 2 +-
fs/erofs/utils.c | 20 +-
fs/ext4/ext4.h | 2 +-
fs/ext4/extents_status.c | 24 +-
fs/f2fs/super.c | 32 +-
fs/gfs2/glock.c | 20 +-
fs/gfs2/main.c | 6 +-
fs/gfs2/quota.c | 26 +-
fs/gfs2/quota.h | 3 +-
fs/jbd2/journal.c | 27 +-
fs/kernfs/mount.c | 2 +-
fs/mbcache.c | 23 +-
fs/nfs/nfs42xattr.c | 87 +-
fs/nfs/super.c | 22 +-
fs/nfsd/filecache.c | 23 +-
fs/nfsd/netns.h | 4 +-
fs/nfsd/nfs4state.c | 20 +-
fs/nfsd/nfscache.c | 31 +-
fs/proc/root.c | 2 +-
fs/quota/dquot.c | 18 +-
fs/super.c | 36 +-
fs/ubifs/super.c | 22 +-
fs/xfs/xfs_buf.c | 25 +-
fs/xfs/xfs_buf.h | 2 +-
fs/xfs/xfs_icache.c | 26 +-
fs/xfs/xfs_mount.c | 4 +-
fs/xfs/xfs_mount.h | 2 +-
fs/xfs/xfs_qm.c | 28 +-
fs/xfs/xfs_qm.h | 2 +-
include/linux/fs.h | 2 +-
include/linux/jbd2.h | 2 +-
include/linux/memcontrol.h | 12 +-
include/linux/shrinker.h | 67 +-
kernel/rcu/tree.c | 22 +-
kernel/rcu/tree_nocb.h | 20 +-
mm/Makefile | 4 +-
mm/huge_memory.c | 69 +-
mm/internal.h | 41 +
mm/shrinker.c | 770 ++++++++++++++++++
mm/shrinker_debug.c | 47 +-
mm/vmscan.c | 701 ----------------
mm/workingset.c | 27 +-
mm/zsmalloc.c | 28 +-
net/sunrpc/auth.c | 21 +-
67 files changed, 1540 insertions(+), 1235 deletions(-)
create mode 100644 mm/shrinker.c

--
2.30.2



2023-08-07 11:52:37

by Qi Zheng

Subject: [PATCH v4 03/48] mm: shrinker: remove redundant shrinker_rwsem in debugfs operations

debugfs_remove_recursive() waits for debugfs_file_put() to return, so the
shrinker cannot be freed while a debugfs operation (such as
shrinker_debugfs_count_show() or shrinker_debugfs_scan_write()) is in
progress. There is therefore no need to hold shrinker_rwsem during debugfs
operations.

Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
mm/shrinker_debug.c | 16 +---------------
1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index 3ab53fad8876..61702bdc1af4 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -49,17 +49,12 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
struct mem_cgroup *memcg;
unsigned long total;
bool memcg_aware;
- int ret, nid;
+ int ret = 0, nid;

count_per_node = kcalloc(nr_node_ids, sizeof(unsigned long), GFP_KERNEL);
if (!count_per_node)
return -ENOMEM;

- ret = down_read_killable(&shrinker_rwsem);
- if (ret) {
- kfree(count_per_node);
- return ret;
- }
rcu_read_lock();

memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
@@ -92,7 +87,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);

rcu_read_unlock();
- up_read(&shrinker_rwsem);

kfree(count_per_node);
return ret;
@@ -117,7 +111,6 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
struct mem_cgroup *memcg = NULL;
int nid;
char kbuf[72];
- ssize_t ret;

read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
if (copy_from_user(kbuf, buf, read_len))
@@ -146,12 +139,6 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
return -EINVAL;
}

- ret = down_read_killable(&shrinker_rwsem);
- if (ret) {
- mem_cgroup_put(memcg);
- return ret;
- }
-
sc.nid = nid;
sc.memcg = memcg;
sc.nr_to_scan = nr_to_scan;
@@ -159,7 +146,6 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,

shrinker->scan_objects(shrinker, &sc);

- up_read(&shrinker_rwsem);
mem_cgroup_put(memcg);

return size;


2023-08-07 11:59:51

by Qi Zheng

Subject: [PATCH v4 10/48] f2fs: dynamically allocate the f2fs-shrinker

Use new APIs to dynamically allocate the f2fs-shrinker.

Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
fs/f2fs/super.c | 32 ++++++++++++++++++++++++--------
1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index aa1f9a3a8037..9092310582aa 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -83,11 +83,27 @@ void f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned int rate,
#endif

/* f2fs-wide shrinker description */
-static struct shrinker f2fs_shrinker_info = {
- .scan_objects = f2fs_shrink_scan,
- .count_objects = f2fs_shrink_count,
- .seeks = DEFAULT_SEEKS,
-};
+static struct shrinker *f2fs_shrinker_info;
+
+static int __init f2fs_init_shrinker(void)
+{
+ f2fs_shrinker_info = shrinker_alloc(0, "f2fs-shrinker");
+ if (!f2fs_shrinker_info)
+ return -ENOMEM;
+
+ f2fs_shrinker_info->count_objects = f2fs_shrink_count;
+ f2fs_shrinker_info->scan_objects = f2fs_shrink_scan;
+ f2fs_shrinker_info->seeks = DEFAULT_SEEKS;
+
+ shrinker_register(f2fs_shrinker_info);
+
+ return 0;
+}
+
+static void f2fs_exit_shrinker(void)
+{
+ shrinker_free(f2fs_shrinker_info);
+}

enum {
Opt_gc_background,
@@ -4940,7 +4956,7 @@ static int __init init_f2fs_fs(void)
err = f2fs_init_sysfs();
if (err)
goto free_garbage_collection_cache;
- err = register_shrinker(&f2fs_shrinker_info, "f2fs-shrinker");
+ err = f2fs_init_shrinker();
if (err)
goto free_sysfs;
err = register_filesystem(&f2fs_fs_type);
@@ -4985,7 +5001,7 @@ static int __init init_f2fs_fs(void)
f2fs_destroy_root_stats();
unregister_filesystem(&f2fs_fs_type);
free_shrinker:
- unregister_shrinker(&f2fs_shrinker_info);
+ f2fs_exit_shrinker();
free_sysfs:
f2fs_exit_sysfs();
free_garbage_collection_cache:
@@ -5017,7 +5033,7 @@ static void __exit exit_f2fs_fs(void)
f2fs_destroy_post_read_processing();
f2fs_destroy_root_stats();
unregister_filesystem(&f2fs_fs_type);
- unregister_shrinker(&f2fs_shrinker_info);
+ f2fs_exit_shrinker();
f2fs_exit_sysfs();
f2fs_destroy_garbage_collection_cache();
f2fs_destroy_extent_cache();


2023-08-07 12:00:02

by Qi Zheng

Subject: [PATCH v4 14/48] nfs: dynamically allocate the nfs-acl shrinker

Use new APIs to dynamically allocate the nfs-acl shrinker.

Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
fs/nfs/super.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 2284f749d892..1b5cd0444dda 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -129,11 +129,7 @@ static void nfs_ssc_unregister_ops(void)
}
#endif /* CONFIG_NFS_V4_2 */

-static struct shrinker acl_shrinker = {
- .count_objects = nfs_access_cache_count,
- .scan_objects = nfs_access_cache_scan,
- .seeks = DEFAULT_SEEKS,
-};
+static struct shrinker *acl_shrinker;

/*
* Register the NFS filesystems
@@ -153,9 +149,19 @@ int __init register_nfs_fs(void)
ret = nfs_register_sysctl();
if (ret < 0)
goto error_2;
- ret = register_shrinker(&acl_shrinker, "nfs-acl");
- if (ret < 0)
+
+ acl_shrinker = shrinker_alloc(0, "nfs-acl");
+ if (!acl_shrinker) {
+ ret = -ENOMEM;
goto error_3;
+ }
+
+ acl_shrinker->count_objects = nfs_access_cache_count;
+ acl_shrinker->scan_objects = nfs_access_cache_scan;
+ acl_shrinker->seeks = DEFAULT_SEEKS;
+
+ shrinker_register(acl_shrinker);
+
#ifdef CONFIG_NFS_V4_2
nfs_ssc_register_ops();
#endif
@@ -175,7 +181,7 @@ int __init register_nfs_fs(void)
*/
void __exit unregister_nfs_fs(void)
{
- unregister_shrinker(&acl_shrinker);
+ shrinker_free(acl_shrinker);
nfs_unregister_sysctl();
unregister_nfs4_fs();
#ifdef CONFIG_NFS_V4_2


2023-08-07 12:01:39

by Qi Zheng

Subject: [PATCH v4 36/48] nfsd: dynamically allocate the nfsd-reply shrinker

In preparation for implementing lockless slab shrink, use the new APIs to
dynamically allocate the nfsd-reply shrinker, so that it can be freed
asynchronously via kfree_rcu(). Then there is no need to wait for RCU
read-side critical sections when releasing struct nfsd_net.

Signed-off-by: Qi Zheng <[email protected]>
Acked-by: Chuck Lever <[email protected]>
Acked-by: Jeff Layton <[email protected]>
---
fs/nfsd/netns.h | 2 +-
fs/nfsd/nfscache.c | 31 ++++++++++++++++---------------
2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index f669444d5336..ab303a8b77d5 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -177,7 +177,7 @@ struct nfsd_net {
/* size of cache when we saw the longest hash chain */
unsigned int longest_chain_cachesize;

- struct shrinker nfsd_reply_cache_shrinker;
+ struct shrinker *nfsd_reply_cache_shrinker;

/* tracking server-to-server copy mounts */
spinlock_t nfsd_ssc_lock;
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 80621a709510..fd56a52aa5fb 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -201,26 +201,29 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
{
unsigned int hashsize;
unsigned int i;
- int status = 0;

nn->max_drc_entries = nfsd_cache_size_limit();
atomic_set(&nn->num_drc_entries, 0);
hashsize = nfsd_hashsize(nn->max_drc_entries);
nn->maskbits = ilog2(hashsize);

- nn->nfsd_reply_cache_shrinker.scan_objects = nfsd_reply_cache_scan;
- nn->nfsd_reply_cache_shrinker.count_objects = nfsd_reply_cache_count;
- nn->nfsd_reply_cache_shrinker.seeks = 1;
- status = register_shrinker(&nn->nfsd_reply_cache_shrinker,
- "nfsd-reply:%s", nn->nfsd_name);
- if (status)
- return status;
-
nn->drc_hashtbl = kvzalloc(array_size(hashsize,
sizeof(*nn->drc_hashtbl)), GFP_KERNEL);
if (!nn->drc_hashtbl)
+ return -ENOMEM;
+
+ nn->nfsd_reply_cache_shrinker = shrinker_alloc(0, "nfsd-reply:%s",
+ nn->nfsd_name);
+ if (!nn->nfsd_reply_cache_shrinker)
goto out_shrinker;

+ nn->nfsd_reply_cache_shrinker->scan_objects = nfsd_reply_cache_scan;
+ nn->nfsd_reply_cache_shrinker->count_objects = nfsd_reply_cache_count;
+ nn->nfsd_reply_cache_shrinker->seeks = 1;
+ nn->nfsd_reply_cache_shrinker->private_data = nn;
+
+ shrinker_register(nn->nfsd_reply_cache_shrinker);
+
for (i = 0; i < hashsize; i++) {
INIT_LIST_HEAD(&nn->drc_hashtbl[i].lru_head);
spin_lock_init(&nn->drc_hashtbl[i].cache_lock);
@@ -229,7 +232,7 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)

return 0;
out_shrinker:
- unregister_shrinker(&nn->nfsd_reply_cache_shrinker);
+ kvfree(nn->drc_hashtbl);
printk(KERN_ERR "nfsd: failed to allocate reply cache\n");
return -ENOMEM;
}
@@ -239,7 +242,7 @@ void nfsd_reply_cache_shutdown(struct nfsd_net *nn)
struct nfsd_cacherep *rp;
unsigned int i;

- unregister_shrinker(&nn->nfsd_reply_cache_shrinker);
+ shrinker_free(nn->nfsd_reply_cache_shrinker);

for (i = 0; i < nn->drc_hashsize; i++) {
struct list_head *head = &nn->drc_hashtbl[i].lru_head;
@@ -323,8 +326,7 @@ nfsd_prune_bucket_locked(struct nfsd_net *nn, struct nfsd_drc_bucket *b,
static unsigned long
nfsd_reply_cache_count(struct shrinker *shrink, struct shrink_control *sc)
{
- struct nfsd_net *nn = container_of(shrink,
- struct nfsd_net, nfsd_reply_cache_shrinker);
+ struct nfsd_net *nn = shrink->private_data;

return atomic_read(&nn->num_drc_entries);
}
@@ -343,8 +345,7 @@ nfsd_reply_cache_count(struct shrinker *shrink, struct shrink_control *sc)
static unsigned long
nfsd_reply_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
{
- struct nfsd_net *nn = container_of(shrink,
- struct nfsd_net, nfsd_reply_cache_shrinker);
+ struct nfsd_net *nn = shrink->private_data;
unsigned long freed = 0;
LIST_HEAD(dispose);
unsigned int i;


2023-08-07 12:01:51

by Qi Zheng

Subject: [PATCH v4 26/48] dm: dynamically allocate the dm-bufio shrinker

In preparation for implementing lockless slab shrink, use the new APIs to
dynamically allocate the dm-bufio shrinker, so that it can be freed
asynchronously via kfree_rcu(). Then there is no need to wait for RCU
read-side critical sections when releasing struct dm_bufio_client.

Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
drivers/md/dm-bufio.c | 28 +++++++++++++++++-----------
1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index bc309e41d074..62eb27639c9b 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -963,7 +963,7 @@ struct dm_bufio_client {

sector_t start;

- struct shrinker shrinker;
+ struct shrinker *shrinker;
struct work_struct shrink_work;
atomic_long_t need_shrink;

@@ -2368,7 +2368,7 @@ static unsigned long dm_bufio_shrink_scan(struct shrinker *shrink, struct shrink
{
struct dm_bufio_client *c;

- c = container_of(shrink, struct dm_bufio_client, shrinker);
+ c = shrink->private_data;
atomic_long_add(sc->nr_to_scan, &c->need_shrink);
queue_work(dm_bufio_wq, &c->shrink_work);

@@ -2377,7 +2377,7 @@ static unsigned long dm_bufio_shrink_scan(struct shrinker *shrink, struct shrink

static unsigned long dm_bufio_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
{
- struct dm_bufio_client *c = container_of(shrink, struct dm_bufio_client, shrinker);
+ struct dm_bufio_client *c = shrink->private_data;
unsigned long count = cache_total(&c->cache);
unsigned long retain_target = get_retain_buffers(c);
unsigned long queued_for_cleanup = atomic_long_read(&c->need_shrink);
@@ -2490,14 +2490,20 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
INIT_WORK(&c->shrink_work, shrink_work);
atomic_long_set(&c->need_shrink, 0);

- c->shrinker.count_objects = dm_bufio_shrink_count;
- c->shrinker.scan_objects = dm_bufio_shrink_scan;
- c->shrinker.seeks = 1;
- c->shrinker.batch = 0;
- r = register_shrinker(&c->shrinker, "dm-bufio:(%u:%u)",
- MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev));
- if (r)
+ c->shrinker = shrinker_alloc(0, "dm-bufio:(%u:%u)",
+ MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev));
+ if (!c->shrinker) {
+ r = -ENOMEM;
goto bad;
+ }
+
+ c->shrinker->count_objects = dm_bufio_shrink_count;
+ c->shrinker->scan_objects = dm_bufio_shrink_scan;
+ c->shrinker->seeks = 1;
+ c->shrinker->batch = 0;
+ c->shrinker->private_data = c;
+
+ shrinker_register(c->shrinker);

mutex_lock(&dm_bufio_clients_lock);
dm_bufio_client_count++;
@@ -2537,7 +2543,7 @@ void dm_bufio_client_destroy(struct dm_bufio_client *c)

drop_buffers(c);

- unregister_shrinker(&c->shrinker);
+ shrinker_free(c->shrinker);
flush_work(&c->shrink_work);

mutex_lock(&dm_bufio_clients_lock);


2023-08-07 12:06:04

by Qi Zheng

Subject: [PATCH v4 07/48] drm/ttm: dynamically allocate the drm-ttm_pool shrinker

Use new APIs to dynamically allocate the drm-ttm_pool shrinker.

Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
drivers/gpu/drm/ttm/ttm_pool.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index cddb9151d20f..c9c9618c0dce 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -73,7 +73,7 @@ static struct ttm_pool_type global_dma32_uncached[MAX_ORDER + 1];

static spinlock_t shrinker_lock;
static struct list_head shrinker_list;
-static struct shrinker mm_shrinker;
+static struct shrinker *mm_shrinker;

/* Allocate pages of size 1 << order with the given gfp_flags */
static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
@@ -734,8 +734,8 @@ static int ttm_pool_debugfs_shrink_show(struct seq_file *m, void *data)
struct shrink_control sc = { .gfp_mask = GFP_NOFS };

fs_reclaim_acquire(GFP_KERNEL);
- seq_printf(m, "%lu/%lu\n", ttm_pool_shrinker_count(&mm_shrinker, &sc),
- ttm_pool_shrinker_scan(&mm_shrinker, &sc));
+ seq_printf(m, "%lu/%lu\n", ttm_pool_shrinker_count(mm_shrinker, &sc),
+ ttm_pool_shrinker_scan(mm_shrinker, &sc));
fs_reclaim_release(GFP_KERNEL);

return 0;
@@ -779,10 +779,17 @@ int ttm_pool_mgr_init(unsigned long num_pages)
&ttm_pool_debugfs_shrink_fops);
#endif

- mm_shrinker.count_objects = ttm_pool_shrinker_count;
- mm_shrinker.scan_objects = ttm_pool_shrinker_scan;
- mm_shrinker.seeks = 1;
- return register_shrinker(&mm_shrinker, "drm-ttm_pool");
+ mm_shrinker = shrinker_alloc(0, "drm-ttm_pool");
+ if (!mm_shrinker)
+ return -ENOMEM;
+
+ mm_shrinker->count_objects = ttm_pool_shrinker_count;
+ mm_shrinker->scan_objects = ttm_pool_shrinker_scan;
+ mm_shrinker->seeks = 1;
+
+ shrinker_register(mm_shrinker);
+
+ return 0;
}

/**
@@ -802,6 +809,6 @@ void ttm_pool_mgr_fini(void)
ttm_pool_type_fini(&global_dma32_uncached[i]);
}

- unregister_shrinker(&mm_shrinker);
+ shrinker_free(mm_shrinker);
WARN_ON(!list_empty(&shrinker_list));
}


2023-08-07 12:10:54

by Qi Zheng

Subject: [PATCH v4 23/48] drm/i915: dynamically allocate the i915_gem_mm shrinker

In preparation for implementing lockless slab shrink, use the new APIs to
dynamically allocate the i915_gem_mm shrinker, so that it can be freed
asynchronously via kfree_rcu(). Then there is no need to wait for RCU
read-side critical sections when releasing struct drm_i915_private.

Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 30 +++++++++++---------
drivers/gpu/drm/i915/i915_drv.h | 2 +-
2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index 214763942aa2..4504eb4f31d5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
@@ -284,8 +284,7 @@ unsigned long i915_gem_shrink_all(struct drm_i915_private *i915)
static unsigned long
i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
{
- struct drm_i915_private *i915 =
- container_of(shrinker, struct drm_i915_private, mm.shrinker);
+ struct drm_i915_private *i915 = shrinker->private_data;
unsigned long num_objects;
unsigned long count;

@@ -302,8 +301,8 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
if (num_objects) {
unsigned long avg = 2 * count / num_objects;

- i915->mm.shrinker.batch =
- max((i915->mm.shrinker.batch + avg) >> 1,
+ i915->mm.shrinker->batch =
+ max((i915->mm.shrinker->batch + avg) >> 1,
128ul /* default SHRINK_BATCH */);
}

@@ -313,8 +312,7 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
static unsigned long
i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
{
- struct drm_i915_private *i915 =
- container_of(shrinker, struct drm_i915_private, mm.shrinker);
+ struct drm_i915_private *i915 = shrinker->private_data;
unsigned long freed;

sc->nr_scanned = 0;
@@ -422,12 +420,18 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr

void i915_gem_driver_register__shrinker(struct drm_i915_private *i915)
{
- i915->mm.shrinker.scan_objects = i915_gem_shrinker_scan;
- i915->mm.shrinker.count_objects = i915_gem_shrinker_count;
- i915->mm.shrinker.seeks = DEFAULT_SEEKS;
- i915->mm.shrinker.batch = 4096;
- drm_WARN_ON(&i915->drm, register_shrinker(&i915->mm.shrinker,
- "drm-i915_gem"));
+ i915->mm.shrinker = shrinker_alloc(0, "drm-i915_gem");
+ if (!i915->mm.shrinker) {
+ drm_WARN_ON(&i915->drm, 1);
+ } else {
+ i915->mm.shrinker->scan_objects = i915_gem_shrinker_scan;
+ i915->mm.shrinker->count_objects = i915_gem_shrinker_count;
+ i915->mm.shrinker->seeks = DEFAULT_SEEKS;
+ i915->mm.shrinker->batch = 4096;
+ i915->mm.shrinker->private_data = i915;
+
+ shrinker_register(i915->mm.shrinker);
+ }

i915->mm.oom_notifier.notifier_call = i915_gem_shrinker_oom;
drm_WARN_ON(&i915->drm, register_oom_notifier(&i915->mm.oom_notifier));
@@ -443,7 +447,7 @@ void i915_gem_driver_unregister__shrinker(struct drm_i915_private *i915)
unregister_vmap_purge_notifier(&i915->mm.vmap_notifier));
drm_WARN_ON(&i915->drm,
unregister_oom_notifier(&i915->mm.oom_notifier));
- unregister_shrinker(&i915->mm.shrinker);
+ shrinker_free(i915->mm.shrinker);
}

void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 682ef2b5c7d5..389e8bf140d7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -163,7 +163,7 @@ struct i915_gem_mm {

struct notifier_block oom_notifier;
struct notifier_block vmap_notifier;
- struct shrinker shrinker;
+ struct shrinker *shrinker;

#ifdef CONFIG_MMU_NOTIFIER
/**
--
2.30.2
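
The i915 conversion follows a simple pattern. The driver keeps a pointer to a separately allocated shrinker and sets private_data to point back at itself. The callbacks then read private_data instead of using container_of(). Below is a minimal user-space sketch of that pattern. It assumes simplified stand-ins for the kernel types: struct shrinker, struct shrink_control, model_shrinker_alloc() and model_shrinker_free() are hypothetical models, not the definitions from include/linux/shrinker.h. struct my_driver and its callbacks are likewise illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* User-space model only: simplified stand-ins, NOT the kernel definitions. */
struct shrink_control {
	unsigned long nr_to_scan;
	unsigned long nr_scanned;
};

struct shrinker {
	unsigned long (*count_objects)(struct shrinker *, struct shrink_control *);
	unsigned long (*scan_objects)(struct shrinker *, struct shrink_control *);
	int seeks;
	long batch;
	void *private_data;	/* back-pointer to the owning structure */
	char name[32];
};

/* Hypothetical stand-ins for shrinker_alloc()/shrinker_free(). */
static struct shrinker *model_shrinker_alloc(const char *name)
{
	struct shrinker *s = calloc(1, sizeof(*s));

	if (s)
		snprintf(s->name, sizeof(s->name), "%s", name);
	return s;
}

static void model_shrinker_free(struct shrinker *s)
{
	free(s);	/* plain free() is enough in this single-threaded model */
}

/* Hypothetical driver that caches reclaimable objects. */
struct my_driver {
	unsigned long cached_objects;
	struct shrinker *shrinker;	/* a pointer, no longer an embedded struct */
};

static unsigned long my_count(struct shrinker *s, struct shrink_control *sc)
{
	struct my_driver *drv = s->private_data;	/* replaces container_of() */

	(void)sc;
	assert(drv != NULL);	/* private_data must be set before use */
	return drv->cached_objects;
}

static unsigned long my_scan(struct shrinker *s, struct shrink_control *sc)
{
	struct my_driver *drv = s->private_data;
	unsigned long freed = sc->nr_to_scan < drv->cached_objects ?
			      sc->nr_to_scan : drv->cached_objects;

	drv->cached_objects -= freed;
	sc->nr_scanned = freed;
	return freed;
}

/* Same shape as i915_gem_driver_register__shrinker(), minus registration. */
static int my_driver_register(struct my_driver *drv)
{
	drv->shrinker = model_shrinker_alloc("my-driver");
	if (!drv->shrinker)
		return -1;
	drv->shrinker->count_objects = my_count;
	drv->shrinker->scan_objects = my_scan;
	drv->shrinker->seeks = 2;	/* value of DEFAULT_SEEKS in the kernel */
	drv->shrinker->batch = 4096;
	drv->shrinker->private_data = drv;
	return 0;
}
```

The shrinker is no longer embedded in the driver structure, so its memory can be released on its own schedule. The series needs this so that it can free shrinkers via RCU.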


2023-08-07 12:11:52

by Qi Zheng

Subject: [PATCH v4 43/48] drm/ttm: introduce pool_shrink_rwsem

Currently, synchronize_shrinkers() is used only by the TTM pool, and that
user only requires that no shrinkers run in parallel.

Once the lockless slab shrink is implemented with the refcount+RCU method,
neither shrinker_rwsem nor synchronize_rcu() can be used to guarantee that
all shrinker invocations have seen an update before memory is freed.

So introduce a new pool_shrink_rwsem and use it to implement a private
synchronize_shrinkers() in the TTM pool that achieves the same purpose.

Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
drivers/gpu/drm/ttm/ttm_pool.c | 15 +++++++++++++++
include/linux/shrinker.h | 2 --
mm/shrinker.c | 15 ---------------
3 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index c9c9618c0dce..38b4c280725c 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -74,6 +74,7 @@ static struct ttm_pool_type global_dma32_uncached[MAX_ORDER + 1];
static spinlock_t shrinker_lock;
static struct list_head shrinker_list;
static struct shrinker *mm_shrinker;
+static DECLARE_RWSEM(pool_shrink_rwsem);

/* Allocate pages of size 1 << order with the given gfp_flags */
static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
@@ -317,6 +318,7 @@ static unsigned int ttm_pool_shrink(void)
unsigned int num_pages;
struct page *p;

+ down_read(&pool_shrink_rwsem);
spin_lock(&shrinker_lock);
pt = list_first_entry(&shrinker_list, typeof(*pt), shrinker_list);
list_move_tail(&pt->shrinker_list, &shrinker_list);
@@ -329,6 +331,7 @@ static unsigned int ttm_pool_shrink(void)
} else {
num_pages = 0;
}
+ up_read(&pool_shrink_rwsem);

return num_pages;
}
@@ -572,6 +575,18 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
}
EXPORT_SYMBOL(ttm_pool_init);

+/**
+ * synchronize_shrinkers - Wait for all running shrinkers to complete.
+ *
+ * This is useful to guarantee that all shrinker invocations have seen an
+ * update, before freeing memory, similar to rcu.
+ */
+static void synchronize_shrinkers(void)
+{
+ down_write(&pool_shrink_rwsem);
+ up_write(&pool_shrink_rwsem);
+}
+
/**
* ttm_pool_fini - Cleanup a pool
*
diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index c55c07c3f0cb..025c8070dd86 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -103,8 +103,6 @@ struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...);
void shrinker_register(struct shrinker *shrinker);
void shrinker_free(struct shrinker *shrinker);

-extern void synchronize_shrinkers(void);
-
#ifdef CONFIG_SHRINKER_DEBUG
extern int __printf(2, 3) shrinker_debugfs_rename(struct shrinker *shrinker,
const char *fmt, ...);
diff --git a/mm/shrinker.c b/mm/shrinker.c
index 3ab301ff122d..a27779ed3798 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -650,18 +650,3 @@ void shrinker_free(struct shrinker *shrinker)
kfree(shrinker);
}
EXPORT_SYMBOL_GPL(shrinker_free);
-
-/**
- * synchronize_shrinkers - Wait for all running shrinkers to complete.
- *
- * This is equivalent to calling unregister_shrink() and register_shrinker(),
- * but atomically and with less overhead. This is useful to guarantee that all
- * shrinker invocations have seen an update, before freeing memory, similar to
- * rcu.
- */
-void synchronize_shrinkers(void)
-{
- down_write(&shrinker_rwsem);
- up_write(&shrinker_rwsem);
-}
-EXPORT_SYMBOL(synchronize_shrinkers);
--
2.30.2
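
The private synchronize_shrinkers() relies on a common rwsem idiom. Every shrink runs under the read lock, so taking and immediately dropping the write lock cannot complete until every shrink that already held the lock has finished. Below is a user-space sketch of that idiom. It assumes a POSIX rwlock as a stand-in for DECLARE_RWSEM(). model_pool_shrink(), model_synchronize_shrinkers() and model_demo() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

/* POSIX rwlock standing in for DECLARE_RWSEM(pool_shrink_rwsem). */
static pthread_rwlock_t pool_shrink_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_int entered;	/* shrinks that have taken the read lock */
static atomic_int finished;	/* shrinks that completed their work */

/* Analogue of ttm_pool_shrink(): the whole shrink runs under the read lock,
 * so shrinks can still run in parallel with each other. */
static void *model_pool_shrink(void *arg)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

	(void)arg;
	pthread_rwlock_rdlock(&pool_shrink_lock);
	atomic_fetch_add(&entered, 1);
	nanosleep(&ts, NULL);	/* pretend to free pages */
	atomic_fetch_add(&finished, 1);
	pthread_rwlock_unlock(&pool_shrink_lock);
	return NULL;
}

/* Analogue of the private synchronize_shrinkers(): the write lock is only
 * granted once every current reader has dropped the read lock. */
static void model_synchronize_shrinkers(void)
{
	pthread_rwlock_wrlock(&pool_shrink_lock);
	pthread_rwlock_unlock(&pool_shrink_lock);
}

/* Starts n (1..8) shrinks, waits until all of them have taken the read lock,
 * then synchronizes. Returns how many had finished when the barrier returned. */
static int model_demo(int n)
{
	pthread_t tids[8];
	int i, done;

	if (n < 1 || n > 8)
		return -1;
	atomic_store(&entered, 0);
	atomic_store(&finished, 0);
	for (i = 0; i < n; i++) {
		if (pthread_create(&tids[i], NULL, model_pool_shrink, NULL) != 0) {
			n = i;	/* only wait for the threads that did start */
			break;
		}
	}
	while (atomic_load(&entered) < n)
		sched_yield();
	model_synchronize_shrinkers();
	done = atomic_load(&finished);
	for (i = 0; i < n; i++)
		pthread_join(tids[i], NULL);
	return done;
}
```

The barrier only waits for shrinks that hold the read lock when the write lock is requested. A shrink that takes the read lock after the update was made will observe that update, which is the guarantee TTM relies on before freeing memory.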


2023-08-07 12:20:43

by Qi Zheng

Subject: [PATCH v4 47/48] mm: shrinker: hold write lock to reparent shrinker nr_deferred

Currently, reparent_shrinker_deferred() is the only user that takes the read
lock of shrinker_rwsem, and it already holds the global cgroup_mutex, so it
can never run in parallel with itself.

Therefore, in preparation for converting shrinker_rwsem to shrinker_mutex
later, take the write lock of shrinker_rwsem when reparenting instead.

Signed-off-by: Qi Zheng <[email protected]>
---
mm/shrinker.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/shrinker.c b/mm/shrinker.c
index fee6f62904fb..a12dede5d21f 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -299,7 +299,7 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
parent = root_mem_cgroup;

/* Prevent from concurrent shrinker_info expand */
- down_read(&shrinker_rwsem);
+ down_write(&shrinker_rwsem);
for_each_node(nid) {
child_info = shrinker_info_protected(memcg, nid);
parent_info = shrinker_info_protected(parent, nid);
@@ -312,7 +312,7 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
}
}
}
- up_read(&shrinker_rwsem);
+ up_write(&shrinker_rwsem);
}
#else
static int shrinker_memcg_alloc(struct shrinker *shrinker)
--
2.30.2
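
Taking the write lock here costs nothing. The reparent step folds a child memcg's per-node deferred counts into its parent, and cgroup_mutex already serializes its callers, so no second reader could ever run concurrently. The sketch below models that step in user space under a single exclusive lock. That lock plays the role of shrinker_rwsem taken for write, and of shrinker_mutex later. The flat two-dimensional array and the MODEL_* sizes are simplifying assumptions, not the layout of the kernel's shrinker_info.

```c
#include <assert.h>
#include <pthread.h>

#define MODEL_NR_NODES		2
#define MODEL_NR_SHRINKERS	4

/* Hypothetical flattening of a memcg's per-node shrinker_info. */
struct model_memcg {
	long nr_deferred[MODEL_NR_NODES][MODEL_NR_SHRINKERS];
};

/* One exclusive lock, playing the role of shrinker_rwsem taken for write
 * (and of the shrinker_mutex it is later converted to). It also excludes
 * concurrent shrinker_info expansion, as in reparent_shrinker_deferred(). */
static pthread_mutex_t model_shrinker_lock = PTHREAD_MUTEX_INITIALIZER;

/* Fold the child's deferred work into the parent, node by node. */
static void model_reparent_deferred(const struct model_memcg *child,
				    struct model_memcg *parent)
{
	pthread_mutex_lock(&model_shrinker_lock);
	for (int nid = 0; nid < MODEL_NR_NODES; nid++)
		for (int i = 0; i < MODEL_NR_SHRINKERS; i++)
			parent->nr_deferred[nid][i] += child->nr_deferred[nid][i];
	pthread_mutex_unlock(&model_shrinker_lock);
}
```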


2023-08-07 13:10:49

by Qi Zheng

Subject: [PATCH v4 20/48] mm: thp: dynamically allocate the thp-related shrinkers

Use the new APIs to dynamically allocate the thp-zero and thp-deferred_split
shrinkers.

Signed-off-by: Qi Zheng <[email protected]>
---
mm/huge_memory.c | 69 +++++++++++++++++++++++++++++++-----------------
1 file changed, 45 insertions(+), 24 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 947001a7cd42..5d0c7a0b651c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -65,7 +65,11 @@ unsigned long transparent_hugepage_flags __read_mostly =
(1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG)|
(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);

-static struct shrinker deferred_split_shrinker;
+static struct shrinker *deferred_split_shrinker;
+static unsigned long deferred_split_count(struct shrinker *shrink,
+ struct shrink_control *sc);
+static unsigned long deferred_split_scan(struct shrinker *shrink,
+ struct shrink_control *sc);

static atomic_t huge_zero_refcount;
struct page *huge_zero_page __read_mostly;
@@ -229,11 +233,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
return 0;
}

-static struct shrinker huge_zero_page_shrinker = {
- .count_objects = shrink_huge_zero_page_count,
- .scan_objects = shrink_huge_zero_page_scan,
- .seeks = DEFAULT_SEEKS,
-};
+static struct shrinker *huge_zero_page_shrinker;

#ifdef CONFIG_SYSFS
static ssize_t enabled_show(struct kobject *kobj,
@@ -454,6 +454,40 @@ static inline void hugepage_exit_sysfs(struct kobject *hugepage_kobj)
}
#endif /* CONFIG_SYSFS */

+static int __init thp_shrinker_init(void)
+{
+ huge_zero_page_shrinker = shrinker_alloc(0, "thp-zero");
+ if (!huge_zero_page_shrinker)
+ return -ENOMEM;
+
+ deferred_split_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE |
+ SHRINKER_MEMCG_AWARE |
+ SHRINKER_NONSLAB,
+ "thp-deferred_split");
+ if (!deferred_split_shrinker) {
+ shrinker_free(huge_zero_page_shrinker);
+ return -ENOMEM;
+ }
+
+ huge_zero_page_shrinker->count_objects = shrink_huge_zero_page_count;
+ huge_zero_page_shrinker->scan_objects = shrink_huge_zero_page_scan;
+ huge_zero_page_shrinker->seeks = DEFAULT_SEEKS;
+ shrinker_register(huge_zero_page_shrinker);
+
+ deferred_split_shrinker->count_objects = deferred_split_count;
+ deferred_split_shrinker->scan_objects = deferred_split_scan;
+ deferred_split_shrinker->seeks = DEFAULT_SEEKS;
+ shrinker_register(deferred_split_shrinker);
+
+ return 0;
+}
+
+static void __init thp_shrinker_exit(void)
+{
+ shrinker_free(huge_zero_page_shrinker);
+ shrinker_free(deferred_split_shrinker);
+}
+
static int __init hugepage_init(void)
{
int err;
@@ -482,12 +516,9 @@ static int __init hugepage_init(void)
if (err)
goto err_slab;

- err = register_shrinker(&huge_zero_page_shrinker, "thp-zero");
- if (err)
- goto err_hzp_shrinker;
- err = register_shrinker(&deferred_split_shrinker, "thp-deferred_split");
+ err = thp_shrinker_init();
if (err)
- goto err_split_shrinker;
+ goto err_shrinker;

/*
* By default disable transparent hugepages on smaller systems,
@@ -505,10 +536,8 @@ static int __init hugepage_init(void)

return 0;
err_khugepaged:
- unregister_shrinker(&deferred_split_shrinker);
-err_split_shrinker:
- unregister_shrinker(&huge_zero_page_shrinker);
-err_hzp_shrinker:
+ thp_shrinker_exit();
+err_shrinker:
khugepaged_destroy();
err_slab:
hugepage_exit_sysfs(hugepage_kobj);
@@ -2834,7 +2863,7 @@ void deferred_split_folio(struct folio *folio)
#ifdef CONFIG_MEMCG
if (memcg)
set_shrinker_bit(memcg, folio_nid(folio),
- deferred_split_shrinker.id);
+ deferred_split_shrinker->id);
#endif
}
spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
@@ -2908,14 +2937,6 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
return split;
}

-static struct shrinker deferred_split_shrinker = {
- .count_objects = deferred_split_count,
- .scan_objects = deferred_split_scan,
- .seeks = DEFAULT_SEEKS,
- .flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE |
- SHRINKER_NONSLAB,
-};
-
#ifdef CONFIG_DEBUG_FS
static void split_huge_pages_all(void)
{
--
2.30.2
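
thp_shrinker_init() allocates both shrinkers before registering either of them. If the second allocation fails, it frees the first, so no shrinker is registered and nothing leaks. The following user-space sketch models that ordering. It uses a hypothetical fault-injection hook (model_fail_at) so the error paths can be exercised. The model_* types and functions are illustrative stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Illustrative stand-in for a dynamically allocated shrinker. */
struct model_shrinker {
	const char *name;
	int registered;
};

static int model_fail_at;	/* fail the Nth allocation (1-based); 0 = never */
static int model_alloc_calls;
static int model_live;		/* allocated and not yet freed */

static struct model_shrinker *model_alloc(const char *name)
{
	struct model_shrinker *s;

	if (++model_alloc_calls == model_fail_at)
		return NULL;	/* injected allocation failure */
	s = calloc(1, sizeof(*s));
	if (s) {
		s->name = name;
		model_live++;
	}
	return s;
}

static void model_free(struct model_shrinker *s)
{
	if (!s)
		return;
	model_live--;
	free(s);
}

static struct model_shrinker *zero_shrinker, *split_shrinker;

/* Same order as thp_shrinker_init(): allocate both, undo the first
 * allocation if the second fails, and register only when both exist. */
static int model_thp_shrinker_init(void)
{
	zero_shrinker = model_alloc("thp-zero");
	if (!zero_shrinker)
		return -ENOMEM;
	split_shrinker = model_alloc("thp-deferred_split");
	if (!split_shrinker) {
		model_free(zero_shrinker);
		zero_shrinker = NULL;
		return -ENOMEM;
	}
	zero_shrinker->registered = 1;
	split_shrinker->registered = 1;
	return 0;
}

/* Counterpart of thp_shrinker_exit(), used on the err_khugepaged path. */
static void model_thp_shrinker_exit(void)
{
	model_free(zero_shrinker);
	model_free(split_shrinker);
	zero_shrinker = NULL;
	split_shrinker = NULL;
}

/* Runs the init with allocation number fail_at forced to fail. */
static int model_try(int fail_at)
{
	model_fail_at = fail_at;
	model_alloc_calls = 0;
	return model_thp_shrinker_init();
}
```

Because registration happens only after both allocations succeed, hugepage_init() can use a single err_shrinker label. The two per-shrinker unwind labels of the old code are no longer needed.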