2023-07-24 09:49:28

by Qi Zheng

Subject: [PATCH v2 00/47] use refcount+RCU method to implement lockless slab shrink

Hi all,

1. Background
=============

We used to implement the lockless slab shrink with SRCU [1], but then the
kernel test robot reported a -88.8% regression in the stress-ng.ramfs.ops_per_sec
test case [2], so we reverted it [3].

This patch series aims to re-implement the lockless slab shrink using the
refcount+RCU method proposed by Dave Chinner [4].

[1]. https://lore.kernel.org/lkml/[email protected]/
[2]. https://lore.kernel.org/lkml/[email protected]/
[3]. https://lore.kernel.org/all/[email protected]/
[4]. https://lore.kernel.org/lkml/[email protected]/

2. Implementation
=================

Currently, the shrinker instances can be divided into the following three types:

a) global shrinker instance statically defined in the kernel, such as
workingset_shadow_shrinker.

b) global shrinker instance statically defined in the kernel modules, such as
mmu_shrinker in x86.

c) shrinker instance embedded in other structures.

For case a, the memory of the shrinker instance is never freed. For case b, the
memory of the shrinker instance is freed after synchronize_rcu() when the
module is unloaded. For case c, the memory of the shrinker instance is freed
along with the structure it is embedded in.
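
For instance, before this series a case-c shrinker such as the per-superblock
one is embedded directly in its owning structure and shares its lifetime
(simplified sketch, unrelated fields elided):

```
/*
 * Simplified sketch of a case-c shrinker before this series: the shrinker
 * is embedded in its owning structure and is freed together with it.
 */
struct super_block {
	/* ... */
	struct shrinker s_shrink;	/* per-superblock shrinker */
	/* ... */
};
```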

In preparation for implementing lockless slab shrink, we need to dynamically
allocate the shrinker instances in case c, so that their memory can then be
freed independently by calling kfree_rcu().

This patchset adds the following new APIs for dynamically allocating a
shrinker, and adds a private_data field to struct shrinker to record and
retrieve the original embedded structure.

1. shrinker_alloc()
2. shrinker_free_non_registered()
3. shrinker_register()
4. shrinker_unregister()

In order to simplify the shrinker-related APIs and make shrinkers more
independent of other kernel mechanisms, this patchset uses the above APIs to
convert all shrinkers (including cases a and b) to dynamically allocated ones,
and then removes all existing APIs. This will also have another advantage
mentioned by Dave Chinner:

```
The other advantage of this is that it will break all the existing out of tree
code and third party modules using the old API and will no longer work with a
kernel using lockless slab shrinkers. They need to break (both at the source and
binary levels) to stop bad things from happening due to using uncoverted
shrinkers in the new setup.
```

Then we free the shrinker by calling kfree_rcu(), and use rcu_read_{lock,unlock}()
to ensure that the shrinker instance remains valid during traversal. The
shrinker::refcount mechanism ensures that a shrinker instance will not be run
again after unregistration, so the structure that records the pointer to the
shrinker instance can be safely freed without waiting for an RCU read-side
critical section to end.

In this way, we implement the lockless slab shrink without having to block in
unregister_shrinker() waiting for RCU read-side critical sections to finish.
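
Conceptually, the resulting lockless walk looks roughly like the sketch below.
This is only an illustration of the scheme described above; shrinker_try_get()
is an assumed helper name paired with shrinker_put(), and details such as the
memcg case and error handling are omitted.

```
/*
 * Rough sketch of the lockless shrinker walk (global case).
 * shrinker_try_get()/shrinker_put() illustrate the refcount scheme
 * described above; names and details are simplified.
 */
rcu_read_lock();
list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
	/* Skip shrinkers that are being unregistered. */
	if (!shrinker_try_get(shrinker))
		continue;

	/*
	 * The reference keeps the shrinker instance alive, so the RCU
	 * lock can be dropped while the shrinker runs.
	 */
	rcu_read_unlock();

	freed += do_shrink_slab(&sc, shrinker, priority);

	rcu_read_lock();
	shrinker_put(shrinker);
}
rcu_read_unlock();
```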

PATCH 1: move shrinker-related code into a separate file
PATCH 2: remove redundant shrinker_rwsem in debugfs operations
PATCH 3: add infrastructure for dynamically allocating shrinker
PATCH 4 ~ 21: dynamically allocate the shrinker instances in case a and b
PATCH 22 ~ 40: dynamically allocate the shrinker instances in case c
PATCH 41: remove old APIs
PATCH 42: introduce pool_shrink_rwsem to implement private synchronize_shrinkers()
PATCH 43: add a secondary array for shrinker_info::{map, nr_deferred}
PATCH 44 ~ 45: implement the lockless slab shrink
PATCH 46 ~ 47: convert shrinker_rwsem to mutex

3. Testing
==========

3.1 slab shrink stress test
---------------------------

We can reproduce the down_read_trylock() hotspot through the following script:

```

DIR="/root/shrinker/memcg/mnt"

do_create()
{
	mkdir -p /sys/fs/cgroup/memory/test
	echo 4G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
	for i in `seq 0 $1`;
	do
		mkdir -p /sys/fs/cgroup/memory/test/$i;
		echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
		mkdir -p $DIR/$i;
	done
}

do_mount()
{
	for i in `seq $1 $2`;
	do
		mount -t tmpfs $i $DIR/$i;
	done
}

do_touch()
{
	for i in `seq $1 $2`;
	do
		echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
		dd if=/dev/zero of=$DIR/$i/file$i bs=1M count=1 &
	done
}

case "$1" in
  touch)
	do_touch $2 $3
	;;
  test)
	do_create 4000
	do_mount 0 4000
	do_touch 0 3000
	;;
  *)
	exit 1
	;;
esac
```

Save the above script, then run its test command followed by touch commands.
Then we can use the following perf command to view the hotspots:

perf top -U -F 999

1) Before applying this patchset:

40.44% [kernel] [k] down_read_trylock
17.59% [kernel] [k] up_read
13.64% [kernel] [k] pv_native_safe_halt
11.90% [kernel] [k] shrink_slab
8.21% [kernel] [k] idr_find
2.71% [kernel] [k] _find_next_bit
1.36% [kernel] [k] shrink_node
0.81% [kernel] [k] shrink_lruvec
0.80% [kernel] [k] __radix_tree_lookup
0.50% [kernel] [k] do_shrink_slab
0.21% [kernel] [k] list_lru_count_one
0.16% [kernel] [k] mem_cgroup_iter

2) After applying this patchset:

60.17% [kernel] [k] shrink_slab
20.42% [kernel] [k] pv_native_safe_halt
3.03% [kernel] [k] do_shrink_slab
2.73% [kernel] [k] shrink_node
2.27% [kernel] [k] shrink_lruvec
2.00% [kernel] [k] __rcu_read_unlock
1.92% [kernel] [k] mem_cgroup_iter
0.98% [kernel] [k] __rcu_read_lock
0.91% [kernel] [k] osq_lock
0.63% [kernel] [k] mem_cgroup_calculate_protection
0.55% [kernel] [k] shrinker_put
0.46% [kernel] [k] list_lru_count_one

We can see that the top perf hotspot becomes shrink_slab, which is what we
expect.

3.2 registration and unregistration stress test
-----------------------------------------------

Run the command below to test:

stress-ng --timeout 60 --times --verify --metrics-brief --ramfs 9 &

1) Before applying this patchset:

setting to a 60 second run per stressor
dispatching hogs: 9 ramfs
stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
(secs) (secs) (secs) (real time) (usr+sys time)
ramfs 735238 60.00 12.37 363.70 12253.05 1955.08
for a 60.01s run time:
1440.27s available CPU time
12.36s user time ( 0.86%)
363.70s system time ( 25.25%)
376.06s total time ( 26.11%)
load average: 10.79 4.47 1.69
passed: 9: ramfs (9)
failed: 0
skipped: 0
successful run completed in 60.01s (1 min, 0.01 secs)

2) After applying this patchset:

setting to a 60 second run per stressor
dispatching hogs: 9 ramfs
stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
(secs) (secs) (secs) (real time) (usr+sys time)
ramfs 746677 60.00 12.22 367.75 12443.70 1965.13
for a 60.01s run time:
1440.26s available CPU time
12.21s user time ( 0.85%)
367.75s system time ( 25.53%)
379.96s total time ( 26.38%)
load average: 8.37 2.48 0.86
passed: 9: ramfs (9)
failed: 0
skipped: 0
successful run completed in 60.01s (1 min, 0.01 secs)

We can see that the ops/s has hardly changed.

This series is based on next-20230711, and the [PATCH v2 05/49] depends on the
patch: https://lore.kernel.org/lkml/[email protected]/.

Comments and suggestions are welcome.

Thanks,
Qi

Changelog in v1 -> v2:
- implement the new APIs and convert all shrinkers to use them.
(suggested by Dave Chinner)
- fix UAF in PATCH [05/29] (pointed out by Steven Price)
- add a secondary array for shrinker_info::{map, nr_deferred}
- re-implement the lockless slab shrink
(Since unifying the handling of global and memcg slab shrink would require
modifying the startup sequence (as I mentioned in https://lore.kernel.org/lkml/[email protected]/),
I finally chose to process them separately.)
- collect Acked-bys

Qi Zheng (47):
mm: vmscan: move shrinker-related code into a separate file
mm: shrinker: remove redundant shrinker_rwsem in debugfs operations
mm: shrinker: add infrastructure for dynamically allocating shrinker
kvm: mmu: dynamically allocate the x86-mmu shrinker
binder: dynamically allocate the android-binder shrinker
drm/ttm: dynamically allocate the drm-ttm_pool shrinker
xenbus/backend: dynamically allocate the xen-backend shrinker
erofs: dynamically allocate the erofs-shrinker
f2fs: dynamically allocate the f2fs-shrinker
gfs2: dynamically allocate the gfs2-glock shrinker
gfs2: dynamically allocate the gfs2-qd shrinker
NFSv4.2: dynamically allocate the nfs-xattr shrinkers
nfs: dynamically allocate the nfs-acl shrinker
nfsd: dynamically allocate the nfsd-filecache shrinker
quota: dynamically allocate the dquota-cache shrinker
ubifs: dynamically allocate the ubifs-slab shrinker
rcu: dynamically allocate the rcu-lazy shrinker
rcu: dynamically allocate the rcu-kfree shrinker
mm: thp: dynamically allocate the thp-related shrinkers
sunrpc: dynamically allocate the sunrpc_cred shrinker
mm: workingset: dynamically allocate the mm-shadow shrinker
drm/i915: dynamically allocate the i915_gem_mm shrinker
drm/msm: dynamically allocate the drm-msm_gem shrinker
drm/panfrost: dynamically allocate the drm-panfrost shrinker
dm: dynamically allocate the dm-bufio shrinker
dm zoned: dynamically allocate the dm-zoned-meta shrinker
md/raid5: dynamically allocate the md-raid5 shrinker
bcache: dynamically allocate the md-bcache shrinker
vmw_balloon: dynamically allocate the vmw-balloon shrinker
virtio_balloon: dynamically allocate the virtio-balloon shrinker
mbcache: dynamically allocate the mbcache shrinker
ext4: dynamically allocate the ext4-es shrinker
jbd2,ext4: dynamically allocate the jbd2-journal shrinker
nfsd: dynamically allocate the nfsd-client shrinker
nfsd: dynamically allocate the nfsd-reply shrinker
xfs: dynamically allocate the xfs-buf shrinker
xfs: dynamically allocate the xfs-inodegc shrinker
xfs: dynamically allocate the xfs-qm shrinker
zsmalloc: dynamically allocate the mm-zspool shrinker
fs: super: dynamically allocate the s_shrink
mm: shrinker: remove old APIs
drm/ttm: introduce pool_shrink_rwsem
mm: shrinker: add a secondary array for shrinker_info::{map,
nr_deferred}
mm: shrinker: make global slab shrink lockless
mm: shrinker: make memcg slab shrink lockless
mm: shrinker: hold write lock to reparent shrinker nr_deferred
mm: shrinker: convert shrinker_rwsem to mutex

arch/x86/kvm/mmu/mmu.c | 18 +-
drivers/android/binder_alloc.c | 31 +-
drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 30 +-
drivers/gpu/drm/i915/i915_drv.h | 2 +-
drivers/gpu/drm/msm/msm_drv.c | 4 +-
drivers/gpu/drm/msm/msm_drv.h | 4 +-
drivers/gpu/drm/msm/msm_gem_shrinker.c | 36 +-
drivers/gpu/drm/panfrost/panfrost_device.h | 2 +-
drivers/gpu/drm/panfrost/panfrost_drv.c | 6 +-
drivers/gpu/drm/panfrost/panfrost_gem.h | 2 +-
.../gpu/drm/panfrost/panfrost_gem_shrinker.c | 32 +-
drivers/gpu/drm/ttm/ttm_pool.c | 38 +-
drivers/md/bcache/bcache.h | 2 +-
drivers/md/bcache/btree.c | 27 +-
drivers/md/bcache/sysfs.c | 3 +-
drivers/md/dm-bufio.c | 26 +-
drivers/md/dm-cache-metadata.c | 2 +-
drivers/md/dm-zoned-metadata.c | 28 +-
drivers/md/raid5.c | 25 +-
drivers/md/raid5.h | 2 +-
drivers/misc/vmw_balloon.c | 38 +-
drivers/virtio/virtio_balloon.c | 25 +-
drivers/xen/xenbus/xenbus_probe_backend.c | 17 +-
fs/btrfs/super.c | 2 +-
fs/erofs/utils.c | 20 +-
fs/ext4/ext4.h | 2 +-
fs/ext4/extents_status.c | 22 +-
fs/f2fs/super.c | 32 +-
fs/gfs2/glock.c | 20 +-
fs/gfs2/main.c | 6 +-
fs/gfs2/quota.c | 26 +-
fs/gfs2/quota.h | 3 +-
fs/jbd2/journal.c | 27 +-
fs/kernfs/mount.c | 2 +-
fs/mbcache.c | 23 +-
fs/nfs/nfs42xattr.c | 87 +-
fs/nfs/super.c | 20 +-
fs/nfsd/filecache.c | 22 +-
fs/nfsd/netns.h | 4 +-
fs/nfsd/nfs4state.c | 20 +-
fs/nfsd/nfscache.c | 31 +-
fs/proc/root.c | 2 +-
fs/quota/dquot.c | 17 +-
fs/super.c | 39 +-
fs/ubifs/super.c | 22 +-
fs/xfs/xfs_buf.c | 25 +-
fs/xfs/xfs_buf.h | 2 +-
fs/xfs/xfs_icache.c | 26 +-
fs/xfs/xfs_mount.c | 4 +-
fs/xfs/xfs_mount.h | 2 +-
fs/xfs/xfs_qm.c | 26 +-
fs/xfs/xfs_qm.h | 2 +-
include/linux/fs.h | 2 +-
include/linux/jbd2.h | 2 +-
include/linux/memcontrol.h | 12 +-
include/linux/shrinker.h | 54 +-
kernel/rcu/tree.c | 21 +-
kernel/rcu/tree_nocb.h | 19 +-
mm/Makefile | 4 +-
mm/huge_memory.c | 69 +-
mm/shrinker.c | 772 ++++++++++++++++++
mm/shrinker_debug.c | 76 +-
mm/vmscan.c | 701 ----------------
mm/workingset.c | 26 +-
mm/zsmalloc.c | 28 +-
net/sunrpc/auth.c | 19 +-
66 files changed, 1516 insertions(+), 1225 deletions(-)
create mode 100644 mm/shrinker.c

--
2.30.2



2023-07-24 09:49:43

by Qi Zheng

Subject: [PATCH v2 03/47] mm: shrinker: add infrastructure for dynamically allocating shrinker

Currently, the shrinker instances can be divided into the following three
types:

a) global shrinker instance statically defined in the kernel, such as
workingset_shadow_shrinker.

b) global shrinker instance statically defined in the kernel modules, such
as mmu_shrinker in x86.

c) shrinker instance embedded in other structures.

For case a, the memory of the shrinker instance is never freed. For case b,
the memory of the shrinker instance is freed after synchronize_rcu() when the
module is unloaded. For case c, the memory of the shrinker instance is freed
along with the structure it is embedded in.

In preparation for implementing lockless slab shrink, we need to dynamically
allocate the shrinker instances in case c, so that their memory can then be
freed independently by calling kfree_rcu().

So this commit adds the following new APIs for dynamically allocating a
shrinker, and adds a private_data field to struct shrinker to record and
retrieve the original embedded structure.

1. shrinker_alloc()

Used to allocate the shrinker instance itself and the related memory. It
returns a pointer to the shrinker instance on success and NULL on failure.

2. shrinker_free_non_registered()

Used to destroy the non-registered shrinker instance.

3. shrinker_register()

Used to register the shrinker instance, which is the same as the current
register_shrinker_prepared().

4. shrinker_unregister()

Used to unregister and free the shrinker instance.
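
Putting these together, a converted case-c user is expected to look roughly
like the sketch below. struct my_cache and the my_* functions are hypothetical
names used only for illustration; they are not part of this patch.

```
#include <linux/shrinker.h>

struct my_cache {
	unsigned long nr_objects;
	struct shrinker *shrinker;
};

static unsigned long my_count_objects(struct shrinker *shrink,
				      struct shrink_control *sc)
{
	struct my_cache *cache = shrink->private_data;

	return cache->nr_objects ? cache->nr_objects : SHRINK_EMPTY;
}

static unsigned long my_scan_objects(struct shrinker *shrink,
				     struct shrink_control *sc)
{
	/* Reclaim up to sc->nr_to_scan objects and return the number freed. */
	return SHRINK_STOP;
}

static int my_cache_init(struct my_cache *cache)
{
	cache->shrinker = shrinker_alloc(SHRINKER_MEMCG_AWARE, "my-cache");
	if (!cache->shrinker)
		return -ENOMEM;

	cache->shrinker->count_objects = my_count_objects;
	cache->shrinker->scan_objects = my_scan_objects;
	cache->shrinker->seeks = DEFAULT_SEEKS;
	cache->shrinker->private_data = cache;

	/*
	 * If a later setup step failed before registration, the instance
	 * would be released with shrinker_free_non_registered() instead.
	 */
	shrinker_register(cache->shrinker);
	return 0;
}

static void my_cache_exit(struct my_cache *cache)
{
	/* Unregisters the shrinker and frees the instance. */
	shrinker_unregister(cache->shrinker);
}
```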

In order to simplify the shrinker-related APIs and make shrinkers more
independent of other kernel mechanisms, subsequent patches will use the above
APIs to convert all shrinkers (including cases a and b) to dynamically
allocated ones, and then remove all existing APIs.

This will also have another advantage mentioned by Dave Chinner:

```
The other advantage of this is that it will break all the existing
out of tree code and third party modules using the old API and will
no longer work with a kernel using lockless slab shrinkers. They
need to break (both at the source and binary levels) to stop bad
things from happening due to using uncoverted shrinkers in the new
setup.
```

Signed-off-by: Qi Zheng <[email protected]>
---
include/linux/shrinker.h | 6 +++
mm/shrinker.c | 113 +++++++++++++++++++++++++++++++++++++++
2 files changed, 119 insertions(+)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 961cb84e51f5..296f5e163861 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -70,6 +70,8 @@ struct shrinker {
int seeks; /* seeks to recreate an obj */
unsigned flags;

+ void *private_data;
+
/* These are for internal use */
struct list_head list;
#ifdef CONFIG_MEMCG
@@ -98,6 +100,10 @@ struct shrinker {

unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
int priority);
+struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...);
+void shrinker_free_non_registered(struct shrinker *shrinker);
+void shrinker_register(struct shrinker *shrinker);
+void shrinker_unregister(struct shrinker *shrinker);

extern int __printf(2, 3) prealloc_shrinker(struct shrinker *shrinker,
const char *fmt, ...);
diff --git a/mm/shrinker.c b/mm/shrinker.c
index 0a32ef42f2a7..d820e4cc5806 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -548,6 +548,119 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
return freed;
}

+struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...)
+{
+ struct shrinker *shrinker;
+ unsigned int size;
+ va_list __maybe_unused ap;
+ int err;
+
+ shrinker = kzalloc(sizeof(struct shrinker), GFP_KERNEL);
+ if (!shrinker)
+ return NULL;
+
+#ifdef CONFIG_SHRINKER_DEBUG
+ va_start(ap, fmt);
+ shrinker->name = kvasprintf_const(GFP_KERNEL, fmt, ap);
+ va_end(ap);
+ if (!shrinker->name)
+ goto err_name;
+#endif
+ shrinker->flags = flags;
+
+ if (flags & SHRINKER_MEMCG_AWARE) {
+ err = prealloc_memcg_shrinker(shrinker);
+ if (err == -ENOSYS)
+ shrinker->flags &= ~SHRINKER_MEMCG_AWARE;
+ else if (err == 0)
+ goto done;
+ else
+ goto err_flags;
+ }
+
+ /*
+ * The nr_deferred is available on per memcg level for memcg aware
+ * shrinkers, so only allocate nr_deferred in the following cases:
+ * - non memcg aware shrinkers
+ * - !CONFIG_MEMCG
+ * - memcg is disabled by kernel command line
+ */
+ size = sizeof(*shrinker->nr_deferred);
+ if (flags & SHRINKER_NUMA_AWARE)
+ size *= nr_node_ids;
+
+ shrinker->nr_deferred = kzalloc(size, GFP_KERNEL);
+ if (!shrinker->nr_deferred)
+ goto err_flags;
+
+done:
+ return shrinker;
+
+err_flags:
+#ifdef CONFIG_SHRINKER_DEBUG
+ kfree_const(shrinker->name);
+ shrinker->name = NULL;
+err_name:
+#endif
+ kfree(shrinker);
+ return NULL;
+}
+EXPORT_SYMBOL(shrinker_alloc);
+
+void shrinker_free_non_registered(struct shrinker *shrinker)
+{
+#ifdef CONFIG_SHRINKER_DEBUG
+ kfree_const(shrinker->name);
+ shrinker->name = NULL;
+#endif
+ if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
+ down_write(&shrinker_rwsem);
+ unregister_memcg_shrinker(shrinker);
+ up_write(&shrinker_rwsem);
+ }
+
+ kfree(shrinker->nr_deferred);
+ shrinker->nr_deferred = NULL;
+
+ kfree(shrinker);
+}
+EXPORT_SYMBOL(shrinker_free_non_registered);
+
+void shrinker_register(struct shrinker *shrinker)
+{
+ down_write(&shrinker_rwsem);
+ list_add_tail(&shrinker->list, &shrinker_list);
+ shrinker->flags |= SHRINKER_REGISTERED;
+ shrinker_debugfs_add(shrinker);
+ up_write(&shrinker_rwsem);
+}
+EXPORT_SYMBOL(shrinker_register);
+
+void shrinker_unregister(struct shrinker *shrinker)
+{
+ struct dentry *debugfs_entry;
+ int debugfs_id;
+
+ if (!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))
+ return;
+
+ down_write(&shrinker_rwsem);
+ list_del(&shrinker->list);
+ shrinker->flags &= ~SHRINKER_REGISTERED;
+ if (shrinker->flags & SHRINKER_MEMCG_AWARE)
+ unregister_memcg_shrinker(shrinker);
+ debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
+ up_write(&shrinker_rwsem);
+
+ shrinker_debugfs_remove(debugfs_entry, debugfs_id);
+
+ kfree(shrinker->nr_deferred);
+ shrinker->nr_deferred = NULL;
+
+ kfree(shrinker);
+}
+EXPORT_SYMBOL(shrinker_unregister);
+
/*
* Add a shrinker callback to be called from the vm.
*/
--
2.30.2


2023-07-24 09:49:43

by Qi Zheng

Subject: [PATCH v2 01/47] mm: vmscan: move shrinker-related code into a separate file

The mm/vmscan.c file is too large, so move the shrinker-related code out of
it into a separate file. No functional changes.

Signed-off-by: Qi Zheng <[email protected]>
---
include/linux/shrinker.h | 3 +
mm/Makefile | 4 +-
mm/shrinker.c | 707 +++++++++++++++++++++++++++++++++++++++
mm/vmscan.c | 701 --------------------------------------
4 files changed, 712 insertions(+), 703 deletions(-)
create mode 100644 mm/shrinker.c

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 224293b2dd06..961cb84e51f5 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -96,6 +96,9 @@ struct shrinker {
*/
#define SHRINKER_NONSLAB (1 << 3)

+unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
+ int priority);
+
extern int __printf(2, 3) prealloc_shrinker(struct shrinker *shrinker,
const char *fmt, ...);
extern void register_shrinker_prepared(struct shrinker *shrinker);
diff --git a/mm/Makefile b/mm/Makefile
index 678530a07326..891899186608 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -48,8 +48,8 @@ endif

obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
maccess.o page-writeback.o folio-compat.o \
- readahead.o swap.o truncate.o vmscan.o shmem.o \
- util.o mmzone.o vmstat.o backing-dev.o \
+ readahead.o swap.o truncate.o vmscan.o shrinker.o \
+ shmem.o util.o mmzone.o vmstat.o backing-dev.o \
mm_init.o percpu.o slab_common.o \
compaction.o show_mem.o\
interval_tree.o list_lru.o workingset.o \
diff --git a/mm/shrinker.c b/mm/shrinker.c
new file mode 100644
index 000000000000..0a32ef42f2a7
--- /dev/null
+++ b/mm/shrinker.c
@@ -0,0 +1,707 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/memcontrol.h>
+#include <linux/rwsem.h>
+#include <linux/shrinker.h>
+#include <trace/events/vmscan.h>
+
+LIST_HEAD(shrinker_list);
+DECLARE_RWSEM(shrinker_rwsem);
+
+#ifdef CONFIG_MEMCG
+static int shrinker_nr_max;
+
+/* The shrinker_info is expanded in a batch of BITS_PER_LONG */
+static inline int shrinker_map_size(int nr_items)
+{
+ return (DIV_ROUND_UP(nr_items, BITS_PER_LONG) * sizeof(unsigned long));
+}
+
+static inline int shrinker_defer_size(int nr_items)
+{
+ return (round_up(nr_items, BITS_PER_LONG) * sizeof(atomic_long_t));
+}
+
+void free_shrinker_info(struct mem_cgroup *memcg)
+{
+ struct mem_cgroup_per_node *pn;
+ struct shrinker_info *info;
+ int nid;
+
+ for_each_node(nid) {
+ pn = memcg->nodeinfo[nid];
+ info = rcu_dereference_protected(pn->shrinker_info, true);
+ kvfree(info);
+ rcu_assign_pointer(pn->shrinker_info, NULL);
+ }
+}
+
+int alloc_shrinker_info(struct mem_cgroup *memcg)
+{
+ struct shrinker_info *info;
+ int nid, size, ret = 0;
+ int map_size, defer_size = 0;
+
+ down_write(&shrinker_rwsem);
+ map_size = shrinker_map_size(shrinker_nr_max);
+ defer_size = shrinker_defer_size(shrinker_nr_max);
+ size = map_size + defer_size;
+ for_each_node(nid) {
+ info = kvzalloc_node(sizeof(*info) + size, GFP_KERNEL, nid);
+ if (!info) {
+ free_shrinker_info(memcg);
+ ret = -ENOMEM;
+ break;
+ }
+ info->nr_deferred = (atomic_long_t *)(info + 1);
+ info->map = (void *)info->nr_deferred + defer_size;
+ info->map_nr_max = shrinker_nr_max;
+ rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
+ }
+ up_write(&shrinker_rwsem);
+
+ return ret;
+}
+
+static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
+ int nid)
+{
+ return rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info,
+ lockdep_is_held(&shrinker_rwsem));
+}
+
+static int expand_one_shrinker_info(struct mem_cgroup *memcg,
+ int map_size, int defer_size,
+ int old_map_size, int old_defer_size,
+ int new_nr_max)
+{
+ struct shrinker_info *new, *old;
+ struct mem_cgroup_per_node *pn;
+ int nid;
+ int size = map_size + defer_size;
+
+ for_each_node(nid) {
+ pn = memcg->nodeinfo[nid];
+ old = shrinker_info_protected(memcg, nid);
+ /* Not yet online memcg */
+ if (!old)
+ return 0;
+
+ /* Already expanded this shrinker_info */
+ if (new_nr_max <= old->map_nr_max)
+ continue;
+
+ new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
+ if (!new)
+ return -ENOMEM;
+
+ new->nr_deferred = (atomic_long_t *)(new + 1);
+ new->map = (void *)new->nr_deferred + defer_size;
+ new->map_nr_max = new_nr_max;
+
+ /* map: set all old bits, clear all new bits */
+ memset(new->map, (int)0xff, old_map_size);
+ memset((void *)new->map + old_map_size, 0, map_size - old_map_size);
+ /* nr_deferred: copy old values, clear all new values */
+ memcpy(new->nr_deferred, old->nr_deferred, old_defer_size);
+ memset((void *)new->nr_deferred + old_defer_size, 0,
+ defer_size - old_defer_size);
+
+ rcu_assign_pointer(pn->shrinker_info, new);
+ kvfree_rcu(old, rcu);
+ }
+
+ return 0;
+}
+
+static int expand_shrinker_info(int new_id)
+{
+ int ret = 0;
+ int new_nr_max = round_up(new_id + 1, BITS_PER_LONG);
+ int map_size, defer_size = 0;
+ int old_map_size, old_defer_size = 0;
+ struct mem_cgroup *memcg;
+
+ if (!root_mem_cgroup)
+ goto out;
+
+ lockdep_assert_held(&shrinker_rwsem);
+
+ map_size = shrinker_map_size(new_nr_max);
+ defer_size = shrinker_defer_size(new_nr_max);
+ old_map_size = shrinker_map_size(shrinker_nr_max);
+ old_defer_size = shrinker_defer_size(shrinker_nr_max);
+
+ memcg = mem_cgroup_iter(NULL, NULL, NULL);
+ do {
+ ret = expand_one_shrinker_info(memcg, map_size, defer_size,
+ old_map_size, old_defer_size,
+ new_nr_max);
+ if (ret) {
+ mem_cgroup_iter_break(NULL, memcg);
+ goto out;
+ }
+ } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
+out:
+ if (!ret)
+ shrinker_nr_max = new_nr_max;
+
+ return ret;
+}
+
+void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
+{
+ if (shrinker_id >= 0 && memcg && !mem_cgroup_is_root(memcg)) {
+ struct shrinker_info *info;
+
+ rcu_read_lock();
+ info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
+ if (!WARN_ON_ONCE(shrinker_id >= info->map_nr_max)) {
+ /* Pairs with smp mb in shrink_slab() */
+ smp_mb__before_atomic();
+ set_bit(shrinker_id, info->map);
+ }
+ rcu_read_unlock();
+ }
+}
+
+static DEFINE_IDR(shrinker_idr);
+
+static int prealloc_memcg_shrinker(struct shrinker *shrinker)
+{
+ int id, ret = -ENOMEM;
+
+ if (mem_cgroup_disabled())
+ return -ENOSYS;
+
+ down_write(&shrinker_rwsem);
+ /* This may call shrinker, so it must use down_read_trylock() */
+ id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
+ if (id < 0)
+ goto unlock;
+
+ if (id >= shrinker_nr_max) {
+ if (expand_shrinker_info(id)) {
+ idr_remove(&shrinker_idr, id);
+ goto unlock;
+ }
+ }
+ shrinker->id = id;
+ ret = 0;
+unlock:
+ up_write(&shrinker_rwsem);
+ return ret;
+}
+
+static void unregister_memcg_shrinker(struct shrinker *shrinker)
+{
+ int id = shrinker->id;
+
+ BUG_ON(id < 0);
+
+ lockdep_assert_held(&shrinker_rwsem);
+
+ idr_remove(&shrinker_idr, id);
+}
+
+static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
+ struct mem_cgroup *memcg)
+{
+ struct shrinker_info *info;
+
+ info = shrinker_info_protected(memcg, nid);
+ return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0);
+}
+
+static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
+ struct mem_cgroup *memcg)
+{
+ struct shrinker_info *info;
+
+ info = shrinker_info_protected(memcg, nid);
+ return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]);
+}
+
+void reparent_shrinker_deferred(struct mem_cgroup *memcg)
+{
+ int i, nid;
+ long nr;
+ struct mem_cgroup *parent;
+ struct shrinker_info *child_info, *parent_info;
+
+ parent = parent_mem_cgroup(memcg);
+ if (!parent)
+ parent = root_mem_cgroup;
+
+ /* Prevent from concurrent shrinker_info expand */
+ down_read(&shrinker_rwsem);
+ for_each_node(nid) {
+ child_info = shrinker_info_protected(memcg, nid);
+ parent_info = shrinker_info_protected(parent, nid);
+ for (i = 0; i < child_info->map_nr_max; i++) {
+ nr = atomic_long_read(&child_info->nr_deferred[i]);
+ atomic_long_add(nr, &parent_info->nr_deferred[i]);
+ }
+ }
+ up_read(&shrinker_rwsem);
+}
+#else
+static int prealloc_memcg_shrinker(struct shrinker *shrinker)
+{
+ return -ENOSYS;
+}
+
+static void unregister_memcg_shrinker(struct shrinker *shrinker)
+{
+}
+
+static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
+ struct mem_cgroup *memcg)
+{
+ return 0;
+}
+
+static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
+ struct mem_cgroup *memcg)
+{
+ return 0;
+}
+#endif /* CONFIG_MEMCG */
+
+static long xchg_nr_deferred(struct shrinker *shrinker,
+ struct shrink_control *sc)
+{
+ int nid = sc->nid;
+
+ if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
+ nid = 0;
+
+ if (sc->memcg &&
+ (shrinker->flags & SHRINKER_MEMCG_AWARE))
+ return xchg_nr_deferred_memcg(nid, shrinker,
+ sc->memcg);
+
+ return atomic_long_xchg(&shrinker->nr_deferred[nid], 0);
+}
+
+
+static long add_nr_deferred(long nr, struct shrinker *shrinker,
+ struct shrink_control *sc)
+{
+ int nid = sc->nid;
+
+ if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
+ nid = 0;
+
+ if (sc->memcg &&
+ (shrinker->flags & SHRINKER_MEMCG_AWARE))
+ return add_nr_deferred_memcg(nr, nid, shrinker,
+ sc->memcg);
+
+ return atomic_long_add_return(nr, &shrinker->nr_deferred[nid]);
+}
+
+#define SHRINK_BATCH 128
+
+static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
+ struct shrinker *shrinker, int priority)
+{
+ unsigned long freed = 0;
+ unsigned long long delta;
+ long total_scan;
+ long freeable;
+ long nr;
+ long new_nr;
+ long batch_size = shrinker->batch ? shrinker->batch
+ : SHRINK_BATCH;
+ long scanned = 0, next_deferred;
+
+ freeable = shrinker->count_objects(shrinker, shrinkctl);
+ if (freeable == 0 || freeable == SHRINK_EMPTY)
+ return freeable;
+
+ /*
+ * copy the current shrinker scan count into a local variable
+ * and zero it so that other concurrent shrinker invocations
+ * don't also do this scanning work.
+ */
+ nr = xchg_nr_deferred(shrinker, shrinkctl);
+
+ if (shrinker->seeks) {
+ delta = freeable >> priority;
+ delta *= 4;
+ do_div(delta, shrinker->seeks);
+ } else {
+ /*
+ * These objects don't require any IO to create. Trim
+ * them aggressively under memory pressure to keep
+ * them from causing refetches in the IO caches.
+ */
+ delta = freeable / 2;
+ }
+
+ total_scan = nr >> priority;
+ total_scan += delta;
+ total_scan = min(total_scan, (2 * freeable));
+
+ trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
+ freeable, delta, total_scan, priority);
+
+ /*
+ * Normally, we should not scan less than batch_size objects in one
+ * pass to avoid too frequent shrinker calls, but if the slab has less
+ * than batch_size objects in total and we are really tight on memory,
+ * we will try to reclaim all available objects, otherwise we can end
+ * up failing allocations although there are plenty of reclaimable
+ * objects spread over several slabs with usage less than the
+ * batch_size.
+ *
+ * We detect the "tight on memory" situations by looking at the total
+ * number of objects we want to scan (total_scan). If it is greater
+ * than the total number of objects on slab (freeable), we must be
+ * scanning at high prio and therefore should try to reclaim as much as
+ * possible.
+ */
+ while (total_scan >= batch_size ||
+ total_scan >= freeable) {
+ unsigned long ret;
+ unsigned long nr_to_scan = min(batch_size, total_scan);
+
+ shrinkctl->nr_to_scan = nr_to_scan;
+ shrinkctl->nr_scanned = nr_to_scan;
+ ret = shrinker->scan_objects(shrinker, shrinkctl);
+ if (ret == SHRINK_STOP)
+ break;
+ freed += ret;
+
+ count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned);
+ total_scan -= shrinkctl->nr_scanned;
+ scanned += shrinkctl->nr_scanned;
+
+ cond_resched();
+ }
+
+ /*
+ * The deferred work is increased by any new work (delta) that wasn't
+ * done, decreased by old deferred work that was done now.
+ *
+ * And it is capped to two times of the freeable items.
+ */
+ next_deferred = max_t(long, (nr + delta - scanned), 0);
+ next_deferred = min(next_deferred, (2 * freeable));
+
+ /*
+ * move the unused scan count back into the shrinker in a
+ * manner that handles concurrent updates.
+ */
+ new_nr = add_nr_deferred(next_deferred, shrinker, shrinkctl);
+
+ trace_mm_shrink_slab_end(shrinker, shrinkctl->nid, freed, nr, new_nr, total_scan);
+ return freed;
+}
+
+#ifdef CONFIG_MEMCG
+static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
+ struct mem_cgroup *memcg, int priority)
+{
+ struct shrinker_info *info;
+ unsigned long ret, freed = 0;
+ int i;
+
+ if (!mem_cgroup_online(memcg))
+ return 0;
+
+ if (!down_read_trylock(&shrinker_rwsem))
+ return 0;
+
+ info = shrinker_info_protected(memcg, nid);
+ if (unlikely(!info))
+ goto unlock;
+
+ for_each_set_bit(i, info->map, info->map_nr_max) {
+ struct shrink_control sc = {
+ .gfp_mask = gfp_mask,
+ .nid = nid,
+ .memcg = memcg,
+ };
+ struct shrinker *shrinker;
+
+ shrinker = idr_find(&shrinker_idr, i);
+ if (unlikely(!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))) {
+ if (!shrinker)
+ clear_bit(i, info->map);
+ continue;
+ }
+
+ /* Call non-slab shrinkers even though kmem is disabled */
+ if (!memcg_kmem_online() &&
+ !(shrinker->flags & SHRINKER_NONSLAB))
+ continue;
+
+ ret = do_shrink_slab(&sc, shrinker, priority);
+ if (ret == SHRINK_EMPTY) {
+ clear_bit(i, info->map);
+ /*
+ * After the shrinker reported that it had no objects to
+ * free, but before we cleared the corresponding bit in
+ * the memcg shrinker map, a new object might have been
+ * added. To make sure, we have the bit set in this
+ * case, we invoke the shrinker one more time and reset
+ * the bit if it reports that it is not empty anymore.
+ * The memory barrier here pairs with the barrier in
+ * set_shrinker_bit():
+ *
+ * list_lru_add() shrink_slab_memcg()
+ * list_add_tail() clear_bit()
+ * <MB> <MB>
+ * set_bit() do_shrink_slab()
+ */
+ smp_mb__after_atomic();
+ ret = do_shrink_slab(&sc, shrinker, priority);
+ if (ret == SHRINK_EMPTY)
+ ret = 0;
+ else
+ set_shrinker_bit(memcg, nid, i);
+ }
+ freed += ret;
+
+ if (rwsem_is_contended(&shrinker_rwsem)) {
+ freed = freed ? : 1;
+ break;
+ }
+ }
+unlock:
+ up_read(&shrinker_rwsem);
+ return freed;
+}
+#else /* !CONFIG_MEMCG */
+static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
+ struct mem_cgroup *memcg, int priority)
+{
+ return 0;
+}
+#endif /* CONFIG_MEMCG */
+
+/**
+ * shrink_slab - shrink slab caches
+ * @gfp_mask: allocation context
+ * @nid: node whose slab caches to target
+ * @memcg: memory cgroup whose slab caches to target
+ * @priority: the reclaim priority
+ *
+ * Call the shrink functions to age shrinkable caches.
+ *
+ * @nid is passed along to shrinkers with SHRINKER_NUMA_AWARE set,
+ * unaware shrinkers will receive a node id of 0 instead.
+ *
+ * @memcg specifies the memory cgroup to target. Unaware shrinkers
+ * are called only if it is the root cgroup.
+ *
+ * @priority is sc->priority, we take the number of objects and >> by priority
+ * in order to get the scan target.
+ *
+ * Returns the number of reclaimed slab objects.
+ */
+unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
+ int priority)
+{
+ unsigned long ret, freed = 0;
+ struct shrinker *shrinker;
+
+ /*
+ * The root memcg might be allocated even though memcg is disabled
+ * via "cgroup_disable=memory" boot parameter. This could make
+ * mem_cgroup_is_root() return false, then just run memcg slab
+ * shrink, but skip global shrink. This may result in premature
+ * oom.
+ */
+ if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
+ return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
+
+ if (!down_read_trylock(&shrinker_rwsem))
+ goto out;
+
+ list_for_each_entry(shrinker, &shrinker_list, list) {
+ struct shrink_control sc = {
+ .gfp_mask = gfp_mask,
+ .nid = nid,
+ .memcg = memcg,
+ };
+
+ ret = do_shrink_slab(&sc, shrinker, priority);
+ if (ret == SHRINK_EMPTY)
+ ret = 0;
+ freed += ret;
+ /*
+ * Bail out if someone want to register a new shrinker to
+ * prevent the registration from being stalled for long periods
+ * by parallel ongoing shrinking.
+ */
+ if (rwsem_is_contended(&shrinker_rwsem)) {
+ freed = freed ? : 1;
+ break;
+ }
+ }
+
+ up_read(&shrinker_rwsem);
+out:
+ cond_resched();
+ return freed;
+}
+
+/*
+ * Add a shrinker callback to be called from the vm.
+ */
+static int __prealloc_shrinker(struct shrinker *shrinker)
+{
+ unsigned int size;
+ int err;
+
+ if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
+ err = prealloc_memcg_shrinker(shrinker);
+ if (err != -ENOSYS)
+ return err;
+
+ shrinker->flags &= ~SHRINKER_MEMCG_AWARE;
+ }
+
+ size = sizeof(*shrinker->nr_deferred);
+ if (shrinker->flags & SHRINKER_NUMA_AWARE)
+ size *= nr_node_ids;
+
+ shrinker->nr_deferred = kzalloc(size, GFP_KERNEL);
+ if (!shrinker->nr_deferred)
+ return -ENOMEM;
+
+ return 0;
+}
+
+#ifdef CONFIG_SHRINKER_DEBUG
+int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+ va_list ap;
+ int err;
+
+ va_start(ap, fmt);
+ shrinker->name = kvasprintf_const(GFP_KERNEL, fmt, ap);
+ va_end(ap);
+ if (!shrinker->name)
+ return -ENOMEM;
+
+ err = __prealloc_shrinker(shrinker);
+ if (err) {
+ kfree_const(shrinker->name);
+ shrinker->name = NULL;
+ }
+
+ return err;
+}
+#else
+int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+ return __prealloc_shrinker(shrinker);
+}
+#endif
+
+void free_prealloced_shrinker(struct shrinker *shrinker)
+{
+#ifdef CONFIG_SHRINKER_DEBUG
+ kfree_const(shrinker->name);
+ shrinker->name = NULL;
+#endif
+ if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
+ down_write(&shrinker_rwsem);
+ unregister_memcg_shrinker(shrinker);
+ up_write(&shrinker_rwsem);
+ return;
+ }
+
+ kfree(shrinker->nr_deferred);
+ shrinker->nr_deferred = NULL;
+}
+
+void register_shrinker_prepared(struct shrinker *shrinker)
+{
+ down_write(&shrinker_rwsem);
+ list_add_tail(&shrinker->list, &shrinker_list);
+ shrinker->flags |= SHRINKER_REGISTERED;
+ shrinker_debugfs_add(shrinker);
+ up_write(&shrinker_rwsem);
+}
+
+static int __register_shrinker(struct shrinker *shrinker)
+{
+ int err = __prealloc_shrinker(shrinker);
+
+ if (err)
+ return err;
+ register_shrinker_prepared(shrinker);
+ return 0;
+}
+
+#ifdef CONFIG_SHRINKER_DEBUG
+int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+ va_list ap;
+ int err;
+
+ va_start(ap, fmt);
+ shrinker->name = kvasprintf_const(GFP_KERNEL, fmt, ap);
+ va_end(ap);
+ if (!shrinker->name)
+ return -ENOMEM;
+
+ err = __register_shrinker(shrinker);
+ if (err) {
+ kfree_const(shrinker->name);
+ shrinker->name = NULL;
+ }
+ return err;
+}
+#else
+int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+ return __register_shrinker(shrinker);
+}
+#endif
+EXPORT_SYMBOL(register_shrinker);
+
+/*
+ * Remove one
+ */
+void unregister_shrinker(struct shrinker *shrinker)
+{
+ struct dentry *debugfs_entry;
+ int debugfs_id;
+
+ if (!(shrinker->flags & SHRINKER_REGISTERED))
+ return;
+
+ down_write(&shrinker_rwsem);
+ list_del(&shrinker->list);
+ shrinker->flags &= ~SHRINKER_REGISTERED;
+ if (shrinker->flags & SHRINKER_MEMCG_AWARE)
+ unregister_memcg_shrinker(shrinker);
+ debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
+ up_write(&shrinker_rwsem);
+
+ shrinker_debugfs_remove(debugfs_entry, debugfs_id);
+
+ kfree(shrinker->nr_deferred);
+ shrinker->nr_deferred = NULL;
+}
+EXPORT_SYMBOL(unregister_shrinker);
+
+/**
+ * synchronize_shrinkers - Wait for all running shrinkers to complete.
+ *
+ * This is equivalent to calling unregister_shrink() and register_shrinker(),
+ * but atomically and with less overhead. This is useful to guarantee that all
+ * shrinker invocations have seen an update, before freeing memory, similar to
+ * rcu.
+ */
+void synchronize_shrinkers(void)
+{
+ down_write(&shrinker_rwsem);
+ up_write(&shrinker_rwsem);
+}
+EXPORT_SYMBOL(synchronize_shrinkers);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4039620d30fe..07bc58af6f26 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -35,7 +35,6 @@
#include <linux/cpuset.h>
#include <linux/compaction.h>
#include <linux/notifier.h>
-#include <linux/rwsem.h>
#include <linux/delay.h>
#include <linux/kthread.h>
#include <linux/freezer.h>
@@ -188,246 +187,7 @@ struct scan_control {
*/
int vm_swappiness = 60;

-LIST_HEAD(shrinker_list);
-DECLARE_RWSEM(shrinker_rwsem);
-
#ifdef CONFIG_MEMCG
-static int shrinker_nr_max;
-
-/* The shrinker_info is expanded in a batch of BITS_PER_LONG */
-static inline int shrinker_map_size(int nr_items)
-{
- return (DIV_ROUND_UP(nr_items, BITS_PER_LONG) * sizeof(unsigned long));
-}
-
-static inline int shrinker_defer_size(int nr_items)
-{
- return (round_up(nr_items, BITS_PER_LONG) * sizeof(atomic_long_t));
-}
-
-static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
- int nid)
-{
- return rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info,
- lockdep_is_held(&shrinker_rwsem));
-}
-
-static int expand_one_shrinker_info(struct mem_cgroup *memcg,
- int map_size, int defer_size,
- int old_map_size, int old_defer_size,
- int new_nr_max)
-{
- struct shrinker_info *new, *old;
- struct mem_cgroup_per_node *pn;
- int nid;
- int size = map_size + defer_size;
-
- for_each_node(nid) {
- pn = memcg->nodeinfo[nid];
- old = shrinker_info_protected(memcg, nid);
- /* Not yet online memcg */
- if (!old)
- return 0;
-
- /* Already expanded this shrinker_info */
- if (new_nr_max <= old->map_nr_max)
- continue;
-
- new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
- if (!new)
- return -ENOMEM;
-
- new->nr_deferred = (atomic_long_t *)(new + 1);
- new->map = (void *)new->nr_deferred + defer_size;
- new->map_nr_max = new_nr_max;
-
- /* map: set all old bits, clear all new bits */
- memset(new->map, (int)0xff, old_map_size);
- memset((void *)new->map + old_map_size, 0, map_size - old_map_size);
- /* nr_deferred: copy old values, clear all new values */
- memcpy(new->nr_deferred, old->nr_deferred, old_defer_size);
- memset((void *)new->nr_deferred + old_defer_size, 0,
- defer_size - old_defer_size);
-
- rcu_assign_pointer(pn->shrinker_info, new);
- kvfree_rcu(old, rcu);
- }
-
- return 0;
-}
-
-void free_shrinker_info(struct mem_cgroup *memcg)
-{
- struct mem_cgroup_per_node *pn;
- struct shrinker_info *info;
- int nid;
-
- for_each_node(nid) {
- pn = memcg->nodeinfo[nid];
- info = rcu_dereference_protected(pn->shrinker_info, true);
- kvfree(info);
- rcu_assign_pointer(pn->shrinker_info, NULL);
- }
-}
-
-int alloc_shrinker_info(struct mem_cgroup *memcg)
-{
- struct shrinker_info *info;
- int nid, size, ret = 0;
- int map_size, defer_size = 0;
-
- down_write(&shrinker_rwsem);
- map_size = shrinker_map_size(shrinker_nr_max);
- defer_size = shrinker_defer_size(shrinker_nr_max);
- size = map_size + defer_size;
- for_each_node(nid) {
- info = kvzalloc_node(sizeof(*info) + size, GFP_KERNEL, nid);
- if (!info) {
- free_shrinker_info(memcg);
- ret = -ENOMEM;
- break;
- }
- info->nr_deferred = (atomic_long_t *)(info + 1);
- info->map = (void *)info->nr_deferred + defer_size;
- info->map_nr_max = shrinker_nr_max;
- rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
- }
- up_write(&shrinker_rwsem);
-
- return ret;
-}
-
-static int expand_shrinker_info(int new_id)
-{
- int ret = 0;
- int new_nr_max = round_up(new_id + 1, BITS_PER_LONG);
- int map_size, defer_size = 0;
- int old_map_size, old_defer_size = 0;
- struct mem_cgroup *memcg;
-
- if (!root_mem_cgroup)
- goto out;
-
- lockdep_assert_held(&shrinker_rwsem);
-
- map_size = shrinker_map_size(new_nr_max);
- defer_size = shrinker_defer_size(new_nr_max);
- old_map_size = shrinker_map_size(shrinker_nr_max);
- old_defer_size = shrinker_defer_size(shrinker_nr_max);
-
- memcg = mem_cgroup_iter(NULL, NULL, NULL);
- do {
- ret = expand_one_shrinker_info(memcg, map_size, defer_size,
- old_map_size, old_defer_size,
- new_nr_max);
- if (ret) {
- mem_cgroup_iter_break(NULL, memcg);
- goto out;
- }
- } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
-out:
- if (!ret)
- shrinker_nr_max = new_nr_max;
-
- return ret;
-}
-
-void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
-{
- if (shrinker_id >= 0 && memcg && !mem_cgroup_is_root(memcg)) {
- struct shrinker_info *info;
-
- rcu_read_lock();
- info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
- if (!WARN_ON_ONCE(shrinker_id >= info->map_nr_max)) {
- /* Pairs with smp mb in shrink_slab() */
- smp_mb__before_atomic();
- set_bit(shrinker_id, info->map);
- }
- rcu_read_unlock();
- }
-}
-
-static DEFINE_IDR(shrinker_idr);
-
-static int prealloc_memcg_shrinker(struct shrinker *shrinker)
-{
- int id, ret = -ENOMEM;
-
- if (mem_cgroup_disabled())
- return -ENOSYS;
-
- down_write(&shrinker_rwsem);
- /* This may call shrinker, so it must use down_read_trylock() */
- id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
- if (id < 0)
- goto unlock;
-
- if (id >= shrinker_nr_max) {
- if (expand_shrinker_info(id)) {
- idr_remove(&shrinker_idr, id);
- goto unlock;
- }
- }
- shrinker->id = id;
- ret = 0;
-unlock:
- up_write(&shrinker_rwsem);
- return ret;
-}
-
-static void unregister_memcg_shrinker(struct shrinker *shrinker)
-{
- int id = shrinker->id;
-
- BUG_ON(id < 0);
-
- lockdep_assert_held(&shrinker_rwsem);
-
- idr_remove(&shrinker_idr, id);
-}
-
-static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
- struct mem_cgroup *memcg)
-{
- struct shrinker_info *info;
-
- info = shrinker_info_protected(memcg, nid);
- return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0);
-}
-
-static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
- struct mem_cgroup *memcg)
-{
- struct shrinker_info *info;
-
- info = shrinker_info_protected(memcg, nid);
- return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]);
-}
-
-void reparent_shrinker_deferred(struct mem_cgroup *memcg)
-{
- int i, nid;
- long nr;
- struct mem_cgroup *parent;
- struct shrinker_info *child_info, *parent_info;
-
- parent = parent_mem_cgroup(memcg);
- if (!parent)
- parent = root_mem_cgroup;
-
- /* Prevent from concurrent shrinker_info expand */
- down_read(&shrinker_rwsem);
- for_each_node(nid) {
- child_info = shrinker_info_protected(memcg, nid);
- parent_info = shrinker_info_protected(parent, nid);
- for (i = 0; i < child_info->map_nr_max; i++) {
- nr = atomic_long_read(&child_info->nr_deferred[i]);
- atomic_long_add(nr, &parent_info->nr_deferred[i]);
- }
- }
- up_read(&shrinker_rwsem);
-}

/* Returns true for reclaim through cgroup limits or cgroup interfaces. */
static bool cgroup_reclaim(struct scan_control *sc)
@@ -468,27 +228,6 @@ static bool writeback_throttling_sane(struct scan_control *sc)
return false;
}
#else
-static int prealloc_memcg_shrinker(struct shrinker *shrinker)
-{
- return -ENOSYS;
-}
-
-static void unregister_memcg_shrinker(struct shrinker *shrinker)
-{
-}
-
-static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
- struct mem_cgroup *memcg)
-{
- return 0;
-}
-
-static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
- struct mem_cgroup *memcg)
-{
- return 0;
-}
-
static bool cgroup_reclaim(struct scan_control *sc)
{
return false;
@@ -557,39 +296,6 @@ static void flush_reclaim_state(struct scan_control *sc)
}
}

-static long xchg_nr_deferred(struct shrinker *shrinker,
- struct shrink_control *sc)
-{
- int nid = sc->nid;
-
- if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
- nid = 0;
-
- if (sc->memcg &&
- (shrinker->flags & SHRINKER_MEMCG_AWARE))
- return xchg_nr_deferred_memcg(nid, shrinker,
- sc->memcg);
-
- return atomic_long_xchg(&shrinker->nr_deferred[nid], 0);
-}
-
-
-static long add_nr_deferred(long nr, struct shrinker *shrinker,
- struct shrink_control *sc)
-{
- int nid = sc->nid;
-
- if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
- nid = 0;
-
- if (sc->memcg &&
- (shrinker->flags & SHRINKER_MEMCG_AWARE))
- return add_nr_deferred_memcg(nr, nid, shrinker,
- sc->memcg);
-
- return atomic_long_add_return(nr, &shrinker->nr_deferred[nid]);
-}
-
static bool can_demote(int nid, struct scan_control *sc)
{
if (!numa_demotion_enabled)
@@ -671,413 +377,6 @@ static unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru,
return size;
}

-/*
- * Add a shrinker callback to be called from the vm.
- */
-static int __prealloc_shrinker(struct shrinker *shrinker)
-{
- unsigned int size;
- int err;
-
- if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
- err = prealloc_memcg_shrinker(shrinker);
- if (err != -ENOSYS)
- return err;
-
- shrinker->flags &= ~SHRINKER_MEMCG_AWARE;
- }
-
- size = sizeof(*shrinker->nr_deferred);
- if (shrinker->flags & SHRINKER_NUMA_AWARE)
- size *= nr_node_ids;
-
- shrinker->nr_deferred = kzalloc(size, GFP_KERNEL);
- if (!shrinker->nr_deferred)
- return -ENOMEM;
-
- return 0;
-}
-
-#ifdef CONFIG_SHRINKER_DEBUG
-int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
-{
- va_list ap;
- int err;
-
- va_start(ap, fmt);
- shrinker->name = kvasprintf_const(GFP_KERNEL, fmt, ap);
- va_end(ap);
- if (!shrinker->name)
- return -ENOMEM;
-
- err = __prealloc_shrinker(shrinker);
- if (err) {
- kfree_const(shrinker->name);
- shrinker->name = NULL;
- }
-
- return err;
-}
-#else
-int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
-{
- return __prealloc_shrinker(shrinker);
-}
-#endif
-
-void free_prealloced_shrinker(struct shrinker *shrinker)
-{
-#ifdef CONFIG_SHRINKER_DEBUG
- kfree_const(shrinker->name);
- shrinker->name = NULL;
-#endif
- if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
- down_write(&shrinker_rwsem);
- unregister_memcg_shrinker(shrinker);
- up_write(&shrinker_rwsem);
- return;
- }
-
- kfree(shrinker->nr_deferred);
- shrinker->nr_deferred = NULL;
-}
-
-void register_shrinker_prepared(struct shrinker *shrinker)
-{
- down_write(&shrinker_rwsem);
- list_add_tail(&shrinker->list, &shrinker_list);
- shrinker->flags |= SHRINKER_REGISTERED;
- shrinker_debugfs_add(shrinker);
- up_write(&shrinker_rwsem);
-}
-
-static int __register_shrinker(struct shrinker *shrinker)
-{
- int err = __prealloc_shrinker(shrinker);
-
- if (err)
- return err;
- register_shrinker_prepared(shrinker);
- return 0;
-}
-
-#ifdef CONFIG_SHRINKER_DEBUG
-int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
-{
- va_list ap;
- int err;
-
- va_start(ap, fmt);
- shrinker->name = kvasprintf_const(GFP_KERNEL, fmt, ap);
- va_end(ap);
- if (!shrinker->name)
- return -ENOMEM;
-
- err = __register_shrinker(shrinker);
- if (err) {
- kfree_const(shrinker->name);
- shrinker->name = NULL;
- }
- return err;
-}
-#else
-int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
-{
- return __register_shrinker(shrinker);
-}
-#endif
-EXPORT_SYMBOL(register_shrinker);
-
-/*
- * Remove one
- */
-void unregister_shrinker(struct shrinker *shrinker)
-{
- struct dentry *debugfs_entry;
- int debugfs_id;
-
- if (!(shrinker->flags & SHRINKER_REGISTERED))
- return;
-
- down_write(&shrinker_rwsem);
- list_del(&shrinker->list);
- shrinker->flags &= ~SHRINKER_REGISTERED;
- if (shrinker->flags & SHRINKER_MEMCG_AWARE)
- unregister_memcg_shrinker(shrinker);
- debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
- up_write(&shrinker_rwsem);
-
- shrinker_debugfs_remove(debugfs_entry, debugfs_id);
-
- kfree(shrinker->nr_deferred);
- shrinker->nr_deferred = NULL;
-}
-EXPORT_SYMBOL(unregister_shrinker);
-
-/**
- * synchronize_shrinkers - Wait for all running shrinkers to complete.
- *
- * This is equivalent to calling unregister_shrink() and register_shrinker(),
- * but atomically and with less overhead. This is useful to guarantee that all
- * shrinker invocations have seen an update, before freeing memory, similar to
- * rcu.
- */
-void synchronize_shrinkers(void)
-{
- down_write(&shrinker_rwsem);
- up_write(&shrinker_rwsem);
-}
-EXPORT_SYMBOL(synchronize_shrinkers);
-
-#define SHRINK_BATCH 128
-
-static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
- struct shrinker *shrinker, int priority)
-{
- unsigned long freed = 0;
- unsigned long long delta;
- long total_scan;
- long freeable;
- long nr;
- long new_nr;
- long batch_size = shrinker->batch ? shrinker->batch
- : SHRINK_BATCH;
- long scanned = 0, next_deferred;
-
- freeable = shrinker->count_objects(shrinker, shrinkctl);
- if (freeable == 0 || freeable == SHRINK_EMPTY)
- return freeable;
-
- /*
- * copy the current shrinker scan count into a local variable
- * and zero it so that other concurrent shrinker invocations
- * don't also do this scanning work.
- */
- nr = xchg_nr_deferred(shrinker, shrinkctl);
-
- if (shrinker->seeks) {
- delta = freeable >> priority;
- delta *= 4;
- do_div(delta, shrinker->seeks);
- } else {
- /*
- * These objects don't require any IO to create. Trim
- * them aggressively under memory pressure to keep
- * them from causing refetches in the IO caches.
- */
- delta = freeable / 2;
- }
-
- total_scan = nr >> priority;
- total_scan += delta;
- total_scan = min(total_scan, (2 * freeable));
-
- trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
- freeable, delta, total_scan, priority);
-
- /*
- * Normally, we should not scan less than batch_size objects in one
- * pass to avoid too frequent shrinker calls, but if the slab has less
- * than batch_size objects in total and we are really tight on memory,
- * we will try to reclaim all available objects, otherwise we can end
- * up failing allocations although there are plenty of reclaimable
- * objects spread over several slabs with usage less than the
- * batch_size.
- *
- * We detect the "tight on memory" situations by looking at the total
- * number of objects we want to scan (total_scan). If it is greater
- * than the total number of objects on slab (freeable), we must be
- * scanning at high prio and therefore should try to reclaim as much as
- * possible.
- */
- while (total_scan >= batch_size ||
- total_scan >= freeable) {
- unsigned long ret;
- unsigned long nr_to_scan = min(batch_size, total_scan);
-
- shrinkctl->nr_to_scan = nr_to_scan;
- shrinkctl->nr_scanned = nr_to_scan;
- ret = shrinker->scan_objects(shrinker, shrinkctl);
- if (ret == SHRINK_STOP)
- break;
- freed += ret;
-
- count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned);
- total_scan -= shrinkctl->nr_scanned;
- scanned += shrinkctl->nr_scanned;
-
- cond_resched();
- }
-
- /*
- * The deferred work is increased by any new work (delta) that wasn't
- * done, decreased by old deferred work that was done now.
- *
- * And it is capped to two times of the freeable items.
- */
- next_deferred = max_t(long, (nr + delta - scanned), 0);
- next_deferred = min(next_deferred, (2 * freeable));
-
- /*
- * move the unused scan count back into the shrinker in a
- * manner that handles concurrent updates.
- */
- new_nr = add_nr_deferred(next_deferred, shrinker, shrinkctl);
-
- trace_mm_shrink_slab_end(shrinker, shrinkctl->nid, freed, nr, new_nr, total_scan);
- return freed;
-}
-
-#ifdef CONFIG_MEMCG
-static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
- struct mem_cgroup *memcg, int priority)
-{
- struct shrinker_info *info;
- unsigned long ret, freed = 0;
- int i;
-
- if (!mem_cgroup_online(memcg))
- return 0;
-
- if (!down_read_trylock(&shrinker_rwsem))
- return 0;
-
- info = shrinker_info_protected(memcg, nid);
- if (unlikely(!info))
- goto unlock;
-
- for_each_set_bit(i, info->map, info->map_nr_max) {
- struct shrink_control sc = {
- .gfp_mask = gfp_mask,
- .nid = nid,
- .memcg = memcg,
- };
- struct shrinker *shrinker;
-
- shrinker = idr_find(&shrinker_idr, i);
- if (unlikely(!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))) {
- if (!shrinker)
- clear_bit(i, info->map);
- continue;
- }
-
- /* Call non-slab shrinkers even though kmem is disabled */
- if (!memcg_kmem_online() &&
- !(shrinker->flags & SHRINKER_NONSLAB))
- continue;
-
- ret = do_shrink_slab(&sc, shrinker, priority);
- if (ret == SHRINK_EMPTY) {
- clear_bit(i, info->map);
- /*
- * After the shrinker reported that it had no objects to
- * free, but before we cleared the corresponding bit in
- * the memcg shrinker map, a new object might have been
- * added. To make sure, we have the bit set in this
- * case, we invoke the shrinker one more time and reset
- * the bit if it reports that it is not empty anymore.
- * The memory barrier here pairs with the barrier in
- * set_shrinker_bit():
- *
- * list_lru_add() shrink_slab_memcg()
- * list_add_tail() clear_bit()
- * <MB> <MB>
- * set_bit() do_shrink_slab()
- */
- smp_mb__after_atomic();
- ret = do_shrink_slab(&sc, shrinker, priority);
- if (ret == SHRINK_EMPTY)
- ret = 0;
- else
- set_shrinker_bit(memcg, nid, i);
- }
- freed += ret;
-
- if (rwsem_is_contended(&shrinker_rwsem)) {
- freed = freed ? : 1;
- break;
- }
- }
-unlock:
- up_read(&shrinker_rwsem);
- return freed;
-}
-#else /* CONFIG_MEMCG */
-static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
- struct mem_cgroup *memcg, int priority)
-{
- return 0;
-}
-#endif /* CONFIG_MEMCG */
-
-/**
- * shrink_slab - shrink slab caches
- * @gfp_mask: allocation context
- * @nid: node whose slab caches to target
- * @memcg: memory cgroup whose slab caches to target
- * @priority: the reclaim priority
- *
- * Call the shrink functions to age shrinkable caches.
- *
- * @nid is passed along to shrinkers with SHRINKER_NUMA_AWARE set,
- * unaware shrinkers will receive a node id of 0 instead.
- *
- * @memcg specifies the memory cgroup to target. Unaware shrinkers
- * are called only if it is the root cgroup.
- *
- * @priority is sc->priority, we take the number of objects and >> by priority
- * in order to get the scan target.
- *
- * Returns the number of reclaimed slab objects.
- */
-static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
- struct mem_cgroup *memcg,
- int priority)
-{
- unsigned long ret, freed = 0;
- struct shrinker *shrinker;
-
- /*
- * The root memcg might be allocated even though memcg is disabled
- * via "cgroup_disable=memory" boot parameter. This could make
- * mem_cgroup_is_root() return false, then just run memcg slab
- * shrink, but skip global shrink. This may result in premature
- * oom.
- */
- if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
- return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
-
- if (!down_read_trylock(&shrinker_rwsem))
- goto out;
-
- list_for_each_entry(shrinker, &shrinker_list, list) {
- struct shrink_control sc = {
- .gfp_mask = gfp_mask,
- .nid = nid,
- .memcg = memcg,
- };
-
- ret = do_shrink_slab(&sc, shrinker, priority);
- if (ret == SHRINK_EMPTY)
- ret = 0;
- freed += ret;
- /*
- * Bail out if someone want to register a new shrinker to
- * prevent the registration from being stalled for long periods
- * by parallel ongoing shrinking.
- */
- if (rwsem_is_contended(&shrinker_rwsem)) {
- freed = freed ? : 1;
- break;
- }
- }
-
- up_read(&shrinker_rwsem);
-out:
- cond_resched();
- return freed;
-}
-
static unsigned long drop_slab_node(int nid)
{
unsigned long freed = 0;
--
2.30.2


2023-07-24 09:50:17

by Qi Zheng

Subject: [PATCH v2 02/47] mm: shrinker: remove redundant shrinker_rwsem in debugfs operations

The debugfs_remove_recursive() will wait for debugfs_file_put() to return,
so the shrinker will not be freed while debugfs operations (such as
shrinker_debugfs_count_show() and shrinker_debugfs_scan_write()) are in
progress. Therefore, there is no need to hold shrinker_rwsem during debugfs
operations.

Signed-off-by: Qi Zheng <[email protected]>
---
mm/shrinker_debug.c | 14 --------------
1 file changed, 14 deletions(-)

diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index 3ab53fad8876..f1becfd45853 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -55,11 +55,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
if (!count_per_node)
return -ENOMEM;

- ret = down_read_killable(&shrinker_rwsem);
- if (ret) {
- kfree(count_per_node);
- return ret;
- }
rcu_read_lock();

memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
@@ -92,7 +87,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);

rcu_read_unlock();
- up_read(&shrinker_rwsem);

kfree(count_per_node);
return ret;
@@ -117,7 +111,6 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
struct mem_cgroup *memcg = NULL;
int nid;
char kbuf[72];
- ssize_t ret;

read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
if (copy_from_user(kbuf, buf, read_len))
@@ -146,12 +139,6 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
return -EINVAL;
}

- ret = down_read_killable(&shrinker_rwsem);
- if (ret) {
- mem_cgroup_put(memcg);
- return ret;
- }
-
sc.nid = nid;
sc.memcg = memcg;
sc.nr_to_scan = nr_to_scan;
@@ -159,7 +146,6 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,

shrinker->scan_objects(shrinker, &sc);

- up_read(&shrinker_rwsem);
mem_cgroup_put(memcg);

return size;
--
2.30.2


2023-07-24 09:50:29

by Qi Zheng

Subject: [PATCH v2 05/47] binder: dynamically allocate the android-binder shrinker

Use new APIs to dynamically allocate the android-binder shrinker.
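
For reference, the conversion pattern that this and the following patches apply looks roughly like the sketch below. The example_* names are hypothetical; shrinker_alloc(), shrinker_register() and shrinker_unregister() are the APIs introduced earlier in this series.

```
/* Hypothetical sketch of the conversion pattern (not code from this patch). */
static struct shrinker *example_shrinker;

static int example_shrinker_init(void)
{
	/* Replaces a statically defined struct shrinker + register_shrinker(). */
	example_shrinker = shrinker_alloc(0, "example");
	if (!example_shrinker)
		return -ENOMEM;

	example_shrinker->count_objects = example_count;	/* existing callbacks */
	example_shrinker->scan_objects = example_scan;
	example_shrinker->seeks = DEFAULT_SEEKS;

	shrinker_register(example_shrinker);
	return 0;
}

static void example_shrinker_exit(void)
{
	/* Replaces unregister_shrinker(); the instance is freed via kfree_rcu(). */
	shrinker_unregister(example_shrinker);
}
```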

Signed-off-by: Qi Zheng <[email protected]>
---
drivers/android/binder_alloc.c | 31 +++++++++++++++++++------------
1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index e3db8297095a..019981d65e1e 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1053,11 +1053,7 @@ binder_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
NULL, sc->nr_to_scan);
}

-static struct shrinker binder_shrinker = {
- .count_objects = binder_shrink_count,
- .scan_objects = binder_shrink_scan,
- .seeks = DEFAULT_SEEKS,
-};
+static struct shrinker *binder_shrinker;

/**
* binder_alloc_init() - called by binder_open() for per-proc initialization
@@ -1077,19 +1073,30 @@ void binder_alloc_init(struct binder_alloc *alloc)

int binder_alloc_shrinker_init(void)
{
- int ret = list_lru_init(&binder_alloc_lru);
+ int ret;

- if (ret == 0) {
- ret = register_shrinker(&binder_shrinker, "android-binder");
- if (ret)
- list_lru_destroy(&binder_alloc_lru);
+ ret = list_lru_init(&binder_alloc_lru);
+ if (ret)
+ return ret;
+
+ binder_shrinker = shrinker_alloc(0, "android-binder");
+ if (!binder_shrinker) {
+ list_lru_destroy(&binder_alloc_lru);
+ return -ENOMEM;
}
- return ret;
+
+ binder_shrinker->count_objects = binder_shrink_count;
+ binder_shrinker->scan_objects = binder_shrink_scan;
+ binder_shrinker->seeks = DEFAULT_SEEKS;
+
+ shrinker_register(binder_shrinker);
+
+ return 0;
}

void binder_alloc_shrinker_exit(void)
{
- unregister_shrinker(&binder_shrinker);
+ shrinker_unregister(binder_shrinker);
list_lru_destroy(&binder_alloc_lru);
}

--
2.30.2


2023-07-24 09:50:34

by Qi Zheng

Subject: [PATCH v2 06/47] drm/ttm: dynamically allocate the drm-ttm_pool shrinker

Use new APIs to dynamically allocate the drm-ttm_pool shrinker.

Signed-off-by: Qi Zheng <[email protected]>
---
drivers/gpu/drm/ttm/ttm_pool.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index cddb9151d20f..e1eb73d0b72a 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -73,7 +73,7 @@ static struct ttm_pool_type global_dma32_uncached[MAX_ORDER + 1];

static spinlock_t shrinker_lock;
static struct list_head shrinker_list;
-static struct shrinker mm_shrinker;
+static struct shrinker *mm_shrinker;

/* Allocate pages of size 1 << order with the given gfp_flags */
static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
@@ -734,8 +734,8 @@ static int ttm_pool_debugfs_shrink_show(struct seq_file *m, void *data)
struct shrink_control sc = { .gfp_mask = GFP_NOFS };

fs_reclaim_acquire(GFP_KERNEL);
- seq_printf(m, "%lu/%lu\n", ttm_pool_shrinker_count(&mm_shrinker, &sc),
- ttm_pool_shrinker_scan(&mm_shrinker, &sc));
+ seq_printf(m, "%lu/%lu\n", ttm_pool_shrinker_count(mm_shrinker, &sc),
+ ttm_pool_shrinker_scan(mm_shrinker, &sc));
fs_reclaim_release(GFP_KERNEL);

return 0;
@@ -779,10 +779,17 @@ int ttm_pool_mgr_init(unsigned long num_pages)
&ttm_pool_debugfs_shrink_fops);
#endif

- mm_shrinker.count_objects = ttm_pool_shrinker_count;
- mm_shrinker.scan_objects = ttm_pool_shrinker_scan;
- mm_shrinker.seeks = 1;
- return register_shrinker(&mm_shrinker, "drm-ttm_pool");
+ mm_shrinker = shrinker_alloc(0, "drm-ttm_pool");
+ if (!mm_shrinker)
+ return -ENOMEM;
+
+ mm_shrinker->count_objects = ttm_pool_shrinker_count;
+ mm_shrinker->scan_objects = ttm_pool_shrinker_scan;
+ mm_shrinker->seeks = 1;
+
+ shrinker_register(mm_shrinker);
+
+ return 0;
}

/**
@@ -802,6 +809,6 @@ void ttm_pool_mgr_fini(void)
ttm_pool_type_fini(&global_dma32_uncached[i]);
}

- unregister_shrinker(&mm_shrinker);
+ shrinker_unregister(mm_shrinker);
WARN_ON(!list_empty(&shrinker_list));
}
--
2.30.2


2023-07-24 09:51:01

by Qi Zheng

Subject: [PATCH v2 08/47] erofs: dynamically allocate the erofs-shrinker

Use new APIs to dynamically allocate the erofs-shrinker.

Signed-off-by: Qi Zheng <[email protected]>
---
fs/erofs/utils.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/erofs/utils.c b/fs/erofs/utils.c
index cc6fb9e98899..389de06e1065 100644
--- a/fs/erofs/utils.c
+++ b/fs/erofs/utils.c
@@ -270,19 +270,25 @@ static unsigned long erofs_shrink_scan(struct shrinker *shrink,
return freed;
}

-static struct shrinker erofs_shrinker_info = {
- .scan_objects = erofs_shrink_scan,
- .count_objects = erofs_shrink_count,
- .seeks = DEFAULT_SEEKS,
-};
+static struct shrinker *erofs_shrinker_info;

int __init erofs_init_shrinker(void)
{
- return register_shrinker(&erofs_shrinker_info, "erofs-shrinker");
+ erofs_shrinker_info = shrinker_alloc(0, "erofs-shrinker");
+ if (!erofs_shrinker_info)
+ return -ENOMEM;
+
+ erofs_shrinker_info->count_objects = erofs_shrink_count;
+ erofs_shrinker_info->scan_objects = erofs_shrink_scan;
+ erofs_shrinker_info->seeks = DEFAULT_SEEKS;
+
+ shrinker_register(erofs_shrinker_info);
+
+ return 0;
}

void erofs_exit_shrinker(void)
{
- unregister_shrinker(&erofs_shrinker_info);
+ shrinker_unregister(erofs_shrinker_info);
}
#endif /* !CONFIG_EROFS_FS_ZIP */
--
2.30.2


2023-07-24 09:51:18

by Qi Zheng

Subject: [PATCH v2 11/47] gfs2: dynamically allocate the gfs2-qd shrinker

Use new APIs to dynamically allocate the gfs2-qd shrinker.

Signed-off-by: Qi Zheng <[email protected]>
---
fs/gfs2/main.c | 6 +++---
fs/gfs2/quota.c | 26 ++++++++++++++++++++------
fs/gfs2/quota.h | 3 ++-
3 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index afcb32854f14..e47b1cc79f59 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -147,7 +147,7 @@ static int __init init_gfs2_fs(void)
if (!gfs2_trans_cachep)
goto fail_cachep8;

- error = register_shrinker(&gfs2_qd_shrinker, "gfs2-qd");
+ error = gfs2_qd_shrinker_init();
if (error)
goto fail_shrinker;

@@ -196,7 +196,7 @@ static int __init init_gfs2_fs(void)
fail_wq2:
destroy_workqueue(gfs_recovery_wq);
fail_wq1:
- unregister_shrinker(&gfs2_qd_shrinker);
+ gfs2_qd_shrinker_exit();
fail_shrinker:
kmem_cache_destroy(gfs2_trans_cachep);
fail_cachep8:
@@ -229,7 +229,7 @@ static int __init init_gfs2_fs(void)

static void __exit exit_gfs2_fs(void)
{
- unregister_shrinker(&gfs2_qd_shrinker);
+ gfs2_qd_shrinker_exit();
gfs2_glock_exit();
gfs2_unregister_debugfs();
unregister_filesystem(&gfs2_fs_type);
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 704192b73605..bc9883cea847 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -186,13 +186,27 @@ static unsigned long gfs2_qd_shrink_count(struct shrinker *shrink,
return vfs_pressure_ratio(list_lru_shrink_count(&gfs2_qd_lru, sc));
}

-struct shrinker gfs2_qd_shrinker = {
- .count_objects = gfs2_qd_shrink_count,
- .scan_objects = gfs2_qd_shrink_scan,
- .seeks = DEFAULT_SEEKS,
- .flags = SHRINKER_NUMA_AWARE,
-};
+static struct shrinker *gfs2_qd_shrinker;
+
+int gfs2_qd_shrinker_init(void)
+{
+ gfs2_qd_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "gfs2-qd");
+ if (!gfs2_qd_shrinker)
+ return -ENOMEM;
+
+ gfs2_qd_shrinker->count_objects = gfs2_qd_shrink_count;
+ gfs2_qd_shrinker->scan_objects = gfs2_qd_shrink_scan;
+ gfs2_qd_shrinker->seeks = DEFAULT_SEEKS;
+
+ shrinker_register(gfs2_qd_shrinker);

+ return 0;
+}
+
+void gfs2_qd_shrinker_exit(void)
+{
+ shrinker_unregister(gfs2_qd_shrinker);
+}

static u64 qd2index(struct gfs2_quota_data *qd)
{
diff --git a/fs/gfs2/quota.h b/fs/gfs2/quota.h
index 21ada332d555..f9cb863373f7 100644
--- a/fs/gfs2/quota.h
+++ b/fs/gfs2/quota.h
@@ -59,7 +59,8 @@ static inline int gfs2_quota_lock_check(struct gfs2_inode *ip,
}

extern const struct quotactl_ops gfs2_quotactl_ops;
-extern struct shrinker gfs2_qd_shrinker;
+int gfs2_qd_shrinker_init(void);
+void gfs2_qd_shrinker_exit(void);
extern struct list_lru gfs2_qd_lru;
extern void __init gfs2_quota_hash_init(void);

--
2.30.2


2023-07-24 09:51:24

by Qi Zheng

Subject: [PATCH v2 10/47] gfs2: dynamically allocate the gfs2-glock shrinker

Use new APIs to dynamically allocate the gfs2-glock shrinker.

Signed-off-by: Qi Zheng <[email protected]>
---
fs/gfs2/glock.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 1438e7465e30..77da354667d9 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -2046,11 +2046,7 @@ static unsigned long gfs2_glock_shrink_count(struct shrinker *shrink,
return vfs_pressure_ratio(atomic_read(&lru_count));
}

-static struct shrinker glock_shrinker = {
- .seeks = DEFAULT_SEEKS,
- .count_objects = gfs2_glock_shrink_count,
- .scan_objects = gfs2_glock_shrink_scan,
-};
+static struct shrinker *glock_shrinker;

/**
* glock_hash_walk - Call a function for glock in a hash bucket
@@ -2472,13 +2468,19 @@ int __init gfs2_glock_init(void)
return -ENOMEM;
}

- ret = register_shrinker(&glock_shrinker, "gfs2-glock");
- if (ret) {
+ glock_shrinker = shrinker_alloc(0, "gfs2-glock");
+ if (!glock_shrinker) {
destroy_workqueue(glock_workqueue);
rhashtable_destroy(&gl_hash_table);
- return ret;
+ return -ENOMEM;
}

+ glock_shrinker->count_objects = gfs2_glock_shrink_count;
+ glock_shrinker->scan_objects = gfs2_glock_shrink_scan;
+ glock_shrinker->seeks = DEFAULT_SEEKS;
+
+ shrinker_register(glock_shrinker);
+
for (i = 0; i < GLOCK_WAIT_TABLE_SIZE; i++)
init_waitqueue_head(glock_wait_table + i);

@@ -2487,7 +2489,7 @@ int __init gfs2_glock_init(void)

void gfs2_glock_exit(void)
{
- unregister_shrinker(&glock_shrinker);
+ shrinker_unregister(glock_shrinker);
rhashtable_destroy(&gl_hash_table);
destroy_workqueue(glock_workqueue);
}
--
2.30.2


2023-07-24 09:52:03

by Qi Zheng

Subject: [PATCH v2 09/47] f2fs: dynamically allocate the f2fs-shrinker

Use new APIs to dynamically allocate the f2fs-shrinker.

Signed-off-by: Qi Zheng <[email protected]>
---
fs/f2fs/super.c | 32 ++++++++++++++++++++++++--------
1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index ca31163da00a..8b08473db358 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -83,11 +83,27 @@ void f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned int rate,
#endif

/* f2fs-wide shrinker description */
-static struct shrinker f2fs_shrinker_info = {
- .scan_objects = f2fs_shrink_scan,
- .count_objects = f2fs_shrink_count,
- .seeks = DEFAULT_SEEKS,
-};
+static struct shrinker *f2fs_shrinker_info;
+
+static int f2fs_init_shrinker(void)
+{
+ f2fs_shrinker_info = shrinker_alloc(0, "f2fs-shrinker");
+ if (!f2fs_shrinker_info)
+ return -ENOMEM;
+
+ f2fs_shrinker_info->count_objects = f2fs_shrink_count;
+ f2fs_shrinker_info->scan_objects = f2fs_shrink_scan;
+ f2fs_shrinker_info->seeks = DEFAULT_SEEKS;
+
+ shrinker_register(f2fs_shrinker_info);
+
+ return 0;
+}
+
+static void f2fs_exit_shrinker(void)
+{
+ shrinker_unregister(f2fs_shrinker_info);
+}

enum {
Opt_gc_background,
@@ -4941,7 +4957,7 @@ static int __init init_f2fs_fs(void)
err = f2fs_init_sysfs();
if (err)
goto free_garbage_collection_cache;
- err = register_shrinker(&f2fs_shrinker_info, "f2fs-shrinker");
+ err = f2fs_init_shrinker();
if (err)
goto free_sysfs;
err = register_filesystem(&f2fs_fs_type);
@@ -4986,7 +5002,7 @@ static int __init init_f2fs_fs(void)
f2fs_destroy_root_stats();
unregister_filesystem(&f2fs_fs_type);
free_shrinker:
- unregister_shrinker(&f2fs_shrinker_info);
+ f2fs_exit_shrinker();
free_sysfs:
f2fs_exit_sysfs();
free_garbage_collection_cache:
@@ -5018,7 +5034,7 @@ static void __exit exit_f2fs_fs(void)
f2fs_destroy_post_read_processing();
f2fs_destroy_root_stats();
unregister_filesystem(&f2fs_fs_type);
- unregister_shrinker(&f2fs_shrinker_info);
+ f2fs_exit_shrinker();
f2fs_exit_sysfs();
f2fs_destroy_garbage_collection_cache();
f2fs_destroy_extent_cache();
--
2.30.2


2023-07-24 09:52:38

by Qi Zheng

Subject: [PATCH v2 12/47] NFSv4.2: dynamically allocate the nfs-xattr shrinkers

Use new APIs to dynamically allocate the nfs-xattr shrinkers.
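
The ordering matters here, so below is a rough sketch of the pattern for a memcg-aware shrinker that backs a list_lru. The example_* names are hypothetical; shrinker_alloc(), list_lru_init_memcg(), shrinker_free_non_registered() and shrinker_register() are the interfaces actually used in the diff.

```
static int example_memcg_shrinker_init(struct shrinker **shrinker,
				       struct list_lru *lru, const char *name)
{
	int ret;

	/* Allocate first so the list_lru can be tied to this shrinker. */
	*shrinker = shrinker_alloc(SHRINKER_MEMCG_AWARE, name);
	if (!*shrinker)
		return -ENOMEM;

	ret = list_lru_init_memcg(lru, *shrinker);
	if (ret) {
		/* Allocated but never registered, so free it directly. */
		shrinker_free_non_registered(*shrinker);
		return ret;
	}

	(*shrinker)->count_objects = example_count;	/* hypothetical callbacks */
	(*shrinker)->scan_objects = example_scan;
	(*shrinker)->seeks = DEFAULT_SEEKS;

	/* Register only once the LRU is ready to be scanned. */
	shrinker_register(*shrinker);
	return 0;
}
```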

Signed-off-by: Qi Zheng <[email protected]>
---
fs/nfs/nfs42xattr.c | 87 +++++++++++++++++++++++----------------------
1 file changed, 44 insertions(+), 43 deletions(-)

diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c
index 911f634ba3da..3604342e0f77 100644
--- a/fs/nfs/nfs42xattr.c
+++ b/fs/nfs/nfs42xattr.c
@@ -796,28 +796,9 @@ static unsigned long nfs4_xattr_cache_scan(struct shrinker *shrink,
static unsigned long nfs4_xattr_entry_scan(struct shrinker *shrink,
struct shrink_control *sc);

-static struct shrinker nfs4_xattr_cache_shrinker = {
- .count_objects = nfs4_xattr_cache_count,
- .scan_objects = nfs4_xattr_cache_scan,
- .seeks = DEFAULT_SEEKS,
- .flags = SHRINKER_MEMCG_AWARE,
-};
-
-static struct shrinker nfs4_xattr_entry_shrinker = {
- .count_objects = nfs4_xattr_entry_count,
- .scan_objects = nfs4_xattr_entry_scan,
- .seeks = DEFAULT_SEEKS,
- .batch = 512,
- .flags = SHRINKER_MEMCG_AWARE,
-};
-
-static struct shrinker nfs4_xattr_large_entry_shrinker = {
- .count_objects = nfs4_xattr_entry_count,
- .scan_objects = nfs4_xattr_entry_scan,
- .seeks = 1,
- .batch = 512,
- .flags = SHRINKER_MEMCG_AWARE,
-};
+static struct shrinker *nfs4_xattr_cache_shrinker;
+static struct shrinker *nfs4_xattr_entry_shrinker;
+static struct shrinker *nfs4_xattr_large_entry_shrinker;

static enum lru_status
cache_lru_isolate(struct list_head *item,
@@ -943,7 +924,7 @@ nfs4_xattr_entry_scan(struct shrinker *shrink, struct shrink_control *sc)
struct nfs4_xattr_entry *entry;
struct list_lru *lru;

- lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
+ lru = (shrink == nfs4_xattr_large_entry_shrinker) ?
&nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;

freed = list_lru_shrink_walk(lru, sc, entry_lru_isolate, &dispose);
@@ -971,7 +952,7 @@ nfs4_xattr_entry_count(struct shrinker *shrink, struct shrink_control *sc)
unsigned long count;
struct list_lru *lru;

- lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
+ lru = (shrink == nfs4_xattr_large_entry_shrinker) ?
&nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;

count = list_lru_shrink_count(lru, sc);
@@ -991,18 +972,34 @@ static void nfs4_xattr_cache_init_once(void *p)
INIT_LIST_HEAD(&cache->dispose);
}

-static int nfs4_xattr_shrinker_init(struct shrinker *shrinker,
- struct list_lru *lru, const char *name)
+typedef unsigned long (*count_objects_cb)(struct shrinker *s,
+ struct shrink_control *sc);
+typedef unsigned long (*scan_objects_cb)(struct shrinker *s,
+ struct shrink_control *sc);
+
+static int nfs4_xattr_shrinker_init(struct shrinker **shrinker,
+ struct list_lru *lru, const char *name,
+ count_objects_cb count,
+ scan_objects_cb scan, long batch, int seeks)
{
- int ret = 0;
+ int ret;

- ret = register_shrinker(shrinker, name);
- if (ret)
+ *shrinker = shrinker_alloc(SHRINKER_MEMCG_AWARE, name);
+ if (!*shrinker)
+ return -ENOMEM;
+
+ ret = list_lru_init_memcg(lru, *shrinker);
+ if (ret) {
+ shrinker_free_non_registered(*shrinker);
return ret;
+ }

- ret = list_lru_init_memcg(lru, shrinker);
- if (ret)
- unregister_shrinker(shrinker);
+ (*shrinker)->count_objects = count;
+ (*shrinker)->scan_objects = scan;
+ (*shrinker)->batch = batch;
+ (*shrinker)->seeks = seeks;
+
+ shrinker_register(*shrinker);

return ret;
}
@@ -1010,7 +1007,7 @@ static int nfs4_xattr_shrinker_init(struct shrinker *shrinker,
static void nfs4_xattr_shrinker_destroy(struct shrinker *shrinker,
struct list_lru *lru)
{
- unregister_shrinker(shrinker);
+ shrinker_unregister(shrinker);
list_lru_destroy(lru);
}

@@ -1026,27 +1023,31 @@ int __init nfs4_xattr_cache_init(void)
return -ENOMEM;

ret = nfs4_xattr_shrinker_init(&nfs4_xattr_cache_shrinker,
- &nfs4_xattr_cache_lru,
- "nfs-xattr_cache");
+ &nfs4_xattr_cache_lru, "nfs-xattr_cache",
+ nfs4_xattr_cache_count,
+ nfs4_xattr_cache_scan, 0, DEFAULT_SEEKS);
if (ret)
goto out1;

ret = nfs4_xattr_shrinker_init(&nfs4_xattr_entry_shrinker,
- &nfs4_xattr_entry_lru,
- "nfs-xattr_entry");
+ &nfs4_xattr_entry_lru, "nfs-xattr_entry",
+ nfs4_xattr_entry_count,
+ nfs4_xattr_entry_scan, 512, DEFAULT_SEEKS);
if (ret)
goto out2;

ret = nfs4_xattr_shrinker_init(&nfs4_xattr_large_entry_shrinker,
&nfs4_xattr_large_entry_lru,
- "nfs-xattr_large_entry");
+ "nfs-xattr_large_entry",
+ nfs4_xattr_entry_count,
+ nfs4_xattr_entry_scan, 512, 1);
if (!ret)
return 0;

- nfs4_xattr_shrinker_destroy(&nfs4_xattr_entry_shrinker,
+ nfs4_xattr_shrinker_destroy(nfs4_xattr_entry_shrinker,
&nfs4_xattr_entry_lru);
out2:
- nfs4_xattr_shrinker_destroy(&nfs4_xattr_cache_shrinker,
+ nfs4_xattr_shrinker_destroy(nfs4_xattr_cache_shrinker,
&nfs4_xattr_cache_lru);
out1:
kmem_cache_destroy(nfs4_xattr_cache_cachep);
@@ -1056,11 +1057,11 @@ int __init nfs4_xattr_cache_init(void)

void nfs4_xattr_cache_exit(void)
{
- nfs4_xattr_shrinker_destroy(&nfs4_xattr_large_entry_shrinker,
+ nfs4_xattr_shrinker_destroy(nfs4_xattr_large_entry_shrinker,
&nfs4_xattr_large_entry_lru);
- nfs4_xattr_shrinker_destroy(&nfs4_xattr_entry_shrinker,
+ nfs4_xattr_shrinker_destroy(nfs4_xattr_entry_shrinker,
&nfs4_xattr_entry_lru);
- nfs4_xattr_shrinker_destroy(&nfs4_xattr_cache_shrinker,
+ nfs4_xattr_shrinker_destroy(nfs4_xattr_cache_shrinker,
&nfs4_xattr_cache_lru);
kmem_cache_destroy(nfs4_xattr_cache_cachep);
}
--
2.30.2


2023-07-24 09:53:19

by Qi Zheng

Subject: [PATCH v2 14/47] nfsd: dynamically allocate the nfsd-filecache shrinker

Use new APIs to dynamically allocate the nfsd-filecache shrinker.

Signed-off-by: Qi Zheng <[email protected]>
---
fs/nfsd/filecache.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index ee9c923192e0..50216768d408 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -521,11 +521,7 @@ nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc)
return ret;
}

-static struct shrinker nfsd_file_shrinker = {
- .scan_objects = nfsd_file_lru_scan,
- .count_objects = nfsd_file_lru_count,
- .seeks = 1,
-};
+static struct shrinker *nfsd_file_shrinker;

/**
* nfsd_file_cond_queue - conditionally unhash and queue a nfsd_file
@@ -746,12 +742,18 @@ nfsd_file_cache_init(void)
goto out_err;
}

- ret = register_shrinker(&nfsd_file_shrinker, "nfsd-filecache");
- if (ret) {
- pr_err("nfsd: failed to register nfsd_file_shrinker: %d\n", ret);
+ nfsd_file_shrinker = shrinker_alloc(0, "nfsd-filecache");
+ if (!nfsd_file_shrinker) {
+ pr_err("nfsd: failed to allocate nfsd_file_shrinker\n");
goto out_lru;
}

+ nfsd_file_shrinker->count_objects = nfsd_file_lru_count;
+ nfsd_file_shrinker->scan_objects = nfsd_file_lru_scan;
+ nfsd_file_shrinker->seeks = 1;
+
+ shrinker_register(nfsd_file_shrinker);
+
ret = lease_register_notifier(&nfsd_file_lease_notifier);
if (ret) {
pr_err("nfsd: unable to register lease notifier: %d\n", ret);
@@ -774,7 +776,7 @@ nfsd_file_cache_init(void)
out_notifier:
lease_unregister_notifier(&nfsd_file_lease_notifier);
out_shrinker:
- unregister_shrinker(&nfsd_file_shrinker);
+ shrinker_unregister(nfsd_file_shrinker);
out_lru:
list_lru_destroy(&nfsd_file_lru);
out_err:
@@ -891,7 +893,7 @@ nfsd_file_cache_shutdown(void)
return;

lease_unregister_notifier(&nfsd_file_lease_notifier);
- unregister_shrinker(&nfsd_file_shrinker);
+ shrinker_unregister(nfsd_file_shrinker);
/*
* make sure all callers of nfsd_file_lru_cb are done before
* calling nfsd_file_cache_purge
--
2.30.2


2023-07-24 09:54:41

by Qi Zheng

Subject: [PATCH v2 15/47] quota: dynamically allocate the dquota-cache shrinker

Use new APIs to dynamically allocate the dquota-cache shrinker.

Signed-off-by: Qi Zheng <[email protected]>
---
fs/quota/dquot.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index e8232242dd34..6cb2d8911bc3 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -791,11 +791,7 @@ dqcache_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
percpu_counter_read_positive(&dqstats.counter[DQST_FREE_DQUOTS]));
}

-static struct shrinker dqcache_shrinker = {
- .count_objects = dqcache_shrink_count,
- .scan_objects = dqcache_shrink_scan,
- .seeks = DEFAULT_SEEKS,
-};
+static struct shrinker *dqcache_shrinker;

/*
* Safely release dquot and put reference to dquot.
@@ -2991,8 +2987,15 @@ static int __init dquot_init(void)
pr_info("VFS: Dquot-cache hash table entries: %ld (order %ld,"
" %ld bytes)\n", nr_hash, order, (PAGE_SIZE << order));

- if (register_shrinker(&dqcache_shrinker, "dquota-cache"))
- panic("Cannot register dquot shrinker");
+ dqcache_shrinker = shrinker_alloc(0, "dquota-cache");
+ if (!dqcache_shrinker)
+ panic("Cannot allocate dquot shrinker");
+
+ dqcache_shrinker->count_objects = dqcache_shrink_count;
+ dqcache_shrinker->scan_objects = dqcache_shrink_scan;
+ dqcache_shrinker->seeks = DEFAULT_SEEKS;
+
+ shrinker_register(dqcache_shrinker);

return 0;
}
--
2.30.2


2023-07-24 09:54:54

by Qi Zheng

Subject: [PATCH v2 17/47] rcu: dynamically allocate the rcu-lazy shrinker

Use new APIs to dynamically allocate the rcu-lazy shrinker.

Signed-off-by: Qi Zheng <[email protected]>
---
kernel/rcu/tree_nocb.h | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 43229d2b0c44..919f17561733 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1397,12 +1397,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
return count ? count : SHRINK_STOP;
}

-static struct shrinker lazy_rcu_shrinker = {
- .count_objects = lazy_rcu_shrink_count,
- .scan_objects = lazy_rcu_shrink_scan,
- .batch = 0,
- .seeks = DEFAULT_SEEKS,
-};
+static struct shrinker *lazy_rcu_shrinker;
#endif // #ifdef CONFIG_RCU_LAZY

void __init rcu_init_nohz(void)
@@ -1436,8 +1431,16 @@ void __init rcu_init_nohz(void)
return;

#ifdef CONFIG_RCU_LAZY
- if (register_shrinker(&lazy_rcu_shrinker, "rcu-lazy"))
- pr_err("Failed to register lazy_rcu shrinker!\n");
+ lazy_rcu_shrinker = shrinker_alloc(0, "rcu-lazy");
+ if (!lazy_rcu_shrinker) {
+ pr_err("Failed to allocate lazy_rcu shrinker!\n");
+ } else {
+ lazy_rcu_shrinker->count_objects = lazy_rcu_shrink_count;
+ lazy_rcu_shrinker->scan_objects = lazy_rcu_shrink_scan;
+ lazy_rcu_shrinker->seeks = DEFAULT_SEEKS;
+
+ shrinker_register(lazy_rcu_shrinker);
+ }
#endif // #ifdef CONFIG_RCU_LAZY

if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) {
--
2.30.2


2023-07-24 09:54:54

by Qi Zheng

Subject: [PATCH v2 19/47] mm: thp: dynamically allocate the thp-related shrinkers

Use new APIs to dynamically allocate the thp-zero and thp-deferred_split
shrinkers.

Signed-off-by: Qi Zheng <[email protected]>
---
mm/huge_memory.c | 69 +++++++++++++++++++++++++++++++-----------------
1 file changed, 45 insertions(+), 24 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8c94b34024a2..4db5a1834d81 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -65,7 +65,11 @@ unsigned long transparent_hugepage_flags __read_mostly =
(1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG)|
(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);

-static struct shrinker deferred_split_shrinker;
+static struct shrinker *deferred_split_shrinker;
+static unsigned long deferred_split_count(struct shrinker *shrink,
+ struct shrink_control *sc);
+static unsigned long deferred_split_scan(struct shrinker *shrink,
+ struct shrink_control *sc);

static atomic_t huge_zero_refcount;
struct page *huge_zero_page __read_mostly;
@@ -229,11 +233,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
return 0;
}

-static struct shrinker huge_zero_page_shrinker = {
- .count_objects = shrink_huge_zero_page_count,
- .scan_objects = shrink_huge_zero_page_scan,
- .seeks = DEFAULT_SEEKS,
-};
+static struct shrinker *huge_zero_page_shrinker;

#ifdef CONFIG_SYSFS
static ssize_t enabled_show(struct kobject *kobj,
@@ -454,6 +454,40 @@ static inline void hugepage_exit_sysfs(struct kobject *hugepage_kobj)
}
#endif /* CONFIG_SYSFS */

+static int thp_shrinker_init(void)
+{
+ huge_zero_page_shrinker = shrinker_alloc(0, "thp-zero");
+ if (!huge_zero_page_shrinker)
+ return -ENOMEM;
+
+ deferred_split_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE |
+ SHRINKER_MEMCG_AWARE |
+ SHRINKER_NONSLAB,
+ "thp-deferred_split");
+ if (!deferred_split_shrinker) {
+ shrinker_free_non_registered(huge_zero_page_shrinker);
+ return -ENOMEM;
+ }
+
+ huge_zero_page_shrinker->count_objects = shrink_huge_zero_page_count;
+ huge_zero_page_shrinker->scan_objects = shrink_huge_zero_page_scan;
+ huge_zero_page_shrinker->seeks = DEFAULT_SEEKS;
+ shrinker_register(huge_zero_page_shrinker);
+
+ deferred_split_shrinker->count_objects = deferred_split_count;
+ deferred_split_shrinker->scan_objects = deferred_split_scan;
+ deferred_split_shrinker->seeks = DEFAULT_SEEKS;
+ shrinker_register(deferred_split_shrinker);
+
+ return 0;
+}
+
+static void thp_shrinker_exit(void)
+{
+ shrinker_unregister(huge_zero_page_shrinker);
+ shrinker_unregister(deferred_split_shrinker);
+}
+
static int __init hugepage_init(void)
{
int err;
@@ -482,12 +516,9 @@ static int __init hugepage_init(void)
if (err)
goto err_slab;

- err = register_shrinker(&huge_zero_page_shrinker, "thp-zero");
- if (err)
- goto err_hzp_shrinker;
- err = register_shrinker(&deferred_split_shrinker, "thp-deferred_split");
+ err = thp_shrinker_init();
if (err)
- goto err_split_shrinker;
+ goto err_shrinker;

/*
* By default disable transparent hugepages on smaller systems,
@@ -505,10 +536,8 @@ static int __init hugepage_init(void)

return 0;
err_khugepaged:
- unregister_shrinker(&deferred_split_shrinker);
-err_split_shrinker:
- unregister_shrinker(&huge_zero_page_shrinker);
-err_hzp_shrinker:
+ thp_shrinker_exit();
+err_shrinker:
khugepaged_destroy();
err_slab:
hugepage_exit_sysfs(hugepage_kobj);
@@ -2851,7 +2880,7 @@ void deferred_split_folio(struct folio *folio)
#ifdef CONFIG_MEMCG
if (memcg)
set_shrinker_bit(memcg, folio_nid(folio),
- deferred_split_shrinker.id);
+ deferred_split_shrinker->id);
#endif
}
spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
@@ -2925,14 +2954,6 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
return split;
}

-static struct shrinker deferred_split_shrinker = {
- .count_objects = deferred_split_count,
- .scan_objects = deferred_split_scan,
- .seeks = DEFAULT_SEEKS,
- .flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE |
- SHRINKER_NONSLAB,
-};
-
#ifdef CONFIG_DEBUG_FS
static void split_huge_pages_all(void)
{
--
2.30.2


2023-07-24 09:55:09

by Qi Zheng

Subject: [PATCH v2 18/47] rcu: dynamically allocate the rcu-kfree shrinker

Use new APIs to dynamically allocate the rcu-kfree shrinker.

Signed-off-by: Qi Zheng <[email protected]>
---
kernel/rcu/tree.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1449cb69a0e0..d068ce3567fc 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3445,12 +3445,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
return freed == 0 ? SHRINK_STOP : freed;
}

-static struct shrinker kfree_rcu_shrinker = {
- .count_objects = kfree_rcu_shrink_count,
- .scan_objects = kfree_rcu_shrink_scan,
- .batch = 0,
- .seeks = DEFAULT_SEEKS,
-};
+static struct shrinker *kfree_rcu_shrinker;

void __init kfree_rcu_scheduler_running(void)
{
@@ -4958,8 +4953,18 @@ static void __init kfree_rcu_batch_init(void)
INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func);
krcp->initialized = true;
}
- if (register_shrinker(&kfree_rcu_shrinker, "rcu-kfree"))
- pr_err("Failed to register kfree_rcu() shrinker!\n");
+
+ kfree_rcu_shrinker = shrinker_alloc(0, "rcu-kfree");
+ if (!kfree_rcu_shrinker) {
+ pr_err("Failed to allocate kfree_rcu() shrinker!\n");
+ return;
+ }
+
+ kfree_rcu_shrinker->count_objects = kfree_rcu_shrink_count;
+ kfree_rcu_shrinker->scan_objects = kfree_rcu_shrink_scan;
+ kfree_rcu_shrinker->seeks = DEFAULT_SEEKS;
+
+ shrinker_register(kfree_rcu_shrinker);
}

void __init rcu_init(void)
--
2.30.2


2023-07-24 09:56:20

by Qi Zheng

Subject: [PATCH v2 22/47] drm/i915: dynamically allocate the i915_gem_mm shrinker

In preparation for implementing lockless slab shrink, use the new APIs to
dynamically allocate the i915_gem_mm shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then we no longer need to wait for the
RCU read-side critical section when releasing the struct drm_i915_private.
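
The key difference from the earlier conversions is that the shrinker is embedded in another structure, so the callbacks can no longer use container_of(). Below is a minimal sketch of the pattern with hypothetical example_* names; private_data is the field this series adds for exactly this purpose.

```
struct example_device {
	struct shrinker *shrinker;	/* was: struct shrinker shrinker; */
	/* ... */
};

static unsigned long example_count(struct shrinker *shrinker,
				   struct shrink_control *sc)
{
	/* Recover the owning object via private_data instead of container_of(). */
	struct example_device *dev = shrinker->private_data;

	return example_count_freeable(dev);	/* hypothetical helper */
}

static int example_shrinker_init(struct example_device *dev)
{
	dev->shrinker = shrinker_alloc(0, "example-device");
	if (!dev->shrinker)
		return -ENOMEM;

	dev->shrinker->count_objects = example_count;
	dev->shrinker->scan_objects = example_scan;	/* analogous to example_count */
	dev->shrinker->seeks = DEFAULT_SEEKS;
	dev->shrinker->private_data = dev;

	shrinker_register(dev->shrinker);
	return 0;
}
```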

Signed-off-by: Qi Zheng <[email protected]>
---
drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 30 +++++++++++---------
drivers/gpu/drm/i915/i915_drv.h | 2 +-
2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index 214763942aa2..a7409b8c2634 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
@@ -284,8 +284,7 @@ unsigned long i915_gem_shrink_all(struct drm_i915_private *i915)
static unsigned long
i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
{
- struct drm_i915_private *i915 =
- container_of(shrinker, struct drm_i915_private, mm.shrinker);
+ struct drm_i915_private *i915 = shrinker->private_data;
unsigned long num_objects;
unsigned long count;

@@ -302,8 +301,8 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
if (num_objects) {
unsigned long avg = 2 * count / num_objects;

- i915->mm.shrinker.batch =
- max((i915->mm.shrinker.batch + avg) >> 1,
+ i915->mm.shrinker->batch =
+ max((i915->mm.shrinker->batch + avg) >> 1,
128ul /* default SHRINK_BATCH */);
}

@@ -313,8 +312,7 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
static unsigned long
i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
{
- struct drm_i915_private *i915 =
- container_of(shrinker, struct drm_i915_private, mm.shrinker);
+ struct drm_i915_private *i915 = shrinker->private_data;
unsigned long freed;

sc->nr_scanned = 0;
@@ -422,12 +420,18 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr

void i915_gem_driver_register__shrinker(struct drm_i915_private *i915)
{
- i915->mm.shrinker.scan_objects = i915_gem_shrinker_scan;
- i915->mm.shrinker.count_objects = i915_gem_shrinker_count;
- i915->mm.shrinker.seeks = DEFAULT_SEEKS;
- i915->mm.shrinker.batch = 4096;
- drm_WARN_ON(&i915->drm, register_shrinker(&i915->mm.shrinker,
- "drm-i915_gem"));
+ i915->mm.shrinker = shrinker_alloc(0, "drm-i915_gem");
+ if (!i915->mm.shrinker) {
+ drm_WARN_ON(&i915->drm, 1);
+ } else {
+ i915->mm.shrinker->scan_objects = i915_gem_shrinker_scan;
+ i915->mm.shrinker->count_objects = i915_gem_shrinker_count;
+ i915->mm.shrinker->seeks = DEFAULT_SEEKS;
+ i915->mm.shrinker->batch = 4096;
+ i915->mm.shrinker->private_data = i915;
+
+ shrinker_register(i915->mm.shrinker);
+ }

i915->mm.oom_notifier.notifier_call = i915_gem_shrinker_oom;
drm_WARN_ON(&i915->drm, register_oom_notifier(&i915->mm.oom_notifier));
@@ -443,7 +447,7 @@ void i915_gem_driver_unregister__shrinker(struct drm_i915_private *i915)
unregister_vmap_purge_notifier(&i915->mm.vmap_notifier));
drm_WARN_ON(&i915->drm,
unregister_oom_notifier(&i915->mm.oom_notifier));
- unregister_shrinker(&i915->mm.shrinker);
+ shrinker_unregister(i915->mm.shrinker);
}

void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 682ef2b5c7d5..389e8bf140d7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -163,7 +163,7 @@ struct i915_gem_mm {

struct notifier_block oom_notifier;
struct notifier_block vmap_notifier;
- struct shrinker shrinker;
+ struct shrinker *shrinker;

#ifdef CONFIG_MMU_NOTIFIER
/**
--
2.30.2


2023-07-24 09:59:23

by Qi Zheng

Subject: [PATCH v2 28/47] bcache: dynamically allocate the md-bcache shrinker

In preparation for implementing lockless slab shrink, use the new APIs to
dynamically allocate the md-bcache shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then we no longer need to wait for the
RCU read-side critical section when releasing the struct cache_set.

Signed-off-by: Qi Zheng <[email protected]>
---
drivers/md/bcache/bcache.h | 2 +-
drivers/md/bcache/btree.c | 27 ++++++++++++++++-----------
drivers/md/bcache/sysfs.c | 3 ++-
3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 5a79bb3c272f..c622bc50f81b 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -541,7 +541,7 @@ struct cache_set {
struct bio_set bio_split;

/* For the btree cache */
- struct shrinker shrink;
+ struct shrinker *shrink;

/* For the btree cache and anything allocation related */
struct mutex bucket_lock;
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index fd121a61f17c..c176c7fc77d9 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -667,7 +667,7 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
static unsigned long bch_mca_scan(struct shrinker *shrink,
struct shrink_control *sc)
{
- struct cache_set *c = container_of(shrink, struct cache_set, shrink);
+ struct cache_set *c = shrink->private_data;
struct btree *b, *t;
unsigned long i, nr = sc->nr_to_scan;
unsigned long freed = 0;
@@ -734,7 +734,7 @@ static unsigned long bch_mca_scan(struct shrinker *shrink,
static unsigned long bch_mca_count(struct shrinker *shrink,
struct shrink_control *sc)
{
- struct cache_set *c = container_of(shrink, struct cache_set, shrink);
+ struct cache_set *c = shrink->private_data;

if (c->shrinker_disabled)
return 0;
@@ -752,8 +752,8 @@ void bch_btree_cache_free(struct cache_set *c)

closure_init_stack(&cl);

- if (c->shrink.list.next)
- unregister_shrinker(&c->shrink);
+ if (c->shrink)
+ shrinker_unregister(c->shrink);

mutex_lock(&c->bucket_lock);

@@ -828,14 +828,19 @@ int bch_btree_cache_alloc(struct cache_set *c)
c->verify_data = NULL;
#endif

- c->shrink.count_objects = bch_mca_count;
- c->shrink.scan_objects = bch_mca_scan;
- c->shrink.seeks = 4;
- c->shrink.batch = c->btree_pages * 2;
+ c->shrink = shrinker_alloc(0, "md-bcache:%pU", c->set_uuid);
+ if (!c->shrink) {
+ pr_warn("bcache: %s: could not allocate shrinker\n", __func__);
+ return -ENOMEM;
+ }
+
+ c->shrink->count_objects = bch_mca_count;
+ c->shrink->scan_objects = bch_mca_scan;
+ c->shrink->seeks = 4;
+ c->shrink->batch = c->btree_pages * 2;
+ c->shrink->private_data = c;

- if (register_shrinker(&c->shrink, "md-bcache:%pU", c->set_uuid))
- pr_warn("bcache: %s: could not register shrinker\n",
- __func__);
+ shrinker_register(c->shrink);

return 0;
}
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 0e2c1880f60b..45d8af755de6 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -866,7 +866,8 @@ STORE(__bch_cache_set)

sc.gfp_mask = GFP_KERNEL;
sc.nr_to_scan = strtoul_or_return(buf);
- c->shrink.scan_objects(&c->shrink, &sc);
+ if (c->shrink)
+ c->shrink->scan_objects(c->shrink, &sc);
}

sysfs_strtoul_clamp(congested_read_threshold_us,
--
2.30.2


2023-07-24 09:59:55

by Qi Zheng

Subject: [PATCH v2 30/47] virtio_balloon: dynamically allocate the virtio-balloon shrinker

In preparation for implementing lockless slab shrink, use the new APIs to
dynamically allocate the virtio-balloon shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then we no longer need to wait for the
RCU read-side critical section when releasing the struct virtio_balloon.

Signed-off-by: Qi Zheng <[email protected]>
---
drivers/virtio/virtio_balloon.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 5b15936a5214..d773860c3b18 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -111,7 +111,7 @@ struct virtio_balloon {
struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];

/* Shrinker to return free pages - VIRTIO_BALLOON_F_FREE_PAGE_HINT */
- struct shrinker shrinker;
+ struct shrinker *shrinker;

/* OOM notifier to deflate on OOM - VIRTIO_BALLOON_F_DEFLATE_ON_OOM */
struct notifier_block oom_nb;
@@ -816,8 +816,7 @@ static unsigned long shrink_free_pages(struct virtio_balloon *vb,
static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
struct shrink_control *sc)
{
- struct virtio_balloon *vb = container_of(shrinker,
- struct virtio_balloon, shrinker);
+ struct virtio_balloon *vb = shrinker->private_data;

return shrink_free_pages(vb, sc->nr_to_scan);
}
@@ -825,8 +824,7 @@ static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
struct shrink_control *sc)
{
- struct virtio_balloon *vb = container_of(shrinker,
- struct virtio_balloon, shrinker);
+ struct virtio_balloon *vb = shrinker->private_data;

return vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
}
@@ -847,16 +845,23 @@ static int virtio_balloon_oom_notify(struct notifier_block *nb,

static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
{
- unregister_shrinker(&vb->shrinker);
+ shrinker_unregister(vb->shrinker);
}

static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
{
- vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
- vb->shrinker.count_objects = virtio_balloon_shrinker_count;
- vb->shrinker.seeks = DEFAULT_SEEKS;
+ vb->shrinker = shrinker_alloc(0, "virtio-balloon");
+ if (!vb->shrinker)
+ return -ENOMEM;

- return register_shrinker(&vb->shrinker, "virtio-balloon");
+ vb->shrinker->scan_objects = virtio_balloon_shrinker_scan;
+ vb->shrinker->count_objects = virtio_balloon_shrinker_count;
+ vb->shrinker->seeks = DEFAULT_SEEKS;
+ vb->shrinker->private_data = vb;
+
+ shrinker_register(vb->shrinker);
+
+ return 0;
}

static int virtballoon_probe(struct virtio_device *vdev)
--
2.30.2


2023-07-24 10:01:34

by Qi Zheng

Subject: [PATCH v2 36/47] xfs: dynamically allocate the xfs-buf shrinker

In preparation for implementing lockless slab shrink, use the new APIs to
dynamically allocate the xfs-buf shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then we no longer need to wait for the
RCU read-side critical section when releasing the struct xfs_buftarg.

Signed-off-by: Qi Zheng <[email protected]>
---
fs/xfs/xfs_buf.c | 25 ++++++++++++++-----------
fs/xfs/xfs_buf.h | 2 +-
2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 15d1e5a7c2d3..19a0bf6ce115 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1906,8 +1906,7 @@ xfs_buftarg_shrink_scan(
struct shrinker *shrink,
struct shrink_control *sc)
{
- struct xfs_buftarg *btp = container_of(shrink,
- struct xfs_buftarg, bt_shrinker);
+ struct xfs_buftarg *btp = shrink->private_data;
LIST_HEAD(dispose);
unsigned long freed;

@@ -1929,8 +1928,7 @@ xfs_buftarg_shrink_count(
struct shrinker *shrink,
struct shrink_control *sc)
{
- struct xfs_buftarg *btp = container_of(shrink,
- struct xfs_buftarg, bt_shrinker);
+ struct xfs_buftarg *btp = shrink->private_data;
return list_lru_shrink_count(&btp->bt_lru, sc);
}

@@ -1938,7 +1936,7 @@ void
xfs_free_buftarg(
struct xfs_buftarg *btp)
{
- unregister_shrinker(&btp->bt_shrinker);
+ shrinker_unregister(btp->bt_shrinker);
ASSERT(percpu_counter_sum(&btp->bt_io_count) == 0);
percpu_counter_destroy(&btp->bt_io_count);
list_lru_destroy(&btp->bt_lru);
@@ -2021,13 +2019,18 @@ xfs_alloc_buftarg(
if (percpu_counter_init(&btp->bt_io_count, 0, GFP_KERNEL))
goto error_lru;

- btp->bt_shrinker.count_objects = xfs_buftarg_shrink_count;
- btp->bt_shrinker.scan_objects = xfs_buftarg_shrink_scan;
- btp->bt_shrinker.seeks = DEFAULT_SEEKS;
- btp->bt_shrinker.flags = SHRINKER_NUMA_AWARE;
- if (register_shrinker(&btp->bt_shrinker, "xfs-buf:%s",
- mp->m_super->s_id))
+ btp->bt_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "xfs-buf:%s",
+ mp->m_super->s_id);
+ if (!btp->bt_shrinker)
goto error_pcpu;
+
+ btp->bt_shrinker->count_objects = xfs_buftarg_shrink_count;
+ btp->bt_shrinker->scan_objects = xfs_buftarg_shrink_scan;
+ btp->bt_shrinker->seeks = DEFAULT_SEEKS;
+ btp->bt_shrinker->private_data = btp;
+
+ shrinker_register(btp->bt_shrinker);
+
return btp;

error_pcpu:
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 549c60942208..4e6969a675f7 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -102,7 +102,7 @@ typedef struct xfs_buftarg {
size_t bt_logical_sectormask;

/* LRU control structures */
- struct shrinker bt_shrinker;
+ struct shrinker *bt_shrinker;
struct list_lru bt_lru;

struct percpu_counter bt_io_count;
--
2.30.2


2023-07-24 10:02:15

by Qi Zheng

Subject: [PATCH v2 37/47] xfs: dynamically allocate the xfs-inodegc shrinker

In preparation for implementing lockless slab shrink, use the new APIs to
dynamically allocate the xfs-inodegc shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then we no longer need to wait for the
RCU read-side critical section when releasing the struct xfs_mount.

Signed-off-by: Qi Zheng <[email protected]>
---
fs/xfs/xfs_icache.c | 26 +++++++++++++++-----------
fs/xfs/xfs_mount.c | 4 ++--
fs/xfs/xfs_mount.h | 2 +-
3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 453890942d9f..751c380afd5a 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -2225,8 +2225,7 @@ xfs_inodegc_shrinker_count(
struct shrinker *shrink,
struct shrink_control *sc)
{
- struct xfs_mount *mp = container_of(shrink, struct xfs_mount,
- m_inodegc_shrinker);
+ struct xfs_mount *mp = shrink->private_data;
struct xfs_inodegc *gc;
int cpu;

@@ -2247,8 +2246,7 @@ xfs_inodegc_shrinker_scan(
struct shrinker *shrink,
struct shrink_control *sc)
{
- struct xfs_mount *mp = container_of(shrink, struct xfs_mount,
- m_inodegc_shrinker);
+ struct xfs_mount *mp = shrink->private_data;
struct xfs_inodegc *gc;
int cpu;
bool no_items = true;
@@ -2284,13 +2282,19 @@ int
xfs_inodegc_register_shrinker(
struct xfs_mount *mp)
{
- struct shrinker *shrink = &mp->m_inodegc_shrinker;
+ mp->m_inodegc_shrinker = shrinker_alloc(SHRINKER_NONSLAB,
+ "xfs-inodegc:%s",
+ mp->m_super->s_id);
+ if (!mp->m_inodegc_shrinker)
+ return -ENOMEM;
+
+ mp->m_inodegc_shrinker->count_objects = xfs_inodegc_shrinker_count;
+ mp->m_inodegc_shrinker->scan_objects = xfs_inodegc_shrinker_scan;
+ mp->m_inodegc_shrinker->seeks = 0;
+ mp->m_inodegc_shrinker->batch = XFS_INODEGC_SHRINKER_BATCH;
+ mp->m_inodegc_shrinker->private_data = mp;

- shrink->count_objects = xfs_inodegc_shrinker_count;
- shrink->scan_objects = xfs_inodegc_shrinker_scan;
- shrink->seeks = 0;
- shrink->flags = SHRINKER_NONSLAB;
- shrink->batch = XFS_INODEGC_SHRINKER_BATCH;
+ shrinker_register(mp->m_inodegc_shrinker);

- return register_shrinker(shrink, "xfs-inodegc:%s", mp->m_super->s_id);
+ return 0;
}
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index fb87ffb48f7f..27c2d24797c9 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1018,7 +1018,7 @@ xfs_mountfs(
out_log_dealloc:
xfs_log_mount_cancel(mp);
out_inodegc_shrinker:
- unregister_shrinker(&mp->m_inodegc_shrinker);
+ shrinker_unregister(mp->m_inodegc_shrinker);
out_fail_wait:
if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
xfs_buftarg_drain(mp->m_logdev_targp);
@@ -1100,7 +1100,7 @@ xfs_unmountfs(
#if defined(DEBUG)
xfs_errortag_clearall(mp);
#endif
- unregister_shrinker(&mp->m_inodegc_shrinker);
+ shrinker_unregister(mp->m_inodegc_shrinker);
xfs_free_perag(mp);

xfs_errortag_del(mp);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index e2866e7fa60c..562c294ca08e 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -217,7 +217,7 @@ typedef struct xfs_mount {
atomic_t m_agirotor; /* last ag dir inode alloced */

/* Memory shrinker to throttle and reprioritize inodegc */
- struct shrinker m_inodegc_shrinker;
+ struct shrinker *m_inodegc_shrinker;
/*
* Workqueue item so that we can coalesce multiple inode flush attempts
* into a single flush.
--
2.30.2


2023-07-24 10:02:27

by Qi Zheng

Subject: [PATCH v2 41/47] mm: shrinker: remove old APIs

Now that there are no users of the old APIs, just remove them.

Signed-off-by: Qi Zheng <[email protected]>
---
include/linux/shrinker.h | 7 --
mm/shrinker.c | 143 ---------------------------------------
2 files changed, 150 deletions(-)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 296f5e163861..e464b4e9be0e 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -105,13 +105,6 @@ void shrinker_free_non_registered(struct shrinker *shrinker);
void shrinker_register(struct shrinker *shrinker);
void shrinker_unregister(struct shrinker *shrinker);

-extern int __printf(2, 3) prealloc_shrinker(struct shrinker *shrinker,
- const char *fmt, ...);
-extern void register_shrinker_prepared(struct shrinker *shrinker);
-extern int __printf(2, 3) register_shrinker(struct shrinker *shrinker,
- const char *fmt, ...);
-extern void unregister_shrinker(struct shrinker *shrinker);
-extern void free_prealloced_shrinker(struct shrinker *shrinker);
extern void synchronize_shrinkers(void);

#ifdef CONFIG_SHRINKER_DEBUG
diff --git a/mm/shrinker.c b/mm/shrinker.c
index d820e4cc5806..2f3635ad1b45 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -661,149 +661,6 @@ void shrinker_unregister(struct shrinker *shrinker)
}
EXPORT_SYMBOL(shrinker_unregister);

-/*
- * Add a shrinker callback to be called from the vm.
- */
-static int __prealloc_shrinker(struct shrinker *shrinker)
-{
- unsigned int size;
- int err;
-
- if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
- err = prealloc_memcg_shrinker(shrinker);
- if (err != -ENOSYS)
- return err;
-
- shrinker->flags &= ~SHRINKER_MEMCG_AWARE;
- }
-
- size = sizeof(*shrinker->nr_deferred);
- if (shrinker->flags & SHRINKER_NUMA_AWARE)
- size *= nr_node_ids;
-
- shrinker->nr_deferred = kzalloc(size, GFP_KERNEL);
- if (!shrinker->nr_deferred)
- return -ENOMEM;
-
- return 0;
-}
-
-#ifdef CONFIG_SHRINKER_DEBUG
-int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
-{
- va_list ap;
- int err;
-
- va_start(ap, fmt);
- shrinker->name = kvasprintf_const(GFP_KERNEL, fmt, ap);
- va_end(ap);
- if (!shrinker->name)
- return -ENOMEM;
-
- err = __prealloc_shrinker(shrinker);
- if (err) {
- kfree_const(shrinker->name);
- shrinker->name = NULL;
- }
-
- return err;
-}
-#else
-int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
-{
- return __prealloc_shrinker(shrinker);
-}
-#endif
-
-void free_prealloced_shrinker(struct shrinker *shrinker)
-{
-#ifdef CONFIG_SHRINKER_DEBUG
- kfree_const(shrinker->name);
- shrinker->name = NULL;
-#endif
- if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
- down_write(&shrinker_rwsem);
- unregister_memcg_shrinker(shrinker);
- up_write(&shrinker_rwsem);
- return;
- }
-
- kfree(shrinker->nr_deferred);
- shrinker->nr_deferred = NULL;
-}
-
-void register_shrinker_prepared(struct shrinker *shrinker)
-{
- down_write(&shrinker_rwsem);
- list_add_tail(&shrinker->list, &shrinker_list);
- shrinker->flags |= SHRINKER_REGISTERED;
- shrinker_debugfs_add(shrinker);
- up_write(&shrinker_rwsem);
-}
-
-static int __register_shrinker(struct shrinker *shrinker)
-{
- int err = __prealloc_shrinker(shrinker);
-
- if (err)
- return err;
- register_shrinker_prepared(shrinker);
- return 0;
-}
-
-#ifdef CONFIG_SHRINKER_DEBUG
-int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
-{
- va_list ap;
- int err;
-
- va_start(ap, fmt);
- shrinker->name = kvasprintf_const(GFP_KERNEL, fmt, ap);
- va_end(ap);
- if (!shrinker->name)
- return -ENOMEM;
-
- err = __register_shrinker(shrinker);
- if (err) {
- kfree_const(shrinker->name);
- shrinker->name = NULL;
- }
- return err;
-}
-#else
-int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
-{
- return __register_shrinker(shrinker);
-}
-#endif
-EXPORT_SYMBOL(register_shrinker);
-
-/*
- * Remove one
- */
-void unregister_shrinker(struct shrinker *shrinker)
-{
- struct dentry *debugfs_entry;
- int debugfs_id;
-
- if (!(shrinker->flags & SHRINKER_REGISTERED))
- return;
-
- down_write(&shrinker_rwsem);
- list_del(&shrinker->list);
- shrinker->flags &= ~SHRINKER_REGISTERED;
- if (shrinker->flags & SHRINKER_MEMCG_AWARE)
- unregister_memcg_shrinker(shrinker);
- debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
- up_write(&shrinker_rwsem);
-
- shrinker_debugfs_remove(debugfs_entry, debugfs_id);
-
- kfree(shrinker->nr_deferred);
- shrinker->nr_deferred = NULL;
-}
-EXPORT_SYMBOL(unregister_shrinker);
-
/**
* synchronize_shrinkers - Wait for all running shrinkers to complete.
*
--
2.30.2


2023-07-24 10:02:39

by Qi Zheng

Subject: [PATCH v2 45/47] mm: shrinker: make memcg slab shrink lockless

Like the global slab shrink, this commit also uses the refcount+RCU method to
make the memcg slab shrink lockless.
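
For reference, the lockless lookup in the diff below boils down to roughly the following sketch (example_shrink_one() is a hypothetical wrapper; shrinker_idr, shrinker_try_get(), shrinker_put() and do_shrink_slab() are the names used in the patch):

```
static unsigned long example_shrink_one(struct shrink_control *sc,
					int shrinker_id, int priority)
{
	struct shrinker *shrinker;
	unsigned long freed;

	/* RCU protects the lookup; the refcount keeps the shrinker alive after. */
	rcu_read_lock();
	shrinker = idr_find(&shrinker_idr, shrinker_id);
	if (!shrinker || !shrinker_try_get(shrinker)) {
		rcu_read_unlock();
		return 0;	/* being unregistered or already gone */
	}
	rcu_read_unlock();

	freed = do_shrink_slab(sc, shrinker, priority);

	/* Dropping the last reference frees the shrinker via kfree_rcu(). */
	shrinker_put(shrinker);
	return freed;
}
```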

Use the following script to do a slab shrink stress test:

```

DIR="/root/shrinker/memcg/mnt"

do_create()
{
    mkdir -p /sys/fs/cgroup/memory/test
    echo 4G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
    for i in `seq 0 $1`;
    do
        mkdir -p /sys/fs/cgroup/memory/test/$i;
        echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
        mkdir -p $DIR/$i;
    done
}

do_mount()
{
    for i in `seq $1 $2`;
    do
        mount -t tmpfs $i $DIR/$i;
    done
}

do_touch()
{
    for i in `seq $1 $2`;
    do
        echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
        dd if=/dev/zero of=$DIR/$i/file$i bs=1M count=1 &
    done
}

case "$1" in
  touch)
    do_touch $2 $3
    ;;
  test)
    do_create 4000
    do_mount 0 4000
    do_touch 0 3000
    ;;
  *)
    exit 1
    ;;
esac
```

Save the above script, then run the test and touch commands. We can then use
the following perf command to view the hotspots:

perf top -U -F 999

1) Before applying this patchset:

40.44% [kernel] [k] down_read_trylock
17.59% [kernel] [k] up_read
13.64% [kernel] [k] pv_native_safe_halt
11.90% [kernel] [k] shrink_slab
8.21% [kernel] [k] idr_find
2.71% [kernel] [k] _find_next_bit
1.36% [kernel] [k] shrink_node
0.81% [kernel] [k] shrink_lruvec
0.80% [kernel] [k] __radix_tree_lookup
0.50% [kernel] [k] do_shrink_slab
0.21% [kernel] [k] list_lru_count_one
0.16% [kernel] [k] mem_cgroup_iter

2) After applying this patchset:

60.17% [kernel] [k] shrink_slab
20.42% [kernel] [k] pv_native_safe_halt
3.03% [kernel] [k] do_shrink_slab
2.73% [kernel] [k] shrink_node
2.27% [kernel] [k] shrink_lruvec
2.00% [kernel] [k] __rcu_read_unlock
1.92% [kernel] [k] mem_cgroup_iter
0.98% [kernel] [k] __rcu_read_lock
0.91% [kernel] [k] osq_lock
0.63% [kernel] [k] mem_cgroup_calculate_protection
0.55% [kernel] [k] shrinker_put
0.46% [kernel] [k] list_lru_count_one

We can see that the top perf hotspot is now shrink_slab, which is what
we expect.

Signed-off-by: Qi Zheng <[email protected]>
---
mm/shrinker.c | 121 +++++++++++++++++++++++++++-----------------
mm/shrinker_debug.c | 4 +-
2 files changed, 76 insertions(+), 49 deletions(-)

diff --git a/mm/shrinker.c b/mm/shrinker.c
index 8e3334749552..744361afd520 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -107,6 +107,12 @@ static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
lockdep_is_held(&shrinker_rwsem));
}

+static struct shrinker_info *shrinker_info_rcu(struct mem_cgroup *memcg,
+ int nid)
+{
+ return rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
+}
+
static int expand_one_shrinker_info(struct mem_cgroup *memcg, int new_size,
int old_size, int new_nr_max)
{
@@ -152,11 +158,11 @@ static int expand_shrinker_info(int new_id)
int new_size, old_size = 0;
struct mem_cgroup *memcg;

+ down_write(&shrinker_rwsem);
+
if (!root_mem_cgroup)
goto out;

- lockdep_assert_held(&shrinker_rwsem);
-
new_size = shrinker_unit_size(new_nr_max);
old_size = shrinker_unit_size(shrinker_nr_max);

@@ -173,6 +179,8 @@ static int expand_shrinker_info(int new_id)
if (!ret)
shrinker_nr_max = new_nr_max;

+ up_write(&shrinker_rwsem);
+
return ret;
}

@@ -198,7 +206,7 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
struct shrinker_info_unit *unit;

rcu_read_lock();
- info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
+ info = shrinker_info_rcu(memcg, nid);
unit = info->unit[shriner_id_to_index(shrinker_id)];
if (!WARN_ON_ONCE(shrinker_id >= info->map_nr_max)) {
/* Pairs with smp mb in shrink_slab() */
@@ -211,39 +219,40 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)

static DEFINE_IDR(shrinker_idr);

-static int prealloc_memcg_shrinker(struct shrinker *shrinker)
+static int shrinker_memcg_alloc(struct shrinker *shrinker)
{
- int id, ret = -ENOMEM;
+ int id;

if (mem_cgroup_disabled())
return -ENOSYS;

- down_write(&shrinker_rwsem);
- /* This may call shrinker, so it must use down_read_trylock() */
- id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
+ spin_lock(&shrinker_lock);
+ id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_ATOMIC);
+ spin_unlock(&shrinker_lock);
+
if (id < 0)
- goto unlock;
+ return -ENOMEM;

if (id >= shrinker_nr_max) {
if (expand_shrinker_info(id)) {
+ spin_lock(&shrinker_lock);
idr_remove(&shrinker_idr, id);
- goto unlock;
+ spin_unlock(&shrinker_lock);
+ return -ENOMEM;
}
}
shrinker->id = id;
- ret = 0;
-unlock:
- up_write(&shrinker_rwsem);
- return ret;
+
+ return 0;
}

-static void unregister_memcg_shrinker(struct shrinker *shrinker)
+static void shrinker_memcg_remove(struct shrinker *shrinker)
{
int id = shrinker->id;

BUG_ON(id < 0);

- lockdep_assert_held(&shrinker_rwsem);
+ lockdep_assert_held(&shrinker_lock);

idr_remove(&shrinker_idr, id);
}
@@ -253,10 +262,15 @@ static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
{
struct shrinker_info *info;
struct shrinker_info_unit *unit;
+ long nr_deferred;

- info = shrinker_info_protected(memcg, nid);
+ rcu_read_lock();
+ info = shrinker_info_rcu(memcg, nid);
unit = info->unit[shriner_id_to_index(shrinker->id)];
- return atomic_long_xchg(&unit->nr_deferred[shriner_id_to_offset(shrinker->id)], 0);
+ nr_deferred = atomic_long_xchg(&unit->nr_deferred[shriner_id_to_offset(shrinker->id)], 0);
+ rcu_read_unlock();
+
+ return nr_deferred;
}

static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
@@ -264,10 +278,16 @@ static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
{
struct shrinker_info *info;
struct shrinker_info_unit *unit;
+ long nr_deferred;

- info = shrinker_info_protected(memcg, nid);
+ rcu_read_lock();
+ info = shrinker_info_rcu(memcg, nid);
unit = info->unit[shriner_id_to_index(shrinker->id)];
- return atomic_long_add_return(nr, &unit->nr_deferred[shriner_id_to_offset(shrinker->id)]);
+ nr_deferred =
+ atomic_long_add_return(nr, &unit->nr_deferred[shriner_id_to_offset(shrinker->id)]);
+ rcu_read_unlock();
+
+ return nr_deferred;
}

void reparent_shrinker_deferred(struct mem_cgroup *memcg)
@@ -299,12 +319,12 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
up_read(&shrinker_rwsem);
}
#else
-static int prealloc_memcg_shrinker(struct shrinker *shrinker)
+static int shrinker_memcg_alloc(struct shrinker *shrinker)
{
return -ENOSYS;
}

-static void unregister_memcg_shrinker(struct shrinker *shrinker)
+static void shrinker_memcg_remove(struct shrinker *shrinker)
{
}

@@ -458,6 +478,8 @@ void shrinker_put(struct shrinker *shrinker)
if (refcount_dec_and_test(&shrinker->refcount)) {
spin_lock(&shrinker_lock);
list_del_rcu(&shrinker->list);
+ if (shrinker->flags & SHRINKER_MEMCG_AWARE)
+ shrinker_memcg_remove(shrinker);
spin_unlock(&shrinker_lock);

kfree(shrinker->nr_deferred);
@@ -476,18 +498,23 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
if (!mem_cgroup_online(memcg))
return 0;

- if (!down_read_trylock(&shrinker_rwsem))
- return 0;
-
- info = shrinker_info_protected(memcg, nid);
+again:
+ rcu_read_lock();
+ info = shrinker_info_rcu(memcg, nid);
if (unlikely(!info))
goto unlock;

- for (; index < shriner_id_to_index(info->map_nr_max); index++) {
+ if (index < shriner_id_to_index(info->map_nr_max)) {
struct shrinker_info_unit *unit;

unit = info->unit[index];

+ /*
+ * The shrinker_info_unit will not be freed, so we can
+ * safely release the RCU lock here.
+ */
+ rcu_read_unlock();
+
for_each_set_bit(offset, unit->map, SHRINKER_UNIT_BITS) {
struct shrink_control sc = {
.gfp_mask = gfp_mask,
@@ -497,13 +524,14 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
struct shrinker *shrinker;
int shrinker_id = calc_shrinker_id(index, offset);

+ rcu_read_lock();
shrinker = idr_find(&shrinker_idr, shrinker_id);
- if (unlikely(!shrinker ||
- !READ_ONCE(shrinker->registered))) {
- if (!shrinker)
- clear_bit(offset, unit->map);
+ if (unlikely(!shrinker || !shrinker_try_get(shrinker))) {
+ clear_bit(offset, unit->map);
+ rcu_read_unlock();
continue;
}
+ rcu_read_unlock();

/* Call non-slab shrinkers even though kmem is disabled */
if (!memcg_kmem_online() &&
@@ -536,15 +564,20 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
set_shrinker_bit(memcg, nid, shrinker_id);
}
freed += ret;
-
- if (rwsem_is_contended(&shrinker_rwsem)) {
- freed = freed ? : 1;
- goto unlock;
- }
+ shrinker_put(shrinker);
}
+
+ /*
+ * We have already exited the read-side of rcu critical section
+ * before calling do_shrink_slab(), the shrinker_info may be
+ * released in expand_one_shrinker_info(), so reacquire the
+ * shrinker_info.
+ */
+ index++;
+ goto again;
}
unlock:
- up_read(&shrinker_rwsem);
+ rcu_read_unlock();
return freed;
}
#else /* !CONFIG_MEMCG */
@@ -652,7 +685,7 @@ struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...)
shrinker->flags = flags;

if (flags & SHRINKER_MEMCG_AWARE) {
- err = prealloc_memcg_shrinker(shrinker);
+ err = shrinker_memcg_alloc(shrinker);
if (err == -ENOSYS)
shrinker->flags &= ~SHRINKER_MEMCG_AWARE;
else if (err == 0)
@@ -697,9 +730,9 @@ void shrinker_free_non_registered(struct shrinker *shrinker)
shrinker->name = NULL;
#endif
if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
- down_write(&shrinker_rwsem);
- unregister_memcg_shrinker(shrinker);
- up_write(&shrinker_rwsem);
+ spin_lock(&shrinker_lock);
+ shrinker_memcg_remove(shrinker);
+ spin_unlock(&shrinker_lock);
}

kfree(shrinker->nr_deferred);
@@ -731,13 +764,7 @@ void shrinker_unregister(struct shrinker *shrinker)
return;

WRITE_ONCE(shrinker->registered, false);
-
- down_write(&shrinker_rwsem);
- if (shrinker->flags & SHRINKER_MEMCG_AWARE)
- unregister_memcg_shrinker(shrinker);
debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
- up_write(&shrinker_rwsem);
-
shrinker_debugfs_remove(debugfs_entry, debugfs_id);

shrinker_put(shrinker);
diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index c5573066adbf..badda35464c3 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -246,8 +246,7 @@ struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
{
struct dentry *entry = shrinker->debugfs_entry;

- lockdep_assert_held(&shrinker_rwsem);
-
+ down_write(&shrinker_rwsem);
kfree_const(shrinker->name);
shrinker->name = NULL;

@@ -258,6 +257,7 @@ struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
*/
smp_wmb();
shrinker->debugfs_entry = NULL;
+ up_write(&shrinker_rwsem);

return entry;
}
--
2.30.2


2023-07-24 10:02:59

by Qi Zheng

[permalink] [raw]
Subject: [PATCH v2 38/47] xfs: dynamically allocate the xfs-qm shrinker

In preparation for implementing lockless slab shrink, use the new APIs to
dynamically allocate the xfs-qm shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then releasing the struct xfs_quotainfo
no longer needs to wait for an RCU read-side critical section.

Signed-off-by: Qi Zheng <[email protected]>
---
fs/xfs/xfs_qm.c | 26 +++++++++++++-------------
fs/xfs/xfs_qm.h | 2 +-
2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 6abcc34fafd8..8f1216e1efc1 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -504,8 +504,7 @@ xfs_qm_shrink_scan(
struct shrinker *shrink,
struct shrink_control *sc)
{
- struct xfs_quotainfo *qi = container_of(shrink,
- struct xfs_quotainfo, qi_shrinker);
+ struct xfs_quotainfo *qi = shrink->private_data;
struct xfs_qm_isolate isol;
unsigned long freed;
int error;
@@ -539,8 +538,7 @@ xfs_qm_shrink_count(
struct shrinker *shrink,
struct shrink_control *sc)
{
- struct xfs_quotainfo *qi = container_of(shrink,
- struct xfs_quotainfo, qi_shrinker);
+ struct xfs_quotainfo *qi = shrink->private_data;

return list_lru_shrink_count(&qi->qi_lru, sc);
}
@@ -680,16 +678,18 @@ xfs_qm_init_quotainfo(
if (XFS_IS_PQUOTA_ON(mp))
xfs_qm_set_defquota(mp, XFS_DQTYPE_PROJ, qinf);

- qinf->qi_shrinker.count_objects = xfs_qm_shrink_count;
- qinf->qi_shrinker.scan_objects = xfs_qm_shrink_scan;
- qinf->qi_shrinker.seeks = DEFAULT_SEEKS;
- qinf->qi_shrinker.flags = SHRINKER_NUMA_AWARE;
-
- error = register_shrinker(&qinf->qi_shrinker, "xfs-qm:%s",
- mp->m_super->s_id);
- if (error)
+ qinf->qi_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "xfs-qm:%s",
+ mp->m_super->s_id);
+ if (!qinf->qi_shrinker)
goto out_free_inos;

+ qinf->qi_shrinker->count_objects = xfs_qm_shrink_count;
+ qinf->qi_shrinker->scan_objects = xfs_qm_shrink_scan;
+ qinf->qi_shrinker->seeks = DEFAULT_SEEKS;
+ qinf->qi_shrinker->private_data = qinf;
+
+ shrinker_register(qinf->qi_shrinker);
+
return 0;

out_free_inos:
@@ -718,7 +718,7 @@ xfs_qm_destroy_quotainfo(
qi = mp->m_quotainfo;
ASSERT(qi != NULL);

- unregister_shrinker(&qi->qi_shrinker);
+ shrinker_unregister(qi->qi_shrinker);
list_lru_destroy(&qi->qi_lru);
xfs_qm_destroy_quotainos(qi);
mutex_destroy(&qi->qi_tree_lock);
diff --git a/fs/xfs/xfs_qm.h b/fs/xfs/xfs_qm.h
index 9683f0457d19..d5c9fc4ba591 100644
--- a/fs/xfs/xfs_qm.h
+++ b/fs/xfs/xfs_qm.h
@@ -63,7 +63,7 @@ struct xfs_quotainfo {
struct xfs_def_quota qi_usr_default;
struct xfs_def_quota qi_grp_default;
struct xfs_def_quota qi_prj_default;
- struct shrinker qi_shrinker;
+ struct shrinker *qi_shrinker;

/* Minimum and maximum quota expiration timestamp values. */
time64_t qi_expiry_min;
--
2.30.2


2023-07-24 10:06:38

by Qi Zheng

[permalink] [raw]
Subject: [PATCH v2 32/47] ext4: dynamically allocate the ext4-es shrinker

In preparation for implementing lockless slab shrink, use the new APIs to
dynamically allocate the ext4-es shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then releasing the struct ext4_sb_info
no longer needs to wait for an RCU read-side critical section.

Signed-off-by: Qi Zheng <[email protected]>
---
fs/ext4/ext4.h | 2 +-
fs/ext4/extents_status.c | 22 ++++++++++++----------
2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0a2d55faa095..1bd150d454f5 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1651,7 +1651,7 @@ struct ext4_sb_info {
__u32 s_csum_seed;

/* Reclaim extents from extent status tree */
- struct shrinker s_es_shrinker;
+ struct shrinker *s_es_shrinker;
struct list_head s_es_list; /* List of inodes with reclaimable extents */
long s_es_nr_inode;
struct ext4_es_stats s_es_stats;
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 9b5b8951afb4..8d4a959dd32f 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -1596,7 +1596,7 @@ static unsigned long ext4_es_count(struct shrinker *shrink,
unsigned long nr;
struct ext4_sb_info *sbi;

- sbi = container_of(shrink, struct ext4_sb_info, s_es_shrinker);
+ sbi = shrink->private_data;
nr = percpu_counter_read_positive(&sbi->s_es_stats.es_stats_shk_cnt);
trace_ext4_es_shrink_count(sbi->s_sb, sc->nr_to_scan, nr);
return nr;
@@ -1605,8 +1605,7 @@ static unsigned long ext4_es_count(struct shrinker *shrink,
static unsigned long ext4_es_scan(struct shrinker *shrink,
struct shrink_control *sc)
{
- struct ext4_sb_info *sbi = container_of(shrink,
- struct ext4_sb_info, s_es_shrinker);
+ struct ext4_sb_info *sbi = shrink->private_data;
int nr_to_scan = sc->nr_to_scan;
int ret, nr_shrunk;

@@ -1690,14 +1689,17 @@ int ext4_es_register_shrinker(struct ext4_sb_info *sbi)
if (err)
goto err3;

- sbi->s_es_shrinker.scan_objects = ext4_es_scan;
- sbi->s_es_shrinker.count_objects = ext4_es_count;
- sbi->s_es_shrinker.seeks = DEFAULT_SEEKS;
- err = register_shrinker(&sbi->s_es_shrinker, "ext4-es:%s",
- sbi->s_sb->s_id);
- if (err)
+ sbi->s_es_shrinker = shrinker_alloc(0, "ext4-es:%s", sbi->s_sb->s_id);
+ if (!sbi->s_es_shrinker)
goto err4;

+ sbi->s_es_shrinker->scan_objects = ext4_es_scan;
+ sbi->s_es_shrinker->count_objects = ext4_es_count;
+ sbi->s_es_shrinker->seeks = DEFAULT_SEEKS;
+ sbi->s_es_shrinker->private_data = sbi;
+
+ shrinker_register(sbi->s_es_shrinker);
+
return 0;
err4:
percpu_counter_destroy(&sbi->s_es_stats.es_stats_shk_cnt);
@@ -1716,7 +1718,7 @@ void ext4_es_unregister_shrinker(struct ext4_sb_info *sbi)
percpu_counter_destroy(&sbi->s_es_stats.es_stats_cache_misses);
percpu_counter_destroy(&sbi->s_es_stats.es_stats_all_cnt);
percpu_counter_destroy(&sbi->s_es_stats.es_stats_shk_cnt);
- unregister_shrinker(&sbi->s_es_shrinker);
+ shrinker_unregister(sbi->s_es_shrinker);
}

/*
--
2.30.2


2023-07-24 10:12:51

by Qi Zheng

[permalink] [raw]
Subject: [PATCH v2 39/47] zsmalloc: dynamically allocate the mm-zspool shrinker

In preparation for implementing lockless slab shrink, use the new APIs to
dynamically allocate the mm-zspool shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then releasing the struct zs_pool
no longer needs to wait for an RCU read-side critical section.

Signed-off-by: Qi Zheng <[email protected]>
---
mm/zsmalloc.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 32f5bc4074df..bbbffe313318 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -229,7 +229,7 @@ struct zs_pool {
struct zs_pool_stats stats;

/* Compact classes */
- struct shrinker shrinker;
+ struct shrinker *shrinker;

#ifdef CONFIG_ZSMALLOC_STAT
struct dentry *stat_dentry;
@@ -2082,8 +2082,7 @@ static unsigned long zs_shrinker_scan(struct shrinker *shrinker,
struct shrink_control *sc)
{
unsigned long pages_freed;
- struct zs_pool *pool = container_of(shrinker, struct zs_pool,
- shrinker);
+ struct zs_pool *pool = shrinker->private_data;

/*
* Compact classes and calculate compaction delta.
@@ -2101,8 +2100,7 @@ static unsigned long zs_shrinker_count(struct shrinker *shrinker,
int i;
struct size_class *class;
unsigned long pages_to_free = 0;
- struct zs_pool *pool = container_of(shrinker, struct zs_pool,
- shrinker);
+ struct zs_pool *pool = shrinker->private_data;

for (i = ZS_SIZE_CLASSES - 1; i >= 0; i--) {
class = pool->size_class[i];
@@ -2117,18 +2115,24 @@ static unsigned long zs_shrinker_count(struct shrinker *shrinker,

static void zs_unregister_shrinker(struct zs_pool *pool)
{
- unregister_shrinker(&pool->shrinker);
+ shrinker_unregister(pool->shrinker);
}

static int zs_register_shrinker(struct zs_pool *pool)
{
- pool->shrinker.scan_objects = zs_shrinker_scan;
- pool->shrinker.count_objects = zs_shrinker_count;
- pool->shrinker.batch = 0;
- pool->shrinker.seeks = DEFAULT_SEEKS;
+ pool->shrinker = shrinker_alloc(0, "mm-zspool:%s", pool->name);
+ if (!pool->shrinker)
+ return -ENOMEM;
+
+ pool->shrinker->scan_objects = zs_shrinker_scan;
+ pool->shrinker->count_objects = zs_shrinker_count;
+ pool->shrinker->batch = 0;
+ pool->shrinker->seeks = DEFAULT_SEEKS;
+ pool->shrinker->private_data = pool;

- return register_shrinker(&pool->shrinker, "mm-zspool:%s",
- pool->name);
+ shrinker_register(pool->shrinker);
+
+ return 0;
}

static int calculate_zspage_chain_size(int class_size)
--
2.30.2


2023-07-24 10:34:36

by Qi Zheng

[permalink] [raw]
Subject: [PATCH v2 31/47] mbcache: dynamically allocate the mbcache shrinker

In preparation for implementing lockless slab shrink, use the new APIs to
dynamically allocate the mbcache shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then releasing the struct mb_cache
no longer needs to wait for an RCU read-side critical section.

Signed-off-by: Qi Zheng <[email protected]>
---
fs/mbcache.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/fs/mbcache.c b/fs/mbcache.c
index 2a4b8b549e93..bfecebeec828 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -37,7 +37,7 @@ struct mb_cache {
struct list_head c_list;
/* Number of entries in cache */
unsigned long c_entry_count;
- struct shrinker c_shrink;
+ struct shrinker *c_shrink;
/* Work for shrinking when the cache has too many entries */
struct work_struct c_shrink_work;
};
@@ -293,8 +293,7 @@ EXPORT_SYMBOL(mb_cache_entry_touch);
static unsigned long mb_cache_count(struct shrinker *shrink,
struct shrink_control *sc)
{
- struct mb_cache *cache = container_of(shrink, struct mb_cache,
- c_shrink);
+ struct mb_cache *cache = shrink->private_data;

return cache->c_entry_count;
}
@@ -333,8 +332,7 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
static unsigned long mb_cache_scan(struct shrinker *shrink,
struct shrink_control *sc)
{
- struct mb_cache *cache = container_of(shrink, struct mb_cache,
- c_shrink);
+ struct mb_cache *cache = shrink->private_data;
return mb_cache_shrink(cache, sc->nr_to_scan);
}

@@ -377,15 +375,20 @@ struct mb_cache *mb_cache_create(int bucket_bits)
for (i = 0; i < bucket_count; i++)
INIT_HLIST_BL_HEAD(&cache->c_hash[i]);

- cache->c_shrink.count_objects = mb_cache_count;
- cache->c_shrink.scan_objects = mb_cache_scan;
- cache->c_shrink.seeks = DEFAULT_SEEKS;
- if (register_shrinker(&cache->c_shrink, "mbcache-shrinker")) {
+ cache->c_shrink = shrinker_alloc(0, "mbcache-shrinker");
+ if (!cache->c_shrink) {
kfree(cache->c_hash);
kfree(cache);
goto err_out;
}

+ cache->c_shrink->count_objects = mb_cache_count;
+ cache->c_shrink->scan_objects = mb_cache_scan;
+ cache->c_shrink->seeks = DEFAULT_SEEKS;
+ cache->c_shrink->private_data = cache;
+
+ shrinker_register(cache->c_shrink);
+
INIT_WORK(&cache->c_shrink_work, mb_cache_shrink_worker);

return cache;
@@ -406,7 +409,7 @@ void mb_cache_destroy(struct mb_cache *cache)
{
struct mb_cache_entry *entry, *next;

- unregister_shrinker(&cache->c_shrink);
+ shrinker_unregister(cache->c_shrink);

/*
* We don't bother with any locking. Cache must not be used at this
--
2.30.2


2023-07-24 10:35:49

by Qi Zheng

[permalink] [raw]
Subject: [PATCH v2 34/47] nfsd: dynamically allocate the nfsd-client shrinker

In preparation for implementing lockless slab shrink, use the new APIs to
dynamically allocate the nfsd-client shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then releasing the struct nfsd_net
no longer needs to wait for an RCU read-side critical section.

Acked-by: Chuck Lever <[email protected]>
Signed-off-by: Qi Zheng <[email protected]>
---
fs/nfsd/netns.h | 2 +-
fs/nfsd/nfs4state.c | 20 ++++++++++++--------
2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index ec49b200b797..f669444d5336 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -195,7 +195,7 @@ struct nfsd_net {
int nfs4_max_clients;

atomic_t nfsd_courtesy_clients;
- struct shrinker nfsd_client_shrinker;
+ struct shrinker *nfsd_client_shrinker;
struct work_struct nfsd_shrinker_work;
};

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 3339177f8e2f..c7a4616cd866 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4388,8 +4388,7 @@ static unsigned long
nfsd4_state_shrinker_count(struct shrinker *shrink, struct shrink_control *sc)
{
int count;
- struct nfsd_net *nn = container_of(shrink,
- struct nfsd_net, nfsd_client_shrinker);
+ struct nfsd_net *nn = shrink->private_data;

count = atomic_read(&nn->nfsd_courtesy_clients);
if (!count)
@@ -8125,12 +8124,17 @@ static int nfs4_state_create_net(struct net *net)
INIT_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker);
get_net(net);

- nn->nfsd_client_shrinker.scan_objects = nfsd4_state_shrinker_scan;
- nn->nfsd_client_shrinker.count_objects = nfsd4_state_shrinker_count;
- nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS;
-
- if (register_shrinker(&nn->nfsd_client_shrinker, "nfsd-client"))
+ nn->nfsd_client_shrinker = shrinker_alloc(0, "nfsd-client");
+ if (!nn->nfsd_client_shrinker)
goto err_shrinker;
+
+ nn->nfsd_client_shrinker->scan_objects = nfsd4_state_shrinker_scan;
+ nn->nfsd_client_shrinker->count_objects = nfsd4_state_shrinker_count;
+ nn->nfsd_client_shrinker->seeks = DEFAULT_SEEKS;
+ nn->nfsd_client_shrinker->private_data = nn;
+
+ shrinker_register(nn->nfsd_client_shrinker);
+
return 0;

err_shrinker:
@@ -8228,7 +8232,7 @@ nfs4_state_shutdown_net(struct net *net)
struct list_head *pos, *next, reaplist;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);

- unregister_shrinker(&nn->nfsd_client_shrinker);
+ shrinker_unregister(nn->nfsd_client_shrinker);
cancel_work(&nn->nfsd_shrinker_work);
cancel_delayed_work_sync(&nn->laundromat_work);
locks_end_grace(&nn->nfsd4_manager);
--
2.30.2


2023-07-24 10:36:16

by Qi Zheng

[permalink] [raw]
Subject: [PATCH v2 47/47] mm: shrinker: convert shrinker_rwsem to mutex

Now that there are no readers of shrinker_rwsem, we can simply replace it
with a mutex.

Signed-off-by: Qi Zheng <[email protected]>
---
drivers/md/dm-cache-metadata.c | 2 +-
fs/super.c | 2 +-
mm/shrinker.c | 16 ++++++++--------
mm/shrinker_debug.c | 14 +++++++-------
4 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c
index acffed750e3e..9e0c69958587 100644
--- a/drivers/md/dm-cache-metadata.c
+++ b/drivers/md/dm-cache-metadata.c
@@ -1828,7 +1828,7 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd)
* Replacement block manager (new_bm) is created and old_bm destroyed outside of
* cmd root_lock to avoid ABBA deadlock that would result (due to life-cycle of
* shrinker associated with the block manager's bufio client vs cmd root_lock).
- * - must take shrinker_rwsem without holding cmd->root_lock
+ * - must take shrinker_mutex without holding cmd->root_lock
*/
new_bm = dm_block_manager_create(cmd->bdev, DM_CACHE_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
CACHE_MAX_CONCURRENT_LOCKS);
diff --git a/fs/super.c b/fs/super.c
index 04643fd80886..602cf54eb7da 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -54,7 +54,7 @@ static char *sb_writers_name[SB_FREEZE_LEVELS] = {
* One thing we have to be careful of with a per-sb shrinker is that we don't
* drop the last active reference to the superblock from within the shrinker.
* If that happens we could trigger unregistering the shrinker from within the
- * shrinker path and that leads to deadlock on the shrinker_rwsem. Hence we
+ * shrinker path and that leads to deadlock on the shrinker_mutex. Hence we
* take a passive reference to the superblock to avoid this from occurring.
*/
static unsigned long super_cache_scan(struct shrinker *shrink,
diff --git a/mm/shrinker.c b/mm/shrinker.c
index 90c045620fe3..5c4546d2c234 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -7,7 +7,7 @@
#include <trace/events/vmscan.h>

LIST_HEAD(shrinker_list);
-DECLARE_RWSEM(shrinker_rwsem);
+DEFINE_MUTEX(shrinker_mutex);
DEFINE_SPINLOCK(shrinker_lock);

#ifdef CONFIG_MEMCG
@@ -80,7 +80,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
int nid, ret = 0;
int array_size = 0;

- down_write(&shrinker_rwsem);
+ mutex_lock(&shrinker_mutex);
array_size = shrinker_unit_size(shrinker_nr_max);
for_each_node(nid) {
info = kvzalloc_node(sizeof(*info) + array_size, GFP_KERNEL, nid);
@@ -91,7 +91,7 @@ int alloc_shrinker_info(struct mem_cgroup *memcg)
goto err;
rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
}
- up_write(&shrinker_rwsem);
+ mutex_unlock(&shrinker_mutex);

return ret;

@@ -104,7 +104,7 @@ static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
int nid)
{
return rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info,
- lockdep_is_held(&shrinker_rwsem));
+ lockdep_is_held(&shrinker_mutex));
}

static struct shrinker_info *shrinker_info_rcu(struct mem_cgroup *memcg,
@@ -158,7 +158,7 @@ static int expand_shrinker_info(int new_id)
int new_size, old_size = 0;
struct mem_cgroup *memcg;

- down_write(&shrinker_rwsem);
+ mutex_lock(&shrinker_mutex);

if (!root_mem_cgroup)
goto out;
@@ -179,7 +179,7 @@ static int expand_shrinker_info(int new_id)
if (!ret)
shrinker_nr_max = new_nr_max;

- up_write(&shrinker_rwsem);
+ mutex_unlock(&shrinker_mutex);

return ret;
}
@@ -303,7 +303,7 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
parent = root_mem_cgroup;

/* Prevent from concurrent shrinker_info expand */
- down_write(&shrinker_rwsem);
+ mutex_lock(&shrinker_mutex);
for_each_node(nid) {
child_info = shrinker_info_protected(memcg, nid);
parent_info = shrinker_info_protected(parent, nid);
@@ -316,7 +316,7 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
}
}
}
- up_write(&shrinker_rwsem);
+ mutex_unlock(&shrinker_mutex);
}
#else
static int shrinker_memcg_alloc(struct shrinker *shrinker)
diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index badda35464c3..44b620b1919d 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -8,7 +8,7 @@
#include <linux/rculist.h>

/* defined in vmscan.c */
-extern struct rw_semaphore shrinker_rwsem;
+extern struct mutex shrinker_mutex;
extern struct list_head shrinker_list;

static DEFINE_IDA(shrinker_debugfs_ida);
@@ -168,7 +168,7 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
if (!shrinker_debugfs_root)
return 0;

- down_write(&shrinker_rwsem);
+ mutex_lock(&shrinker_mutex);
if (shrinker->debugfs_entry)
goto fail;

@@ -196,7 +196,7 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
&shrinker_debugfs_scan_fops);

fail:
- up_write(&shrinker_rwsem);
+ mutex_unlock(&shrinker_mutex);
return ret;
}

@@ -215,7 +215,7 @@ int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...)
if (!new)
return -ENOMEM;

- down_write(&shrinker_rwsem);
+ mutex_lock(&shrinker_mutex);

old = shrinker->name;
shrinker->name = new;
@@ -233,7 +233,7 @@ int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...)
shrinker->debugfs_entry = entry;
}

- up_write(&shrinker_rwsem);
+ mutex_unlock(&shrinker_mutex);

kfree_const(old);

@@ -246,7 +246,7 @@ struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
{
struct dentry *entry = shrinker->debugfs_entry;

- down_write(&shrinker_rwsem);
+ mutex_lock(&shrinker_mutex);
kfree_const(shrinker->name);
shrinker->name = NULL;

@@ -257,7 +257,7 @@ struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
*/
smp_wmb();
shrinker->debugfs_entry = NULL;
- up_write(&shrinker_rwsem);
+ mutex_unlock(&shrinker_mutex);

return entry;
}
--
2.30.2


2023-07-24 12:36:29

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 03/47] mm: shrinker: add infrastructure for dynamically allocating shrinker

On Mon, Jul 24, 2023 at 05:43:10PM +0800, Qi Zheng wrote:

> +void shrinker_unregister(struct shrinker *shrinker)
> +{
> + struct dentry *debugfs_entry;
> + int debugfs_id;
> +
> + if (!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))
> + return;
> +
> + down_write(&shrinker_rwsem);
> + list_del(&shrinker->list);
> + shrinker->flags &= ~SHRINKER_REGISTERED;
> + if (shrinker->flags & SHRINKER_MEMCG_AWARE)
> + unregister_memcg_shrinker(shrinker);
> + debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
> + up_write(&shrinker_rwsem);
> +
> + shrinker_debugfs_remove(debugfs_entry, debugfs_id);

Should there not be an rcu_barrier() right about here?

> +
> + kfree(shrinker->nr_deferred);
> + shrinker->nr_deferred = NULL;
> +
> + kfree(shrinker);
> +}
> +EXPORT_SYMBOL(shrinker_unregister);


2023-07-25 03:28:49

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH v2 01/47] mm: vmscan: move shrinker-related code into a separate file



> On Jul 24, 2023, at 17:43, Qi Zheng <[email protected]> wrote:
>
> The mm/vmscan.c file is too large, so separate the shrinker-related
> code from it into a separate file. No functional changes.
>
> Signed-off-by: Qi Zheng <[email protected]>
> ---
> include/linux/shrinker.h | 3 +
> mm/Makefile | 4 +-
> mm/shrinker.c | 707 +++++++++++++++++++++++++++++++++++++++
> mm/vmscan.c | 701 --------------------------------------
> 4 files changed, 712 insertions(+), 703 deletions(-)
> create mode 100644 mm/shrinker.c
>
> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index 224293b2dd06..961cb84e51f5 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -96,6 +96,9 @@ struct shrinker {
> */
> #define SHRINKER_NONSLAB (1 << 3)
>
> +unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
> + int priority);

A good cleanup, vmscan.c is so huge.

I'd like to introduce a new header in the mm/ directory containing those
function declarations (like this one and the other debug functions in
shrinker_debug.c), since they are used internally across mm.
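
As a rough illustration of that suggestion (the header name and exact
contents here are hypothetical; the declarations simply mirror the ones
this series already shares between mm/vmscan.c, mm/shrinker.c and
mm/shrinker_debug.c):

/* mm/shrinker_internal.h -- name and layout are only a sketch */
#ifndef _MM_SHRINKER_INTERNAL_H
#define _MM_SHRINKER_INTERNAL_H

#include <linux/types.h>

struct mem_cgroup;
struct shrinker;

/* shared between vmscan.c and shrinker.c */
unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
			  int priority);

/*
 * The shrinker_debugfs_*() declarations currently shared between
 * shrinker.c and shrinker_debug.c would move here as well.
 */

#endif /* _MM_SHRINKER_INTERNAL_H */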

Thanks.


2023-07-25 03:34:10

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH v2 02/47] mm: shrinker: remove redundant shrinker_rwsem in debugfs operations



> On Jul 24, 2023, at 17:43, Qi Zheng <[email protected]> wrote:
>
> The debugfs_remove_recursive() will wait for debugfs_file_put() to return,
> so the shrinker will not be freed when doing debugfs operations (such as
> shrinker_debugfs_count_show() and shrinker_debugfs_scan_write()), so there
> is no need to hold shrinker_rwsem during debugfs operations.
>
> Signed-off-by: Qi Zheng <[email protected]>

Reviewed-by: Muchun Song <[email protected]>

Thanks.



2023-07-25 03:34:40

by Qi Zheng

[permalink] [raw]
Subject: Re: [PATCH v2 01/47] mm: vmscan: move shrinker-related code into a separate file



On 2023/7/25 10:35, Muchun Song wrote:
>
>
>> On Jul 24, 2023, at 17:43, Qi Zheng <[email protected]> wrote:
>>
>> The mm/vmscan.c file is too large, so separate the shrinker-related
>> code from it into a separate file. No functional changes.
>>
>> Signed-off-by: Qi Zheng <[email protected]>
>> ---
>> include/linux/shrinker.h | 3 +
>> mm/Makefile | 4 +-
>> mm/shrinker.c | 707 +++++++++++++++++++++++++++++++++++++++
>> mm/vmscan.c | 701 --------------------------------------
>> 4 files changed, 712 insertions(+), 703 deletions(-)
>> create mode 100644 mm/shrinker.c
>>
>> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
>> index 224293b2dd06..961cb84e51f5 100644
>> --- a/include/linux/shrinker.h
>> +++ b/include/linux/shrinker.h
>> @@ -96,6 +96,9 @@ struct shrinker {
>> */
>> #define SHRINKER_NONSLAB (1 << 3)
>>
>> +unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
>> + int priority);
>
> A good cleanup, vmscan.c is so huge.
>
> I'd like to introduce a new header in mm/ directory and contains those
> declarations of functions (like this and other debug function in
> shrinker_debug.c) since they are used internally across mm.

How about putting them in the mm/internal.h file?

>
> Thanks.
>

2023-07-25 03:35:27

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH v2 01/47] mm: vmscan: move shrinker-related code into a separate file



> On Jul 25, 2023, at 11:09, Qi Zheng <[email protected]> wrote:
>
>
>
> On 2023/7/25 10:35, Muchun Song wrote:
>>> On Jul 24, 2023, at 17:43, Qi Zheng <[email protected]> wrote:
>>>
>>> The mm/vmscan.c file is too large, so separate the shrinker-related
>>> code from it into a separate file. No functional changes.
>>>
>>> Signed-off-by: Qi Zheng <[email protected]>
>>> ---
>>> include/linux/shrinker.h | 3 +
>>> mm/Makefile | 4 +-
>>> mm/shrinker.c | 707 +++++++++++++++++++++++++++++++++++++++
>>> mm/vmscan.c | 701 --------------------------------------
>>> 4 files changed, 712 insertions(+), 703 deletions(-)
>>> create mode 100644 mm/shrinker.c
>>>
>>> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
>>> index 224293b2dd06..961cb84e51f5 100644
>>> --- a/include/linux/shrinker.h
>>> +++ b/include/linux/shrinker.h
>>> @@ -96,6 +96,9 @@ struct shrinker {
>>> */
>>> #define SHRINKER_NONSLAB (1 << 3)
>>>
>>> +unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
>>> + int priority);
>> A good cleanup, vmscan.c is so huge.
>> I'd like to introduce a new header in mm/ directory and contains those
>> declarations of functions (like this and other debug function in
>> shrinker_debug.c) since they are used internally across mm.
>
> How about putting them in the mm/internal.h file?

Either is fine to me.

>
>> Thanks.



2023-07-25 03:56:21

by Qi Zheng

[permalink] [raw]
Subject: Re: [PATCH v2 01/47] mm: vmscan: move shrinker-related code into a separate file



On 2023/7/25 11:23, Muchun Song wrote:
>
>
>> On Jul 25, 2023, at 11:09, Qi Zheng <[email protected]> wrote:
>>
>>
>>
>> On 2023/7/25 10:35, Muchun Song wrote:
>>>> On Jul 24, 2023, at 17:43, Qi Zheng <[email protected]> wrote:
>>>>
>>>> The mm/vmscan.c file is too large, so separate the shrinker-related
>>>> code from it into a separate file. No functional changes.
>>>>
>>>> Signed-off-by: Qi Zheng <[email protected]>
>>>> ---
>>>> include/linux/shrinker.h | 3 +
>>>> mm/Makefile | 4 +-
>>>> mm/shrinker.c | 707 +++++++++++++++++++++++++++++++++++++++
>>>> mm/vmscan.c | 701 --------------------------------------
>>>> 4 files changed, 712 insertions(+), 703 deletions(-)
>>>> create mode 100644 mm/shrinker.c
>>>>
>>>> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
>>>> index 224293b2dd06..961cb84e51f5 100644
>>>> --- a/include/linux/shrinker.h
>>>> +++ b/include/linux/shrinker.h
>>>> @@ -96,6 +96,9 @@ struct shrinker {
>>>> */
>>>> #define SHRINKER_NONSLAB (1 << 3)
>>>>
>>>> +unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
>>>> + int priority);
>>> A good cleanup, vmscan.c is so huge.
>>> I'd like to introduce a new header in mm/ directory and contains those
>>> declarations of functions (like this and other debug function in
>>> shrinker_debug.c) since they are used internally across mm.
>>
>> How about putting them in the mm/internal.h file?
>
> Either is fine to me.

OK, will do in the next version.

>
>>
>>> Thanks.
>
>

2023-07-25 09:22:10

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH v2 03/47] mm: shrinker: add infrastructure for dynamically allocating shrinker



On 2023/7/24 17:43, Qi Zheng wrote:
> Currently, the shrinker instances can be divided into the following three
> types:
>
> a) global shrinker instance statically defined in the kernel, such as
> workingset_shadow_shrinker.
>
> b) global shrinker instance statically defined in the kernel modules, such
> as mmu_shrinker in x86.
>
> c) shrinker instance embedded in other structures.
>
> For case a, the memory of shrinker instance is never freed. For case b,
> the memory of shrinker instance will be freed after synchronize_rcu() when
> the module is unloaded. For case c, the memory of shrinker instance will
> be freed along with the structure it is embedded in.
>
> In preparation for implementing lockless slab shrink, we need to
> dynamically allocate those shrinker instances in case c, then the memory
> can be dynamically freed alone by calling kfree_rcu().
>
> So this commit adds the following new APIs for dynamically allocating
> shrinker, and add a private_data field to struct shrinker to record and
> get the original embedded structure.
>
> 1. shrinker_alloc()
>
> Used to allocate shrinker instance itself and related memory, it will
> return a pointer to the shrinker instance on success and NULL on failure.
>
> 2. shrinker_free_non_registered()
>
> Used to destroy the non-registered shrinker instance.

At least I don't like this name. I know you want to tell others that this
function should only be called when the shrinker has been allocated but
not yet registered. Maybe shrinker_free() is simpler. And add a comment
to tell the users when to use it.
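
For example, the renamed helper could carry a kernel-doc comment along
these lines (the wording is only illustrative, not from the patch):

/**
 * shrinker_free - free a shrinker that was allocated but never registered
 * @shrinker: shrinker returned by shrinker_alloc()
 *
 * Only call this on a shrinker that has not been passed to
 * shrinker_register(), e.g. on an error path after shrinker_alloc();
 * a registered shrinker must be torn down with shrinker_unregister().
 */
void shrinker_free(struct shrinker *shrinker);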

>
> 3. shrinker_register()
>
> Used to register the shrinker instance, which is same as the current
> register_shrinker_prepared().
>
> 4. shrinker_unregister()
>
> Used to unregister and free the shrinker instance.
>
> In order to simplify shrinker-related APIs and make shrinker more
> independent of other kernel mechanisms, subsequent submissions will use
> the above API to convert all shrinkers (including case a and b) to
> dynamically allocated, and then remove all existing APIs.
>
> This will also have another advantage mentioned by Dave Chinner:
>
> ```
> The other advantage of this is that it will break all the existing
> out of tree code and third party modules using the old API and will
> no longer work with a kernel using lockless slab shrinkers. They
> need to break (both at the source and binary levels) to stop bad
> things from happening due to using uncoverted shrinkers in the new
> setup.
> ```
>
> Signed-off-by: Qi Zheng <[email protected]>
> ---
> include/linux/shrinker.h | 6 +++
> mm/shrinker.c | 113 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 119 insertions(+)
>
> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index 961cb84e51f5..296f5e163861 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -70,6 +70,8 @@ struct shrinker {
> int seeks; /* seeks to recreate an obj */
> unsigned flags;
>
> + void *private_data;
> +
> /* These are for internal use */
> struct list_head list;
> #ifdef CONFIG_MEMCG
> @@ -98,6 +100,10 @@ struct shrinker {
>
> unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
> int priority);
> +struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...);
> +void shrinker_free_non_registered(struct shrinker *shrinker);
> +void shrinker_register(struct shrinker *shrinker);
> +void shrinker_unregister(struct shrinker *shrinker);
>
> extern int __printf(2, 3) prealloc_shrinker(struct shrinker *shrinker,
> const char *fmt, ...);
> diff --git a/mm/shrinker.c b/mm/shrinker.c
> index 0a32ef42f2a7..d820e4cc5806 100644
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -548,6 +548,119 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
> return freed;
> }
>
> +struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...)
> +{
> + struct shrinker *shrinker;
> + unsigned int size;
> + va_list __maybe_unused ap;
> + int err;
> +
> + shrinker = kzalloc(sizeof(struct shrinker), GFP_KERNEL);
> + if (!shrinker)
> + return NULL;
> +
> +#ifdef CONFIG_SHRINKER_DEBUG
> + va_start(ap, fmt);
> + shrinker->name = kvasprintf_const(GFP_KERNEL, fmt, ap);
> + va_end(ap);
> + if (!shrinker->name)
> + goto err_name;
> +#endif

So why not introduce another helper to handle this, and make it a no-op
that returns 0 when !CONFIG_SHRINKER_DEBUG? Something like the
following:

#ifdef CONFIG_SHRINKER_DEBUG
static int shrinker_debugfs_name_alloc(struct shrinker *shrinker,
				       const char *fmt, va_list vargs)
{
	shrinker->name = kvasprintf_const(GFP_KERNEL, fmt, vargs);
	return shrinker->name ? 0 : -ENOMEM;
}
#else
static int shrinker_debugfs_name_alloc(struct shrinker *shrinker,
				       const char *fmt, va_list vargs)
{
	return 0;
}
#endif
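
shrinker_alloc() could then call it unconditionally, e.g. (just a sketch,
reusing the err and ap variables and the err_name label already in the
patch; the #ifdef around va_start()/va_end() would go away):

	va_start(ap, fmt);
	err = shrinker_debugfs_name_alloc(shrinker, fmt, ap);
	va_end(ap);
	if (err)
		goto err_name;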

> + shrinker->flags = flags;
> +
> + if (flags & SHRINKER_MEMCG_AWARE) {
> + err = prealloc_memcg_shrinker(shrinker);
> + if (err == -ENOSYS)
> + shrinker->flags &= ~SHRINKER_MEMCG_AWARE;
> + else if (err == 0)
> + goto done;
> + else
> + goto err_flags;
> + }
> +
> + /*
> + * The nr_deferred is available on per memcg level for memcg aware
> + * shrinkers, so only allocate nr_deferred in the following cases:
> + * - non memcg aware shrinkers
> + * - !CONFIG_MEMCG
> + * - memcg is disabled by kernel command line
> + */
> + size = sizeof(*shrinker->nr_deferred);
> + if (flags & SHRINKER_NUMA_AWARE)
> + size *= nr_node_ids;
> +
> + shrinker->nr_deferred = kzalloc(size, GFP_KERNEL);
> + if (!shrinker->nr_deferred)
> + goto err_flags;
> +
> +done:
> + return shrinker;
> +
> +err_flags:
> +#ifdef CONFIG_SHRINKER_DEBUG
> + kfree_const(shrinker->name);
> + shrinker->name = NULL;

This could be shrinker_debugfs_name_free()

> +err_name:
> +#endif
> + kfree(shrinker);
> + return NULL;
> +}
> +EXPORT_SYMBOL(shrinker_alloc);
> +
> +void shrinker_free_non_registered(struct shrinker *shrinker)
> +{
> +#ifdef CONFIG_SHRINKER_DEBUG
> + kfree_const(shrinker->name);
> + shrinker->name = NULL;

This could be shrinker_debugfs_name_free()

> +#endif
> + if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
> + down_write(&shrinker_rwsem);
> + unregister_memcg_shrinker(shrinker);
> + up_write(&shrinker_rwsem);
> + }
> +
> + kfree(shrinker->nr_deferred);
> + shrinker->nr_deferred = NULL;
> +
> + kfree(shrinker);
> +}
> +EXPORT_SYMBOL(shrinker_free_non_registered);
> +
> +void shrinker_register(struct shrinker *shrinker)
> +{
> + down_write(&shrinker_rwsem);
> + list_add_tail(&shrinker->list, &shrinker_list);
> + shrinker->flags |= SHRINKER_REGISTERED;
> + shrinker_debugfs_add(shrinker);
> + up_write(&shrinker_rwsem);
> +}
> +EXPORT_SYMBOL(shrinker_register);
> +
> +void shrinker_unregister(struct shrinker *shrinker)

You have made all shrinkers dynamically allocated, so we should prevent
users from statically allocating a shrinker and then unregistering it with
this function. It would be better to add a flag like SHRINKER_ALLOCATED
which is set in shrinker_alloc(), and check whether it is set in
shrinker_unregister(); if not, a warning should be emitted to tell the
users what happened.
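
Something along these lines, where SHRINKER_ALLOCATED is a hypothetical
new flag bit (the name and placement are illustrative, not part of the
posted patch):

	/* in shrinker_alloc(), once the allocation succeeds */
	shrinker->flags = flags | SHRINKER_ALLOCATED;

	/* in shrinker_unregister() */
	if (!shrinker)
		return;
	if (WARN_ON_ONCE(!(shrinker->flags & SHRINKER_ALLOCATED)))
		return;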

> +{
> + struct dentry *debugfs_entry;
> + int debugfs_id;
> +
> + if (!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))
> + return;
> +
> + down_write(&shrinker_rwsem);
> + list_del(&shrinker->list);
> + shrinker->flags &= ~SHRINKER_REGISTERED;
> + if (shrinker->flags & SHRINKER_MEMCG_AWARE)
> + unregister_memcg_shrinker(shrinker);
> + debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);

In the internal of this function, you also could use
shrinker_debugfs_name_free().

Thanks.

> + up_write(&shrinker_rwsem);
> +
> + shrinker_debugfs_remove(debugfs_entry, debugfs_id);
> +
> + kfree(shrinker->nr_deferred);
> + shrinker->nr_deferred = NULL;
> +
> + kfree(shrinker);
> +}
> +EXPORT_SYMBOL(shrinker_unregister);
> +
> /*
> * Add a shrinker callback to be called from the vm.
> */


2023-07-25 09:58:46

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH v2 08/47] erofs: dynamically allocate the erofs-shrinker



> On Jul 24, 2023, at 17:43, Qi Zheng <[email protected]> wrote:
>
> Use new APIs to dynamically allocate the erofs-shrinker.
>
> Signed-off-by: Qi Zheng <[email protected]>

Reviewed-by: Muchun Song <[email protected]>

Thanks.


2023-07-25 10:04:18

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH v2 09/47] f2fs: dynamically allocate the f2fs-shrinker



> On Jul 24, 2023, at 17:43, Qi Zheng <[email protected]> wrote:
>
> Use new APIs to dynamically allocate the f2fs-shrinker.
>
> Signed-off-by: Qi Zheng <[email protected]>

Reviewed-by: Muchun Song <[email protected]>

Thanks.


2023-07-26 07:29:43

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH v2 19/47] mm: thp: dynamically allocate the thp-related shrinkers



On 2023/7/24 17:43, Qi Zheng wrote:
> Use new APIs to dynamically allocate the thp-zero and thp-deferred_split
> shrinkers.
>
> Signed-off-by: Qi Zheng <[email protected]>
> ---
> mm/huge_memory.c | 69 +++++++++++++++++++++++++++++++-----------------
> 1 file changed, 45 insertions(+), 24 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 8c94b34024a2..4db5a1834d81 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -65,7 +65,11 @@ unsigned long transparent_hugepage_flags __read_mostly =
> (1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG)|
> (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);
>
> -static struct shrinker deferred_split_shrinker;
> +static struct shrinker *deferred_split_shrinker;
> +static unsigned long deferred_split_count(struct shrinker *shrink,
> + struct shrink_control *sc);
> +static unsigned long deferred_split_scan(struct shrinker *shrink,
> + struct shrink_control *sc);
>
> static atomic_t huge_zero_refcount;
> struct page *huge_zero_page __read_mostly;
> @@ -229,11 +233,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
> return 0;
> }
>
> -static struct shrinker huge_zero_page_shrinker = {
> - .count_objects = shrink_huge_zero_page_count,
> - .scan_objects = shrink_huge_zero_page_scan,
> - .seeks = DEFAULT_SEEKS,
> -};
> +static struct shrinker *huge_zero_page_shrinker;

Same as patch #17.

>
> #ifdef CONFIG_SYSFS
> static ssize_t enabled_show(struct kobject *kobj,
> @@ -454,6 +454,40 @@ static inline void hugepage_exit_sysfs(struct kobject *hugepage_kobj)
> }
> #endif /* CONFIG_SYSFS */
>
> +static int thp_shrinker_init(void)

Better to declare it as __init.

> +{
> + huge_zero_page_shrinker = shrinker_alloc(0, "thp-zero");
> + if (!huge_zero_page_shrinker)
> + return -ENOMEM;
> +
> + deferred_split_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE |
> + SHRINKER_MEMCG_AWARE |
> + SHRINKER_NONSLAB,
> + "thp-deferred_split");
> + if (!deferred_split_shrinker) {
> + shrinker_free_non_registered(huge_zero_page_shrinker);
> + return -ENOMEM;
> + }
> +
> + huge_zero_page_shrinker->count_objects = shrink_huge_zero_page_count;
> + huge_zero_page_shrinker->scan_objects = shrink_huge_zero_page_scan;
> + huge_zero_page_shrinker->seeks = DEFAULT_SEEKS;
> + shrinker_register(huge_zero_page_shrinker);
> +
> + deferred_split_shrinker->count_objects = deferred_split_count;
> + deferred_split_shrinker->scan_objects = deferred_split_scan;
> + deferred_split_shrinker->seeks = DEFAULT_SEEKS;
> + shrinker_register(deferred_split_shrinker);
> +
> + return 0;
> +}
> +
> +static void thp_shrinker_exit(void)

Same as here.

> +{
> + shrinker_unregister(huge_zero_page_shrinker);
> + shrinker_unregister(deferred_split_shrinker);
> +}
> +
> static int __init hugepage_init(void)
> {
> int err;
> @@ -482,12 +516,9 @@ static int __init hugepage_init(void)
> if (err)
> goto err_slab;
>
> - err = register_shrinker(&huge_zero_page_shrinker, "thp-zero");
> - if (err)
> - goto err_hzp_shrinker;
> - err = register_shrinker(&deferred_split_shrinker, "thp-deferred_split");
> + err = thp_shrinker_init();
> if (err)
> - goto err_split_shrinker;
> + goto err_shrinker;
>
> /*
> * By default disable transparent hugepages on smaller systems,
> @@ -505,10 +536,8 @@ static int __init hugepage_init(void)
>
> return 0;
> err_khugepaged:
> - unregister_shrinker(&deferred_split_shrinker);
> -err_split_shrinker:
> - unregister_shrinker(&huge_zero_page_shrinker);
> -err_hzp_shrinker:
> + thp_shrinker_exit();
> +err_shrinker:
> khugepaged_destroy();
> err_slab:
> hugepage_exit_sysfs(hugepage_kobj);
> @@ -2851,7 +2880,7 @@ void deferred_split_folio(struct folio *folio)
> #ifdef CONFIG_MEMCG
> if (memcg)
> set_shrinker_bit(memcg, folio_nid(folio),
> - deferred_split_shrinker.id);
> + deferred_split_shrinker->id);
> #endif
> }
> spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
> @@ -2925,14 +2954,6 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> return split;
> }
>
> -static struct shrinker deferred_split_shrinker = {
> - .count_objects = deferred_split_count,
> - .scan_objects = deferred_split_scan,
> - .seeks = DEFAULT_SEEKS,
> - .flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE |
> - SHRINKER_NONSLAB,
> -};
> -
> #ifdef CONFIG_DEBUG_FS
> static void split_huge_pages_all(void)
> {


2023-07-26 07:41:04

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH v2 22/47] drm/i915: dynamically allocate the i915_gem_mm shrinker



> On Jul 24, 2023, at 17:43, Qi Zheng <[email protected]> wrote:
>
> In preparation for implementing lockless slab shrink, use new APIs to
> dynamically allocate the i915_gem_mm shrinker, so that it can be freed
> asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
> read-side critical section when releasing the struct drm_i915_private.
>
> Signed-off-by: Qi Zheng <[email protected]>

Reviewed-by: Muchun Song <[email protected]>



2023-07-26 07:42:15

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH v2 03/47] mm: shrinker: add infrastructure for dynamically allocating shrinker

On Mon, Jul 24, 2023 at 05:43:10PM +0800, Qi Zheng wrote:
> Currently, the shrinker instances can be divided into the following three
> types:
>
> a) global shrinker instance statically defined in the kernel, such as
> workingset_shadow_shrinker.
>
> b) global shrinker instance statically defined in the kernel modules, such
> as mmu_shrinker in x86.
>
> c) shrinker instance embedded in other structures.
>
> For case a, the memory of shrinker instance is never freed. For case b,
> the memory of shrinker instance will be freed after synchronize_rcu() when
> the module is unloaded. For case c, the memory of shrinker instance will
> be freed along with the structure it is embedded in.
>
> In preparation for implementing lockless slab shrink, we need to
> dynamically allocate those shrinker instances in case c, then the memory
> can be dynamically freed alone by calling kfree_rcu().
>
> So this commit adds the following new APIs for dynamically allocating
> shrinker, and add a private_data field to struct shrinker to record and
> get the original embedded structure.
>
> 1. shrinker_alloc()
>
> Used to allocate shrinker instance itself and related memory, it will
> return a pointer to the shrinker instance on success and NULL on failure.
>
> 2. shrinker_free_non_registered()
>
> Used to destroy the non-registered shrinker instance.

This is a bit nasty

>
> 3. shrinker_register()
>
> Used to register the shrinker instance, which is same as the current
> register_shrinker_prepared().
>
> 4. shrinker_unregister()

rename this "shrinker_free()" and key the two different freeing
cases on the SHRINKER_REGISTERED bit rather than mostly duplicating
the two.

void shrinker_free(struct shrinker *shrinker)
{
	struct dentry *debugfs_entry = NULL;
	int debugfs_id;

	if (!shrinker)
		return;

	down_write(&shrinker_rwsem);
	if (shrinker->flags & SHRINKER_REGISTERED) {
		list_del(&shrinker->list);
		debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
	} else if (IS_ENABLED(CONFIG_SHRINKER_DEBUG)) {
		kfree_const(shrinker->name);
	}

	if (shrinker->flags & SHRINKER_MEMCG_AWARE)
		unregister_memcg_shrinker(shrinker);
	up_write(&shrinker_rwsem);

	if (debugfs_entry)
		shrinker_debugfs_remove(debugfs_entry, debugfs_id);

	kfree(shrinker->nr_deferred);
	kfree(shrinker);
}
EXPORT_SYMBOL_GPL(shrinker_free);

--
Dave Chinner
[email protected]

2023-07-26 07:50:49

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH v2 30/47] virtio_balloon: dynamically allocate the virtio-balloon shrinker



> On Jul 24, 2023, at 17:43, Qi Zheng <[email protected]> wrote:
>
> In preparation for implementing lockless slab shrink, use new APIs to
> dynamically allocate the virtio-balloon shrinker, so that it can be freed
> asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
> read-side critical section when releasing the struct virtio_balloon.
>
> Signed-off-by: Qi Zheng <[email protected]>

Reviewed-by: Muchun Song <[email protected]>



2023-07-26 08:08:48

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH v2 39/47] zsmalloc: dynamically allocate the mm-zspool shrinker



> On Jul 24, 2023, at 17:43, Qi Zheng <[email protected]> wrote:
>
> In preparation for implementing lockless slab shrink, use new APIs to
> dynamically allocate the mm-zspool shrinker, so that it can be freed
> asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
> read-side critical section when releasing the struct zs_pool.
>
> Signed-off-by: Qi Zheng <[email protected]>

Reviewed-by: Muchun Song <[email protected]>



2023-07-26 08:15:49

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH v2 32/47] ext4: dynamically allocate the ext4-es shrinker



> On Jul 24, 2023, at 17:43, Qi Zheng <[email protected]> wrote:
>
> In preparation for implementing lockless slab shrink, use new APIs to
> dynamically allocate the ext4-es shrinker, so that it can be freed
> asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
> read-side critical section when releasing the struct ext4_sb_info.
>
> Signed-off-by: Qi Zheng <[email protected]>

Reviewed-by: Muchun Song <[email protected]>



2023-07-26 09:38:54

by Qi Zheng

[permalink] [raw]
Subject: Re: [PATCH v2 17/47] rcu: dynamically allocate the rcu-lazy shrinker



On 2023/7/26 15:04, Muchun Song wrote:
>
>
>> On Jul 24, 2023, at 17:43, Qi Zheng <[email protected]> wrote:
>>
>> Use new APIs to dynamically allocate the rcu-lazy shrinker.
>>
>> Signed-off-by: Qi Zheng <[email protected]>
>> ---
>> kernel/rcu/tree_nocb.h | 19 +++++++++++--------
>> 1 file changed, 11 insertions(+), 8 deletions(-)
>>
>> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
>> index 43229d2b0c44..919f17561733 100644
>> --- a/kernel/rcu/tree_nocb.h
>> +++ b/kernel/rcu/tree_nocb.h
>> @@ -1397,12 +1397,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>> return count ? count : SHRINK_STOP;
>> }
>>
>> -static struct shrinker lazy_rcu_shrinker = {
>> - .count_objects = lazy_rcu_shrink_count,
>> - .scan_objects = lazy_rcu_shrink_scan,
>> - .batch = 0,
>> - .seeks = DEFAULT_SEEKS,
>> -};
>> +static struct shrinker *lazy_rcu_shrinker;
>
> Seems there are no users of this variable; maybe we could drop
> it.

Yeah, will change it to a local variable, and will do the same in
patch #15.
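
Roughly the following, inside whatever init path registers it (a sketch
only; the "rcu-lazy" name string is taken from the patch subject, and the
failure handling depends on the enclosing function):

	struct shrinker *lazy_rcu_shrinker;

	lazy_rcu_shrinker = shrinker_alloc(0, "rcu-lazy");
	if (!lazy_rcu_shrinker)
		return;		/* or propagate -ENOMEM, as the caller requires */

	lazy_rcu_shrinker->count_objects = lazy_rcu_shrink_count;
	lazy_rcu_shrinker->scan_objects = lazy_rcu_shrink_scan;
	lazy_rcu_shrinker->batch = 0;
	lazy_rcu_shrinker->seeks = DEFAULT_SEEKS;

	shrinker_register(lazy_rcu_shrinker);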

>

2023-07-26 09:40:29

by Qi Zheng

[permalink] [raw]
Subject: Re: [PATCH v2 19/47] mm: thp: dynamically allocate the thp-related shrinkers



On 2023/7/26 15:10, Muchun Song wrote:
>
>
> On 2023/7/24 17:43, Qi Zheng wrote:
>> Use new APIs to dynamically allocate the thp-zero and thp-deferred_split
>> shrinkers.
>>
>> Signed-off-by: Qi Zheng <[email protected]>
>> ---
>>   mm/huge_memory.c | 69 +++++++++++++++++++++++++++++++-----------------
>>   1 file changed, 45 insertions(+), 24 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 8c94b34024a2..4db5a1834d81 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -65,7 +65,11 @@ unsigned long transparent_hugepage_flags
>> __read_mostly =
>>       (1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG)|
>>       (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);
>> -static struct shrinker deferred_split_shrinker;
>> +static struct shrinker *deferred_split_shrinker;
>> +static unsigned long deferred_split_count(struct shrinker *shrink,
>> +                      struct shrink_control *sc);
>> +static unsigned long deferred_split_scan(struct shrinker *shrink,
>> +                     struct shrink_control *sc);
>>   static atomic_t huge_zero_refcount;
>>   struct page *huge_zero_page __read_mostly;
>> @@ -229,11 +233,7 @@ static unsigned long
>> shrink_huge_zero_page_scan(struct shrinker *shrink,
>>       return 0;
>>   }
>> -static struct shrinker huge_zero_page_shrinker = {
>> -    .count_objects = shrink_huge_zero_page_count,
>> -    .scan_objects = shrink_huge_zero_page_scan,
>> -    .seeks = DEFAULT_SEEKS,
>> -};
>> +static struct shrinker *huge_zero_page_shrinker;
>
> Same as patch #17.

OK, will do.

>
>>   #ifdef CONFIG_SYSFS
>>   static ssize_t enabled_show(struct kobject *kobj,
>> @@ -454,6 +454,40 @@ static inline void hugepage_exit_sysfs(struct
>> kobject *hugepage_kobj)
>>   }
>>   #endif /* CONFIG_SYSFS */
>> +static int thp_shrinker_init(void)
>
> Better to declare it as __init.

Will do.

>
>> +{
>> +    huge_zero_page_shrinker = shrinker_alloc(0, "thp-zero");
>> +    if (!huge_zero_page_shrinker)
>> +        return -ENOMEM;
>> +
>> +    deferred_split_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE |
>> +                         SHRINKER_MEMCG_AWARE |
>> +                         SHRINKER_NONSLAB,
>> +                         "thp-deferred_split");
>> +    if (!deferred_split_shrinker) {
>> +        shrinker_free_non_registered(huge_zero_page_shrinker);
>> +        return -ENOMEM;
>> +    }
>> +
>> +    huge_zero_page_shrinker->count_objects =
>> shrink_huge_zero_page_count;
>> +    huge_zero_page_shrinker->scan_objects = shrink_huge_zero_page_scan;
>> +    huge_zero_page_shrinker->seeks = DEFAULT_SEEKS;
>> +    shrinker_register(huge_zero_page_shrinker);
>> +
>> +    deferred_split_shrinker->count_objects = deferred_split_count;
>> +    deferred_split_shrinker->scan_objects = deferred_split_scan;
>> +    deferred_split_shrinker->seeks = DEFAULT_SEEKS;
>> +    shrinker_register(deferred_split_shrinker);
>> +
>> +    return 0;
>> +}
>> +
>> +static void thp_shrinker_exit(void)
>
> Same as here.

Will do.

>
>> +{
>> +    shrinker_unregister(huge_zero_page_shrinker);
>> +    shrinker_unregister(deferred_split_shrinker);
>> +}
>> +
>>   static int __init hugepage_init(void)
>>   {
>>       int err;
>> @@ -482,12 +516,9 @@ static int __init hugepage_init(void)
>>       if (err)
>>           goto err_slab;
>> -    err = register_shrinker(&huge_zero_page_shrinker, "thp-zero");
>> -    if (err)
>> -        goto err_hzp_shrinker;
>> -    err = register_shrinker(&deferred_split_shrinker,
>> "thp-deferred_split");
>> +    err = thp_shrinker_init();
>>       if (err)
>> -        goto err_split_shrinker;
>> +        goto err_shrinker;
>>       /*
>>        * By default disable transparent hugepages on smaller systems,
>> @@ -505,10 +536,8 @@ static int __init hugepage_init(void)
>>       return 0;
>>   err_khugepaged:
>> -    unregister_shrinker(&deferred_split_shrinker);
>> -err_split_shrinker:
>> -    unregister_shrinker(&huge_zero_page_shrinker);
>> -err_hzp_shrinker:
>> +    thp_shrinker_exit();
>> +err_shrinker:
>>       khugepaged_destroy();
>>   err_slab:
>>       hugepage_exit_sysfs(hugepage_kobj);
>> @@ -2851,7 +2880,7 @@ void deferred_split_folio(struct folio *folio)
>>   #ifdef CONFIG_MEMCG
>>           if (memcg)
>>               set_shrinker_bit(memcg, folio_nid(folio),
>> -                     deferred_split_shrinker.id);
>> +                     deferred_split_shrinker->id);
>>   #endif
>>       }
>>       spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>> @@ -2925,14 +2954,6 @@ static unsigned long deferred_split_scan(struct
>> shrinker *shrink,
>>       return split;
>>   }
>> -static struct shrinker deferred_split_shrinker = {
>> -    .count_objects = deferred_split_count,
>> -    .scan_objects = deferred_split_scan,
>> -    .seeks = DEFAULT_SEEKS,
>> -    .flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE |
>> -         SHRINKER_NONSLAB,
>> -};
>> -
>>   #ifdef CONFIG_DEBUG_FS
>>   static void split_huge_pages_all(void)
>>   {
>