Hi all,
1. Background
=============
We used to implement the lockless slab shrink with SRCU [1], but then kernel
test robot reported -88.8% regression in stress-ng.ramfs.ops_per_sec test
case [2], so we reverted it [3].
This patch series aims to re-implement the lockless slab shrink using the
refcount+RCU method proposed by Dave Chinner [4].
[1]. https://lore.kernel.org/lkml/[email protected]/
[2]. https://lore.kernel.org/lkml/[email protected]/
[3]. https://lore.kernel.org/all/[email protected]/
[4]. https://lore.kernel.org/lkml/[email protected]/
2. Implementation
=================
Currently, the shrinker instances can be divided into the following three types:
a) global shrinker instance statically defined in the kernel, such as
workingset_shadow_shrinker.
b) global shrinker instance statically defined in the kernel modules, such as
mmu_shrinker in x86.
c) shrinker instance embedded in other structures.
For case a, the memory of shrinker instance is never freed. For case b, the
memory of shrinker instance will be freed after synchronize_rcu() when the
module is unloaded. For case c, the memory of shrinker instance will be freed
along with the structure it is embedded in.
In preparation for implementing lockless slab shrink, we need to dynamically
allocate those shrinker instances in case c, then the memory can be dynamically
freed alone by calling kfree_rcu().
This patchset adds the following new APIs for dynamically allocating shrinker,
and add a private_data field to struct shrinker to record and get the original
embedded structure.
1. shrinker_alloc()
2. shrinker_register()
3. shrinker_free()
In order to simplify shrinker-related APIs and make shrinker more independent of
other kernel mechanisms, this patchset uses the above APIs to convert all
shrinkers (including case a and b) to dynamically allocated, and then remove all
existing APIs. This will also have another advantage mentioned by Dave Chinner:
```
The other advantage of this is that it will break all the existing out of tree
code and third party modules using the old API and will no longer work with a
kernel using lockless slab shrinkers. They need to break (both at the source and
binary levels) to stop bad things from happening due to using uncoverted
shrinkers in the new setup.
```
Then we free the shrinker by calling call_rcu(), and use rcu_read_{lock,unlock}()
to ensure that the shrinker instance is valid. And the shrinker::refcount
mechanism ensures that the shrinker instance will not be run again after
unregistration. So the structure that records the pointer of shrinker instance
can be safely freed without waiting for the RCU read-side critical section.
In this way, while we implement the lockless slab shrink, we don't need to be
blocked in unregister_shrinker() to wait RCU read-side critical section.
PATCH 1: fix memory leak in binder_init()
PATCH 2: move some shrinker-related function declarations to mm/internal.h
PATCH 3: move shrinker-related code into a separate file
PATCH 4: remove redundant shrinker_rwsem in debugfs operations
PATCH 5: add infrastructure for dynamically allocating shrinker
PATCH 6 ~ 23: dynamically allocate the shrinker instances in case a and b
PATCH 24 ~ 42: dynamically allocate the shrinker instances in case c
PATCH 43: remove old APIs
PATCH 44: introduce pool_shrink_rwsem to implement private synchronize_shrinkers()
PATCH 45: add a secondary array for shrinker_info::{map, nr_deferred}
PATCH 46 ~ 47: implement the lockless slab shrink
PATCH 48 ~ 49: convert shrinker_rwsem to mutex
3. Testing
==========
3.1 slab shrink stress test
---------------------------
We can reproduce the down_read_trylock() hotspot through the following script:
```
DIR="/root/shrinker/memcg/mnt"
do_create()
{
mkdir -p /sys/fs/cgroup/memory/test
echo 4G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
for i in `seq 0 $1`;
do
mkdir -p /sys/fs/cgroup/memory/test/$i;
echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
mkdir -p $DIR/$i;
done
}
do_mount()
{
for i in `seq $1 $2`;
do
mount -t tmpfs $i $DIR/$i;
done
}
do_touch()
{
for i in `seq $1 $2`;
do
echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
dd if=/dev/zero of=$DIR/$i/file$i bs=1M count=1 &
done
}
case "$1" in
touch)
do_touch $2 $3
;;
test)
do_create 4000
do_mount 0 4000
do_touch 0 3000
;;
*)
exit 1
;;
esac
```
Save the above script, then run test and touch commands. Then we can use the
following perf command to view hotspots:
perf top -U -F 999
1) Before applying this patchset:
40.44% [kernel] [k] down_read_trylock
17.59% [kernel] [k] up_read
13.64% [kernel] [k] pv_native_safe_halt
11.90% [kernel] [k] shrink_slab
8.21% [kernel] [k] idr_find
2.71% [kernel] [k] _find_next_bit
1.36% [kernel] [k] shrink_node
0.81% [kernel] [k] shrink_lruvec
0.80% [kernel] [k] __radix_tree_lookup
0.50% [kernel] [k] do_shrink_slab
0.21% [kernel] [k] list_lru_count_one
0.16% [kernel] [k] mem_cgroup_iter
2) After applying this patchset:
60.17% [kernel] [k] shrink_slab
20.42% [kernel] [k] pv_native_safe_halt
3.03% [kernel] [k] do_shrink_slab
2.73% [kernel] [k] shrink_node
2.27% [kernel] [k] shrink_lruvec
2.00% [kernel] [k] __rcu_read_unlock
1.92% [kernel] [k] mem_cgroup_iter
0.98% [kernel] [k] __rcu_read_lock
0.91% [kernel] [k] osq_lock
0.63% [kernel] [k] mem_cgroup_calculate_protection
0.55% [kernel] [k] shrinker_put
0.46% [kernel] [k] list_lru_count_one
We can see that the first perf hotspot becomes shrink_slab, which is what we
expect.
3.2 registeration and unregisteration stress test
-------------------------------------------------
Run the command below to test:
stress-ng --timeout 60 --times --verify --metrics-brief --ramfs 9 &
1) Before applying this patchset:
setting to a 60 second run per stressor
dispatching hogs: 9 ramfs
stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
(secs) (secs) (secs) (real time) (usr+sys time)
ramfs 735238 60.00 12.37 363.70 12253.05 1955.08
for a 60.01s run time:
1440.27s available CPU time
12.36s user time ( 0.86%)
363.70s system time ( 25.25%)
376.06s total time ( 26.11%)
load average: 10.79 4.47 1.69
passed: 9: ramfs (9)
failed: 0
skipped: 0
successful run completed in 60.01s (1 min, 0.01 secs)
2) After applying this patchset:
setting to a 60 second run per stressor
dispatching hogs: 9 ramfs
stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
(secs) (secs) (secs) (real time) (usr+sys time)
ramfs 746698 60.00 12.45 376.16 12444.02 1921.47
for a 60.01s run time:
1440.28s available CPU time
12.44s user time ( 0.86%)
376.16s system time ( 26.12%)
388.60s total time ( 26.98%)
load average: 9.01 3.85 1.49
passed: 9: ramfs (9)
failed: 0
skipped: 0
successful run completed in 60.01s (1 min, 0.01 secs)
We can see that the ops/s has hardly changed.
This series is based on next-20230726.
Comments and suggestions are welcome.
Thanks,
Qi
Changelog in v2 -> v3:
- add the patch that [PATCH v3 07/49] depends on
- move some shrinker-related function declarations to mm/internal.h
(suggested by Muchun Song)
- combine shrinker_free_non_registered() and shrinker_unregister() into
shrinker_free() (suggested by Dave Chinner)
- add missing __init and fix return value in bch_btree_cache_alloc()
(pointed by Muchun Song)
- remove unnecessary WARN_ON() (pointed by Steven Price)
- go back to use completion to implement lockless slab shrink
(pointed by Dave Chinner)
- collect Acked-bys and Reviewed-bys
- rebase onto the next-20230726.
Changelog in v1 -> v2:
- implement the new APIs and convert all shrinkers to use it.
(suggested by Dave Chinner)
- fix UAF in PATCH [05/29] (pointed by Steven Price)
- add a secondary array for shrinker_info::{map, nr_deferred}
- re-implement the lockless slab shrink
(Since unifying the processing of global and memcg slab shrink needs to
modify the startup sequence (As I mentioned in https://lore.kernel.org/lkml/[email protected]/),
I finally choose to process them separately.)
- collect Acked-bys
Qi Zheng (49):
binder: fix memory leak in binder_init()
mm: move some shrinker-related function declarations to mm/internal.h
mm: vmscan: move shrinker-related code into a separate file
mm: shrinker: remove redundant shrinker_rwsem in debugfs operations
mm: shrinker: add infrastructure for dynamically allocating shrinker
kvm: mmu: dynamically allocate the x86-mmu shrinker
binder: dynamically allocate the android-binder shrinker
drm/ttm: dynamically allocate the drm-ttm_pool shrinker
xenbus/backend: dynamically allocate the xen-backend shrinker
erofs: dynamically allocate the erofs-shrinker
f2fs: dynamically allocate the f2fs-shrinker
gfs2: dynamically allocate the gfs2-glock shrinker
gfs2: dynamically allocate the gfs2-qd shrinker
NFSv4.2: dynamically allocate the nfs-xattr shrinkers
nfs: dynamically allocate the nfs-acl shrinker
nfsd: dynamically allocate the nfsd-filecache shrinker
quota: dynamically allocate the dquota-cache shrinker
ubifs: dynamically allocate the ubifs-slab shrinker
rcu: dynamically allocate the rcu-lazy shrinker
rcu: dynamically allocate the rcu-kfree shrinker
mm: thp: dynamically allocate the thp-related shrinkers
sunrpc: dynamically allocate the sunrpc_cred shrinker
mm: workingset: dynamically allocate the mm-shadow shrinker
drm/i915: dynamically allocate the i915_gem_mm shrinker
drm/msm: dynamically allocate the drm-msm_gem shrinker
drm/panfrost: dynamically allocate the drm-panfrost shrinker
dm: dynamically allocate the dm-bufio shrinker
dm zoned: dynamically allocate the dm-zoned-meta shrinker
md/raid5: dynamically allocate the md-raid5 shrinker
bcache: dynamically allocate the md-bcache shrinker
vmw_balloon: dynamically allocate the vmw-balloon shrinker
virtio_balloon: dynamically allocate the virtio-balloon shrinker
mbcache: dynamically allocate the mbcache shrinker
ext4: dynamically allocate the ext4-es shrinker
jbd2,ext4: dynamically allocate the jbd2-journal shrinker
nfsd: dynamically allocate the nfsd-client shrinker
nfsd: dynamically allocate the nfsd-reply shrinker
xfs: dynamically allocate the xfs-buf shrinker
xfs: dynamically allocate the xfs-inodegc shrinker
xfs: dynamically allocate the xfs-qm shrinker
zsmalloc: dynamically allocate the mm-zspool shrinker
fs: super: dynamically allocate the s_shrink
mm: shrinker: remove old APIs
drm/ttm: introduce pool_shrink_rwsem
mm: shrinker: add a secondary array for shrinker_info::{map,
nr_deferred}
mm: shrinker: make global slab shrink lockless
mm: shrinker: make memcg slab shrink lockless
mm: shrinker: hold write lock to reparent shrinker nr_deferred
mm: shrinker: convert shrinker_rwsem to mutex
arch/x86/kvm/mmu/mmu.c | 18 +-
drivers/android/binder.c | 1 +
drivers/android/binder_alloc.c | 35 +-
drivers/android/binder_alloc.h | 1 +
drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 30 +-
drivers/gpu/drm/i915/i915_drv.h | 2 +-
drivers/gpu/drm/msm/msm_drv.c | 4 +-
drivers/gpu/drm/msm/msm_drv.h | 4 +-
drivers/gpu/drm/msm/msm_gem_shrinker.c | 34 +-
drivers/gpu/drm/panfrost/panfrost_device.h | 2 +-
drivers/gpu/drm/panfrost/panfrost_drv.c | 6 +-
drivers/gpu/drm/panfrost/panfrost_gem.h | 2 +-
.../gpu/drm/panfrost/panfrost_gem_shrinker.c | 30 +-
drivers/gpu/drm/ttm/ttm_pool.c | 38 +-
drivers/md/bcache/bcache.h | 2 +-
drivers/md/bcache/btree.c | 27 +-
drivers/md/bcache/sysfs.c | 3 +-
drivers/md/dm-bufio.c | 26 +-
drivers/md/dm-cache-metadata.c | 2 +-
drivers/md/dm-zoned-metadata.c | 28 +-
drivers/md/raid5.c | 25 +-
drivers/md/raid5.h | 2 +-
drivers/misc/vmw_balloon.c | 38 +-
drivers/virtio/virtio_balloon.c | 25 +-
drivers/xen/xenbus/xenbus_probe_backend.c | 18 +-
fs/btrfs/super.c | 2 +-
fs/erofs/utils.c | 20 +-
fs/ext4/ext4.h | 2 +-
fs/ext4/extents_status.c | 22 +-
fs/f2fs/super.c | 32 +-
fs/gfs2/glock.c | 20 +-
fs/gfs2/main.c | 6 +-
fs/gfs2/quota.c | 26 +-
fs/gfs2/quota.h | 3 +-
fs/jbd2/journal.c | 27 +-
fs/kernfs/mount.c | 2 +-
fs/mbcache.c | 23 +-
fs/nfs/nfs42xattr.c | 87 +-
fs/nfs/super.c | 20 +-
fs/nfsd/filecache.c | 22 +-
fs/nfsd/netns.h | 4 +-
fs/nfsd/nfs4state.c | 20 +-
fs/nfsd/nfscache.c | 31 +-
fs/proc/root.c | 2 +-
fs/quota/dquot.c | 18 +-
fs/super.c | 38 +-
fs/ubifs/super.c | 22 +-
fs/xfs/xfs_buf.c | 25 +-
fs/xfs/xfs_buf.h | 2 +-
fs/xfs/xfs_icache.c | 26 +-
fs/xfs/xfs_mount.c | 4 +-
fs/xfs/xfs_mount.h | 2 +-
fs/xfs/xfs_qm.c | 26 +-
fs/xfs/xfs_qm.h | 2 +-
include/linux/fs.h | 2 +-
include/linux/jbd2.h | 2 +-
include/linux/memcontrol.h | 12 +-
include/linux/shrinker.h | 67 +-
kernel/rcu/tree.c | 22 +-
kernel/rcu/tree_nocb.h | 20 +-
mm/Makefile | 4 +-
mm/huge_memory.c | 69 +-
mm/internal.h | 41 +
mm/shrinker.c | 770 ++++++++++++++++++
mm/shrinker_debug.c | 45 +-
mm/vmscan.c | 701 ----------------
mm/workingset.c | 27 +-
mm/zsmalloc.c | 28 +-
net/sunrpc/auth.c | 19 +-
69 files changed, 1534 insertions(+), 1234 deletions(-)
create mode 100644 mm/shrinker.c
--
2.30.2
The mm/vmscan.c file is too large, so separate the shrinker-related
code from it into a separate file. No functional changes.
Signed-off-by: Qi Zheng <[email protected]>
---
mm/Makefile | 4 +-
mm/internal.h | 2 +
mm/shrinker.c | 709 ++++++++++++++++++++++++++++++++++++++++++++++++++
mm/vmscan.c | 701 -------------------------------------------------
4 files changed, 713 insertions(+), 703 deletions(-)
create mode 100644 mm/shrinker.c
diff --git a/mm/Makefile b/mm/Makefile
index e6d9a1d5e84d..48a2ab9f86ac 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -48,8 +48,8 @@ endif
obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
maccess.o page-writeback.o folio-compat.o \
- readahead.o swap.o truncate.o vmscan.o shmem.o \
- util.o mmzone.o vmstat.o backing-dev.o \
+ readahead.o swap.o truncate.o vmscan.o shrinker.o \
+ shmem.o util.o mmzone.o vmstat.o backing-dev.o \
mm_init.o percpu.o slab_common.o \
compaction.o show_mem.o\
interval_tree.o list_lru.o workingset.o \
diff --git a/mm/internal.h b/mm/internal.h
index 8aeaf16ae039..8b82038dcc6a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1139,6 +1139,8 @@ struct vma_prepare {
/*
* shrinker related functions
*/
+unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
+ int priority);
#ifdef CONFIG_SHRINKER_DEBUG
extern int shrinker_debugfs_add(struct shrinker *shrinker);
diff --git a/mm/shrinker.c b/mm/shrinker.c
new file mode 100644
index 000000000000..043c87ccfab4
--- /dev/null
+++ b/mm/shrinker.c
@@ -0,0 +1,709 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/memcontrol.h>
+#include <linux/rwsem.h>
+#include <linux/shrinker.h>
+#include <trace/events/vmscan.h>
+
+#include "internal.h"
+
+LIST_HEAD(shrinker_list);
+DECLARE_RWSEM(shrinker_rwsem);
+
+#ifdef CONFIG_MEMCG
+static int shrinker_nr_max;
+
+/* The shrinker_info is expanded in a batch of BITS_PER_LONG */
+static inline int shrinker_map_size(int nr_items)
+{
+ return (DIV_ROUND_UP(nr_items, BITS_PER_LONG) * sizeof(unsigned long));
+}
+
+static inline int shrinker_defer_size(int nr_items)
+{
+ return (round_up(nr_items, BITS_PER_LONG) * sizeof(atomic_long_t));
+}
+
+void free_shrinker_info(struct mem_cgroup *memcg)
+{
+ struct mem_cgroup_per_node *pn;
+ struct shrinker_info *info;
+ int nid;
+
+ for_each_node(nid) {
+ pn = memcg->nodeinfo[nid];
+ info = rcu_dereference_protected(pn->shrinker_info, true);
+ kvfree(info);
+ rcu_assign_pointer(pn->shrinker_info, NULL);
+ }
+}
+
+int alloc_shrinker_info(struct mem_cgroup *memcg)
+{
+ struct shrinker_info *info;
+ int nid, size, ret = 0;
+ int map_size, defer_size = 0;
+
+ down_write(&shrinker_rwsem);
+ map_size = shrinker_map_size(shrinker_nr_max);
+ defer_size = shrinker_defer_size(shrinker_nr_max);
+ size = map_size + defer_size;
+ for_each_node(nid) {
+ info = kvzalloc_node(sizeof(*info) + size, GFP_KERNEL, nid);
+ if (!info) {
+ free_shrinker_info(memcg);
+ ret = -ENOMEM;
+ break;
+ }
+ info->nr_deferred = (atomic_long_t *)(info + 1);
+ info->map = (void *)info->nr_deferred + defer_size;
+ info->map_nr_max = shrinker_nr_max;
+ rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
+ }
+ up_write(&shrinker_rwsem);
+
+ return ret;
+}
+
+static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
+ int nid)
+{
+ return rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info,
+ lockdep_is_held(&shrinker_rwsem));
+}
+
+static int expand_one_shrinker_info(struct mem_cgroup *memcg,
+ int map_size, int defer_size,
+ int old_map_size, int old_defer_size,
+ int new_nr_max)
+{
+ struct shrinker_info *new, *old;
+ struct mem_cgroup_per_node *pn;
+ int nid;
+ int size = map_size + defer_size;
+
+ for_each_node(nid) {
+ pn = memcg->nodeinfo[nid];
+ old = shrinker_info_protected(memcg, nid);
+ /* Not yet online memcg */
+ if (!old)
+ return 0;
+
+ /* Already expanded this shrinker_info */
+ if (new_nr_max <= old->map_nr_max)
+ continue;
+
+ new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
+ if (!new)
+ return -ENOMEM;
+
+ new->nr_deferred = (atomic_long_t *)(new + 1);
+ new->map = (void *)new->nr_deferred + defer_size;
+ new->map_nr_max = new_nr_max;
+
+ /* map: set all old bits, clear all new bits */
+ memset(new->map, (int)0xff, old_map_size);
+ memset((void *)new->map + old_map_size, 0, map_size - old_map_size);
+ /* nr_deferred: copy old values, clear all new values */
+ memcpy(new->nr_deferred, old->nr_deferred, old_defer_size);
+ memset((void *)new->nr_deferred + old_defer_size, 0,
+ defer_size - old_defer_size);
+
+ rcu_assign_pointer(pn->shrinker_info, new);
+ kvfree_rcu(old, rcu);
+ }
+
+ return 0;
+}
+
+static int expand_shrinker_info(int new_id)
+{
+ int ret = 0;
+ int new_nr_max = round_up(new_id + 1, BITS_PER_LONG);
+ int map_size, defer_size = 0;
+ int old_map_size, old_defer_size = 0;
+ struct mem_cgroup *memcg;
+
+ if (!root_mem_cgroup)
+ goto out;
+
+ lockdep_assert_held(&shrinker_rwsem);
+
+ map_size = shrinker_map_size(new_nr_max);
+ defer_size = shrinker_defer_size(new_nr_max);
+ old_map_size = shrinker_map_size(shrinker_nr_max);
+ old_defer_size = shrinker_defer_size(shrinker_nr_max);
+
+ memcg = mem_cgroup_iter(NULL, NULL, NULL);
+ do {
+ ret = expand_one_shrinker_info(memcg, map_size, defer_size,
+ old_map_size, old_defer_size,
+ new_nr_max);
+ if (ret) {
+ mem_cgroup_iter_break(NULL, memcg);
+ goto out;
+ }
+ } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
+out:
+ if (!ret)
+ shrinker_nr_max = new_nr_max;
+
+ return ret;
+}
+
+void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
+{
+ if (shrinker_id >= 0 && memcg && !mem_cgroup_is_root(memcg)) {
+ struct shrinker_info *info;
+
+ rcu_read_lock();
+ info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
+ if (!WARN_ON_ONCE(shrinker_id >= info->map_nr_max)) {
+ /* Pairs with smp mb in shrink_slab() */
+ smp_mb__before_atomic();
+ set_bit(shrinker_id, info->map);
+ }
+ rcu_read_unlock();
+ }
+}
+
+static DEFINE_IDR(shrinker_idr);
+
+static int prealloc_memcg_shrinker(struct shrinker *shrinker)
+{
+ int id, ret = -ENOMEM;
+
+ if (mem_cgroup_disabled())
+ return -ENOSYS;
+
+ down_write(&shrinker_rwsem);
+ /* This may call shrinker, so it must use down_read_trylock() */
+ id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
+ if (id < 0)
+ goto unlock;
+
+ if (id >= shrinker_nr_max) {
+ if (expand_shrinker_info(id)) {
+ idr_remove(&shrinker_idr, id);
+ goto unlock;
+ }
+ }
+ shrinker->id = id;
+ ret = 0;
+unlock:
+ up_write(&shrinker_rwsem);
+ return ret;
+}
+
+static void unregister_memcg_shrinker(struct shrinker *shrinker)
+{
+ int id = shrinker->id;
+
+ BUG_ON(id < 0);
+
+ lockdep_assert_held(&shrinker_rwsem);
+
+ idr_remove(&shrinker_idr, id);
+}
+
+static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
+ struct mem_cgroup *memcg)
+{
+ struct shrinker_info *info;
+
+ info = shrinker_info_protected(memcg, nid);
+ return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0);
+}
+
+static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
+ struct mem_cgroup *memcg)
+{
+ struct shrinker_info *info;
+
+ info = shrinker_info_protected(memcg, nid);
+ return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]);
+}
+
+void reparent_shrinker_deferred(struct mem_cgroup *memcg)
+{
+ int i, nid;
+ long nr;
+ struct mem_cgroup *parent;
+ struct shrinker_info *child_info, *parent_info;
+
+ parent = parent_mem_cgroup(memcg);
+ if (!parent)
+ parent = root_mem_cgroup;
+
+ /* Prevent from concurrent shrinker_info expand */
+ down_read(&shrinker_rwsem);
+ for_each_node(nid) {
+ child_info = shrinker_info_protected(memcg, nid);
+ parent_info = shrinker_info_protected(parent, nid);
+ for (i = 0; i < child_info->map_nr_max; i++) {
+ nr = atomic_long_read(&child_info->nr_deferred[i]);
+ atomic_long_add(nr, &parent_info->nr_deferred[i]);
+ }
+ }
+ up_read(&shrinker_rwsem);
+}
+#else
+static int prealloc_memcg_shrinker(struct shrinker *shrinker)
+{
+ return -ENOSYS;
+}
+
+static void unregister_memcg_shrinker(struct shrinker *shrinker)
+{
+}
+
+static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
+ struct mem_cgroup *memcg)
+{
+ return 0;
+}
+
+static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
+ struct mem_cgroup *memcg)
+{
+ return 0;
+}
+#endif /* CONFIG_MEMCG */
+
+static long xchg_nr_deferred(struct shrinker *shrinker,
+ struct shrink_control *sc)
+{
+ int nid = sc->nid;
+
+ if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
+ nid = 0;
+
+ if (sc->memcg &&
+ (shrinker->flags & SHRINKER_MEMCG_AWARE))
+ return xchg_nr_deferred_memcg(nid, shrinker,
+ sc->memcg);
+
+ return atomic_long_xchg(&shrinker->nr_deferred[nid], 0);
+}
+
+
+static long add_nr_deferred(long nr, struct shrinker *shrinker,
+ struct shrink_control *sc)
+{
+ int nid = sc->nid;
+
+ if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
+ nid = 0;
+
+ if (sc->memcg &&
+ (shrinker->flags & SHRINKER_MEMCG_AWARE))
+ return add_nr_deferred_memcg(nr, nid, shrinker,
+ sc->memcg);
+
+ return atomic_long_add_return(nr, &shrinker->nr_deferred[nid]);
+}
+
+#define SHRINK_BATCH 128
+
+static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
+ struct shrinker *shrinker, int priority)
+{
+ unsigned long freed = 0;
+ unsigned long long delta;
+ long total_scan;
+ long freeable;
+ long nr;
+ long new_nr;
+ long batch_size = shrinker->batch ? shrinker->batch
+ : SHRINK_BATCH;
+ long scanned = 0, next_deferred;
+
+ freeable = shrinker->count_objects(shrinker, shrinkctl);
+ if (freeable == 0 || freeable == SHRINK_EMPTY)
+ return freeable;
+
+ /*
+ * copy the current shrinker scan count into a local variable
+ * and zero it so that other concurrent shrinker invocations
+ * don't also do this scanning work.
+ */
+ nr = xchg_nr_deferred(shrinker, shrinkctl);
+
+ if (shrinker->seeks) {
+ delta = freeable >> priority;
+ delta *= 4;
+ do_div(delta, shrinker->seeks);
+ } else {
+ /*
+ * These objects don't require any IO to create. Trim
+ * them aggressively under memory pressure to keep
+ * them from causing refetches in the IO caches.
+ */
+ delta = freeable / 2;
+ }
+
+ total_scan = nr >> priority;
+ total_scan += delta;
+ total_scan = min(total_scan, (2 * freeable));
+
+ trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
+ freeable, delta, total_scan, priority);
+
+ /*
+ * Normally, we should not scan less than batch_size objects in one
+ * pass to avoid too frequent shrinker calls, but if the slab has less
+ * than batch_size objects in total and we are really tight on memory,
+ * we will try to reclaim all available objects, otherwise we can end
+ * up failing allocations although there are plenty of reclaimable
+ * objects spread over several slabs with usage less than the
+ * batch_size.
+ *
+ * We detect the "tight on memory" situations by looking at the total
+ * number of objects we want to scan (total_scan). If it is greater
+ * than the total number of objects on slab (freeable), we must be
+ * scanning at high prio and therefore should try to reclaim as much as
+ * possible.
+ */
+ while (total_scan >= batch_size ||
+ total_scan >= freeable) {
+ unsigned long ret;
+ unsigned long nr_to_scan = min(batch_size, total_scan);
+
+ shrinkctl->nr_to_scan = nr_to_scan;
+ shrinkctl->nr_scanned = nr_to_scan;
+ ret = shrinker->scan_objects(shrinker, shrinkctl);
+ if (ret == SHRINK_STOP)
+ break;
+ freed += ret;
+
+ count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned);
+ total_scan -= shrinkctl->nr_scanned;
+ scanned += shrinkctl->nr_scanned;
+
+ cond_resched();
+ }
+
+ /*
+ * The deferred work is increased by any new work (delta) that wasn't
+ * done, decreased by old deferred work that was done now.
+ *
+ * And it is capped to two times of the freeable items.
+ */
+ next_deferred = max_t(long, (nr + delta - scanned), 0);
+ next_deferred = min(next_deferred, (2 * freeable));
+
+ /*
+ * move the unused scan count back into the shrinker in a
+ * manner that handles concurrent updates.
+ */
+ new_nr = add_nr_deferred(next_deferred, shrinker, shrinkctl);
+
+ trace_mm_shrink_slab_end(shrinker, shrinkctl->nid, freed, nr, new_nr, total_scan);
+ return freed;
+}
+
+#ifdef CONFIG_MEMCG
+static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
+ struct mem_cgroup *memcg, int priority)
+{
+ struct shrinker_info *info;
+ unsigned long ret, freed = 0;
+ int i;
+
+ if (!mem_cgroup_online(memcg))
+ return 0;
+
+ if (!down_read_trylock(&shrinker_rwsem))
+ return 0;
+
+ info = shrinker_info_protected(memcg, nid);
+ if (unlikely(!info))
+ goto unlock;
+
+ for_each_set_bit(i, info->map, info->map_nr_max) {
+ struct shrink_control sc = {
+ .gfp_mask = gfp_mask,
+ .nid = nid,
+ .memcg = memcg,
+ };
+ struct shrinker *shrinker;
+
+ shrinker = idr_find(&shrinker_idr, i);
+ if (unlikely(!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))) {
+ if (!shrinker)
+ clear_bit(i, info->map);
+ continue;
+ }
+
+ /* Call non-slab shrinkers even though kmem is disabled */
+ if (!memcg_kmem_online() &&
+ !(shrinker->flags & SHRINKER_NONSLAB))
+ continue;
+
+ ret = do_shrink_slab(&sc, shrinker, priority);
+ if (ret == SHRINK_EMPTY) {
+ clear_bit(i, info->map);
+ /*
+ * After the shrinker reported that it had no objects to
+ * free, but before we cleared the corresponding bit in
+ * the memcg shrinker map, a new object might have been
+ * added. To make sure, we have the bit set in this
+ * case, we invoke the shrinker one more time and reset
+ * the bit if it reports that it is not empty anymore.
+ * The memory barrier here pairs with the barrier in
+ * set_shrinker_bit():
+ *
+ * list_lru_add() shrink_slab_memcg()
+ * list_add_tail() clear_bit()
+ * <MB> <MB>
+ * set_bit() do_shrink_slab()
+ */
+ smp_mb__after_atomic();
+ ret = do_shrink_slab(&sc, shrinker, priority);
+ if (ret == SHRINK_EMPTY)
+ ret = 0;
+ else
+ set_shrinker_bit(memcg, nid, i);
+ }
+ freed += ret;
+
+ if (rwsem_is_contended(&shrinker_rwsem)) {
+ freed = freed ? : 1;
+ break;
+ }
+ }
+unlock:
+ up_read(&shrinker_rwsem);
+ return freed;
+}
+#else /* !CONFIG_MEMCG */
+static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
+ struct mem_cgroup *memcg, int priority)
+{
+ return 0;
+}
+#endif /* CONFIG_MEMCG */
+
+/**
+ * shrink_slab - shrink slab caches
+ * @gfp_mask: allocation context
+ * @nid: node whose slab caches to target
+ * @memcg: memory cgroup whose slab caches to target
+ * @priority: the reclaim priority
+ *
+ * Call the shrink functions to age shrinkable caches.
+ *
+ * @nid is passed along to shrinkers with SHRINKER_NUMA_AWARE set,
+ * unaware shrinkers will receive a node id of 0 instead.
+ *
+ * @memcg specifies the memory cgroup to target. Unaware shrinkers
+ * are called only if it is the root cgroup.
+ *
+ * @priority is sc->priority, we take the number of objects and >> by priority
+ * in order to get the scan target.
+ *
+ * Returns the number of reclaimed slab objects.
+ */
+unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
+ int priority)
+{
+ unsigned long ret, freed = 0;
+ struct shrinker *shrinker;
+
+ /*
+ * The root memcg might be allocated even though memcg is disabled
+ * via "cgroup_disable=memory" boot parameter. This could make
+ * mem_cgroup_is_root() return false, then just run memcg slab
+ * shrink, but skip global shrink. This may result in premature
+ * oom.
+ */
+ if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
+ return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
+
+ if (!down_read_trylock(&shrinker_rwsem))
+ goto out;
+
+ list_for_each_entry(shrinker, &shrinker_list, list) {
+ struct shrink_control sc = {
+ .gfp_mask = gfp_mask,
+ .nid = nid,
+ .memcg = memcg,
+ };
+
+ ret = do_shrink_slab(&sc, shrinker, priority);
+ if (ret == SHRINK_EMPTY)
+ ret = 0;
+ freed += ret;
+ /*
+ * Bail out if someone want to register a new shrinker to
+ * prevent the registration from being stalled for long periods
+ * by parallel ongoing shrinking.
+ */
+ if (rwsem_is_contended(&shrinker_rwsem)) {
+ freed = freed ? : 1;
+ break;
+ }
+ }
+
+ up_read(&shrinker_rwsem);
+out:
+ cond_resched();
+ return freed;
+}
+
+/*
+ * Add a shrinker callback to be called from the vm.
+ */
+static int __prealloc_shrinker(struct shrinker *shrinker)
+{
+ unsigned int size;
+ int err;
+
+ if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
+ err = prealloc_memcg_shrinker(shrinker);
+ if (err != -ENOSYS)
+ return err;
+
+ shrinker->flags &= ~SHRINKER_MEMCG_AWARE;
+ }
+
+ size = sizeof(*shrinker->nr_deferred);
+ if (shrinker->flags & SHRINKER_NUMA_AWARE)
+ size *= nr_node_ids;
+
+ shrinker->nr_deferred = kzalloc(size, GFP_KERNEL);
+ if (!shrinker->nr_deferred)
+ return -ENOMEM;
+
+ return 0;
+}
+
+#ifdef CONFIG_SHRINKER_DEBUG
+int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+ va_list ap;
+ int err;
+
+ va_start(ap, fmt);
+ shrinker->name = kvasprintf_const(GFP_KERNEL, fmt, ap);
+ va_end(ap);
+ if (!shrinker->name)
+ return -ENOMEM;
+
+ err = __prealloc_shrinker(shrinker);
+ if (err) {
+ kfree_const(shrinker->name);
+ shrinker->name = NULL;
+ }
+
+ return err;
+}
+#else
+int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+ return __prealloc_shrinker(shrinker);
+}
+#endif
+
+void free_prealloced_shrinker(struct shrinker *shrinker)
+{
+#ifdef CONFIG_SHRINKER_DEBUG
+ kfree_const(shrinker->name);
+ shrinker->name = NULL;
+#endif
+ if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
+ down_write(&shrinker_rwsem);
+ unregister_memcg_shrinker(shrinker);
+ up_write(&shrinker_rwsem);
+ return;
+ }
+
+ kfree(shrinker->nr_deferred);
+ shrinker->nr_deferred = NULL;
+}
+
+void register_shrinker_prepared(struct shrinker *shrinker)
+{
+ down_write(&shrinker_rwsem);
+ list_add_tail(&shrinker->list, &shrinker_list);
+ shrinker->flags |= SHRINKER_REGISTERED;
+ shrinker_debugfs_add(shrinker);
+ up_write(&shrinker_rwsem);
+}
+
+static int __register_shrinker(struct shrinker *shrinker)
+{
+ int err = __prealloc_shrinker(shrinker);
+
+ if (err)
+ return err;
+ register_shrinker_prepared(shrinker);
+ return 0;
+}
+
+#ifdef CONFIG_SHRINKER_DEBUG
+int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+ va_list ap;
+ int err;
+
+ va_start(ap, fmt);
+ shrinker->name = kvasprintf_const(GFP_KERNEL, fmt, ap);
+ va_end(ap);
+ if (!shrinker->name)
+ return -ENOMEM;
+
+ err = __register_shrinker(shrinker);
+ if (err) {
+ kfree_const(shrinker->name);
+ shrinker->name = NULL;
+ }
+ return err;
+}
+#else
+int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
+{
+ return __register_shrinker(shrinker);
+}
+#endif
+EXPORT_SYMBOL(register_shrinker);
+
+/*
+ * Remove one
+ */
+void unregister_shrinker(struct shrinker *shrinker)
+{
+ struct dentry *debugfs_entry;
+ int debugfs_id;
+
+ if (!(shrinker->flags & SHRINKER_REGISTERED))
+ return;
+
+ down_write(&shrinker_rwsem);
+ list_del(&shrinker->list);
+ shrinker->flags &= ~SHRINKER_REGISTERED;
+ if (shrinker->flags & SHRINKER_MEMCG_AWARE)
+ unregister_memcg_shrinker(shrinker);
+ debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
+ up_write(&shrinker_rwsem);
+
+ shrinker_debugfs_remove(debugfs_entry, debugfs_id);
+
+ kfree(shrinker->nr_deferred);
+ shrinker->nr_deferred = NULL;
+}
+EXPORT_SYMBOL(unregister_shrinker);
+
+/**
+ * synchronize_shrinkers - Wait for all running shrinkers to complete.
+ *
+ * This is equivalent to calling unregister_shrink() and register_shrinker(),
+ * but atomically and with less overhead. This is useful to guarantee that all
+ * shrinker invocations have seen an update, before freeing memory, similar to
+ * rcu.
+ */
+void synchronize_shrinkers(void)
+{
+ down_write(&shrinker_rwsem);
+ up_write(&shrinker_rwsem);
+}
+EXPORT_SYMBOL(synchronize_shrinkers);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4039620d30fe..07bc58af6f26 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -35,7 +35,6 @@
#include <linux/cpuset.h>
#include <linux/compaction.h>
#include <linux/notifier.h>
-#include <linux/rwsem.h>
#include <linux/delay.h>
#include <linux/kthread.h>
#include <linux/freezer.h>
@@ -188,246 +187,7 @@ struct scan_control {
*/
int vm_swappiness = 60;
-LIST_HEAD(shrinker_list);
-DECLARE_RWSEM(shrinker_rwsem);
-
#ifdef CONFIG_MEMCG
-static int shrinker_nr_max;
-
-/* The shrinker_info is expanded in a batch of BITS_PER_LONG */
-static inline int shrinker_map_size(int nr_items)
-{
- return (DIV_ROUND_UP(nr_items, BITS_PER_LONG) * sizeof(unsigned long));
-}
-
-static inline int shrinker_defer_size(int nr_items)
-{
- return (round_up(nr_items, BITS_PER_LONG) * sizeof(atomic_long_t));
-}
-
-static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
- int nid)
-{
- return rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info,
- lockdep_is_held(&shrinker_rwsem));
-}
-
-static int expand_one_shrinker_info(struct mem_cgroup *memcg,
- int map_size, int defer_size,
- int old_map_size, int old_defer_size,
- int new_nr_max)
-{
- struct shrinker_info *new, *old;
- struct mem_cgroup_per_node *pn;
- int nid;
- int size = map_size + defer_size;
-
- for_each_node(nid) {
- pn = memcg->nodeinfo[nid];
- old = shrinker_info_protected(memcg, nid);
- /* Not yet online memcg */
- if (!old)
- return 0;
-
- /* Already expanded this shrinker_info */
- if (new_nr_max <= old->map_nr_max)
- continue;
-
- new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
- if (!new)
- return -ENOMEM;
-
- new->nr_deferred = (atomic_long_t *)(new + 1);
- new->map = (void *)new->nr_deferred + defer_size;
- new->map_nr_max = new_nr_max;
-
- /* map: set all old bits, clear all new bits */
- memset(new->map, (int)0xff, old_map_size);
- memset((void *)new->map + old_map_size, 0, map_size - old_map_size);
- /* nr_deferred: copy old values, clear all new values */
- memcpy(new->nr_deferred, old->nr_deferred, old_defer_size);
- memset((void *)new->nr_deferred + old_defer_size, 0,
- defer_size - old_defer_size);
-
- rcu_assign_pointer(pn->shrinker_info, new);
- kvfree_rcu(old, rcu);
- }
-
- return 0;
-}
-
-void free_shrinker_info(struct mem_cgroup *memcg)
-{
- struct mem_cgroup_per_node *pn;
- struct shrinker_info *info;
- int nid;
-
- for_each_node(nid) {
- pn = memcg->nodeinfo[nid];
- info = rcu_dereference_protected(pn->shrinker_info, true);
- kvfree(info);
- rcu_assign_pointer(pn->shrinker_info, NULL);
- }
-}
-
-int alloc_shrinker_info(struct mem_cgroup *memcg)
-{
- struct shrinker_info *info;
- int nid, size, ret = 0;
- int map_size, defer_size = 0;
-
- down_write(&shrinker_rwsem);
- map_size = shrinker_map_size(shrinker_nr_max);
- defer_size = shrinker_defer_size(shrinker_nr_max);
- size = map_size + defer_size;
- for_each_node(nid) {
- info = kvzalloc_node(sizeof(*info) + size, GFP_KERNEL, nid);
- if (!info) {
- free_shrinker_info(memcg);
- ret = -ENOMEM;
- break;
- }
- info->nr_deferred = (atomic_long_t *)(info + 1);
- info->map = (void *)info->nr_deferred + defer_size;
- info->map_nr_max = shrinker_nr_max;
- rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
- }
- up_write(&shrinker_rwsem);
-
- return ret;
-}
-
-static int expand_shrinker_info(int new_id)
-{
- int ret = 0;
- int new_nr_max = round_up(new_id + 1, BITS_PER_LONG);
- int map_size, defer_size = 0;
- int old_map_size, old_defer_size = 0;
- struct mem_cgroup *memcg;
-
- if (!root_mem_cgroup)
- goto out;
-
- lockdep_assert_held(&shrinker_rwsem);
-
- map_size = shrinker_map_size(new_nr_max);
- defer_size = shrinker_defer_size(new_nr_max);
- old_map_size = shrinker_map_size(shrinker_nr_max);
- old_defer_size = shrinker_defer_size(shrinker_nr_max);
-
- memcg = mem_cgroup_iter(NULL, NULL, NULL);
- do {
- ret = expand_one_shrinker_info(memcg, map_size, defer_size,
- old_map_size, old_defer_size,
- new_nr_max);
- if (ret) {
- mem_cgroup_iter_break(NULL, memcg);
- goto out;
- }
- } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
-out:
- if (!ret)
- shrinker_nr_max = new_nr_max;
-
- return ret;
-}
-
-void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
-{
- if (shrinker_id >= 0 && memcg && !mem_cgroup_is_root(memcg)) {
- struct shrinker_info *info;
-
- rcu_read_lock();
- info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
- if (!WARN_ON_ONCE(shrinker_id >= info->map_nr_max)) {
- /* Pairs with smp mb in shrink_slab() */
- smp_mb__before_atomic();
- set_bit(shrinker_id, info->map);
- }
- rcu_read_unlock();
- }
-}
-
-static DEFINE_IDR(shrinker_idr);
-
-static int prealloc_memcg_shrinker(struct shrinker *shrinker)
-{
- int id, ret = -ENOMEM;
-
- if (mem_cgroup_disabled())
- return -ENOSYS;
-
- down_write(&shrinker_rwsem);
- /* This may call shrinker, so it must use down_read_trylock() */
- id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
- if (id < 0)
- goto unlock;
-
- if (id >= shrinker_nr_max) {
- if (expand_shrinker_info(id)) {
- idr_remove(&shrinker_idr, id);
- goto unlock;
- }
- }
- shrinker->id = id;
- ret = 0;
-unlock:
- up_write(&shrinker_rwsem);
- return ret;
-}
-
-static void unregister_memcg_shrinker(struct shrinker *shrinker)
-{
- int id = shrinker->id;
-
- BUG_ON(id < 0);
-
- lockdep_assert_held(&shrinker_rwsem);
-
- idr_remove(&shrinker_idr, id);
-}
-
-static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
- struct mem_cgroup *memcg)
-{
- struct shrinker_info *info;
-
- info = shrinker_info_protected(memcg, nid);
- return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0);
-}
-
-static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
- struct mem_cgroup *memcg)
-{
- struct shrinker_info *info;
-
- info = shrinker_info_protected(memcg, nid);
- return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]);
-}
-
-void reparent_shrinker_deferred(struct mem_cgroup *memcg)
-{
- int i, nid;
- long nr;
- struct mem_cgroup *parent;
- struct shrinker_info *child_info, *parent_info;
-
- parent = parent_mem_cgroup(memcg);
- if (!parent)
- parent = root_mem_cgroup;
-
- /* Prevent from concurrent shrinker_info expand */
- down_read(&shrinker_rwsem);
- for_each_node(nid) {
- child_info = shrinker_info_protected(memcg, nid);
- parent_info = shrinker_info_protected(parent, nid);
- for (i = 0; i < child_info->map_nr_max; i++) {
- nr = atomic_long_read(&child_info->nr_deferred[i]);
- atomic_long_add(nr, &parent_info->nr_deferred[i]);
- }
- }
- up_read(&shrinker_rwsem);
-}
/* Returns true for reclaim through cgroup limits or cgroup interfaces. */
static bool cgroup_reclaim(struct scan_control *sc)
@@ -468,27 +228,6 @@ static bool writeback_throttling_sane(struct scan_control *sc)
return false;
}
#else
-static int prealloc_memcg_shrinker(struct shrinker *shrinker)
-{
- return -ENOSYS;
-}
-
-static void unregister_memcg_shrinker(struct shrinker *shrinker)
-{
-}
-
-static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
- struct mem_cgroup *memcg)
-{
- return 0;
-}
-
-static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
- struct mem_cgroup *memcg)
-{
- return 0;
-}
-
static bool cgroup_reclaim(struct scan_control *sc)
{
return false;
@@ -557,39 +296,6 @@ static void flush_reclaim_state(struct scan_control *sc)
}
}
-static long xchg_nr_deferred(struct shrinker *shrinker,
- struct shrink_control *sc)
-{
- int nid = sc->nid;
-
- if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
- nid = 0;
-
- if (sc->memcg &&
- (shrinker->flags & SHRINKER_MEMCG_AWARE))
- return xchg_nr_deferred_memcg(nid, shrinker,
- sc->memcg);
-
- return atomic_long_xchg(&shrinker->nr_deferred[nid], 0);
-}
-
-
-static long add_nr_deferred(long nr, struct shrinker *shrinker,
- struct shrink_control *sc)
-{
- int nid = sc->nid;
-
- if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
- nid = 0;
-
- if (sc->memcg &&
- (shrinker->flags & SHRINKER_MEMCG_AWARE))
- return add_nr_deferred_memcg(nr, nid, shrinker,
- sc->memcg);
-
- return atomic_long_add_return(nr, &shrinker->nr_deferred[nid]);
-}
-
static bool can_demote(int nid, struct scan_control *sc)
{
if (!numa_demotion_enabled)
@@ -671,413 +377,6 @@ static unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru,
return size;
}
-/*
- * Add a shrinker callback to be called from the vm.
- */
-static int __prealloc_shrinker(struct shrinker *shrinker)
-{
- unsigned int size;
- int err;
-
- if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
- err = prealloc_memcg_shrinker(shrinker);
- if (err != -ENOSYS)
- return err;
-
- shrinker->flags &= ~SHRINKER_MEMCG_AWARE;
- }
-
- size = sizeof(*shrinker->nr_deferred);
- if (shrinker->flags & SHRINKER_NUMA_AWARE)
- size *= nr_node_ids;
-
- shrinker->nr_deferred = kzalloc(size, GFP_KERNEL);
- if (!shrinker->nr_deferred)
- return -ENOMEM;
-
- return 0;
-}
-
-#ifdef CONFIG_SHRINKER_DEBUG
-int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
-{
- va_list ap;
- int err;
-
- va_start(ap, fmt);
- shrinker->name = kvasprintf_const(GFP_KERNEL, fmt, ap);
- va_end(ap);
- if (!shrinker->name)
- return -ENOMEM;
-
- err = __prealloc_shrinker(shrinker);
- if (err) {
- kfree_const(shrinker->name);
- shrinker->name = NULL;
- }
-
- return err;
-}
-#else
-int prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...)
-{
- return __prealloc_shrinker(shrinker);
-}
-#endif
-
-void free_prealloced_shrinker(struct shrinker *shrinker)
-{
-#ifdef CONFIG_SHRINKER_DEBUG
- kfree_const(shrinker->name);
- shrinker->name = NULL;
-#endif
- if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
- down_write(&shrinker_rwsem);
- unregister_memcg_shrinker(shrinker);
- up_write(&shrinker_rwsem);
- return;
- }
-
- kfree(shrinker->nr_deferred);
- shrinker->nr_deferred = NULL;
-}
-
-void register_shrinker_prepared(struct shrinker *shrinker)
-{
- down_write(&shrinker_rwsem);
- list_add_tail(&shrinker->list, &shrinker_list);
- shrinker->flags |= SHRINKER_REGISTERED;
- shrinker_debugfs_add(shrinker);
- up_write(&shrinker_rwsem);
-}
-
-static int __register_shrinker(struct shrinker *shrinker)
-{
- int err = __prealloc_shrinker(shrinker);
-
- if (err)
- return err;
- register_shrinker_prepared(shrinker);
- return 0;
-}
-
-#ifdef CONFIG_SHRINKER_DEBUG
-int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
-{
- va_list ap;
- int err;
-
- va_start(ap, fmt);
- shrinker->name = kvasprintf_const(GFP_KERNEL, fmt, ap);
- va_end(ap);
- if (!shrinker->name)
- return -ENOMEM;
-
- err = __register_shrinker(shrinker);
- if (err) {
- kfree_const(shrinker->name);
- shrinker->name = NULL;
- }
- return err;
-}
-#else
-int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
-{
- return __register_shrinker(shrinker);
-}
-#endif
-EXPORT_SYMBOL(register_shrinker);
-
-/*
- * Remove one
- */
-void unregister_shrinker(struct shrinker *shrinker)
-{
- struct dentry *debugfs_entry;
- int debugfs_id;
-
- if (!(shrinker->flags & SHRINKER_REGISTERED))
- return;
-
- down_write(&shrinker_rwsem);
- list_del(&shrinker->list);
- shrinker->flags &= ~SHRINKER_REGISTERED;
- if (shrinker->flags & SHRINKER_MEMCG_AWARE)
- unregister_memcg_shrinker(shrinker);
- debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
- up_write(&shrinker_rwsem);
-
- shrinker_debugfs_remove(debugfs_entry, debugfs_id);
-
- kfree(shrinker->nr_deferred);
- shrinker->nr_deferred = NULL;
-}
-EXPORT_SYMBOL(unregister_shrinker);
-
-/**
- * synchronize_shrinkers - Wait for all running shrinkers to complete.
- *
- * This is equivalent to calling unregister_shrink() and register_shrinker(),
- * but atomically and with less overhead. This is useful to guarantee that all
- * shrinker invocations have seen an update, before freeing memory, similar to
- * rcu.
- */
-void synchronize_shrinkers(void)
-{
- down_write(&shrinker_rwsem);
- up_write(&shrinker_rwsem);
-}
-EXPORT_SYMBOL(synchronize_shrinkers);
-
-#define SHRINK_BATCH 128
-
-static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
- struct shrinker *shrinker, int priority)
-{
- unsigned long freed = 0;
- unsigned long long delta;
- long total_scan;
- long freeable;
- long nr;
- long new_nr;
- long batch_size = shrinker->batch ? shrinker->batch
- : SHRINK_BATCH;
- long scanned = 0, next_deferred;
-
- freeable = shrinker->count_objects(shrinker, shrinkctl);
- if (freeable == 0 || freeable == SHRINK_EMPTY)
- return freeable;
-
- /*
- * copy the current shrinker scan count into a local variable
- * and zero it so that other concurrent shrinker invocations
- * don't also do this scanning work.
- */
- nr = xchg_nr_deferred(shrinker, shrinkctl);
-
- if (shrinker->seeks) {
- delta = freeable >> priority;
- delta *= 4;
- do_div(delta, shrinker->seeks);
- } else {
- /*
- * These objects don't require any IO to create. Trim
- * them aggressively under memory pressure to keep
- * them from causing refetches in the IO caches.
- */
- delta = freeable / 2;
- }
-
- total_scan = nr >> priority;
- total_scan += delta;
- total_scan = min(total_scan, (2 * freeable));
-
- trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
- freeable, delta, total_scan, priority);
-
- /*
- * Normally, we should not scan less than batch_size objects in one
- * pass to avoid too frequent shrinker calls, but if the slab has less
- * than batch_size objects in total and we are really tight on memory,
- * we will try to reclaim all available objects, otherwise we can end
- * up failing allocations although there are plenty of reclaimable
- * objects spread over several slabs with usage less than the
- * batch_size.
- *
- * We detect the "tight on memory" situations by looking at the total
- * number of objects we want to scan (total_scan). If it is greater
- * than the total number of objects on slab (freeable), we must be
- * scanning at high prio and therefore should try to reclaim as much as
- * possible.
- */
- while (total_scan >= batch_size ||
- total_scan >= freeable) {
- unsigned long ret;
- unsigned long nr_to_scan = min(batch_size, total_scan);
-
- shrinkctl->nr_to_scan = nr_to_scan;
- shrinkctl->nr_scanned = nr_to_scan;
- ret = shrinker->scan_objects(shrinker, shrinkctl);
- if (ret == SHRINK_STOP)
- break;
- freed += ret;
-
- count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned);
- total_scan -= shrinkctl->nr_scanned;
- scanned += shrinkctl->nr_scanned;
-
- cond_resched();
- }
-
- /*
- * The deferred work is increased by any new work (delta) that wasn't
- * done, decreased by old deferred work that was done now.
- *
- * And it is capped to two times of the freeable items.
- */
- next_deferred = max_t(long, (nr + delta - scanned), 0);
- next_deferred = min(next_deferred, (2 * freeable));
-
- /*
- * move the unused scan count back into the shrinker in a
- * manner that handles concurrent updates.
- */
- new_nr = add_nr_deferred(next_deferred, shrinker, shrinkctl);
-
- trace_mm_shrink_slab_end(shrinker, shrinkctl->nid, freed, nr, new_nr, total_scan);
- return freed;
-}
-
-#ifdef CONFIG_MEMCG
-static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
- struct mem_cgroup *memcg, int priority)
-{
- struct shrinker_info *info;
- unsigned long ret, freed = 0;
- int i;
-
- if (!mem_cgroup_online(memcg))
- return 0;
-
- if (!down_read_trylock(&shrinker_rwsem))
- return 0;
-
- info = shrinker_info_protected(memcg, nid);
- if (unlikely(!info))
- goto unlock;
-
- for_each_set_bit(i, info->map, info->map_nr_max) {
- struct shrink_control sc = {
- .gfp_mask = gfp_mask,
- .nid = nid,
- .memcg = memcg,
- };
- struct shrinker *shrinker;
-
- shrinker = idr_find(&shrinker_idr, i);
- if (unlikely(!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))) {
- if (!shrinker)
- clear_bit(i, info->map);
- continue;
- }
-
- /* Call non-slab shrinkers even though kmem is disabled */
- if (!memcg_kmem_online() &&
- !(shrinker->flags & SHRINKER_NONSLAB))
- continue;
-
- ret = do_shrink_slab(&sc, shrinker, priority);
- if (ret == SHRINK_EMPTY) {
- clear_bit(i, info->map);
- /*
- * After the shrinker reported that it had no objects to
- * free, but before we cleared the corresponding bit in
- * the memcg shrinker map, a new object might have been
- * added. To make sure, we have the bit set in this
- * case, we invoke the shrinker one more time and reset
- * the bit if it reports that it is not empty anymore.
- * The memory barrier here pairs with the barrier in
- * set_shrinker_bit():
- *
- * list_lru_add() shrink_slab_memcg()
- * list_add_tail() clear_bit()
- * <MB> <MB>
- * set_bit() do_shrink_slab()
- */
- smp_mb__after_atomic();
- ret = do_shrink_slab(&sc, shrinker, priority);
- if (ret == SHRINK_EMPTY)
- ret = 0;
- else
- set_shrinker_bit(memcg, nid, i);
- }
- freed += ret;
-
- if (rwsem_is_contended(&shrinker_rwsem)) {
- freed = freed ? : 1;
- break;
- }
- }
-unlock:
- up_read(&shrinker_rwsem);
- return freed;
-}
-#else /* CONFIG_MEMCG */
-static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
- struct mem_cgroup *memcg, int priority)
-{
- return 0;
-}
-#endif /* CONFIG_MEMCG */
-
-/**
- * shrink_slab - shrink slab caches
- * @gfp_mask: allocation context
- * @nid: node whose slab caches to target
- * @memcg: memory cgroup whose slab caches to target
- * @priority: the reclaim priority
- *
- * Call the shrink functions to age shrinkable caches.
- *
- * @nid is passed along to shrinkers with SHRINKER_NUMA_AWARE set,
- * unaware shrinkers will receive a node id of 0 instead.
- *
- * @memcg specifies the memory cgroup to target. Unaware shrinkers
- * are called only if it is the root cgroup.
- *
- * @priority is sc->priority, we take the number of objects and >> by priority
- * in order to get the scan target.
- *
- * Returns the number of reclaimed slab objects.
- */
-static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
- struct mem_cgroup *memcg,
- int priority)
-{
- unsigned long ret, freed = 0;
- struct shrinker *shrinker;
-
- /*
- * The root memcg might be allocated even though memcg is disabled
- * via "cgroup_disable=memory" boot parameter. This could make
- * mem_cgroup_is_root() return false, then just run memcg slab
- * shrink, but skip global shrink. This may result in premature
- * oom.
- */
- if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
- return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
-
- if (!down_read_trylock(&shrinker_rwsem))
- goto out;
-
- list_for_each_entry(shrinker, &shrinker_list, list) {
- struct shrink_control sc = {
- .gfp_mask = gfp_mask,
- .nid = nid,
- .memcg = memcg,
- };
-
- ret = do_shrink_slab(&sc, shrinker, priority);
- if (ret == SHRINK_EMPTY)
- ret = 0;
- freed += ret;
- /*
- * Bail out if someone want to register a new shrinker to
- * prevent the registration from being stalled for long periods
- * by parallel ongoing shrinking.
- */
- if (rwsem_is_contended(&shrinker_rwsem)) {
- freed = freed ? : 1;
- break;
- }
- }
-
- up_read(&shrinker_rwsem);
-out:
- cond_resched();
- return freed;
-}
-
static unsigned long drop_slab_node(int nid)
{
unsigned long freed = 0;
--
2.30.2
The following functions are only used inside the mm subsystem, so it's
better to move their declarations to the mm/internal.h file.
1. shrinker_debugfs_add()
2. shrinker_debugfs_detach()
3. shrinker_debugfs_remove()
Signed-off-by: Qi Zheng <[email protected]>
---
include/linux/shrinker.h | 19 -------------------
mm/internal.h | 28 ++++++++++++++++++++++++++++
2 files changed, 28 insertions(+), 19 deletions(-)
diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 224293b2dd06..8dc15aa37410 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -106,28 +106,9 @@ extern void free_prealloced_shrinker(struct shrinker *shrinker);
extern void synchronize_shrinkers(void);
#ifdef CONFIG_SHRINKER_DEBUG
-extern int shrinker_debugfs_add(struct shrinker *shrinker);
-extern struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
- int *debugfs_id);
-extern void shrinker_debugfs_remove(struct dentry *debugfs_entry,
- int debugfs_id);
extern int __printf(2, 3) shrinker_debugfs_rename(struct shrinker *shrinker,
const char *fmt, ...);
#else /* CONFIG_SHRINKER_DEBUG */
-static inline int shrinker_debugfs_add(struct shrinker *shrinker)
-{
- return 0;
-}
-static inline struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
- int *debugfs_id)
-{
- *debugfs_id = -1;
- return NULL;
-}
-static inline void shrinker_debugfs_remove(struct dentry *debugfs_entry,
- int debugfs_id)
-{
-}
static inline __printf(2, 3)
int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...)
{
diff --git a/mm/internal.h b/mm/internal.h
index 5a03bc4782a2..8aeaf16ae039 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1135,4 +1135,32 @@ struct vma_prepare {
struct vm_area_struct *remove;
struct vm_area_struct *remove2;
};
+
+/*
+ * shrinker related functions
+ */
+
+#ifdef CONFIG_SHRINKER_DEBUG
+extern int shrinker_debugfs_add(struct shrinker *shrinker);
+extern struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
+ int *debugfs_id);
+extern void shrinker_debugfs_remove(struct dentry *debugfs_entry,
+ int debugfs_id);
+#else /* CONFIG_SHRINKER_DEBUG */
+static inline int shrinker_debugfs_add(struct shrinker *shrinker)
+{
+ return 0;
+}
+static inline struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
+ int *debugfs_id)
+{
+ *debugfs_id = -1;
+ return NULL;
+}
+static inline void shrinker_debugfs_remove(struct dentry *debugfs_entry,
+ int debugfs_id)
+{
+}
+#endif /* CONFIG_SHRINKER_DEBUG */
+
#endif /* __MM_INTERNAL_H */
--
2.30.2
The debugfs_remove_recursive() will wait for debugfs_file_put() to return,
so the shrinker will not be freed when doing debugfs operations (such as
shrinker_debugfs_count_show() and shrinker_debugfs_scan_write()), so there
is no need to hold shrinker_rwsem during debugfs operations.
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
mm/shrinker_debug.c | 14 --------------
1 file changed, 14 deletions(-)
diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index 3ab53fad8876..f1becfd45853 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -55,11 +55,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
if (!count_per_node)
return -ENOMEM;
- ret = down_read_killable(&shrinker_rwsem);
- if (ret) {
- kfree(count_per_node);
- return ret;
- }
rcu_read_lock();
memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
@@ -92,7 +87,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
rcu_read_unlock();
- up_read(&shrinker_rwsem);
kfree(count_per_node);
return ret;
@@ -117,7 +111,6 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
struct mem_cgroup *memcg = NULL;
int nid;
char kbuf[72];
- ssize_t ret;
read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
if (copy_from_user(kbuf, buf, read_len))
@@ -146,12 +139,6 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
return -EINVAL;
}
- ret = down_read_killable(&shrinker_rwsem);
- if (ret) {
- mem_cgroup_put(memcg);
- return ret;
- }
-
sc.nid = nid;
sc.memcg = memcg;
sc.nr_to_scan = nr_to_scan;
@@ -159,7 +146,6 @@ static ssize_t shrinker_debugfs_scan_write(struct file *file,
shrinker->scan_objects(shrinker, &sc);
- up_read(&shrinker_rwsem);
mem_cgroup_put(memcg);
return size;
--
2.30.2
Use new APIs to dynamically allocate the f2fs-shrinker.
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
fs/f2fs/super.c | 32 ++++++++++++++++++++++++--------
1 file changed, 24 insertions(+), 8 deletions(-)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index a123f1378d57..9200b67aa745 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -83,11 +83,27 @@ void f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned int rate,
#endif
/* f2fs-wide shrinker description */
-static struct shrinker f2fs_shrinker_info = {
- .scan_objects = f2fs_shrink_scan,
- .count_objects = f2fs_shrink_count,
- .seeks = DEFAULT_SEEKS,
-};
+static struct shrinker *f2fs_shrinker_info;
+
+static int __init f2fs_init_shrinker(void)
+{
+ f2fs_shrinker_info = shrinker_alloc(0, "f2fs-shrinker");
+ if (!f2fs_shrinker_info)
+ return -ENOMEM;
+
+ f2fs_shrinker_info->count_objects = f2fs_shrink_count;
+ f2fs_shrinker_info->scan_objects = f2fs_shrink_scan;
+ f2fs_shrinker_info->seeks = DEFAULT_SEEKS;
+
+ shrinker_register(f2fs_shrinker_info);
+
+ return 0;
+}
+
+static void f2fs_exit_shrinker(void)
+{
+ shrinker_free(f2fs_shrinker_info);
+}
enum {
Opt_gc_background,
@@ -4937,7 +4953,7 @@ static int __init init_f2fs_fs(void)
err = f2fs_init_sysfs();
if (err)
goto free_garbage_collection_cache;
- err = register_shrinker(&f2fs_shrinker_info, "f2fs-shrinker");
+ err = f2fs_init_shrinker();
if (err)
goto free_sysfs;
err = register_filesystem(&f2fs_fs_type);
@@ -4982,7 +4998,7 @@ static int __init init_f2fs_fs(void)
f2fs_destroy_root_stats();
unregister_filesystem(&f2fs_fs_type);
free_shrinker:
- unregister_shrinker(&f2fs_shrinker_info);
+ f2fs_exit_shrinker();
free_sysfs:
f2fs_exit_sysfs();
free_garbage_collection_cache:
@@ -5014,7 +5030,7 @@ static void __exit exit_f2fs_fs(void)
f2fs_destroy_post_read_processing();
f2fs_destroy_root_stats();
unregister_filesystem(&f2fs_fs_type);
- unregister_shrinker(&f2fs_shrinker_info);
+ f2fs_exit_shrinker();
f2fs_exit_sysfs();
f2fs_destroy_garbage_collection_cache();
f2fs_destroy_extent_cache();
--
2.30.2
Use new APIs to dynamically allocate the nfs-xattr shrinkers.
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
fs/nfs/nfs42xattr.c | 87 +++++++++++++++++++++++----------------------
1 file changed, 44 insertions(+), 43 deletions(-)
diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c
index 911f634ba3da..2ad66a8922f4 100644
--- a/fs/nfs/nfs42xattr.c
+++ b/fs/nfs/nfs42xattr.c
@@ -796,28 +796,9 @@ static unsigned long nfs4_xattr_cache_scan(struct shrinker *shrink,
static unsigned long nfs4_xattr_entry_scan(struct shrinker *shrink,
struct shrink_control *sc);
-static struct shrinker nfs4_xattr_cache_shrinker = {
- .count_objects = nfs4_xattr_cache_count,
- .scan_objects = nfs4_xattr_cache_scan,
- .seeks = DEFAULT_SEEKS,
- .flags = SHRINKER_MEMCG_AWARE,
-};
-
-static struct shrinker nfs4_xattr_entry_shrinker = {
- .count_objects = nfs4_xattr_entry_count,
- .scan_objects = nfs4_xattr_entry_scan,
- .seeks = DEFAULT_SEEKS,
- .batch = 512,
- .flags = SHRINKER_MEMCG_AWARE,
-};
-
-static struct shrinker nfs4_xattr_large_entry_shrinker = {
- .count_objects = nfs4_xattr_entry_count,
- .scan_objects = nfs4_xattr_entry_scan,
- .seeks = 1,
- .batch = 512,
- .flags = SHRINKER_MEMCG_AWARE,
-};
+static struct shrinker *nfs4_xattr_cache_shrinker;
+static struct shrinker *nfs4_xattr_entry_shrinker;
+static struct shrinker *nfs4_xattr_large_entry_shrinker;
static enum lru_status
cache_lru_isolate(struct list_head *item,
@@ -943,7 +924,7 @@ nfs4_xattr_entry_scan(struct shrinker *shrink, struct shrink_control *sc)
struct nfs4_xattr_entry *entry;
struct list_lru *lru;
- lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
+ lru = (shrink == nfs4_xattr_large_entry_shrinker) ?
&nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
freed = list_lru_shrink_walk(lru, sc, entry_lru_isolate, &dispose);
@@ -971,7 +952,7 @@ nfs4_xattr_entry_count(struct shrinker *shrink, struct shrink_control *sc)
unsigned long count;
struct list_lru *lru;
- lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
+ lru = (shrink == nfs4_xattr_large_entry_shrinker) ?
&nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
count = list_lru_shrink_count(lru, sc);
@@ -991,18 +972,34 @@ static void nfs4_xattr_cache_init_once(void *p)
INIT_LIST_HEAD(&cache->dispose);
}
-static int nfs4_xattr_shrinker_init(struct shrinker *shrinker,
- struct list_lru *lru, const char *name)
+typedef unsigned long (*count_objects_cb)(struct shrinker *s,
+ struct shrink_control *sc);
+typedef unsigned long (*scan_objects_cb)(struct shrinker *s,
+ struct shrink_control *sc);
+
+static int __init nfs4_xattr_shrinker_init(struct shrinker **shrinker,
+ struct list_lru *lru, const char *name,
+ count_objects_cb count,
+ scan_objects_cb scan, long batch, int seeks)
{
- int ret = 0;
+ int ret;
- ret = register_shrinker(shrinker, name);
- if (ret)
+ *shrinker = shrinker_alloc(SHRINKER_MEMCG_AWARE, name);
+ if (!*shrinker)
+ return -ENOMEM;
+
+ ret = list_lru_init_memcg(lru, *shrinker);
+ if (ret) {
+ shrinker_free(*shrinker);
return ret;
+ }
- ret = list_lru_init_memcg(lru, shrinker);
- if (ret)
- unregister_shrinker(shrinker);
+ (*shrinker)->count_objects = count;
+ (*shrinker)->scan_objects = scan;
+ (*shrinker)->batch = batch;
+ (*shrinker)->seeks = seeks;
+
+ shrinker_register(*shrinker);
return ret;
}
@@ -1010,7 +1007,7 @@ static int nfs4_xattr_shrinker_init(struct shrinker *shrinker,
static void nfs4_xattr_shrinker_destroy(struct shrinker *shrinker,
struct list_lru *lru)
{
- unregister_shrinker(shrinker);
+ shrinker_free(shrinker);
list_lru_destroy(lru);
}
@@ -1026,27 +1023,31 @@ int __init nfs4_xattr_cache_init(void)
return -ENOMEM;
ret = nfs4_xattr_shrinker_init(&nfs4_xattr_cache_shrinker,
- &nfs4_xattr_cache_lru,
- "nfs-xattr_cache");
+ &nfs4_xattr_cache_lru, "nfs-xattr_cache",
+ nfs4_xattr_cache_count,
+ nfs4_xattr_cache_scan, 0, DEFAULT_SEEKS);
if (ret)
goto out1;
ret = nfs4_xattr_shrinker_init(&nfs4_xattr_entry_shrinker,
- &nfs4_xattr_entry_lru,
- "nfs-xattr_entry");
+ &nfs4_xattr_entry_lru, "nfs-xattr_entry",
+ nfs4_xattr_entry_count,
+ nfs4_xattr_entry_scan, 512, DEFAULT_SEEKS);
if (ret)
goto out2;
ret = nfs4_xattr_shrinker_init(&nfs4_xattr_large_entry_shrinker,
&nfs4_xattr_large_entry_lru,
- "nfs-xattr_large_entry");
+ "nfs-xattr_large_entry",
+ nfs4_xattr_entry_count,
+ nfs4_xattr_entry_scan, 512, 1);
if (!ret)
return 0;
- nfs4_xattr_shrinker_destroy(&nfs4_xattr_entry_shrinker,
+ nfs4_xattr_shrinker_destroy(nfs4_xattr_entry_shrinker,
&nfs4_xattr_entry_lru);
out2:
- nfs4_xattr_shrinker_destroy(&nfs4_xattr_cache_shrinker,
+ nfs4_xattr_shrinker_destroy(nfs4_xattr_cache_shrinker,
&nfs4_xattr_cache_lru);
out1:
kmem_cache_destroy(nfs4_xattr_cache_cachep);
@@ -1056,11 +1057,11 @@ int __init nfs4_xattr_cache_init(void)
void nfs4_xattr_cache_exit(void)
{
- nfs4_xattr_shrinker_destroy(&nfs4_xattr_large_entry_shrinker,
+ nfs4_xattr_shrinker_destroy(nfs4_xattr_large_entry_shrinker,
&nfs4_xattr_large_entry_lru);
- nfs4_xattr_shrinker_destroy(&nfs4_xattr_entry_shrinker,
+ nfs4_xattr_shrinker_destroy(nfs4_xattr_entry_shrinker,
&nfs4_xattr_entry_lru);
- nfs4_xattr_shrinker_destroy(&nfs4_xattr_cache_shrinker,
+ nfs4_xattr_shrinker_destroy(nfs4_xattr_cache_shrinker,
&nfs4_xattr_cache_lru);
kmem_cache_destroy(nfs4_xattr_cache_cachep);
}
--
2.30.2
Use new APIs to dynamically allocate the nfsd-filecache shrinker.
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
fs/nfsd/filecache.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index ee9c923192e0..872eb9501965 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -521,11 +521,7 @@ nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc)
return ret;
}
-static struct shrinker nfsd_file_shrinker = {
- .scan_objects = nfsd_file_lru_scan,
- .count_objects = nfsd_file_lru_count,
- .seeks = 1,
-};
+static struct shrinker *nfsd_file_shrinker;
/**
* nfsd_file_cond_queue - conditionally unhash and queue a nfsd_file
@@ -746,12 +742,18 @@ nfsd_file_cache_init(void)
goto out_err;
}
- ret = register_shrinker(&nfsd_file_shrinker, "nfsd-filecache");
- if (ret) {
- pr_err("nfsd: failed to register nfsd_file_shrinker: %d\n", ret);
+ nfsd_file_shrinker = shrinker_alloc(0, "nfsd-filecache");
+ if (!nfsd_file_shrinker) {
+ pr_err("nfsd: failed to allocate nfsd_file_shrinker\n");
goto out_lru;
}
+ nfsd_file_shrinker->count_objects = nfsd_file_lru_count;
+ nfsd_file_shrinker->scan_objects = nfsd_file_lru_scan;
+ nfsd_file_shrinker->seeks = 1;
+
+ shrinker_register(nfsd_file_shrinker);
+
ret = lease_register_notifier(&nfsd_file_lease_notifier);
if (ret) {
pr_err("nfsd: unable to register lease notifier: %d\n", ret);
@@ -774,7 +776,7 @@ nfsd_file_cache_init(void)
out_notifier:
lease_unregister_notifier(&nfsd_file_lease_notifier);
out_shrinker:
- unregister_shrinker(&nfsd_file_shrinker);
+ shrinker_free(nfsd_file_shrinker);
out_lru:
list_lru_destroy(&nfsd_file_lru);
out_err:
@@ -891,7 +893,7 @@ nfsd_file_cache_shutdown(void)
return;
lease_unregister_notifier(&nfsd_file_lease_notifier);
- unregister_shrinker(&nfsd_file_shrinker);
+ shrinker_free(nfsd_file_shrinker);
/*
* make sure all callers of nfsd_file_lru_cb are done before
* calling nfsd_file_cache_purge
--
2.30.2
Use new APIs to dynamically allocate the thp-zero and thp-deferred_split
shrinkers.
Signed-off-by: Qi Zheng <[email protected]>
---
mm/huge_memory.c | 69 +++++++++++++++++++++++++++++++-----------------
1 file changed, 45 insertions(+), 24 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e371503f7746..a0dbb55b4913 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -65,7 +65,11 @@ unsigned long transparent_hugepage_flags __read_mostly =
(1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG)|
(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);
-static struct shrinker deferred_split_shrinker;
+static struct shrinker *deferred_split_shrinker;
+static unsigned long deferred_split_count(struct shrinker *shrink,
+ struct shrink_control *sc);
+static unsigned long deferred_split_scan(struct shrinker *shrink,
+ struct shrink_control *sc);
static atomic_t huge_zero_refcount;
struct page *huge_zero_page __read_mostly;
@@ -229,11 +233,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
return 0;
}
-static struct shrinker huge_zero_page_shrinker = {
- .count_objects = shrink_huge_zero_page_count,
- .scan_objects = shrink_huge_zero_page_scan,
- .seeks = DEFAULT_SEEKS,
-};
+static struct shrinker *huge_zero_page_shrinker;
#ifdef CONFIG_SYSFS
static ssize_t enabled_show(struct kobject *kobj,
@@ -454,6 +454,40 @@ static inline void hugepage_exit_sysfs(struct kobject *hugepage_kobj)
}
#endif /* CONFIG_SYSFS */
+static int __init thp_shrinker_init(void)
+{
+ huge_zero_page_shrinker = shrinker_alloc(0, "thp-zero");
+ if (!huge_zero_page_shrinker)
+ return -ENOMEM;
+
+ deferred_split_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE |
+ SHRINKER_MEMCG_AWARE |
+ SHRINKER_NONSLAB,
+ "thp-deferred_split");
+ if (!deferred_split_shrinker) {
+ shrinker_free(huge_zero_page_shrinker);
+ return -ENOMEM;
+ }
+
+ huge_zero_page_shrinker->count_objects = shrink_huge_zero_page_count;
+ huge_zero_page_shrinker->scan_objects = shrink_huge_zero_page_scan;
+ huge_zero_page_shrinker->seeks = DEFAULT_SEEKS;
+ shrinker_register(huge_zero_page_shrinker);
+
+ deferred_split_shrinker->count_objects = deferred_split_count;
+ deferred_split_shrinker->scan_objects = deferred_split_scan;
+ deferred_split_shrinker->seeks = DEFAULT_SEEKS;
+ shrinker_register(deferred_split_shrinker);
+
+ return 0;
+}
+
+static void __init thp_shrinker_exit(void)
+{
+ shrinker_free(huge_zero_page_shrinker);
+ shrinker_free(deferred_split_shrinker);
+}
+
static int __init hugepage_init(void)
{
int err;
@@ -482,12 +516,9 @@ static int __init hugepage_init(void)
if (err)
goto err_slab;
- err = register_shrinker(&huge_zero_page_shrinker, "thp-zero");
- if (err)
- goto err_hzp_shrinker;
- err = register_shrinker(&deferred_split_shrinker, "thp-deferred_split");
+ err = thp_shrinker_init();
if (err)
- goto err_split_shrinker;
+ goto err_shrinker;
/*
* By default disable transparent hugepages on smaller systems,
@@ -505,10 +536,8 @@ static int __init hugepage_init(void)
return 0;
err_khugepaged:
- unregister_shrinker(&deferred_split_shrinker);
-err_split_shrinker:
- unregister_shrinker(&huge_zero_page_shrinker);
-err_hzp_shrinker:
+ thp_shrinker_exit();
+err_shrinker:
khugepaged_destroy();
err_slab:
hugepage_exit_sysfs(hugepage_kobj);
@@ -2833,7 +2862,7 @@ void deferred_split_folio(struct folio *folio)
#ifdef CONFIG_MEMCG
if (memcg)
set_shrinker_bit(memcg, folio_nid(folio),
- deferred_split_shrinker.id);
+ deferred_split_shrinker->id);
#endif
}
spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
@@ -2907,14 +2936,6 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
return split;
}
-static struct shrinker deferred_split_shrinker = {
- .count_objects = deferred_split_count,
- .scan_objects = deferred_split_scan,
- .seeks = DEFAULT_SEEKS,
- .flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE |
- SHRINKER_NONSLAB,
-};
-
#ifdef CONFIG_DEBUG_FS
static void split_huge_pages_all(void)
{
--
2.30.2
Use new APIs to dynamically allocate the mm-shadow shrinker.
Signed-off-by: Qi Zheng <[email protected]>
---
mm/workingset.c | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)
diff --git a/mm/workingset.c b/mm/workingset.c
index da58a26d0d4d..3c53138903a7 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -763,13 +763,6 @@ static unsigned long scan_shadow_nodes(struct shrinker *shrinker,
NULL);
}
-static struct shrinker workingset_shadow_shrinker = {
- .count_objects = count_shadow_nodes,
- .scan_objects = scan_shadow_nodes,
- .seeks = 0, /* ->count reports only fully expendable nodes */
- .flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE,
-};
-
/*
* Our list_lru->lock is IRQ-safe as it nests inside the IRQ-safe
* i_pages lock.
@@ -778,9 +771,10 @@ static struct lock_class_key shadow_nodes_key;
static int __init workingset_init(void)
{
+ struct shrinker *workingset_shadow_shrinker;
unsigned int timestamp_bits;
unsigned int max_order;
- int ret;
+ int ret = -ENOMEM;
BUILD_BUG_ON(BITS_PER_LONG < EVICTION_SHIFT);
/*
@@ -797,17 +791,24 @@ static int __init workingset_init(void)
pr_info("workingset: timestamp_bits=%d max_order=%d bucket_order=%u\n",
timestamp_bits, max_order, bucket_order);
- ret = prealloc_shrinker(&workingset_shadow_shrinker, "mm-shadow");
- if (ret)
+ workingset_shadow_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE |
+ SHRINKER_MEMCG_AWARE,
+ "mm-shadow");
+ if (!workingset_shadow_shrinker)
goto err;
+
ret = __list_lru_init(&shadow_nodes, true, &shadow_nodes_key,
- &workingset_shadow_shrinker);
+ workingset_shadow_shrinker);
if (ret)
goto err_list_lru;
- register_shrinker_prepared(&workingset_shadow_shrinker);
+
+ workingset_shadow_shrinker->count_objects = count_shadow_nodes;
+ workingset_shadow_shrinker->scan_objects = scan_shadow_nodes;
+
+ shrinker_register(workingset_shadow_shrinker);
return 0;
err_list_lru:
- free_prealloced_shrinker(&workingset_shadow_shrinker);
+ shrinker_free(workingset_shadow_shrinker);
err:
return ret;
}
--
2.30.2
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the i915_gem_mm shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct drm_i915_private.
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 30 +++++++++++---------
drivers/gpu/drm/i915/i915_drv.h | 2 +-
2 files changed, 18 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index 214763942aa2..4504eb4f31d5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
@@ -284,8 +284,7 @@ unsigned long i915_gem_shrink_all(struct drm_i915_private *i915)
static unsigned long
i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
{
- struct drm_i915_private *i915 =
- container_of(shrinker, struct drm_i915_private, mm.shrinker);
+ struct drm_i915_private *i915 = shrinker->private_data;
unsigned long num_objects;
unsigned long count;
@@ -302,8 +301,8 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
if (num_objects) {
unsigned long avg = 2 * count / num_objects;
- i915->mm.shrinker.batch =
- max((i915->mm.shrinker.batch + avg) >> 1,
+ i915->mm.shrinker->batch =
+ max((i915->mm.shrinker->batch + avg) >> 1,
128ul /* default SHRINK_BATCH */);
}
@@ -313,8 +312,7 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
static unsigned long
i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
{
- struct drm_i915_private *i915 =
- container_of(shrinker, struct drm_i915_private, mm.shrinker);
+ struct drm_i915_private *i915 = shrinker->private_data;
unsigned long freed;
sc->nr_scanned = 0;
@@ -422,12 +420,18 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
void i915_gem_driver_register__shrinker(struct drm_i915_private *i915)
{
- i915->mm.shrinker.scan_objects = i915_gem_shrinker_scan;
- i915->mm.shrinker.count_objects = i915_gem_shrinker_count;
- i915->mm.shrinker.seeks = DEFAULT_SEEKS;
- i915->mm.shrinker.batch = 4096;
- drm_WARN_ON(&i915->drm, register_shrinker(&i915->mm.shrinker,
- "drm-i915_gem"));
+ i915->mm.shrinker = shrinker_alloc(0, "drm-i915_gem");
+ if (!i915->mm.shrinker) {
+ drm_WARN_ON(&i915->drm, 1);
+ } else {
+ i915->mm.shrinker->scan_objects = i915_gem_shrinker_scan;
+ i915->mm.shrinker->count_objects = i915_gem_shrinker_count;
+ i915->mm.shrinker->seeks = DEFAULT_SEEKS;
+ i915->mm.shrinker->batch = 4096;
+ i915->mm.shrinker->private_data = i915;
+
+ shrinker_register(i915->mm.shrinker);
+ }
i915->mm.oom_notifier.notifier_call = i915_gem_shrinker_oom;
drm_WARN_ON(&i915->drm, register_oom_notifier(&i915->mm.oom_notifier));
@@ -443,7 +447,7 @@ void i915_gem_driver_unregister__shrinker(struct drm_i915_private *i915)
unregister_vmap_purge_notifier(&i915->mm.vmap_notifier));
drm_WARN_ON(&i915->drm,
unregister_oom_notifier(&i915->mm.oom_notifier));
- unregister_shrinker(&i915->mm.shrinker);
+ shrinker_free(i915->mm.shrinker);
}
void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 682ef2b5c7d5..389e8bf140d7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -163,7 +163,7 @@ struct i915_gem_mm {
struct notifier_block oom_notifier;
struct notifier_block vmap_notifier;
- struct shrinker shrinker;
+ struct shrinker *shrinker;
#ifdef CONFIG_MMU_NOTIFIER
/**
--
2.30.2
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the drm-panfrost shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct panfrost_device.
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Steven Price <[email protected]>
---
drivers/gpu/drm/panfrost/panfrost_device.h | 2 +-
drivers/gpu/drm/panfrost/panfrost_drv.c | 6 +++-
drivers/gpu/drm/panfrost/panfrost_gem.h | 2 +-
.../gpu/drm/panfrost/panfrost_gem_shrinker.c | 30 +++++++++++--------
4 files changed, 25 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index b0126b9fbadc..e667e5689353 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -118,7 +118,7 @@ struct panfrost_device {
struct mutex shrinker_lock;
struct list_head shrinker_list;
- struct shrinker shrinker;
+ struct shrinker *shrinker;
struct panfrost_devfreq pfdevfreq;
};
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index a2ab99698ca8..e1d0e3a23757 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -601,10 +601,14 @@ static int panfrost_probe(struct platform_device *pdev)
if (err < 0)
goto err_out1;
- panfrost_gem_shrinker_init(ddev);
+ err = panfrost_gem_shrinker_init(ddev);
+ if (err)
+ goto err_out2;
return 0;
+err_out2:
+ drm_dev_unregister(ddev);
err_out1:
pm_runtime_disable(pfdev->dev);
panfrost_device_fini(pfdev);
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
index ad2877eeeccd..863d2ec8d4f0 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -81,7 +81,7 @@ panfrost_gem_mapping_get(struct panfrost_gem_object *bo,
void panfrost_gem_mapping_put(struct panfrost_gem_mapping *mapping);
void panfrost_gem_teardown_mappings_locked(struct panfrost_gem_object *bo);
-void panfrost_gem_shrinker_init(struct drm_device *dev);
+int panfrost_gem_shrinker_init(struct drm_device *dev);
void panfrost_gem_shrinker_cleanup(struct drm_device *dev);
#endif /* __PANFROST_GEM_H__ */
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
index 6a71a2555f85..3dfe2b7ccdd9 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
@@ -18,8 +18,7 @@
static unsigned long
panfrost_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
{
- struct panfrost_device *pfdev =
- container_of(shrinker, struct panfrost_device, shrinker);
+ struct panfrost_device *pfdev = shrinker->private_data;
struct drm_gem_shmem_object *shmem;
unsigned long count = 0;
@@ -65,8 +64,7 @@ static bool panfrost_gem_purge(struct drm_gem_object *obj)
static unsigned long
panfrost_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
{
- struct panfrost_device *pfdev =
- container_of(shrinker, struct panfrost_device, shrinker);
+ struct panfrost_device *pfdev = shrinker->private_data;
struct drm_gem_shmem_object *shmem, *tmp;
unsigned long freed = 0;
@@ -97,13 +95,22 @@ panfrost_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
*
* This function registers and sets up the panfrost shrinker.
*/
-void panfrost_gem_shrinker_init(struct drm_device *dev)
+int panfrost_gem_shrinker_init(struct drm_device *dev)
{
struct panfrost_device *pfdev = dev->dev_private;
- pfdev->shrinker.count_objects = panfrost_gem_shrinker_count;
- pfdev->shrinker.scan_objects = panfrost_gem_shrinker_scan;
- pfdev->shrinker.seeks = DEFAULT_SEEKS;
- WARN_ON(register_shrinker(&pfdev->shrinker, "drm-panfrost"));
+
+ pfdev->shrinker = shrinker_alloc(0, "drm-panfrost");
+ if (!pfdev->shrinker)
+ return -ENOMEM;
+
+ pfdev->shrinker->count_objects = panfrost_gem_shrinker_count;
+ pfdev->shrinker->scan_objects = panfrost_gem_shrinker_scan;
+ pfdev->shrinker->seeks = DEFAULT_SEEKS;
+ pfdev->shrinker->private_data = pfdev;
+
+ shrinker_register(pfdev->shrinker);
+
+ return 0;
}
/**
@@ -116,7 +123,6 @@ void panfrost_gem_shrinker_cleanup(struct drm_device *dev)
{
struct panfrost_device *pfdev = dev->dev_private;
- if (pfdev->shrinker.nr_deferred) {
- unregister_shrinker(&pfdev->shrinker);
- }
+ if (pfdev->shrinker)
+ shrinker_free(pfdev->shrinker);
}
--
2.30.2
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the vmw-balloon shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct vmballoon.
And we can simply exit vmballoon_init() when registering the shrinker
fails. So the shrinker_registered indication is redundant, just remove it.
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
drivers/misc/vmw_balloon.c | 38 ++++++++++++--------------------------
1 file changed, 12 insertions(+), 26 deletions(-)
diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
index 9ce9b9e0e9b6..ac2cdb6cdf74 100644
--- a/drivers/misc/vmw_balloon.c
+++ b/drivers/misc/vmw_balloon.c
@@ -380,16 +380,7 @@ struct vmballoon {
/**
* @shrinker: shrinker interface that is used to avoid over-inflation.
*/
- struct shrinker shrinker;
-
- /**
- * @shrinker_registered: whether the shrinker was registered.
- *
- * The shrinker interface does not handle gracefully the removal of
- * shrinker that was not registered before. This indication allows to
- * simplify the unregistration process.
- */
- bool shrinker_registered;
+ struct shrinker *shrinker;
};
static struct vmballoon balloon;
@@ -1568,29 +1559,27 @@ static unsigned long vmballoon_shrinker_count(struct shrinker *shrinker,
static void vmballoon_unregister_shrinker(struct vmballoon *b)
{
- if (b->shrinker_registered)
- unregister_shrinker(&b->shrinker);
- b->shrinker_registered = false;
+ shrinker_free(b->shrinker);
}
static int vmballoon_register_shrinker(struct vmballoon *b)
{
- int r;
-
/* Do nothing if the shrinker is not enabled */
if (!vmwballoon_shrinker_enable)
return 0;
- b->shrinker.scan_objects = vmballoon_shrinker_scan;
- b->shrinker.count_objects = vmballoon_shrinker_count;
- b->shrinker.seeks = DEFAULT_SEEKS;
+ b->shrinker = shrinker_alloc(0, "vmw-balloon");
+ if (!b->shrinker)
+ return -ENOMEM;
- r = register_shrinker(&b->shrinker, "vmw-balloon");
+ b->shrinker->scan_objects = vmballoon_shrinker_scan;
+ b->shrinker->count_objects = vmballoon_shrinker_count;
+ b->shrinker->seeks = DEFAULT_SEEKS;
+ b->shrinker->private_data = b;
- if (r == 0)
- b->shrinker_registered = true;
+ shrinker_register(b->shrinker);
- return r;
+ return 0;
}
/*
@@ -1883,7 +1872,7 @@ static int __init vmballoon_init(void)
error = vmballoon_register_shrinker(&balloon);
if (error)
- goto fail;
+ return error;
/*
* Initialization of compaction must be done after the call to
@@ -1905,9 +1894,6 @@ static int __init vmballoon_init(void)
vmballoon_debugfs_init(&balloon);
return 0;
-fail:
- vmballoon_unregister_shrinker(&balloon);
- return error;
}
/*
--
2.30.2
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the ext4-es shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct ext4_sb_info.
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
fs/ext4/ext4.h | 2 +-
fs/ext4/extents_status.c | 22 ++++++++++++----------
2 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1e2259d9967d..82397bf0b33e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1657,7 +1657,7 @@ struct ext4_sb_info {
__u32 s_csum_seed;
/* Reclaim extents from extent status tree */
- struct shrinker s_es_shrinker;
+ struct shrinker *s_es_shrinker;
struct list_head s_es_list; /* List of inodes with reclaimable extents */
long s_es_nr_inode;
struct ext4_es_stats s_es_stats;
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 9b5b8951afb4..74bb64fadbc4 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -1596,7 +1596,7 @@ static unsigned long ext4_es_count(struct shrinker *shrink,
unsigned long nr;
struct ext4_sb_info *sbi;
- sbi = container_of(shrink, struct ext4_sb_info, s_es_shrinker);
+ sbi = shrink->private_data;
nr = percpu_counter_read_positive(&sbi->s_es_stats.es_stats_shk_cnt);
trace_ext4_es_shrink_count(sbi->s_sb, sc->nr_to_scan, nr);
return nr;
@@ -1605,8 +1605,7 @@ static unsigned long ext4_es_count(struct shrinker *shrink,
static unsigned long ext4_es_scan(struct shrinker *shrink,
struct shrink_control *sc)
{
- struct ext4_sb_info *sbi = container_of(shrink,
- struct ext4_sb_info, s_es_shrinker);
+ struct ext4_sb_info *sbi = shrink->private_data;
int nr_to_scan = sc->nr_to_scan;
int ret, nr_shrunk;
@@ -1690,14 +1689,17 @@ int ext4_es_register_shrinker(struct ext4_sb_info *sbi)
if (err)
goto err3;
- sbi->s_es_shrinker.scan_objects = ext4_es_scan;
- sbi->s_es_shrinker.count_objects = ext4_es_count;
- sbi->s_es_shrinker.seeks = DEFAULT_SEEKS;
- err = register_shrinker(&sbi->s_es_shrinker, "ext4-es:%s",
- sbi->s_sb->s_id);
- if (err)
+ sbi->s_es_shrinker = shrinker_alloc(0, "ext4-es:%s", sbi->s_sb->s_id);
+ if (!sbi->s_es_shrinker)
goto err4;
+ sbi->s_es_shrinker->scan_objects = ext4_es_scan;
+ sbi->s_es_shrinker->count_objects = ext4_es_count;
+ sbi->s_es_shrinker->seeks = DEFAULT_SEEKS;
+ sbi->s_es_shrinker->private_data = sbi;
+
+ shrinker_register(sbi->s_es_shrinker);
+
return 0;
err4:
percpu_counter_destroy(&sbi->s_es_stats.es_stats_shk_cnt);
@@ -1716,7 +1718,7 @@ void ext4_es_unregister_shrinker(struct ext4_sb_info *sbi)
percpu_counter_destroy(&sbi->s_es_stats.es_stats_cache_misses);
percpu_counter_destroy(&sbi->s_es_stats.es_stats_all_cnt);
percpu_counter_destroy(&sbi->s_es_stats.es_stats_shk_cnt);
- unregister_shrinker(&sbi->s_es_shrinker);
+ shrinker_free(sbi->s_es_shrinker);
}
/*
--
2.30.2
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the jbd2-journal shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct journal_s.
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
fs/jbd2/journal.c | 27 +++++++++++++++++----------
include/linux/jbd2.h | 2 +-
2 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 1b5a45ab62b0..4c421da03fee 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1298,7 +1298,7 @@ static int jbd2_min_tag_size(void)
static unsigned long jbd2_journal_shrink_scan(struct shrinker *shrink,
struct shrink_control *sc)
{
- journal_t *journal = container_of(shrink, journal_t, j_shrinker);
+ journal_t *journal = shrink->private_data;
unsigned long nr_to_scan = sc->nr_to_scan;
unsigned long nr_shrunk;
unsigned long count;
@@ -1324,7 +1324,7 @@ static unsigned long jbd2_journal_shrink_scan(struct shrinker *shrink,
static unsigned long jbd2_journal_shrink_count(struct shrinker *shrink,
struct shrink_control *sc)
{
- journal_t *journal = container_of(shrink, journal_t, j_shrinker);
+ journal_t *journal = shrink->private_data;
unsigned long count;
count = percpu_counter_read_positive(&journal->j_checkpoint_jh_count);
@@ -1412,19 +1412,26 @@ static journal_t *journal_init_common(struct block_device *bdev,
journal->j_superblock = (journal_superblock_t *)bh->b_data;
journal->j_shrink_transaction = NULL;
- journal->j_shrinker.scan_objects = jbd2_journal_shrink_scan;
- journal->j_shrinker.count_objects = jbd2_journal_shrink_count;
- journal->j_shrinker.seeks = DEFAULT_SEEKS;
- journal->j_shrinker.batch = journal->j_max_transaction_buffers;
if (percpu_counter_init(&journal->j_checkpoint_jh_count, 0, GFP_KERNEL))
goto err_cleanup;
- if (register_shrinker(&journal->j_shrinker, "jbd2-journal:(%u:%u)",
- MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev))) {
+ journal->j_shrinker = shrinker_alloc(0, "jbd2-journal:(%u:%u)",
+ MAJOR(bdev->bd_dev),
+ MINOR(bdev->bd_dev));
+ if (!journal->j_shrinker) {
percpu_counter_destroy(&journal->j_checkpoint_jh_count);
goto err_cleanup;
}
+
+ journal->j_shrinker->scan_objects = jbd2_journal_shrink_scan;
+ journal->j_shrinker->count_objects = jbd2_journal_shrink_count;
+ journal->j_shrinker->seeks = DEFAULT_SEEKS;
+ journal->j_shrinker->batch = journal->j_max_transaction_buffers;
+ journal->j_shrinker->private_data = journal;
+
+ shrinker_register(journal->j_shrinker);
+
return journal;
err_cleanup:
@@ -2187,9 +2194,9 @@ int jbd2_journal_destroy(journal_t *journal)
brelse(journal->j_sb_buffer);
}
- if (journal->j_shrinker.flags & SHRINKER_REGISTERED) {
+ if (journal->j_shrinker) {
percpu_counter_destroy(&journal->j_checkpoint_jh_count);
- unregister_shrinker(&journal->j_shrinker);
+ shrinker_free(journal->j_shrinker);
}
if (journal->j_proc_entry)
jbd2_stats_proc_exit(journal);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 44c298aa58d4..beb4c4586320 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -891,7 +891,7 @@ struct journal_s
* Journal head shrinker, reclaim buffer's journal head which
* has been written back.
*/
- struct shrinker j_shrinker;
+ struct shrinker *j_shrinker;
/**
* @j_checkpoint_jh_count:
--
2.30.2
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the xfs-buf shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct xfs_buftarg.
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
fs/xfs/xfs_buf.c | 25 ++++++++++++++-----------
fs/xfs/xfs_buf.h | 2 +-
2 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 15d1e5a7c2d3..715730fc91cb 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1906,8 +1906,7 @@ xfs_buftarg_shrink_scan(
struct shrinker *shrink,
struct shrink_control *sc)
{
- struct xfs_buftarg *btp = container_of(shrink,
- struct xfs_buftarg, bt_shrinker);
+ struct xfs_buftarg *btp = shrink->private_data;
LIST_HEAD(dispose);
unsigned long freed;
@@ -1929,8 +1928,7 @@ xfs_buftarg_shrink_count(
struct shrinker *shrink,
struct shrink_control *sc)
{
- struct xfs_buftarg *btp = container_of(shrink,
- struct xfs_buftarg, bt_shrinker);
+ struct xfs_buftarg *btp = shrink->private_data;
return list_lru_shrink_count(&btp->bt_lru, sc);
}
@@ -1938,7 +1936,7 @@ void
xfs_free_buftarg(
struct xfs_buftarg *btp)
{
- unregister_shrinker(&btp->bt_shrinker);
+ shrinker_free(btp->bt_shrinker);
ASSERT(percpu_counter_sum(&btp->bt_io_count) == 0);
percpu_counter_destroy(&btp->bt_io_count);
list_lru_destroy(&btp->bt_lru);
@@ -2021,13 +2019,18 @@ xfs_alloc_buftarg(
if (percpu_counter_init(&btp->bt_io_count, 0, GFP_KERNEL))
goto error_lru;
- btp->bt_shrinker.count_objects = xfs_buftarg_shrink_count;
- btp->bt_shrinker.scan_objects = xfs_buftarg_shrink_scan;
- btp->bt_shrinker.seeks = DEFAULT_SEEKS;
- btp->bt_shrinker.flags = SHRINKER_NUMA_AWARE;
- if (register_shrinker(&btp->bt_shrinker, "xfs-buf:%s",
- mp->m_super->s_id))
+ btp->bt_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "xfs-buf:%s",
+ mp->m_super->s_id);
+ if (!btp->bt_shrinker)
goto error_pcpu;
+
+ btp->bt_shrinker->count_objects = xfs_buftarg_shrink_count;
+ btp->bt_shrinker->scan_objects = xfs_buftarg_shrink_scan;
+ btp->bt_shrinker->seeks = DEFAULT_SEEKS;
+ btp->bt_shrinker->private_data = btp;
+
+ shrinker_register(btp->bt_shrinker);
+
return btp;
error_pcpu:
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 549c60942208..4e6969a675f7 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -102,7 +102,7 @@ typedef struct xfs_buftarg {
size_t bt_logical_sectormask;
/* LRU control structures */
- struct shrinker bt_shrinker;
+ struct shrinker *bt_shrinker;
struct list_lru bt_lru;
struct percpu_counter bt_io_count;
--
2.30.2
Currently, we maintain two linear arrays per node per memcg, which are
shrinker_info::map and shrinker_info::nr_deferred. And we need to resize
them when the shrinker_nr_max is exceeded, that is, allocate a new array,
and then copy the old array to the new array, and finally free the old
array by RCU.
For shrinker_info::map, we do set_bit() under the RCU lock, so we may set
the value into the old map which is about to be freed. This may cause the
value set to be lost. The current solution is not to copy the old map when
resizing, but to set all the corresponding bits in the new map to 1. This
solves the data loss problem, but bring the overhead of more pointless
loops while doing memcg slab shrink.
For shrinker_info::nr_deferred, we will only modify it under the read lock
of shrinker_rwsem, so it will not run concurrently with the resizing. But
after we make memcg slab shrink lockless, there will be the same data loss
problem as shrinker_info::map, and we can't work around it like the map.
For such resizable arrays, the most straightforward idea is to change it
to xarray, like we did for list_lru [1]. We need to do xa_store() in the
list_lru_add()-->set_shrinker_bit(), but this will cause memory
allocation, and the list_lru_add() doesn't accept failure. A possible
solution is to pre-allocate, but the location of pre-allocation is not
well determined.
Therefore, this commit chooses to introduce a secondary array for
shrinker_info::{map, nr_deferred}, so that we only need to copy this
secondary array every time the size is resized. Then even if we get the
old secondary array under the RCU lock, the found map and nr_deferred are
also true, so no data is lost.
[1]. https://lore.kernel.org/all/[email protected]/
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
include/linux/memcontrol.h | 12 +-
include/linux/shrinker.h | 17 +++
mm/shrinker.c | 250 +++++++++++++++++++++++--------------
3 files changed, 172 insertions(+), 107 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e765d1ff9cbb..2c9327d27c3b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -21,6 +21,7 @@
#include <linux/vmstat.h>
#include <linux/writeback.h>
#include <linux/page-flags.h>
+#include <linux/shrinker.h>
struct mem_cgroup;
struct obj_cgroup;
@@ -88,17 +89,6 @@ struct mem_cgroup_reclaim_iter {
unsigned int generation;
};
-/*
- * Bitmap and deferred work of shrinker::id corresponding to memcg-aware
- * shrinkers, which have elements charged to this memcg.
- */
-struct shrinker_info {
- struct rcu_head rcu;
- atomic_long_t *nr_deferred;
- unsigned long *map;
- int map_nr_max;
-};
-
struct lruvec_stats_percpu {
/* Local (CPU and cgroup) state */
long state[NR_VM_NODE_STAT_ITEMS];
diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 025c8070dd86..eb342994675a 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -5,6 +5,23 @@
#include <linux/atomic.h>
#include <linux/types.h>
+#define SHRINKER_UNIT_BITS BITS_PER_LONG
+
+/*
+ * Bitmap and deferred work of shrinker::id corresponding to memcg-aware
+ * shrinkers, which have elements charged to the memcg.
+ */
+struct shrinker_info_unit {
+ atomic_long_t nr_deferred[SHRINKER_UNIT_BITS];
+ DECLARE_BITMAP(map, SHRINKER_UNIT_BITS);
+};
+
+struct shrinker_info {
+ struct rcu_head rcu;
+ int map_nr_max;
+ struct shrinker_info_unit *unit[];
+};
+
/*
* This struct is used to pass information from page reclaim to the shrinkers.
* We consolidate the values for easier extension later.
diff --git a/mm/shrinker.c b/mm/shrinker.c
index a27779ed3798..1911c06b8af5 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -12,15 +12,50 @@ DECLARE_RWSEM(shrinker_rwsem);
#ifdef CONFIG_MEMCG
static int shrinker_nr_max;
-/* The shrinker_info is expanded in a batch of BITS_PER_LONG */
-static inline int shrinker_map_size(int nr_items)
+static inline int shrinker_unit_size(int nr_items)
{
- return (DIV_ROUND_UP(nr_items, BITS_PER_LONG) * sizeof(unsigned long));
+ return (DIV_ROUND_UP(nr_items, SHRINKER_UNIT_BITS) * sizeof(struct shrinker_info_unit *));
}
-static inline int shrinker_defer_size(int nr_items)
+static inline void shrinker_unit_free(struct shrinker_info *info, int start)
{
- return (round_up(nr_items, BITS_PER_LONG) * sizeof(atomic_long_t));
+ struct shrinker_info_unit **unit;
+ int nr, i;
+
+ if (!info)
+ return;
+
+ unit = info->unit;
+ nr = DIV_ROUND_UP(info->map_nr_max, SHRINKER_UNIT_BITS);
+
+ for (i = start; i < nr; i++) {
+ if (!unit[i])
+ break;
+
+ kvfree(unit[i]);
+ unit[i] = NULL;
+ }
+}
+
+static inline int shrinker_unit_alloc(struct shrinker_info *new,
+ struct shrinker_info *old, int nid)
+{
+ struct shrinker_info_unit *unit;
+ int nr = DIV_ROUND_UP(new->map_nr_max, SHRINKER_UNIT_BITS);
+ int start = old ? DIV_ROUND_UP(old->map_nr_max, SHRINKER_UNIT_BITS) : 0;
+ int i;
+
+ for (i = start; i < nr; i++) {
+ unit = kvzalloc_node(sizeof(*unit), GFP_KERNEL, nid);
+ if (!unit) {
+ shrinker_unit_free(new, start);
+ return -ENOMEM;
+ }
+
+ new->unit[i] = unit;
+ }
+
+ return 0;
}
void free_shrinker_info(struct mem_cgroup *memcg)
@@ -32,6 +67,7 @@ void free_shrinker_info(struct mem_cgroup *memcg)
for_each_node(nid) {
pn = memcg->nodeinfo[nid];
info = rcu_dereference_protected(pn->shrinker_info, true);
+ shrinker_unit_free(info, 0);
kvfree(info);
rcu_assign_pointer(pn->shrinker_info, NULL);
}
@@ -40,28 +76,27 @@ void free_shrinker_info(struct mem_cgroup *memcg)
int alloc_shrinker_info(struct mem_cgroup *memcg)
{
struct shrinker_info *info;
- int nid, size, ret = 0;
- int map_size, defer_size = 0;
+ int nid, ret = 0;
+ int array_size = 0;
down_write(&shrinker_rwsem);
- map_size = shrinker_map_size(shrinker_nr_max);
- defer_size = shrinker_defer_size(shrinker_nr_max);
- size = map_size + defer_size;
+ array_size = shrinker_unit_size(shrinker_nr_max);
for_each_node(nid) {
- info = kvzalloc_node(sizeof(*info) + size, GFP_KERNEL, nid);
- if (!info) {
- free_shrinker_info(memcg);
- ret = -ENOMEM;
- break;
- }
- info->nr_deferred = (atomic_long_t *)(info + 1);
- info->map = (void *)info->nr_deferred + defer_size;
+ info = kvzalloc_node(sizeof(*info) + array_size, GFP_KERNEL, nid);
+ if (!info)
+ goto err;
info->map_nr_max = shrinker_nr_max;
+ if (shrinker_unit_alloc(info, NULL, nid))
+ goto err;
rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
}
up_write(&shrinker_rwsem);
return ret;
+
+err:
+ free_shrinker_info(memcg);
+ return -ENOMEM;
}
static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
@@ -71,15 +106,12 @@ static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
lockdep_is_held(&shrinker_rwsem));
}
-static int expand_one_shrinker_info(struct mem_cgroup *memcg,
- int map_size, int defer_size,
- int old_map_size, int old_defer_size,
- int new_nr_max)
+static int expand_one_shrinker_info(struct mem_cgroup *memcg, int new_size,
+ int old_size, int new_nr_max)
{
struct shrinker_info *new, *old;
struct mem_cgroup_per_node *pn;
int nid;
- int size = map_size + defer_size;
for_each_node(nid) {
pn = memcg->nodeinfo[nid];
@@ -92,21 +124,18 @@ static int expand_one_shrinker_info(struct mem_cgroup *memcg,
if (new_nr_max <= old->map_nr_max)
continue;
- new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
+ new = kvmalloc_node(sizeof(*new) + new_size, GFP_KERNEL, nid);
if (!new)
return -ENOMEM;
- new->nr_deferred = (atomic_long_t *)(new + 1);
- new->map = (void *)new->nr_deferred + defer_size;
new->map_nr_max = new_nr_max;
- /* map: set all old bits, clear all new bits */
- memset(new->map, (int)0xff, old_map_size);
- memset((void *)new->map + old_map_size, 0, map_size - old_map_size);
- /* nr_deferred: copy old values, clear all new values */
- memcpy(new->nr_deferred, old->nr_deferred, old_defer_size);
- memset((void *)new->nr_deferred + old_defer_size, 0,
- defer_size - old_defer_size);
+ /* copy old values, allocate all new values */
+ memcpy(new->unit, old->unit, old_size);
+ if (shrinker_unit_alloc(new, old, nid)) {
+ kvfree(new);
+ return -ENOMEM;
+ }
rcu_assign_pointer(pn->shrinker_info, new);
kvfree_rcu(old, rcu);
@@ -118,9 +147,8 @@ static int expand_one_shrinker_info(struct mem_cgroup *memcg,
static int expand_shrinker_info(int new_id)
{
int ret = 0;
- int new_nr_max = round_up(new_id + 1, BITS_PER_LONG);
- int map_size, defer_size = 0;
- int old_map_size, old_defer_size = 0;
+ int new_nr_max = round_up(new_id + 1, SHRINKER_UNIT_BITS);
+ int new_size, old_size = 0;
struct mem_cgroup *memcg;
if (!root_mem_cgroup)
@@ -128,15 +156,12 @@ static int expand_shrinker_info(int new_id)
lockdep_assert_held(&shrinker_rwsem);
- map_size = shrinker_map_size(new_nr_max);
- defer_size = shrinker_defer_size(new_nr_max);
- old_map_size = shrinker_map_size(shrinker_nr_max);
- old_defer_size = shrinker_defer_size(shrinker_nr_max);
+ new_size = shrinker_unit_size(new_nr_max);
+ old_size = shrinker_unit_size(shrinker_nr_max);
memcg = mem_cgroup_iter(NULL, NULL, NULL);
do {
- ret = expand_one_shrinker_info(memcg, map_size, defer_size,
- old_map_size, old_defer_size,
+ ret = expand_one_shrinker_info(memcg, new_size, old_size,
new_nr_max);
if (ret) {
mem_cgroup_iter_break(NULL, memcg);
@@ -150,17 +175,34 @@ static int expand_shrinker_info(int new_id)
return ret;
}
+static inline int shriner_id_to_index(int shrinker_id)
+{
+ return shrinker_id / SHRINKER_UNIT_BITS;
+}
+
+static inline int shriner_id_to_offset(int shrinker_id)
+{
+ return shrinker_id % SHRINKER_UNIT_BITS;
+}
+
+static inline int calc_shrinker_id(int index, int offset)
+{
+ return index * SHRINKER_UNIT_BITS + offset;
+}
+
void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
{
if (shrinker_id >= 0 && memcg && !mem_cgroup_is_root(memcg)) {
struct shrinker_info *info;
+ struct shrinker_info_unit *unit;
rcu_read_lock();
info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
+ unit = info->unit[shriner_id_to_index(shrinker_id)];
if (!WARN_ON_ONCE(shrinker_id >= info->map_nr_max)) {
/* Pairs with smp mb in shrink_slab() */
smp_mb__before_atomic();
- set_bit(shrinker_id, info->map);
+ set_bit(shriner_id_to_offset(shrinker_id), unit->map);
}
rcu_read_unlock();
}
@@ -209,26 +251,31 @@ static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
struct mem_cgroup *memcg)
{
struct shrinker_info *info;
+ struct shrinker_info_unit *unit;
info = shrinker_info_protected(memcg, nid);
- return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0);
+ unit = info->unit[shriner_id_to_index(shrinker->id)];
+ return atomic_long_xchg(&unit->nr_deferred[shriner_id_to_offset(shrinker->id)], 0);
}
static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
struct mem_cgroup *memcg)
{
struct shrinker_info *info;
+ struct shrinker_info_unit *unit;
info = shrinker_info_protected(memcg, nid);
- return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]);
+ unit = info->unit[shriner_id_to_index(shrinker->id)];
+ return atomic_long_add_return(nr, &unit->nr_deferred[shriner_id_to_offset(shrinker->id)]);
}
void reparent_shrinker_deferred(struct mem_cgroup *memcg)
{
- int i, nid;
+ int nid, index, offset;
long nr;
struct mem_cgroup *parent;
struct shrinker_info *child_info, *parent_info;
+ struct shrinker_info_unit *child_unit, *parent_unit;
parent = parent_mem_cgroup(memcg);
if (!parent)
@@ -239,9 +286,13 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
for_each_node(nid) {
child_info = shrinker_info_protected(memcg, nid);
parent_info = shrinker_info_protected(parent, nid);
- for (i = 0; i < child_info->map_nr_max; i++) {
- nr = atomic_long_read(&child_info->nr_deferred[i]);
- atomic_long_add(nr, &parent_info->nr_deferred[i]);
+ for (index = 0; index < shriner_id_to_index(child_info->map_nr_max); index++) {
+ child_unit = child_info->unit[index];
+ parent_unit = parent_info->unit[index];
+ for (offset = 0; offset < SHRINKER_UNIT_BITS; offset++) {
+ nr = atomic_long_read(&child_unit->nr_deferred[offset]);
+ atomic_long_add(nr, &parent_unit->nr_deferred[offset]);
+ }
}
}
up_read(&shrinker_rwsem);
@@ -407,7 +458,7 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
{
struct shrinker_info *info;
unsigned long ret, freed = 0;
- int i;
+ int offset, index = 0;
if (!mem_cgroup_online(memcg))
return 0;
@@ -419,56 +470,63 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
if (unlikely(!info))
goto unlock;
- for_each_set_bit(i, info->map, info->map_nr_max) {
- struct shrink_control sc = {
- .gfp_mask = gfp_mask,
- .nid = nid,
- .memcg = memcg,
- };
- struct shrinker *shrinker;
+ for (; index < shriner_id_to_index(info->map_nr_max); index++) {
+ struct shrinker_info_unit *unit;
- shrinker = idr_find(&shrinker_idr, i);
- if (unlikely(!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))) {
- if (!shrinker)
- clear_bit(i, info->map);
- continue;
- }
+ unit = info->unit[index];
- /* Call non-slab shrinkers even though kmem is disabled */
- if (!memcg_kmem_online() &&
- !(shrinker->flags & SHRINKER_NONSLAB))
- continue;
+ for_each_set_bit(offset, unit->map, SHRINKER_UNIT_BITS) {
+ struct shrink_control sc = {
+ .gfp_mask = gfp_mask,
+ .nid = nid,
+ .memcg = memcg,
+ };
+ struct shrinker *shrinker;
+ int shrinker_id = calc_shrinker_id(index, offset);
- ret = do_shrink_slab(&sc, shrinker, priority);
- if (ret == SHRINK_EMPTY) {
- clear_bit(i, info->map);
- /*
- * After the shrinker reported that it had no objects to
- * free, but before we cleared the corresponding bit in
- * the memcg shrinker map, a new object might have been
- * added. To make sure, we have the bit set in this
- * case, we invoke the shrinker one more time and reset
- * the bit if it reports that it is not empty anymore.
- * The memory barrier here pairs with the barrier in
- * set_shrinker_bit():
- *
- * list_lru_add() shrink_slab_memcg()
- * list_add_tail() clear_bit()
- * <MB> <MB>
- * set_bit() do_shrink_slab()
- */
- smp_mb__after_atomic();
- ret = do_shrink_slab(&sc, shrinker, priority);
- if (ret == SHRINK_EMPTY)
- ret = 0;
- else
- set_shrinker_bit(memcg, nid, i);
- }
- freed += ret;
+ shrinker = idr_find(&shrinker_idr, shrinker_id);
+ if (unlikely(!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))) {
+ if (!shrinker)
+ clear_bit(offset, unit->map);
+ continue;
+ }
- if (rwsem_is_contended(&shrinker_rwsem)) {
- freed = freed ? : 1;
- break;
+ /* Call non-slab shrinkers even though kmem is disabled */
+ if (!memcg_kmem_online() &&
+ !(shrinker->flags & SHRINKER_NONSLAB))
+ continue;
+
+ ret = do_shrink_slab(&sc, shrinker, priority);
+ if (ret == SHRINK_EMPTY) {
+ clear_bit(offset, unit->map);
+ /*
+ * After the shrinker reported that it had no objects to
+ * free, but before we cleared the corresponding bit in
+ * the memcg shrinker map, a new object might have been
+ * added. To make sure, we have the bit set in this
+ * case, we invoke the shrinker one more time and reset
+ * the bit if it reports that it is not empty anymore.
+ * The memory barrier here pairs with the barrier in
+ * set_shrinker_bit():
+ *
+ * list_lru_add() shrink_slab_memcg()
+ * list_add_tail() clear_bit()
+ * <MB> <MB>
+ * set_bit() do_shrink_slab()
+ */
+ smp_mb__after_atomic();
+ ret = do_shrink_slab(&sc, shrinker, priority);
+ if (ret == SHRINK_EMPTY)
+ ret = 0;
+ else
+ set_shrinker_bit(memcg, nid, shrinker_id);
+ }
+ freed += ret;
+
+ if (rwsem_is_contended(&shrinker_rwsem)) {
+ freed = freed ? : 1;
+ goto unlock;
+ }
}
}
unlock:
--
2.30.2
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the s_shrink, so that it can be freed asynchronously
using kfree_rcu(). Then it doesn't need to wait for RCU read-side critical
section when releasing the struct super_block.
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
fs/btrfs/super.c | 2 +-
fs/kernfs/mount.c | 2 +-
fs/proc/root.c | 2 +-
fs/super.c | 36 ++++++++++++++++++++----------------
include/linux/fs.h | 2 +-
5 files changed, 24 insertions(+), 20 deletions(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index cffdd6f7f8e8..4c9c878b0da4 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1519,7 +1519,7 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
error = -EBUSY;
} else {
snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev);
- shrinker_debugfs_rename(&s->s_shrink, "sb-%s:%s", fs_type->name,
+ shrinker_debugfs_rename(s->s_shrink, "sb-%s:%s", fs_type->name,
s->s_id);
btrfs_sb(s)->bdev_holder = fs_type;
error = btrfs_fill_super(s, fs_devices, data);
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index d49606accb07..2657ff1181f1 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -256,7 +256,7 @@ static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *k
sb->s_time_gran = 1;
/* sysfs dentries and inodes don't require IO to create */
- sb->s_shrink.seeks = 0;
+ sb->s_shrink->seeks = 0;
/* get root inode, initialize and unlock it */
down_read(&kf_root->kernfs_rwsem);
diff --git a/fs/proc/root.c b/fs/proc/root.c
index a86e65a608da..22b78b28b477 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -188,7 +188,7 @@ static int proc_fill_super(struct super_block *s, struct fs_context *fc)
s->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH;
/* procfs dentries and inodes don't require IO to create */
- s->s_shrink.seeks = 0;
+ s->s_shrink->seeks = 0;
pde_get(&proc_root);
root_inode = proc_get_inode(s, &proc_root);
diff --git a/fs/super.c b/fs/super.c
index da68584815e4..68b3877af941 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -67,7 +67,7 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
long dentries;
long inodes;
- sb = container_of(shrink, struct super_block, s_shrink);
+ sb = shrink->private_data;
/*
* Deadlock avoidance. We may hold various FS locks, and we don't want
@@ -120,7 +120,7 @@ static unsigned long super_cache_count(struct shrinker *shrink,
struct super_block *sb;
long total_objects = 0;
- sb = container_of(shrink, struct super_block, s_shrink);
+ sb = shrink->private_data;
/*
* We don't call trylock_super() here as it is a scalability bottleneck,
@@ -182,7 +182,7 @@ static void destroy_unused_super(struct super_block *s)
security_sb_free(s);
put_user_ns(s->s_user_ns);
kfree(s->s_subtype);
- free_prealloced_shrinker(&s->s_shrink);
+ shrinker_free(s->s_shrink);
/* no delays needed */
destroy_super_work(&s->destroy_work);
}
@@ -259,16 +259,20 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
s->s_time_min = TIME64_MIN;
s->s_time_max = TIME64_MAX;
- s->s_shrink.seeks = DEFAULT_SEEKS;
- s->s_shrink.scan_objects = super_cache_scan;
- s->s_shrink.count_objects = super_cache_count;
- s->s_shrink.batch = 1024;
- s->s_shrink.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE;
- if (prealloc_shrinker(&s->s_shrink, "sb-%s", type->name))
+ s->s_shrink = shrinker_alloc(SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE,
+ "sb-%s", type->name);
+ if (!s->s_shrink)
goto fail;
- if (list_lru_init_memcg(&s->s_dentry_lru, &s->s_shrink))
+
+ s->s_shrink->seeks = DEFAULT_SEEKS;
+ s->s_shrink->scan_objects = super_cache_scan;
+ s->s_shrink->count_objects = super_cache_count;
+ s->s_shrink->batch = 1024;
+ s->s_shrink->private_data = s;
+
+ if (list_lru_init_memcg(&s->s_dentry_lru, s->s_shrink))
goto fail;
- if (list_lru_init_memcg(&s->s_inode_lru, &s->s_shrink))
+ if (list_lru_init_memcg(&s->s_inode_lru, s->s_shrink))
goto fail;
return s;
@@ -326,7 +330,7 @@ void deactivate_locked_super(struct super_block *s)
{
struct file_system_type *fs = s->s_type;
if (atomic_dec_and_test(&s->s_active)) {
- unregister_shrinker(&s->s_shrink);
+ shrinker_free(s->s_shrink);
fs->kill_sb(s);
/*
@@ -599,7 +603,7 @@ struct super_block *sget_fc(struct fs_context *fc,
hlist_add_head(&s->s_instances, &s->s_type->fs_supers);
spin_unlock(&sb_lock);
get_filesystem(s->s_type);
- register_shrinker_prepared(&s->s_shrink);
+ shrinker_register(s->s_shrink);
return s;
share_extant_sb:
@@ -678,7 +682,7 @@ struct super_block *sget(struct file_system_type *type,
hlist_add_head(&s->s_instances, &type->fs_supers);
spin_unlock(&sb_lock);
get_filesystem(type);
- register_shrinker_prepared(&s->s_shrink);
+ shrinker_register(s->s_shrink);
return s;
}
EXPORT_SYMBOL(sget);
@@ -1312,7 +1316,7 @@ int get_tree_bdev(struct fs_context *fc,
down_write(&s->s_umount);
} else {
snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev);
- shrinker_debugfs_rename(&s->s_shrink, "sb-%s:%s",
+ shrinker_debugfs_rename(s->s_shrink, "sb-%s:%s",
fc->fs_type->name, s->s_id);
sb_set_blocksize(s, block_size(bdev));
error = fill_super(s, fc);
@@ -1385,7 +1389,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
down_write(&s->s_umount);
} else {
snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev);
- shrinker_debugfs_rename(&s->s_shrink, "sb-%s:%s",
+ shrinker_debugfs_rename(s->s_shrink, "sb-%s:%s",
fs_type->name, s->s_id);
sb_set_blocksize(s, block_size(bdev));
error = fill_super(s, data, flags & SB_SILENT ? 1 : 0);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 891cf662b26f..500238213fd9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1232,7 +1232,7 @@ struct super_block {
const struct dentry_operations *s_d_op; /* default d_op for dentries */
- struct shrinker s_shrink; /* per-sb shrinker handle */
+ struct shrinker *s_shrink; /* per-sb shrinker handle */
/* Number of inodes with nlink == 0 but still referenced */
atomic_long_t s_remove_count;
--
2.30.2
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the nfsd-reply shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct nfsd_net.
Signed-off-by: Qi Zheng <[email protected]>
Acked-by: Chuck Lever <[email protected]>
Acked-by: Jeff Layton <[email protected]>
---
fs/nfsd/netns.h | 2 +-
fs/nfsd/nfscache.c | 31 ++++++++++++++++---------------
2 files changed, 17 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index f669444d5336..ab303a8b77d5 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -177,7 +177,7 @@ struct nfsd_net {
/* size of cache when we saw the longest hash chain */
unsigned int longest_chain_cachesize;
- struct shrinker nfsd_reply_cache_shrinker;
+ struct shrinker *nfsd_reply_cache_shrinker;
/* tracking server-to-server copy mounts */
spinlock_t nfsd_ssc_lock;
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 80621a709510..fd56a52aa5fb 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -201,26 +201,29 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
{
unsigned int hashsize;
unsigned int i;
- int status = 0;
nn->max_drc_entries = nfsd_cache_size_limit();
atomic_set(&nn->num_drc_entries, 0);
hashsize = nfsd_hashsize(nn->max_drc_entries);
nn->maskbits = ilog2(hashsize);
- nn->nfsd_reply_cache_shrinker.scan_objects = nfsd_reply_cache_scan;
- nn->nfsd_reply_cache_shrinker.count_objects = nfsd_reply_cache_count;
- nn->nfsd_reply_cache_shrinker.seeks = 1;
- status = register_shrinker(&nn->nfsd_reply_cache_shrinker,
- "nfsd-reply:%s", nn->nfsd_name);
- if (status)
- return status;
-
nn->drc_hashtbl = kvzalloc(array_size(hashsize,
sizeof(*nn->drc_hashtbl)), GFP_KERNEL);
if (!nn->drc_hashtbl)
+ return -ENOMEM;
+
+ nn->nfsd_reply_cache_shrinker = shrinker_alloc(0, "nfsd-reply:%s",
+ nn->nfsd_name);
+ if (!nn->nfsd_reply_cache_shrinker)
goto out_shrinker;
+ nn->nfsd_reply_cache_shrinker->scan_objects = nfsd_reply_cache_scan;
+ nn->nfsd_reply_cache_shrinker->count_objects = nfsd_reply_cache_count;
+ nn->nfsd_reply_cache_shrinker->seeks = 1;
+ nn->nfsd_reply_cache_shrinker->private_data = nn;
+
+ shrinker_register(nn->nfsd_reply_cache_shrinker);
+
for (i = 0; i < hashsize; i++) {
INIT_LIST_HEAD(&nn->drc_hashtbl[i].lru_head);
spin_lock_init(&nn->drc_hashtbl[i].cache_lock);
@@ -229,7 +232,7 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
return 0;
out_shrinker:
- unregister_shrinker(&nn->nfsd_reply_cache_shrinker);
+ kvfree(nn->drc_hashtbl);
printk(KERN_ERR "nfsd: failed to allocate reply cache\n");
return -ENOMEM;
}
@@ -239,7 +242,7 @@ void nfsd_reply_cache_shutdown(struct nfsd_net *nn)
struct nfsd_cacherep *rp;
unsigned int i;
- unregister_shrinker(&nn->nfsd_reply_cache_shrinker);
+ shrinker_free(nn->nfsd_reply_cache_shrinker);
for (i = 0; i < nn->drc_hashsize; i++) {
struct list_head *head = &nn->drc_hashtbl[i].lru_head;
@@ -323,8 +326,7 @@ nfsd_prune_bucket_locked(struct nfsd_net *nn, struct nfsd_drc_bucket *b,
static unsigned long
nfsd_reply_cache_count(struct shrinker *shrink, struct shrink_control *sc)
{
- struct nfsd_net *nn = container_of(shrink,
- struct nfsd_net, nfsd_reply_cache_shrinker);
+ struct nfsd_net *nn = shrink->private_data;
return atomic_read(&nn->num_drc_entries);
}
@@ -343,8 +345,7 @@ nfsd_reply_cache_count(struct shrinker *shrink, struct shrink_control *sc)
static unsigned long
nfsd_reply_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
{
- struct nfsd_net *nn = container_of(shrink,
- struct nfsd_net, nfsd_reply_cache_shrinker);
+ struct nfsd_net *nn = shrink->private_data;
unsigned long freed = 0;
LIST_HEAD(dispose);
unsigned int i;
--
2.30.2
Use new APIs to dynamically allocate the drm-ttm_pool shrinker.
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
drivers/gpu/drm/ttm/ttm_pool.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index cddb9151d20f..c9c9618c0dce 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -73,7 +73,7 @@ static struct ttm_pool_type global_dma32_uncached[MAX_ORDER + 1];
static spinlock_t shrinker_lock;
static struct list_head shrinker_list;
-static struct shrinker mm_shrinker;
+static struct shrinker *mm_shrinker;
/* Allocate pages of size 1 << order with the given gfp_flags */
static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
@@ -734,8 +734,8 @@ static int ttm_pool_debugfs_shrink_show(struct seq_file *m, void *data)
struct shrink_control sc = { .gfp_mask = GFP_NOFS };
fs_reclaim_acquire(GFP_KERNEL);
- seq_printf(m, "%lu/%lu\n", ttm_pool_shrinker_count(&mm_shrinker, &sc),
- ttm_pool_shrinker_scan(&mm_shrinker, &sc));
+ seq_printf(m, "%lu/%lu\n", ttm_pool_shrinker_count(mm_shrinker, &sc),
+ ttm_pool_shrinker_scan(mm_shrinker, &sc));
fs_reclaim_release(GFP_KERNEL);
return 0;
@@ -779,10 +779,17 @@ int ttm_pool_mgr_init(unsigned long num_pages)
&ttm_pool_debugfs_shrink_fops);
#endif
- mm_shrinker.count_objects = ttm_pool_shrinker_count;
- mm_shrinker.scan_objects = ttm_pool_shrinker_scan;
- mm_shrinker.seeks = 1;
- return register_shrinker(&mm_shrinker, "drm-ttm_pool");
+ mm_shrinker = shrinker_alloc(0, "drm-ttm_pool");
+ if (!mm_shrinker)
+ return -ENOMEM;
+
+ mm_shrinker->count_objects = ttm_pool_shrinker_count;
+ mm_shrinker->scan_objects = ttm_pool_shrinker_scan;
+ mm_shrinker->seeks = 1;
+
+ shrinker_register(mm_shrinker);
+
+ return 0;
}
/**
@@ -802,6 +809,6 @@ void ttm_pool_mgr_fini(void)
ttm_pool_type_fini(&global_dma32_uncached[i]);
}
- unregister_shrinker(&mm_shrinker);
+ shrinker_free(mm_shrinker);
WARN_ON(!list_empty(&shrinker_list));
}
--
2.30.2
Use new APIs to dynamically allocate the ubifs-slab shrinker.
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
---
fs/ubifs/super.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index b08fb28d16b5..c690782388a8 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -54,11 +54,7 @@ module_param_cb(default_version, &ubifs_default_version_ops, &ubifs_default_vers
static struct kmem_cache *ubifs_inode_slab;
/* UBIFS TNC shrinker description */
-static struct shrinker ubifs_shrinker_info = {
- .scan_objects = ubifs_shrink_scan,
- .count_objects = ubifs_shrink_count,
- .seeks = DEFAULT_SEEKS,
-};
+static struct shrinker *ubifs_shrinker_info;
/**
* validate_inode - validate inode.
@@ -2373,7 +2369,7 @@ static void inode_slab_ctor(void *obj)
static int __init ubifs_init(void)
{
- int err;
+ int err = -ENOMEM;
BUILD_BUG_ON(sizeof(struct ubifs_ch) != 24);
@@ -2439,10 +2435,16 @@ static int __init ubifs_init(void)
if (!ubifs_inode_slab)
return -ENOMEM;
- err = register_shrinker(&ubifs_shrinker_info, "ubifs-slab");
- if (err)
+ ubifs_shrinker_info = shrinker_alloc(0, "ubifs-slab");
+ if (!ubifs_shrinker_info)
goto out_slab;
+ ubifs_shrinker_info->count_objects = ubifs_shrink_count;
+ ubifs_shrinker_info->scan_objects = ubifs_shrink_scan;
+ ubifs_shrinker_info->seeks = DEFAULT_SEEKS;
+
+ shrinker_register(ubifs_shrinker_info);
+
err = ubifs_compressors_init();
if (err)
goto out_shrinker;
@@ -2467,7 +2469,7 @@ static int __init ubifs_init(void)
dbg_debugfs_exit();
ubifs_compressors_exit();
out_shrinker:
- unregister_shrinker(&ubifs_shrinker_info);
+ shrinker_free(ubifs_shrinker_info);
out_slab:
kmem_cache_destroy(ubifs_inode_slab);
return err;
@@ -2483,7 +2485,7 @@ static void __exit ubifs_exit(void)
dbg_debugfs_exit();
ubifs_sysfs_exit();
ubifs_compressors_exit();
- unregister_shrinker(&ubifs_shrinker_info);
+ shrinker_free(ubifs_shrinker_info);
/*
* Make sure all delayed rcu free inodes are flushed before we
--
2.30.2
Hi Simon,
On 2023/7/28 16:13, Simon Horman wrote:
> On Thu, Jul 27, 2023 at 04:04:17PM +0800, Qi Zheng wrote:
>> The debugfs_remove_recursive() will wait for debugfs_file_put() to return,
>> so the shrinker will not be freed when doing debugfs operations (such as
>> shrinker_debugfs_count_show() and shrinker_debugfs_scan_write()), so there
>> is no need to hold shrinker_rwsem during debugfs operations.
>>
>> Signed-off-by: Qi Zheng <[email protected]>
>> Reviewed-by: Muchun Song <[email protected]>
>> ---
>> mm/shrinker_debug.c | 14 --------------
>> 1 file changed, 14 deletions(-)
>>
>> diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
>> index 3ab53fad8876..f1becfd45853 100644
>> --- a/mm/shrinker_debug.c
>> +++ b/mm/shrinker_debug.c
>> @@ -55,11 +55,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
>> if (!count_per_node)
>> return -ENOMEM;
>>
>> - ret = down_read_killable(&shrinker_rwsem);
>> - if (ret) {
>> - kfree(count_per_node);
>> - return ret;
>> - }
>> rcu_read_lock();
>
> Hi Qi Zheng,
>
> As can be seen in the next hunk, this function returns 'ret'.
> However, with this change 'ret' is uninitialised unless
> signal_pending() returns non-zero in the while loop below.
Thanks for your feedback, the 'ret' should be initialized to 0,
will fix it.
Thanks,
Qi
>
> This is flagged in a clan-16 W=1 build.
>
> mm/shrinker_debug.c:87:11: warning: variable 'ret' is used uninitialized whenever 'do' loop exits because its condition is false [-Wsometimes-uninitialized]
> } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> mm/shrinker_debug.c:92:9: note: uninitialized use occurs here
> return ret;
> ^~~
> mm/shrinker_debug.c:87:11: note: remove the condition if it is always true
> } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 1
> mm/shrinker_debug.c:77:7: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
> if (!memcg_aware) {
> ^~~~~~~~~~~~
> mm/shrinker_debug.c:92:9: note: uninitialized use occurs here
> return ret;
> ^~~
> mm/shrinker_debug.c:77:3: note: remove the 'if' if its condition is always false
> if (!memcg_aware) {
> ^~~~~~~~~~~~~~~~~~~
> mm/shrinker_debug.c:52:9: note: initialize the variable 'ret' to silence this warning
> int ret, nid;
> ^
> = 0
>
>>
>> memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
>> @@ -92,7 +87,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, void *v)
>> } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
>>
>> rcu_read_unlock();
>> - up_read(&shrinker_rwsem);
>>
>> kfree(count_per_node);
>> return ret;
>
> ...