Version 2 introduced alloc_inode_sb(), which sets up the inode reclaim
context properly, to allocate filesystem-specific inodes. All filesystems
therefore have to be converted to the new API, which was done in a single
patch. Some filesystems are easy to convert (just replace
kmem_cache_alloc() with alloc_inode_sb()), while others need more work.
To make it easier for the maintainers of each filesystem to review their
own part, this version splits that patch into per-filesystem patches. I
am not sure whether this is a good idea, because it results in many more
commits.
On our server, we found a suspected memory leak. The kmalloc-32 slab
cache consumes more than 6GB of memory. Other kmem_caches consume less
than 2GB of memory.
After in-depth analysis, we found that the memory consumption of the
kmalloc-32 slab cache is caused by list_lru_one allocations.
crash> p memcg_nr_cache_ids
memcg_nr_cache_ids = $2 = 24574
memcg_nr_cache_ids is very large. The memory consumption of each list_lru
can be calculated with the following formula:
num_numa_node * memcg_nr_cache_ids * 32 (kmalloc-32)
There are 4 NUMA nodes in our system, so each list_lru consumes ~3MB.
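The formula above can be checked with a small userspace helper. This is only a sketch for the arithmetic: list_lru_bytes() and its parameter names are hypothetical and are not part of the kernel or of this series.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Estimated bytes of list_lru_one objects behind one memcg-aware
 * list_lru: one slab object per (NUMA node, memcg cache id) pair.
 * Hypothetical helper, used only to check the arithmetic.
 */
static size_t list_lru_bytes(size_t nr_nodes, size_t nr_cache_ids,
			     size_t slab_obj_size)
{
	assert(slab_obj_size > 0);
	return nr_nodes * nr_cache_ids * slab_obj_size;
}
```

With the values reported here, list_lru_bytes(4, 24574, 32) evaluates to 3145472 bytes, which is the ~3MB figure.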
crash> list super_blocks | wc -l
952
Every mount registers 2 list_lrus, one for inodes and one for dentries.
There are 952 super_blocks, so the total memory is 952 * 2 * 3 MB
(~5.6GB). However, the number of memory cgroups is currently less than
500, so I guess more than 12286 memory cgroups were created on this
machine at some point (I do not know why there were so many cgroups; it
may be a user's bug or the user may really want to do that). Because
memcg_nr_cache_ids is never reduced to a suitable value, a lot of memory
is wasted. The only way to reduce memcg_nr_cache_ids is to *reboot* the
server, which is not what we want.
I previously posted a patchset [1] to reduce memcg_nr_cache_ids, but it
did not fundamentally solve the problem.
We currently allocate scope for every memcg to be tracked on every
superblock instantiated in the system, regardless of whether that superblock
is even accessible to that memcg.
These huge memcg counts come from container hosts where memcgs are confined
to just a small subset of the total number of superblocks that are
instantiated at any given point in time.
For these systems with huge container counts, list_lru does not need the
capability of tracking every memcg on every superblock.
What it comes down to is that the list_lru is only needed for a given memcg
if that memcg is instantiating and freeing objects on a given list_lru.
As Dave said, "Which makes me think we should be moving more towards 'add the
memcg to the list_lru at the first insert' model rather than 'instantiate
all at memcg init time just in case'."
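The "add at first insert" model Dave describes can be illustrated with a userspace toy. This is a sketch under stated assumptions, not the code in this series: every name below (toy_list_lru, toy_lru_add(), NR_MEMCG_IDS) is hypothetical, and locking, RCU and NUMA nodes are left out on purpose.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_MEMCG_IDS 8	/* hypothetical, tiny on purpose */

struct toy_lru_one {
	long nr_items;
};

struct toy_list_lru {
	/* one slot per memcg id; NULL until that memcg first inserts */
	struct toy_lru_one *slots[NR_MEMCG_IDS];
	int nr_allocated;
};

/* Add one item on behalf of @memcg_id, allocating its slot on demand. */
static int toy_lru_add(struct toy_list_lru *lru, int memcg_id)
{
	assert(memcg_id >= 0 && memcg_id < NR_MEMCG_IDS);

	if (!lru->slots[memcg_id]) {
		struct toy_lru_one *one = calloc(1, sizeof(*one));

		if (!one)
			return -1;
		lru->slots[memcg_id] = one;
		lru->nr_allocated++;
	}
	lru->slots[memcg_id]->nr_items++;
	return 0;
}

/* Free every slot that was allocated and reset the bookkeeping. */
static void toy_lru_destroy(struct toy_list_lru *lru)
{
	for (int i = 0; i < NR_MEMCG_IDS; i++) {
		free(lru->slots[i]);
		lru->slots[i] = NULL;
	}
	lru->nr_allocated = 0;
}
```

In this toy, a memcg that never inserts into a given list_lru costs only one NULL pointer there, instead of a fully initialized per-memcg list.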
This patchset aims to optimize the list lru memory consumption from different
aspects.
Patch 1-6 are code simplification.
Patch 7 converts the array from per-memcg per-node to per-memcg.
Patch 8 introduces kmem_cache_alloc_lru().
Patch 9 introduces alloc_inode_sb().
Patch 10-66 convert all filesystems to alloc_inode_sb() respectively.
Patch 70 makes list_lru allocation dynamic.
Patch 72 uses xarray to optimize the per-memcg pointer array size.
Patch 73-76 are code simplification.
I did a simple test to show the optimization. I created 10k memory cgroups
and mounted 10k filesystems on the system, then used the free command to
show how much memory the system consumes after this operation (there are 2
NUMA nodes in the system).
+-----------------------+------------------------+
|       condition       |   memory consumption   |
+-----------------------+------------------------+
| without this patchset |        24464 MB        |
+-----------------------+------------------------+
|     after patch 7     |        21957 MB        | <--------+
+-----------------------+------------------------+          |
|     after patch 70    |         6895 MB        |          |
+-----------------------+------------------------+          |
|     after patch 72    |         4367 MB        |          |
+-----------------------+------------------------+          |
                                                            |
The more the number of nodes, the more obvious the effect---+
BTW, there was a recent discussion [2] on the same issue.
[1] https://lore.kernel.org/linux-fsdevel/[email protected]/
[2] https://lore.kernel.org/linux-fsdevel/[email protected]/
This series not only optimizes the memory usage of list_lru but also
simplifies the code.
Changelog in v3:
- Fix mixing advanced and normal XArray concepts (Thanks to Matthew).
- Split one patch into per-filesystem patches.
Changelog in v2:
- Update Documentation/filesystems/porting.rst as suggested by Dave.
- Add a comment above alloc_inode_sb() as suggested by Dave.
- Rework some patches' commit logs.
- Add patch 18-21.
Thanks Dave.
Muchun Song (76):
mm: list_lru: fix the return value of list_lru_count_one()
mm: memcontrol: remove kmemcg_id reparenting
mm: memcontrol: remove the kmem states
mm: memcontrol: move memcg_online_kmem() to mem_cgroup_css_online()
mm: list_lru: remove holding lru lock
mm: list_lru: only add memcg-aware lrus to the global lru list
mm: list_lru: optimize memory consumption of arrays
mm: introduce kmem_cache_alloc_lru
fs: introduce alloc_inode_sb() to allocate filesystems specific inode
dax: allocate inode by using alloc_inode_sb()
9p: allocate inode by using alloc_inode_sb()
adfs: allocate inode by using alloc_inode_sb()
affs: allocate inode by using alloc_inode_sb()
afs: allocate inode by using alloc_inode_sb()
befs: allocate inode by using alloc_inode_sb()
bfs: allocate inode by using alloc_inode_sb()
block: allocate inode by using alloc_inode_sb()
btrfs: allocate inode by using alloc_inode_sb()
ceph: allocate inode by using alloc_inode_sb()
cifs: allocate inode by using alloc_inode_sb()
coda: allocate inode by using alloc_inode_sb()
ecryptfs: allocate inode by using alloc_inode_sb()
efs: allocate inode by using alloc_inode_sb()
erofs: allocate inode by using alloc_inode_sb()
exfat: allocate inode by using alloc_inode_sb()
ext2: allocate inode by using alloc_inode_sb()
ext4: allocate inode by using alloc_inode_sb()
fat: allocate inode by using alloc_inode_sb()
freevxfs: allocate inode by using alloc_inode_sb()
fuse: allocate inode by using alloc_inode_sb()
gfs2: allocate inode by using alloc_inode_sb()
hfs: allocate inode by using alloc_inode_sb()
hfsplus: allocate inode by using alloc_inode_sb()
hostfs: allocate inode by using alloc_inode_sb()
hpfs: allocate inode by using alloc_inode_sb()
hugetlbfs: allocate inode by using alloc_inode_sb()
isofs: allocate inode by using alloc_inode_sb()
jffs2: allocate inode by using alloc_inode_sb()
jfs: allocate inode by using alloc_inode_sb()
minix: allocate inode by using alloc_inode_sb()
nfs: allocate inode by using alloc_inode_sb()
nilfs2: allocate inode by using alloc_inode_sb()
ntfs: allocate inode by using alloc_inode_sb()
ocfs2: allocate inode by using alloc_inode_sb()
openpromfs: allocate inode by using alloc_inode_sb()
orangefs: allocate inode by using alloc_inode_sb()
overlayfs: allocate inode by using alloc_inode_sb()
proc: allocate inode by using alloc_inode_sb()
qnx4: allocate inode by using alloc_inode_sb()
qnx6: allocate inode by using alloc_inode_sb()
reiserfs: allocate inode by using alloc_inode_sb()
romfs: allocate inode by using alloc_inode_sb()
squashfs: allocate inode by using alloc_inode_sb()
sysv: allocate inode by using alloc_inode_sb()
ubifs: allocate inode by using alloc_inode_sb()
udf: allocate inode by using alloc_inode_sb()
ufs: allocate inode by using alloc_inode_sb()
vboxsf: allocate inode by using alloc_inode_sb()
xfs: allocate inode by using alloc_inode_sb()
zonefs: allocate inode by using alloc_inode_sb()
ipc: allocate inode by using alloc_inode_sb()
shmem: allocate inode by using alloc_inode_sb()
net: allocate inode by using alloc_inode_sb()
rpc: allocate inode by using alloc_inode_sb()
f2fs: allocate inode by using alloc_inode_sb()
nfs42: use a specific kmem_cache to allocate nfs4_xattr_entry
mm: dcache: use kmem_cache_alloc_lru() to allocate dentry
xarray: use kmem_cache_alloc_lru to allocate xa_node
mm: workingset: use xas_set_lru() to pass shadow_nodes
mm: list_lru: allocate list_lru_one only when needed
mm: list_lru: rename memcg_drain_all_list_lrus to
memcg_reparent_list_lrus
mm: list_lru: replace linear array with xarray
mm: memcontrol: reuse memory cgroup ID for kmem ID
mm: memcontrol: fix cannot alloc the maximum memcg ID
mm: list_lru: rename list_lru_per_memcg to list_lru_memcg
mm: memcontrol: rename memcg_cache_id to memcg_kmem_id
Documentation/filesystems/porting.rst | 5 +
drivers/dax/super.c | 2 +-
fs/9p/vfs_inode.c | 2 +-
fs/adfs/super.c | 2 +-
fs/affs/super.c | 2 +-
fs/afs/super.c | 2 +-
fs/befs/linuxvfs.c | 2 +-
fs/bfs/inode.c | 2 +-
fs/block_dev.c | 2 +-
fs/btrfs/inode.c | 2 +-
fs/ceph/inode.c | 2 +-
fs/cifs/cifsfs.c | 2 +-
fs/coda/inode.c | 2 +-
fs/dcache.c | 3 +-
fs/ecryptfs/super.c | 2 +-
fs/efs/super.c | 2 +-
fs/erofs/super.c | 2 +-
fs/exfat/super.c | 2 +-
fs/ext2/super.c | 2 +-
fs/ext4/super.c | 2 +-
fs/f2fs/super.c | 8 +-
fs/fat/inode.c | 2 +-
fs/freevxfs/vxfs_super.c | 2 +-
fs/fuse/inode.c | 2 +-
fs/gfs2/super.c | 2 +-
fs/hfs/super.c | 2 +-
fs/hfsplus/super.c | 2 +-
fs/hostfs/hostfs_kern.c | 2 +-
fs/hpfs/super.c | 2 +-
fs/hugetlbfs/inode.c | 2 +-
fs/inode.c | 2 +-
fs/isofs/inode.c | 2 +-
fs/jffs2/super.c | 2 +-
fs/jfs/super.c | 2 +-
fs/minix/inode.c | 2 +-
fs/nfs/inode.c | 2 +-
fs/nfs/nfs42xattr.c | 95 ++++---
fs/nilfs2/super.c | 2 +-
fs/ntfs/inode.c | 2 +-
fs/ocfs2/dlmfs/dlmfs.c | 2 +-
fs/ocfs2/super.c | 2 +-
fs/openpromfs/inode.c | 2 +-
fs/orangefs/super.c | 2 +-
fs/overlayfs/super.c | 2 +-
fs/proc/inode.c | 2 +-
fs/qnx4/inode.c | 2 +-
fs/qnx6/inode.c | 2 +-
fs/reiserfs/super.c | 2 +-
fs/romfs/super.c | 2 +-
fs/squashfs/super.c | 2 +-
fs/sysv/inode.c | 2 +-
fs/ubifs/super.c | 2 +-
fs/udf/super.c | 2 +-
fs/ufs/super.c | 2 +-
fs/vboxsf/super.c | 2 +-
fs/xfs/xfs_icache.c | 2 +-
fs/zonefs/super.c | 2 +-
include/linux/fs.h | 11 +
include/linux/list_lru.h | 16 +-
include/linux/memcontrol.h | 49 ++--
include/linux/slab.h | 3 +
include/linux/swap.h | 5 +-
include/linux/xarray.h | 9 +-
ipc/mqueue.c | 2 +-
lib/xarray.c | 10 +-
mm/list_lru.c | 472 ++++++++++++++++------------------
mm/memcontrol.c | 190 ++------------
mm/shmem.c | 2 +-
mm/slab.c | 39 ++-
mm/slab.h | 17 +-
mm/slob.c | 6 +
mm/slub.c | 42 ++-
mm/workingset.c | 2 +-
net/socket.c | 2 +-
net/sunrpc/rpc_pipe.c | 2 +-
75 files changed, 498 insertions(+), 598 deletions(-)
--
2.11.0
Since slab objects and kmem pages are charged to the object cgroup instead
of the memory cgroup, memcg_reparent_objcgs() will reparent this cgroup and
all its descendants to its parent cgroup. This already makes further
list_lru_add() calls add elements to the parent's list, so it is
unnecessary to change the kmemcg_id of an offline cgroup to its parent's
id; doing so just wastes CPU cycles. Remove the redundant code.
Signed-off-by: Muchun Song <[email protected]>
---
mm/memcontrol.c | 20 ++------------------
1 file changed, 2 insertions(+), 18 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 999e626f4111..e0d7ceb0db26 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3635,8 +3635,7 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
static void memcg_offline_kmem(struct mem_cgroup *memcg)
{
- struct cgroup_subsys_state *css;
- struct mem_cgroup *parent, *child;
+ struct mem_cgroup *parent;
int kmemcg_id;
if (memcg->kmem_state != KMEM_ONLINE)
@@ -3653,22 +3652,7 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
kmemcg_id = memcg->kmemcg_id;
BUG_ON(kmemcg_id < 0);
- /*
- * Change kmemcg_id of this cgroup and all its descendants to the
- * parent's id, and then move all entries from this cgroup's list_lrus
- * to ones of the parent. After we have finished, all list_lrus
- * corresponding to this cgroup are guaranteed to remain empty. The
- * ordering is imposed by list_lru_node->lock taken by
- * memcg_drain_all_list_lrus().
- */
- rcu_read_lock(); /* can be called from css_free w/o cgroup_mutex */
- css_for_each_descendant_pre(css, &memcg->css) {
- child = mem_cgroup_from_css(css);
- BUG_ON(child->kmemcg_id != kmemcg_id);
- child->kmemcg_id = parent->kmemcg_id;
- }
- rcu_read_unlock();
-
+ /* memcg_reparent_objcgs() must be called before this. */
memcg_drain_all_list_lrus(kmemcg_id, parent);
memcg_free_cache_id(kmemcg_id);
--
2.11.0
The kmem state is now only used to indicate whether the kmem is
offline. If css_alloc() fails, the kmem is never taken offline, so the
kmem state is needed to mark this case so that memcg_free_kmem() can
take the kmem offline.
However, we can set ->kmemcg_id to -1 to indicate that the kmem is
offline, so the kmem states can be removed to simplify the code.
Signed-off-by: Muchun Song <[email protected]>
---
include/linux/memcontrol.h | 7 -------
mm/memcontrol.c | 9 +++------
2 files changed, 3 insertions(+), 13 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 3a0ce40090c6..7267cf9d1f3d 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -180,12 +180,6 @@ struct mem_cgroup_thresholds {
struct mem_cgroup_threshold_ary *spare;
};
-enum memcg_kmem_state {
- KMEM_NONE,
- KMEM_ALLOCATED,
- KMEM_ONLINE,
-};
-
#if defined(CONFIG_SMP)
struct memcg_padding {
char x[0];
@@ -318,7 +312,6 @@ struct mem_cgroup {
#ifdef CONFIG_MEMCG_KMEM
int kmemcg_id;
- enum memcg_kmem_state kmem_state;
struct obj_cgroup __rcu *objcg;
struct list_head objcg_list; /* list of inherited objcgs */
#endif
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e0d7ceb0db26..6844d8b511d8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3611,7 +3611,6 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
return 0;
BUG_ON(memcg->kmemcg_id >= 0);
- BUG_ON(memcg->kmem_state);
memcg_id = memcg_alloc_cache_id();
if (memcg_id < 0)
@@ -3628,7 +3627,6 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
static_branch_enable(&memcg_kmem_enabled_key);
memcg->kmemcg_id = memcg_id;
- memcg->kmem_state = KMEM_ONLINE;
return 0;
}
@@ -3638,11 +3636,9 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
struct mem_cgroup *parent;
int kmemcg_id;
- if (memcg->kmem_state != KMEM_ONLINE)
+ if (cgroup_memory_nokmem)
return;
- memcg->kmem_state = KMEM_ALLOCATED;
-
parent = parent_mem_cgroup(memcg);
if (!parent)
parent = root_mem_cgroup;
@@ -3656,12 +3652,13 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
memcg_drain_all_list_lrus(kmemcg_id, parent);
memcg_free_cache_id(kmemcg_id);
+ memcg->kmemcg_id = -1;
}
static void memcg_free_kmem(struct mem_cgroup *memcg)
{
/* css_alloc() failed, offlining didn't happen */
- if (unlikely(memcg->kmem_state == KMEM_ONLINE))
+ if (unlikely(memcg->kmemcg_id != -1))
memcg_offline_kmem(memcg);
}
#else
--
2.11.0
Non-memcg-aware lrus are always skipped when traversing the global lru
list, which is inefficient. Add only memcg-aware lrus to the global lru
list to make the traversal more efficient.
Signed-off-by: Muchun Song <[email protected]>
---
mm/list_lru.c | 35 ++++++++++++++++-------------------
1 file changed, 16 insertions(+), 19 deletions(-)
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 6b2f3cbe5f67..39828632631c 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -15,18 +15,29 @@
#include "slab.h"
#ifdef CONFIG_MEMCG_KMEM
-static LIST_HEAD(list_lrus);
+static LIST_HEAD(memcg_list_lrus);
static DEFINE_MUTEX(list_lrus_mutex);
+static inline bool list_lru_memcg_aware(struct list_lru *lru)
+{
+ return lru->memcg_aware;
+}
+
static void list_lru_register(struct list_lru *lru)
{
+ if (!list_lru_memcg_aware(lru))
+ return;
+
mutex_lock(&list_lrus_mutex);
- list_add(&lru->list, &list_lrus);
+ list_add(&lru->list, &memcg_list_lrus);
mutex_unlock(&list_lrus_mutex);
}
static void list_lru_unregister(struct list_lru *lru)
{
+ if (!list_lru_memcg_aware(lru))
+ return;
+
mutex_lock(&list_lrus_mutex);
list_del(&lru->list);
mutex_unlock(&list_lrus_mutex);
@@ -37,11 +48,6 @@ static int lru_shrinker_id(struct list_lru *lru)
return lru->shrinker_id;
}
-static inline bool list_lru_memcg_aware(struct list_lru *lru)
-{
- return lru->memcg_aware;
-}
-
static inline struct list_lru_one *
list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx)
{
@@ -458,9 +464,6 @@ static int memcg_update_list_lru(struct list_lru *lru,
{
int i;
- if (!list_lru_memcg_aware(lru))
- return 0;
-
for_each_node(i) {
if (memcg_update_list_lru_node(&lru->node[i],
old_size, new_size))
@@ -483,9 +486,6 @@ static void memcg_cancel_update_list_lru(struct list_lru *lru,
{
int i;
- if (!list_lru_memcg_aware(lru))
- return;
-
for_each_node(i)
memcg_cancel_update_list_lru_node(&lru->node[i],
old_size, new_size);
@@ -498,7 +498,7 @@ int memcg_update_all_list_lrus(int new_size)
int old_size = memcg_nr_cache_ids;
mutex_lock(&list_lrus_mutex);
- list_for_each_entry(lru, &list_lrus, list) {
+ list_for_each_entry(lru, &memcg_list_lrus, list) {
ret = memcg_update_list_lru(lru, old_size, new_size);
if (ret)
goto fail;
@@ -507,7 +507,7 @@ int memcg_update_all_list_lrus(int new_size)
mutex_unlock(&list_lrus_mutex);
return ret;
fail:
- list_for_each_entry_continue_reverse(lru, &list_lrus, list)
+ list_for_each_entry_continue_reverse(lru, &memcg_list_lrus, list)
memcg_cancel_update_list_lru(lru, old_size, new_size);
goto out;
}
@@ -544,9 +544,6 @@ static void memcg_drain_list_lru(struct list_lru *lru,
{
int i;
- if (!list_lru_memcg_aware(lru))
- return;
-
for_each_node(i)
memcg_drain_list_lru_node(lru, i, src_idx, dst_memcg);
}
@@ -556,7 +553,7 @@ void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg)
struct list_lru *lru;
mutex_lock(&list_lrus_mutex);
- list_for_each_entry(lru, &list_lrus, list)
+ list_for_each_entry(lru, &memcg_list_lrus, list)
memcg_drain_list_lru(lru, src_idx, dst_memcg);
mutex_unlock(&list_lrus_mutex);
}
--
2.11.0
The list_lru uses an array to store the list_lru_one pointers, which is
per-memcg per-node. What if we run 10k containers in the system? The
size of the array of every list_lru can be 10k * number_of_node *
sizeof(void *). The array becomes very big, and the more NUMA nodes
there are in the system, the more memory it consumes. We can convert the
array to per-memcg instead of per-memcg per-node. This saves memory,
especially when there are many NUMA nodes in the system, and also
simplifies the code. In my test case (10k memory cgroups and 2 NUMA
nodes), it saves 2.5GB of memory.
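The layout change can be pictured with a userspace toy: one pointer array indexed by memcg id, where each entry holds one list per node. This is a minimal sketch, assuming simplified types; the demo_* names are hypothetical, and the nr_nodes member exists only to keep the struct valid ISO C (the kernel structure in this patch has no such field).

```c
#include <assert.h>
#include <stdlib.h>

struct demo_lru_one {
	long nr_items;
};

/* One allocation per memcg, holding one list per NUMA node. */
struct demo_lru_per_memcg {
	int nr_nodes;			/* toy-only bookkeeping */
	struct demo_lru_one nodes[];	/* indexed by node id */
};

/* Allocate and zero the per-memcg structure for @nr_nodes nodes. */
static struct demo_lru_per_memcg *demo_per_memcg_alloc(int nr_nodes)
{
	struct demo_lru_per_memcg *lru;

	assert(nr_nodes > 0);
	lru = calloc(1, sizeof(*lru) +
			(size_t)nr_nodes * sizeof(struct demo_lru_one));
	if (lru)
		lru->nr_nodes = nr_nodes;
	return lru;
}

/* Look up the list for (memcg idx, node id): lrus[idx]->nodes[nid]. */
static struct demo_lru_one *
demo_lookup(struct demo_lru_per_memcg **lrus, int idx, int nid)
{
	assert(nid >= 0 && nid < lrus[idx]->nr_nodes);
	return &lrus[idx]->nodes[nid];
}
```

In this toy, adding a NUMA node grows each per-memcg allocation rather than adding another whole pointer array per node.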
Signed-off-by: Muchun Song <[email protected]>
---
include/linux/list_lru.h | 17 +++--
mm/list_lru.c | 191 +++++++++++++++++------------------------------
2 files changed, 79 insertions(+), 129 deletions(-)
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 1b5fceb565df..2b32dbd89214 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -31,10 +31,15 @@ struct list_lru_one {
long nr_items;
};
+struct list_lru_per_memcg {
+ /* array of per cgroup per node lists, indexed by node id */
+ struct list_lru_one nodes[0];
+};
+
struct list_lru_memcg {
- struct rcu_head rcu;
+ struct rcu_head rcu;
/* array of per cgroup lists, indexed by memcg_cache_id */
- struct list_lru_one *lru[];
+ struct list_lru_per_memcg *lrus[];
};
struct list_lru_node {
@@ -42,11 +47,7 @@ struct list_lru_node {
spinlock_t lock;
/* global list, used for the root cgroup in cgroup aware lrus */
struct list_lru_one lru;
-#ifdef CONFIG_MEMCG_KMEM
- /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
- struct list_lru_memcg __rcu *memcg_lrus;
-#endif
- long nr_items;
+ long nr_items;
} ____cacheline_aligned_in_smp;
struct list_lru {
@@ -55,6 +56,8 @@ struct list_lru {
struct list_head list;
int shrinker_id;
bool memcg_aware;
+ /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
+ struct list_lru_memcg __rcu *memcg_lrus;
#endif
};
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 39828632631c..f1c73b53af9a 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -49,35 +49,38 @@ static int lru_shrinker_id(struct list_lru *lru)
}
static inline struct list_lru_one *
-list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx)
+list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
{
struct list_lru_memcg *memcg_lrus;
+ struct list_lru_node *nlru = &lru->node[nid];
+
/*
* Either lock or RCU protects the array of per cgroup lists
- * from relocation (see memcg_update_list_lru_node).
+ * from relocation (see memcg_update_list_lru).
*/
- memcg_lrus = rcu_dereference_check(nlru->memcg_lrus,
+ memcg_lrus = rcu_dereference_check(lru->memcg_lrus,
lockdep_is_held(&nlru->lock));
if (memcg_lrus && idx >= 0)
- return memcg_lrus->lru[idx];
+ return &memcg_lrus->lrus[idx]->nodes[nid];
return &nlru->lru;
}
static inline struct list_lru_one *
-list_lru_from_kmem(struct list_lru_node *nlru, void *ptr,
+list_lru_from_kmem(struct list_lru *lru, int nid, void *ptr,
struct mem_cgroup **memcg_ptr)
{
+ struct list_lru_node *nlru = &lru->node[nid];
struct list_lru_one *l = &nlru->lru;
struct mem_cgroup *memcg = NULL;
- if (!nlru->memcg_lrus)
+ if (!lru->memcg_lrus)
goto out;
memcg = mem_cgroup_from_obj(ptr);
if (!memcg)
goto out;
- l = list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg));
+ l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg));
out:
if (memcg_ptr)
*memcg_ptr = memcg;
@@ -103,18 +106,18 @@ static inline bool list_lru_memcg_aware(struct list_lru *lru)
}
static inline struct list_lru_one *
-list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx)
+list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
{
- return &nlru->lru;
+ return &lru->node[nid].lru;
}
static inline struct list_lru_one *
-list_lru_from_kmem(struct list_lru_node *nlru, void *ptr,
+list_lru_from_kmem(struct list_lru *lru, int nid, void *ptr,
struct mem_cgroup **memcg_ptr)
{
if (memcg_ptr)
*memcg_ptr = NULL;
- return &nlru->lru;
+ return &lru->node[nid].lru;
}
#endif /* CONFIG_MEMCG_KMEM */
@@ -127,7 +130,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
spin_lock(&nlru->lock);
if (list_empty(item)) {
- l = list_lru_from_kmem(nlru, item, &memcg);
+ l = list_lru_from_kmem(lru, nid, item, &memcg);
list_add_tail(item, &l->list);
/* Set shrinker bit if the first element was added */
if (!l->nr_items++)
@@ -150,7 +153,7 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item)
spin_lock(&nlru->lock);
if (!list_empty(item)) {
- l = list_lru_from_kmem(nlru, item, NULL);
+ l = list_lru_from_kmem(lru, nid, item, NULL);
list_del_init(item);
l->nr_items--;
nlru->nr_items--;
@@ -180,12 +183,11 @@ EXPORT_SYMBOL_GPL(list_lru_isolate_move);
unsigned long list_lru_count_one(struct list_lru *lru,
int nid, struct mem_cgroup *memcg)
{
- struct list_lru_node *nlru = &lru->node[nid];
struct list_lru_one *l;
long count;
rcu_read_lock();
- l = list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg));
+ l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg));
count = READ_ONCE(l->nr_items);
rcu_read_unlock();
@@ -206,16 +208,16 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
EXPORT_SYMBOL_GPL(list_lru_count_node);
static unsigned long
-__list_lru_walk_one(struct list_lru_node *nlru, int memcg_idx,
+__list_lru_walk_one(struct list_lru *lru, int nid, int memcg_idx,
list_lru_walk_cb isolate, void *cb_arg,
unsigned long *nr_to_walk)
{
-
+ struct list_lru_node *nlru = &lru->node[nid];
struct list_lru_one *l;
struct list_head *item, *n;
unsigned long isolated = 0;
- l = list_lru_from_memcg_idx(nlru, memcg_idx);
+ l = list_lru_from_memcg_idx(lru, nid, memcg_idx);
restart:
list_for_each_safe(item, n, &l->list) {
enum lru_status ret;
@@ -272,8 +274,8 @@ list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
unsigned long ret;
spin_lock(&nlru->lock);
- ret = __list_lru_walk_one(nlru, memcg_cache_id(memcg), isolate, cb_arg,
- nr_to_walk);
+ ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate,
+ cb_arg, nr_to_walk);
spin_unlock(&nlru->lock);
return ret;
}
@@ -288,8 +290,8 @@ list_lru_walk_one_irq(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
unsigned long ret;
spin_lock_irq(&nlru->lock);
- ret = __list_lru_walk_one(nlru, memcg_cache_id(memcg), isolate, cb_arg,
- nr_to_walk);
+ ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate,
+ cb_arg, nr_to_walk);
spin_unlock_irq(&nlru->lock);
return ret;
}
@@ -308,7 +310,7 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
struct list_lru_node *nlru = &lru->node[nid];
spin_lock(&nlru->lock);
- isolated += __list_lru_walk_one(nlru, memcg_idx,
+ isolated += __list_lru_walk_one(lru, nid, memcg_idx,
isolate, cb_arg,
nr_to_walk);
spin_unlock(&nlru->lock);
@@ -328,167 +330,112 @@ static void init_one_lru(struct list_lru_one *l)
}
#ifdef CONFIG_MEMCG_KMEM
-static void __memcg_destroy_list_lru_node(struct list_lru_memcg *memcg_lrus,
- int begin, int end)
+static void memcg_destroy_list_lru_range(struct list_lru_memcg *memcg_lrus,
+ int begin, int end)
{
int i;
for (i = begin; i < end; i++)
- kfree(memcg_lrus->lru[i]);
+ kfree(memcg_lrus->lrus[i]);
}
-static int __memcg_init_list_lru_node(struct list_lru_memcg *memcg_lrus,
- int begin, int end)
+static int memcg_init_list_lru_range(struct list_lru_memcg *memcg_lrus,
+ int begin, int end)
{
int i;
for (i = begin; i < end; i++) {
- struct list_lru_one *l;
+ int nid;
+ struct list_lru_per_memcg *lru;
- l = kmalloc(sizeof(struct list_lru_one), GFP_KERNEL);
- if (!l)
+ lru = kmalloc(struct_size(lru, nodes, nr_node_ids), GFP_KERNEL);
+ if (!lru)
goto fail;
- init_one_lru(l);
- memcg_lrus->lru[i] = l;
+ for_each_node(nid)
+ init_one_lru(&lru->nodes[nid]);
+ memcg_lrus->lrus[i] = lru;
}
return 0;
fail:
- __memcg_destroy_list_lru_node(memcg_lrus, begin, i);
+ memcg_destroy_list_lru_range(memcg_lrus, begin, i);
return -ENOMEM;
}
-static int memcg_init_list_lru_node(struct list_lru_node *nlru)
+static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
{
struct list_lru_memcg *memcg_lrus;
int size = memcg_nr_cache_ids;
+ lru->memcg_aware = memcg_aware;
+ if (!memcg_aware)
+ return 0;
+
memcg_lrus = kvmalloc(sizeof(*memcg_lrus) +
- size * sizeof(void *), GFP_KERNEL);
+ size * sizeof(memcg_lrus->lrus[0]), GFP_KERNEL);
if (!memcg_lrus)
return -ENOMEM;
- if (__memcg_init_list_lru_node(memcg_lrus, 0, size)) {
+ if (memcg_init_list_lru_range(memcg_lrus, 0, size)) {
kvfree(memcg_lrus);
return -ENOMEM;
}
- RCU_INIT_POINTER(nlru->memcg_lrus, memcg_lrus);
+ RCU_INIT_POINTER(lru->memcg_lrus, memcg_lrus);
return 0;
}
-static void memcg_destroy_list_lru_node(struct list_lru_node *nlru)
+static void memcg_destroy_list_lru(struct list_lru *lru)
{
struct list_lru_memcg *memcg_lrus;
+
+ if (!list_lru_memcg_aware(lru))
+ return;
+
/*
* This is called when shrinker has already been unregistered,
* and nobody can use it. So, there is no need to use kvfree_rcu().
*/
- memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus, true);
- __memcg_destroy_list_lru_node(memcg_lrus, 0, memcg_nr_cache_ids);
+ memcg_lrus = rcu_dereference_protected(lru->memcg_lrus, true);
+ memcg_destroy_list_lru_range(memcg_lrus, 0, memcg_nr_cache_ids);
kvfree(memcg_lrus);
}
-static int memcg_update_list_lru_node(struct list_lru_node *nlru,
- int old_size, int new_size)
+static int memcg_update_list_lru(struct list_lru *lru, int old_size, int new_size)
{
struct list_lru_memcg *old, *new;
BUG_ON(old_size > new_size);
- old = rcu_dereference_protected(nlru->memcg_lrus,
+ old = rcu_dereference_protected(lru->memcg_lrus,
lockdep_is_held(&list_lrus_mutex));
- new = kvmalloc(sizeof(*new) + new_size * sizeof(void *), GFP_KERNEL);
+ new = kvmalloc(sizeof(*new) + new_size * sizeof(new->lrus[0]), GFP_KERNEL);
if (!new)
return -ENOMEM;
- if (__memcg_init_list_lru_node(new, old_size, new_size)) {
+ if (memcg_init_list_lru_range(new, old_size, new_size)) {
kvfree(new);
return -ENOMEM;
}
- memcpy(&new->lru, &old->lru, old_size * sizeof(void *));
- rcu_assign_pointer(nlru->memcg_lrus, new);
+ memcpy(&new->lrus, &old->lrus, old_size * sizeof(new->lrus[0]));
+ rcu_assign_pointer(lru->memcg_lrus, new);
kvfree_rcu(old, rcu);
return 0;
}
-static void memcg_cancel_update_list_lru_node(struct list_lru_node *nlru,
- int old_size, int new_size)
-{
- struct list_lru_memcg *memcg_lrus;
-
- memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus,
- lockdep_is_held(&list_lrus_mutex));
- /* do not bother shrinking the array back to the old size, because we
- * cannot handle allocation failures here */
- __memcg_destroy_list_lru_node(memcg_lrus, old_size, new_size);
-}
-
-static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
-{
- int i;
-
- lru->memcg_aware = memcg_aware;
-
- if (!memcg_aware)
- return 0;
-
- for_each_node(i) {
- if (memcg_init_list_lru_node(&lru->node[i]))
- goto fail;
- }
- return 0;
-fail:
- for (i = i - 1; i >= 0; i--) {
- if (!lru->node[i].memcg_lrus)
- continue;
- memcg_destroy_list_lru_node(&lru->node[i]);
- }
- return -ENOMEM;
-}
-
-static void memcg_destroy_list_lru(struct list_lru *lru)
-{
- int i;
-
- if (!list_lru_memcg_aware(lru))
- return;
-
- for_each_node(i)
- memcg_destroy_list_lru_node(&lru->node[i]);
-}
-
-static int memcg_update_list_lru(struct list_lru *lru,
- int old_size, int new_size)
-{
- int i;
-
- for_each_node(i) {
- if (memcg_update_list_lru_node(&lru->node[i],
- old_size, new_size))
- goto fail;
- }
- return 0;
-fail:
- for (i = i - 1; i >= 0; i--) {
- if (!lru->node[i].memcg_lrus)
- continue;
-
- memcg_cancel_update_list_lru_node(&lru->node[i],
- old_size, new_size);
- }
- return -ENOMEM;
-}
-
static void memcg_cancel_update_list_lru(struct list_lru *lru,
int old_size, int new_size)
{
- int i;
+ struct list_lru_memcg *memcg_lrus;
- for_each_node(i)
- memcg_cancel_update_list_lru_node(&lru->node[i],
- old_size, new_size);
+ memcg_lrus = rcu_dereference_protected(lru->memcg_lrus,
+ lockdep_is_held(&list_lrus_mutex));
+ /*
+ * Do not bother shrinking the array back to the old size, because we
+ * cannot handle allocation failures here.
+ */
+ memcg_destroy_list_lru_range(memcg_lrus, old_size, new_size);
}
int memcg_update_all_list_lrus(int new_size)
@@ -525,8 +472,8 @@ static void memcg_drain_list_lru_node(struct list_lru *lru, int nid,
*/
spin_lock_irq(&nlru->lock);
- src = list_lru_from_memcg_idx(nlru, src_idx);
- dst = list_lru_from_memcg_idx(nlru, dst_idx);
+ src = list_lru_from_memcg_idx(lru, nid, src_idx);
+ dst = list_lru_from_memcg_idx(lru, nid, dst_idx);
list_splice_init(&src->list, &dst->list);
--
2.11.0
We currently allocate scope for every memcg to be tracked on every
superblock instantiated in the system, regardless of whether that
superblock is even accessible to that memcg.
These huge memcg counts come from container hosts where memcgs are
confined to just a small subset of the total number of superblocks
that are instantiated at any given point in time.
For these systems with huge container counts, list_lru does not need
the capability of tracking every memcg on every superblock. What it
comes down to is adding the memcg to the list_lru at the first insert.
So introduce kmem_cache_alloc_lru() to allocate an object together with
its list_lru. In a later patch, we will convert all inode and dentry
allocations from kmem_cache_alloc() to kmem_cache_alloc_lru().
Signed-off-by: Muchun Song <[email protected]>
---
include/linux/list_lru.h | 3 ++
include/linux/memcontrol.h | 14 ++++++
include/linux/slab.h | 3 ++
mm/list_lru.c | 114 +++++++++++++++++++++++++++++++++++++++++----
mm/memcontrol.c | 14 ------
mm/slab.c | 39 +++++++++++-----
mm/slab.h | 17 ++++++-
mm/slob.c | 6 +++
mm/slub.c | 42 +++++++++++------
9 files changed, 202 insertions(+), 50 deletions(-)
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 2b32dbd89214..50a3144016b4 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -56,11 +56,14 @@ struct list_lru {
struct list_head list;
int shrinker_id;
bool memcg_aware;
+ /* protects ->memcg_lrus->lrus[i] */
+ spinlock_t lock;
/* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
struct list_lru_memcg __rcu *memcg_lrus;
#endif
};
+int list_lru_memcg_alloc(struct list_lru *lru, struct mem_cgroup *memcg, gfp_t gfp);
void list_lru_destroy(struct list_lru *lru);
int __list_lru_init(struct list_lru *lru, bool memcg_aware,
struct lock_class_key *key, struct shrinker *shrinker);
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 7267cf9d1f3d..06ee32822fd4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -520,6 +520,20 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
}
+static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg)
+{
+ struct mem_cgroup *memcg;
+
+ rcu_read_lock();
+retry:
+ memcg = obj_cgroup_memcg(objcg);
+ if (unlikely(!css_tryget(&memcg->css)))
+ goto retry;
+ rcu_read_unlock();
+
+ return memcg;
+}
+
#ifdef CONFIG_MEMCG_KMEM
/*
* folio_memcg_kmem - Check if the folio has the memcg_kmem flag set.
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 6ce826d8194d..441f4e87cb34 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -135,6 +135,7 @@
#include <linux/kasan.h>
+struct list_lru;
struct mem_cgroup;
/*
* struct kmem_cache related prototypes
@@ -429,6 +430,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
__alloc_size(1)
void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __malloc;
void *kmem_cache_alloc(struct kmem_cache *s, gfp_t flags) __assume_kmalloc_alignment __malloc;
+void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
+ gfp_t gfpflags) __assume_kmalloc_alignment __malloc;
void kmem_cache_free(struct kmem_cache *s, void *objp);
/*
diff --git a/mm/list_lru.c b/mm/list_lru.c
index f1c73b53af9a..eea29eb4cf48 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -339,22 +339,30 @@ static void memcg_destroy_list_lru_range(struct list_lru_memcg *memcg_lrus,
kfree(memcg_lrus->lrus[i]);
}
+static struct list_lru_per_memcg *memcg_list_lru_alloc(gfp_t gfp)
+{
+ int nid;
+ struct list_lru_per_memcg *lru;
+
+ lru = kmalloc(struct_size(lru, nodes, nr_node_ids), gfp);
+ if (!lru)
+ return NULL;
+
+ for_each_node(nid)
+ init_one_lru(&lru->nodes[nid]);
+
+ return lru;
+}
+
static int memcg_init_list_lru_range(struct list_lru_memcg *memcg_lrus,
int begin, int end)
{
int i;
for (i = begin; i < end; i++) {
- int nid;
- struct list_lru_per_memcg *lru;
-
- lru = kmalloc(struct_size(lru, nodes, nr_node_ids), GFP_KERNEL);
- if (!lru)
+ memcg_lrus->lrus[i] = memcg_list_lru_alloc(GFP_KERNEL);
+ if (!memcg_lrus->lrus[i])
goto fail;
-
- for_each_node(nid)
- init_one_lru(&lru->nodes[nid]);
- memcg_lrus->lrus[i] = lru;
}
return 0;
fail:
@@ -371,6 +379,8 @@ static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
if (!memcg_aware)
return 0;
+ spin_lock_init(&lru->lock);
+
memcg_lrus = kvmalloc(sizeof(*memcg_lrus) +
size * sizeof(memcg_lrus->lrus[0]), GFP_KERNEL);
if (!memcg_lrus)
@@ -418,8 +428,11 @@ static int memcg_update_list_lru(struct list_lru *lru, int old_size, int new_siz
return -ENOMEM;
}
+ spin_lock_irq(&lru->lock);
memcpy(&new->lrus, &old->lrus, old_size * sizeof(new->lrus[0]));
rcu_assign_pointer(lru->memcg_lrus, new);
+ spin_unlock_irq(&lru->lock);
+
kvfree_rcu(old, rcu);
return 0;
}
@@ -504,6 +517,89 @@ void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg)
memcg_drain_list_lru(lru, src_idx, dst_memcg);
mutex_unlock(&list_lrus_mutex);
}
+
+static bool memcg_list_lru_skip_alloc(struct list_lru *lru,
+ struct mem_cgroup *memcg)
+{
+ struct list_lru_memcg *memcg_lrus;
+ int idx = memcg_cache_id(memcg);
+
+ if (unlikely(idx < 0))
+ return true;
+
+ rcu_read_lock();
+ memcg_lrus = rcu_dereference(lru->memcg_lrus);
+ if (memcg_lrus->lrus[idx]) {
+ rcu_read_unlock();
+ return true;
+ }
+ rcu_read_unlock();
+
+ return false;
+}
+
+int list_lru_memcg_alloc(struct list_lru *lru, struct mem_cgroup *memcg, gfp_t gfp)
+{
+ unsigned long flags;
+ struct list_lru_memcg *memcg_lrus;
+ int i;
+
+ struct list_lru_memcg_table {
+ struct list_lru_per_memcg *mlru;
+ struct mem_cgroup *memcg;
+ } *table;
+
+ if (!list_lru_memcg_aware(lru))
+ return 0;
+
+ if (memcg_list_lru_skip_alloc(lru, memcg))
+ return 0;
+
+ /*
+ * The allocated list_lru_per_memcg array is not accounted directly.
+ * Moreover, it should not come from DMA buffer and is not readily
+ * reclaimable. So those GFP bits should be masked off.
+ */
+ gfp &= ~(__GFP_DMA | __GFP_RECLAIMABLE | __GFP_ACCOUNT | __GFP_ZERO);
+ table = kmalloc_array(memcg->css.cgroup->level, sizeof(*table), gfp);
+ if (!table)
+ return -ENOMEM;
+
+ /*
+ * Because the list_lru can be reparented to the parent cgroup's
+ * list_lru, we should make sure that this cgroup and all its
+ * ancestors have allocated list_lru_per_memcg.
+ */
+ for (i = 0; memcg; memcg = parent_mem_cgroup(memcg), i++) {
+ if (memcg_list_lru_skip_alloc(lru, memcg))
+ break;
+
+ table[i].memcg = memcg;
+ table[i].mlru = memcg_list_lru_alloc(gfp);
+ if (!table[i].mlru) {
+ while (i--)
+ kfree(table[i].mlru);
+ kfree(table);
+ return -ENOMEM;
+ }
+ }
+
+ spin_lock_irqsave(&lru->lock, flags);
+ memcg_lrus = rcu_dereference_protected(lru->memcg_lrus, true);
+ while (i--) {
+ int index = memcg_cache_id(table[i].memcg);
+
+ if (memcg_lrus->lrus[index])
+ kfree(table[i].mlru);
+ else
+ memcg_lrus->lrus[index] = table[i].mlru;
+ }
+ spin_unlock_irqrestore(&lru->lock, flags);
+
+ kfree(table);
+
+ return 0;
+}
#else
static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a85b52968666..0e8c8d8465e5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2763,20 +2763,6 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
folio->memcg_data = (unsigned long)memcg;
}
-static struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg)
-{
- struct mem_cgroup *memcg;
-
- rcu_read_lock();
-retry:
- memcg = obj_cgroup_memcg(objcg);
- if (unlikely(!css_tryget(&memcg->css)))
- goto retry;
- rcu_read_unlock();
-
- return memcg;
-}
-
#ifdef CONFIG_MEMCG_KMEM
/*
* The allocated objcg pointers array is not accounted directly.
diff --git a/mm/slab.c b/mm/slab.c
index d0f725637663..9a001aabc77b 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3219,7 +3219,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_
bool init = false;
flags &= gfp_allowed_mask;
- cachep = slab_pre_alloc_hook(cachep, &objcg, 1, flags);
+ cachep = slab_pre_alloc_hook(cachep, NULL, &objcg, 1, flags);
if (unlikely(!cachep))
return NULL;
@@ -3295,7 +3295,8 @@ __do_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
#endif /* CONFIG_NUMA */
static __always_inline void *
-slab_alloc(struct kmem_cache *cachep, gfp_t flags, size_t orig_size, unsigned long caller)
+slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
+ size_t orig_size, unsigned long caller)
{
unsigned long save_flags;
void *objp;
@@ -3303,7 +3304,7 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, size_t orig_size, unsigned lo
bool init = false;
flags &= gfp_allowed_mask;
- cachep = slab_pre_alloc_hook(cachep, &objcg, 1, flags);
+ cachep = slab_pre_alloc_hook(cachep, lru, &objcg, 1, flags);
if (unlikely(!cachep))
return NULL;
@@ -3492,6 +3493,18 @@ void ___cache_free(struct kmem_cache *cachep, void *objp,
__free_one(ac, objp);
}
+static __always_inline
+void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
+ gfp_t flags)
+{
+ void *ret = slab_alloc(cachep, lru, flags, cachep->object_size, _RET_IP_);
+
+ trace_kmem_cache_alloc(_RET_IP_, ret,
+ cachep->object_size, cachep->size, flags);
+
+ return ret;
+}
+
/**
* kmem_cache_alloc - Allocate an object
* @cachep: The cache to allocate from.
@@ -3504,15 +3517,17 @@ void ___cache_free(struct kmem_cache *cachep, void *objp,
*/
void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
- void *ret = slab_alloc(cachep, flags, cachep->object_size, _RET_IP_);
-
- trace_kmem_cache_alloc(_RET_IP_, ret,
- cachep->object_size, cachep->size, flags);
-
- return ret;
+ return __kmem_cache_alloc_lru(cachep, NULL, flags);
}
EXPORT_SYMBOL(kmem_cache_alloc);
+void *kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
+ gfp_t flags)
+{
+ return __kmem_cache_alloc_lru(cachep, lru, flags);
+}
+EXPORT_SYMBOL(kmem_cache_alloc_lru);
+
static __always_inline void
cache_alloc_debugcheck_after_bulk(struct kmem_cache *s, gfp_t flags,
size_t size, void **p, unsigned long caller)
@@ -3529,7 +3544,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
size_t i;
struct obj_cgroup *objcg = NULL;
- s = slab_pre_alloc_hook(s, &objcg, size, flags);
+ s = slab_pre_alloc_hook(s, NULL, &objcg, size, flags);
if (!s)
return 0;
@@ -3570,7 +3585,7 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
{
void *ret;
- ret = slab_alloc(cachep, flags, size, _RET_IP_);
+ ret = slab_alloc(cachep, NULL, flags, size, _RET_IP_);
ret = kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc(_RET_IP_, ret,
@@ -3697,7 +3712,7 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
cachep = kmalloc_slab(size, flags);
if (unlikely(ZERO_OR_NULL_PTR(cachep)))
return cachep;
- ret = slab_alloc(cachep, flags, size, caller);
+ ret = slab_alloc(cachep, NULL, flags, size, caller);
ret = kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc(caller, ret,
diff --git a/mm/slab.h b/mm/slab.h
index 58c01a34e5b8..c6fbfda824df 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -46,6 +46,7 @@ struct kmem_cache {
#include <linux/kmemleak.h>
#include <linux/random.h>
#include <linux/sched/mm.h>
+#include <linux/list_lru.h>
/*
* State of the slab allocator.
@@ -269,6 +270,7 @@ static inline size_t obj_full_size(struct kmem_cache *s)
* Returns false if the allocation should fail.
*/
static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+ struct list_lru *lru,
struct obj_cgroup **objcgp,
size_t objects, gfp_t flags)
{
@@ -284,6 +286,17 @@ static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
if (!objcg)
return true;
+ if (lru) {
+ struct mem_cgroup *memcg = get_mem_cgroup_from_objcg(objcg);
+
+ if (list_lru_memcg_alloc(lru, memcg, flags)) {
+ css_put(&memcg->css);
+ obj_cgroup_put(objcg);
+ return false;
+ }
+ css_put(&memcg->css);
+ }
+
if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s))) {
obj_cgroup_put(objcg);
return false;
@@ -386,6 +399,7 @@ static inline void memcg_free_page_obj_cgroups(struct page *page)
}
static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+ struct list_lru *lru,
struct obj_cgroup **objcgp,
size_t objects, gfp_t flags)
{
@@ -484,6 +498,7 @@ static inline size_t slab_ksize(const struct kmem_cache *s)
}
static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
+ struct list_lru *lru,
struct obj_cgroup **objcgp,
size_t size, gfp_t flags)
{
@@ -494,7 +509,7 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
if (should_failslab(s, flags))
return NULL;
- if (!memcg_slab_pre_alloc_hook(s, objcgp, size, flags))
+ if (!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags))
return NULL;
return s;
diff --git a/mm/slob.c b/mm/slob.c
index 74d3f6e60666..9db272c75928 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -633,6 +633,12 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
}
EXPORT_SYMBOL(kmem_cache_alloc);
+
+void *kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags)
+{
+ return slob_alloc_node(cachep, flags, NUMA_NO_NODE);
+}
+EXPORT_SYMBOL(kmem_cache_alloc_lru);
#ifdef CONFIG_NUMA
void *__kmalloc_node(size_t size, gfp_t gfp, int node)
{
diff --git a/mm/slub.c b/mm/slub.c
index df1ac8aff86f..41211a2de0da 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3083,7 +3083,7 @@ static __always_inline void maybe_wipe_obj_freeptr(struct kmem_cache *s,
*
* Otherwise we can simply pick the next object from the lockless free list.
*/
-static __always_inline void *slab_alloc_node(struct kmem_cache *s,
+static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
{
void *object;
@@ -3093,7 +3093,7 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
struct obj_cgroup *objcg = NULL;
bool init = false;
- s = slab_pre_alloc_hook(s, &objcg, 1, gfpflags);
+ s = slab_pre_alloc_hook(s, lru, &objcg, 1, gfpflags);
if (!s)
return NULL;
@@ -3184,27 +3184,41 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
return object;
}
-static __always_inline void *slab_alloc(struct kmem_cache *s,
+static __always_inline void *slab_alloc(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags, unsigned long addr, size_t orig_size)
{
- return slab_alloc_node(s, gfpflags, NUMA_NO_NODE, addr, orig_size);
+ return slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, addr, orig_size);
}
-void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
+static __always_inline
+void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
+ gfp_t gfpflags)
{
- void *ret = slab_alloc(s, gfpflags, _RET_IP_, s->object_size);
+ void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
trace_kmem_cache_alloc(_RET_IP_, ret, s->object_size,
s->size, gfpflags);
return ret;
}
+
+void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
+{
+ return __kmem_cache_alloc_lru(s, NULL, gfpflags);
+}
EXPORT_SYMBOL(kmem_cache_alloc);
+void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
+ gfp_t gfpflags)
+{
+ return __kmem_cache_alloc_lru(s, lru, gfpflags);
+}
+EXPORT_SYMBOL(kmem_cache_alloc_lru);
+
#ifdef CONFIG_TRACING
void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
- void *ret = slab_alloc(s, gfpflags, _RET_IP_, size);
+ void *ret = slab_alloc(s, NULL, gfpflags, _RET_IP_, size);
trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags);
ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
@@ -3215,7 +3229,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_trace);
#ifdef CONFIG_NUMA
void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
{
- void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_, s->object_size);
+ void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
trace_kmem_cache_alloc_node(_RET_IP_, ret,
s->object_size, s->size, gfpflags, node);
@@ -3229,7 +3243,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
gfp_t gfpflags,
int node, size_t size)
{
- void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_, size);
+ void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
trace_kmalloc_node(_RET_IP_, ret,
size, s->size, gfpflags, node);
@@ -3611,7 +3625,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
struct obj_cgroup *objcg = NULL;
/* memcg and kmem_cache debug support */
- s = slab_pre_alloc_hook(s, &objcg, size, flags);
+ s = slab_pre_alloc_hook(s, NULL, &objcg, size, flags);
if (unlikely(!s))
return false;
/*
@@ -4360,7 +4374,7 @@ void *__kmalloc(size_t size, gfp_t flags)
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;
- ret = slab_alloc(s, flags, _RET_IP_, size);
+ ret = slab_alloc(s, NULL, flags, _RET_IP_, size);
trace_kmalloc(_RET_IP_, ret, size, s->size, flags);
@@ -4408,7 +4422,7 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;
- ret = slab_alloc_node(s, flags, node, _RET_IP_, size);
+ ret = slab_alloc_node(s, NULL, flags, node, _RET_IP_, size);
trace_kmalloc_node(_RET_IP_, ret, size, s->size, flags, node);
@@ -4878,7 +4892,7 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, unsigned long caller)
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;
- ret = slab_alloc(s, gfpflags, caller, size);
+ ret = slab_alloc(s, NULL, gfpflags, caller, size);
/* Honor the call site pointer we received. */
trace_kmalloc(caller, ret, size, s->size, gfpflags);
@@ -4909,7 +4923,7 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;
- ret = slab_alloc_node(s, gfpflags, node, caller, size);
+ ret = slab_alloc_node(s, NULL, gfpflags, node, caller, size);
/* Honor the call site pointer we received. */
trace_kmalloc_node(caller, ret, size, s->size, gfpflags, node);
--
2.11.0
An allocated inode is supposed to be added to its memcg's list_lru,
which should be allocated in advance as well. That can be done by
kmem_cache_alloc_lru(), which allocates the object and sets up its
list_lru. File systems are the main users of it. So introduce
alloc_inode_sb() to allocate file system specific inodes and set up
the inode reclaim context properly. File systems are supposed to use
alloc_inode_sb() to allocate inodes. In later patches, we will convert
all users to the new API.
Signed-off-by: Muchun Song <[email protected]>
---
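Note for reviewers: the conversion expected from each filesystem can be
sketched in userspace as below. The types are minimal stand-ins for the
kernel ones, kmem_cache_alloc_lru() is a stub that only counts calls,
and the "foo" filesystem (struct foo_inode_info, foo_inode_cachep,
foo_alloc_inode) is hypothetical. Only alloc_inode_sb() mirrors the
helper added by this patch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for kernel types; only what this sketch needs. */
typedef unsigned int gfp_t;
#define GFP_KERNEL 0u

struct list_lru { int nr_alloc_requests; };
struct kmem_cache { size_t object_size; };
struct super_block { struct list_lru s_inode_lru; };
struct inode { unsigned long i_ino; };

/*
 * Stub. The real function also makes sure the current memcg has its
 * list_lru set up before the object is handed out.
 */
static void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
				  gfp_t gfp)
{
	(void)gfp;
	if (lru)
		lru->nr_alloc_requests++;
	return malloc(s->object_size);
}

/* Same shape as the helper this patch adds to include/linux/fs.h. */
static inline void *
alloc_inode_sb(struct super_block *sb, struct kmem_cache *cache, gfp_t gfp)
{
	return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp);
}

/* Hypothetical filesystem "foo" embedding the VFS inode. */
struct foo_inode_info {
	int foo_flags;
	struct inode vfs_inode;
};

/* A plain object here; in the kernel this would be a cache pointer. */
static struct kmem_cache foo_inode_cachep = { sizeof(struct foo_inode_info) };

static struct inode *foo_alloc_inode(struct super_block *sb)
{
	struct foo_inode_info *fi;

	/* Before the conversion: kmem_cache_alloc(cache, GFP_KERNEL) */
	fi = alloc_inode_sb(sb, &foo_inode_cachep, GFP_KERNEL);
	if (!fi)
		return NULL;

	fi->foo_flags = 0;
	return &fi->vfs_inode;
}
```

The only change a simple filesystem needs is passing the superblock
through, so that the allocation is tied to sb->s_inode_lru.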
Documentation/filesystems/porting.rst | 5 +++++
fs/inode.c | 2 +-
include/linux/fs.h | 11 +++++++++++
3 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index bf19fd6b86e7..c9c157d7b7bb 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -45,6 +45,11 @@ typically between calling iget_locked() and unlocking the inode.
At some point that will become mandatory.
+**mandatory**
+
+The foo_inode_info should always be allocated through alloc_inode_sb() rather
+than kmem_cache_alloc() or the kmalloc() family.
+
---
**mandatory**
diff --git a/fs/inode.c b/fs/inode.c
index cb41f02d8ced..43d06b42f908 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -234,7 +234,7 @@ static struct inode *alloc_inode(struct super_block *sb)
if (ops->alloc_inode)
inode = ops->alloc_inode(sb);
else
- inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL);
+ inode = alloc_inode_sb(sb, inode_cachep, GFP_KERNEL);
if (!inode)
return NULL;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bec3781d260f..c03d8b3fa70c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -41,6 +41,7 @@
#include <linux/stddef.h>
#include <linux/mount.h>
#include <linux/cred.h>
+#include <linux/slab.h>
#include <asm/byteorder.h>
#include <uapi/linux/fs.h>
@@ -3173,6 +3174,16 @@ extern void free_inode_nonrcu(struct inode *inode);
extern int should_remove_suid(struct dentry *);
extern int file_remove_privs(struct file *);
+/*
+ * This must be used for allocating filesystem-specific inodes to set
+ * up the inode reclaim context correctly.
+ */
+static inline void *
+alloc_inode_sb(struct super_block *sb, struct kmem_cache *cache, gfp_t gfp)
+{
+ return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp);
+}
+
extern void __insert_inode_hash(struct inode *, unsigned long hashval);
static inline void insert_inode_hash(struct inode *inode)
{
--
2.11.0
Moving the work of bringing kmem online to the place where the memcg
is brought online simplifies the code. It is unnecessary to reset
->kmemcg_id when kmem is taken offline, so memcg_free_kmem() can go
away as well.
Signed-off-by: Muchun Song <[email protected]>
---
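Note for reviewers: the error unwinding that mem_cgroup_css_online()
ends up with can be modelled in userspace as below. This is a sketch
only. struct memcg_model, the fail_* switches and the model_* functions
are hypothetical. Just the ordering of the steps and labels follows the
diff.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flags modelling what each step sets up. */
struct memcg_model {
	bool id_registered;	/* set up before the online callback runs */
	bool kmem_online;
	bool shrinker_info;
};

/* Failure switches, for illustration only. */
static bool fail_kmem;
static bool fail_shrinker;

static int model_online_kmem(struct memcg_model *m)
{
	if (fail_kmem)
		return -ENOMEM;
	m->kmem_online = true;
	return 0;
}

static int model_alloc_shrinker_info(struct memcg_model *m)
{
	if (fail_shrinker)
		return -ENOMEM;
	m->shrinker_info = true;
	return 0;
}

/* Steps are undone in reverse order of setup, as in the patched code. */
static int model_css_online(struct memcg_model *m)
{
	if (model_online_kmem(m))
		goto remove_id;

	if (model_alloc_shrinker_info(m))
		goto offline_kmem;

	return 0;

offline_kmem:
	m->kmem_online = false;
remove_id:
	m->id_registered = false;
	return -ENOMEM;
}
```

Because kmem is brought online and taken offline within the same
callback pair, no separate "free" step is needed for the failure case.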
mm/memcontrol.c | 42 +++++++++++++++---------------------------
1 file changed, 15 insertions(+), 27 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6844d8b511d8..a85b52968666 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3610,7 +3610,8 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
if (cgroup_memory_nokmem)
return 0;
- BUG_ON(memcg->kmemcg_id >= 0);
+ if (unlikely(mem_cgroup_is_root(memcg)))
+ return 0;
memcg_id = memcg_alloc_cache_id();
if (memcg_id < 0)
@@ -3639,6 +3640,9 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
if (cgroup_memory_nokmem)
return;
+ if (unlikely(mem_cgroup_is_root(memcg)))
+ return;
+
parent = parent_mem_cgroup(memcg);
if (!parent)
parent = root_mem_cgroup;
@@ -3646,20 +3650,11 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
memcg_reparent_objcgs(memcg, parent);
kmemcg_id = memcg->kmemcg_id;
- BUG_ON(kmemcg_id < 0);
/* memcg_reparent_objcgs() must be called before this. */
memcg_drain_all_list_lrus(kmemcg_id, parent);
memcg_free_cache_id(kmemcg_id);
- memcg->kmemcg_id = -1;
-}
-
-static void memcg_free_kmem(struct mem_cgroup *memcg)
-{
- /* css_alloc() failed, offlining didn't happen */
- if (unlikely(memcg->kmemcg_id != -1))
- memcg_offline_kmem(memcg);
}
#else
static int memcg_online_kmem(struct mem_cgroup *memcg)
@@ -3669,9 +3664,6 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
static void memcg_offline_kmem(struct mem_cgroup *memcg)
{
}
-static void memcg_free_kmem(struct mem_cgroup *memcg)
-{
-}
#endif /* CONFIG_MEMCG_KMEM */
static int memcg_update_kmem_max(struct mem_cgroup *memcg,
@@ -5183,7 +5175,6 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
{
struct mem_cgroup *parent = mem_cgroup_from_css(parent_css);
struct mem_cgroup *memcg, *old_memcg;
- long error = -ENOMEM;
old_memcg = set_active_memcg(parent);
memcg = mem_cgroup_alloc();
@@ -5213,33 +5204,26 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
}
/* The following stuff does not apply to the root */
- error = memcg_online_kmem(memcg);
- if (error)
- goto fail;
-
if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
static_branch_inc(&memcg_sockets_enabled_key);
return &memcg->css;
-fail:
- mem_cgroup_id_remove(memcg);
- mem_cgroup_free(memcg);
- return ERR_PTR(error);
}
static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
{
struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+ if (memcg_online_kmem(memcg))
+ goto remove_id;
+
/*
* A memcg must be visible for expand_shrinker_info()
* by the time the maps are allocated. So, we allocate maps
* here, when for_each_mem_cgroup() can't skip it.
*/
- if (alloc_shrinker_info(memcg)) {
- mem_cgroup_id_remove(memcg);
- return -ENOMEM;
- }
+ if (alloc_shrinker_info(memcg))
+ goto offline_kmem;
/* Online state pins memcg ID, memcg ID pins CSS */
refcount_set(&memcg->id.ref, 1);
@@ -5249,6 +5233,11 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
2UL*HZ);
return 0;
+offline_kmem:
+ memcg_offline_kmem(memcg);
+remove_id:
+ mem_cgroup_id_remove(memcg);
+ return -ENOMEM;
}
static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
@@ -5306,7 +5295,6 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
cancel_work_sync(&memcg->high_work);
mem_cgroup_remove_from_trees(memcg);
free_shrinker_info(memcg);
- memcg_free_kmem(memcg);
mem_cgroup_free(memcg);
}
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
drivers/dax/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index fc89e91beea7..288c0b85bb77 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -447,7 +447,7 @@ static struct inode *dax_alloc_inode(struct super_block *sb)
struct dax_device *dax_dev;
struct inode *inode;
- dax_dev = kmem_cache_alloc(dax_cache, GFP_KERNEL);
+ dax_dev = alloc_inode_sb(sb, dax_cache, GFP_KERNEL);
if (!dax_dev)
return NULL;
--
2.11.0
Since commit e5bc3af7734f ("rcu: Consolidate PREEMPT and !PREEMPT
synchronize_rcu()"), a spinlock critical section can serve as an RCU
read-side critical section, which already allows readers that hold
nlru->lock to avoid taking the RCU read lock. So just remove the
locking around rcu_assign_pointer().
Signed-off-by: Muchun Song <[email protected]>
---
mm/list_lru.c | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 4962d48d4410..6b2f3cbe5f67 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -402,18 +402,7 @@ static int memcg_update_list_lru_node(struct list_lru_node *nlru,
}
memcpy(&new->lru, &old->lru, old_size * sizeof(void *));
-
- /*
- * The locking below allows readers that hold nlru->lock avoid taking
- * rcu_read_lock (see list_lru_from_memcg_idx).
- *
- * Since list_lru_{add,del} may be called under an IRQ-safe lock,
- * we have to use IRQ-safe primitives here to avoid deadlock.
- */
- spin_lock_irq(&nlru->lock);
rcu_assign_pointer(nlru->memcg_lrus, new);
- spin_unlock_irq(&nlru->lock);
-
kvfree_rcu(old, rcu);
return 0;
}
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/9p/vfs_inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 795706520b5e..5311f35accf5 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -223,7 +223,7 @@ v9fs_blank_wstat(struct p9_wstat *wstat)
struct inode *v9fs_alloc_inode(struct super_block *sb)
{
struct v9fs_inode *v9inode;
- v9inode = kmem_cache_alloc(v9fs_inode_cache, GFP_KERNEL);
+ v9inode = alloc_inode_sb(sb, v9fs_inode_cache, GFP_KERNEL);
if (!v9inode)
return NULL;
#ifdef CONFIG_9P_FSCACHE
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/afs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/afs/super.c b/fs/afs/super.c
index e38bb1e7a4d2..0ecea5a94af9 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -677,7 +677,7 @@ static struct inode *afs_alloc_inode(struct super_block *sb)
{
struct afs_vnode *vnode;
- vnode = kmem_cache_alloc(afs_inode_cachep, GFP_KERNEL);
+ vnode = alloc_inode_sb(sb, afs_inode_cachep, GFP_KERNEL);
if (!vnode)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/befs/linuxvfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index c1ba13d19024..b4b3567ac655 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -277,7 +277,7 @@ befs_alloc_inode(struct super_block *sb)
{
struct befs_inode_info *bi;
- bi = kmem_cache_alloc(befs_inode_cachep, GFP_KERNEL);
+ bi = alloc_inode_sb(sb, befs_inode_cachep, GFP_KERNEL);
if (!bi)
return NULL;
return &bi->vfs_inode;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/bfs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index fd691e4815c5..1926bec2c850 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -239,7 +239,7 @@ static struct kmem_cache *bfs_inode_cachep;
static struct inode *bfs_alloc_inode(struct super_block *sb)
{
struct bfs_inode_info *bi;
- bi = kmem_cache_alloc(bfs_inode_cachep, GFP_KERNEL);
+ bi = alloc_inode_sb(sb, bfs_inode_cachep, GFP_KERNEL);
if (!bi)
return NULL;
return &bi->vfs_inode;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/block_dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 45df6cbccf12..1630458b3e98 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -799,7 +799,7 @@ static struct kmem_cache * bdev_cachep __read_mostly;
static struct inode *bdev_alloc_inode(struct super_block *sb)
{
- struct bdev_inode *ei = kmem_cache_alloc(bdev_cachep, GFP_KERNEL);
+ struct bdev_inode *ei = alloc_inode_sb(sb, bdev_cachep, GFP_KERNEL);
if (!ei)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/cifs/cifsfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 8c20bfa187ac..7c239ba551b9 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -355,7 +355,7 @@ static struct inode *
cifs_alloc_inode(struct super_block *sb)
{
struct cifsInodeInfo *cifs_inode;
- cifs_inode = kmem_cache_alloc(cifs_inode_cachep, GFP_KERNEL);
+ cifs_inode = alloc_inode_sb(sb, cifs_inode_cachep, GFP_KERNEL);
if (!cifs_inode)
return NULL;
cifs_inode->cifsAttrs = 0x20; /* default */
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/ceph/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 2df1e1284451..96239d209bec 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -447,7 +447,7 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
struct ceph_inode_info *ci;
int i;
- ci = kmem_cache_alloc(ceph_inode_cachep, GFP_NOFS);
+ ci = alloc_inode_sb(sb, ceph_inode_cachep, GFP_NOFS);
if (!ci)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/coda/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/coda/inode.c b/fs/coda/inode.c
index d9f1bd7153df..2185328b65c7 100644
--- a/fs/coda/inode.c
+++ b/fs/coda/inode.c
@@ -43,7 +43,7 @@ static struct kmem_cache * coda_inode_cachep;
static struct inode *coda_alloc_inode(struct super_block *sb)
{
struct coda_inode_info *ei;
- ei = kmem_cache_alloc(coda_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, coda_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
memset(&ei->c_fid, 0, sizeof(struct CodaFid));
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/ecryptfs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ecryptfs/super.c b/fs/ecryptfs/super.c
index 39116af0390f..0b1c878317ab 100644
--- a/fs/ecryptfs/super.c
+++ b/fs/ecryptfs/super.c
@@ -38,7 +38,7 @@ static struct inode *ecryptfs_alloc_inode(struct super_block *sb)
struct ecryptfs_inode_info *inode_info;
struct inode *inode = NULL;
- inode_info = kmem_cache_alloc(ecryptfs_inode_info_cache, GFP_KERNEL);
+ inode_info = alloc_inode_sb(sb, ecryptfs_inode_info_cache, GFP_KERNEL);
if (unlikely(!inode_info))
goto out;
if (ecryptfs_init_crypt_stat(&inode_info->crypt_stat)) {
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/efs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/efs/super.c b/fs/efs/super.c
index 62b155b9366b..b287f47c165b 100644
--- a/fs/efs/super.c
+++ b/fs/efs/super.c
@@ -69,7 +69,7 @@ static struct kmem_cache * efs_inode_cachep;
static struct inode *efs_alloc_inode(struct super_block *sb)
{
struct efs_inode_info *ei;
- ei = kmem_cache_alloc(efs_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, efs_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/erofs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 2dc0b9f1d421..1bf7f8c6cfab 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -83,7 +83,7 @@ static void erofs_inode_init_once(void *ptr)
static struct inode *erofs_alloc_inode(struct super_block *sb)
{
struct erofs_inode *vi =
- kmem_cache_alloc(erofs_inode_cachep, GFP_KERNEL);
+ alloc_inode_sb(sb, erofs_inode_cachep, GFP_KERNEL);
if (!vi)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/exfat/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index 5539ffc20d16..1b24c9f61be1 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -182,7 +182,7 @@ static struct inode *exfat_alloc_inode(struct super_block *sb)
{
struct exfat_inode_info *ei;
- ei = kmem_cache_alloc(exfat_inode_cachep, GFP_NOFS);
+ ei = alloc_inode_sb(sb, exfat_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/ext2/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index d8d580b609ba..8deb03ae4742 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -180,7 +180,7 @@ static struct kmem_cache * ext2_inode_cachep;
static struct inode *ext2_alloc_inode(struct super_block *sb)
{
struct ext2_inode_info *ei;
- ei = kmem_cache_alloc(ext2_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, ext2_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
ei->i_block_alloc_info = NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/ext4/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0775950ee84e..71982851063b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1284,7 +1284,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
{
struct ext4_inode_info *ei;
- ei = kmem_cache_alloc(ext4_inode_cachep, GFP_NOFS);
+ ei = alloc_inode_sb(sb, ext4_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/fat/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index de0c9b013a85..5439831c725a 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -745,7 +745,7 @@ static struct kmem_cache *fat_inode_cachep;
static struct inode *fat_alloc_inode(struct super_block *sb)
{
struct msdos_inode_info *ei;
- ei = kmem_cache_alloc(fat_inode_cachep, GFP_NOFS);
+ ei = alloc_inode_sb(sb, fat_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/freevxfs/vxfs_super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/freevxfs/vxfs_super.c b/fs/freevxfs/vxfs_super.c
index 578a5062706e..22eed5a73ac2 100644
--- a/fs/freevxfs/vxfs_super.c
+++ b/fs/freevxfs/vxfs_super.c
@@ -124,7 +124,7 @@ static struct inode *vxfs_alloc_inode(struct super_block *sb)
{
struct vxfs_inode_info *vi;
- vi = kmem_cache_alloc(vxfs_inode_cachep, GFP_KERNEL);
+ vi = alloc_inode_sb(sb, vxfs_inode_cachep, GFP_KERNEL);
if (!vi)
return NULL;
inode_init_once(&vi->vfs_inode);
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/fuse/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 8baa76b4ae41..bab5c564301e 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -73,7 +73,7 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
{
struct fuse_inode *fi;
- fi = kmem_cache_alloc(fuse_inode_cachep, GFP_KERNEL);
+ fi = alloc_inode_sb(sb, fuse_inode_cachep, GFP_KERNEL);
if (!fi)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/gfs2/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 6e00d15ef0a8..2778d4349b66 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1427,7 +1427,7 @@ static struct inode *gfs2_alloc_inode(struct super_block *sb)
{
struct gfs2_inode *ip;
- ip = kmem_cache_alloc(gfs2_inode_cachep, GFP_KERNEL);
+ ip = alloc_inode_sb(sb, gfs2_inode_cachep, GFP_KERNEL);
if (!ip)
return NULL;
ip->i_flags = 0;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/hfs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/hfs/super.c b/fs/hfs/super.c
index 12d9bae39363..6764afa98a6f 100644
--- a/fs/hfs/super.c
+++ b/fs/hfs/super.c
@@ -162,7 +162,7 @@ static struct inode *hfs_alloc_inode(struct super_block *sb)
{
struct hfs_inode_info *i;
- i = kmem_cache_alloc(hfs_inode_cachep, GFP_KERNEL);
+ i = alloc_inode_sb(sb, hfs_inode_cachep, GFP_KERNEL);
return i ? &i->vfs_inode : NULL;
}
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/hfsplus/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index b9e3db3f855f..8479add998b5 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -624,7 +624,7 @@ static struct inode *hfsplus_alloc_inode(struct super_block *sb)
{
struct hfsplus_inode_info *i;
- i = kmem_cache_alloc(hfsplus_inode_cachep, GFP_KERNEL);
+ i = alloc_inode_sb(sb, hfsplus_inode_cachep, GFP_KERNEL);
return i ? &i->vfs_inode : NULL;
}
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/hostfs/hostfs_kern.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index d5c9d886cd9f..2123d2bed55b 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -222,7 +222,7 @@ static struct inode *hostfs_alloc_inode(struct super_block *sb)
{
struct hostfs_inode_info *hi;
- hi = kmem_cache_alloc(hostfs_inode_cache, GFP_KERNEL_ACCOUNT);
+ hi = alloc_inode_sb(sb, hostfs_inode_cache, GFP_KERNEL_ACCOUNT);
if (hi == NULL)
return NULL;
hi->fd = -1;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/hugetlbfs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index cdfb1ae78a3f..b1885a3723f7 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1109,7 +1109,7 @@ static struct inode *hugetlbfs_alloc_inode(struct super_block *sb)
if (unlikely(!hugetlbfs_dec_free_inodes(sbinfo)))
return NULL;
- p = kmem_cache_alloc(hugetlbfs_inode_cachep, GFP_KERNEL);
+ p = alloc_inode_sb(sb, hugetlbfs_inode_cachep, GFP_KERNEL);
if (unlikely(!p)) {
hugetlbfs_inc_free_inodes(sbinfo);
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/isofs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index 678e2c51b855..ea8fb73ae3e7 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -70,7 +70,7 @@ static struct kmem_cache *isofs_inode_cachep;
static struct inode *isofs_alloc_inode(struct super_block *sb)
{
struct iso_inode_info *ei;
- ei = kmem_cache_alloc(isofs_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, isofs_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/hpfs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/hpfs/super.c b/fs/hpfs/super.c
index a7dbfc892022..1cb89595b875 100644
--- a/fs/hpfs/super.c
+++ b/fs/hpfs/super.c
@@ -232,7 +232,7 @@ static struct kmem_cache * hpfs_inode_cachep;
static struct inode *hpfs_alloc_inode(struct super_block *sb)
{
struct hpfs_inode_info *ei;
- ei = kmem_cache_alloc(hpfs_inode_cachep, GFP_NOFS);
+ ei = alloc_inode_sb(sb, hpfs_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
return &ei->vfs_inode;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/minix/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/minix/inode.c b/fs/minix/inode.c
index a71f1cf894b9..8a0af80741b5 100644
--- a/fs/minix/inode.c
+++ b/fs/minix/inode.c
@@ -63,7 +63,7 @@ static struct kmem_cache * minix_inode_cachep;
static struct inode *minix_alloc_inode(struct super_block *sb)
{
struct minix_inode_info *ei;
- ei = kmem_cache_alloc(minix_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, minix_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/nfs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 853213b3a209..b759264885e9 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2221,7 +2221,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
struct inode *nfs_alloc_inode(struct super_block *sb)
{
struct nfs_inode *nfsi;
- nfsi = kmem_cache_alloc(nfs_inode_cachep, GFP_KERNEL);
+ nfsi = alloc_inode_sb(sb, nfs_inode_cachep, GFP_KERNEL);
if (!nfsi)
return NULL;
nfsi->flags = 0UL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/jffs2/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c
index 81ca58c10b72..7ea37f49f1e1 100644
--- a/fs/jffs2/super.c
+++ b/fs/jffs2/super.c
@@ -39,7 +39,7 @@ static struct inode *jffs2_alloc_inode(struct super_block *sb)
{
struct jffs2_inode_info *f;
- f = kmem_cache_alloc(jffs2_inode_cachep, GFP_KERNEL);
+ f = alloc_inode_sb(sb, jffs2_inode_cachep, GFP_KERNEL);
if (!f)
return NULL;
return &f->vfs_inode;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/orangefs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index 2f2e430461b2..1deb411ca5e8 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -106,7 +106,7 @@ static struct inode *orangefs_alloc_inode(struct super_block *sb)
{
struct orangefs_inode_s *orangefs_inode;
- orangefs_inode = kmem_cache_alloc(orangefs_inode_cache, GFP_KERNEL);
+ orangefs_inode = alloc_inode_sb(sb, orangefs_inode_cache, GFP_KERNEL);
if (!orangefs_inode)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/nilfs2/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index f6b2d280aab5..cf1de3ed9f8b 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -151,7 +151,7 @@ struct inode *nilfs_alloc_inode(struct super_block *sb)
{
struct nilfs_inode_info *ii;
- ii = kmem_cache_alloc(nilfs_inode_cachep, GFP_NOFS);
+ ii = alloc_inode_sb(sb, nilfs_inode_cachep, GFP_NOFS);
if (!ii)
return NULL;
ii->i_bh = NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/jfs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index 9030aeaf0f88..5e77b5769464 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -102,7 +102,7 @@ static struct inode *jfs_alloc_inode(struct super_block *sb)
{
struct jfs_inode_info *jfs_inode;
- jfs_inode = kmem_cache_alloc(jfs_inode_cachep, GFP_NOFS);
+ jfs_inode = alloc_inode_sb(sb, jfs_inode_cachep, GFP_NOFS);
if (!jfs_inode)
return NULL;
#ifdef CONFIG_QUOTA
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/overlayfs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 178daa5e82c9..0e2a38a0b857 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -174,7 +174,7 @@ static struct kmem_cache *ovl_inode_cachep;
static struct inode *ovl_alloc_inode(struct super_block *sb)
{
- struct ovl_inode *oi = kmem_cache_alloc(ovl_inode_cachep, GFP_KERNEL);
+ struct ovl_inode *oi = alloc_inode_sb(sb, ovl_inode_cachep, GFP_KERNEL);
if (!oi)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/proc/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 599eb724ff2d..cc0a406d3a19 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -66,7 +66,7 @@ static struct inode *proc_alloc_inode(struct super_block *sb)
{
struct proc_inode *ei;
- ei = kmem_cache_alloc(proc_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, proc_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
ei->pid = NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/qnx4/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/qnx4/inode.c b/fs/qnx4/inode.c
index 3fb7fc819b4f..a635bb6615e9 100644
--- a/fs/qnx4/inode.c
+++ b/fs/qnx4/inode.c
@@ -338,7 +338,7 @@ static struct kmem_cache *qnx4_inode_cachep;
static struct inode *qnx4_alloc_inode(struct super_block *sb)
{
struct qnx4_inode_info *ei;
- ei = kmem_cache_alloc(qnx4_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, qnx4_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/reiserfs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index 58481f8d63d5..e7beba4dae09 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -639,7 +639,7 @@ static struct kmem_cache *reiserfs_inode_cachep;
static struct inode *reiserfs_alloc_inode(struct super_block *sb)
{
struct reiserfs_inode_info *ei;
- ei = kmem_cache_alloc(reiserfs_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, reiserfs_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
atomic_set(&ei->openers, 0);
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/romfs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/romfs/super.c b/fs/romfs/super.c
index 259f684d9236..9e6bbb4219de 100644
--- a/fs/romfs/super.c
+++ b/fs/romfs/super.c
@@ -375,7 +375,7 @@ static struct inode *romfs_alloc_inode(struct super_block *sb)
{
struct romfs_inode_info *inode;
- inode = kmem_cache_alloc(romfs_inode_cachep, GFP_KERNEL);
+ inode = alloc_inode_sb(sb, romfs_inode_cachep, GFP_KERNEL);
return inode ? &inode->vfs_inode : NULL;
}
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/squashfs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c
index 60d6951915f4..e51625e93b00 100644
--- a/fs/squashfs/super.c
+++ b/fs/squashfs/super.c
@@ -550,7 +550,7 @@ static void __exit exit_squashfs_fs(void)
static struct inode *squashfs_alloc_inode(struct super_block *sb)
{
struct squashfs_inode_info *ei =
- kmem_cache_alloc(squashfs_inode_cachep, GFP_KERNEL);
+ alloc_inode_sb(sb, squashfs_inode_cachep, GFP_KERNEL);
return ei ? &ei->vfs_inode : NULL;
}
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/sysv/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/sysv/inode.c b/fs/sysv/inode.c
index be47263b8605..9e8d4a6fb2f3 100644
--- a/fs/sysv/inode.c
+++ b/fs/sysv/inode.c
@@ -306,7 +306,7 @@ static struct inode *sysv_alloc_inode(struct super_block *sb)
{
struct sysv_inode_info *si;
- si = kmem_cache_alloc(sysv_inode_cachep, GFP_KERNEL);
+ si = alloc_inode_sb(sb, sysv_inode_cachep, GFP_KERNEL);
if (!si)
return NULL;
return &si->vfs_inode;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/qnx6/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/qnx6/inode.c b/fs/qnx6/inode.c
index 61191f7bdf62..9d8e7e9788a1 100644
--- a/fs/qnx6/inode.c
+++ b/fs/qnx6/inode.c
@@ -597,7 +597,7 @@ static struct kmem_cache *qnx6_inode_cachep;
static struct inode *qnx6_alloc_inode(struct super_block *sb)
{
struct qnx6_inode_info *ei;
- ei = kmem_cache_alloc(qnx6_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, qnx6_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/openpromfs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/openpromfs/inode.c b/fs/openpromfs/inode.c
index f825176ff4ed..f0b7f4d51a17 100644
--- a/fs/openpromfs/inode.c
+++ b/fs/openpromfs/inode.c
@@ -335,7 +335,7 @@ static struct inode *openprom_alloc_inode(struct super_block *sb)
{
struct op_inode_info *oi;
- oi = kmem_cache_alloc(op_inode_cachep, GFP_KERNEL);
+ oi = alloc_inode_sb(sb, op_inode_cachep, GFP_KERNEL);
if (!oi)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/ufs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index 00a01471ea05..23377c1baed9 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -1443,7 +1443,7 @@ static struct inode *ufs_alloc_inode(struct super_block *sb)
{
struct ufs_inode_info *ei;
- ei = kmem_cache_alloc(ufs_inode_cachep, GFP_NOFS);
+ ei = alloc_inode_sb(sb, ufs_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/xfs/xfs_icache.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index f2210d927481..0a4f32c0044c 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -77,7 +77,7 @@ xfs_inode_alloc(
* XXX: If this didn't occur in transactions, we could drop GFP_NOFAIL
* and return NULL here on ENOMEM.
*/
- ip = kmem_cache_alloc(xfs_inode_zone, GFP_KERNEL | __GFP_NOFAIL);
+ ip = alloc_inode_sb(mp->m_super, xfs_inode_zone, GFP_KERNEL | __GFP_NOFAIL);
if (inode_init_always(mp->m_super, VFS_I(ip))) {
kmem_cache_free(xfs_inode_zone, ip);
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/ocfs2/dlmfs/dlmfs.c | 2 +-
fs/ocfs2/super.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/ocfs2/dlmfs/dlmfs.c b/fs/ocfs2/dlmfs/dlmfs.c
index fa0a14f199eb..e360543ad7e7 100644
--- a/fs/ocfs2/dlmfs/dlmfs.c
+++ b/fs/ocfs2/dlmfs/dlmfs.c
@@ -280,7 +280,7 @@ static struct inode *dlmfs_alloc_inode(struct super_block *sb)
{
struct dlmfs_inode_private *ip;
- ip = kmem_cache_alloc(dlmfs_inode_cache, GFP_NOFS);
+ ip = alloc_inode_sb(sb, dlmfs_inode_cache, GFP_NOFS);
if (!ip)
return NULL;
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index c86bd4e60e20..cf044448130f 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -549,7 +549,7 @@ static struct inode *ocfs2_alloc_inode(struct super_block *sb)
{
struct ocfs2_inode_info *oi;
- oi = kmem_cache_alloc(ocfs2_inode_cachep, GFP_NOFS);
+ oi = alloc_inode_sb(sb, ocfs2_inode_cachep, GFP_NOFS);
if (!oi)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
ipc/mqueue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 5becca9be867..7c08eb3c258d 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -486,7 +486,7 @@ static struct inode *mqueue_alloc_inode(struct super_block *sb)
{
struct mqueue_inode_info *ei;
- ei = kmem_cache_alloc(mqueue_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, mqueue_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
mm/shmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index a5ae8266891d..541bdf61113e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3738,7 +3738,7 @@ static struct kmem_cache *shmem_inode_cachep;
static struct inode *shmem_alloc_inode(struct super_block *sb)
{
struct shmem_inode_info *info;
- info = kmem_cache_alloc(shmem_inode_cachep, GFP_KERNEL);
+ info = alloc_inode_sb(sb, shmem_inode_cachep, GFP_KERNEL);
if (!info)
return NULL;
return &info->vfs_inode;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/zonefs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index ddc346a9df9b..19bebbc2ccdf 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -1137,7 +1137,7 @@ static struct inode *zonefs_alloc_inode(struct super_block *sb)
{
struct zonefs_inode_info *zi;
- zi = kmem_cache_alloc(zonefs_inode_cachep, GFP_KERNEL);
+ zi = alloc_inode_sb(sb, zonefs_inode_cachep, GFP_KERNEL);
if (!zi)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/vboxsf/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/vboxsf/super.c b/fs/vboxsf/super.c
index 4f5e59f06284..050ef855158b 100644
--- a/fs/vboxsf/super.c
+++ b/fs/vboxsf/super.c
@@ -244,7 +244,7 @@ static struct inode *vboxsf_alloc_inode(struct super_block *sb)
{
struct vboxsf_inode *sf_i;
- sf_i = kmem_cache_alloc(vboxsf_inode_cachep, GFP_NOFS);
+ sf_i = alloc_inode_sb(sb, vboxsf_inode_cachep, GFP_NOFS);
if (!sf_i)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/udf/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/udf/super.c b/fs/udf/super.c
index b2d7c57d0688..76b706584632 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -135,7 +135,7 @@ static struct kmem_cache *udf_inode_cachep;
static struct inode *udf_alloc_inode(struct super_block *sb)
{
struct udf_inode_info *ei;
- ei = kmem_cache_alloc(udf_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, udf_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
fs/ubifs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index f0fb25727d96..73b51f8e6817 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -268,7 +268,7 @@ static struct inode *ubifs_alloc_inode(struct super_block *sb)
{
struct ubifs_inode *ui;
- ui = kmem_cache_alloc(ubifs_inode_slab, GFP_NOFS);
+ ui = alloc_inode_sb(sb, ubifs_inode_slab, GFP_NOFS);
if (!ui)
return NULL;
--
2.11.0
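The inode allocation is supposed to use alloc_inode_sb(), so convert
f2fs_kmem_cache_alloc() to alloc_inode_sb(). To keep the
FAULT_SLAB_ALLOC fault injection on this path, open-code the
time_to_inject() check before the allocation.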
Signed-off-by: Muchun Song <[email protected]>
---
fs/f2fs/super.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 78ebc306ee2b..20e335f50219 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1306,8 +1306,12 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
{
struct f2fs_inode_info *fi;
- fi = f2fs_kmem_cache_alloc(f2fs_inode_cachep,
- GFP_F2FS_ZERO, false, F2FS_SB(sb));
+ if (time_to_inject(F2FS_SB(sb), FAULT_SLAB_ALLOC)) {
+ f2fs_show_injection_info(F2FS_SB(sb), FAULT_SLAB_ALLOC);
+ return NULL;
+ }
+
+ fi = alloc_inode_sb(sb, f2fs_inode_cachep, GFP_F2FS_ZERO);
if (!fi)
return NULL;
--
2.11.0
If we want to add the allocated objects to their list_lru, we should
use kmem_cache_alloc_lru() to allocate them. So introduce
nfs4_xattr_entry_cachep, which is used to allocate nfs4_xattr_entry.
Signed-off-by: Muchun Song <[email protected]>
---
fs/nfs/nfs42xattr.c | 95 ++++++++++++++++++++++++++---------------------------
1 file changed, 47 insertions(+), 48 deletions(-)
diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c
index 1c4d2a05b401..5b7af9080db0 100644
--- a/fs/nfs/nfs42xattr.c
+++ b/fs/nfs/nfs42xattr.c
@@ -81,7 +81,7 @@ struct nfs4_xattr_entry {
struct hlist_node hnode;
struct list_head lru;
struct list_head dispose;
- char *xattr_name;
+ const char *xattr_name;
void *xattr_value;
size_t xattr_size;
struct nfs4_xattr_bucket *bucket;
@@ -98,6 +98,7 @@ static struct list_lru nfs4_xattr_entry_lru;
static struct list_lru nfs4_xattr_large_entry_lru;
static struct kmem_cache *nfs4_xattr_cache_cachep;
+static struct kmem_cache *nfs4_xattr_entry_cachep;
/*
* Hashing helper functions.
@@ -177,49 +178,27 @@ nfs4_xattr_alloc_entry(const char *name, const void *value,
{
struct nfs4_xattr_entry *entry;
void *valp;
- char *namep;
- size_t alloclen, slen;
- char *buf;
- uint32_t flags;
+ const char *namep;
+ uint32_t flags = len > PAGE_SIZE ? NFS4_XATTR_ENTRY_EXTVAL : 0;
+ gfp_t gfp = GFP_KERNEL_ACCOUNT | GFP_NOFS;
+ struct list_lru *lru;
BUILD_BUG_ON(sizeof(struct nfs4_xattr_entry) +
XATTR_NAME_MAX + 1 > PAGE_SIZE);
- alloclen = sizeof(struct nfs4_xattr_entry);
- if (name != NULL) {
- slen = strlen(name) + 1;
- alloclen += slen;
- } else
- slen = 0;
-
- if (alloclen + len <= PAGE_SIZE) {
- alloclen += len;
- flags = 0;
- } else {
- flags = NFS4_XATTR_ENTRY_EXTVAL;
- }
-
- buf = kmalloc(alloclen, GFP_KERNEL_ACCOUNT | GFP_NOFS);
- if (buf == NULL)
+ lru = flags & NFS4_XATTR_ENTRY_EXTVAL ? &nfs4_xattr_large_entry_lru :
+ &nfs4_xattr_entry_lru;
+ entry = kmem_cache_alloc_lru(nfs4_xattr_entry_cachep, lru, gfp);
+ if (!entry)
return NULL;
- entry = (struct nfs4_xattr_entry *)buf;
-
- if (name != NULL) {
- namep = buf + sizeof(struct nfs4_xattr_entry);
- memcpy(namep, name, slen);
- } else {
- namep = NULL;
- }
-
-
- if (flags & NFS4_XATTR_ENTRY_EXTVAL) {
- valp = kvmalloc(len, GFP_KERNEL_ACCOUNT | GFP_NOFS);
- if (valp == NULL) {
- kfree(buf);
- return NULL;
- }
- } else if (len != 0) {
- valp = buf + sizeof(struct nfs4_xattr_entry) + slen;
+ namep = kstrdup_const(name, gfp);
+ if (!namep && name)
+ goto free_buf;
+
+ if (len != 0) {
+ valp = kvmalloc(len, gfp);
+ if (!valp)
+ goto free_name;
} else
valp = NULL;
@@ -232,23 +211,23 @@ nfs4_xattr_alloc_entry(const char *name, const void *value,
entry->flags = flags;
entry->xattr_value = valp;
- kref_init(&entry->ref);
entry->xattr_name = namep;
entry->xattr_size = len;
- entry->bucket = NULL;
- INIT_LIST_HEAD(&entry->lru);
- INIT_LIST_HEAD(&entry->dispose);
- INIT_HLIST_NODE(&entry->hnode);
return entry;
+free_name:
+ kfree_const(namep);
+free_buf:
+ kmem_cache_free(nfs4_xattr_entry_cachep, entry);
+ return NULL;
}
static void
nfs4_xattr_free_entry(struct nfs4_xattr_entry *entry)
{
- if (entry->flags & NFS4_XATTR_ENTRY_EXTVAL)
- kvfree(entry->xattr_value);
- kfree(entry);
+ kvfree(entry->xattr_value);
+ kfree_const(entry->xattr_name);
+ kmem_cache_free(nfs4_xattr_entry_cachep, entry);
}
static void
@@ -289,7 +268,7 @@ nfs4_xattr_alloc_cache(void)
{
struct nfs4_xattr_cache *cache;
- cache = kmem_cache_alloc(nfs4_xattr_cache_cachep,
+ cache = kmem_cache_alloc_lru(nfs4_xattr_cache_cachep, &nfs4_xattr_cache_lru,
GFP_KERNEL_ACCOUNT | GFP_NOFS);
if (cache == NULL)
return NULL;
@@ -992,6 +971,17 @@ static void nfs4_xattr_cache_init_once(void *p)
INIT_LIST_HEAD(&cache->dispose);
}
+static void nfs4_xattr_entry_init_once(void *p)
+{
+ struct nfs4_xattr_entry *entry = p;
+
+ kref_init(&entry->ref);
+ entry->bucket = NULL;
+ INIT_LIST_HEAD(&entry->lru);
+ INIT_LIST_HEAD(&entry->dispose);
+ INIT_HLIST_NODE(&entry->hnode);
+}
+
int __init nfs4_xattr_cache_init(void)
{
int ret = 0;
@@ -1003,6 +993,13 @@ int __init nfs4_xattr_cache_init(void)
if (nfs4_xattr_cache_cachep == NULL)
return -ENOMEM;
+ nfs4_xattr_entry_cachep = kmem_cache_create("nfs4_xattr_entry",
+ sizeof(struct nfs4_xattr_entry), 0,
+ (SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD | SLAB_ACCOUNT),
+ nfs4_xattr_entry_init_once);
+ if (!nfs4_xattr_entry_cachep)
+ goto out5;
+
ret = list_lru_init_memcg(&nfs4_xattr_large_entry_lru,
&nfs4_xattr_large_entry_shrinker);
if (ret)
@@ -1040,6 +1037,8 @@ int __init nfs4_xattr_cache_init(void)
out3:
list_lru_destroy(&nfs4_xattr_large_entry_lru);
out4:
+ kmem_cache_destroy(nfs4_xattr_entry_cachep);
+out5:
kmem_cache_destroy(nfs4_xattr_cache_cachep);
return ret;
--
2.11.0
Like inodes, dentries are also added to their memcg list_lru. So
replace kmem_cache_alloc() with kmem_cache_alloc_lru() when allocating
a dentry.
Signed-off-by: Muchun Song <[email protected]>
---
fs/dcache.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index cf871a81f4fd..36d4806d7284 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1741,7 +1741,8 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
char *dname;
int err;
- dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL);
+ dentry = kmem_cache_alloc_lru(dentry_cache, &sb->s_dentry_lru,
+ GFP_KERNEL);
if (!dentry)
return NULL;
--
2.11.0
On one of our servers, we found a suspected memory leak. The kmalloc-32
slab cache consumes more than 6GB of memory, while other kmem_caches
consume less than 2GB.
After in-depth analysis, we found that the memory consumption of the
kmalloc-32 slab cache is caused by list_lru_one allocations.
crash> p memcg_nr_cache_ids
memcg_nr_cache_ids = $2 = 24574
memcg_nr_cache_ids is very large, and the memory consumption of each
list_lru can be calculated with the following formula:
num_numa_node * memcg_nr_cache_ids * 32 (kmalloc-32)
There are 4 NUMA nodes in our system, so each list_lru consumes ~3MB.
crash> list super_blocks | wc -l
952
Every mount registers 2 list_lrus, one for inodes and one for dentries.
There are 952 super_blocks, so the total memory is 952 * 2 * 3 MB
(~5.6GB). But the number of memory cgroups is less than 500, so I
guess more than 12286 containers have been deployed on this machine (I
do not know why there are so many containers; it may be a user's bug,
or the user may really want to do that). memcg_nr_cache_ids has not
been reduced to a suitable value afterwards, which wastes a lot of
memory.
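The estimate above can be checked with a small userspace sketch. This is
not kernel code; the helper names list_lru_bytes() and total_lru_bytes()
are hypothetical and exist only for this illustration. It assumes, as
the formula above does, one 32-byte kmalloc-32 object per NUMA node per
cache id, and two list_lrus per super_block.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes consumed by one memcg-aware list_lru when
 * a list_lru_one (served from kmalloc-32) is statically allocated for
 * every NUMA node and every memcg cache id.
 */
uint64_t list_lru_bytes(uint64_t numa_nodes, uint64_t nr_cache_ids,
			uint64_t obj_size)
{
	return numa_nodes * nr_cache_ids * obj_size;
}

/*
 * Hypothetical helper: every super_block registers two list_lrus
 * (one for inodes, one for dentries).
 */
uint64_t total_lru_bytes(uint64_t super_blocks, uint64_t per_lru_bytes)
{
	return super_blocks * 2 * per_lru_bytes;
}
```

With the numbers from the crash session, list_lru_bytes(4, 24574, 32)
is 3145472 bytes (~3MB), and total_lru_bytes(952, 3145472) is
5988978688 bytes (~5.6GB), which matches the figures quoted above.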
Now the infrastructure for dynamic list_lru_one allocation is ready, so
remove the static allocation code to save memory.
Signed-off-by: Muchun Song <[email protected]>
---
include/linux/list_lru.h | 8 +--
mm/list_lru.c | 125 +++++++++++++++++++++++++++--------------------
mm/memcontrol.c | 7 ++-
3 files changed, 81 insertions(+), 59 deletions(-)
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 50a3144016b4..62f407831b8c 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -32,14 +32,15 @@ struct list_lru_one {
};
struct list_lru_per_memcg {
+ struct rcu_head rcu;
/* array of per cgroup per node lists, indexed by node id */
- struct list_lru_one nodes[0];
+ struct list_lru_one nodes[];
};
struct list_lru_memcg {
struct rcu_head rcu;
/* array of per cgroup lists, indexed by memcg_cache_id */
- struct list_lru_per_memcg *lrus[];
+ struct list_lru_per_memcg __rcu *lrus[];
};
struct list_lru_node {
@@ -76,7 +77,8 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware,
__list_lru_init((lru), true, NULL, shrinker)
int memcg_update_all_list_lrus(int num_memcgs);
-void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg);
+void memcg_drain_all_list_lrus(struct mem_cgroup *src_memcg,
+ struct mem_cgroup *dst_memcg);
/**
* list_lru_add: add an element to the lru list's tail
diff --git a/mm/list_lru.c b/mm/list_lru.c
index eea29eb4cf48..48651c29a9d1 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -60,8 +60,12 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
*/
memcg_lrus = rcu_dereference_check(lru->memcg_lrus,
lockdep_is_held(&nlru->lock));
- if (memcg_lrus && idx >= 0)
- return &memcg_lrus->lrus[idx]->nodes[nid];
+ if (memcg_lrus && idx >= 0) {
+ struct list_lru_per_memcg *mlru;
+
+ mlru = rcu_dereference_check(memcg_lrus->lrus[idx], true);
+ return mlru ? &mlru->nodes[nid] : NULL;
+ }
return &nlru->lru;
}
@@ -184,11 +188,12 @@ unsigned long list_lru_count_one(struct list_lru *lru,
int nid, struct mem_cgroup *memcg)
{
struct list_lru_one *l;
- long count;
+ long count = 0;
rcu_read_lock();
l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg));
- count = READ_ONCE(l->nr_items);
+ if (l)
+ count = READ_ONCE(l->nr_items);
rcu_read_unlock();
if (unlikely(count < 0))
@@ -217,8 +222,11 @@ __list_lru_walk_one(struct list_lru *lru, int nid, int memcg_idx,
struct list_head *item, *n;
unsigned long isolated = 0;
- l = list_lru_from_memcg_idx(lru, nid, memcg_idx);
restart:
+ l = list_lru_from_memcg_idx(lru, nid, memcg_idx);
+ if (!l)
+ goto out;
+
list_for_each_safe(item, n, &l->list) {
enum lru_status ret;
@@ -262,6 +270,7 @@ __list_lru_walk_one(struct list_lru *lru, int nid, int memcg_idx,
BUG();
}
}
+out:
return isolated;
}
@@ -354,20 +363,26 @@ static struct list_lru_per_memcg *memcg_list_lru_alloc(gfp_t gfp)
return lru;
}
-static int memcg_init_list_lru_range(struct list_lru_memcg *memcg_lrus,
- int begin, int end)
+static void memcg_list_lru_free(struct list_lru *lru, int src_idx)
{
- int i;
+ struct list_lru_memcg *memcg_lrus;
+ struct list_lru_per_memcg *mlru;
- for (i = begin; i < end; i++) {
- memcg_lrus->lrus[i] = memcg_list_lru_alloc(GFP_KERNEL);
- if (!memcg_lrus->lrus[i])
- goto fail;
- }
- return 0;
-fail:
- memcg_destroy_list_lru_range(memcg_lrus, begin, i);
- return -ENOMEM;
+ spin_lock_irq(&lru->lock);
+ memcg_lrus = rcu_dereference_protected(lru->memcg_lrus, true);
+ mlru = rcu_dereference_protected(memcg_lrus->lrus[src_idx], true);
+ if (mlru)
+ rcu_assign_pointer(memcg_lrus->lrus[src_idx], NULL);
+ spin_unlock_irq(&lru->lock);
+
+ /*
+ * __list_lru_walk_one() can walk the list of this node, so
+ * we need kvfree_rcu() here. The walking of the list is
+ * done under lru->node[nid]->lock, which can serve as an RCU
+ * read-side critical section.
+ */
+ if (mlru)
+ kvfree_rcu(mlru, rcu);
}
static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
@@ -381,15 +396,11 @@ static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
spin_lock_init(&lru->lock);
- memcg_lrus = kvmalloc(sizeof(*memcg_lrus) +
+ memcg_lrus = kvzalloc(sizeof(*memcg_lrus) +
size * sizeof(memcg_lrus->lrus[0]), GFP_KERNEL);
if (!memcg_lrus)
return -ENOMEM;
- if (memcg_init_list_lru_range(memcg_lrus, 0, size)) {
- kvfree(memcg_lrus);
- return -ENOMEM;
- }
RCU_INIT_POINTER(lru->memcg_lrus, memcg_lrus);
return 0;
@@ -423,13 +434,9 @@ static int memcg_update_list_lru(struct list_lru *lru, int old_size, int new_siz
if (!new)
return -ENOMEM;
- if (memcg_init_list_lru_range(new, old_size, new_size)) {
- kvfree(new);
- return -ENOMEM;
- }
-
spin_lock_irq(&lru->lock);
memcpy(&new->lrus, &old->lrus, old_size * sizeof(new->lrus[0]));
+ memset(&new->lrus[old_size], 0, (new_size - old_size) * sizeof(new->lrus[0]));
rcu_assign_pointer(lru->memcg_lrus, new);
spin_unlock_irq(&lru->lock);
@@ -437,20 +444,6 @@ static int memcg_update_list_lru(struct list_lru *lru, int old_size, int new_siz
return 0;
}
-static void memcg_cancel_update_list_lru(struct list_lru *lru,
- int old_size, int new_size)
-{
- struct list_lru_memcg *memcg_lrus;
-
- memcg_lrus = rcu_dereference_protected(lru->memcg_lrus,
- lockdep_is_held(&list_lrus_mutex));
- /*
- * Do not bother shrinking the array back to the old size, because we
- * cannot handle allocation failures here.
- */
- memcg_destroy_list_lru_range(memcg_lrus, old_size, new_size);
-}
-
int memcg_update_all_list_lrus(int new_size)
{
int ret = 0;
@@ -461,15 +454,10 @@ int memcg_update_all_list_lrus(int new_size)
list_for_each_entry(lru, &memcg_list_lrus, list) {
ret = memcg_update_list_lru(lru, old_size, new_size);
if (ret)
- goto fail;
+ break;
}
-out:
mutex_unlock(&list_lrus_mutex);
return ret;
-fail:
- list_for_each_entry_continue_reverse(lru, &memcg_list_lrus, list)
- memcg_cancel_update_list_lru(lru, old_size, new_size);
- goto out;
}
static void memcg_drain_list_lru_node(struct list_lru *lru, int nid,
@@ -486,6 +474,8 @@ static void memcg_drain_list_lru_node(struct list_lru *lru, int nid,
spin_lock_irq(&nlru->lock);
src = list_lru_from_memcg_idx(lru, nid, src_idx);
+ if (!src)
+ goto out;
dst = list_lru_from_memcg_idx(lru, nid, dst_idx);
list_splice_init(&src->list, &dst->list);
@@ -495,7 +485,7 @@ static void memcg_drain_list_lru_node(struct list_lru *lru, int nid,
set_shrinker_bit(dst_memcg, nid, lru_shrinker_id(lru));
src->nr_items = 0;
}
-
+out:
spin_unlock_irq(&nlru->lock);
}
@@ -506,11 +496,37 @@ static void memcg_drain_list_lru(struct list_lru *lru,
for_each_node(i)
memcg_drain_list_lru_node(lru, i, src_idx, dst_memcg);
+
+ memcg_list_lru_free(lru, src_idx);
}
-void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg)
+void memcg_drain_all_list_lrus(struct mem_cgroup *src_memcg,
+ struct mem_cgroup *dst_memcg)
{
+ struct cgroup_subsys_state *css;
struct list_lru *lru;
+ int src_idx = src_memcg->kmemcg_id;
+
+ /*
+ * Change kmemcg_id of this cgroup and all its descendants to the
+ * parent's id, and then move all entries from this cgroup's list_lrus
+ * to ones of the parent.
+ *
+ * After we have finished, all list_lrus corresponding to this cgroup
+ * are guaranteed to remain empty. So we can safely free this cgroup's
+ * list lrus which is freed in memcg_list_lru_free().
+ * Changing ->kmemcg_id to the parent can prevent list_lru_memcg_alloc()
+ * from allocating list lrus for this cgroup after calling
+ * memcg_list_lru_free().
+ */
+ rcu_read_lock();
+ css_for_each_descendant_pre(css, &src_memcg->css) {
+ struct mem_cgroup *memcg;
+
+ memcg = mem_cgroup_from_css(css);
+ memcg->kmemcg_id = dst_memcg->kmemcg_id;
+ }
+ rcu_read_unlock();
mutex_lock(&list_lrus_mutex);
list_for_each_entry(lru, &memcg_list_lrus, list)
@@ -529,7 +545,7 @@ static bool memcg_list_lru_skip_alloc(struct list_lru *lru,
rcu_read_lock();
memcg_lrus = rcu_dereference(lru->memcg_lrus);
- if (memcg_lrus->lrus[idx]) {
+ if (rcu_access_pointer(memcg_lrus->lrus[idx])) {
rcu_read_unlock();
return true;
}
@@ -544,7 +560,7 @@ int list_lru_memcg_alloc(struct list_lru *lru, struct mem_cgroup *memcg, gfp_t g
struct list_lru_memcg *memcg_lrus;
int i;
- struct list_lru_memcg {
+ struct list_lru_memcg_table {
struct list_lru_per_memcg *mlru;
struct mem_cgroup *memcg;
} *table;
@@ -588,11 +604,12 @@ int list_lru_memcg_alloc(struct list_lru *lru, struct mem_cgroup *memcg, gfp_t g
memcg_lrus = rcu_dereference_protected(lru->memcg_lrus, true);
while (i--) {
int index = memcg_cache_id(table[i].memcg);
+ struct list_lru_per_memcg *mlru = table[i].mlru;
- if (memcg_lrus->lrus[index])
- kfree(table[i].mlru);
+ if (index < 0 || rcu_dereference_protected(memcg_lrus->lrus[index], true))
+ kfree(mlru);
else
- memcg_lrus->lrus[index] = table[i].mlru;
+ rcu_assign_pointer(memcg_lrus->lrus[index], mlru);
}
spin_unlock_irqrestore(&lru->lock, flags);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0e8c8d8465e5..2045cd8b1d7f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3635,10 +3635,13 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
memcg_reparent_objcgs(memcg, parent);
+ /*
+ * memcg_drain_all_list_lrus() can change memcg->kmemcg_id.
+ * Cache it to @kmemcg_id.
+ */
kmemcg_id = memcg->kmemcg_id;
- /* memcg_reparent_objcgs() must be called before this. */
- memcg_drain_all_list_lrus(kmemcg_id, parent);
+ memcg_drain_all_list_lrus(memcg, parent);
memcg_free_cache_id(kmemcg_id);
}
--
2.11.0
The workingset code adds xa_nodes to the shadow_nodes list_lru, so
xa_nodes should be allocated with kmem_cache_alloc_lru(). Add
xas_set_lru() so that the workingset code can pass the list_lru.
Signed-off-by: Muchun Song <[email protected]>
---
include/linux/xarray.h | 9 ++++++++-
lib/xarray.c | 10 +++++-----
2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index a91e3d90df8a..31f3e5ef3c7b 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1317,6 +1317,7 @@ struct xa_state {
struct xa_node *xa_node;
struct xa_node *xa_alloc;
xa_update_node_t xa_update;
+ struct list_lru *xa_lru;
};
/*
@@ -1336,7 +1337,8 @@ struct xa_state {
.xa_pad = 0, \
.xa_node = XAS_RESTART, \
.xa_alloc = NULL, \
- .xa_update = NULL \
+ .xa_update = NULL, \
+ .xa_lru = NULL, \
}
/**
@@ -1613,6 +1615,11 @@ static inline void xas_set_update(struct xa_state *xas, xa_update_node_t update)
xas->xa_update = update;
}
+static inline void xas_set_lru(struct xa_state *xas, struct list_lru *lru)
+{
+ xas->xa_lru = lru;
+}
+
/**
* xas_next_entry() - Advance iterator to next present entry.
* @xas: XArray operation state.
diff --git a/lib/xarray.c b/lib/xarray.c
index f5d8f54907b4..e9b818abc823 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -302,7 +302,7 @@ bool xas_nomem(struct xa_state *xas, gfp_t gfp)
}
if (xas->xa->xa_flags & XA_FLAGS_ACCOUNT)
gfp |= __GFP_ACCOUNT;
- xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+ xas->xa_alloc = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
if (!xas->xa_alloc)
return false;
xas->xa_alloc->parent = NULL;
@@ -334,10 +334,10 @@ static bool __xas_nomem(struct xa_state *xas, gfp_t gfp)
gfp |= __GFP_ACCOUNT;
if (gfpflags_allow_blocking(gfp)) {
xas_unlock_type(xas, lock_type);
- xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+ xas->xa_alloc = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
xas_lock_type(xas, lock_type);
} else {
- xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+ xas->xa_alloc = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
}
if (!xas->xa_alloc)
return false;
@@ -371,7 +371,7 @@ static void *xas_alloc(struct xa_state *xas, unsigned int shift)
if (xas->xa->xa_flags & XA_FLAGS_ACCOUNT)
gfp |= __GFP_ACCOUNT;
- node = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+ node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
if (!node) {
xas_set_err(xas, -ENOMEM);
return NULL;
@@ -1014,7 +1014,7 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order,
void *sibling = NULL;
struct xa_node *node;
- node = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+ node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
if (!node)
goto nomem;
node->array = xas->xa;
--
2.11.0
The workingset code adds xa_nodes to shadow_nodes, so use
xas_set_lru() to pass the list_lru into which the xa_node will be
inserted. This sets up the xa_node reclaim context correctly.
Signed-off-by: Muchun Song <[email protected]>
---
include/linux/swap.h | 5 ++++-
mm/workingset.c | 2 +-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index cdf0957a88a4..629262582eb9 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -334,9 +334,12 @@ void workingset_activation(struct folio *folio);
/* Only track the nodes of mappings with shadow entries */
void workingset_update_node(struct xa_node *node);
+extern struct list_lru shadow_nodes;
#define mapping_set_update(xas, mapping) do { \
- if (!dax_mapping(mapping) && !shmem_mapping(mapping)) \
+ if (!dax_mapping(mapping) && !shmem_mapping(mapping)) { \
xas_set_update(xas, workingset_update_node); \
+ xas_set_lru(xas, &shadow_nodes); \
+ } \
} while (0)
/* linux/mm/page_alloc.c */
diff --git a/mm/workingset.c b/mm/workingset.c
index e9cc99ebdec7..5a38c08ca1c4 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -428,7 +428,7 @@ void workingset_activation(struct folio *folio)
* point where they would still be useful.
*/
-static struct list_lru shadow_nodes;
+struct list_lru shadow_nodes;
void workingset_update_node(struct xa_node *node)
{
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
net/socket.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/socket.c b/net/socket.c
index 7f64a6eccf63..cee567ccd99c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -300,7 +300,7 @@ static struct inode *sock_alloc_inode(struct super_block *sb)
{
struct socket_alloc *ei;
- ei = kmem_cache_alloc(sock_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, sock_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
init_waitqueue_head(&ei->socket.wq.wait);
--
2.11.0
The memory cgroup code uses two ID allocators: one for the kmem ID and
another for the memory cgroup ID. The maximum ID of both is 64Ki, and
either of them can limit the total number of memory cgroups. We can
reuse the memory cgroup ID for the kmem ID to simplify the code.
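The resulting mapping can be sketched in plain C as follows. The three
macros mirror the ones this patch adds; the helper function name is
hypothetical and only wraps the expression used in memcg_online_kmem().

```c
/* Same values as the macros added by this patch. */
#define MEM_CGROUP_ID_MIN	1
#define MEM_CGROUP_KMEM_ID_MIN	(-1)
#define MEM_CGROUP_ID_DIFF	(MEM_CGROUP_ID_MIN - MEM_CGROUP_KMEM_ID_MIN)

/*
 * Hypothetical helper: derive the kmem ID from the memory cgroup ID.
 * Memory cgroup IDs start at 1 while kmem IDs start at -1, so the two
 * ID spaces differ by a constant offset of 2.
 */
int kmemcg_id_from_memcg_id(int memcg_id)
{
	return memcg_id - MEM_CGROUP_ID_DIFF;
}
```

Under this mapping, memory cgroup ID 1 corresponds to kmem ID -1 and
memory cgroup ID 2 corresponds to kmem ID 0, so no separate allocation
or free of the kmem ID is needed.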
Signed-off-by: Muchun Song <[email protected]>
---
include/linux/memcontrol.h | 1 +
mm/memcontrol.c | 47 ++++++++--------------------------------------
2 files changed, 9 insertions(+), 39 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 83add6c484b1..33f6ec4783f8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -56,6 +56,7 @@ struct mem_cgroup_reclaim_cookie {
#ifdef CONFIG_MEMCG
#define MEM_CGROUP_ID_SHIFT 16
+#define MEM_CGROUP_ID_MIN 1
#define MEM_CGROUP_ID_MAX USHRT_MAX
struct mem_cgroup_id {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8e0cde19b648..e3a2e4d65cc5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -356,23 +356,6 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
}
/*
- * This will be used as a shrinker list's index.
- * The main reason for not using cgroup id for this:
- * this works better in sparse environments, where we have a lot of memcgs,
- * but only a few kmem-limited.
- */
-static DEFINE_IDA(memcg_cache_ida);
-
-/*
- * MAX_SIZE should be as large as the number of cgrp_ids. Ideally, we could get
- * this constant directly from cgroup, but it is understandable that this is
- * better kept as an internal representation in cgroup.c. In any case, the
- * cgrp_id space is not getting any smaller, and we don't have to necessarily
- * increase ours as well if it increases.
- */
-#define MEMCG_CACHES_MAX_SIZE MEM_CGROUP_ID_MAX
-
-/*
* A lot of the calls to the cache allocation functions are expected to be
* inlined by the compiler. Since the calls to memcg_slab_pre_alloc_hook() are
* conditional to this static branch, we'll have to allow modules that does
@@ -3520,10 +3503,12 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
}
#ifdef CONFIG_MEMCG_KMEM
+#define MEM_CGROUP_KMEM_ID_MIN -1
+#define MEM_CGROUP_ID_DIFF (MEM_CGROUP_ID_MIN - MEM_CGROUP_KMEM_ID_MIN)
+
static int memcg_online_kmem(struct mem_cgroup *memcg)
{
struct obj_cgroup *objcg;
- int memcg_id;
if (cgroup_memory_nokmem)
return 0;
@@ -3531,22 +3516,16 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
if (unlikely(mem_cgroup_is_root(memcg)))
return 0;
- memcg_id = ida_alloc_max(&memcg_cache_ida, MEMCG_CACHES_MAX_SIZE - 1,
- GFP_KERNEL);
- if (memcg_id < 0)
- return memcg_id;
-
objcg = obj_cgroup_alloc();
- if (!objcg) {
- ida_free(&memcg_cache_ida, memcg_id);
+ if (!objcg)
return -ENOMEM;
- }
+
objcg->memcg = memcg;
rcu_assign_pointer(memcg->objcg, objcg);
static_branch_enable(&memcg_kmem_enabled_key);
- memcg->kmemcg_id = memcg_id;
+ memcg->kmemcg_id = memcg->id.id - MEM_CGROUP_ID_DIFF;
return 0;
}
@@ -3554,7 +3533,6 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
static void memcg_offline_kmem(struct mem_cgroup *memcg)
{
struct mem_cgroup *parent;
- int kmemcg_id;
if (cgroup_memory_nokmem)
return;
@@ -3567,16 +3545,7 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
parent = root_mem_cgroup;
memcg_reparent_objcgs(memcg, parent);
-
- /*
- * memcg_reparent_list_lrus() can change memcg->kmemcg_id.
- * Cache it to @kmemcg_id.
- */
- kmemcg_id = memcg->kmemcg_id;
-
memcg_reparent_list_lrus(memcg, parent);
-
- ida_free(&memcg_cache_ida, kmemcg_id);
}
#else
static int memcg_online_kmem(struct mem_cgroup *memcg)
@@ -5042,7 +5011,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
return ERR_PTR(error);
memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
- 1, MEM_CGROUP_ID_MAX,
+ MEM_CGROUP_ID_MIN, MEM_CGROUP_ID_MAX,
GFP_KERNEL);
if (memcg->id.id < 0) {
error = memcg->id.id;
@@ -5070,7 +5039,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
spin_lock_init(&memcg->event_list_lock);
memcg->socket_pressure = jiffies;
#ifdef CONFIG_MEMCG_KMEM
- memcg->kmemcg_id = -1;
+ memcg->kmemcg_id = MEM_CGROUP_KMEM_ID_MIN;
INIT_LIST_HEAD(&memcg->objcg_list);
#endif
#ifdef CONFIG_CGROUP_WRITEBACK
--
2.11.0
If we run 10k containers in the system, the size of
list_lru_memcg->lrus can be ~96KB per list_lru. When we decrease the
number of containers, the array is not shrunk, so this is not
scalable. The xarray is a good choice for this case. We can save
a lot of memory when there are tens of thousands of containers in the
system. If we use an xarray, we can also remove the array resizing
logic, which simplifies the code.
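The ~96KB figure can be reproduced by replaying the sizing rule of the
memcg_alloc_cache_id() function that this patch removes. The sketch
below is a userspace model under two assumptions: kmem IDs are handed
out in order (0, 1, 2, ...) without any being freed in between, and a
pointer is 8 bytes. The function name is hypothetical.

```c
/* Limits taken from mm/memcontrol.c before this patch. */
#define MEMCG_CACHES_MIN_SIZE	4
#define MEMCG_CACHES_MAX_SIZE	65535	/* MEM_CGROUP_ID_MAX == USHRT_MAX */

/*
 * Hypothetical helper: number of slots in list_lru_memcg->lrus after
 * kmem IDs 0..nr_ids-1 have been allocated one after another.
 */
int lrus_array_size(int nr_ids)
{
	int size = 0;	/* memcg_nr_cache_ids starts at 0 */
	int id;

	for (id = 0; id < nr_ids; id++) {
		int new_size;

		/* The ID still fits into the current array. */
		if (id < size)
			continue;

		/* Otherwise grow to 2 * (id + 1), clamped to the limits. */
		new_size = 2 * (id + 1);
		if (new_size < MEMCG_CACHES_MIN_SIZE)
			new_size = MEMCG_CACHES_MIN_SIZE;
		else if (new_size > MEMCG_CACHES_MAX_SIZE)
			new_size = MEMCG_CACHES_MAX_SIZE;
		size = new_size;
	}
	return size;
}
```

Under these assumptions, lrus_array_size(10000) is 12286 slots, which
at 8 bytes per pointer is about 96KB, and the array stays that large
after the containers are gone. Allocating one more ID past 12286, i.e.
lrus_array_size(12287), grows the array to 24574 slots.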
Signed-off-by: Muchun Song <[email protected]>
---
include/linux/list_lru.h | 13 +--
include/linux/memcontrol.h | 23 ------
mm/list_lru.c | 196 ++++++++++++++-------------------------------
mm/memcontrol.c | 77 ++----------------
4 files changed, 68 insertions(+), 241 deletions(-)
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 5e9c632c9eb7..c423be3cf2d3 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -11,6 +11,7 @@
#include <linux/list.h>
#include <linux/nodemask.h>
#include <linux/shrinker.h>
+#include <linux/xarray.h>
struct mem_cgroup;
@@ -37,12 +38,6 @@ struct list_lru_per_memcg {
struct list_lru_one nodes[];
};
-struct list_lru_memcg {
- struct rcu_head rcu;
- /* array of per cgroup lists, indexed by memcg_cache_id */
- struct list_lru_per_memcg __rcu *lrus[];
-};
-
struct list_lru_node {
/* protects all lists on the node, including per cgroup */
spinlock_t lock;
@@ -57,10 +52,7 @@ struct list_lru {
struct list_head list;
int shrinker_id;
bool memcg_aware;
- /* protects ->memcg_lrus->lrus[i] */
- spinlock_t lock;
- /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
- struct list_lru_memcg __rcu *memcg_lrus;
+ struct xarray xa;
#endif
};
@@ -76,7 +68,6 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware,
#define list_lru_init_memcg(lru, shrinker) \
__list_lru_init((lru), true, NULL, shrinker)
-int memcg_update_all_list_lrus(int num_memcgs);
void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent);
/**
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 06ee32822fd4..83add6c484b1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1689,18 +1689,6 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size);
extern struct static_key_false memcg_kmem_enabled_key;
-extern int memcg_nr_cache_ids;
-void memcg_get_cache_ids(void);
-void memcg_put_cache_ids(void);
-
-/*
- * Helper macro to loop through all memcg-specific caches. Callers must still
- * check if the cache is valid (it is either valid or NULL).
- * the slab_mutex must be held when looping through those caches
- */
-#define for_each_memcg_cache_index(_idx) \
- for ((_idx) = 0; (_idx) < memcg_nr_cache_ids; (_idx)++)
-
static inline bool memcg_kmem_enabled(void)
{
return static_branch_likely(&memcg_kmem_enabled_key);
@@ -1757,9 +1745,6 @@ static inline void __memcg_kmem_uncharge_page(struct page *page, int order)
{
}
-#define for_each_memcg_cache_index(_idx) \
- for (; NULL; )
-
static inline bool memcg_kmem_enabled(void)
{
return false;
@@ -1770,14 +1755,6 @@ static inline int memcg_cache_id(struct mem_cgroup *memcg)
return -1;
}
-static inline void memcg_get_cache_ids(void)
-{
-}
-
-static inline void memcg_put_cache_ids(void)
-{
-}
-
static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
{
return NULL;
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 1e42d9847b08..1202519aeb31 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -51,22 +51,12 @@ static int lru_shrinker_id(struct list_lru *lru)
static inline struct list_lru_one *
list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
{
- struct list_lru_memcg *memcg_lrus;
- struct list_lru_node *nlru = &lru->node[nid];
-
- /*
- * Either lock or RCU protects the array of per cgroup lists
- * from relocation (see memcg_update_list_lru).
- */
- memcg_lrus = rcu_dereference_check(lru->memcg_lrus,
- lockdep_is_held(&nlru->lock));
- if (memcg_lrus && idx >= 0) {
- struct list_lru_per_memcg *mlru;
+ if (list_lru_memcg_aware(lru) && idx >= 0) {
+ struct list_lru_per_memcg *mlru = xa_load(&lru->xa, idx);
- mlru = rcu_dereference_check(memcg_lrus->lrus[idx], true);
return mlru ? &mlru->nodes[nid] : NULL;
}
- return &nlru->lru;
+ return &lru->node[nid].lru;
}
static inline struct list_lru_one *
@@ -77,7 +67,7 @@ list_lru_from_kmem(struct list_lru *lru, int nid, void *ptr,
struct list_lru_one *l = &nlru->lru;
struct mem_cgroup *memcg = NULL;
- if (!lru->memcg_lrus)
+ if (!list_lru_memcg_aware(lru))
goto out;
memcg = mem_cgroup_from_obj(ptr);
@@ -310,16 +300,20 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
unsigned long *nr_to_walk)
{
long isolated = 0;
- int memcg_idx;
isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
nr_to_walk);
+
+#ifdef CONFIG_MEMCG_KMEM
if (*nr_to_walk > 0 && list_lru_memcg_aware(lru)) {
- for_each_memcg_cache_index(memcg_idx) {
+ struct list_lru_per_memcg *mlru;
+ unsigned long index;
+
+ xa_for_each(&lru->xa, index, mlru) {
struct list_lru_node *nlru = &lru->node[nid];
spin_lock(&nlru->lock);
- isolated += __list_lru_walk_one(lru, nid, memcg_idx,
+ isolated += __list_lru_walk_one(lru, nid, index,
isolate, cb_arg,
nr_to_walk);
spin_unlock(&nlru->lock);
@@ -328,6 +322,8 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
break;
}
}
+#endif
+
return isolated;
}
EXPORT_SYMBOL_GPL(list_lru_walk_node);
@@ -339,15 +335,6 @@ static void init_one_lru(struct list_lru_one *l)
}
#ifdef CONFIG_MEMCG_KMEM
-static void memcg_destroy_list_lru_range(struct list_lru_memcg *memcg_lrus,
- int begin, int end)
-{
- int i;
-
- for (i = begin; i < end; i++)
- kfree(memcg_lrus->lrus[i]);
-}
-
static struct list_lru_per_memcg *memcg_list_lru_alloc(gfp_t gfp)
{
int nid;
@@ -365,15 +352,7 @@ static struct list_lru_per_memcg *memcg_list_lru_alloc(gfp_t gfp)
static void memcg_list_lru_free(struct list_lru *lru, int src_idx)
{
- struct list_lru_memcg *memcg_lrus;
- struct list_lru_per_memcg *mlru;
-
- spin_lock_irq(&lru->lock);
- memcg_lrus = rcu_dereference_protected(lru->memcg_lrus, true);
- mlru = rcu_dereference_protected(memcg_lrus->lrus[src_idx], true);
- if (mlru)
- rcu_assign_pointer(memcg_lrus->lrus[src_idx], NULL);
- spin_unlock_irq(&lru->lock);
+ struct list_lru_per_memcg *mlru = xa_erase_irq(&lru->xa, src_idx);
/*
* The __list_lru_walk_one() can walk the list of this node.
@@ -385,79 +364,27 @@ static void memcg_list_lru_free(struct list_lru *lru, int src_idx)
kvfree_rcu(mlru, rcu);
}
-static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
+static void memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
{
- struct list_lru_memcg *memcg_lrus;
- int size = memcg_nr_cache_ids;
-
+ if (memcg_aware)
+ xa_init_flags(&lru->xa, XA_FLAGS_LOCK_IRQ);
lru->memcg_aware = memcg_aware;
- if (!memcg_aware)
- return 0;
-
- spin_lock_init(&lru->lock);
-
- memcg_lrus = kvzalloc(sizeof(*memcg_lrus) +
- size * sizeof(memcg_lrus->lrus[0]), GFP_KERNEL);
- if (!memcg_lrus)
- return -ENOMEM;
-
- RCU_INIT_POINTER(lru->memcg_lrus, memcg_lrus);
-
- return 0;
}
static void memcg_destroy_list_lru(struct list_lru *lru)
{
- struct list_lru_memcg *memcg_lrus;
+ XA_STATE(xas, &lru->xa, 0);
+ struct list_lru_per_memcg *mlru;
if (!list_lru_memcg_aware(lru))
return;
- /*
- * This is called when shrinker has already been unregistered,
- * and nobody can use it. So, there is no need to use kvfree_rcu().
- */
- memcg_lrus = rcu_dereference_protected(lru->memcg_lrus, true);
- memcg_destroy_list_lru_range(memcg_lrus, 0, memcg_nr_cache_ids);
- kvfree(memcg_lrus);
-}
-
-static int memcg_update_list_lru(struct list_lru *lru, int old_size, int new_size)
-{
- struct list_lru_memcg *old, *new;
-
- BUG_ON(old_size > new_size);
-
- old = rcu_dereference_protected(lru->memcg_lrus,
- lockdep_is_held(&list_lrus_mutex));
- new = kvmalloc(sizeof(*new) + new_size * sizeof(new->lrus[0]), GFP_KERNEL);
- if (!new)
- return -ENOMEM;
-
- spin_lock_irq(&lru->lock);
- memcpy(&new->lrus, &old->lrus, old_size * sizeof(new->lrus[0]));
- memset(&new->lrus[old_size], 0, (new_size - old_size) * sizeof(new->lrus[0]));
- rcu_assign_pointer(lru->memcg_lrus, new);
- spin_unlock_irq(&lru->lock);
-
- kvfree_rcu(old, rcu);
- return 0;
-}
-
-int memcg_update_all_list_lrus(int new_size)
-{
- int ret = 0;
- struct list_lru *lru;
- int old_size = memcg_nr_cache_ids;
-
- mutex_lock(&list_lrus_mutex);
- list_for_each_entry(lru, &memcg_list_lrus, list) {
- ret = memcg_update_list_lru(lru, old_size, new_size);
- if (ret)
- break;
+ xas_lock_irq(&xas);
+ xas_for_each(&xas, mlru, ULONG_MAX) {
+ kfree(mlru);
+ xas_store(&xas, NULL);
}
- mutex_unlock(&list_lrus_mutex);
- return ret;
+ xas_unlock_irq(&xas);
}
static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
@@ -536,27 +463,17 @@ void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *paren
static bool memcg_list_lru_skip_alloc(struct list_lru *lru,
struct mem_cgroup *memcg)
{
- struct list_lru_memcg *memcg_lrus;
int idx = memcg_cache_id(memcg);
- if (unlikely(idx < 0))
- return true;
-
- rcu_read_lock();
- memcg_lrus = rcu_dereference(lru->memcg_lrus);
- if (rcu_access_pointer(memcg_lrus->lrus[idx])) {
- rcu_read_unlock();
+ if (unlikely(idx < 0) || xa_load(&lru->xa, idx))
return true;
- }
- rcu_read_unlock();
-
return false;
}
int list_lru_memcg_alloc(struct list_lru *lru, struct mem_cgroup *memcg, gfp_t gfp)
{
+ XA_STATE(xas, &lru->xa, 0);
unsigned long flags;
- struct list_lru_memcg *memcg_lrus;
int i;
struct list_lru_memcg_table {
@@ -599,27 +516,49 @@ int list_lru_memcg_alloc(struct list_lru *lru, struct mem_cgroup *memcg, gfp_t g
}
}
- spin_lock_irqsave(&lru->lock, flags);
- memcg_lrus = rcu_dereference_protected(lru->memcg_lrus, true);
+ xas_lock_irqsave(&xas, flags);
while (i--) {
int index = memcg_cache_id(table[i].memcg);
struct list_lru_per_memcg *mlru = table[i].mlru;
- if (index < 0 || rcu_dereference_protected(memcg_lrus->lrus[index], true))
+ xas_set(&xas, index);
+retry:
+ if (unlikely(index < 0 || xas_error(&xas) || xas_load(&xas))) {
kfree(mlru);
- else
- rcu_assign_pointer(memcg_lrus->lrus[index], mlru);
+ } else {
+ xas_store(&xas, mlru);
+ if (xas_error(&xas) == -ENOMEM) {
+ xas_unlock_irqrestore(&xas, flags);
+ if (xas_nomem(&xas, gfp))
+ xas_set_err(&xas, 0);
+ xas_lock_irqsave(&xas, flags);
+ /*
+ * The xas lock has been released, this memcg
+ * can be reparented before us. So reload
+ * memcg id. More details see the comments
+ * in memcg_reparent_list_lrus().
+ */
+ index = memcg_cache_id(table[i].memcg);
+ if (index < 0)
+ xas_set_err(&xas, 0);
+ else if (!xas_error(&xas) && index != xas.xa_index)
+ xas_set(&xas, index);
+ goto retry;
+ }
+ }
}
- spin_unlock_irqrestore(&lru->lock, flags);
+ /* xas_nomem() is used to free memory instead of memory allocation. */
+ if (xas.xa_alloc)
+ xas_nomem(&xas, gfp);
+ xas_unlock_irqrestore(&xas, flags);
kfree(table);
- return 0;
+ return xas_error(&xas);
}
#else
-static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
+static inline void memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
{
- return 0;
}
static void memcg_destroy_list_lru(struct list_lru *lru)
@@ -631,7 +570,6 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware,
struct lock_class_key *key, struct shrinker *shrinker)
{
int i;
- int err = -ENOMEM;
#ifdef CONFIG_MEMCG_KMEM
if (shrinker)
@@ -639,11 +577,10 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware,
else
lru->shrinker_id = -1;
#endif
- memcg_get_cache_ids();
lru->node = kcalloc(nr_node_ids, sizeof(*lru->node), GFP_KERNEL);
if (!lru->node)
- goto out;
+ return -ENOMEM;
for_each_node(i) {
spin_lock_init(&lru->node[i].lock);
@@ -652,18 +589,10 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware,
init_one_lru(&lru->node[i].lru);
}
- err = memcg_init_list_lru(lru, memcg_aware);
- if (err) {
- kfree(lru->node);
- /* Do this so a list_lru_destroy() doesn't crash: */
- lru->node = NULL;
- goto out;
- }
-
+ memcg_init_list_lru(lru, memcg_aware);
list_lru_register(lru);
-out:
- memcg_put_cache_ids();
- return err;
+
+ return 0;
}
EXPORT_SYMBOL_GPL(__list_lru_init);
@@ -673,8 +602,6 @@ void list_lru_destroy(struct list_lru *lru)
if (!lru->node)
return;
- memcg_get_cache_ids();
-
list_lru_unregister(lru);
memcg_destroy_list_lru(lru);
@@ -684,6 +611,5 @@ void list_lru_destroy(struct list_lru *lru)
#ifdef CONFIG_MEMCG_KMEM
lru->shrinker_id = -1;
#endif
- memcg_put_cache_ids();
}
EXPORT_SYMBOL_GPL(list_lru_destroy);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4cf98de2ad09..8e0cde19b648 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -359,42 +359,17 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
* This will be used as a shrinker list's index.
* The main reason for not using cgroup id for this:
* this works better in sparse environments, where we have a lot of memcgs,
- * but only a few kmem-limited. Or also, if we have, for instance, 200
- * memcgs, and none but the 200th is kmem-limited, we'd have to have a
- * 200 entry array for that.
- *
- * The current size of the caches array is stored in memcg_nr_cache_ids. It
- * will double each time we have to increase it.
+ * but only a few kmem-limited.
*/
static DEFINE_IDA(memcg_cache_ida);
-int memcg_nr_cache_ids;
-
-/* Protects memcg_nr_cache_ids */
-static DECLARE_RWSEM(memcg_cache_ids_sem);
-
-void memcg_get_cache_ids(void)
-{
- down_read(&memcg_cache_ids_sem);
-}
-
-void memcg_put_cache_ids(void)
-{
- up_read(&memcg_cache_ids_sem);
-}
/*
- * MIN_SIZE is different than 1, because we would like to avoid going through
- * the alloc/free process all the time. In a small machine, 4 kmem-limited
- * cgroups is a reasonable guess. In the future, it could be a parameter or
- * tunable, but that is strictly not necessary.
- *
* MAX_SIZE should be as large as the number of cgrp_ids. Ideally, we could get
* this constant directly from cgroup, but it is understandable that this is
* better kept as an internal representation in cgroup.c. In any case, the
* cgrp_id space is not getting any smaller, and we don't have to necessarily
* increase ours as well if it increases.
*/
-#define MEMCG_CACHES_MIN_SIZE 4
#define MEMCG_CACHES_MAX_SIZE MEM_CGROUP_ID_MAX
/*
@@ -2879,49 +2854,6 @@ __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
return objcg;
}
-static int memcg_alloc_cache_id(void)
-{
- int id, size;
- int err;
-
- id = ida_simple_get(&memcg_cache_ida,
- 0, MEMCG_CACHES_MAX_SIZE, GFP_KERNEL);
- if (id < 0)
- return id;
-
- if (id < memcg_nr_cache_ids)
- return id;
-
- /*
- * There's no space for the new id in memcg_caches arrays,
- * so we have to grow them.
- */
- down_write(&memcg_cache_ids_sem);
-
- size = 2 * (id + 1);
- if (size < MEMCG_CACHES_MIN_SIZE)
- size = MEMCG_CACHES_MIN_SIZE;
- else if (size > MEMCG_CACHES_MAX_SIZE)
- size = MEMCG_CACHES_MAX_SIZE;
-
- err = memcg_update_all_list_lrus(size);
- if (!err)
- memcg_nr_cache_ids = size;
-
- up_write(&memcg_cache_ids_sem);
-
- if (err) {
- ida_simple_remove(&memcg_cache_ida, id);
- return err;
- }
- return id;
-}
-
-static void memcg_free_cache_id(int id)
-{
- ida_simple_remove(&memcg_cache_ida, id);
-}
-
/*
* obj_cgroup_uncharge_pages: uncharge a number of kernel pages from a objcg
* @objcg: object cgroup to uncharge
@@ -3599,13 +3531,14 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
if (unlikely(mem_cgroup_is_root(memcg)))
return 0;
- memcg_id = memcg_alloc_cache_id();
+ memcg_id = ida_alloc_max(&memcg_cache_ida, MEMCG_CACHES_MAX_SIZE - 1,
+ GFP_KERNEL);
if (memcg_id < 0)
return memcg_id;
objcg = obj_cgroup_alloc();
if (!objcg) {
- memcg_free_cache_id(memcg_id);
+ ida_free(&memcg_cache_ida, memcg_id);
return -ENOMEM;
}
objcg->memcg = memcg;
@@ -3643,7 +3576,7 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
memcg_reparent_list_lrus(memcg, parent);
- memcg_free_cache_id(kmemcg_id);
+ ida_free(&memcg_cache_ida, kmemcg_id);
}
#else
static int memcg_online_kmem(struct mem_cgroup *memcg)
--
2.11.0
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Signed-off-by: Muchun Song <[email protected]>
---
net/sunrpc/rpc_pipe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index ee5336d73fdd..7ed4accc480d 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -197,7 +197,7 @@ static struct inode *
rpc_alloc_inode(struct super_block *sb)
{
struct rpc_inode *rpci;
- rpci = kmem_cache_alloc(rpc_inode_cachep, GFP_KERNEL);
+ rpci = alloc_inode_sb(sb, rpc_inode_cachep, GFP_KERNEL);
if (!rpci)
return NULL;
return &rpci->vfs_inode;
--
2.11.0
memcg_cache_id() was introduced by commit 2633d7a02823 ("slab/slub:
consider a memcg parameter in kmem_create_cache") to index into the
kmem_cache->memcg_params->memcg_caches array. That array was removed by
commit 9855609bde03 ("mm: memcg/slab: use a single set of kmem_caches for
all accounted allocations"), so the name no longer needs to refer to
caches. Rename it to memcg_kmem_id(), which reflects that the ID is
kmem related.
Signed-off-by: Muchun Song <[email protected]>
---
include/linux/memcontrol.h | 4 ++--
mm/list_lru.c | 14 +++++++-------
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 33f6ec4783f8..6541ec768a60 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1713,7 +1713,7 @@ static inline void memcg_kmem_uncharge_page(struct page *page, int order)
* A helper for accessing memcg's kmem_id, used for getting
* corresponding LRU lists.
*/
-static inline int memcg_cache_id(struct mem_cgroup *memcg)
+static inline int memcg_kmem_id(struct mem_cgroup *memcg)
{
return memcg ? memcg->kmemcg_id : -1;
}
@@ -1751,7 +1751,7 @@ static inline bool memcg_kmem_enabled(void)
return false;
}
-static inline int memcg_cache_id(struct mem_cgroup *memcg)
+static inline int memcg_kmem_id(struct mem_cgroup *memcg)
{
return -1;
}
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 371097ee2485..8fb38dee0e99 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -74,7 +74,7 @@ list_lru_from_kmem(struct list_lru *lru, int nid, void *ptr,
if (!memcg)
goto out;
- l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg));
+ l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
out:
if (memcg_ptr)
*memcg_ptr = memcg;
@@ -181,7 +181,7 @@ unsigned long list_lru_count_one(struct list_lru *lru,
long count = 0;
rcu_read_lock();
- l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg));
+ l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
if (l)
count = READ_ONCE(l->nr_items);
rcu_read_unlock();
@@ -273,7 +273,7 @@ list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
unsigned long ret;
spin_lock(&nlru->lock);
- ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate,
+ ret = __list_lru_walk_one(lru, nid, memcg_kmem_id(memcg), isolate,
cb_arg, nr_to_walk);
spin_unlock(&nlru->lock);
return ret;
@@ -289,7 +289,7 @@ list_lru_walk_one_irq(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
unsigned long ret;
spin_lock_irq(&nlru->lock);
- ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate,
+ ret = __list_lru_walk_one(lru, nid, memcg_kmem_id(memcg), isolate,
cb_arg, nr_to_walk);
spin_unlock_irq(&nlru->lock);
return ret;
@@ -463,7 +463,7 @@ void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *paren
static bool memcg_list_lru_skip_alloc(struct list_lru *lru,
struct mem_cgroup *memcg)
{
- int idx = memcg_cache_id(memcg);
+ int idx = memcg_kmem_id(memcg);
if (unlikely(idx < 0) || xa_load(&lru->xa, idx))
return true;
@@ -518,7 +518,7 @@ int list_lru_memcg_alloc(struct list_lru *lru, struct mem_cgroup *memcg, gfp_t g
xas_lock_irqsave(&xas, flags);
while (i--) {
- int index = memcg_cache_id(table[i].memcg);
+ int index = memcg_kmem_id(table[i].memcg);
struct list_lru_memcg *mlru = table[i].mlru;
xas_set(&xas, index);
@@ -538,7 +538,7 @@ int list_lru_memcg_alloc(struct list_lru *lru, struct mem_cgroup *memcg, gfp_t g
* memcg id. More details see the comments
* in memcg_reparent_list_lrus().
*/
- index = memcg_cache_id(table[i].memcg);
+ index = memcg_kmem_id(table[i].memcg);
if (index < 0)
xas_set_err(&xas, 0);
else if (!xas_error(&xas) && index != xas.xa_index)
--
2.11.0
idr_alloc() treats its @end argument as exclusive, so in the current
implementation the maximum memcg ID is 65534 instead of 65535. This looks
like a bug, so pass MEM_CGROUP_ID_MAX + 1 as @end.
Signed-off-by: Muchun Song <[email protected]>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e3a2e4d65cc5..28f0aa0a2ce5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5011,7 +5011,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
return ERR_PTR(error);
memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
- MEM_CGROUP_ID_MIN, MEM_CGROUP_ID_MAX,
+ MEM_CGROUP_ID_MIN, MEM_CGROUP_ID_MAX + 1,
GFP_KERNEL);
if (memcg->id.id < 0) {
error = memcg->id.id;
--
2.11.0
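The off-by-one described in the patch above can be illustrated with a small
user-space sketch. This is not kernel code: toy_idr, toy_idr_alloc() and
toy_highest_id() are hypothetical names, and TOY_ID_MAX is a tiny stand-in
for MEM_CGROUP_ID_MAX. The only property borrowed from idr_alloc() is that
the end of the range is exclusive.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_ID_MIN 1
#define TOY_ID_MAX 7	/* hypothetical stand-in for MEM_CGROUP_ID_MAX */

/* Hypothetical toy ID table; one flag per possible ID. */
struct toy_idr {
	bool used[TOY_ID_MAX + 1];
};

/*
 * Hand out the lowest free ID in [start, end). As with idr_alloc(),
 * @end is exclusive. Returns -1 when the range is exhausted (the
 * kernel returns a negative errno instead).
 */
static int toy_idr_alloc(struct toy_idr *idr, int start, int end)
{
	assert(start >= 0 && end <= TOY_ID_MAX + 1);

	for (int id = start; id < end; id++) {
		if (!idr->used[id]) {
			idr->used[id] = true;
			return id;
		}
	}
	return -1;
}

/* Allocate until the range is exhausted; return the highest ID seen. */
static int toy_highest_id(int start, int end)
{
	struct toy_idr idr = { { false } };
	int id, last = -1;

	while ((id = toy_idr_alloc(&idr, start, end)) >= 0)
		last = id;
	return last;
}
```

With end = TOY_ID_MAX the highest ID handed out is TOY_ID_MAX - 1; passing
TOY_ID_MAX + 1 makes TOY_ID_MAX itself allocatable, which mirrors the
one-line change in the patch.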
On Tue, Sep 14, 2021 at 03:28:22PM +0800, Muchun Song wrote:
> So we have to convert to new API for all filesystems, which is done in
> one patch. Some filesystems are easy to convert (just replace
> kmem_cache_alloc() to alloc_inode_sb()), while other filesystems need to
> do more work.
From what I can tell, there are 54 file systems for which it was a
trivial one-line change, and two (f2fs and nfs42) that were a tad bit
more complex.
> In order to make it easy for maintainers of different
> filesystems to review their own maintained part, I split the patch into
> patches which are per-filesystem in this version. I am not sure if this
> is a good idea, because there is going to be more commits.
What I'd actually suggest is that you combine all of the trivial file
system changes into a single commit, and keep the two more complex
changes for f2fs and nfs42 in separate commits.
Acked-by: Theodore Ts'o <[email protected]>
... for the ext4 related change.
- Ted
On Wed, Sep 15, 2021 at 4:23 AM Theodore Ts'o <[email protected]> wrote:
>
> On Tue, Sep 14, 2021 at 03:28:22PM +0800, Muchun Song wrote:
> > So we have to convert to new API for all filesystems, which is done in
> > one patch. Some filesystems are easy to convert (just replace
> > kmem_cache_alloc() to alloc_inode_sb()), while other filesystems need to
> > do more work.
>
> From what I can tell, there are 54 file systems for which it was a
> trivial one-line change, and two (f2fs and nfs42) that were a tad bit
> more complex.
Definitely right. Thanks for your clarification.
>
> > In order to make it easy for maintainers of different
> > filesystems to review their own maintained part, I split the patch into
> > patches which are per-filesystem in this version. I am not sure if this
> > is a good idea, because there is going to be more commits.
>
> What I'd actually suggest is that you combine all of the trivial file
> system changes into a single commit, and keep the two more complex
> changes for f2fs and nfs42 in separate commits.
Got it. Will do in the next version.
>
> Acked-by: Theodore Ts'o <[email protected]>
Thanks.
>
> ... for the ext4 related change.
>
> - Ted
>
On Tue, Sep 14, 2021 at 03:28:22PM +0800, Muchun Song wrote:
> We introduced alloc_inode_sb() in the previous version (v2), which sets up the
> inode reclaim context properly, to allocate filesystem-specific inodes.
> So we have to convert all filesystems to the new API, which was done in
> one patch. Some filesystems are easy to convert (just replace
> kmem_cache_alloc() to alloc_inode_sb()), while other filesystems need to
> do more work. In order to make it easy for maintainers of different
> filesystems to review their own maintained part, I split the patch into
> patches which are per-filesystem in this version. I am not sure if this
> is a good idea, because there is going to be more commits.
>
> In our server, we found a suspected memory leak problem. The kmalloc-32
> consumes more than 6GB of memory. Other kmem_caches consume less than 2GB
> memory.
>
> After our in-depth analysis, we found that the memory consumption of the
> kmalloc-32 slab cache is caused by list_lru_one allocations.
>
> crash> p memcg_nr_cache_ids
> memcg_nr_cache_ids = $2 = 24574
>
> memcg_nr_cache_ids is very large and memory consumption of each list_lru
> can be calculated with the following formula.
>
> num_numa_node * memcg_nr_cache_ids * 32 (kmalloc-32)
>
> There are 4 numa nodes in our system, so each list_lru consumes ~3MB.
>
> crash> list super_blocks | wc -l
> 952
>
> Every mount will register 2 list lrus, one is for inode, another is for
> dentry. There are 952 super_blocks. So the total memory is 952 * 2 * 3
> MB (~5.6GB). But now the number of memory cgroups is less than 500. So I
> guess more than 12286 memory cgroups have been created on this machine (I
> do not know why there are so many cgroups; it may be a user's bug, or
> the user really wants to do that). Because memcg_nr_cache_ids has never been
> reduced to a suitable value, a lot of memory is wasted. If we want
> to reduce memcg_nr_cache_ids, we have to *reboot* the server. This is not
> what we want.
>
> In order to reduce memcg_nr_cache_ids, I had posted a patchset [1] to do
> this. But this did not fundamentally solve the problem.
>
> We currently allocate scope for every memcg to be able to be tracked on every
> superblock instantiated in the system, regardless of whether that superblock
> is even accessible to that memcg.
>
> These huge memcg counts come from container hosts where memcgs are confined
> to just a small subset of the total number of superblocks that are instantiated
> at any given point in time.
>
> For these systems with huge container counts, list_lru does not need the
> capability of tracking every memcg on every superblock.
>
> What it comes down to is that the list_lru is only needed for a given memcg
> if that memcg is instantiating and freeing objects on a given list_lru.
>
> As Dave said, "Which makes me think we should be moving more towards 'add the
> memcg to the list_lru at the first insert' model rather than 'instantiate
> all at memcg init time just in case'."
>
> This patchset aims to optimize the list lru memory consumption from different
> aspects.
>
> Patch 1-6 are code simplification.
> Patch 7 converts the array from per-memcg per-node to per-memcg
> Patch 8 introduces kmem_cache_alloc_lru()
> Patch 9 introduces alloc_inode_sb()
> Patch 10-66 convert all filesystems to alloc_inode_sb() respectively.
Nowadays there is also ntfs3. If you do not plan to convert it, please
at least CC me so that I can do it when these patches land.
Argillander
> Patch 70 makes list_lru allocation dynamic.
> Patch 72 uses xarray to optimize the per-memcg pointer array size.
> Patch 73-76 are code simplification.
>
> I did a simple test to show the optimization. I created 10k memory cgroups
> and mounted 10k filesystems in the system. We used the free command to show how
> much memory the system consumes after this operation (there are 2 numa nodes
> in the system).
>
> +-----------------------+------------------------+
> | condition | memory consumption |
> +-----------------------+------------------------+
> | without this patchset | 24464 MB |
> +-----------------------+------------------------+
> | after patch 7 | 21957 MB | <--------+
> +-----------------------+------------------------+ |
> | after patch 70 | 6895 MB | |
> +-----------------------+------------------------+ |
> | after patch 72 | 4367 MB | |
> +-----------------------+------------------------+ |
> |
> The more the number of nodes, the more obvious the effect---+
>
> BTW, there was a recent discussion [2] on the same issue.
>
> [1] https://lore.kernel.org/linux-fsdevel/[email protected]/
> [2] https://lore.kernel.org/linux-fsdevel/[email protected]/
>
> This series not only optimizes the memory usage of list_lru but also
> simplifies the code.
>
> Changelog in v3:
> - Fix mixing advanced and normal XArray concepts (Thanks to Matthew).
> - Split one patch into per-filesystem patches.
>
> Changelog in v2:
> - Update Documentation/filesystems/porting.rst suggested by Dave.
> - Add a comment above alloc_inode_sb() suggested by Dave.
> - Rework some patch's commit log.
> - Add patch 18-21.
>
> Thanks Dave.
>
> Muchun Song (76):
> mm: list_lru: fix the return value of list_lru_count_one()
> mm: memcontrol: remove kmemcg_id reparenting
> mm: memcontrol: remove the kmem states
> mm: memcontrol: move memcg_online_kmem() to mem_cgroup_css_online()
> mm: list_lru: remove holding lru lock
> mm: list_lru: only add memcg-aware lrus to the global lru list
> mm: list_lru: optimize memory consumption of arrays
> mm: introduce kmem_cache_alloc_lru
> fs: introduce alloc_inode_sb() to allocate filesystems specific inode
> dax: allocate inode by using alloc_inode_sb()
> 9p: allocate inode by using alloc_inode_sb()
> adfs: allocate inode by using alloc_inode_sb()
> affs: allocate inode by using alloc_inode_sb()
> afs: allocate inode by using alloc_inode_sb()
> befs: allocate inode by using alloc_inode_sb()
> bfs: allocate inode by using alloc_inode_sb()
> block: allocate inode by using alloc_inode_sb()
> btrfs: allocate inode by using alloc_inode_sb()
> ceph: allocate inode by using alloc_inode_sb()
> cifs: allocate inode by using alloc_inode_sb()
> coda: allocate inode by using alloc_inode_sb()
> ecryptfs: allocate inode by using alloc_inode_sb()
> efs: allocate inode by using alloc_inode_sb()
> erofs: allocate inode by using alloc_inode_sb()
> exfat: allocate inode by using alloc_inode_sb()
> ext2: allocate inode by using alloc_inode_sb()
> ext4: allocate inode by using alloc_inode_sb()
> fat: allocate inode by using alloc_inode_sb()
> freevxfs: allocate inode by using alloc_inode_sb()
> fuse: allocate inode by using alloc_inode_sb()
> gfs2: allocate inode by using alloc_inode_sb()
> hfs: allocate inode by using alloc_inode_sb()
> hfsplus: allocate inode by using alloc_inode_sb()
> hostfs: allocate inode by using alloc_inode_sb()
> hpfs: allocate inode by using alloc_inode_sb()
> hugetlbfs: allocate inode by using alloc_inode_sb()
> isofs: allocate inode by using alloc_inode_sb()
> jffs2: allocate inode by using alloc_inode_sb()
> jfs: allocate inode by using alloc_inode_sb()
> minix: allocate inode by using alloc_inode_sb()
> nfs: allocate inode by using alloc_inode_sb()
> nilfs2: allocate inode by using alloc_inode_sb()
> ntfs: allocate inode by using alloc_inode_sb()
> ocfs2: allocate inode by using alloc_inode_sb()
> openpromfs: allocate inode by using alloc_inode_sb()
> orangefs: allocate inode by using alloc_inode_sb()
> overlayfs: allocate inode by using alloc_inode_sb()
> proc: allocate inode by using alloc_inode_sb()
> qnx4: allocate inode by using alloc_inode_sb()
> qnx6: allocate inode by using alloc_inode_sb()
> reiserfs: allocate inode by using alloc_inode_sb()
> romfs: allocate inode by using alloc_inode_sb()
> squashfs: allocate inode by using alloc_inode_sb()
> sysv: allocate inode by using alloc_inode_sb()
> ubifs: allocate inode by using alloc_inode_sb()
> udf: allocate inode by using alloc_inode_sb()
> ufs: allocate inode by using alloc_inode_sb()
> vboxsf: allocate inode by using alloc_inode_sb()
> xfs: allocate inode by using alloc_inode_sb()
> zonefs: allocate inode by using alloc_inode_sb()
> ipc: allocate inode by using alloc_inode_sb()
> shmem: allocate inode by using alloc_inode_sb()
> net: allocate inode by using alloc_inode_sb()
> rpc: allocate inode by using alloc_inode_sb()
> f2fs: allocate inode by using alloc_inode_sb()
> nfs42: use a specific kmem_cache to allocate nfs4_xattr_entry
> mm: dcache: use kmem_cache_alloc_lru() to allocate dentry
> xarray: use kmem_cache_alloc_lru to allocate xa_node
> mm: workingset: use xas_set_lru() to pass shadow_nodes
> mm: list_lru: allocate list_lru_one only when needed
> mm: list_lru: rename memcg_drain_all_list_lrus to
> memcg_reparent_list_lrus
> mm: list_lru: replace linear array with xarray
> mm: memcontrol: reuse memory cgroup ID for kmem ID
> mm: memcontrol: fix cannot alloc the maximum memcg ID
> mm: list_lru: rename list_lru_per_memcg to list_lru_memcg
> mm: memcontrol: rename memcg_cache_id to memcg_kmem_id
>
> Documentation/filesystems/porting.rst | 5 +
> drivers/dax/super.c | 2 +-
> fs/9p/vfs_inode.c | 2 +-
> fs/adfs/super.c | 2 +-
> fs/affs/super.c | 2 +-
> fs/afs/super.c | 2 +-
> fs/befs/linuxvfs.c | 2 +-
> fs/bfs/inode.c | 2 +-
> fs/block_dev.c | 2 +-
> fs/btrfs/inode.c | 2 +-
> fs/ceph/inode.c | 2 +-
> fs/cifs/cifsfs.c | 2 +-
> fs/coda/inode.c | 2 +-
> fs/dcache.c | 3 +-
> fs/ecryptfs/super.c | 2 +-
> fs/efs/super.c | 2 +-
> fs/erofs/super.c | 2 +-
> fs/exfat/super.c | 2 +-
> fs/ext2/super.c | 2 +-
> fs/ext4/super.c | 2 +-
> fs/f2fs/super.c | 8 +-
> fs/fat/inode.c | 2 +-
> fs/freevxfs/vxfs_super.c | 2 +-
> fs/fuse/inode.c | 2 +-
> fs/gfs2/super.c | 2 +-
> fs/hfs/super.c | 2 +-
> fs/hfsplus/super.c | 2 +-
> fs/hostfs/hostfs_kern.c | 2 +-
> fs/hpfs/super.c | 2 +-
> fs/hugetlbfs/inode.c | 2 +-
> fs/inode.c | 2 +-
> fs/isofs/inode.c | 2 +-
> fs/jffs2/super.c | 2 +-
> fs/jfs/super.c | 2 +-
> fs/minix/inode.c | 2 +-
> fs/nfs/inode.c | 2 +-
> fs/nfs/nfs42xattr.c | 95 ++++---
> fs/nilfs2/super.c | 2 +-
> fs/ntfs/inode.c | 2 +-
> fs/ocfs2/dlmfs/dlmfs.c | 2 +-
> fs/ocfs2/super.c | 2 +-
> fs/openpromfs/inode.c | 2 +-
> fs/orangefs/super.c | 2 +-
> fs/overlayfs/super.c | 2 +-
> fs/proc/inode.c | 2 +-
> fs/qnx4/inode.c | 2 +-
> fs/qnx6/inode.c | 2 +-
> fs/reiserfs/super.c | 2 +-
> fs/romfs/super.c | 2 +-
> fs/squashfs/super.c | 2 +-
> fs/sysv/inode.c | 2 +-
> fs/ubifs/super.c | 2 +-
> fs/udf/super.c | 2 +-
> fs/ufs/super.c | 2 +-
> fs/vboxsf/super.c | 2 +-
> fs/xfs/xfs_icache.c | 2 +-
> fs/zonefs/super.c | 2 +-
> include/linux/fs.h | 11 +
> include/linux/list_lru.h | 16 +-
> include/linux/memcontrol.h | 49 ++--
> include/linux/slab.h | 3 +
> include/linux/swap.h | 5 +-
> include/linux/xarray.h | 9 +-
> ipc/mqueue.c | 2 +-
> lib/xarray.c | 10 +-
> mm/list_lru.c | 472 ++++++++++++++++------------------
> mm/memcontrol.c | 190 ++------------
> mm/shmem.c | 2 +-
> mm/slab.c | 39 ++-
> mm/slab.h | 17 +-
> mm/slob.c | 6 +
> mm/slub.c | 42 ++-
> mm/workingset.c | 2 +-
> net/socket.c | 2 +-
> net/sunrpc/rpc_pipe.c | 2 +-
> 75 files changed, 498 insertions(+), 598 deletions(-)
>
> --
> 2.11.0
>
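The arithmetic in the quoted cover letter can be checked with a short
user-space sketch. The toy_* names are hypothetical; toy_grown_size() only
mirrors the doubling rule of the removed memcg_alloc_cache_id() shown in the
diff earlier in this series (size = 2 * (id + 1), clamped to a minimum of 4
and to a maximum that is assumed here to be 65535), and the byte counts
follow the formula quoted above.

```c
#include <assert.h>

#define TOY_CACHES_MIN_SIZE 4
#define TOY_CACHES_MAX_SIZE 65535	/* assumed value of MEM_CGROUP_ID_MAX */

/* Array size chosen when @id no longer fits: double, then clamp. */
static int toy_grown_size(int id)
{
	int size;

	assert(id >= 0 && id < TOY_CACHES_MAX_SIZE);

	size = 2 * (id + 1);
	if (size < TOY_CACHES_MIN_SIZE)
		size = TOY_CACHES_MIN_SIZE;
	else if (size > TOY_CACHES_MAX_SIZE)
		size = TOY_CACHES_MAX_SIZE;
	return size;
}

/* num_numa_node * memcg_nr_cache_ids * slot size (32 for kmalloc-32) */
static unsigned long long toy_list_lru_bytes(unsigned int nr_nodes,
					     unsigned int nr_cache_ids,
					     unsigned int slot_bytes)
{
	return (unsigned long long)nr_nodes * nr_cache_ids * slot_bytes;
}

/* Every super_block registers lrus_per_sb list_lrus (inode and dentry). */
static unsigned long long toy_total_bytes(unsigned int nr_super_blocks,
					  unsigned int lrus_per_sb,
					  unsigned long long bytes_per_lru)
{
	return (unsigned long long)nr_super_blocks * lrus_per_sb * bytes_per_lru;
}
```

With the numbers from the cover letter (4 nodes, memcg_nr_cache_ids = 24574,
952 super_blocks, 2 list_lrus each), this gives 3145472 bytes per list_lru
(about 3 MB) and 5988978688 bytes in total (about 5.6 GB). A size of 24574
is what the doubling rule yields for id 12286, which is consistent with the
guess that more than 12286 memory cgroups existed at some point.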
On Sat, Sep 18, 2021 at 2:56 PM Kari Argillander
<[email protected]> wrote:
>
> On Tue, Sep 14, 2021 at 03:28:22PM +0800, Muchun Song wrote:
> > Patch 10-66 convert all filesystems to alloc_inode_sb() respectively.
>
> Nowadays there is also ntfs3. If you do not plan to convert it, please
> at least CC me so that I can do it when these patches land.
>
> Argillander
>
Wow, a new filesystem. I didn't notice it before. I'll cover it
in the next version and Cc you so that you can review it.
Thanks for your reminder.