Hi,
Currently, each kmem active memory cgroup has its own set of kmem
caches. The caches are only used by the memory cgroup they were created
for, so when the cgroup is taken offline they must be destroyed.
However, we can't easily destroy all the caches on css offline, because
they may still contain objects accounted to the cgroup. In fact, we
don't bother destroying busy caches on css offline at all, effectively
leaking them. To make this scheme work as intended, we would have to
introduce some kind of asynchronous cache destruction, which is going to
be quite complex, because we'd have to handle various race conditions.
And even if we managed to solve them all, kmem caches created for memory
cgroups that are now dead would dangle indefinitely, wasting memory.
In this patch set I implement a different approach, which can be
described by the following statements:
1. Never destroy per memcg kmem caches (except when the root cache is
destroyed, of course).
2. Reuse kmemcg_id and therefore the set of per memcg kmem caches left
from a dead memory cgroup.
3. After allocating a kmem object, check if the slab is accounted to
the proper (i.e. current) memory cgroup. If it isn't, recharge it (see
the sketch below).
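To illustrate, the allocation path conceptually becomes the following (a
rough sketch based on patch 8, not the exact code; cache_alloc here just
stands for the allocator's internal fast path and is not a real
function):

	/* cachep is the per memcg cache picked by memcg_kmem_get_cache() */
	obj = cache_alloc(cachep, flags);
	if (obj && unlikely(memcg_kmem_recharge_slab(obj, flags))) {
		/* failed to move the slab's charge to the current
		 * cgroup - fail the allocation */
		slab_free(cachep, obj);
		obj = NULL;
	}

slab_free here is the helper introduced by the "slab: introduce
slab_free helper" patch.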
The benefits are:
- It's much simpler than what we have now, even though the current
implementation is incomplete.
- The number of per cgroup caches of the same kind cannot be greater
than the maximum number of online kmem active memory cgroups that have
ever existed simultaneously. Currently it is unlimited, which is really
bad.
- Once a new memory cgroup starts using a cache that was previously used
by a dead cgroup, it will recharge slabs accounted to the dead cgroup as
it allocates objects from the cache. Therefore all references to the old
cgroup will be put sooner or later, and it will be freed. Currently,
cgroups that have kmem objects accounted to them on css offline leak for
good.
This patch set is based on v3.18-rc2-mmotm-2014-10-29-14-19 with the
following patches by Johannes applied on top:
[patch] mm: memcontrol: remove stale page_cgroup_lock comment
[patch 1/3] mm: embed the memcg pointer directly into struct page
[patch 2/3] mm: page_cgroup: rename file to mm/swap_cgroup.c
[patch 3/3] mm: move page->mem_cgroup bad page handling into generic code
Thanks,
Vladimir Davydov (8):
memcg: do not destroy kmem caches on css offline
slab: charge slab pages to the current memory cgroup
memcg: decouple per memcg kmem cache from the owner memcg
memcg: zap memcg_{un}register_cache
memcg: free kmem cache id on css offline
memcg: introduce memcg_kmem_should_charge helper
slab: introduce slab_free helper
slab: recharge slab pages to the allocating memory cgroup
include/linux/memcontrol.h | 63 ++++++-----
include/linux/slab.h | 12 +-
mm/memcontrol.c | 260 ++++++++++++++------------------------------
mm/slab.c | 62 +++++++----
mm/slab.h | 28 -----
mm/slab_common.c | 66 ++++++++---
mm/slub.c | 26 +++--
7 files changed, 228 insertions(+), 289 deletions(-)
--
1.7.10.4
Basically, we substitute the reference to the owner memory cgroup in
memcg_cache_params with the index in the memcg_caches array. This
decouples the kmem cache from the memcg it was created for and will
allow us to reuse it for another cgroup after css offline.
Signed-off-by: Vladimir Davydov <[email protected]>
---
include/linux/slab.h | 7 +++----
mm/memcontrol.c | 18 +++++-------------
mm/slab_common.c | 12 +++++-------
3 files changed, 13 insertions(+), 24 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 390341d30b2d..293c04df7953 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -117,8 +117,7 @@ struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
void (*)(void *));
#ifdef CONFIG_MEMCG_KMEM
struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *,
- struct kmem_cache *,
- const char *);
+ struct kmem_cache *);
#endif
void kmem_cache_destroy(struct kmem_cache *);
int kmem_cache_shrink(struct kmem_cache *);
@@ -490,7 +489,7 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
*
* Child caches will hold extra metadata needed for its operation. Fields are:
*
- * @memcg: pointer to the memcg this cache belongs to
+ * @id: the index in the root_cache's memcg_caches array.
* @root_cache: pointer to the global, root cache, this cache was derived from
*/
struct memcg_cache_params {
@@ -501,7 +500,7 @@ struct memcg_cache_params {
struct kmem_cache *memcg_caches[0];
};
struct {
- struct mem_cgroup *memcg;
+ int id;
struct kmem_cache *root_cache;
};
};
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8c60d7a30f4f..78d12076b01d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2603,8 +2603,6 @@ void memcg_update_array_size(int num)
static void memcg_register_cache(struct mem_cgroup *memcg,
struct kmem_cache *root_cache)
{
- static char memcg_name_buf[NAME_MAX + 1]; /* protected by
- memcg_slab_mutex */
struct kmem_cache *cachep;
int id;
@@ -2620,8 +2618,7 @@ static void memcg_register_cache(struct mem_cgroup *memcg,
if (cache_from_memcg_idx(root_cache, id))
return;
- cgroup_name(memcg->css.cgroup, memcg_name_buf, NAME_MAX + 1);
- cachep = memcg_create_kmem_cache(memcg, root_cache, memcg_name_buf);
+ cachep = memcg_create_kmem_cache(memcg, root_cache);
/*
* If we could not create a memcg cache, do not complain, because
* that's not critical at all as we can always proceed with the root
@@ -2630,8 +2627,6 @@ static void memcg_register_cache(struct mem_cgroup *memcg,
if (!cachep)
return;
- css_get(&memcg->css);
-
/*
* Since readers won't lock (see cache_from_memcg_idx()), we need a
* barrier here to ensure nobody will see the kmem_cache partially
@@ -2646,7 +2641,6 @@ static void memcg_register_cache(struct mem_cgroup *memcg,
static void memcg_unregister_cache(struct kmem_cache *cachep)
{
struct kmem_cache *root_cache;
- struct mem_cgroup *memcg;
int id;
lockdep_assert_held(&memcg_slab_mutex);
@@ -2654,16 +2648,12 @@ static void memcg_unregister_cache(struct kmem_cache *cachep)
BUG_ON(is_root_cache(cachep));
root_cache = cachep->memcg_params->root_cache;
- memcg = cachep->memcg_params->memcg;
- id = memcg_cache_id(memcg);
+ id = cachep->memcg_params->id;
BUG_ON(root_cache->memcg_params->memcg_caches[id] != cachep);
root_cache->memcg_params->memcg_caches[id] = NULL;
kmem_cache_destroy(cachep);
-
- /* drop the reference taken in memcg_register_cache */
- css_put(&memcg->css);
}
/*
@@ -4198,7 +4188,6 @@ static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
{
int ret;
- memcg->kmemcg_id = -1;
ret = memcg_propagate_kmem(memcg);
if (ret)
return ret;
@@ -4743,6 +4732,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
vmpressure_init(&memcg->vmpressure);
INIT_LIST_HEAD(&memcg->event_list);
spin_lock_init(&memcg->event_list_lock);
+#ifdef CONFIG_MEMCG_KMEM
+ memcg->kmemcg_id = -1;
+#endif
return &memcg->css;
diff --git a/mm/slab_common.c b/mm/slab_common.c
index d5e9e050a3ec..974d77db1b39 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -125,7 +125,7 @@ static int memcg_alloc_cache_params(struct mem_cgroup *memcg,
return -ENOMEM;
if (memcg) {
- s->memcg_params->memcg = memcg;
+ s->memcg_params->id = memcg_cache_id(memcg);
s->memcg_params->root_cache = root_cache;
} else
s->memcg_params->is_root_cache = true;
@@ -426,15 +426,13 @@ EXPORT_SYMBOL(kmem_cache_create);
* memcg_create_kmem_cache - Create a cache for a memory cgroup.
* @memcg: The memory cgroup the new cache is for.
* @root_cache: The parent of the new cache.
- * @memcg_name: The name of the memory cgroup (used for naming the new cache).
*
* This function attempts to create a kmem cache that will serve allocation
* requests going from @memcg to @root_cache. The new cache inherits properties
* from its parent.
*/
struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
- struct kmem_cache *root_cache,
- const char *memcg_name)
+ struct kmem_cache *root_cache)
{
struct kmem_cache *s = NULL;
char *cache_name;
@@ -444,8 +442,8 @@ struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
mutex_lock(&slab_mutex);
- cache_name = kasprintf(GFP_KERNEL, "%s(%d:%s)", root_cache->name,
- memcg_cache_id(memcg), memcg_name);
+ cache_name = kasprintf(GFP_KERNEL, "%s(%d)", root_cache->name,
+ memcg_cache_id(memcg));
if (!cache_name)
goto out_unlock;
@@ -912,7 +910,7 @@ int memcg_slab_show(struct seq_file *m, void *p)
if (p == slab_caches.next)
print_slabinfo_header(m);
- if (!is_root_cache(s) && s->memcg_params->memcg == memcg)
+ if (!is_root_cache(s) && s->memcg_params->id == memcg_cache_id(memcg))
cache_show(s, m);
return 0;
}
--
1.7.10.4
We use the same set of checks in both memcg_kmem_newpage_charge and
memcg_kmem_get_cache, and I will need them in yet another function, which
will be introduced by one of the following patches. So let's introduce a
helper function for them.
Signed-off-by: Vladimir Davydov <[email protected]>
---
include/linux/memcontrol.h | 43 ++++++++++++++++++++++---------------------
1 file changed, 22 insertions(+), 21 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 617652712da8..224c045fd37f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -416,6 +416,26 @@ void memcg_update_array_size(int num_groups);
struct kmem_cache *
__memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
+static __always_inline bool memcg_kmem_should_charge(gfp_t gfp)
+{
+ /*
+ * __GFP_NOFAIL allocations will move on even if charging is not
+ * possible. Therefore we don't even try, and have this allocation
+ * unaccounted. We could in theory charge it forcibly, but we hope
+ * those allocations are rare, and won't be worth the trouble.
+ */
+ if (gfp & __GFP_NOFAIL)
+ return false;
+ if (in_interrupt())
+ return false;
+ if (!current->mm || (current->flags & PF_KTHREAD))
+ return false;
+ /* If the test is dying, just let it go. */
+ if (unlikely(fatal_signal_pending(current)))
+ return false;
+ return true;
+}
+
/**
* memcg_kmem_newpage_charge: verify if a new kmem allocation is allowed.
* @gfp: the gfp allocation flags.
@@ -433,22 +453,8 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order)
{
if (!memcg_kmem_enabled())
return true;
-
- /*
- * __GFP_NOFAIL allocations will move on even if charging is not
- * possible. Therefore we don't even try, and have this allocation
- * unaccounted. We could in theory charge it forcibly, but we hope
- * those allocations are rare, and won't be worth the trouble.
- */
- if (gfp & __GFP_NOFAIL)
- return true;
- if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD))
- return true;
-
- /* If the test is dying, just let it go. */
- if (unlikely(fatal_signal_pending(current)))
+ if (!memcg_kmem_should_charge(gfp))
return true;
-
return __memcg_kmem_newpage_charge(gfp, memcg, order);
}
@@ -491,13 +497,8 @@ memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
{
if (!memcg_kmem_enabled())
return cachep;
- if (gfp & __GFP_NOFAIL)
- return cachep;
- if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD))
- return cachep;
- if (unlikely(fatal_signal_pending(current)))
+ if (!memcg_kmem_should_charge(gfp))
return cachep;
-
return __memcg_kmem_get_cache(cachep, gfp);
}
#else
--
1.7.10.4
No need for these helpers any more: we can do the work directly in
memcg_create_kmem_cache and kmem_cache_destroy.
Signed-off-by: Vladimir Davydov <[email protected]>
---
include/linux/memcontrol.h | 2 -
include/linux/slab.h | 3 +-
mm/memcontrol.c | 115 ++++++--------------------------------------
mm/slab_common.c | 60 ++++++++++++++++++-----
4 files changed, 64 insertions(+), 116 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 31b495ff5f3a..617652712da8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -416,8 +416,6 @@ void memcg_update_array_size(int num_groups);
struct kmem_cache *
__memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
-int __memcg_cleanup_cache_params(struct kmem_cache *s);
-
/**
* memcg_kmem_newpage_charge: verify if a new kmem allocation is allowed.
* @gfp: the gfp allocation flags.
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 293c04df7953..411b25f95ed8 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -116,8 +116,7 @@ struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
unsigned long,
void (*)(void *));
#ifdef CONFIG_MEMCG_KMEM
-struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *,
- struct kmem_cache *);
+void memcg_create_kmem_cache(struct mem_cgroup *, struct kmem_cache *);
#endif
void kmem_cache_destroy(struct kmem_cache *);
int kmem_cache_shrink(struct kmem_cache *);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 78d12076b01d..923fe4c29e92 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2484,12 +2484,6 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg,
}
#ifdef CONFIG_MEMCG_KMEM
-/*
- * The memcg_slab_mutex is held whenever a per memcg kmem cache is created or
- * destroyed. It protects memcg_caches arrays.
- */
-static DEFINE_MUTEX(memcg_slab_mutex);
-
static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp,
unsigned long nr_pages)
{
@@ -2574,10 +2568,7 @@ static int memcg_alloc_cache_id(void)
else if (size > MEMCG_CACHES_MAX_SIZE)
size = MEMCG_CACHES_MAX_SIZE;
- mutex_lock(&memcg_slab_mutex);
err = memcg_update_all_caches(size);
- mutex_unlock(&memcg_slab_mutex);
-
if (err) {
ida_simple_remove(&kmem_limited_groups, id);
return err;
@@ -2600,62 +2591,6 @@ void memcg_update_array_size(int num)
memcg_limited_groups_array_size = num;
}
-static void memcg_register_cache(struct mem_cgroup *memcg,
- struct kmem_cache *root_cache)
-{
- struct kmem_cache *cachep;
- int id;
-
- lockdep_assert_held(&memcg_slab_mutex);
-
- id = memcg_cache_id(memcg);
-
- /*
- * Since per-memcg caches are created asynchronously on first
- * allocation (see memcg_kmem_get_cache()), several threads can try to
- * create the same cache, but only one of them may succeed.
- */
- if (cache_from_memcg_idx(root_cache, id))
- return;
-
- cachep = memcg_create_kmem_cache(memcg, root_cache);
- /*
- * If we could not create a memcg cache, do not complain, because
- * that's not critical at all as we can always proceed with the root
- * cache.
- */
- if (!cachep)
- return;
-
- /*
- * Since readers won't lock (see cache_from_memcg_idx()), we need a
- * barrier here to ensure nobody will see the kmem_cache partially
- * initialized.
- */
- smp_wmb();
-
- BUG_ON(root_cache->memcg_params->memcg_caches[id]);
- root_cache->memcg_params->memcg_caches[id] = cachep;
-}
-
-static void memcg_unregister_cache(struct kmem_cache *cachep)
-{
- struct kmem_cache *root_cache;
- int id;
-
- lockdep_assert_held(&memcg_slab_mutex);
-
- BUG_ON(is_root_cache(cachep));
-
- root_cache = cachep->memcg_params->root_cache;
- id = cachep->memcg_params->id;
-
- BUG_ON(root_cache->memcg_params->memcg_caches[id] != cachep);
- root_cache->memcg_params->memcg_caches[id] = NULL;
-
- kmem_cache_destroy(cachep);
-}
-
/*
* During the creation a new cache, we need to disable our accounting mechanism
* altogether. This is true even if we are not creating, but rather just
@@ -2687,42 +2622,20 @@ static inline void memcg_resume_kmem_account(void)
current->memcg_kmem_skip_account--;
}
-int __memcg_cleanup_cache_params(struct kmem_cache *s)
-{
- struct kmem_cache *c;
- int i, failed = 0;
-
- mutex_lock(&memcg_slab_mutex);
- for_each_memcg_cache_index(i) {
- c = cache_from_memcg_idx(s, i);
- if (!c)
- continue;
-
- memcg_unregister_cache(c);
-
- if (cache_from_memcg_idx(s, i))
- failed++;
- }
- mutex_unlock(&memcg_slab_mutex);
- return failed;
-}
-
-struct memcg_register_cache_work {
+struct memcg_cache_create_work {
struct mem_cgroup *memcg;
struct kmem_cache *cachep;
struct work_struct work;
};
-static void memcg_register_cache_func(struct work_struct *w)
+static void memcg_cache_create_work_fn(struct work_struct *w)
{
- struct memcg_register_cache_work *cw =
- container_of(w, struct memcg_register_cache_work, work);
+ struct memcg_cache_create_work *cw = container_of(w,
+ struct memcg_cache_create_work, work);
struct mem_cgroup *memcg = cw->memcg;
struct kmem_cache *cachep = cw->cachep;
- mutex_lock(&memcg_slab_mutex);
- memcg_register_cache(memcg, cachep);
- mutex_unlock(&memcg_slab_mutex);
+ memcg_create_kmem_cache(memcg, cachep);
css_put(&memcg->css);
kfree(cw);
@@ -2731,10 +2644,10 @@ static void memcg_register_cache_func(struct work_struct *w)
/*
* Enqueue the creation of a per-memcg kmem_cache.
*/
-static void __memcg_schedule_register_cache(struct mem_cgroup *memcg,
- struct kmem_cache *cachep)
+static void __memcg_schedule_cache_create(struct mem_cgroup *memcg,
+ struct kmem_cache *cachep)
{
- struct memcg_register_cache_work *cw;
+ struct memcg_cache_create_work *cw;
cw = kmalloc(sizeof(*cw), GFP_NOWAIT);
if (cw == NULL) {
@@ -2745,17 +2658,17 @@ static void __memcg_schedule_register_cache(struct mem_cgroup *memcg,
cw->memcg = memcg;
cw->cachep = cachep;
- INIT_WORK(&cw->work, memcg_register_cache_func);
+ INIT_WORK(&cw->work, memcg_cache_create_work_fn);
schedule_work(&cw->work);
}
-static void memcg_schedule_register_cache(struct mem_cgroup *memcg,
- struct kmem_cache *cachep)
+static void memcg_schedule_cache_create(struct mem_cgroup *memcg,
+ struct kmem_cache *cachep)
{
/*
* We need to stop accounting when we kmalloc, because if the
* corresponding kmalloc cache is not yet created, the first allocation
- * in __memcg_schedule_register_cache will recurse.
+ * in __memcg_schedule_cache_create will recurse.
*
* However, it is better to enclose the whole function. Depending on
* the debugging options enabled, INIT_WORK(), for instance, can
@@ -2764,7 +2677,7 @@ static void memcg_schedule_register_cache(struct mem_cgroup *memcg,
* the safest choice is to do it like this, wrapping the whole function.
*/
memcg_stop_kmem_account();
- __memcg_schedule_register_cache(memcg, cachep);
+ __memcg_schedule_cache_create(memcg, cachep);
memcg_resume_kmem_account();
}
@@ -2822,7 +2735,7 @@ struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep,
* could happen with the slab_mutex held. So it's better to
* defer everything.
*/
- memcg_schedule_register_cache(memcg, cachep);
+ memcg_schedule_cache_create(memcg, cachep);
return cachep;
out:
rcu_read_unlock();
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 974d77db1b39..70a2ba4b4600 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -431,10 +431,11 @@ EXPORT_SYMBOL(kmem_cache_create);
* requests going from @memcg to @root_cache. The new cache inherits properties
* from its parent.
*/
-struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
- struct kmem_cache *root_cache)
+void memcg_create_kmem_cache(struct mem_cgroup *memcg,
+ struct kmem_cache *root_cache)
{
- struct kmem_cache *s = NULL;
+ int id = memcg_cache_id(memcg);
+ struct kmem_cache *s;
char *cache_name;
get_online_cpus();
@@ -442,8 +443,15 @@ struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
mutex_lock(&slab_mutex);
- cache_name = kasprintf(GFP_KERNEL, "%s(%d)", root_cache->name,
- memcg_cache_id(memcg));
+ /*
+ * Since per-memcg caches are created asynchronously on first
+ * allocation (see memcg_kmem_get_cache()), several threads can try to
+ * create the same cache, but only one of them may succeed.
+ */
+ if (cache_from_memcg_idx(root_cache, id))
+ goto out_unlock;
+
+ cache_name = kasprintf(GFP_KERNEL, "%s(%d)", root_cache->name, id);
if (!cache_name)
goto out_unlock;
@@ -453,31 +461,52 @@ struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
memcg, root_cache);
if (IS_ERR(s)) {
kfree(cache_name);
- s = NULL;
+ goto out_unlock;
}
+ /*
+ * Since readers won't lock (see cache_from_memcg_idx()), we need a
+ * barrier here to ensure nobody will see the kmem_cache partially
+ * initialized.
+ */
+ smp_wmb();
+
+ BUG_ON(root_cache->memcg_params->memcg_caches[id]);
+ root_cache->memcg_params->memcg_caches[id] = s;
+
out_unlock:
mutex_unlock(&slab_mutex);
put_online_mems();
put_online_cpus();
-
- return s;
}
static int memcg_cleanup_cache_params(struct kmem_cache *s)
{
- int rc;
+ int i;
+ int ret = 0;
if (!s->memcg_params ||
!s->memcg_params->is_root_cache)
return 0;
mutex_unlock(&slab_mutex);
- rc = __memcg_cleanup_cache_params(s);
+ for_each_memcg_cache_index(i) {
+ struct kmem_cache *c;
+
+ c = cache_from_memcg_idx(s, i);
+ if (!c)
+ continue;
+
+		kmem_cache_destroy(c);
+
+ /* failed to destroy? */
+ if (cache_from_memcg_idx(s, i))
+ ret = -EBUSY;
+ }
mutex_lock(&slab_mutex);
- return rc;
+ return ret;
}
#else
static int memcg_cleanup_cache_params(struct kmem_cache *s)
@@ -513,6 +542,15 @@ void kmem_cache_destroy(struct kmem_cache *s)
goto out_unlock;
}
+#ifdef CONFIG_MEMCG_KMEM
+ if (!is_root_cache(s)) {
+ int id = s->memcg_params->id;
+ struct kmem_cache *root_cache = s->memcg_params->root_cache;
+
+ BUG_ON(root_cache->memcg_params->memcg_caches[id] != s);
+ root_cache->memcg_params->memcg_caches[id] = NULL;
+ }
+#endif
list_del(&s->list);
mutex_unlock(&slab_mutex);
--
1.7.10.4
Since we now reuse per cgroup kmem caches, the slab we allocate an
object from may be accounted to a dead memory cgroup. If we leave such a
slab accounted to a dead cgroup, we risk pinning the cgroup forever, so
we introduce a new function, memcg_kmem_recharge_slab, which is to be
called at the end of kmalloc. It recharges the new object's slab to the
current cgroup unless it is already charged to it.
Signed-off-by: Vladimir Davydov <[email protected]>
---
include/linux/memcontrol.h | 15 +++++++++++
mm/memcontrol.c | 62 ++++++++++++++++++++++++++++++++++++++++++++
mm/slab.c | 10 +++++++
mm/slub.c | 8 ++++++
4 files changed, 95 insertions(+)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 224c045fd37f..4b0ff999605a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -408,6 +408,7 @@ bool __memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg,
void __memcg_kmem_commit_charge(struct page *page,
struct mem_cgroup *memcg, int order);
void __memcg_kmem_uncharge_pages(struct page *page, int order);
+int __memcg_kmem_recharge_slab(void *obj, gfp_t gfp);
int memcg_cache_id(struct mem_cgroup *memcg);
@@ -501,6 +502,15 @@ memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
return cachep;
return __memcg_kmem_get_cache(cachep, gfp);
}
+
+static __always_inline int memcg_kmem_recharge_slab(void *obj, gfp_t gfp)
+{
+ if (!memcg_kmem_enabled())
+ return 0;
+ if (!memcg_kmem_should_charge(gfp))
+ return 0;
+ return __memcg_kmem_recharge_slab(obj, gfp);
+}
#else
#define for_each_memcg_cache_index(_idx) \
for (; NULL; )
@@ -535,6 +545,11 @@ memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
{
return cachep;
}
+
+static inline int memcg_kmem_recharge_slab(void *obj, gfp_t gfp)
+{
+ return 0;
+}
#endif /* CONFIG_MEMCG_KMEM */
#endif /* _LINUX_MEMCONTROL_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 755604079d8e..f6567627c3b1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2828,6 +2828,68 @@ void __memcg_kmem_uncharge_pages(struct page *page, int order)
memcg_uncharge_kmem(memcg, 1 << order);
page->mem_cgroup = NULL;
}
+
+/*
+ * Since we reuse per cgroup kmem caches, the slab we allocate an object from
+ * may be accounted to a dead memory cgroup. If we leave such a slab accounted
+ * to a dead cgroup, we risk pinning the cgroup forever, so this function is
+ * called in the end of kmalloc to recharge the new object's slab to the
+ * current cgroup unless it is already charged to it.
+ */
+int __memcg_kmem_recharge_slab(void *obj, gfp_t gfp)
+{
+ struct mem_cgroup *page_memcg, *memcg;
+ struct page *page;
+ int nr_pages;
+ int ret = 0;
+
+ if (current->memcg_kmem_skip_account)
+ goto out;
+
+ page = virt_to_head_page(obj);
+ page_memcg = ACCESS_ONCE(page->mem_cgroup);
+
+ rcu_read_lock();
+
+ memcg = mem_cgroup_from_task(rcu_dereference(current->mm->owner));
+ if (!memcg_kmem_is_active(memcg))
+ memcg = NULL;
+ if (likely(memcg == page_memcg))
+ goto out_unlock;
+ if (memcg && !css_tryget(&memcg->css))
+ goto out_unlock;
+
+ rcu_read_unlock();
+
+ nr_pages = 1 << compound_order(page);
+
+ if (memcg && memcg_charge_kmem(memcg, gfp, nr_pages)) {
+ ret = -ENOMEM;
+ goto out_put_memcg;
+ }
+
+ /*
+ * We use cmpxchg to synchronize against concurrent threads allocating
+ * from the same slab. If it fails, it means that some other thread
+ * recharged the slab before us, and we are done.
+ */
+ if (cmpxchg(&page->mem_cgroup, page_memcg, memcg) == page_memcg) {
+ if (page_memcg)
+ memcg_uncharge_kmem(page_memcg, nr_pages);
+ } else {
+ if (memcg)
+ memcg_uncharge_kmem(memcg, nr_pages);
+ }
+
+out_put_memcg:
+ if (memcg)
+ css_put(&memcg->css);
+ goto out;
+out_unlock:
+ rcu_read_unlock();
+out:
+ return ret;
+}
#endif /* CONFIG_MEMCG_KMEM */
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/mm/slab.c b/mm/slab.c
index 178a3b733a50..61b01c2ae1d9 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3133,6 +3133,8 @@ done:
return obj;
}
+static __always_inline void slab_free(struct kmem_cache *cachep, void *objp);
+
static __always_inline void *
slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
unsigned long caller)
@@ -3185,6 +3187,10 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
kmemcheck_slab_alloc(cachep, flags, ptr, cachep->object_size);
if (unlikely(flags & __GFP_ZERO))
memset(ptr, 0, cachep->object_size);
+ if (unlikely(memcg_kmem_recharge_slab(ptr, flags))) {
+ slab_free(cachep, ptr);
+ ptr = NULL;
+ }
}
return ptr;
@@ -3250,6 +3256,10 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
kmemcheck_slab_alloc(cachep, flags, objp, cachep->object_size);
if (unlikely(flags & __GFP_ZERO))
memset(objp, 0, cachep->object_size);
+ if (unlikely(memcg_kmem_recharge_slab(objp, flags))) {
+ slab_free(cachep, objp);
+ objp = NULL;
+ }
}
return objp;
diff --git a/mm/slub.c b/mm/slub.c
index 205eaca18b7b..28721ddea448 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2360,6 +2360,9 @@ new_slab:
return freelist;
}
+static __always_inline void slab_free(struct kmem_cache *s,
+ struct page *page, void *x, unsigned long addr);
+
/*
* Inlined fastpath so that allocation functions (kmalloc, kmem_cache_alloc)
* have the fastpath folded into their functions. So no function call
@@ -2445,6 +2448,11 @@ redo:
slab_post_alloc_hook(s, gfpflags, object);
+ if (object && unlikely(memcg_kmem_recharge_slab(object, gfpflags))) {
+ slab_free(s, virt_to_head_page(object), object, _RET_IP_);
+ object = NULL;
+ }
+
return object;
}
--
1.7.10.4
Signed-off-by: Vladimir Davydov <[email protected]>
---
mm/slab.c | 30 ++++++++++++++----------------
1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/mm/slab.c b/mm/slab.c
index a9eb49f40c0a..178a3b733a50 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3521,6 +3521,18 @@ void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller)
}
EXPORT_SYMBOL(__kmalloc_track_caller);
+static __always_inline void slab_free(struct kmem_cache *cachep, void *objp)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ debug_check_no_locks_freed(objp, cachep->object_size);
+ if (!(cachep->flags & SLAB_DEBUG_OBJECTS))
+ debug_check_no_obj_freed(objp, cachep->object_size);
+ __cache_free(cachep, objp, _RET_IP_);
+ local_irq_restore(flags);
+}
+
/**
* kmem_cache_free - Deallocate an object
* @cachep: The cache the allocation was from.
@@ -3531,18 +3543,10 @@ EXPORT_SYMBOL(__kmalloc_track_caller);
*/
void kmem_cache_free(struct kmem_cache *cachep, void *objp)
{
- unsigned long flags;
cachep = cache_from_obj(cachep, objp);
if (!cachep)
return;
-
- local_irq_save(flags);
- debug_check_no_locks_freed(objp, cachep->object_size);
- if (!(cachep->flags & SLAB_DEBUG_OBJECTS))
- debug_check_no_obj_freed(objp, cachep->object_size);
- __cache_free(cachep, objp, _RET_IP_);
- local_irq_restore(flags);
-
+ slab_free(cachep, objp);
trace_kmem_cache_free(_RET_IP_, objp);
}
EXPORT_SYMBOL(kmem_cache_free);
@@ -3559,20 +3563,14 @@ EXPORT_SYMBOL(kmem_cache_free);
void kfree(const void *objp)
{
struct kmem_cache *c;
- unsigned long flags;
trace_kfree(_RET_IP_, objp);
if (unlikely(ZERO_OR_NULL_PTR(objp)))
return;
- local_irq_save(flags);
kfree_debugcheck(objp);
c = virt_to_cache(objp);
- debug_check_no_locks_freed(objp, c->object_size);
-
- debug_check_no_obj_freed(objp, c->object_size);
- __cache_free(c, (void *)objp, _RET_IP_);
- local_irq_restore(flags);
+ slab_free(c, (void *)objp);
}
EXPORT_SYMBOL(kfree);
--
1.7.10.4
This will allow new kmem active cgroups to reuse the id, and therefore
the caches, left by the dead memory cgroup.
Signed-off-by: Vladimir Davydov <[email protected]>
---
mm/memcontrol.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 923fe4c29e92..755604079d8e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -605,14 +605,10 @@ int memcg_limited_groups_array_size;
struct static_key memcg_kmem_enabled_key;
EXPORT_SYMBOL(memcg_kmem_enabled_key);
-static void memcg_free_cache_id(int id);
-
static void disarm_kmem_keys(struct mem_cgroup *memcg)
{
- if (memcg_kmem_is_active(memcg)) {
+ if (memcg_kmem_is_active(memcg))
static_key_slow_dec(&memcg_kmem_enabled_key);
- memcg_free_cache_id(memcg->kmemcg_id);
- }
/*
* This check can't live in kmem destruction function,
* since the charges will outlive the cgroup
@@ -4730,6 +4726,11 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
spin_unlock(&memcg->event_list_lock);
vmpressure_cleanup(&memcg->vmpressure);
+
+#ifdef CONFIG_MEMCG_KMEM
+ if (memcg_kmem_is_active(memcg))
+ memcg_free_cache_id(memcg_cache_id(memcg));
+#endif
}
static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
--
1.7.10.4
Currently, we try to destroy per memcg kmem caches on css offline. Since
a cache can contain active objects when the memory cgroup is removed, we
can't destroy all caches immediately and would therefore have to
introduce asynchronous destruction for this scheme to work properly.
However, that requires a lot of trickery and complex synchronization, so
I'm planning to go another way: reuse caches left from dead memory
cgroups instead of recreating them. This patch takes the first step in
this direction: it removes cache destruction from css offline.
Signed-off-by: Vladimir Davydov <[email protected]>
---
include/linux/slab.h | 4 ----
mm/memcontrol.c | 52 ++------------------------------------------------
2 files changed, 2 insertions(+), 54 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 8a2457d42fc8..390341d30b2d 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -491,9 +491,7 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
* Child caches will hold extra metadata needed for its operation. Fields are:
*
* @memcg: pointer to the memcg this cache belongs to
- * @list: list_head for the list of all caches in this memcg
* @root_cache: pointer to the global, root cache, this cache was derived from
- * @nr_pages: number of pages that belongs to this cache.
*/
struct memcg_cache_params {
bool is_root_cache;
@@ -504,9 +502,7 @@ struct memcg_cache_params {
};
struct {
struct mem_cgroup *memcg;
- struct list_head list;
struct kmem_cache *root_cache;
- atomic_t nr_pages;
};
};
};
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index af53fea9978d..370a27509e45 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -344,9 +344,6 @@ struct mem_cgroup {
struct cg_proto tcp_mem;
#endif
#if defined(CONFIG_MEMCG_KMEM)
- /* analogous to slab_common's slab_caches list, but per-memcg;
- * protected by memcg_slab_mutex */
- struct list_head memcg_slab_caches;
/* Index in the kmem_cache->memcg_params->memcg_caches array */
int kmemcg_id;
#endif
@@ -2489,23 +2486,10 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg,
#ifdef CONFIG_MEMCG_KMEM
/*
* The memcg_slab_mutex is held whenever a per memcg kmem cache is created or
- * destroyed. It protects memcg_caches arrays and memcg_slab_caches lists.
+ * destroyed. It protects memcg_caches arrays.
*/
static DEFINE_MUTEX(memcg_slab_mutex);
-/*
- * This is a bit cumbersome, but it is rarely used and avoids a backpointer
- * in the memcg_cache_params struct.
- */
-static struct kmem_cache *memcg_params_to_cache(struct memcg_cache_params *p)
-{
- struct kmem_cache *cachep;
-
- VM_BUG_ON(p->is_root_cache);
- cachep = p->root_cache;
- return cache_from_memcg_idx(cachep, memcg_cache_id(p->memcg));
-}
-
static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp,
unsigned long nr_pages)
{
@@ -2647,7 +2631,6 @@ static void memcg_register_cache(struct mem_cgroup *memcg,
return;
css_get(&memcg->css);
- list_add(&cachep->memcg_params->list, &memcg->memcg_slab_caches);
/*
* Since readers won't lock (see cache_from_memcg_idx()), we need a
@@ -2677,8 +2660,6 @@ static void memcg_unregister_cache(struct kmem_cache *cachep)
BUG_ON(root_cache->memcg_params->memcg_caches[id] != cachep);
root_cache->memcg_params->memcg_caches[id] = NULL;
- list_del(&cachep->memcg_params->list);
-
kmem_cache_destroy(cachep);
/* drop the reference taken in memcg_register_cache */
@@ -2736,24 +2717,6 @@ int __memcg_cleanup_cache_params(struct kmem_cache *s)
return failed;
}
-static void memcg_unregister_all_caches(struct mem_cgroup *memcg)
-{
- struct kmem_cache *cachep;
- struct memcg_cache_params *params, *tmp;
-
- if (!memcg_kmem_is_active(memcg))
- return;
-
- mutex_lock(&memcg_slab_mutex);
- list_for_each_entry_safe(params, tmp, &memcg->memcg_slab_caches, list) {
- cachep = memcg_params_to_cache(params);
- kmem_cache_shrink(cachep);
- if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
- memcg_unregister_cache(cachep);
- }
- mutex_unlock(&memcg_slab_mutex);
-}
-
struct memcg_register_cache_work {
struct mem_cgroup *memcg;
struct kmem_cache *cachep;
@@ -2818,12 +2781,8 @@ static void memcg_schedule_register_cache(struct mem_cgroup *memcg,
int __memcg_charge_slab(struct kmem_cache *cachep, gfp_t gfp, int order)
{
unsigned int nr_pages = 1 << order;
- int res;
- res = memcg_charge_kmem(cachep->memcg_params->memcg, gfp, nr_pages);
- if (!res)
- atomic_add(nr_pages, &cachep->memcg_params->nr_pages);
- return res;
+ return memcg_charge_kmem(cachep->memcg_params->memcg, gfp, nr_pages);
}
void __memcg_uncharge_slab(struct kmem_cache *cachep, int order)
@@ -2831,7 +2790,6 @@ void __memcg_uncharge_slab(struct kmem_cache *cachep, int order)
unsigned int nr_pages = 1 << order;
memcg_uncharge_kmem(cachep->memcg_params->memcg, nr_pages);
- atomic_sub(nr_pages, &cachep->memcg_params->nr_pages);
}
/*
@@ -2985,10 +2943,6 @@ void __memcg_kmem_uncharge_pages(struct page *page, int order)
memcg_uncharge_kmem(memcg, 1 << order);
page->mem_cgroup = NULL;
}
-#else
-static inline void memcg_unregister_all_caches(struct mem_cgroup *memcg)
-{
-}
#endif /* CONFIG_MEMCG_KMEM */
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -3571,7 +3525,6 @@ static int memcg_activate_kmem(struct mem_cgroup *memcg,
}
memcg->kmemcg_id = memcg_id;
- INIT_LIST_HEAD(&memcg->memcg_slab_caches);
/*
* We couldn't have accounted to this cgroup, because it hasn't got the
@@ -4885,7 +4838,6 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
}
spin_unlock(&memcg->event_list_lock);
- memcg_unregister_all_caches(memcg);
vmpressure_cleanup(&memcg->vmpressure);
}
--
1.7.10.4
Currently, new slabs are charged to the memory cgroup that owns the
cache (kmem_cache->memcg_params->memcg), but I'm going to decouple kmem
caches from memory cgroups, so charge them to the current cgroup instead.
Signed-off-by: Vladimir Davydov <[email protected]>
---
include/linux/memcontrol.h | 5 -----
mm/memcontrol.c | 14 --------------
mm/slab.c | 22 +++++++++++++++-------
mm/slab.h | 28 ----------------------------
mm/slub.c | 18 ++++++++----------
5 files changed, 23 insertions(+), 64 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e789551d4db0..31b495ff5f3a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -416,9 +416,6 @@ void memcg_update_array_size(int num_groups);
struct kmem_cache *
__memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
-int __memcg_charge_slab(struct kmem_cache *cachep, gfp_t gfp, int order);
-void __memcg_uncharge_slab(struct kmem_cache *cachep, int order);
-
int __memcg_cleanup_cache_params(struct kmem_cache *s);
/**
@@ -490,8 +487,6 @@ memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int order)
* memcg_kmem_get_cache: selects the correct per-memcg cache for allocation
* @cachep: the original global kmem cache
* @gfp: allocation flags.
- *
- * All memory allocated from a per-memcg cache is charged to the owner memcg.
*/
static __always_inline struct kmem_cache *
memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 370a27509e45..8c60d7a30f4f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2778,20 +2778,6 @@ static void memcg_schedule_register_cache(struct mem_cgroup *memcg,
memcg_resume_kmem_account();
}
-int __memcg_charge_slab(struct kmem_cache *cachep, gfp_t gfp, int order)
-{
- unsigned int nr_pages = 1 << order;
-
- return memcg_charge_kmem(cachep->memcg_params->memcg, gfp, nr_pages);
-}
-
-void __memcg_uncharge_slab(struct kmem_cache *cachep, int order)
-{
- unsigned int nr_pages = 1 << order;
-
- memcg_uncharge_kmem(cachep->memcg_params->memcg, nr_pages);
-}
-
/*
* Return the kmem_cache we're supposed to use for a slab allocation.
* We try to use the current memcg's version of the cache.
diff --git a/mm/slab.c b/mm/slab.c
index 458613d75533..a9eb49f40c0a 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1559,6 +1559,19 @@ slab_out_of_memory(struct kmem_cache *cachep, gfp_t gfpflags, int nodeid)
#endif
}
+static inline struct page *alloc_slab_page(gfp_t flags, int nodeid, int order)
+{
+ struct mem_cgroup *memcg = NULL;
+ struct page *page;
+
+ flags |= __GFP_NOTRACK;
+ if (!memcg_kmem_newpage_charge(flags, &memcg, order))
+ return NULL;
+ page = alloc_pages_exact_node(nodeid, flags, order);
+ memcg_kmem_commit_charge(page, memcg, order);
+ return page;
+}
+
/*
* Interface to system's page allocator. No need to hold the
* kmem_cache_node ->list_lock.
@@ -1577,12 +1590,8 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
flags |= __GFP_RECLAIMABLE;
- if (memcg_charge_slab(cachep, flags, cachep->gfporder))
- return NULL;
-
- page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder);
+ page = alloc_slab_page(flags, nodeid, cachep->gfporder);
if (!page) {
- memcg_uncharge_slab(cachep, cachep->gfporder);
slab_out_of_memory(cachep, flags, nodeid);
return NULL;
}
@@ -1638,8 +1647,7 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
if (current->reclaim_state)
current->reclaim_state->reclaimed_slab += nr_freed;
- __free_pages(page, cachep->gfporder);
- memcg_uncharge_slab(cachep, cachep->gfporder);
+ __free_kmem_pages(page, cachep->gfporder);
}
static void kmem_rcu_free(struct rcu_head *head)
diff --git a/mm/slab.h b/mm/slab.h
index 3347fd77f7be..1ba7ad07dce4 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -227,25 +227,6 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s)
return s;
return s->memcg_params->root_cache;
}
-
-static __always_inline int memcg_charge_slab(struct kmem_cache *s,
- gfp_t gfp, int order)
-{
- if (!memcg_kmem_enabled())
- return 0;
- if (is_root_cache(s))
- return 0;
- return __memcg_charge_slab(s, gfp, order);
-}
-
-static __always_inline void memcg_uncharge_slab(struct kmem_cache *s, int order)
-{
- if (!memcg_kmem_enabled())
- return;
- if (is_root_cache(s))
- return;
- __memcg_uncharge_slab(s, order);
-}
#else
static inline bool is_root_cache(struct kmem_cache *s)
{
@@ -273,15 +254,6 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s)
{
return s;
}
-
-static inline int memcg_charge_slab(struct kmem_cache *s, gfp_t gfp, int order)
-{
- return 0;
-}
-
-static inline void memcg_uncharge_slab(struct kmem_cache *s, int order)
-{
-}
#endif
static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
diff --git a/mm/slub.c b/mm/slub.c
index 80c170e92ffc..205eaca18b7b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1276,15 +1276,16 @@ static inline void slab_free_hook(struct kmem_cache *s, void *x)
/*
* Slab allocation and freeing
*/
-static inline struct page *alloc_slab_page(struct kmem_cache *s,
- gfp_t flags, int node, struct kmem_cache_order_objects oo)
+static inline struct page *alloc_slab_page(gfp_t flags, int node,
+ struct kmem_cache_order_objects oo)
{
+ struct mem_cgroup *memcg = NULL;
struct page *page;
int order = oo_order(oo);
flags |= __GFP_NOTRACK;
- if (memcg_charge_slab(s, flags, order))
+ if (!memcg_kmem_newpage_charge(flags, &memcg, order))
return NULL;
if (node == NUMA_NO_NODE)
@@ -1292,9 +1293,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
else
page = alloc_pages_exact_node(node, flags, order);
- if (!page)
- memcg_uncharge_slab(s, order);
-
+ memcg_kmem_commit_charge(page, memcg, order);
return page;
}
@@ -1317,7 +1316,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
*/
alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
- page = alloc_slab_page(s, alloc_gfp, node, oo);
+ page = alloc_slab_page(alloc_gfp, node, oo);
if (unlikely(!page)) {
oo = s->min;
alloc_gfp = flags;
@@ -1325,7 +1324,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
* Allocation may have failed due to fragmentation.
* Try a lower order alloc if possible
*/
- page = alloc_slab_page(s, alloc_gfp, node, oo);
+ page = alloc_slab_page(alloc_gfp, node, oo);
if (page)
stat(s, ORDER_FALLBACK);
@@ -1438,8 +1437,7 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
page_mapcount_reset(page);
if (current->reclaim_state)
current->reclaim_state->reclaimed_slab += pages;
- __free_pages(page, order);
- memcg_uncharge_slab(s, order);
+ __free_kmem_pages(page, order);
}
#define need_reserve_slab_rcu \
--
1.7.10.4
Some comments would be good for the commit.
Acked-by: Christoph Lameter <[email protected]>
On Mon, 3 Nov 2014, Vladimir Davydov wrote:
> +static __always_inline void slab_free(struct kmem_cache *cachep, void *objp);
> +
> static __always_inline void *
> slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
> unsigned long caller)
> @@ -3185,6 +3187,10 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
> kmemcheck_slab_alloc(cachep, flags, ptr, cachep->object_size);
> if (unlikely(flags & __GFP_ZERO))
> memset(ptr, 0, cachep->object_size);
> + if (unlikely(memcg_kmem_recharge_slab(ptr, flags))) {
> + slab_free(cachep, ptr);
> + ptr = NULL;
> + }
> }
>
> return ptr;
> @@ -3250,6 +3256,10 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
> kmemcheck_slab_alloc(cachep, flags, objp, cachep->object_size);
> if (unlikely(flags & __GFP_ZERO))
> memset(objp, 0, cachep->object_size);
> + if (unlikely(memcg_kmem_recharge_slab(objp, flags))) {
> + slab_free(cachep, objp);
> + objp = NULL;
> + }
> }
>
Please do not add code to the hotpaths if it's avoidable. Can you charge
the full slab only when allocated please?
Hi Christoph,
On Wed, Nov 05, 2014 at 12:43:31PM -0600, Christoph Lameter wrote:
> On Mon, 3 Nov 2014, Vladimir Davydov wrote:
>
> > +static __always_inline void slab_free(struct kmem_cache *cachep, void *objp);
> > +
> > static __always_inline void *
> > slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
> > unsigned long caller)
> > @@ -3185,6 +3187,10 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
> > kmemcheck_slab_alloc(cachep, flags, ptr, cachep->object_size);
> > if (unlikely(flags & __GFP_ZERO))
> > memset(ptr, 0, cachep->object_size);
> > + if (unlikely(memcg_kmem_recharge_slab(ptr, flags))) {
> > + slab_free(cachep, ptr);
> > + ptr = NULL;
> > + }
> > }
> >
> > return ptr;
> > @@ -3250,6 +3256,10 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
> > kmemcheck_slab_alloc(cachep, flags, objp, cachep->object_size);
> > if (unlikely(flags & __GFP_ZERO))
> > memset(objp, 0, cachep->object_size);
> > + if (unlikely(memcg_kmem_recharge_slab(objp, flags))) {
> > + slab_free(cachep, objp);
> > + objp = NULL;
> > + }
> > }
> >
>
> Please do not add code to the hotpaths if it's avoidable. Can you charge
> the full slab only when allocated please?
I call memcg_kmem_recharge_slab only on the alloc path. The free path
isn't touched. The overhead added is one function call. The function only
reads and compares two pointers under RCU most of the time. This is
comparable to the overhead introduced by memcg_kmem_get_cache, which is
called in slab_alloc/slab_alloc_node earlier.
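To be concrete, the fast path is essentially the following (a simplified
excerpt of __memcg_kmem_recharge_slab from patch 8; error handling and
the charge/cmpxchg slow path are omitted):

	page = virt_to_head_page(obj);
	page_memcg = ACCESS_ONCE(page->mem_cgroup);
	rcu_read_lock();
	memcg = mem_cgroup_from_task(rcu_dereference(current->mm->owner));
	if (!memcg_kmem_is_active(memcg))
		memcg = NULL;
	if (likely(memcg == page_memcg)) {
		/* the slab is already charged to the allocating cgroup */
		rcu_read_unlock();
		return 0;
	}
	/* otherwise fall through to the charge + cmpxchg slow path */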
Anyways, if you think this is unacceptable, I don't mind dropping the
whole patch set and thinking more about how to fix this per-memcg cache
trickery. What do you think?
Thanks,
Vladimir
On Wed, Nov 05, 2014 at 12:42:05PM -0600, Christoph Lameter wrote:
> Some comments would be good for the commit.
If it isn't too late, here it goes:
We have code duplication in kmem_cache_free/kfree. Let's move it to a
separate function.
>
> Acked-by: Christoph Lameter <[email protected]>
Thank you.
On Thu, 6 Nov 2014, Vladimir Davydov wrote:
> I call memcg_kmem_recharge_slab only on the alloc path. The free path
> isn't touched. The overhead added is one function call. The function only
> reads and compares two pointers under RCU most of the time. This is
> comparable to the overhead introduced by memcg_kmem_get_cache, which is
> called in slab_alloc/slab_alloc_node earlier.
Right, maybe remove those too? Things seem to be accumulating in the hot
path, which is bad. There is a slow path where these things can be added,
and also a page-based, even slower path for statistics keeping.
The approach in SLUB is to do accounting on a slab page basis. Also,
memory policies are applied at page granularity, not object granularity.
> Anyways, if you think this is unacceptable, I don't mind dropping the
> whole patch set and thinking more about how to fix this per-memcg cache
> trickery. What do you think?
Maybe it's possible to just use slab page accounting instead of object
accounting? Reduces overhead significantly. There may be some fuzz here
with an occasional object accounted the wrong way (which is similar to how
memory policies and other methods work), but it has been done before and
works ok.
On Thu, Nov 06, 2014 at 09:01:52AM -0600, Christoph Lameter wrote:
> On Thu, 6 Nov 2014, Vladimir Davydov wrote:
>
> > I call memcg_kmem_recharge_slab only on the alloc path. The free path
> > isn't touched. The overhead added is one function call. The function only
> > reads and compares two pointers under RCU most of the time. This is
> > comparable to the overhead introduced by memcg_kmem_get_cache, which is
> > called in slab_alloc/slab_alloc_node earlier.
>
> Right, maybe remove those too? Things seem to be accumulating in the hot
> path, which is bad. There is a slow path where these things can be added,
> and also a page-based, even slower path for statistics keeping.
>
> The approach in SLUB is to do accounting on a slab page basis. Also,
> memory policies are applied at page granularity, not object granularity.
>
> > Anyways, if you think this is unacceptable, I don't mind dropping the
> > whole patch set and thinking more about how to fix this per-memcg cache
> > trickery. What do you think?
>
> Maybe it's possible to just use slab page accounting instead of object
> accounting? Reduces overhead significantly. There may be some fuzz here
> with an occasional object accounted the wrong way (which is similar to how
> memory policies and other methods work), but it has been done before and
> works ok.
Actually, it's not about mis-accounting. The problem is that a newly
allocated object can pin a charge of a dead cgroup that used the cache
before. Maybe it wouldn't be a problem, though.
Anyways, I think I need more time to brood over the whole approach, so
I've asked Andrew to drop the patch set.
Thank you for the feedback!