Each slab kmem cache has per cpu array caches. The array caches are
created when the kmem_cache is created, either via kmem_cache_create()
or lazily when the first object is allocated in context of a kmem
enabled memcg. Array caches are replaced by writing to /proc/slabinfo.

Array caches are protected by holding slab_mutex or disabling
interrupts. Array cache allocation and replacement is done by
__do_tune_cpucache() which holds slab_mutex and calls
kick_all_cpus_sync() to interrupt all remote processors which confirms
there are no references to the old array caches.

IPIs are needed when replacing array caches. But when creating a new
array cache, there's no need to send IPIs because there cannot be any
references to the new cache. Outside of memcg kmem accounting these
IPIs occur at boot time, so they're not a problem. But with memcg kmem
accounting each container can create kmem caches, so the IPIs are
wasteful.

Avoid unnecessary IPIs when creating array caches.
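
The create-versus-replace distinction can be modelled in a few lines of
userspace Python. This is only a toy sketch, not kernel code: ToyCache,
tune() and the syncs counter are hypothetical names, and incrementing
syncs stands in for a kick_all_cpus_sync() call.

```python
class ToyCache:
    """Userspace stand-in for a kmem_cache; hypothetical, not kernel code."""

    def __init__(self):
        self.cpu_cache = None  # no array caches exist before the first tune
        self.syncs = 0         # counts simulated kick_all_cpus_sync() calls

    def tune(self, new_cpu_cache):
        prev = self.cpu_cache
        self.cpu_cache = new_cpu_cache
        # Only a replaced array cache could still be referenced by a
        # remote CPU, so only that case needs a synchronizing IPI.
        if prev is not None:
            self.syncs += 1
        return prev


cache = ToyCache()
cache.tune(["cpu0", "cpu1"])   # creation: nothing to synchronize against
created = cache.syncs
cache.tune(["cpu0", "cpu1"])   # replacement: the old caches may be in use
replaced = cache.syncs
print(created, replaced)
```

The first tune() models cache creation and records no sync; the second
models a replacement, such as a write to /proc/slabinfo, and records one.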

Test which reports the IPI count of allocating slab in 10000 memcg:

    import os

    def ipi_count():
        with open("/proc/interrupts") as f:
            for l in f:
                if 'Function call interrupts' in l:
                    return int(l.split()[1])

    def echo(val, path):
        with open(path, "w") as f:
            f.write(val)

    n = 10000
    os.chdir("/mnt/cgroup/memory")
    pid = str(os.getpid())
    a = ipi_count()
    for i in range(n):
        os.mkdir(str(i))
        echo("1G\n", "%d/memory.limit_in_bytes" % i)
        echo("1G\n", "%d/memory.kmem.limit_in_bytes" % i)
        echo(pid, "%d/cgroup.procs" % i)
        open("/tmp/x", "w").close()
        os.unlink("/tmp/x")

    b = ipi_count()
    print("%d loops: %d => %d (+%d ipis)" % (n, a, b, b-a))
    echo(pid, "cgroup.procs")
    for i in range(n):
        os.rmdir(str(i))

patched:   10000 loops: 1069 => 1170 (+101 ipis)
unpatched: 10000 loops: 1192 => 48933 (+47741 ipis)
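
Note that ipi_count() in the test reads only the first count on the
"Function call interrupts" line, i.e. the first CPU column. A variant
that sums every per-CPU column might look like the sketch below. The
function name total_call_ipis() and the sample text are made up for
illustration, and the sample numbers are not measurements.

```python
def total_call_ipis(interrupts_text):
    """Sum the per-CPU counts on the 'Function call interrupts' line."""
    for line in interrupts_text.splitlines():
        if "Function call interrupts" in line:
            total = 0
            # The first field is the row label (e.g. "CAL:"); the numeric
            # fields after it are per-CPU counts, followed by the description.
            for field in line.split()[1:]:
                if not field.isdigit():
                    break
                total += int(field)
            return total
    return None  # no matching line found


# Illustrative two-CPU sample, not real data.
sample = (
    "            CPU0       CPU1\n"
    " CAL:       1000        200   Function call interrupts\n"
)
print(total_call_ipis(sample))
```

On a live system the input would be the contents of /proc/interrupts.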
Signed-off-by: Greg Thelen <[email protected]>
---
 mm/slab.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index 807d86c76908..1880d482a0cb 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3879,7 +3879,12 @@ static int __do_tune_cpucache(struct kmem_cache *cachep, int limit,
 
 	prev = cachep->cpu_cache;
 	cachep->cpu_cache = cpu_cache;
-	kick_all_cpus_sync();
+	/*
+	 * Without a previous cpu_cache there's no need to synchronize remote
+	 * cpus, so skip the IPIs.
+	 */
+	if (prev)
+		kick_all_cpus_sync();
 
 	check_irq_on();
 	cachep->batchcount = batchcount;
--
2.12.2.762.g0e3151a226-goog
On Sun, Apr 16, 2017 at 02:45:44PM -0700, Greg Thelen wrote:
> Signed-off-by: Greg Thelen <[email protected]>
Acked-by: Joonsoo Kim <[email protected]>
On Sun, 16 Apr 2017, Greg Thelen wrote:
> Signed-off-by: Greg Thelen <[email protected]>
Acked-by: David Rientjes <[email protected]>