2018-06-18 09:20:01

by Vlastimil Babka

Subject: [PATCH v2 0/7] kmalloc-reclaimable caches

v2 changes:
- shorten cache names to kmalloc-rcl-<SIZE>
- last patch shortens <SIZE> for all kmalloc caches to e.g. "1k", "4M"
- include dma caches in the 2D kmalloc_caches[] array to avoid a branch
- vmstat counter nr_indirectly_reclaimable_bytes renamed to
nr_kernel_misc_reclaimable, doesn't include kmalloc-rcl-*
- /proc/meminfo counter renamed to KReclaimable, includes kmalloc-rcl-*
and nr_kernel_misc_reclaimable

Hi,

as discussed at LSF/MM [1] here's a patchset that introduces
kmalloc-reclaimable caches (more details in the second patch) and uses them for
SLAB freelists and dcache external names. The latter allows us to repurpose the
NR_INDIRECTLY_RECLAIMABLE_BYTES counter later in the series.

This is what /proc/slabinfo looks like after booting in virtme:

...
kmalloc-rcl-4M 0 0 4194304 1 1024 : tunables 1 1 0 : slabdata 0 0 0
...
kmalloc-rcl-96 7 32 128 32 1 : tunables 120 60 8 : slabdata 1 1 0
kmalloc-rcl-64 25 128 64 64 1 : tunables 120 60 8 : slabdata 2 2 0
kmalloc-rcl-32 0 0 32 124 1 : tunables 120 60 8 : slabdata 0 0 0
kmalloc-4M 0 0 4194304 1 1024 : tunables 1 1 0 : slabdata 0 0 0
kmalloc-2M 0 0 2097152 1 512 : tunables 1 1 0 : slabdata 0 0 0
kmalloc-1M 0 0 1048576 1 256 : tunables 1 1 0 : slabdata 0 0 0
...

/proc/vmstat with renamed nr_indirectly_reclaimable_bytes counter:

...
nr_slab_reclaimable 2817
nr_slab_unreclaimable 1781
...
nr_kernel_misc_reclaimable 0
...

/proc/meminfo with new KReclaimable counter:

...
Shmem: 564 kB
KReclaimable: 11260 kB
Slab: 18368 kB
SReclaimable: 11260 kB
SUnreclaim: 7108 kB
KernelStack: 1248 kB
...
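
As a rough illustration of how the KReclaimable line above is composed, here
is a minimal userspace sketch, not the kernel code. It assumes 4 KiB pages
(SKETCH_PAGE_SHIFT is a made-up name; the real PAGE_SHIFT is
architecture-specific) and the helper names are hypothetical. The counters are
kept in pages and converted to kB only for display.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SHIFT depends on the arch. */
#define SKETCH_PAGE_SHIFT 12

/* Convert a page count to kB, in the spirit of the K() macro in node.c. */
static unsigned long sketch_pages_to_kb(unsigned long pages)
{
	return pages << (SKETCH_PAGE_SHIFT - 10);
}

/*
 * KReclaimable is the sum of reclaimable slab pages and the non-slab
 * "misc reclaimable" pages, reported in kB.
 */
static unsigned long sketch_kreclaimable_kb(unsigned long slab_reclaimable,
					    unsigned long misc_reclaimable)
{
	return sketch_pages_to_kb(slab_reclaimable + misc_reclaimable);
}
```

With no misc reclaimable pages, KReclaimable equals SReclaimable, which is
what the sample shows. Under the 4 KiB assumption, 11260 kB corresponds to
2815 pages; that number is derived here for illustration and was not read
from vmstat.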

Thanks,
Vlastimil

Vlastimil Babka (7):
mm, slab: combine kmalloc_caches and kmalloc_dma_caches
mm, slab/slub: introduce kmalloc-reclaimable caches
mm, slab: allocate off-slab freelists as reclaimable when appropriate
dcache: allocate external names from reclaimable kmalloc caches
mm: rename and change semantics of nr_indirectly_reclaimable_bytes
mm, proc: add KReclaimable to /proc/meminfo
mm, slab: shorten kmalloc cache names for large sizes

Documentation/filesystems/proc.txt | 4 +
drivers/base/node.c | 19 ++--
drivers/staging/android/ion/ion_page_pool.c | 4 +-
fs/dcache.c | 38 ++------
fs/proc/meminfo.c | 16 +--
include/linux/mmzone.h | 2 +-
include/linux/slab.h | 49 +++++++---
mm/page_alloc.c | 19 ++--
mm/slab.c | 11 ++-
mm/slab_common.c | 102 ++++++++++++--------
mm/slub.c | 13 +--
mm/util.c | 3 +-
mm/vmstat.c | 6 +-
13 files changed, 159 insertions(+), 127 deletions(-)

--
2.17.1



2018-06-18 09:19:40

by Vlastimil Babka

Subject: [PATCH v2 4/7] dcache: allocate external names from reclaimable kmalloc caches

We can use the newly introduced kmalloc-rcl-X caches to allocate external
names in dcache, which takes care of the proper accounting automatically and
also improves anti-fragmentation page grouping.

This effectively reverts commit f1782c9bc547 ("dcache: account external names
as indirectly reclaimable memory") and instead passes __GFP_RECLAIMABLE to
kmalloc(). The accounting thus moves from NR_INDIRECTLY_RECLAIMABLE_BYTES to
NR_SLAB_RECLAIMABLE, which is also considered in MemAvailable calculation and
overcommit decisions.

Signed-off-by: Vlastimil Babka <[email protected]>
---
fs/dcache.c | 38 +++++++++-----------------------------
1 file changed, 9 insertions(+), 29 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 0e8e5de3c48a..518c9ed8db8c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -257,24 +257,10 @@ static void __d_free(struct rcu_head *head)
kmem_cache_free(dentry_cache, dentry);
}

-static void __d_free_external_name(struct rcu_head *head)
-{
- struct external_name *name = container_of(head, struct external_name,
- u.head);
-
- mod_node_page_state(page_pgdat(virt_to_page(name)),
- NR_INDIRECTLY_RECLAIMABLE_BYTES,
- -ksize(name));
-
- kfree(name);
-}
-
static void __d_free_external(struct rcu_head *head)
{
struct dentry *dentry = container_of(head, struct dentry, d_u.d_rcu);
-
- __d_free_external_name(&external_name(dentry)->u.head);
-
+ kfree(external_name(dentry));
kmem_cache_free(dentry_cache, dentry);
}

@@ -305,7 +291,7 @@ void release_dentry_name_snapshot(struct name_snapshot *name)
struct external_name *p;
p = container_of(name->name, struct external_name, name[0]);
if (unlikely(atomic_dec_and_test(&p->u.count)))
- call_rcu(&p->u.head, __d_free_external_name);
+ kfree_rcu(p, u.head);
}
}
EXPORT_SYMBOL(release_dentry_name_snapshot);
@@ -1608,7 +1594,6 @@ EXPORT_SYMBOL(d_invalidate);

struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
{
- struct external_name *ext = NULL;
struct dentry *dentry;
char *dname;
int err;
@@ -1629,14 +1614,15 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
dname = dentry->d_iname;
} else if (name->len > DNAME_INLINE_LEN-1) {
size_t size = offsetof(struct external_name, name[1]);
-
- ext = kmalloc(size + name->len, GFP_KERNEL_ACCOUNT);
- if (!ext) {
+ struct external_name *p = kmalloc(size + name->len,
+ GFP_KERNEL_ACCOUNT |
+ __GFP_RECLAIMABLE);
+ if (!p) {
kmem_cache_free(dentry_cache, dentry);
return NULL;
}
- atomic_set(&ext->u.count, 1);
- dname = ext->name;
+ atomic_set(&p->u.count, 1);
+ dname = p->name;
} else {
dname = dentry->d_iname;
}
@@ -1675,12 +1661,6 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
}
}

- if (unlikely(ext)) {
- pg_data_t *pgdat = page_pgdat(virt_to_page(ext));
- mod_node_page_state(pgdat, NR_INDIRECTLY_RECLAIMABLE_BYTES,
- ksize(ext));
- }
-
this_cpu_inc(nr_dentry);

return dentry;
@@ -2761,7 +2741,7 @@ static void copy_name(struct dentry *dentry, struct dentry *target)
dentry->d_name.hash_len = target->d_name.hash_len;
}
if (old_name && likely(atomic_dec_and_test(&old_name->u.count)))
- call_rcu(&old_name->u.head, __d_free_external_name);
+ kfree_rcu(old_name, u.head);
}

/*
--
2.17.1


2018-06-18 09:20:13

by Vlastimil Babka

Subject: [PATCH v2 5/7] mm: rename and change semantics of nr_indirectly_reclaimable_bytes

The vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES was introduced by commit
eb59254608bc ("mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTES") with the goal of
accounting objects that can be reclaimed, but cannot be allocated via a
SLAB_RECLAIM_ACCOUNT cache. This is now possible via kmalloc() with the
__GFP_RECLAIMABLE flag, and the dcache external names user has been converted.

The counter is however still useful for accounting direct page allocations
(i.e. not slab) with a shrinker, such as the ION page pool. So keep it, and:

- change granularity to pages to be more like other counters; sub-page
allocations should be able to use kmalloc
- rename the counter to NR_KERNEL_MISC_RECLAIMABLE
- expose the counter again in vmstat as "nr_kernel_misc_reclaimable"; we can
again remove the check for not printing "hidden" counters
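
The unit change and the resulting MemAvailable arithmetic can be sketched in
userspace as follows. This is only an illustration of the calculation, not the
kernel implementation: it assumes 4 KiB pages, and all sketch_* names are
hypothetical.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SHIFT depends on the arch. */
#define SKETCH_PAGE_SHIFT 12

/* Old unit: bytes added to the counter for one pool page of this order. */
static unsigned long sketch_pool_bytes(unsigned int order)
{
	return 1UL << (SKETCH_PAGE_SHIFT + order);
}

/* New unit: base pages added to the counter for the same pool page. */
static unsigned long sketch_pool_pages(unsigned int order)
{
	return 1UL << order;
}

static unsigned long sketch_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Contribution of reclaimable kernel memory to the MemAvailable estimate:
 * both counters are summed, then the part assumed to be in use (capped at
 * the low watermark) is subtracted. All values are in pages.
 */
static unsigned long sketch_reclaimable_estimate(unsigned long slab,
						 unsigned long misc,
						 unsigned long wmark_low)
{
	unsigned long reclaimable = slab + misc;

	return reclaimable - sketch_min(reclaimable / 2, wmark_low);
}
```

Because both counters are now in pages, the sum needs no shift by PAGE_SHIFT,
and the low-watermark cap applies to the combined value rather than to slab
alone.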

Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Vijayanand Jitta <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Sumit Semwal <[email protected]>
---
drivers/staging/android/ion/ion_page_pool.c | 4 ++--
include/linux/mmzone.h | 2 +-
mm/page_alloc.c | 19 +++++++------------
mm/util.c | 3 +--
mm/vmstat.c | 6 +-----
5 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/drivers/staging/android/ion/ion_page_pool.c b/drivers/staging/android/ion/ion_page_pool.c
index 9bc56eb48d2a..b7ad2d2449ac 100644
--- a/drivers/staging/android/ion/ion_page_pool.c
+++ b/drivers/staging/android/ion/ion_page_pool.c
@@ -33,8 +33,8 @@ static void ion_page_pool_add(struct ion_page_pool *pool, struct page *page)
pool->low_count++;
}

- mod_node_page_state(page_pgdat(page), NR_INDIRECTLY_RECLAIMABLE_BYTES,
- (1 << (PAGE_SHIFT + pool->order)));
+ mod_node_page_state(page_pgdat(page), NR_KERNEL_MISC_RECLAIMABLE,
+ 1 << pool->order);
mutex_unlock(&pool->mutex);
}

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2dc52a..c2f6bc4c9e8a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -180,7 +180,7 @@ enum node_stat_item {
NR_VMSCAN_IMMEDIATE, /* Prioritise for reclaim when writeback ends */
NR_DIRTIED, /* page dirtyings since bootup */
NR_WRITTEN, /* page writings since bootup */
- NR_INDIRECTLY_RECLAIMABLE_BYTES, /* measured in bytes */
+ NR_KERNEL_MISC_RECLAIMABLE, /* reclaimable non-slab kernel pages */
NR_VM_NODE_STAT_ITEMS
};

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..8ceb45e11b97 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4704,6 +4704,7 @@ long si_mem_available(void)
unsigned long pagecache;
unsigned long wmark_low = 0;
unsigned long pages[NR_LRU_LISTS];
+ unsigned long reclaimable;
struct zone *zone;
int lru;

@@ -4729,19 +4730,13 @@ long si_mem_available(void)
available += pagecache;

/*
- * Part of the reclaimable slab consists of items that are in use,
- * and cannot be freed. Cap this estimate at the low watermark.
+ * Part of the reclaimable slab and other kernel memory consists of
+ * items that are in use, and cannot be freed. Cap this estimate at the
+ * low watermark.
*/
- available += global_node_page_state(NR_SLAB_RECLAIMABLE) -
- min(global_node_page_state(NR_SLAB_RECLAIMABLE) / 2,
- wmark_low);
-
- /*
- * Part of the kernel memory, which can be released under memory
- * pressure.
- */
- available += global_node_page_state(NR_INDIRECTLY_RECLAIMABLE_BYTES) >>
- PAGE_SHIFT;
+ reclaimable = global_node_page_state(NR_SLAB_RECLAIMABLE) +
+ global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE);
+ available += reclaimable - min(reclaimable / 2, wmark_low);

if (available < 0)
available = 0;
diff --git a/mm/util.c b/mm/util.c
index 3351659200e6..891f0654e7b5 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -675,8 +675,7 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
* Part of the kernel memory, which can be released
* under memory pressure.
*/
- free += global_node_page_state(
- NR_INDIRECTLY_RECLAIMABLE_BYTES) >> PAGE_SHIFT;
+ free += global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE);

/*
* Leave reserved pages. The pages are not for anonymous pages.
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 75eda9c2b260..7c677d3a61ec 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1161,7 +1161,7 @@ const char * const vmstat_text[] = {
"nr_vmscan_immediate_reclaim",
"nr_dirtied",
"nr_written",
- "", /* nr_indirectly_reclaimable */
+ "nr_kernel_misc_reclaimable",

/* enum writeback_stat_item counters */
"nr_dirty_threshold",
@@ -1704,10 +1704,6 @@ static int vmstat_show(struct seq_file *m, void *arg)
unsigned long *l = arg;
unsigned long off = l - (unsigned long *)m->private;

- /* Skip hidden vmstat items. */
- if (*vmstat_text[off] == '\0')
- return 0;
-
seq_puts(m, vmstat_text[off]);
seq_put_decimal_ull(m, " ", *l);
seq_putc(m, '\n');
--
2.17.1


2018-06-18 09:20:19

by Vlastimil Babka

Subject: [PATCH v2 2/7] mm, slab/slub: introduce kmalloc-reclaimable caches

Kmem caches can be created with a SLAB_RECLAIM_ACCOUNT flag, which indicates
they contain objects which can be reclaimed under memory pressure (typically
through a shrinker). This makes the slab pages accounted as NR_SLAB_RECLAIMABLE
in vmstat, which is also reflected in the MemAvailable meminfo counter and in
overcommit decisions. The slab pages are also allocated with __GFP_RECLAIMABLE,
which is good for anti-fragmentation through grouping pages by mobility.

The generic kmalloc-X caches are created without this flag, but are sometimes
also used for objects that can be reclaimed and which, due to their varying
size, cannot have a dedicated kmem cache with the SLAB_RECLAIM_ACCOUNT flag. A
prominent example is dcache external names, which prompted the creation of a
new, manually
managed vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES in commit f1782c9bc547
("dcache: account external names as indirectly reclaimable memory").

To better handle this and any other similar cases, this patch introduces
SLAB_RECLAIM_ACCOUNT variants of kmalloc caches, named kmalloc-rcl-X.
They are used whenever the kmalloc() call passes __GFP_RECLAIMABLE among gfp
flags. They are added to the kmalloc_caches array as a new type. Allocations
with both __GFP_DMA and __GFP_RECLAIMABLE will use a dma type cache.

This change only applies to SLAB and SLUB, not SLOB. This is fine, since SLOB
targets tiny systems and this patch does add some overhead in the form of kmem
management objects.
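
The cache type selection described above can be tried out in userspace with
the sketch below. It mirrors the branch-free arithmetic for the
CONFIG_ZONE_DMA case only. The flag bit values and all SKETCH_* / sketch_*
names are placeholders chosen for this example and are not the kernel's real
gfp bit values.

```c
#include <assert.h>

/* Placeholder flag bits; the real __GFP_* values live in linux/gfp.h. */
#define SKETCH_GFP_DMA		0x01u
#define SKETCH_GFP_RECLAIMABLE	0x10u

/* Same numbering as KMALLOC_NORMAL/RECLAIM/DMA with CONFIG_ZONE_DMA set. */
enum sketch_kmalloc_type {
	SKETCH_KMALLOC_NORMAL = 0,
	SKETCH_KMALLOC_RECLAIM = 1,
	SKETCH_KMALLOC_DMA = 2,
};

/*
 * Pick the outer index into the two-dimensional cache array without
 * branching. DMA wins when both flags are set, so the reclaimable bit
 * is masked out by !is_dma.
 */
static unsigned int sketch_kmalloc_type(unsigned int flags)
{
	unsigned int is_dma = !!(flags & SKETCH_GFP_DMA);
	unsigned int is_reclaimable = !!(flags & SKETCH_GFP_RECLAIMABLE);

	return (is_dma * 2) + (is_reclaimable & !is_dma);
}
```

The four flag combinations map to NORMAL, RECLAIM, DMA and DMA again, which
matches the statement that an allocation with both flags uses a dma type
cache.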

Signed-off-by: Vlastimil Babka <[email protected]>
---
include/linux/slab.h | 16 +++++++++++----
mm/slab_common.c | 48 ++++++++++++++++++++++++++++----------------
2 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 4299c59353a1..d89e934e0d8b 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -296,11 +296,12 @@ static inline void __check_heap_object(const void *ptr, unsigned long n,
(KMALLOC_MIN_SIZE) : 16)

#define KMALLOC_NORMAL 0
+#define KMALLOC_RECLAIM 1
#ifdef CONFIG_ZONE_DMA
-#define KMALLOC_DMA 1
-#define KMALLOC_TYPES 2
+#define KMALLOC_DMA 2
+#define KMALLOC_TYPES 3
#else
-#define KMALLOC_TYPES 1
+#define KMALLOC_TYPES 2
#endif

#ifndef CONFIG_SLOB
@@ -309,12 +310,19 @@ extern struct kmem_cache *kmalloc_caches[KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1];
static __always_inline unsigned int kmalloc_type(gfp_t flags)
{
int is_dma = 0;
+ int is_reclaimable;

#ifdef CONFIG_ZONE_DMA
is_dma = !!(flags & __GFP_DMA);
#endif

- return is_dma;
+ is_reclaimable = !!(flags & __GFP_RECLAIMABLE);
+
+ /*
+ * If an allocation is both __GFP_DMA and __GFP_RECLAIMABLE, return
+ * KMALLOC_DMA and effectively ignore __GFP_RECLAIMABLE
+ */
+ return (is_dma * 2) + (is_reclaimable & !is_dma);
}

/*
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 635f2d8d0198..8a30d6979936 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1103,10 +1103,21 @@ void __init setup_kmalloc_cache_index_table(void)
}
}

-static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
+static void __init
+new_kmalloc_cache(int idx, int type, slab_flags_t flags)
{
- kmalloc_caches[KMALLOC_NORMAL][idx] = create_kmalloc_cache(
- kmalloc_info[idx].name,
+ const char *name;
+
+ if (type == KMALLOC_RECLAIM) {
+ flags |= SLAB_RECLAIM_ACCOUNT;
+ name = kasprintf(GFP_NOWAIT, "kmalloc-rcl-%u",
+ kmalloc_info[idx].size);
+ BUG_ON(!name);
+ } else {
+ name = kmalloc_info[idx].name;
+ }
+
+ kmalloc_caches[type][idx] = create_kmalloc_cache(name,
kmalloc_info[idx].size, flags, 0,
kmalloc_info[idx].size);
}
@@ -1118,22 +1129,25 @@ static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
*/
void __init create_kmalloc_caches(slab_flags_t flags)
{
- int i;
- int type = KMALLOC_NORMAL;
+ int i, type;

- for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
- if (!kmalloc_caches[type][i])
- new_kmalloc_cache(i, flags);
+ for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) {
+ for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
+ if (!kmalloc_caches[type][i])
+ new_kmalloc_cache(i, type, flags);

- /*
- * Caches that are not of the two-to-the-power-of size.
- * These have to be created immediately after the
- * earlier power of two caches
- */
- if (KMALLOC_MIN_SIZE <= 32 && !kmalloc_caches[type][1] && i == 6)
- new_kmalloc_cache(1, flags);
- if (KMALLOC_MIN_SIZE <= 64 && !kmalloc_caches[type][2] && i == 7)
- new_kmalloc_cache(2, flags);
+ /*
+ * Caches that are not of the two-to-the-power-of size.
+ * These have to be created immediately after the
+ * earlier power of two caches
+ */
+ if (KMALLOC_MIN_SIZE <= 32 && i == 6 &&
+ !kmalloc_caches[type][1])
+ new_kmalloc_cache(1, type, flags);
+ if (KMALLOC_MIN_SIZE <= 64 && i == 7 &&
+ !kmalloc_caches[type][2])
+ new_kmalloc_cache(2, type, flags);
+ }
}

/* Kmalloc array is now usable */
--
2.17.1


2018-06-18 09:21:08

by Vlastimil Babka

Subject: [PATCH v2 3/7] mm, slab: allocate off-slab freelists as reclaimable when appropriate

In SLAB, OFF_SLAB caches allocate management structures (currently just the
freelist) from kmalloc caches when placement in a slab page together with
objects would lead to suboptimal memory usage. For SLAB_RECLAIM_ACCOUNT caches,
we can allocate the freelists from the newly introduced reclaimable kmalloc
caches, because shrinking the OFF_SLAB cache will in general result in freeing
the freelists as well. This should improve accounting and anti-fragmentation
a bit.

Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slab.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index 9515798f37b2..99d779ba2b92 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2140,8 +2140,13 @@ int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags)
#endif

if (OFF_SLAB(cachep)) {
+ /*
+ * If this cache is reclaimable, allocate also freelists from
+ * a reclaimable kmalloc cache.
+ */
cachep->freelist_cache =
- kmalloc_slab(cachep->freelist_size, 0u);
+ kmalloc_slab(cachep->freelist_size,
+ cachep->allocflags & __GFP_RECLAIMABLE);
}

err = setup_cpu_cache(cachep, gfp);
--
2.17.1


2018-06-18 09:21:15

by Vlastimil Babka

Subject: [PATCH v2 1/7] mm, slab: combine kmalloc_caches and kmalloc_dma_caches

The kmalloc caches currently maintain a separate (optional) array
kmalloc_dma_caches for __GFP_DMA allocations. There are tests for __GFP_DMA in
the allocation hotpaths. We can avoid the branches by combining kmalloc_caches
and kmalloc_dma_caches into a single two-dimensional array where the outer
dimension is cache "type". This will also allow adding kmalloc-reclaimable
caches as a third type.

Signed-off-by: Vlastimil Babka <[email protected]>
---
include/linux/slab.h | 41 ++++++++++++++++++++++++++++++-----------
mm/slab.c | 4 ++--
mm/slab_common.c | 30 +++++++++++-------------------
mm/slub.c | 13 +++++++------
4 files changed, 50 insertions(+), 38 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 14e3fe4bd6a1..4299c59353a1 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -295,12 +295,28 @@ static inline void __check_heap_object(const void *ptr, unsigned long n,
#define SLAB_OBJ_MIN_SIZE (KMALLOC_MIN_SIZE < 16 ? \
(KMALLOC_MIN_SIZE) : 16)

+#define KMALLOC_NORMAL 0
+#ifdef CONFIG_ZONE_DMA
+#define KMALLOC_DMA 1
+#define KMALLOC_TYPES 2
+#else
+#define KMALLOC_TYPES 1
+#endif
+
#ifndef CONFIG_SLOB
-extern struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
+extern struct kmem_cache *kmalloc_caches[KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1];
+
+static __always_inline unsigned int kmalloc_type(gfp_t flags)
+{
+ int is_dma = 0;
+
#ifdef CONFIG_ZONE_DMA
-extern struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH + 1];
+ is_dma = !!(flags & __GFP_DMA);
#endif

+ return is_dma;
+}
+
/*
* Figure out which kmalloc slab an allocation of a certain size
* belongs to.
@@ -501,18 +517,20 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
static __always_inline void *kmalloc(size_t size, gfp_t flags)
{
if (__builtin_constant_p(size)) {
+#ifndef CONFIG_SLOB
+ unsigned int index;
+#endif
if (size > KMALLOC_MAX_CACHE_SIZE)
return kmalloc_large(size, flags);
#ifndef CONFIG_SLOB
- if (!(flags & GFP_DMA)) {
- unsigned int index = kmalloc_index(size);
+ index = kmalloc_index(size);

- if (!index)
- return ZERO_SIZE_PTR;
+ if (!index)
+ return ZERO_SIZE_PTR;

- return kmem_cache_alloc_trace(kmalloc_caches[index],
- flags, size);
- }
+ return kmem_cache_alloc_trace(
+ kmalloc_caches[kmalloc_type(flags)][index],
+ flags, size);
#endif
}
return __kmalloc(size, flags);
@@ -542,13 +560,14 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
{
#ifndef CONFIG_SLOB
if (__builtin_constant_p(size) &&
- size <= KMALLOC_MAX_CACHE_SIZE && !(flags & GFP_DMA)) {
+ size <= KMALLOC_MAX_CACHE_SIZE) {
unsigned int i = kmalloc_index(size);

if (!i)
return ZERO_SIZE_PTR;

- return kmem_cache_alloc_node_trace(kmalloc_caches[i],
+ return kmem_cache_alloc_node_trace(
+ kmalloc_caches[kmalloc_type(flags)][i],
flags, node, size);
}
#endif
diff --git a/mm/slab.c b/mm/slab.c
index aa76a70e087e..9515798f37b2 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1288,7 +1288,7 @@ void __init kmem_cache_init(void)
* Initialize the caches that provide memory for the kmem_cache_node
* structures first. Without this, further allocations will bug.
*/
- kmalloc_caches[INDEX_NODE] = create_kmalloc_cache(
+ kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache(
kmalloc_info[INDEX_NODE].name,
kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
0, kmalloc_size(INDEX_NODE));
@@ -1304,7 +1304,7 @@ void __init kmem_cache_init(void)
for_each_online_node(nid) {
init_list(kmem_cache, &init_kmem_cache_node[CACHE_CACHE + nid], nid);

- init_list(kmalloc_caches[INDEX_NODE],
+ init_list(kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE],
&init_kmem_cache_node[SIZE_NODE + nid], nid);
}
}
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 890b1f04a03a..635f2d8d0198 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -969,14 +969,9 @@ struct kmem_cache *__init create_kmalloc_cache(const char *name,
return s;
}

-struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1] __ro_after_init;
+struct kmem_cache *kmalloc_caches[KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1] __ro_after_init;
EXPORT_SYMBOL(kmalloc_caches);

-#ifdef CONFIG_ZONE_DMA
-struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH + 1] __ro_after_init;
-EXPORT_SYMBOL(kmalloc_dma_caches);
-#endif
-
/*
* Conversion table for small slabs sizes / 8 to the index in the
* kmalloc array. This is necessary for slabs < 192 since we have non power
@@ -1036,12 +1031,7 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
} else
index = fls(size - 1);

-#ifdef CONFIG_ZONE_DMA
- if (unlikely((flags & GFP_DMA)))
- return kmalloc_dma_caches[index];
-
-#endif
- return kmalloc_caches[index];
+ return kmalloc_caches[kmalloc_type(flags)][index];
}

/*
@@ -1115,7 +1105,8 @@ void __init setup_kmalloc_cache_index_table(void)

static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
{
- kmalloc_caches[idx] = create_kmalloc_cache(kmalloc_info[idx].name,
+ kmalloc_caches[KMALLOC_NORMAL][idx] = create_kmalloc_cache(
+ kmalloc_info[idx].name,
kmalloc_info[idx].size, flags, 0,
kmalloc_info[idx].size);
}
@@ -1128,9 +1119,10 @@ static void __init new_kmalloc_cache(int idx, slab_flags_t flags)
void __init create_kmalloc_caches(slab_flags_t flags)
{
int i;
+ int type = KMALLOC_NORMAL;

for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
- if (!kmalloc_caches[i])
+ if (!kmalloc_caches[type][i])
new_kmalloc_cache(i, flags);

/*
@@ -1138,9 +1130,9 @@ void __init create_kmalloc_caches(slab_flags_t flags)
* These have to be created immediately after the
* earlier power of two caches
*/
- if (KMALLOC_MIN_SIZE <= 32 && !kmalloc_caches[1] && i == 6)
+ if (KMALLOC_MIN_SIZE <= 32 && !kmalloc_caches[type][1] && i == 6)
new_kmalloc_cache(1, flags);
- if (KMALLOC_MIN_SIZE <= 64 && !kmalloc_caches[2] && i == 7)
+ if (KMALLOC_MIN_SIZE <= 64 && !kmalloc_caches[type][2] && i == 7)
new_kmalloc_cache(2, flags);
}

@@ -1149,7 +1141,7 @@ void __init create_kmalloc_caches(slab_flags_t flags)

#ifdef CONFIG_ZONE_DMA
for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
- struct kmem_cache *s = kmalloc_caches[i];
+ struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i];

if (s) {
unsigned int size = kmalloc_size(i);
@@ -1157,8 +1149,8 @@ void __init create_kmalloc_caches(slab_flags_t flags)
"dma-kmalloc-%u", size);

BUG_ON(!n);
- kmalloc_dma_caches[i] = create_kmalloc_cache(n,
- size, SLAB_CACHE_DMA | flags, 0, 0);
+ kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
+ n, size, SLAB_CACHE_DMA | flags, 0, 0);
}
}
#endif
diff --git a/mm/slub.c b/mm/slub.c
index a3b8467c14af..cdc31c1561c3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4659,6 +4659,7 @@ static int list_locations(struct kmem_cache *s, char *buf,
static void __init resiliency_test(void)
{
u8 *p;
+ int type = KMALLOC_NORMAL;

BUILD_BUG_ON(KMALLOC_MIN_SIZE > 16 || KMALLOC_SHIFT_HIGH < 10);

@@ -4671,7 +4672,7 @@ static void __init resiliency_test(void)
pr_err("\n1. kmalloc-16: Clobber Redzone/next pointer 0x12->0x%p\n\n",
p + 16);

- validate_slab_cache(kmalloc_caches[4]);
+ validate_slab_cache(kmalloc_caches[type][4]);

/* Hmmm... The next two are dangerous */
p = kzalloc(32, GFP_KERNEL);
@@ -4680,33 +4681,33 @@ static void __init resiliency_test(void)
p);
pr_err("If allocated object is overwritten then not detectable\n\n");

- validate_slab_cache(kmalloc_caches[5]);
+ validate_slab_cache(kmalloc_caches[type][5]);
p = kzalloc(64, GFP_KERNEL);
p += 64 + (get_cycles() & 0xff) * sizeof(void *);
*p = 0x56;
pr_err("\n3. kmalloc-64: corrupting random byte 0x56->0x%p\n",
p);
pr_err("If allocated object is overwritten then not detectable\n\n");
- validate_slab_cache(kmalloc_caches[6]);
+ validate_slab_cache(kmalloc_caches[type][6]);

pr_err("\nB. Corruption after free\n");
p = kzalloc(128, GFP_KERNEL);
kfree(p);
*p = 0x78;
pr_err("1. kmalloc-128: Clobber first word 0x78->0x%p\n\n", p);
- validate_slab_cache(kmalloc_caches[7]);
+ validate_slab_cache(kmalloc_caches[type][7]);

p = kzalloc(256, GFP_KERNEL);
kfree(p);
p[50] = 0x9a;
pr_err("\n2. kmalloc-256: Clobber 50th byte 0x9a->0x%p\n\n", p);
- validate_slab_cache(kmalloc_caches[8]);
+ validate_slab_cache(kmalloc_caches[type][8]);

p = kzalloc(512, GFP_KERNEL);
kfree(p);
p[512] = 0xab;
pr_err("\n3. kmalloc-512: Clobber redzone 0xab->0x%p\n\n", p);
- validate_slab_cache(kmalloc_caches[9]);
+ validate_slab_cache(kmalloc_caches[type][9]);
}
#else
#ifdef CONFIG_SYSFS
--
2.17.1


2018-06-18 09:21:45

by Vlastimil Babka

Subject: [PATCH v2 6/7] mm, proc: add KReclaimable to /proc/meminfo

The vmstat NR_KERNEL_MISC_RECLAIMABLE counter is for kernel non-slab
allocations that can be reclaimed via shrinker. In /proc/meminfo, we can show
the sum of all reclaimable kernel allocations (including slab) as
"KReclaimable". Also add the same counter to per-node meminfo under /sys.

Signed-off-by: Vlastimil Babka <[email protected]>
---
Documentation/filesystems/proc.txt | 4 ++++
drivers/base/node.c | 19 ++++++++++++-------
fs/proc/meminfo.c | 16 ++++++++--------
3 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 520f6a84cf50..6a255f960ab5 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -858,6 +858,7 @@ Writeback: 0 kB
AnonPages: 861800 kB
Mapped: 280372 kB
Shmem: 644 kB
+KReclaimable: 168048 kB
Slab: 284364 kB
SReclaimable: 159856 kB
SUnreclaim: 124508 kB
@@ -921,6 +922,9 @@ AnonHugePages: Non-file backed huge pages mapped into userspace page tables
ShmemHugePages: Memory used by shared memory (shmem) and tmpfs allocated
with huge pages
ShmemPmdMapped: Shared memory mapped into userspace with huge pages
+KReclaimable: Kernel allocations that the kernel will attempt to reclaim
+ under memory pressure. Includes SReclaimable (below), and other
+ direct allocations with a shrinker.
Slab: in-kernel data structures cache
SReclaimable: Part of Slab, that might be reclaimed, such as caches
SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure
diff --git a/drivers/base/node.c b/drivers/base/node.c
index a5e821d09656..81cef8031eae 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -67,8 +67,11 @@ static ssize_t node_read_meminfo(struct device *dev,
int nid = dev->id;
struct pglist_data *pgdat = NODE_DATA(nid);
struct sysinfo i;
+ unsigned long sreclaimable, sunreclaimable;

si_meminfo_node(&i, nid);
+ sreclaimable = node_page_state(pgdat, NR_SLAB_RECLAIMABLE);
+ sunreclaimable = node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE);
n = sprintf(buf,
"Node %d MemTotal: %8lu kB\n"
"Node %d MemFree: %8lu kB\n"
@@ -118,6 +121,7 @@ static ssize_t node_read_meminfo(struct device *dev,
"Node %d NFS_Unstable: %8lu kB\n"
"Node %d Bounce: %8lu kB\n"
"Node %d WritebackTmp: %8lu kB\n"
+ "Node %d KReclaimable: %8lu kB\n"
"Node %d Slab: %8lu kB\n"
"Node %d SReclaimable: %8lu kB\n"
"Node %d SUnreclaim: %8lu kB\n"
@@ -138,20 +142,21 @@ static ssize_t node_read_meminfo(struct device *dev,
nid, K(node_page_state(pgdat, NR_UNSTABLE_NFS)),
nid, K(sum_zone_node_page_state(nid, NR_BOUNCE)),
nid, K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
- nid, K(node_page_state(pgdat, NR_SLAB_RECLAIMABLE) +
- node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)),
- nid, K(node_page_state(pgdat, NR_SLAB_RECLAIMABLE)),
+ nid, K(sreclaimable +
+ node_page_state(pgdat, NR_KERNEL_MISC_RECLAIMABLE)),
+ nid, K(sreclaimable + sunreclaimable),
+ nid, K(sreclaimable),
+ nid, K(sunreclaimable)
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- nid, K(node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)),
+ ,
nid, K(node_page_state(pgdat, NR_ANON_THPS) *
HPAGE_PMD_NR),
nid, K(node_page_state(pgdat, NR_SHMEM_THPS) *
HPAGE_PMD_NR),
nid, K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED) *
- HPAGE_PMD_NR));
-#else
- nid, K(node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)));
+ HPAGE_PMD_NR)
#endif
+ );
n += hugetlb_report_node_meminfo(nid, buf + n);
return n;
}
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 2fb04846ed11..61a18477bc07 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -37,6 +37,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
long cached;
long available;
unsigned long pages[NR_LRU_LISTS];
+ unsigned long sreclaimable, sunreclaim;
int lru;

si_meminfo(&i);
@@ -52,6 +53,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
pages[lru] = global_node_page_state(NR_LRU_BASE + lru);

available = si_mem_available();
+ sreclaimable = global_node_page_state(NR_SLAB_RECLAIMABLE);
+ sunreclaim = global_node_page_state(NR_SLAB_UNRECLAIMABLE);

show_val_kb(m, "MemTotal: ", i.totalram);
show_val_kb(m, "MemFree: ", i.freeram);
@@ -93,14 +96,11 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
show_val_kb(m, "Mapped: ",
global_node_page_state(NR_FILE_MAPPED));
show_val_kb(m, "Shmem: ", i.sharedram);
- show_val_kb(m, "Slab: ",
- global_node_page_state(NR_SLAB_RECLAIMABLE) +
- global_node_page_state(NR_SLAB_UNRECLAIMABLE));
-
- show_val_kb(m, "SReclaimable: ",
- global_node_page_state(NR_SLAB_RECLAIMABLE));
- show_val_kb(m, "SUnreclaim: ",
- global_node_page_state(NR_SLAB_UNRECLAIMABLE));
+ show_val_kb(m, "KReclaimable: ", sreclaimable +
+ global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE));
+ show_val_kb(m, "Slab: ", sreclaimable + sunreclaim);
+ show_val_kb(m, "SReclaimable: ", sreclaimable);
+ show_val_kb(m, "SUnreclaim: ", sunreclaim);
seq_printf(m, "KernelStack: %8lu kB\n",
global_zone_page_state(NR_KERNEL_STACK_KB));
show_val_kb(m, "PageTables: ",
--
2.17.1


2018-06-18 09:21:55

by Vlastimil Babka

Subject: [PATCH v2 7/7] mm, slab: shorten kmalloc cache names for large sizes

Kmalloc cache names can get quite long for large object sizes, when the sizes
are expressed in bytes. Use 'k' and 'M' suffixes to make the names as short
as possible e.g. in /proc/slabinfo. This works, as we mostly use power-of-two
sizes, with exceptions only below 1k.

Example: 'kmalloc-4194304' becomes 'kmalloc-4M'
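
The naming rule can be tried out in userspace with the sketch below. It is an
approximation of the idea, not the patch's helper: it writes into a
caller-supplied buffer with snprintf() instead of using kasprintf(), uses
string suffixes instead of a character table, and bounds the loop at 'M'.
The function name sketch_cache_name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<prefix>-<size><unit>" where the size is divided by 1024 for as
 * long as it stays a whole number, up to the 'M' unit. Sizes that are not
 * multiples of 1024 (e.g. 96 or 192) are printed in bytes with no suffix.
 * Returns what snprintf() returns.
 */
static int sketch_cache_name(char *buf, size_t len, const char *prefix,
			     unsigned int size)
{
	static const char *const units[] = { "", "k", "M" };
	unsigned int idx = 0;

	while (size >= 1024 && size % 1024 == 0 && idx < 2) {
		size /= 1024;
		idx++;
	}

	return snprintf(buf, len, "%s-%u%s", prefix, size, units[idx]);
}
```

For instance, sketch_cache_name(buf, sizeof(buf), "kmalloc-rcl", 96) leaves
"kmalloc-rcl-96" in buf, while a size of 4194304 with the "kmalloc" prefix
yields "kmalloc-4M".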

Suggested-by: Matthew Wilcox <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
---
mm/slab_common.c | 38 ++++++++++++++++++++++++++------------
1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 8a30d6979936..462509978ec0 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1045,15 +1045,15 @@ const struct kmalloc_info_struct kmalloc_info[] __initconst = {
{"kmalloc-16", 16}, {"kmalloc-32", 32},
{"kmalloc-64", 64}, {"kmalloc-128", 128},
{"kmalloc-256", 256}, {"kmalloc-512", 512},
- {"kmalloc-1024", 1024}, {"kmalloc-2048", 2048},
- {"kmalloc-4096", 4096}, {"kmalloc-8192", 8192},
- {"kmalloc-16384", 16384}, {"kmalloc-32768", 32768},
- {"kmalloc-65536", 65536}, {"kmalloc-131072", 131072},
- {"kmalloc-262144", 262144}, {"kmalloc-524288", 524288},
- {"kmalloc-1048576", 1048576}, {"kmalloc-2097152", 2097152},
- {"kmalloc-4194304", 4194304}, {"kmalloc-8388608", 8388608},
- {"kmalloc-16777216", 16777216}, {"kmalloc-33554432", 33554432},
- {"kmalloc-67108864", 67108864}
+ {"kmalloc-1k", 1024}, {"kmalloc-2k", 2048},
+ {"kmalloc-4k", 4096}, {"kmalloc-8k", 8192},
+ {"kmalloc-16k", 16384}, {"kmalloc-32k", 32768},
+ {"kmalloc-64k", 65536}, {"kmalloc-128k", 131072},
+ {"kmalloc-256k", 262144}, {"kmalloc-512k", 524288},
+ {"kmalloc-1M", 1048576}, {"kmalloc-2M", 2097152},
+ {"kmalloc-4M", 4194304}, {"kmalloc-8M", 8388608},
+ {"kmalloc-16M", 16777216}, {"kmalloc-32M", 33554432},
+ {"kmalloc-64M", 67108864}
};

/*
@@ -1103,6 +1103,21 @@ void __init setup_kmalloc_cache_index_table(void)
}
}

+static const char *
+kmalloc_cache_name(const char *prefix, unsigned int size)
+{
+
+ static const char units[3] = "\0kM";
+ int idx = 0;
+
+ while (size >= 1024 && (size % 1024 == 0)) {
+ size /= 1024;
+ idx++;
+ }
+
+ return kasprintf(GFP_NOWAIT, "%s-%u%c", prefix, size, units[idx]);
+}
+
static void __init
new_kmalloc_cache(int idx, int type, slab_flags_t flags)
{
@@ -1110,7 +1125,7 @@ new_kmalloc_cache(int idx, int type, slab_flags_t flags)

if (type == KMALLOC_RECLAIM) {
flags |= SLAB_RECLAIM_ACCOUNT;
- name = kasprintf(GFP_NOWAIT, "kmalloc-rcl-%u",
+ name = kmalloc_cache_name("kmalloc-rcl",
kmalloc_info[idx].size);
BUG_ON(!name);
} else {
@@ -1159,8 +1174,7 @@ void __init create_kmalloc_caches(slab_flags_t flags)

if (s) {
unsigned int size = kmalloc_size(i);
- char *n = kasprintf(GFP_NOWAIT,
- "dma-kmalloc-%u", size);
+ const char *n = kmalloc_cache_name("dma-kmalloc", size);

BUG_ON(!n);
kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
--
2.17.1
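
For readers who want to try the naming rule outside the kernel, here is a minimal userspace sketch. It is not the patch's code: sketch_cache_name() is a hypothetical name, snprintf() stands in for kasprintf(), and the bound on the unit index plus the separate format for sizes without a unit are additions made for this sketch.

```c
#include <stdio.h>

/*
 * Userspace model of the naming rule: divide by 1024 while the size is
 * an exact multiple, and append 'k' or 'M' accordingly.
 */
static int sketch_cache_name(char *buf, size_t len, const char *prefix,
			     unsigned int size)
{
	static const char units[] = { 'k', 'M' };
	int idx = 0;

	/* idx < 2 keeps the unit lookup in bounds (sketch-only guard) */
	while (size >= 1024 && size % 1024 == 0 && idx < 2) {
		size /= 1024;
		idx++;
	}

	if (idx == 0)
		return snprintf(buf, len, "%s-%u", prefix, size);
	return snprintf(buf, len, "%s-%u%c", prefix, size, units[idx - 1]);
}
```

Sizes that are not multiples of 1024 (96, 192) keep their byte value, which matches the kmalloc-rcl-96 line in the cover letter's slabinfo output.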


2018-06-18 21:34:16

by Andrew Morton

Subject: Re: [PATCH v2 6/7] mm, proc: add KReclaimable to /proc/meminfo

On Mon, 18 Jun 2018 11:18:07 +0200 Vlastimil Babka <[email protected]> wrote:

> The vmstat NR_KERNEL_MISC_RECLAIMABLE counter is for kernel non-slab
> allocations that can be reclaimed via shrinker. In /proc/meminfo, we can show
> the sum of all reclaimable kernel allocations (including slab) as
> "KReclaimable". Add the same counter also to per-node meminfo under /sys

Why do you consider this useful enough to justify adding it to
/proc/meminfo? How will people use it, what benefit will they see, etc?


Maybe you've undersold this whole patchset, but I'm struggling a bit to
see what the end-user benefits are. What would be wrong with just
sticking with what we have now?

2018-06-19 07:31:24

by Vlastimil Babka

Subject: Re: [PATCH v2 6/7] mm, proc: add KReclaimable to /proc/meminfo

On 06/18/2018 11:33 PM, Andrew Morton wrote:
> On Mon, 18 Jun 2018 11:18:07 +0200 Vlastimil Babka <[email protected]> wrote:
>
>> The vmstat NR_KERNEL_MISC_RECLAIMABLE counter is for kernel non-slab
>> allocations that can be reclaimed via shrinker. In /proc/meminfo, we can show
>> the sum of all reclaimable kernel allocations (including slab) as
>> "KReclaimable". Add the same counter also to per-node meminfo under /sys
>
> Why do you consider this useful enough to justify adding it to
> /proc/meminfo? How will people use it, what benefit will they see, etc?

Let's add this:

With this counter, users will have more complete information about
kernel memory usage. Non-slab reclaimable pages (currently just the ION
allocator) will no longer be missing from /proc/meminfo, an omission that
leaves users wondering where part of their memory went. More precisely, they already appear in
MemAvailable, but without the new counter, it's not obvious why the
value in MemAvailable doesn't fully correspond with the sum of other
counters participating in it.

> Maybe you've undersold this whole patchset, but I'm struggling a bit to
> see what the end-user benefits are. What would be wrong with just
> sticking with what we have now?

Fair enough, I will add more info in reply to the cover letter.

2018-06-19 07:56:41

by Vlastimil Babka

Subject: Re: [PATCH v2 0/7] kmalloc-reclaimable caches

On 06/18/2018 11:18 AM, Vlastimil Babka wrote:
> v2 changes:
> - shorten cache names to kmalloc-rcl-<SIZE>
> - last patch shortens <SIZE> for all kmalloc caches to e.g. "1k", "4M"
> - include dma caches to the 2D kmalloc_caches[] array to avoid a branch
> - vmstat counter nr_indirectly_reclaimable_bytes renamed to
> nr_kernel_misc_reclaimable, doesn't include kmalloc-rcl-*
> - /proc/meminfo counter renamed to KReclaimable, includes kmalloc-rcl*
> and nr_kernel_misc_reclaimable
>
> Hi,
>
> as discussed at LSF/MM [1] here's a patchset that introduces
> kmalloc-reclaimable caches (more details in the second patch) and uses them for
> SLAB freelists and dcache external names. The latter allows us to repurpose the
> NR_INDIRECTLY_RECLAIMABLE_BYTES counter later in the series.

More info about user benefits of the patchset:

With patch 4, dcache external names are allocated from kmalloc-rcl-*
caches, eliminating the need for manual accounting. More importantly, it
also ensures the reclaimable kmalloc allocations are grouped in pages
separate from the regular kmalloc allocations. The need for proper
accounting of dcache external names has shown that it's easy for a
misbehaving process to allocate lots of them, causing premature OOMs.
Without the added grouping, it's likely that a similar workload can
interleave the dcache external name allocations with regular kmalloc
allocations (note: I haven't searched myself for an example of such a
regular kmalloc allocation, but I would be very surprised if there wasn't
one). A pathological case would be e.g. one 64-byte regular allocation
sharing a page with 63 external dcache names (64x64=4096), which means the
page is not freed even after all the dcache names are reclaimed, and the
process can thus steal the whole page with a single 64-byte allocation.
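
The arithmetic of that pathological case can be written down as a small userspace sketch. The sketch_* helpers are hypothetical, and slab metadata and per-cache details are ignored; it assumes a 4096-byte page as in the example above.

```c
#include <stddef.h>

/* Objects of obj_size that fit in one page (slab metadata ignored). */
static size_t sketch_objs_per_page(size_t page_size, size_t obj_size)
{
	return page_size / obj_size;
}

/* A slab page can be returned only when no object in it is still live. */
static int sketch_page_freeable(size_t live_objs)
{
	return live_objs == 0;
}

/*
 * Bytes that stay held after every reclaimable object in the page has
 * been freed: the whole page if any unreclaimable object remains.
 */
static size_t sketch_bytes_held(size_t page_size, size_t unreclaimable_objs)
{
	return sketch_page_freeable(unreclaimable_objs) ? 0 : page_size;
}
```

With 64-byte objects, one regular allocation among 63 reclaimed names keeps 4096 bytes held, a factor of 64 over its own size. Grouping reclaimable objects in their own pages avoids this mixing.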

If other kmalloc users similar to dcache external names are identified,
they can also benefit from the new functionality simply by adding
__GFP_RECLAIMABLE to the kmalloc calls.
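
As a rough userspace model of what adding the flag changes: the allocation flags select which family of kmalloc caches serves the request. Everything below is hypothetical (the SK_* flag bits, the enum and its ordering, and the DMA-first precedence are assumptions for illustration), and it uses plain branches for readability, whereas the series indexes a 2D array to avoid a branch.

```c
/* Hypothetical flag bits for illustration only; not the kernel's values. */
#define SK_GFP_DMA         0x1u
#define SK_GFP_RECLAIMABLE 0x2u

/* Hypothetical cache families, one row each in a 2D cache array. */
enum sk_kmalloc_type {
	SK_KMALLOC_NORMAL,
	SK_KMALLOC_RECLAIM,
	SK_KMALLOC_DMA,
	SK_NR_KMALLOC_TYPES
};

/* Pick the cache family from the allocation flags. */
static enum sk_kmalloc_type sk_kmalloc_type(unsigned int flags)
{
	if (flags & SK_GFP_DMA)
		return SK_KMALLOC_DMA;	/* assumption: DMA takes precedence */
	if (flags & SK_GFP_RECLAIMABLE)
		return SK_KMALLOC_RECLAIM;
	return SK_KMALLOC_NORMAL;
}
```

A caller that passes the reclaimable flag gets objects from the kmalloc-rcl-* family, so they share pages only with other reclaimable objects.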

Side benefits of the patchset (that could also be merged separately)
include removing the branch for detecting __GFP_DMA kmalloc(), and shortening
kmalloc cache names in /proc/slabinfo output. The latter is potentially
an ABI break in case there are tools parsing the names and expecting the
values to be in bytes.

2018-06-19 08:15:04

by Minchan Kim

Subject: Re: [PATCH v2 6/7] mm, proc: add KReclaimable to /proc/meminfo

On Tue, Jun 19, 2018 at 09:30:03AM +0200, Vlastimil Babka wrote:
> On 06/18/2018 11:33 PM, Andrew Morton wrote:
> > On Mon, 18 Jun 2018 11:18:07 +0200 Vlastimil Babka <[email protected]> wrote:
> >
> >> The vmstat NR_KERNEL_MISC_RECLAIMABLE counter is for kernel non-slab
> >> allocations that can be reclaimed via shrinker. In /proc/meminfo, we can show
> >> the sum of all reclaimable kernel allocations (including slab) as
> >> "KReclaimable". Add the same counter also to per-node meminfo under /sys
> >
> > Why do you consider this useful enough to justify adding it to
> > /proc/meminfo? How will people use it, what benefit will they see, etc?
>
> Let's add this:
>
> With this counter, users will have more complete information about
> kernel memory usage. Non-slab reclaimable pages (currently just the ION
> allocator) will no longer be missing from /proc/meminfo, an omission that
> leaves users wondering where part of their memory went. More precisely, they already appear in
> MemAvailable, but without the new counter, it's not obvious why the
> value in MemAvailable doesn't fully correspond with the sum of other
> counters participating in it.

Hmm, if we could get MemAvailable as the sum of the other counters participating
in it, MemAvailable wouldn't be meaningful. IMO, MemAvailable doesn't need to
match the other counters.

The benefit of ION KReclaimable in the field: there have been bug reports of
sluggishness under memory pressure where the ION page pool turned out to be
too large and was not being shrunk. In that case, the meminfo counter would be
useful to know something was broken in the system.

From that point of view, my concern is that if we add more KReclaimable
pages (e.g., binder is a candidate), we end up unable to identify which
caches among them are too large. That means we will need a KReclaimableInfo
(like slabinfo) to show each type's KReclaimable pages in the future.

Anyway, it's good as a first step.

>
> > Maybe you've undersold this whole patchset, but I'm struggling a bit to
> > see what the end-user benefits are. What would be wrong with just
> > sticking with what we have now?
>
> Fair enough, I will add more info in reply to the cover letter.
>

2018-06-19 12:46:00

by Vlastimil Babka

Subject: Re: [PATCH v2 6/7] mm, proc: add KReclaimable to /proc/meminfo

On 06/19/2018 10:13 AM, Minchan Kim wrote:
> On Tue, Jun 19, 2018 at 09:30:03AM +0200, Vlastimil Babka wrote:
>> On 06/18/2018 11:33 PM, Andrew Morton wrote:
>>> On Mon, 18 Jun 2018 11:18:07 +0200 Vlastimil Babka <[email protected]> wrote:
>>>
>>>> The vmstat NR_KERNEL_MISC_RECLAIMABLE counter is for kernel non-slab
>>>> allocations that can be reclaimed via shrinker. In /proc/meminfo, we can show
>>>> the sum of all reclaimable kernel allocations (including slab) as
>>>> "KReclaimable". Add the same counter also to per-node meminfo under /sys
>>>
>>> Why do you consider this useful enough to justify adding it to
>>> /proc/meminfo? How will people use it, what benefit will they see, etc?
>>
>> Let's add this:
>>
>> With this counter, users will have more complete information about
>> kernel memory usage. Non-slab reclaimable pages (currently just the ION
>> allocator) will no longer be missing from /proc/meminfo, an omission that
>> leaves users wondering where part of their memory went. More precisely, they already appear in
>> MemAvailable, but without the new counter, it's not obvious why the
>> value in MemAvailable doesn't fully correspond with the sum of other
>> counters participating in it.
>
> Hmm, if we could get MemAvailable as the sum of the other counters participating
> in it, MemAvailable wouldn't be meaningful. IMO, MemAvailable doesn't need to
> match the other counters.

MemAvailable is meant as a "shortcut" for users, so they don't have to
remember which counters to count and add them up manually. It's also not
an exact sum, because there are some assumptions that part of
reclaimable memory might be pinned etc. Still, missing KReclaimable in
/proc/meminfo would be an odd exception wrt the other counters, IMHO.
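
To make the "not an exact sum" point concrete, here is a userspace sketch of the reclaimable part of the estimate as patch 5 of this series changes it in si_mem_available(). The function names are hypothetical and all values are in pages. Slab and misc reclaimable pages are summed, and up to half of the sum, capped at the low watermark, is assumed to be in use and not freeable.

```c
/* Smaller of two unsigned long values. */
static unsigned long sketch_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Contribution of reclaimable kernel memory to the MemAvailable
 * estimate, in pages.
 */
static unsigned long sketch_reclaimable_part(unsigned long slab_reclaimable,
					     unsigned long misc_reclaimable,
					     unsigned long wmark_low)
{
	unsigned long reclaimable = slab_reclaimable + misc_reclaimable;

	return reclaimable - sketch_min(reclaimable / 2, wmark_low);
}
```

So a user adding up KReclaimable and the other fields by hand will generally get a larger number than MemAvailable reports.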

> The benefit of ION KReclaimable in the field: there have been bug reports of
> sluggishness under memory pressure where the ION page pool turned out to be
> too large and was not being shrunk. In that case, the meminfo counter would be
> useful to know something was broken in the system.

Right.

> From that point of view, my concern is that if we add more KReclaimable
> pages (e.g., binder is a candidate), we end up unable to identify which
> caches among them are too large. That means we will need a KReclaimableInfo
> (like slabinfo) to show each type's KReclaimable pages in the future.

Yeah there are more direct kernel allocations that can eat significant
amounts of memory, without being visible in /proc/meminfo, and not
necessarily reclaimable. E.g. unless that changed, I recall XFS page
buffers. Striking a good balance of how detailed the accounting should
be is not easy.

BTW at some point I proposed MemUnaccounted to make it more obvious
(without adding up fields manually) that there is some memory consumed
by kernel allocations not visible in the other meminfo fields.

2018-06-20 11:25:50

by kernel test robot

Subject: Re: [PATCH v2 5/7] mm: rename and change semantics of nr_indirectly_reclaimable_bytes

Hi Vlastimil,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.18-rc1 next-20180619]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Vlastimil-Babka/kmalloc-reclaimable-caches/20180618-172912
base: git://git.cmpxchg.org/linux-mmotm.git master
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All errors (new ones prefixed by >>):

drivers/staging//android/ion/ion_page_pool.c: In function 'ion_page_pool_remove':
>> drivers/staging//android/ion/ion_page_pool.c:56:40: error: 'NR_INDIRECTLY_RECLAIMABLE_BYTES' undeclared (first use in this function)
mod_node_page_state(page_pgdat(page), NR_INDIRECTLY_RECLAIMABLE_BYTES,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/staging//android/ion/ion_page_pool.c:56:40: note: each undeclared identifier is reported only once for each function it appears in

vim +/NR_INDIRECTLY_RECLAIMABLE_BYTES +56 drivers/staging//android/ion/ion_page_pool.c

0214c7f2 Rebecca Schultz Zavin 2013-12-13 40
0fb9b815 Rebecca Schultz Zavin 2013-12-13 41 static struct page *ion_page_pool_remove(struct ion_page_pool *pool, bool high)
0214c7f2 Rebecca Schultz Zavin 2013-12-13 42 {
0214c7f2 Rebecca Schultz Zavin 2013-12-13 43 struct page *page;
0214c7f2 Rebecca Schultz Zavin 2013-12-13 44
0fb9b815 Rebecca Schultz Zavin 2013-12-13 45 if (high) {
0fb9b815 Rebecca Schultz Zavin 2013-12-13 46 BUG_ON(!pool->high_count);
38c003b1 Heesub Shin 2014-05-28 47 page = list_first_entry(&pool->high_items, struct page, lru);
0fb9b815 Rebecca Schultz Zavin 2013-12-13 48 pool->high_count--;
0fb9b815 Rebecca Schultz Zavin 2013-12-13 49 } else {
0fb9b815 Rebecca Schultz Zavin 2013-12-13 50 BUG_ON(!pool->low_count);
38c003b1 Heesub Shin 2014-05-28 51 page = list_first_entry(&pool->low_items, struct page, lru);
0fb9b815 Rebecca Schultz Zavin 2013-12-13 52 pool->low_count--;
0fb9b815 Rebecca Schultz Zavin 2013-12-13 53 }
0214c7f2 Rebecca Schultz Zavin 2013-12-13 54
38c003b1 Heesub Shin 2014-05-28 55 list_del(&page->lru);
06cd8a61 Andrew Morton 2018-06-15 @56 mod_node_page_state(page_pgdat(page), NR_INDIRECTLY_RECLAIMABLE_BYTES,
06cd8a61 Andrew Morton 2018-06-15 57 -(1 << (PAGE_SHIFT + pool->order)));
0214c7f2 Rebecca Schultz Zavin 2013-12-13 58 return page;
0214c7f2 Rebecca Schultz Zavin 2013-12-13 59 }
0214c7f2 Rebecca Schultz Zavin 2013-12-13 60

:::::: The code at line 56 was first introduced by commit
:::::: 06cd8a610861a7ea0be1ff627fd8d6d6b3f62ca0 origin

:::::: TO: Andrew Morton <[email protected]>
:::::: CC: Johannes Weiner <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
.config.gz (63.15 kB)

2018-06-29 18:30:50

by Vlastimil Babka

Subject: Re: [PATCH v2 5/7] mm: rename and change semantics of nr_indirectly_reclaimable_bytes

On 06/20/2018 01:23 PM, kbuild test robot wrote:
> Hi Vlastimil,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on mmotm/master]
> [also build test ERROR on v4.18-rc1 next-20180619]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Vlastimil-Babka/kmalloc-reclaimable-caches/20180618-172912
> base: git://git.cmpxchg.org/linux-mmotm.git master
> config: x86_64-allmodconfig (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
>
> All errors (new ones prefixed by >>):
>
> drivers/staging//android/ion/ion_page_pool.c: In function 'ion_page_pool_remove':
>>> drivers/staging//android/ion/ion_page_pool.c:56:40: error: 'NR_INDIRECTLY_RECLAIMABLE_BYTES' undeclared (first use in this function)
> mod_node_page_state(page_pgdat(page), NR_INDIRECTLY_RECLAIMABLE_BYTES,
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> drivers/staging//android/ion/ion_page_pool.c:56:40: note: each undeclared identifier is reported only once for each function it appears in
>
> vim +/NR_INDIRECTLY_RECLAIMABLE_BYTES +56 drivers/staging//android/ion/ion_page_pool.c

Looks like I missed a hunk, updated patch below.

----8<----
From a0053c64c72d7e094252d0d7462de8569d87c543 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <[email protected]>
Date: Tue, 22 May 2018 16:10:10 +0200
Subject: [PATCH v3 5/7] mm: rename and change semantics of
nr_indirectly_reclaimable_bytes

The vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES was introduced by commit
eb59254608bc ("mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTES") with the goal of
accounting objects that can be reclaimed, but cannot be allocated via a
SLAB_RECLAIM_ACCOUNT cache. This is now possible via kmalloc() with
__GFP_RECLAIMABLE flag, and the dcache external names user is converted.

The counter is however still useful for accounting direct page allocations
(i.e. not slab) with a shrinker, such as the ION page pool. So keep it, and:

- change granularity to pages to be more like other counters; sub-page
allocations should be able to use kmalloc
- rename the counter to NR_KERNEL_MISC_RECLAIMABLE
- expose the counter again in vmstat as "nr_kernel_misc_reclaimable"; we can
again remove the check for not printing "hidden" counters

Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Vijayanand Jitta <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Sumit Semwal <[email protected]>
---
drivers/staging/android/ion/ion_page_pool.c | 8 ++++----
include/linux/mmzone.h | 2 +-
mm/page_alloc.c | 19 +++++++------------
mm/util.c | 3 +--
mm/vmstat.c | 6 +-----
5 files changed, 14 insertions(+), 24 deletions(-)

diff --git a/drivers/staging/android/ion/ion_page_pool.c b/drivers/staging/android/ion/ion_page_pool.c
index 9bc56eb48d2a..0d2a95957ee8 100644
--- a/drivers/staging/android/ion/ion_page_pool.c
+++ b/drivers/staging/android/ion/ion_page_pool.c
@@ -33,8 +33,8 @@ static void ion_page_pool_add(struct ion_page_pool *pool, struct page *page)
pool->low_count++;
}

- mod_node_page_state(page_pgdat(page), NR_INDIRECTLY_RECLAIMABLE_BYTES,
- (1 << (PAGE_SHIFT + pool->order)));
+ mod_node_page_state(page_pgdat(page), NR_KERNEL_MISC_RECLAIMABLE,
+ 1 << pool->order);
mutex_unlock(&pool->mutex);
}

@@ -53,8 +53,8 @@ static struct page *ion_page_pool_remove(struct ion_page_pool *pool, bool high)
}

list_del(&page->lru);
- mod_node_page_state(page_pgdat(page), NR_INDIRECTLY_RECLAIMABLE_BYTES,
- -(1 << (PAGE_SHIFT + pool->order)));
+ mod_node_page_state(page_pgdat(page), NR_KERNEL_MISC_RECLAIMABLE,
+ -(1 << pool->order));
return page;
}

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2dc52a..c2f6bc4c9e8a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -180,7 +180,7 @@ enum node_stat_item {
NR_VMSCAN_IMMEDIATE, /* Prioritise for reclaim when writeback ends */
NR_DIRTIED, /* page dirtyings since bootup */
NR_WRITTEN, /* page writings since bootup */
- NR_INDIRECTLY_RECLAIMABLE_BYTES, /* measured in bytes */
+ NR_KERNEL_MISC_RECLAIMABLE, /* reclaimable non-slab kernel pages */
NR_VM_NODE_STAT_ITEMS
};

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..8ceb45e11b97 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4704,6 +4704,7 @@ long si_mem_available(void)
unsigned long pagecache;
unsigned long wmark_low = 0;
unsigned long pages[NR_LRU_LISTS];
+ unsigned long reclaimable;
struct zone *zone;
int lru;

@@ -4729,19 +4730,13 @@ long si_mem_available(void)
available += pagecache;

/*
- * Part of the reclaimable slab consists of items that are in use,
- * and cannot be freed. Cap this estimate at the low watermark.
+ * Part of the reclaimable slab and other kernel memory consists of
+ * items that are in use, and cannot be freed. Cap this estimate at the
+ * low watermark.
*/
- available += global_node_page_state(NR_SLAB_RECLAIMABLE) -
- min(global_node_page_state(NR_SLAB_RECLAIMABLE) / 2,
- wmark_low);
-
- /*
- * Part of the kernel memory, which can be released under memory
- * pressure.
- */
- available += global_node_page_state(NR_INDIRECTLY_RECLAIMABLE_BYTES) >>
- PAGE_SHIFT;
+ reclaimable = global_node_page_state(NR_SLAB_RECLAIMABLE) +
+ global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE);
+ available += reclaimable - min(reclaimable / 2, wmark_low);

if (available < 0)
available = 0;
diff --git a/mm/util.c b/mm/util.c
index 3351659200e6..891f0654e7b5 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -675,8 +675,7 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
* Part of the kernel memory, which can be released
* under memory pressure.
*/
- free += global_node_page_state(
- NR_INDIRECTLY_RECLAIMABLE_BYTES) >> PAGE_SHIFT;
+ free += global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE);

/*
* Leave reserved pages. The pages are not for anonymous pages.
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 75eda9c2b260..7c677d3a61ec 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1161,7 +1161,7 @@ const char * const vmstat_text[] = {
"nr_vmscan_immediate_reclaim",
"nr_dirtied",
"nr_written",
- "", /* nr_indirectly_reclaimable */
+ "nr_kernel_misc_reclaimable",

/* enum writeback_stat_item counters */
"nr_dirty_threshold",
@@ -1704,10 +1704,6 @@ static int vmstat_show(struct seq_file *m, void *arg)
unsigned long *l = arg;
unsigned long off = l - (unsigned long *)m->private;

- /* Skip hidden vmstat items. */
- if (*vmstat_text[off] == '\0')
- return 0;
-
seq_puts(m, vmstat_text[off]);
seq_put_decimal_ull(m, " ", *l);
seq_putc(m, '\n');
--
2.17.1
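
As a sanity check of the granularity change in the ION hunks, the old byte-based delta and the new page-based delta describe the same amount of memory. A small userspace sketch, where the sketch_* names and the 4 KiB page size are assumptions:

```c
/* Assumed page size of 4 KiB (PAGE_SHIFT == 12); real systems may differ. */
#define SKETCH_ION_PAGE_SHIFT 12

/* Old accounting: the counter moved by the pool entry's size in bytes. */
static long sketch_old_delta_bytes(unsigned int order)
{
	return 1L << (SKETCH_ION_PAGE_SHIFT + order);
}

/* New accounting: the counter moves by the number of pages. */
static long sketch_new_delta_pages(unsigned int order)
{
	return 1L << order;
}
```

For an order-2 pool entry the old delta is 16384 bytes and the new one is 4 pages, which is why the ">> PAGE_SHIFT" conversions in si_mem_available() and __vm_enough_memory() can go away.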



2018-06-29 21:21:32

by Roman Gushchin

Subject: Re: [PATCH v2 5/7] mm: rename and change semantics of nr_indirectly_reclaimable_bytes

On Fri, Jun 29, 2018 at 05:37:02PM +0200, Vlastimil Babka wrote:
> On 06/20/2018 01:23 PM, kbuild test robot wrote:
> > Hi Vlastimil,
> >
> > Thank you for the patch! Yet something to improve:
> >
> > [auto build test ERROR on mmotm/master]
> > [also build test ERROR on v4.18-rc1 next-20180619]
> > [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> >
> > url: https://github.com/0day-ci/linux/commits/Vlastimil-Babka/kmalloc-reclaimable-caches/20180618-172912
> > base: git://git.cmpxchg.org/linux-mmotm.git master
> > config: x86_64-allmodconfig (attached as .config)
> > compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
> > reproduce:
> > # save the attached .config to linux build tree
> > make ARCH=x86_64
> >
> > All errors (new ones prefixed by >>):
> >
> > drivers/staging//android/ion/ion_page_pool.c: In function 'ion_page_pool_remove':
> >>> drivers/staging//android/ion/ion_page_pool.c:56:40: error: 'NR_INDIRECTLY_RECLAIMABLE_BYTES' undeclared (first use in this function)
> > mod_node_page_state(page_pgdat(page), NR_INDIRECTLY_RECLAIMABLE_BYTES,
> > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > drivers/staging//android/ion/ion_page_pool.c:56:40: note: each undeclared identifier is reported only once for each function it appears in
> >
> > vim +/NR_INDIRECTLY_RECLAIMABLE_BYTES +56 drivers/staging//android/ion/ion_page_pool.c
>
> Looks like I missed a hunk, updated patch below.
>
> ----8<----
> From a0053c64c72d7e094252d0d7462de8569d87c543 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <[email protected]>
> Date: Tue, 22 May 2018 16:10:10 +0200
> Subject: [PATCH v3 5/7] mm: rename and change semantics of
> nr_indirectly_reclaimable_bytes
>
> The vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES was introduced by commit
> eb59254608bc ("mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTES") with the goal of
> accounting objects that can be reclaimed, but cannot be allocated via a
> SLAB_RECLAIM_ACCOUNT cache. This is now possible via kmalloc() with
> __GFP_RECLAIMABLE flag, and the dcache external names user is converted.
>
> The counter is however still useful for accounting direct page allocations
> (i.e. not slab) with a shrinker, such as the ION page pool. So keep it, and:

Btw, it looks like I've another example of usefulness of this counter:
dynamic per-cpu data.

2018-06-30 10:12:55

by Vlastimil Babka

Subject: Re: [PATCH v2 5/7] mm: rename and change semantics of nr_indirectly_reclaimable_bytes

On 06/29/2018 11:12 PM, Roman Gushchin wrote:
>>
>> The vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES was introduced by commit
>> eb59254608bc ("mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTES") with the goal of
>> accounting objects that can be reclaimed, but cannot be allocated via a
>> SLAB_RECLAIM_ACCOUNT cache. This is now possible via kmalloc() with
>> __GFP_RECLAIMABLE flag, and the dcache external names user is converted.
>>
>> The counter is however still useful for accounting direct page allocations
>> (i.e. not slab) with a shrinker, such as the ION page pool. So keep it, and:
>
> Btw, it looks like I've another example of usefulness of this counter:
> dynamic per-cpu data.

Hmm, but are those reclaimable? Most likely not in general? Do you have
examples that are?

2018-07-02 16:55:03

by Roman Gushchin

Subject: Re: [PATCH v2 5/7] mm: rename and change semantics of nr_indirectly_reclaimable_bytes

On Sat, Jun 30, 2018 at 12:09:27PM +0200, Vlastimil Babka wrote:
> On 06/29/2018 11:12 PM, Roman Gushchin wrote:
> >>
> >> The vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES was introduced by commit
> >> eb59254608bc ("mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTES") with the goal of
> >> accounting objects that can be reclaimed, but cannot be allocated via a
> >> SLAB_RECLAIM_ACCOUNT cache. This is now possible via kmalloc() with
> >> __GFP_RECLAIMABLE flag, and the dcache external names user is converted.
> >>
> >> The counter is however still useful for accounting direct page allocations
> >> (i.e. not slab) with a shrinker, such as the ION page pool. So keep it, and:
> >
> > Btw, it looks like I've another example of usefulness of this counter:
> > dynamic per-cpu data.
>
> Hmm, but are those reclaimable? Most likely not in general? Do you have
> examples that are?

For example, when the per-cpu data is something like per-cpu refcounters
used to manage reclaimable objects (e.g. cgroup css objects).
Of course, they are not always reclaimable, only in certain states.

Thanks!

2018-07-17 08:45:00

by Vlastimil Babka

Subject: Re: [PATCH v2 5/7] mm: rename and change semantics of nr_indirectly_reclaimable_bytes

On 07/02/2018 06:52 PM, Roman Gushchin wrote:
> On Sat, Jun 30, 2018 at 12:09:27PM +0200, Vlastimil Babka wrote:
>> On 06/29/2018 11:12 PM, Roman Gushchin wrote:
>>>>
>>>> The vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES was introduced by commit
>>>> eb59254608bc ("mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTES") with the goal of
>>>> accounting objects that can be reclaimed, but cannot be allocated via a
>>>> SLAB_RECLAIM_ACCOUNT cache. This is now possible via kmalloc() with
>>>> __GFP_RECLAIMABLE flag, and the dcache external names user is converted.
>>>>
>>>> The counter is however still useful for accounting direct page allocations
>>>> (i.e. not slab) with a shrinker, such as the ION page pool. So keep it, and:
>>>
>>> Btw, it looks like I've another example of usefulness of this counter:
>>> dynamic per-cpu data.
>>
>> Hmm, but are those reclaimable? Most likely not in general? Do you have
>> examples that are?
>
> For example, when the per-cpu data is something like per-cpu refcounters
> used to manage reclaimable objects (e.g. cgroup css objects).
> Of course, they are not always reclaimable, only in certain states.

BTW, since you seem interested, could you provide some more formal
review as well? Others too. We don't need to cover all use cases
immediately, given that the patchset is apparently stalled due to lack of
review. Thanks!

> Thanks!
>


2018-07-17 18:57:33

by Roman Gushchin

Subject: Re: [PATCH v2 5/7] mm: rename and change semantics of nr_indirectly_reclaimable_bytes

On Tue, Jul 17, 2018 at 10:44:07AM +0200, Vlastimil Babka wrote:
> On 07/02/2018 06:52 PM, Roman Gushchin wrote:
> > On Sat, Jun 30, 2018 at 12:09:27PM +0200, Vlastimil Babka wrote:
> >> On 06/29/2018 11:12 PM, Roman Gushchin wrote:
> >>>>
> >>>> The vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES was introduced by commit
> >>>> eb59254608bc ("mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTES") with the goal of
> >>>> accounting objects that can be reclaimed, but cannot be allocated via a
> >>>> SLAB_RECLAIM_ACCOUNT cache. This is now possible via kmalloc() with
> >>>> __GFP_RECLAIMABLE flag, and the dcache external names user is converted.
> >>>>
> >>>> The counter is however still useful for accounting direct page allocations
> >>>> (i.e. not slab) with a shrinker, such as the ION page pool. So keep it, and:
> >>>
> >>> Btw, it looks like I've another example of usefulness of this counter:
> >>> dynamic per-cpu data.
> >>
> >> Hmm, but are those reclaimable? Most likely not in general? Do you have
> >> examples that are?
> >
> > For example, when the per-cpu data is something like per-cpu refcounters
> > used to manage reclaimable objects (e.g. cgroup css objects).
> > Of course, they are not always reclaimable, only in certain states.
>
> BTW, since you seem interested, could you provide some more formal
> review as well? Others too. We don't need to cover all use cases
> immediately, given that the patchset is apparently stalled due to lack of
> review. Thanks!

Sure!

The patchset looks sane at a first glance, but I need some time
to dig deeper. Is v2 the final version?

Thanks!

2018-07-17 19:15:20

by Vlastimil Babka

Subject: Re: [PATCH v2 5/7] mm: rename and change semantics of nr_indirectly_reclaimable_bytes

On 07/17/2018 08:54 PM, Roman Gushchin wrote:
> On Tue, Jul 17, 2018 at 10:44:07AM +0200, Vlastimil Babka wrote:
>> On 07/02/2018 06:52 PM, Roman Gushchin wrote:
>>> On Sat, Jun 30, 2018 at 12:09:27PM +0200, Vlastimil Babka wrote:
>>>
>>> For example, when the per-cpu data is something like per-cpu refcounters
>>> used to manage reclaimable objects (e.g. cgroup css objects).
>>> Of course, they are not always reclaimable, only in certain states.
>>
>> BTW, since you seem interested, could you provide some more formal
>> review as well? Others too. We don't need to cover all use cases
>> immediately, given that the patchset is apparently stalled due to lack of
>> review. Thanks!
>
> Sure!

Thanks!

> The patchset looks sane at a first glance, but I need some time
> to dig deeper. Is v2 the final version?

There was a fixlet on top and some added changelog text, so I'll do a v3
tomorrow incorporating that to make things easier for everyone.

> Thanks!
>