2017-06-19 23:28:48

by Dennis Zhou

[permalink] [raw]
Subject: [PATCH 0/4] percpu: add basic stats and tracepoints to percpu allocator

There is limited visibility into the percpu memory allocator, making it hard to
understand usage patterns. Without concrete numbers, we are left to conjecture
about the correctness of percpu memory use. Additionally, there is no mechanism
to review the correctness or efficiency of the current implementation.

This patchset addresses the following:
- Adds basic statistics to reason about the number of allocations over the
lifetime, allocation sizes, and fragmentation.
- Adds tracepoints to enable better debug capabilities as well as the ability
to review allocation requests and corresponding decisions.

This patchset contains the following four patches:
0001-percpu-add-missing-lockdep_assert_held-to-func-pcpu_.patch
0002-percpu-migrate-percpu-data-structures-to-internal-he.patch
0003-percpu-expose-statistics-about-percpu-memory-via-deb.patch
0004-percpu-add-tracepoint-support-for-percpu-memory.patch

0001 adds a missing lockdep_assert_held for pcpu_lock to improve consistency
and safety. 0002 prepares for the following patches by moving data structure
definitions into an internal header and exposing previously static variables.
0003 adds percpu statistics via debugfs. 0004 adds tracepoints to key percpu
events: chunk creation/deletion and area allocation/free/failure.

This patchset is on top of linus#master 1132d5e.

diffstats below:

percpu: add missing lockdep_assert_held to func pcpu_free_area
percpu: migrate percpu data structures to internal header
percpu: expose statistics about percpu memory via debugfs
percpu: add tracepoint support for percpu memory

include/trace/events/percpu.h | 125 ++++++++++++++++++++++++
mm/Kconfig | 8 ++
mm/Makefile | 1 +
mm/percpu-internal.h | 164 +++++++++++++++++++++++++++++++
mm/percpu-km.c | 6 ++
mm/percpu-stats.c | 222 ++++++++++++++++++++++++++++++++++++++++++
mm/percpu-vm.c | 7 ++
mm/percpu.c | 53 +++++-----
8 files changed, 563 insertions(+), 23 deletions(-)
create mode 100644 include/trace/events/percpu.h
create mode 100644 mm/percpu-internal.h
create mode 100644 mm/percpu-stats.c

Thanks,
Dennis


2017-06-19 23:28:49

by Dennis Zhou

[permalink] [raw]
Subject: [PATCH 1/4] percpu: add missing lockdep_assert_held to func pcpu_free_area

Add a missing lockdep_assert_held for pcpu_lock to improve consistency
and safety throughout mm/percpu.c.

Signed-off-by: Dennis Zhou <[email protected]>
---
mm/percpu.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/mm/percpu.c b/mm/percpu.c
index e0aa8ae..f94a5eb 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -672,6 +672,8 @@ static void pcpu_free_area(struct pcpu_chunk *chunk, int freeme,
int to_free = 0;
int *p;

+ lockdep_assert_held(&pcpu_lock);
+
freeme |= 1; /* we are searching for <given offset, in use> pair */

i = 0;
--
2.9.3
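
For context, lockdep_assert_held() is a no-op unless lockdep is configured in;
with CONFIG_LOCKDEP it warns whenever the named lock is not held by the current
context. A minimal sketch of the pattern (the names here are illustrative, not
taken from the patch):

    #include <linux/lockdep.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(demo_lock);

    /* Must be called with demo_lock held. */
    static void demo_update(void)
    {
        /* Warns (with lockdep enabled) if the locking contract is violated. */
        lockdep_assert_held(&demo_lock);
        /* ... modify state protected by demo_lock ... */
    }

    static void demo_caller(void)
    {
        spin_lock(&demo_lock);
        demo_update();          /* ok: lock held */
        spin_unlock(&demo_lock);
    }

Documenting the contract this way catches any caller that reaches
pcpu_free_area() without pcpu_lock, which a comment alone cannot.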

2017-06-19 23:28:54

by Dennis Zhou

[permalink] [raw]
Subject: [PATCH 4/4] percpu: add tracepoint support for percpu memory

Add support for tracepoints to the following events: chunk allocation,
chunk free, area allocation, area free, and area allocation failure.
This should let us replay percpu memory requests and evaluate
corresponding decisions.

Signed-off-by: Dennis Zhou <[email protected]>
---
include/trace/events/percpu.h | 125 ++++++++++++++++++++++++++++++++++++++++++
mm/percpu-km.c | 2 +
mm/percpu-vm.c | 2 +
mm/percpu.c | 12 ++++
4 files changed, 141 insertions(+)
create mode 100644 include/trace/events/percpu.h

diff --git a/include/trace/events/percpu.h b/include/trace/events/percpu.h
new file mode 100644
index 0000000..ad34b1b
--- /dev/null
+++ b/include/trace/events/percpu.h
@@ -0,0 +1,125 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM percpu
+
+#if !defined(_TRACE_PERCPU_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PERCPU_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(percpu_alloc_percpu,
+
+ TP_PROTO(bool reserved, bool is_atomic, size_t size,
+ size_t align, void *base_addr, int off, void __percpu *ptr),
+
+ TP_ARGS(reserved, is_atomic, size, align, base_addr, off, ptr),
+
+ TP_STRUCT__entry(
+ __field( bool, reserved )
+ __field( bool, is_atomic )
+ __field( size_t, size )
+ __field( size_t, align )
+ __field( void *, base_addr )
+ __field( int, off )
+ __field( void __percpu *, ptr )
+ ),
+
+ TP_fast_assign(
+ __entry->reserved = reserved;
+ __entry->is_atomic = is_atomic;
+ __entry->size = size;
+ __entry->align = align;
+ __entry->base_addr = base_addr;
+ __entry->off = off;
+ __entry->ptr = ptr;
+ ),
+
+ TP_printk("reserved=%d is_atomic=%d size=%zu align=%zu base_addr=%p off=%d ptr=%p",
+ __entry->reserved, __entry->is_atomic,
+ __entry->size, __entry->align,
+ __entry->base_addr, __entry->off, __entry->ptr)
+);
+
+TRACE_EVENT(percpu_free_percpu,
+
+ TP_PROTO(void *base_addr, int off, void __percpu *ptr),
+
+ TP_ARGS(base_addr, off, ptr),
+
+ TP_STRUCT__entry(
+ __field( void *, base_addr )
+ __field( int, off )
+ __field( void __percpu *, ptr )
+ ),
+
+ TP_fast_assign(
+ __entry->base_addr = base_addr;
+ __entry->off = off;
+ __entry->ptr = ptr;
+ ),
+
+ TP_printk("base_addr=%p off=%d ptr=%p",
+ __entry->base_addr, __entry->off, __entry->ptr)
+);
+
+TRACE_EVENT(percpu_alloc_percpu_fail,
+
+ TP_PROTO(bool reserved, bool is_atomic, size_t size, size_t align),
+
+ TP_ARGS(reserved, is_atomic, size, align),
+
+ TP_STRUCT__entry(
+ __field( bool, reserved )
+ __field( bool, is_atomic )
+ __field( size_t, size )
+ __field( size_t, align )
+ ),
+
+ TP_fast_assign(
+ __entry->reserved = reserved;
+ __entry->is_atomic = is_atomic;
+ __entry->size = size;
+ __entry->align = align;
+ ),
+
+ TP_printk("reserved=%d is_atomic=%d size=%zu align=%zu",
+ __entry->reserved, __entry->is_atomic,
+ __entry->size, __entry->align)
+);
+
+TRACE_EVENT(percpu_create_chunk,
+
+ TP_PROTO(void *base_addr),
+
+ TP_ARGS(base_addr),
+
+ TP_STRUCT__entry(
+ __field( void *, base_addr )
+ ),
+
+ TP_fast_assign(
+ __entry->base_addr = base_addr;
+ ),
+
+ TP_printk("base_addr=%p", __entry->base_addr)
+);
+
+TRACE_EVENT(percpu_destroy_chunk,
+
+ TP_PROTO(void *base_addr),
+
+ TP_ARGS(base_addr),
+
+ TP_STRUCT__entry(
+ __field( void *, base_addr )
+ ),
+
+ TP_fast_assign(
+ __entry->base_addr = base_addr;
+ ),
+
+ TP_printk("base_addr=%p", __entry->base_addr)
+);
+
+#endif /* _TRACE_PERCPU_H */
+
+#include <trace/define_trace.h>
diff --git a/mm/percpu-km.c b/mm/percpu-km.c
index 3bbfa0c..2b79e43 100644
--- a/mm/percpu-km.c
+++ b/mm/percpu-km.c
@@ -73,6 +73,7 @@ static struct pcpu_chunk *pcpu_create_chunk(void)
spin_unlock_irq(&pcpu_lock);

pcpu_stats_chunk_alloc();
+ trace_percpu_create_chunk(chunk->base_addr);

return chunk;
}
@@ -82,6 +83,7 @@ static void pcpu_destroy_chunk(struct pcpu_chunk *chunk)
const int nr_pages = pcpu_group_sizes[0] >> PAGE_SHIFT;

pcpu_stats_chunk_dealloc();
+ trace_percpu_destroy_chunk(chunk->base_addr);

if (chunk && chunk->data)
__free_pages(chunk->data, order_base_2(nr_pages));
diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index 5915a22..7ad9d94 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -345,6 +345,7 @@ static struct pcpu_chunk *pcpu_create_chunk(void)
chunk->base_addr = vms[0]->addr - pcpu_group_offsets[0];

pcpu_stats_chunk_alloc();
+ trace_percpu_create_chunk(chunk->base_addr);

return chunk;
}
@@ -352,6 +353,7 @@ static struct pcpu_chunk *pcpu_create_chunk(void)
static void pcpu_destroy_chunk(struct pcpu_chunk *chunk)
{
pcpu_stats_chunk_dealloc();
+ trace_percpu_destroy_chunk(chunk->base_addr);

if (chunk && chunk->data)
pcpu_free_vm_areas(chunk->data, pcpu_nr_groups);
diff --git a/mm/percpu.c b/mm/percpu.c
index 25b4ba5..7a1707a 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -76,6 +76,9 @@
#include <asm/tlbflush.h>
#include <asm/io.h>

+#define CREATE_TRACE_POINTS
+#include <trace/events/percpu.h>
+
#include "percpu-internal.h"

#define PCPU_SLOT_BASE_SHIFT 5 /* 1-31 shares the same slot */
@@ -1015,11 +1018,17 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,

ptr = __addr_to_pcpu_ptr(chunk->base_addr + off);
kmemleak_alloc_percpu(ptr, size, gfp);
+
+ trace_percpu_alloc_percpu(reserved, is_atomic, size, align,
+ chunk->base_addr, off, ptr);
+
return ptr;

fail_unlock:
spin_unlock_irqrestore(&pcpu_lock, flags);
fail:
+ trace_percpu_alloc_percpu_fail(reserved, is_atomic, size, align);
+
if (!is_atomic && warn_limit) {
pr_warn("allocation failed, size=%zu align=%zu atomic=%d, %s\n",
size, align, is_atomic, err);
@@ -1269,6 +1278,8 @@ void free_percpu(void __percpu *ptr)
}
}

+ trace_percpu_free_percpu(chunk->base_addr, off, ptr);
+
spin_unlock_irqrestore(&pcpu_lock, flags);
}
EXPORT_SYMBOL_GPL(free_percpu);
@@ -1719,6 +1730,7 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
pcpu_chunk_relocate(pcpu_first_chunk, -1);

pcpu_stats_chunk_alloc();
+ trace_percpu_create_chunk(base_addr);

/* we're done */
pcpu_base_addr = base_addr;
--
2.9.3
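
With the series applied, the events appear under the tracing directory
(events/percpu/) and can be enabled like any other tracepoint group. Given the
TP_printk() formats above, enabled events would render roughly as follows in
the trace buffer (all numeric values here are invented for illustration):

    percpu_create_chunk: base_addr=ffffe8ffffe00000
    percpu_alloc_percpu: reserved=0 is_atomic=0 size=48 align=8 base_addr=ffffe8ffffe00000 off=16432 ptr=60fb2d342d00
    percpu_free_percpu: base_addr=ffffe8ffffe00000 off=16432 ptr=60fb2d342d00
    percpu_alloc_percpu_fail: reserved=0 is_atomic=1 size=64 align=8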

2017-06-19 23:28:53

by Dennis Zhou

[permalink] [raw]
Subject: [PATCH 3/4] percpu: expose statistics about percpu memory via debugfs

There is limited visibility into the use of percpu memory, leaving us
unable to reason about the correctness of parameters and overall use of
percpu memory. These counters and statistics aim to answer basic
questions about percpu memory, such as the number of allocations over
its lifetime, allocation sizes, and fragmentation.

New Config: PERCPU_STATS

Signed-off-by: Dennis Zhou <[email protected]>
---
mm/Kconfig | 8 ++
mm/Makefile | 1 +
mm/percpu-internal.h | 131 ++++++++++++++++++++++++++++++
mm/percpu-km.c | 4 +
mm/percpu-stats.c | 222 +++++++++++++++++++++++++++++++++++++++++++++++++++
mm/percpu-vm.c | 5 ++
mm/percpu.c | 9 +++
7 files changed, 380 insertions(+)
create mode 100644 mm/percpu-stats.c

diff --git a/mm/Kconfig b/mm/Kconfig
index beb7a45..8fae426 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -706,3 +706,11 @@ config ARCH_USES_HIGH_VMA_FLAGS
bool
config ARCH_HAS_PKEYS
bool
+
+config PERCPU_STATS
+ bool "Collect percpu memory statistics"
+ default n
+ help
+ This feature collects and exposes statistics via debugfs. The
+ information includes global and per chunk statistics, which can
+ be used to help understand percpu memory usage.
diff --git a/mm/Makefile b/mm/Makefile
index 026f6a8..411bd24 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -103,3 +103,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
+obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index 8b6cb2a..5509593 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -5,6 +5,11 @@
#include <linux/percpu.h>

struct pcpu_chunk {
+#ifdef CONFIG_PERCPU_STATS
+ int nr_alloc; /* # of allocations */
+ size_t max_alloc_size; /* largest allocation size */
+#endif
+
struct list_head list; /* linked to pcpu_slot lists */
int free_size; /* free bytes in the chunk */
int contig_hint; /* max contiguous size hint */
@@ -18,6 +23,11 @@ struct pcpu_chunk {
void *data; /* chunk data */
int first_free; /* no free below this */
bool immutable; /* no [de]population allowed */
+ bool has_reserved; /* Indicates if chunk has reserved space
+ at the beginning. Reserved chunk will
+ contain reservation for static chunk.
+ Dynamic chunk will contain reservation
+ for static and reserved chunks. */
int nr_populated; /* # of populated pages */
unsigned long populated[]; /* populated bitmap */
};
@@ -30,4 +40,125 @@ extern int pcpu_nr_slots __read_mostly;
extern struct pcpu_chunk *pcpu_first_chunk;
extern struct pcpu_chunk *pcpu_reserved_chunk;

+#ifdef CONFIG_PERCPU_STATS
+
+#include <linux/spinlock.h>
+
+struct percpu_stats {
+ u64 nr_alloc; /* lifetime # of allocations */
+ u64 nr_dealloc; /* lifetime # of deallocations */
+ u64 nr_cur_alloc; /* current # of allocations */
+ u64 nr_max_alloc; /* max # of live allocations */
+ u32 nr_chunks; /* current # of live chunks */
+ u32 nr_max_chunks; /* max # of live chunks */
+ size_t min_alloc_size; /* min allocation size */
+ size_t max_alloc_size; /* max allocation size */
+};
+
+extern struct percpu_stats pcpu_stats;
+extern struct pcpu_alloc_info pcpu_stats_ai;
+
+/*
+ * For debug purposes. We don't care about the flexible array.
+ */
+static inline void pcpu_stats_save_ai(const struct pcpu_alloc_info *ai)
+{
+ memcpy(&pcpu_stats_ai, ai, sizeof(struct pcpu_alloc_info));
+
+ /* initialize min_alloc_size to unit_size */
+ pcpu_stats.min_alloc_size = pcpu_stats_ai.unit_size;
+}
+
+/*
+ * pcpu_stats_area_alloc - increment area allocation stats
+ * @chunk: the location of the area being allocated
+ * @size: size of area to allocate in bytes
+ *
+ * CONTEXT:
+ * pcpu_lock.
+ */
+static inline void pcpu_stats_area_alloc(struct pcpu_chunk *chunk, size_t size)
+{
+ lockdep_assert_held(&pcpu_lock);
+
+ pcpu_stats.nr_alloc++;
+ pcpu_stats.nr_cur_alloc++;
+ pcpu_stats.nr_max_alloc =
+ max(pcpu_stats.nr_max_alloc, pcpu_stats.nr_cur_alloc);
+ pcpu_stats.min_alloc_size =
+ min(pcpu_stats.min_alloc_size, size);
+ pcpu_stats.max_alloc_size =
+ max(pcpu_stats.max_alloc_size, size);
+
+ chunk->nr_alloc++;
+ chunk->max_alloc_size = max(chunk->max_alloc_size, size);
+}
+
+/*
+ * pcpu_stats_area_dealloc - decrement allocation stats
+ * @chunk: the location of the area being deallocated
+ *
+ * CONTEXT:
+ * pcpu_lock.
+ */
+static inline void pcpu_stats_area_dealloc(struct pcpu_chunk *chunk)
+{
+ lockdep_assert_held(&pcpu_lock);
+
+ pcpu_stats.nr_dealloc++;
+ pcpu_stats.nr_cur_alloc--;
+
+ chunk->nr_alloc--;
+}
+
+/*
+ * pcpu_stats_chunk_alloc - increment chunk stats
+ */
+static inline void pcpu_stats_chunk_alloc(void)
+{
+ spin_lock_irq(&pcpu_lock);
+
+ pcpu_stats.nr_chunks++;
+ pcpu_stats.nr_max_chunks =
+ max(pcpu_stats.nr_max_chunks, pcpu_stats.nr_chunks);
+
+ spin_unlock_irq(&pcpu_lock);
+}
+
+/*
+ * pcpu_stats_chunk_dealloc - decrement chunk stats
+ */
+static inline void pcpu_stats_chunk_dealloc(void)
+{
+ spin_lock_irq(&pcpu_lock);
+
+ pcpu_stats.nr_chunks--;
+
+ spin_unlock_irq(&pcpu_lock);
+}
+
+#else
+
+static inline void pcpu_stats_save_ai(const struct pcpu_alloc_info *ai)
+{
+}
+
+static inline void pcpu_stats_area_alloc(struct pcpu_chunk *chunk, size_t size)
+{
+}
+
+static inline void pcpu_stats_area_dealloc(struct pcpu_chunk *chunk)
+{
+}
+
+static inline void pcpu_stats_chunk_alloc(void)
+{
+}
+
+static inline void pcpu_stats_chunk_dealloc(void)
+{
+}
+
+#endif /* !CONFIG_PERCPU_STATS */
+
#endif
diff --git a/mm/percpu-km.c b/mm/percpu-km.c
index d66911f..3bbfa0c 100644
--- a/mm/percpu-km.c
+++ b/mm/percpu-km.c
@@ -72,6 +72,8 @@ static struct pcpu_chunk *pcpu_create_chunk(void)
pcpu_chunk_populated(chunk, 0, nr_pages);
spin_unlock_irq(&pcpu_lock);

+ pcpu_stats_chunk_alloc();
+
return chunk;
}

@@ -79,6 +81,8 @@ static void pcpu_destroy_chunk(struct pcpu_chunk *chunk)
{
const int nr_pages = pcpu_group_sizes[0] >> PAGE_SHIFT;

+ pcpu_stats_chunk_dealloc();
+
if (chunk && chunk->data)
__free_pages(chunk->data, order_base_2(nr_pages));
pcpu_free_chunk(chunk);
diff --git a/mm/percpu-stats.c b/mm/percpu-stats.c
new file mode 100644
index 0000000..03524a5
--- /dev/null
+++ b/mm/percpu-stats.c
@@ -0,0 +1,222 @@
+/*
+ * mm/percpu-stats.c
+ *
+ * Copyright (C) 2017 Facebook Inc.
+ * Copyright (C) 2017 Dennis Zhou <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * Prints statistics about the percpu allocator and backing chunks.
+ */
+#include <linux/debugfs.h>
+#include <linux/list.h>
+#include <linux/percpu.h>
+#include <linux/seq_file.h>
+#include <linux/sort.h>
+#include <linux/vmalloc.h>
+
+#include "percpu-internal.h"
+
+#define P(X, Y) \
+ seq_printf(m, " %-24s: %8lld\n", X, (long long int)Y)
+
+struct percpu_stats pcpu_stats;
+struct pcpu_alloc_info pcpu_stats_ai;
+
+static int cmpint(const void *a, const void *b)
+{
+ return *(int *)a - *(int *)b;
+}
+
+/*
+ * Iterates over all chunks to find the max # of map entries used.
+ */
+static int find_max_map_used(void)
+{
+ struct pcpu_chunk *chunk;
+ int slot, max_map_used;
+
+ max_map_used = 0;
+ for (slot = 0; slot < pcpu_nr_slots; slot++)
+ list_for_each_entry(chunk, &pcpu_slot[slot], list)
+ max_map_used = max(max_map_used, chunk->map_used);
+
+ return max_map_used;
+}
+
+/*
+ * Prints out chunk state. Fragmentation is considered between
+ * the beginning of the chunk to the last allocation.
+ */
+static void chunk_map_stats(struct seq_file *m, struct pcpu_chunk *chunk,
+ void *buffer)
+{
+ int i, s_index, last_alloc, alloc_sign, as_len;
+ int *alloc_sizes, *p;
+ /* statistics */
+ int sum_frag = 0, max_frag = 0;
+ int cur_min_alloc = 0, cur_med_alloc = 0, cur_max_alloc = 0;
+
+ alloc_sizes = buffer;
+ s_index = chunk->has_reserved ? 1 : 0;
+
+ /* find last allocation */
+ last_alloc = -1;
+ for (i = chunk->map_used - 1; i >= s_index; i--) {
+ if (chunk->map[i] & 1) {
+ last_alloc = i;
+ break;
+ }
+ }
+
+ /* if the chunk is not empty - ignoring reserve */
+ if (last_alloc >= s_index) {
+ as_len = last_alloc + 1 - s_index;
+
+ /*
+ * Iterate through chunk map computing size info.
+ * The first bit is overloaded to be a used flag.
+ * negative = free space, positive = allocated
+ */
+ for (i = 0, p = chunk->map + s_index; i < as_len; i++, p++) {
+ alloc_sign = (*p & 1) ? 1 : -1;
+ alloc_sizes[i] = alloc_sign *
+ ((p[1] & ~1) - (p[0] & ~1));
+ }
+
+ sort(alloc_sizes, as_len, sizeof(chunk->map[0]), cmpint, NULL);
+
+ /* Iterate through the unallocated fragments. */
+ for (i = 0, p = alloc_sizes; *p < 0 && i < as_len; i++, p++) {
+ sum_frag -= *p;
+ max_frag = max(max_frag, -1 * (*p));
+ }
+
+ cur_min_alloc = alloc_sizes[i];
+ cur_med_alloc = alloc_sizes[(i + as_len - 1) / 2];
+ cur_max_alloc = alloc_sizes[as_len - 1];
+ }
+
+ P("nr_alloc", chunk->nr_alloc);
+ P("max_alloc_size", chunk->max_alloc_size);
+ P("free_size", chunk->free_size);
+ P("contig_hint", chunk->contig_hint);
+ P("sum_frag", sum_frag);
+ P("max_frag", max_frag);
+ P("cur_min_alloc", cur_min_alloc);
+ P("cur_med_alloc", cur_med_alloc);
+ P("cur_max_alloc", cur_max_alloc);
+ seq_putc(m, '\n');
+}
+
+static int percpu_stats_show(struct seq_file *m, void *v)
+{
+ struct pcpu_chunk *chunk;
+ int slot, max_map_used;
+ void *buffer;
+
+alloc_buffer:
+ spin_lock_irq(&pcpu_lock);
+ max_map_used = find_max_map_used();
+ spin_unlock_irq(&pcpu_lock);
+
+ buffer = vmalloc(max_map_used * sizeof(pcpu_first_chunk->map[0]));
+ if (!buffer)
+ return -ENOMEM;
+
+ spin_lock_irq(&pcpu_lock);
+
+ /* if the buffer allocated earlier is too small */
+ if (max_map_used < find_max_map_used()) {
+ spin_unlock_irq(&pcpu_lock);
+ vfree(buffer);
+ goto alloc_buffer;
+ }
+
+#define PL(X) \
+ seq_printf(m, " %-24s: %8lld\n", #X, (long long int)pcpu_stats_ai.X)
+
+ seq_printf(m,
+ "Percpu Memory Statistics\n"
+ "Allocation Info:\n"
+ "----------------------------------------\n");
+ PL(unit_size);
+ PL(static_size);
+ PL(reserved_size);
+ PL(dyn_size);
+ PL(atom_size);
+ PL(alloc_size);
+ seq_putc(m, '\n');
+
+#undef PL
+
+#define PU(X) \
+ seq_printf(m, " %-18s: %14llu\n", #X, (unsigned long long)pcpu_stats.X)
+
+ seq_printf(m,
+ "Global Stats:\n"
+ "----------------------------------------\n");
+ PU(nr_alloc);
+ PU(nr_dealloc);
+ PU(nr_cur_alloc);
+ PU(nr_max_alloc);
+ PU(nr_chunks);
+ PU(nr_max_chunks);
+ PU(min_alloc_size);
+ PU(max_alloc_size);
+ seq_putc(m, '\n');
+
+#undef PU
+
+ seq_printf(m,
+ "Per Chunk Stats:\n"
+ "----------------------------------------\n");
+
+ if (pcpu_reserved_chunk) {
+ seq_puts(m, "Chunk: <- Reserved Chunk\n");
+ chunk_map_stats(m, pcpu_reserved_chunk, buffer);
+ }
+
+ for (slot = 0; slot < pcpu_nr_slots; slot++) {
+ list_for_each_entry(chunk, &pcpu_slot[slot], list) {
+ if (chunk == pcpu_first_chunk) {
+ seq_puts(m, "Chunk: <- First Chunk\n");
+ chunk_map_stats(m, chunk, buffer);
+
+
+ } else {
+ seq_puts(m, "Chunk:\n");
+ chunk_map_stats(m, chunk, buffer);
+ }
+
+ }
+ }
+
+ spin_unlock_irq(&pcpu_lock);
+
+ vfree(buffer);
+
+ return 0;
+}
+
+static int percpu_stats_open(struct inode *inode, struct file *filp)
+{
+ return single_open(filp, percpu_stats_show, NULL);
+}
+
+static const struct file_operations percpu_stats_fops = {
+ .open = percpu_stats_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int __init init_percpu_stats_debugfs(void)
+{
+ debugfs_create_file("percpu_stats", 0444, NULL, NULL,
+ &percpu_stats_fops);
+
+ return 0;
+}
+
+late_initcall(init_percpu_stats_debugfs);
diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index 9ac6394..5915a22 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -343,11 +343,16 @@ static struct pcpu_chunk *pcpu_create_chunk(void)

chunk->data = vms;
chunk->base_addr = vms[0]->addr - pcpu_group_offsets[0];
+
+ pcpu_stats_chunk_alloc();
+
return chunk;
}

static void pcpu_destroy_chunk(struct pcpu_chunk *chunk)
{
+ pcpu_stats_chunk_dealloc();
+
if (chunk && chunk->data)
pcpu_free_vm_areas(chunk->data, pcpu_nr_groups);
pcpu_free_chunk(chunk);
diff --git a/mm/percpu.c b/mm/percpu.c
index 5cf7d73..25b4ba5 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -657,6 +657,7 @@ static void pcpu_free_area(struct pcpu_chunk *chunk, int freeme,
int *p;

lockdep_assert_held(&pcpu_lock);
+ pcpu_stats_area_dealloc(chunk);

freeme |= 1; /* we are searching for <given offset, in use> pair */

@@ -721,6 +722,7 @@ static struct pcpu_chunk *pcpu_alloc_chunk(void)
chunk->map[0] = 0;
chunk->map[1] = pcpu_unit_size | 1;
chunk->map_used = 1;
+ chunk->has_reserved = false;

INIT_LIST_HEAD(&chunk->list);
INIT_LIST_HEAD(&chunk->map_extend_list);
@@ -970,6 +972,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
goto restart;

area_found:
+ pcpu_stats_area_alloc(chunk, size);
spin_unlock_irqrestore(&pcpu_lock, flags);

/* populate if not all pages are already there */
@@ -1642,6 +1645,8 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
pcpu_chunk_struct_size = sizeof(struct pcpu_chunk) +
BITS_TO_LONGS(pcpu_unit_pages) * sizeof(unsigned long);

+ pcpu_stats_save_ai(ai);
+
/*
* Allocate chunk slots. The additional last slot is for
* empty chunks.
@@ -1685,6 +1690,7 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
if (schunk->free_size)
schunk->map[++schunk->map_used] = ai->static_size + schunk->free_size;
schunk->map[schunk->map_used] |= 1;
+ schunk->has_reserved = true;

/* init dynamic chunk if necessary */
if (dyn_size) {
@@ -1703,6 +1709,7 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
dchunk->map[1] = pcpu_reserved_chunk_limit;
dchunk->map[2] = (pcpu_reserved_chunk_limit + dchunk->free_size) | 1;
dchunk->map_used = 2;
+ dchunk->has_reserved = true;
}

/* link the first chunk in */
@@ -1711,6 +1718,8 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
pcpu_count_occupied_pages(pcpu_first_chunk, 1);
pcpu_chunk_relocate(pcpu_first_chunk, -1);

+ pcpu_stats_chunk_alloc();
+
/* we're done */
pcpu_base_addr = base_addr;
return 0;
--
2.9.3
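
Two notes that may help when reading chunk_map_stats(). First, the area map
encoding it walks: each map[] entry holds an area's byte offset, with bit 0
doubling as an in-use flag, so area i spans (map[i] & ~1) to (map[i+1] & ~1)
and is allocated iff map[i] & 1. As a hypothetical example, a fresh chunk with
a 4096-byte unit starts as map[] = { 0, 4096|1 } (one free area plus the
in-use sentry); after a 128-byte allocation at offset 0 it becomes
map[] = { 0|1, 128, 4096|1 }.

Second, with PERCPU_STATS enabled, reading /sys/kernel/debug/percpu_stats
produces output shaped by the P()/PL()/PU() macros above — roughly the
following, with invented numbers and trailing sections elided:

    Percpu Memory Statistics
    Allocation Info:
    ----------------------------------------
     unit_size               :    65536
     static_size             :    12288
     reserved_size           :     8192
     dyn_size                :    20480
     atom_size               :     4096
     alloc_size              :    65536

    Global Stats:
    ----------------------------------------
     nr_alloc          :           4672
     nr_dealloc        :           4506
     nr_cur_alloc      :            166
     ...

    Per Chunk Stats:
    ----------------------------------------
    Chunk: <- First Chunk
     nr_alloc                :      141
     max_alloc_size          :     1928
     ...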

2017-06-19 23:29:25

by Dennis Zhou

[permalink] [raw]
Subject: [PATCH 2/4] percpu: migrate percpu data structures to internal header

Migrates the pcpu_chunk definition and a few percpu static variables from
mm/percpu.c to an internal header file. These will be used with debugfs
to expose statistics about percpu memory, improving visibility into
allocations and fragmentation.

Signed-off-by: Dennis Zhou <[email protected]>
---
mm/percpu-internal.h | 33 +++++++++++++++++++++++++++++++++
mm/percpu.c | 30 +++++++-----------------------
2 files changed, 40 insertions(+), 23 deletions(-)
create mode 100644 mm/percpu-internal.h

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
new file mode 100644
index 0000000..8b6cb2a
--- /dev/null
+++ b/mm/percpu-internal.h
@@ -0,0 +1,33 @@
+#ifndef _MM_PERCPU_INTERNAL_H
+#define _MM_PERCPU_INTERNAL_H
+
+#include <linux/types.h>
+#include <linux/percpu.h>
+
+struct pcpu_chunk {
+ struct list_head list; /* linked to pcpu_slot lists */
+ int free_size; /* free bytes in the chunk */
+ int contig_hint; /* max contiguous size hint */
+ void *base_addr; /* base address of this chunk */
+
+ int map_used; /* # of map entries used before the sentry */
+ int map_alloc; /* # of map entries allocated */
+ int *map; /* allocation map */
+ struct list_head map_extend_list;/* on pcpu_map_extend_chunks */
+
+ void *data; /* chunk data */
+ int first_free; /* no free below this */
+ bool immutable; /* no [de]population allowed */
+ int nr_populated; /* # of populated pages */
+ unsigned long populated[]; /* populated bitmap */
+};
+
+extern spinlock_t pcpu_lock;
+
+extern struct list_head *pcpu_slot __read_mostly;
+extern int pcpu_nr_slots __read_mostly;
+
+extern struct pcpu_chunk *pcpu_first_chunk;
+extern struct pcpu_chunk *pcpu_reserved_chunk;
+
+#endif
diff --git a/mm/percpu.c b/mm/percpu.c
index f94a5eb..5cf7d73 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -76,6 +76,8 @@
#include <asm/tlbflush.h>
#include <asm/io.h>

+#include "percpu-internal.h"
+
#define PCPU_SLOT_BASE_SHIFT 5 /* 1-31 shares the same slot */
#define PCPU_DFL_MAP_ALLOC 16 /* start a map with 16 ents */
#define PCPU_ATOMIC_MAP_MARGIN_LOW 32
@@ -103,29 +105,11 @@
#define __pcpu_ptr_to_addr(ptr) (void __force *)(ptr)
#endif /* CONFIG_SMP */

-struct pcpu_chunk {
- struct list_head list; /* linked to pcpu_slot lists */
- int free_size; /* free bytes in the chunk */
- int contig_hint; /* max contiguous size hint */
- void *base_addr; /* base address of this chunk */
-
- int map_used; /* # of map entries used before the sentry */
- int map_alloc; /* # of map entries allocated */
- int *map; /* allocation map */
- struct list_head map_extend_list;/* on pcpu_map_extend_chunks */
-
- void *data; /* chunk data */
- int first_free; /* no free below this */
- bool immutable; /* no [de]population allowed */
- int nr_populated; /* # of populated pages */
- unsigned long populated[]; /* populated bitmap */
-};
-
static int pcpu_unit_pages __read_mostly;
static int pcpu_unit_size __read_mostly;
static int pcpu_nr_units __read_mostly;
static int pcpu_atom_size __read_mostly;
-static int pcpu_nr_slots __read_mostly;
+int pcpu_nr_slots __read_mostly;
static size_t pcpu_chunk_struct_size __read_mostly;

/* cpus with the lowest and highest unit addresses */
@@ -149,7 +133,7 @@ static const size_t *pcpu_group_sizes __read_mostly;
* chunks, this one can be allocated and mapped in several different
* ways and thus often doesn't live in the vmalloc area.
*/
-static struct pcpu_chunk *pcpu_first_chunk;
+struct pcpu_chunk *pcpu_first_chunk;

/*
* Optional reserved chunk. This chunk reserves part of the first
@@ -158,13 +142,13 @@ static struct pcpu_chunk *pcpu_first_chunk;
* area doesn't exist, the following variables contain NULL and 0
* respectively.
*/
-static struct pcpu_chunk *pcpu_reserved_chunk;
+struct pcpu_chunk *pcpu_reserved_chunk;
static int pcpu_reserved_chunk_limit;

-static DEFINE_SPINLOCK(pcpu_lock); /* all internal data structures */
+DEFINE_SPINLOCK(pcpu_lock); /* all internal data structures */
static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop, map ext */

-static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
+struct list_head *pcpu_slot __read_mostly; /* chunk list slots */

/* chunks which need their map areas extended, protected by pcpu_lock */
static LIST_HEAD(pcpu_map_extend_chunks);
--
2.9.3

2017-06-20 17:45:25

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 0/4] percpu: add basic stats and tracepoints to percpu allocator

On Mon, Jun 19, 2017 at 07:28:28PM -0400, Dennis Zhou wrote:
> There is limited visibility into the percpu memory allocator, making it hard to
> understand usage patterns. Without concrete numbers, we are left to conjecture
> about the correctness of percpu memory use. Additionally, there is no mechanism
> to review the correctness or efficiency of the current implementation.
>
> This patchset addresses the following:
> - Adds basic statistics to reason about the number of allocations over the
> lifetime, allocation sizes, and fragmentation.
> - Adds tracepoints to enable better debug capabilities as well as the ability
> to review allocation requests and corresponding decisions.
>
> This patchset contains the following four patches:
> 0001-percpu-add-missing-lockdep_assert_held-to-func-pcpu_.patch
> 0002-percpu-migrate-percpu-data-structures-to-internal-he.patch
> 0003-percpu-expose-statistics-about-percpu-memory-via-deb.patch
> 0004-percpu-add-tracepoint-support-for-percpu-memory.patch

Applied to percpu/for-4.13. I had to update 0002 because of the
recent __ro_after_init changes. Can you please see whether I made any
mistakes while updating it?

git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu.git for-4.13

Thanks.

--
tejun

2017-06-20 19:12:56

by Dennis Zhou

[permalink] [raw]
Subject: Re: [PATCH 0/4] percpu: add basic stats and tracepoints to percpu allocator

On 6/20/17, 1:45 PM, "Tejun Heo" <[email protected] on behalf of [email protected]> wrote:
> Applied to percpu/for-4.13. I had to update 0002 because of the
> recent __ro_after_init changes. Can you please see whether I made any
> mistakes while updating it?

There is a tagging mismatch in 0002. Can you please change or remove the __read_mostly annotation in mm/percpu-internal.h?

Thanks,
Dennis


2017-06-20 19:32:41

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 0/4] percpu: add basic stats and tracepoints to percpu allocator

On Tue, Jun 20, 2017 at 07:12:49PM +0000, Dennis Zhou wrote:
> On 6/20/17, 1:45 PM, "Tejun Heo" <[email protected] on behalf of [email protected]> wrote:
> > Applied to percpu/for-4.13. I had to update 0002 because of the
> > recent __ro_after_init changes. Can you please see whether I made any
> > mistakes while updating it?
>
> There is a tagging mismatch in 0002. Can you please change or remove the __read_mostly annotation in mm/percpu-internal.h?

Fixed. Thanks.

--
tejun

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH 4/4] percpu: add tracepoint support for percpu memory

On Mon, Jun 19, 2017 at 07:28:32PM -0400, Dennis Zhou wrote:
> Add support for tracepoints to the following events: chunk allocation,
> chunk free, area allocation, area free, and area allocation failure.
> This should let us replay percpu memory requests and evaluate
> corresponding decisions.

This patch breaks boot for me:

[ 0.000000] DEBUG_LOCKS_WARN_ON(unlikely(early_boot_irqs_disabled))
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:2741 trace_hardirqs_on_caller.cold.58+0x47/0x4e
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc6-next-20170621+ #155
[ 0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
[ 0.000000] task: ffffffffb7831180 task.stack: ffffffffb7800000
[ 0.000000] RIP: 0010:trace_hardirqs_on_caller.cold.58+0x47/0x4e
[ 0.000000] RSP: 0000:ffffffffb78079d0 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
[ 0.000000] RAX: 0000000000000037 RBX: 0000000000000003 RCX: 0000000000000000
[ 0.000000] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 1ffffffff6f00ef6
[ 0.000000] RBP: ffffffffb78079e0 R08: 0000000000000000 R09: ffffffffb7831180
[ 0.000000] R10: 0000000000000000 R11: ffffffffb24e96ce R12: ffffffffb6b39b87
[ 0.000000] R13: 00000000001f0001 R14: ffffffffb85603a0 R15: 0000000000002000
[ 0.000000] FS: 0000000000000000(0000) GS:ffffffffb81be000(0000) knlGS:0000000000000000
[ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.000000] CR2: ffff88007fbff000 CR3: 000000006b828000 CR4: 00000000000406b0
[ 0.000000] Call Trace:
[ 0.000000] trace_hardirqs_on+0xd/0x10
[ 0.000000] _raw_spin_unlock_irq+0x27/0x50
[ 0.000000] pcpu_setup_first_chunk+0x19c2/0x1c27
[ 0.000000] ? pcpu_free_alloc_info+0x4b/0x4b
[ 0.000000] ? vprintk_emit+0x403/0x480
[ 0.000000] ? __down_trylock_console_sem+0xb7/0xc0
[ 0.000000] ? __down_trylock_console_sem+0x6e/0xc0
[ 0.000000] ? vprintk_emit+0x362/0x480
[ 0.000000] ? vprintk_default+0x28/0x30
[ 0.000000] ? printk+0xb2/0xdd
[ 0.000000] ? snapshot_ioctl.cold.1+0x19/0x19
[ 0.000000] ? __alloc_bootmem_node_nopanic+0x88/0x96
[ 0.000000] pcpu_embed_first_chunk+0x7b0/0x8ef
[ 0.000000] ? pcpup_populate_pte+0xb/0xb
[ 0.000000] setup_per_cpu_areas+0x105/0x6d9
[ 0.000000] ? find_last_bit+0xa6/0xd0
[ 0.000000] start_kernel+0x25e/0x78f
[ 0.000000] ? thread_stack_cache_init+0xb/0xb
[ 0.000000] ? early_idt_handler_common+0x3b/0x52
[ 0.000000] ? early_idt_handler_array+0x120/0x120
[ 0.000000] ? early_idt_handler_array+0x120/0x120
[ 0.000000] x86_64_start_reservations+0x24/0x26
[ 0.000000] x86_64_start_kernel+0x143/0x166
[ 0.000000] secondary_startup_64+0x9f/0x9f
[ 0.000000] Code: c6 a0 49 c6 b6 48 c7 c7 e0 49 c6 b6 e8 43 34 00 00 0f ff e9 ed 71 ce ff 48 c7 c6 c0 79 c6 b6 48 c7 c7 e0 49 c6 b6 e8 29 34 00 00 <0f> ff e9 d3 71 ce ff 48 c7 c6 20 7c c6 b6 48 c7 c7 e0 49 c6 b6
[ 0.000000] random: print_oops_end_marker+0x30/0x50 get_random_bytes called with crng_init=0
[ 0.000000] ---[ end trace f68728a0d3053b52 ]---
[ 0.000000] BUG: unable to handle kernel paging request at 00000000ffffffff
[ 0.000000] IP: native_write_msr+0x6/0x30
[ 0.000000] PGD 0
[ 0.000000] P4D 0
[ 0.000000]
[ 0.000000] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W 4.12.0-rc6-next-20170621+ #155
[ 0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
[ 0.000000] task: ffffffffb7831180 task.stack: ffffffffb7800000
[ 0.000000] RIP: 0010:native_write_msr+0x6/0x30
[ 0.000000] RSP: 0000:ffffffffb7807dc8 EFLAGS: 00010202
[ 0.000000] RAX: 000000003ea15d43 RBX: ffff88003ea15d40 RCX: 000000004b564d02
[ 0.000000] RDX: 0000000000000000 RSI: 000000003ea15d43 RDI: 000000004b564d02
[ 0.000000] RBP: ffffffffb7807df0 R08: 0000000000000040 R09: 0000000000000000
[ 0.000000] R10: 0000000000007100 R11: 000000007ffd6f00 R12: 0000000000000000
[ 0.000000] R13: 1ffffffff6f00fc3 R14: ffffffffb7807eb8 R15: dffffc0000000000
[ 0.000000] FS: 0000000000000000(0000) GS:ffff88003ea00000(0000) knlGS:0000000000000000
[ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.000000] CR2: 00000000ffffffff CR3: 000000006b828000 CR4: 00000000000406b0
[ 0.000000] Call Trace:
[ 0.000000] ? kvm_guest_cpu_init+0x155/0x220
[ 0.000000] kvm_smp_prepare_boot_cpu+0x9/0x10
[ 0.000000] start_kernel+0x28c/0x78f
[ 0.000000] ? thread_stack_cache_init+0xb/0xb
[ 0.000000] ? early_idt_handler_common+0x3b/0x52
[ 0.000000] ? early_idt_handler_array+0x120/0x120
[ 0.000000] ? early_idt_handler_array+0x120/0x120
[ 0.000000] x86_64_start_reservations+0x24/0x26
[ 0.000000] x86_64_start_kernel+0x143/0x166
[ 0.000000] secondary_startup_64+0x9f/0x9f
[ 0.000000] Code: c3 0f 21 c8 5d c3 0f 21 d0 5d c3 0f 21 d8 5d c3 0f 21 f0 5d c3 0f 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 89 f9 89 f0 0f 30 <0f> 1f 44 00 00 c3 48 89 d6 55 89 c2 48 c1 e6 20 48 89 e5 48 09
[ 0.000000] RIP: native_write_msr+0x6/0x30 RSP: ffffffffb7807dc8
[ 0.000000] CR2: 00000000ffffffff
[ 0.000000] ---[ end trace f68728a0d3053b53 ]---
[ 0.000000] Kernel panic - not syncing: Fatal exception
[ 0.000000] ---[ end Kernel panic - not syncing: Fatal exception

--

Thanks,
Sasha

2017-06-21 17:53:21

by Dennis Zhou

[permalink] [raw]
Subject: [PATCH 1/1] percpu: fix early calls for spinlock in pcpu_stats

From 2c06e795162cb306c9707ec51d3e1deadb37f573 Mon Sep 17 00:00:00 2001
From: Dennis Zhou <[email protected]>
Date: Wed, 21 Jun 2017 10:17:09 -0700

Commit 30a5b5367ef9 ("percpu: expose statistics about percpu memory via
debugfs") introduces percpu memory statistics. pcpu_stats_chunk_alloc
takes the spin lock and disables/enables irqs on creation of a chunk.
Irqs are not yet enabled when the first chunk is initialized, so kernels
fail to boot with kernel debugging enabled. Fix this by changing _irq to
_irqsave and _irqrestore.

Fixes: 30a5b5367ef9 ("percpu: expose statistics about percpu memory via debugfs")
Signed-off-by: Dennis Zhou <[email protected]>
Reported-by: Alexander Levin <[email protected]>
---

Hi Sasha,

The root cause was in 0003 of that series, where I prematurely enabled
irqs; the problem is addressed here. I am able to boot with debug
options enabled.

Thanks,
Dennis

mm/percpu-internal.h | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index d030fce..cd2442e 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -116,13 +116,14 @@ static inline void pcpu_stats_area_dealloc(struct pcpu_chunk *chunk)
*/
static inline void pcpu_stats_chunk_alloc(void)
{
- spin_lock_irq(&pcpu_lock);
+ unsigned long flags;
+ spin_lock_irqsave(&pcpu_lock, flags);

pcpu_stats.nr_chunks++;
pcpu_stats.nr_max_chunks =
max(pcpu_stats.nr_max_chunks, pcpu_stats.nr_chunks);

- spin_unlock_irq(&pcpu_lock);
+ spin_unlock_irqrestore(&pcpu_lock, flags);
}

/*
@@ -130,11 +131,12 @@ static inline void pcpu_stats_chunk_alloc(void)
*/
static inline void pcpu_stats_chunk_dealloc(void)
{
- spin_lock_irq(&pcpu_lock);
+ unsigned long flags;
+ spin_lock_irqsave(&pcpu_lock, flags);

pcpu_stats.nr_chunks--;

- spin_unlock_irq(&pcpu_lock);
+ spin_unlock_irqrestore(&pcpu_lock, flags);
}

#else
--
2.9.3
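
The underlying rule, in a brief illustrative sketch: spin_unlock_irq()
unconditionally re-enables interrupts, so the _irq variants are only safe
where irqs are known to have been enabled on entry. During early boot that
assumption does not hold, which is what the lockdep splat above caught.

    static void demo_stats_update(void)     /* illustrative */
    {
        unsigned long flags;

        /*
         * Wrong before local_irq_enable(): spin_unlock_irq()
         * turns interrupts on unconditionally on the way out.
         *
         *      spin_lock_irq(&pcpu_lock);
         *      ...
         *      spin_unlock_irq(&pcpu_lock);
         */

        /* Safe in any context: the prior irq state is saved and restored. */
        spin_lock_irqsave(&pcpu_lock, flags);
        /* ... update pcpu_stats ... */
        spin_unlock_irqrestore(&pcpu_lock, flags);
    }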

2017-06-21 17:54:42

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 1/1] percpu: fix early calls for spinlock in pcpu_stats

On Wed, Jun 21, 2017 at 01:52:46PM -0400, Dennis Zhou wrote:
> From 2c06e795162cb306c9707ec51d3e1deadb37f573 Mon Sep 17 00:00:00 2001
> From: Dennis Zhou <[email protected]>
> Date: Wed, 21 Jun 2017 10:17:09 -0700
>
> Commit 30a5b5367ef9 ("percpu: expose statistics about percpu memory via
> debugfs") introduces percpu memory statistics. pcpu_stats_chunk_alloc
> takes the spin lock and disables/enables irqs on creation of a chunk.
> Irqs are not yet enabled when the first chunk is initialized, so kernels
> fail to boot with kernel debugging enabled. Fix this by changing _irq to
> _irqsave and _irqrestore.
>
> Fixes: 30a5b5367ef9 ("percpu: expose statistics about percpu memory via debugfs")
> Signed-off-by: Dennis Zhou <[email protected]>
> Reported-by: Alexander Levin <[email protected]>

Applied to percpu/for-4.13.

Thanks.

--
tejun

2017-07-07 08:16:05

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH 3/4] percpu: expose statistics about percpu memory via debugfs

Hi Dennis,

On Tue, Jun 20, 2017 at 1:28 AM, Dennis Zhou <[email protected]> wrote:
> There is limited visibility into the use of percpu memory, leaving us
> unable to reason about the correctness of parameters and overall use of
> percpu memory. These counters and statistics aim to answer basic
> questions about percpu memory, such as the number of allocations over
> its lifetime, allocation sizes, and fragmentation.
>
> New Config: PERCPU_STATS
>
> Signed-off-by: Dennis Zhou <[email protected]>
> ---
> mm/Kconfig | 8 ++
> mm/Makefile | 1 +
> mm/percpu-internal.h | 131 ++++++++++++++++++++++++++++++
> mm/percpu-km.c | 4 +
> mm/percpu-stats.c | 222 +++++++++++++++++++++++++++++++++++++++++++++++++++
> mm/percpu-vm.c | 5 ++
> mm/percpu.c | 9 +++
> 7 files changed, 380 insertions(+)
> create mode 100644 mm/percpu-stats.c
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index beb7a45..8fae426 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -706,3 +706,11 @@ config ARCH_USES_HIGH_VMA_FLAGS
> bool
> config ARCH_HAS_PKEYS
> bool
> +
> +config PERCPU_STATS
> + bool "Collect percpu memory statistics"
> + default n
> + help
> + This feature collects and exposes statistics via debugfs. The
> + information includes global and per chunk statistics, which can
> + be used to help understand percpu memory usage.

Just wondering: does this option make sense to enable on !SMP?

If not, you may want to make it depend on SMP.

Thanks!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2017-07-08 20:33:54

by Dennis Zhou

[permalink] [raw]
Subject: Re: [PATCH 3/4] percpu: expose statistics about percpu memory via debugfs

On Fri, Jul 07, 2017 at 10:16:01AM +0200, Geert Uytterhoeven wrote:
> Hi Dennis,
>
> On Tue, Jun 20, 2017 at 1:28 AM, Dennis Zhou <[email protected]> wrote:
>
> Just wondering: does this option make sense to enable on !SMP?
>
> If not, you may want to make it depend on SMP.
>
> Thanks!
>
> Gr{oetje,eeting}s,
>
> Geert

Hi Geert,

The percpu allocator is still used on UP configs, so it would still
provide useful data.

Thanks,
Dennis
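
For reference, the dynamic percpu API is identical on UP — a minimal sketch
(illustrative names, not from the series):

    #include <linux/errno.h>
    #include <linux/percpu.h>

    static int demo_percpu_counter(void)
    {
        /* One copy per possible CPU on SMP; a single copy on UP. */
        int __percpu *cnt = alloc_percpu(int);

        if (!cnt)
            return -ENOMEM;

        /* cpu 0 is the only CPU on a UP kernel. */
        *per_cpu_ptr(cnt, 0) += 1;

        free_percpu(cnt);
        return 0;
    }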