Impact: new tracer
This patch adapts kmemtrace raw events tracing to the unified tracing API.
It applies on top of the latest tip/tracing/kmemtrace.
To enable and use this tracer, just do the following:
echo kmemtrace > /debugfs/tracing/current_tracer
cat /debugfs/tracing/trace
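Note: the paths above assume debugfs is mounted on /debugfs. If it is not, it can
be mounted first with something like:
mount -t debugfs none /debugfs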
You will have the following output:
type_id 1 call_site 18446744071565527833 ptr 18446612134395152256
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
This stays backward compatible with the output format produced by the old linux/kmemtrace.h.
This is the default output, but note that I tried something else.
If you change an option:
echo kmem_minimalistic > /debugfs/tracing/trace_options
and then cat /debugfs/tracing/trace
You will have the following output:
- C 0xffff88007c088780 file_free_rcu
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
- C 0xffff88007cad6000 putname
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
+ K 240 240 000000d0 0xffff8800790dc780 -1 d_alloc
- C 0xffff88007cad6000 putname
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
+ K 240 240 000000d0 0xffff8800790dc870 -1 d_alloc
- C 0xffff88007cad6000 putname
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
+ K 240 240 000000d0 0xffff8800790dc960 -1 d_alloc
+ K 1304 1312 000000d0 0xffff8800791d7340 -1 reiserfs_alloc_inode
- C 0xffff88007cad6000 putname
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
- C 0xffff88007cad6000 putname
+ K 992 1000 000000d0 0xffff880079045b58 -1 alloc_inode
+ K 768 1024 000080d0 0xffff88007c096400 -1 alloc_pipe_info
+ K 240 240 000000d0 0xffff8800790dca50 -1 d_alloc
+ K 272 320 000080d0 0xffff88007c088780 -1 get_empty_filp
+ K 272 320 000080d0 0xffff88007c088000 -1 get_empty_filp
I should confess that kmem_minimalistic would be better named kmem_alternative.
Anyway, I find it more readable, but that is of course a personal opinion; we can drop it if you want.
In the ALLOC/FREE column, + means an allocation and - a free.
In the TYPE column, K = kmalloc, C = cache and P = page.
I would like the flags to be printed as GFP_* strings, but it would be hard to do that without breaking the column alignment...
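For reference, the 000000d0 above should be GFP_KERNEL. A rough sketch of how a few
names could be printed while keeping a fixed column width (helper name hypothetical,
not part of this patch, meant to sit next to the print functions in
trace_kmemtrace.c):

static int kmemtrace_print_gfp(struct trace_seq *s, gfp_t gfp_flags)
{
        /* Print a couple of common combinations by name, fall back to hex;
         * always 12 characters wide so the columns stay aligned. */
        if ((gfp_flags & GFP_KERNEL) == GFP_KERNEL)
                return trace_seq_printf(s, "%-12s", "GFP_KERNEL");
        if ((gfp_flags & GFP_ATOMIC) == GFP_ATOMIC)
                return trace_seq_printf(s, "%-12s", "GFP_ATOMIC");
        return trace_seq_printf(s, "%08x    ", gfp_flags);
}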
About the node... it always seems to be -1, presumably because the non-node allocation hooks pass -1 by default (see kmemtrace_mark_alloc); that shouldn't be difficult to confirm.
I also moved linux/kmemtrace.h to trace/kmemtrace.h. I think it would be easier to find
the tracer headers if they were all in their common directory.
Don't hesitate to comment; I'm not a kmemtrace specialist, so...
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/kmemtrace.h | 86 ----------
include/linux/slab_def.h | 2 +-
include/linux/slub_def.h | 2 +-
include/trace/kmemtrace.h | 75 +++++++++
init/main.c | 2 +-
kernel/trace/Kconfig | 21 +++
kernel/trace/Makefile | 1 +
kernel/trace/trace.h | 25 +++
kernel/trace/trace_kmemtrace.c | 345 ++++++++++++++++++++++++++++++++++++++++
lib/Kconfig.debug | 20 ---
mm/kmemtrace.c | 2 +-
mm/slob.c | 2 +-
mm/slub.c | 2 +-
13 files changed, 473 insertions(+), 112 deletions(-)
diff --git a/include/linux/kmemtrace.h b/include/linux/kmemtrace.h
deleted file mode 100644
index 5bea8ea..0000000
--- a/include/linux/kmemtrace.h
+++ /dev/null
@@ -1,86 +0,0 @@
-/*
- * Copyright (C) 2008 Eduard - Gabriel Munteanu
- *
- * This file is released under GPL version 2.
- */
-
-#ifndef _LINUX_KMEMTRACE_H
-#define _LINUX_KMEMTRACE_H
-
-#ifdef __KERNEL__
-
-#include <linux/types.h>
-#include <linux/marker.h>
-
-enum kmemtrace_type_id {
- KMEMTRACE_TYPE_KMALLOC = 0, /* kmalloc() or kfree(). */
- KMEMTRACE_TYPE_CACHE, /* kmem_cache_*(). */
- KMEMTRACE_TYPE_PAGES, /* __get_free_pages() and friends. */
-};
-
-#ifdef CONFIG_KMEMTRACE
-
-extern void kmemtrace_init(void);
-
-static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
- unsigned long call_site,
- const void *ptr,
- size_t bytes_req,
- size_t bytes_alloc,
- gfp_t gfp_flags,
- int node)
-{
- trace_mark(kmemtrace_alloc, "type_id %d call_site %lu ptr %lu "
- "bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d",
- type_id, call_site, (unsigned long) ptr,
- (unsigned long) bytes_req, (unsigned long) bytes_alloc,
- (unsigned long) gfp_flags, node);
-}
-
-static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
- unsigned long call_site,
- const void *ptr)
-{
- trace_mark(kmemtrace_free, "type_id %d call_site %lu ptr %lu",
- type_id, call_site, (unsigned long) ptr);
-}
-
-#else /* CONFIG_KMEMTRACE */
-
-static inline void kmemtrace_init(void)
-{
-}
-
-static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
- unsigned long call_site,
- const void *ptr,
- size_t bytes_req,
- size_t bytes_alloc,
- gfp_t gfp_flags,
- int node)
-{
-}
-
-static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
- unsigned long call_site,
- const void *ptr)
-{
-}
-
-#endif /* CONFIG_KMEMTRACE */
-
-static inline void kmemtrace_mark_alloc(enum kmemtrace_type_id type_id,
- unsigned long call_site,
- const void *ptr,
- size_t bytes_req,
- size_t bytes_alloc,
- gfp_t gfp_flags)
-{
- kmemtrace_mark_alloc_node(type_id, call_site, ptr,
- bytes_req, bytes_alloc, gfp_flags, -1);
-}
-
-#endif /* __KERNEL__ */
-
-#endif /* _LINUX_KMEMTRACE_H */
-
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 7555ce9..455f9af 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -14,7 +14,7 @@
#include <asm/page.h> /* kmalloc_sizes.h needs PAGE_SIZE */
#include <asm/cache.h> /* kmalloc_sizes.h needs L1_CACHE_BYTES */
#include <linux/compiler.h>
-#include <linux/kmemtrace.h>
+#include <trace/kmemtrace.h>
/* Size description struct for general caches. */
struct cache_sizes {
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index dc28432..6b657f7 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -10,7 +10,7 @@
#include <linux/gfp.h>
#include <linux/workqueue.h>
#include <linux/kobject.h>
-#include <linux/kmemtrace.h>
+#include <trace/kmemtrace.h>
enum stat_item {
ALLOC_FASTPATH, /* Allocation from cpu slab */
diff --git a/include/trace/kmemtrace.h b/include/trace/kmemtrace.h
new file mode 100644
index 0000000..ad8b785
--- /dev/null
+++ b/include/trace/kmemtrace.h
@@ -0,0 +1,75 @@
+/*
+ * Copyright (C) 2008 Eduard - Gabriel Munteanu
+ *
+ * This file is released under GPL version 2.
+ */
+
+#ifndef _LINUX_KMEMTRACE_H
+#define _LINUX_KMEMTRACE_H
+
+#ifdef __KERNEL__
+
+#include <linux/types.h>
+#include <linux/marker.h>
+
+enum kmemtrace_type_id {
+ KMEMTRACE_TYPE_KMALLOC = 0, /* kmalloc() or kfree(). */
+ KMEMTRACE_TYPE_CACHE, /* kmem_cache_*(). */
+ KMEMTRACE_TYPE_PAGES, /* __get_free_pages() and friends. */
+};
+
+#ifdef CONFIG_KMEMTRACE
+
+extern void kmemtrace_init(void);
+
+extern void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
+ unsigned long call_site,
+ const void *ptr,
+ size_t bytes_req,
+ size_t bytes_alloc,
+ gfp_t gfp_flags,
+ int node);
+
+extern void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
+ unsigned long call_site,
+ const void *ptr);
+
+#else /* CONFIG_KMEMTRACE */
+
+static inline void kmemtrace_init(void)
+{
+}
+
+static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
+ unsigned long call_site,
+ const void *ptr,
+ size_t bytes_req,
+ size_t bytes_alloc,
+ gfp_t gfp_flags,
+ int node)
+{
+}
+
+static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
+ unsigned long call_site,
+ const void *ptr)
+{
+}
+
+#endif /* CONFIG_KMEMTRACE */
+
+static inline void kmemtrace_mark_alloc(enum kmemtrace_type_id type_id,
+ unsigned long call_site,
+ const void *ptr,
+ size_t bytes_req,
+ size_t bytes_alloc,
+ gfp_t gfp_flags)
+{
+ kmemtrace_mark_alloc_node(type_id, call_site, ptr,
+ bytes_req, bytes_alloc, gfp_flags, -1);
+}
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_KMEMTRACE_H */
+
diff --git a/init/main.c b/init/main.c
index 9711586..beca7aa 100644
--- a/init/main.c
+++ b/init/main.c
@@ -70,7 +70,7 @@
#include <asm/setup.h>
#include <asm/sections.h>
#include <asm/cacheflush.h>
-#include <linux/kmemtrace.h>
+#include <trace/kmemtrace.h>
#ifdef CONFIG_X86_LOCAL_APIC
#include <asm/smp.h>
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e2a4ff6..1c0b750 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -264,6 +264,27 @@ config HW_BRANCH_TRACER
This tracer records all branches on the system in a circular
buffer giving access to the last N branches for each cpu.
+config KMEMTRACE
+ bool "Trace SLAB allocations"
+ select TRACING
+ help
+ kmemtrace provides tracing for slab allocator functions, such as
+ kmalloc, kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected
+ data is then fed to the userspace application in order to analyse
+ allocation hotspots, internal fragmentation and so on, making it
+ possible to see how well an allocator performs, as well as debug
+ and profile kernel code.
+
+ This requires an userspace application to use. See
+ Documentation/vm/kmemtrace.txt for more information.
+
+ Saying Y will make the kernel somewhat larger and slower. However,
+ if you disable kmemtrace at run-time or boot-time, the performance
+ impact is minimal (depending on the arch the kernel is built for).
+
+ If unsure, say N.
+
+
config DYNAMIC_FTRACE
bool "enable/disable ftrace tracepoints dynamically"
depends on FUNCTION_TRACER
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 349d5a9..df106e3 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -33,5 +33,6 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
obj-$(CONFIG_HW_BRANCH_TRACER) += trace_hw_branches.o
obj-$(CONFIG_POWER_TRACER) += trace_power.o
+obj-$(CONFIG_KMEMTRACE) += trace_kmemtrace.o
libftrace-y := ftrace.o
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index cc7a4f8..534505b 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -9,6 +9,7 @@
#include <linux/mmiotrace.h>
#include <linux/ftrace.h>
#include <trace/boot.h>
+#include <trace/kmemtrace.h>
enum trace_type {
__TRACE_FIRST_TYPE = 0,
@@ -29,6 +30,8 @@ enum trace_type {
TRACE_GRAPH_ENT,
TRACE_USER_STACK,
TRACE_HW_BRANCHES,
+ TRACE_KMEM_ALLOC,
+ TRACE_KMEM_FREE,
TRACE_POWER,
__TRACE_LAST_TYPE
@@ -170,6 +173,24 @@ struct trace_power {
struct power_trace state_data;
};
+struct kmemtrace_alloc_entry {
+ struct trace_entry ent;
+ enum kmemtrace_type_id type_id;
+ unsigned long call_site;
+ const void *ptr;
+ size_t bytes_req;
+ size_t bytes_alloc;
+ gfp_t gfp_flags;
+ int node;
+};
+
+struct kmemtrace_free_entry {
+ struct trace_entry ent;
+ enum kmemtrace_type_id type_id;
+ unsigned long call_site;
+ const void *ptr;
+};
+
/*
* trace_flag_type is an enumeration that holds different
* states when a trace occurs. These are:
@@ -280,6 +301,10 @@ extern void __ftrace_bad_type(void);
TRACE_GRAPH_RET); \
IF_ASSIGN(var, ent, struct hw_branch_entry, TRACE_HW_BRANCHES);\
IF_ASSIGN(var, ent, struct trace_power, TRACE_POWER); \
+ IF_ASSIGN(var, ent, struct kmemtrace_alloc_entry, \
+ TRACE_KMEM_ALLOC); \
+ IF_ASSIGN(var, ent, struct kmemtrace_free_entry, \
+ TRACE_KMEM_FREE); \
__ftrace_bad_type(); \
} while (0)
diff --git a/kernel/trace/trace_kmemtrace.c b/kernel/trace/trace_kmemtrace.c
new file mode 100644
index 0000000..a9809b5
--- /dev/null
+++ b/kernel/trace/trace_kmemtrace.c
@@ -0,0 +1,345 @@
+/*
+ * Memory allocator tracing
+ *
+ * Copyright (C) 2008 Frederic Weisbecker <[email protected]>
+ * Copyright (C) 2008 Pekka Enberg <[email protected]>
+ *
+ * Some parts based on the old linux/kmemtrace.h which is
+ * Copyright (C) 2008 Eduard - Gabriel Munteanu
+ *
+ */
+
+#include <linux/dcache.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/seq_file.h>
+#include <trace/kmemtrace.h>
+
+#include "trace.h"
+
+/* Select an alternative, minimalistic output instead of the original one */
+#define TRACE_KMEM_OPT_MINIMAL 0x1
+
+static struct tracer_opt kmem_opts[] = {
+ /* Default disable the minimalistic output */
+ { TRACER_OPT(kmem_minimalistic, TRACE_KMEM_OPT_MINIMAL) },
+ { }
+};
+
+static struct tracer_flags kmem_tracer_flags = {
+ .val = 0,
+ .opts = kmem_opts
+};
+
+
+static bool kmem_tracing_enabled __read_mostly;
+static struct trace_array *kmemtrace_array;
+
+static int kmem_trace_init(struct trace_array *tr)
+{
+ int cpu;
+ kmemtrace_array = tr;
+
+ for_each_cpu_mask(cpu, cpu_possible_map)
+ tracing_reset(tr, cpu);
+
+ kmem_tracing_enabled = true;
+
+ return 0;
+}
+
+static void kmem_trace_reset(struct trace_array *tr)
+{
+ kmem_tracing_enabled = false;
+}
+
+static void kmemtrace_headers(struct seq_file *s)
+{
+ /* Don't need headers for the original kmemtrace output */
+ if (!(kmem_tracer_flags.val & TRACE_KMEM_OPT_MINIMAL))
+ return;
+
+ seq_printf(s, "#\n");
+ seq_printf(s, "# ALLOC TYPE REQ GIVEN FLAGS "
+ " POINTER NODE CALLER\n");
+ seq_printf(s, "# FREE | | | | "
+ " | | | |\n");
+ seq_printf(s, "# |\n\n");
+}
+
+/*
+ * The following two functions give the original output from kmemtrace,
+ * or something close to it... perhaps a few things are still missing
+ */
+static enum print_line_t
+kmemtrace_print_alloc_original(struct trace_iterator *iter,
+ struct kmemtrace_alloc_entry *entry)
+{
+ struct trace_seq *s = &iter->seq;
+ int ret;
+
+ /* Taken from the old linux/kmemtrace.h */
+ ret = trace_seq_printf(s, "type_id %d call_site %lu ptr %lu "
+ "bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d\n",
+ entry->type_id, entry->call_site, (unsigned long) entry->ptr,
+ (unsigned long) entry->bytes_req, (unsigned long) entry->bytes_alloc,
+ (unsigned long) entry->gfp_flags, entry->node);
+
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ return TRACE_TYPE_HANDLED;
+}
+
+static enum print_line_t
+kmemtrace_print_free_original(struct trace_iterator *iter,
+ struct kmemtrace_free_entry *entry)
+{
+ struct trace_seq *s = &iter->seq;
+ int ret;
+
+ /* Taken from the old linux/kmemtrace.h */
+ ret = trace_seq_printf(s, "type_id %d call_site %lu ptr %lu\n",
+ entry->type_id, entry->call_site, (unsigned long) entry->ptr);
+
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ return TRACE_TYPE_HANDLED;
+}
+
+
+/* The following two functions provide a more minimalistic output */
+static enum print_line_t
+kmemtrace_print_alloc_compress(struct trace_iterator *iter,
+ struct kmemtrace_alloc_entry *entry)
+{
+ struct trace_seq *s = &iter->seq;
+ int ret;
+
+ /* Alloc entry */
+ ret = trace_seq_printf(s, " + ");
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ /* Type */
+ switch (entry->type_id) {
+ case KMEMTRACE_TYPE_KMALLOC:
+ ret = trace_seq_printf(s, "K ");
+ break;
+ case KMEMTRACE_TYPE_CACHE:
+ ret = trace_seq_printf(s, "C ");
+ break;
+ case KMEMTRACE_TYPE_PAGES:
+ ret = trace_seq_printf(s, "P ");
+ break;
+ default:
+ ret = trace_seq_printf(s, "? ");
+ }
+
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ /* Requested */
+ ret = trace_seq_printf(s, "%4d ", entry->bytes_req);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ /* Allocated */
+ ret = trace_seq_printf(s, "%4d ", entry->bytes_alloc);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ /* Flags
+ * TODO: it would be better to print the GFP flag names
+ */
+ ret = trace_seq_printf(s, "%08x ", entry->gfp_flags);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ /* Pointer to allocated */
+ ret = trace_seq_printf(s, "0x%tx ", (ptrdiff_t)entry->ptr);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ /* Node */
+ ret = trace_seq_printf(s, "%4d ", entry->node);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ /* Call site */
+ ret = seq_print_ip_sym(s, entry->call_site, 0);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ if (!trace_seq_printf(s, "\n"))
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ return TRACE_TYPE_HANDLED;
+}
+
+static enum print_line_t
+kmemtrace_print_free_compress(struct trace_iterator *iter,
+ struct kmemtrace_free_entry *entry)
+{
+ struct trace_seq *s = &iter->seq;
+ int ret;
+
+ /* Free entry */
+ ret = trace_seq_printf(s, " - ");
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ /* Type */
+ switch (entry->type_id) {
+ case KMEMTRACE_TYPE_KMALLOC:
+ ret = trace_seq_printf(s, "K ");
+ break;
+ case KMEMTRACE_TYPE_CACHE:
+ ret = trace_seq_printf(s, "C ");
+ break;
+ case KMEMTRACE_TYPE_PAGES:
+ ret = trace_seq_printf(s, "P ");
+ break;
+ default:
+ ret = trace_seq_printf(s, "? ");
+ }
+
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ /* Skip requested/allocated/flags */
+ ret = trace_seq_printf(s, " ");
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ /* Pointer to allocated */
+ ret = trace_seq_printf(s, "0x%tx ", (ptrdiff_t)entry->ptr);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ /* Skip node */
+ ret = trace_seq_printf(s, " ");
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ /* Call site */
+ ret = seq_print_ip_sym(s, entry->call_site, 0);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ if (!trace_seq_printf(s, "\n"))
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ return TRACE_TYPE_HANDLED;
+}
+
+static enum print_line_t kmemtrace_print_line(struct trace_iterator *iter)
+{
+ struct trace_entry *entry = iter->ent;
+
+ switch (entry->type) {
+ case TRACE_KMEM_ALLOC: {
+ struct kmemtrace_alloc_entry *field;
+ trace_assign_type(field, entry);
+ if (kmem_tracer_flags.val & TRACE_KMEM_OPT_MINIMAL)
+ return kmemtrace_print_alloc_compress(iter, field);
+ else
+ return kmemtrace_print_alloc_original(iter, field);
+ }
+
+ case TRACE_KMEM_FREE: {
+ struct kmemtrace_free_entry *field;
+ trace_assign_type(field, entry);
+ if (kmem_tracer_flags.val & TRACE_KMEM_OPT_MINIMAL)
+ return kmemtrace_print_free_compress(iter, field);
+ else
+ return kmemtrace_print_free_original(iter, field);
+ }
+
+ default:
+ return TRACE_TYPE_UNHANDLED;
+ }
+}
+
+/* Trace allocations */
+void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
+ unsigned long call_site,
+ const void *ptr,
+ size_t bytes_req,
+ size_t bytes_alloc,
+ gfp_t gfp_flags,
+ int node)
+{
+ struct ring_buffer_event *event;
+ struct kmemtrace_alloc_entry *entry;
+ struct trace_array *tr = kmemtrace_array;
+ unsigned long irq_flags;
+
+ if (!kmem_tracing_enabled)
+ return;
+
+ event = ring_buffer_lock_reserve(tr->buffer, sizeof(*entry),
+ &irq_flags);
+ if (!event)
+ return;
+ entry = ring_buffer_event_data(event);
+ tracing_generic_entry_update(&entry->ent, 0, 0);
+
+ entry->ent.type = TRACE_KMEM_ALLOC;
+ entry->call_site = call_site;
+ entry->ptr = ptr;
+ entry->bytes_req = bytes_req;
+ entry->bytes_alloc = bytes_alloc;
+ entry->gfp_flags = gfp_flags;
+ entry->node = node;
+
+ ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
+
+ trace_wake_up();
+}
+
+void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
+ unsigned long call_site,
+ const void *ptr)
+{
+ struct ring_buffer_event *event;
+ struct kmemtrace_free_entry *entry;
+ struct trace_array *tr = kmemtrace_array;
+ unsigned long irq_flags;
+
+ if (!kmem_tracing_enabled)
+ return;
+
+ event = ring_buffer_lock_reserve(tr->buffer, sizeof(*entry),
+ &irq_flags);
+ if (!event)
+ return;
+ entry = ring_buffer_event_data(event);
+ tracing_generic_entry_update(&entry->ent, 0, 0);
+
+ entry->ent.type = TRACE_KMEM_FREE;
+ entry->type_id = type_id;
+ entry->call_site = call_site;
+ entry->ptr = ptr;
+
+ ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
+
+ trace_wake_up();
+}
+
+static struct tracer kmem_tracer __read_mostly = {
+ .name = "kmemtrace",
+ .init = kmem_trace_init,
+ .reset = kmem_trace_reset,
+ .print_line = kmemtrace_print_line,
+ .print_header = kmemtrace_headers,
+ .flags = &kmem_tracer_flags
+};
+
+static int __init init_kmem_tracer(void)
+{
+ return register_tracer(&kmem_tracer);
+}
+
+device_initcall(init_kmem_tracer);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index b5417e2..b0f239e 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -803,26 +803,6 @@ config FIREWIRE_OHCI_REMOTE_DMA
If unsure, say N.
-config KMEMTRACE
- bool "Kernel memory tracer (kmemtrace)"
- depends on RELAY && DEBUG_FS && MARKERS
- help
- kmemtrace provides tracing for slab allocator functions, such as
- kmalloc, kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected
- data is then fed to the userspace application in order to analyse
- allocation hotspots, internal fragmentation and so on, making it
- possible to see how well an allocator performs, as well as debug
- and profile kernel code.
-
- This requires an userspace application to use. See
- Documentation/vm/kmemtrace.txt for more information.
-
- Saying Y will make the kernel somewhat larger and slower. However,
- if you disable kmemtrace at run-time or boot-time, the performance
- impact is minimal (depending on the arch the kernel is built for).
-
- If unsure, say N.
-
menuconfig BUILD_DOCSRC
bool "Build targets in Documentation/ tree"
depends on HEADERS_CHECK
diff --git a/mm/kmemtrace.c b/mm/kmemtrace.c
index 2a70a80..0573b50 100644
--- a/mm/kmemtrace.c
+++ b/mm/kmemtrace.c
@@ -10,7 +10,7 @@
#include <linux/module.h>
#include <linux/marker.h>
#include <linux/gfp.h>
-#include <linux/kmemtrace.h>
+#include <trace/kmemtrace.h>
#define KMEMTRACE_SUBBUF_SIZE 524288
#define KMEMTRACE_DEF_N_SUBBUFS 20
diff --git a/mm/slob.c b/mm/slob.c
index 0f1a49f..4d1c0fc 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -65,7 +65,7 @@
#include <linux/module.h>
#include <linux/rcupdate.h>
#include <linux/list.h>
-#include <linux/kmemtrace.h>
+#include <trace/kmemtrace.h>
#include <asm/atomic.h>
/*
diff --git a/mm/slub.c b/mm/slub.c
index cc4001f..7bf8cf8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -16,7 +16,7 @@
#include <linux/slab.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
-#include <linux/kmemtrace.h>
+#include <trace/kmemtrace.h>
#include <linux/cpu.h>
#include <linux/cpuset.h>
#include <linux/mempolicy.h>
--
1.6.0.4
And of course the headers of the trace have been zapped by git-commit :-)
An example with the headers:
# tracer: kmemtrace
#
#
# ALLOC TYPE REQ GIVEN FLAGS POINTER NODE CALLER
# FREE | | | | | | | |
# |
- C 0xffff88007c088780 file_free_rcu
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
- C 0xffff88007cad6000 putname
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
+ K 240 240 000000d0 0xffff8800790dc780 -1 d_alloc
- C 0xffff88007cad6000 putname
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
+ K 240 240 000000d0 0xffff8800790dc870 -1 d_alloc
- C 0xffff88007cad6000 putname
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
+ K 240 240 000000d0 0xffff8800790dc960 -1 d_alloc
+ K 1304 1312 000000d0 0xffff8800791d7340 -1 reiserfs_alloc_inode
Pekka, note that I would be pleased to add statistical tracing on
this tracer, but I would need a hashtable, or an array, or a list, or whatever
iterable to insert the data into the stat tracing api.
But I don't know your projects about this... whether you wanted to use a section
or something else...
On Mon, Dec 29, 2008 at 11:09:37PM +0100, Frederic Weisbecker wrote:
> Pekka, note that I would be pleased to add statistical tracing on
> this tracer, but I would need a hashtable, or an array, or a list, or whatever
> iterable to insert the data into the stat tracing api.
>
> But I don't know your projects about this... whether you wanted to use a section
> or something else...
Hmm, forgot to cc Ingo...
* Frederic Weisbecker <[email protected]> wrote:
> Impact: new tracer
>
> This patch adapts kmemtrace raw events tracing to the unified tracing
> API. It applies on latest tip/tracing/kmemtrace To enable and use this
> tracer, just do the following:
>
> echo kmemtrace > /debugfs/tracing/current_tracer
> cat /debugfs/tracing/trace
nice! I've put this into a separate topic for the time being:
tip/tracing/kmemtrace2, so that Pekka and Eduard can comment on it.
Ingo
* Frederic Weisbecker <[email protected]> wrote:
> kernel/trace/trace_kmemtrace.c | 345 ++++++++++++++++++++++++++++++++++++++++
btw., i renamed this to kernel/trace/kmemtrace.c. Mentioning 'trace' twice
is enough already ;-)
Ingo
Hi Frederic,
On Mon, 2008-12-29 at 23:09 +0100, Frederic Weisbecker wrote:
> Pekka, note that I would be pleased to add statistical tracing on
> this tracer, but I would need a hashtable, or an array, or a list, or whatever
> iterable to insert the data into the stat tracing api.
>
> But I don't know your projects about this... whether you wanted to use a section
> or something else...
It really depends on what we're tracing. If we're interested in just the
allocation hotspots, a section will do just fine. However, if we're
tracing memory footprint, we need to take into store the object pointer
returned from kmalloc() and kmem_cache_alloc() so we can update
call-site statistics properly upon kfree().
So I suppose we need both, a section for per call-site statistics and a
hash table for the object -> call-site mapping.
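As a rough sketch, the object -> call-site mapping could be a fixed-size hash
along these lines (all names hypothetical, sizes made up):

#include <linux/hash.h>
#include <linux/spinlock.h>

#define OBJ_HASH_BITS   12
#define OBJ_HASH_SIZE   (1 << OBJ_HASH_BITS)

/* One slot per live object: which call-site allocated it. */
struct obj_site {
        const void      *ptr;           /* NULL means the slot is free */
        unsigned long   call_site;
};

static struct obj_site obj_hash[OBJ_HASH_SIZE];
static DEFINE_SPINLOCK(obj_hash_lock);

/* From the alloc hook: remember which call-site owns this pointer. */
static void obj_hash_insert(const void *ptr, unsigned long call_site)
{
        unsigned long i = hash_ptr((void *)ptr, OBJ_HASH_BITS);
        unsigned long flags;

        spin_lock_irqsave(&obj_hash_lock, flags);
        if (!obj_hash[i].ptr) {
                obj_hash[i].ptr = ptr;
                obj_hash[i].call_site = call_site;
        }
        /* else: collision - drop the object or emit an overflow event */
        spin_unlock_irqrestore(&obj_hash_lock, flags);
}

/* From the free hook: credit the free back to the allocating call-site. */
static unsigned long obj_hash_remove(const void *ptr)
{
        unsigned long i = hash_ptr((void *)ptr, OBJ_HASH_BITS);
        unsigned long flags, site = 0;

        spin_lock_irqsave(&obj_hash_lock, flags);
        if (obj_hash[i].ptr == ptr) {
                site = obj_hash[i].call_site;
                obj_hash[i].ptr = NULL;
        }
        spin_unlock_irqrestore(&obj_hash_lock, flags);
        return site;
}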
Pekka
* Frederic Weisbecker <[email protected]> wrote:
> Pekka, note that I would be pleased to add statistical tracing on this
> tracer, but I would need a hashtable, or an array, or a list, or
> whatever iterable to insert the data into the stat tracing api.
there would be a couple of natural objects to group events by:
1) callsite [IP] of kmalloc()/kfree()/etc.
2) slab cache
3) slab object
for 1) callsite based histograms, i think ftrace should have a built-in
mechanism for that. kmemtrace tracepoints already pass in a call_site
argument that can be used to drive it.
for 2) slab cache based histograms (counts) - we need some knowledge about
the affected slab caches, and we need some space as well. The tracepoints
could be extended with a kmem_cache argument perhaps. A callback is needed
at cache creation time (which could be in the form of a tracepoint) that
gives kernel/tracing/kmemtrace.c the right place to allocate the per slab
histogram. (so that the other tracepoints dont have to do it implicitly -
which would be fragile as we are in the SLAB code, often with spinlocks
taken, so we cannot allocate)
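Roughly along these lines, purely as a sketch (all names made up, locking for
the list omitted):

#include <linux/slab.h>
#include <linux/list.h>
#include <asm/atomic.h>

/* Per-cache stat slot, allocated once from a cache-creation callback, i.e.
 * well outside the allocator fast paths, so allocating here is fine. */
struct kmemtrace_cache_stat {
        struct kmem_cache       *cache;
        atomic_t                allocs;
        atomic_t                frees;
        struct list_head        list;
};

static LIST_HEAD(cache_stats);

void kmemtrace_cache_created(struct kmem_cache *cachep)
{
        struct kmemtrace_cache_stat *stat;

        stat = kmalloc(sizeof(*stat), GFP_KERNEL);
        if (!stat)
                return;
        stat->cache = cachep;
        atomic_set(&stat->allocs, 0);
        atomic_set(&stat->frees, 0);
        list_add(&stat->list, &cache_stats);    /* walked by the stat output */
}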
i think 3) is the hardest so lets skip it for now ;-)
Ingo
On Tue, Dec 30, 2008 at 09:49:24AM +0200, Pekka Enberg wrote:
> Hi Frederic,
>
> On Mon, 2008-12-29 at 23:09 +0100, Frederic Weisbecker wrote:
> > Pekka, note that I would be pleased to add statistical tracing on
> > this tracer, but I would need a hashtable, or an array, or a list, or whatever
> > iterable to insert the data into the stat tracing api.
> >
> > But I don't know your projects about this... whether you wanted to use a section
> > or something else...
>
> It really depends on what we're tracing. If we're interested in just the
> allocation hotspots, a section will do just fine. However, if we're
> tracing memory footprint, we need to take into store the object pointer
> returned from kmalloc() and kmem_cache_alloc() so we can update
> call-site statistics properly upon kfree().
>
> So I suppose we need both, a section for per call-site statistics and a
> hash table for the object -> call-site mapping.
>
> Pekka
Hi Frederic,
Thanks for doing this work.
kmemtrace-user currently shows two main views: per call-site and
per-allocation. For the former, it uses a hashtable to accumulate
statistics. There's also a hashtable for individual allocations (by ptr
address), which we use to check for consistency for example ("if this is
going to be freed, has it been allocated?"). Apart from those two views,
it also shows global statistics.
Thanks again,
Eduard
* Pekka Enberg <[email protected]> wrote:
> Hi Frederic,
>
> On Mon, 2008-12-29 at 23:09 +0100, Frederic Weisbecker wrote:
> > Pekka, note that I would be pleased to add statistical tracing on
> > this tracer, but I would need a hashtable, or an array, or a list, or whatever
> > iterable to insert the data into the stat tracing api.
> >
> > But I don't know your projects about this... whether you wanted to use a section
> > or something else...
>
> It really depends on what we're tracing. If we're interested in just the
> allocation hotspots, a section will do just fine. However, if we're
> tracing memory footprint, we need to take into store the object pointer
> returned from kmalloc() and kmem_cache_alloc() so we can update
> call-site statistics properly upon kfree().
>
> So I suppose we need both, a section for per call-site statistics and a
> hash table for the object -> call-site mapping.
1)
i think the call_site based tracking should be a built-in capability - the
branch tracer needs that too for example. That would also make it very
simple on the usage place: you wouldnt have to worry about sections in
slub.c/etc.
2)
i think a possibly useful intermediate object would be the slab cache
itself, which could be the basis for some highlevel stats too. It would
probably overlap /proc/slabinfo statistics but it's a natural part of this
abstraction i think.
3)
the most lowlevel (and hence most allocation-footprint sensitive) object
to track would be the memory object itself. I think the best approach
would be to do a static, limited size hash that could track up to N memory
objects.
The advantage of such an approach is that it does not impact allocation
patterns at all (besides the one-time allocation cost of the hash itself
during tracer startup).
The disadvantage is when an overflow happens: the sizing heuristics would
get the size correct most of the time anyway, so it's not a practical
issue. There would be some sort of sizing control similar to
/debug/tracing/buffer_size_kb, and a special trace entry that signals an
'overflow' of the hash table. (in that case we wont track certain objects
- but it would be clear from the trace output what happens and the hash
size can be adjusted.)
Another advantage would be that it would trivially not interact with any
allocator - because the hash itself would never 'allocate' in any dynamic
way. Either there are free entries available (in which case we use it), or
not - in which case we emit an hash-overflow trace entry.
And this too would be driven from ftrace mainly - the SLAB code would only
offer the alloc+free callbacks with the object IDs. [ and this means that
we could detect memory leaks by looking at the hash table and print out
the age of entries :-) ]
How does this sound to you?
Ingo
On Tue, Dec 30, 2008 at 09:16:00AM +0100, Ingo Molnar wrote:
> 3)
>
> the most lowlevel (and hence most allocation-footprint sensitive) object
> to track would be the memory object itself. I think the best approach
> would be to do a static, limited size hash that could track up to N memory
> objects.
>
> The advantage of such an approach is that it does not impact allocation
> patterns at all (besides the one-time allocation cost of the hash itself
> during tracer startup).
kmemtrace-user handles this by analysing offline :). I presume you could get
around this by discarding every hash collision in a well-sized
hashtable. The hashing algo in kmemtrace-user performs okay, considering
it fills the hashtable almost entirely, but I presume you're doing that
in-kernel and using other available code.
> And this too would be driven from ftrace mainly - the SLAB code would only
> offer the alloc+free callbacks with the object IDs. [ and this means that
> we could detect memory leaks by looking at the hash table and print out
> the age of entries :-) ]
Some time ago I dropped timestamps because they were not providing a
good way to reorder packets in userspace. We're currently relying on a
sequence number to do that. You could take that as 'age', but it's not
temporally-meaningful.
Eduard
* Eduard - Gabriel Munteanu <[email protected]> wrote:
> On Tue, Dec 30, 2008 at 09:16:00AM +0100, Ingo Molnar wrote:
> > 3)
> >
> > the most lowlevel (and hence most allocation-footprint sensitive) object
> > to track would be the memory object itself. I think the best approach
> > would be to do a static, limited size hash that could track up to N memory
> > objects.
> >
> > The advantage of such an approach is that it does not impact allocation
> > patterns at all (besides the one-time allocation cost of the hash itself
> > during tracer startup).
>
> kmemtrace-user handles this by analysing offline :). I presume you could
> get around this by discarding every hash collision in a well-sized
> hashtable. The hashing algo in kmemtrace-user performs okay, considering
> it fills the hashtable almost entirely, but I presume you're doing that
> in-kernel and using other available code.
yeah - this is not a replacement for kmemtrace-user - analyzing raw trace
events offline is still possible of course.
> > And this too would be driven from ftrace mainly - the SLAB code would
> > only offer the alloc+free callbacks with the object IDs. [ and this
> > means that we could detect memory leaks by looking at the hash table
> > and print out the age of entries :-) ]
>
> Some time ago I dropped timestamps because they were not providing a
> good way to reorder packets in userspace. We're currently relying on a
> sequence number to do that. You could take that as 'age', but it's not
> temporally-meaningful.
yeah - ftrace entries generally have a timestamp so it should be rather
easy.
Ingo
Hi Ingo,
On Tue, 2008-12-30 at 09:16 +0100, Ingo Molnar wrote:
> 1)
>
> i think the call_site based tracking should be a built-in capability - the
> branch tracer needs that too for example. That would also make it very
> simple on the usage place: you wouldnt have to worry about sections in
> slub.c/etc.
>
> 2)
>
> i think a possibly useful intermediate object would be the slab cache
> itself, which could be the basis for some highlevel stats too. It would
> probably overlap /proc/slabinfo statistics but it's a natural part of this
> abstraction i think.
Makes sense but keep in mind that this is really just an extension to
SLUB statistics and is only good for detecting allocation hotspots, not
for analyzing memory footprint.
On Tue, 2008-12-30 at 09:16 +0100, Ingo Molnar wrote:
> 3)
>
> the most lowlevel (and hence most allocation-footprint sensitive) object
> to track would be the memory object itself. I think the best approach
> would be to do a static, limited size hash that could track up to N memory
> objects.
>
> The advantage of such an approach is that it does not impact allocation
> patterns at all (besides the one-time allocation cost of the hash itself
> during tracer startup).
>
> The disadvantage is when an overflow happens: the sizing heuristics would
> get the size correct most of the time anyway, so it's not a practical
> issue. There would be some sort of sizing control similar to
> /debug/tracing/buffer_size_kb, and a special trace entry that signals an
> 'overflow' of the hash table. (in that case we wont track certain objects
> - but it would be clear from the trace output what happens and the hash
> size can be adjusted.)
>
> Another advantage would be that it would trivially not interact with any
> allocator - because the hash itself would never 'allocate' in any dynamic
> way. Either there are free entries available (in which case we use it), or
> not - in which case we emit an hash-overflow trace entry.
>
> And this too would be driven from ftrace mainly - the SLAB code would only
> offer the alloc+free callbacks with the object IDs. [ and this means that
> we could detect memory leaks by looking at the hash table and print out
> the age of entries :-) ]
>
> How does this sound to you?
That will probably be okay for things like analyzing memory footprint
immediately after boot. However, as soon as the amount of active memory
objects increases (think dentry and inode cache), the numbers might get
skewed. One option would be to let the user exclude some of the caches
from tracing.
Pekka
* Pekka Enberg <[email protected]> wrote:
> Hi Ingo,
>
> On Tue, 2008-12-30 at 09:16 +0100, Ingo Molnar wrote:
> > 1)
> >
> > i think the call_site based tracking should be a built-in capability - the
> > branch tracer needs that too for example. That would also make it very
> > simple on the usage place: you wouldnt have to worry about sections in
> > slub.c/etc.
> >
> > 2)
> >
> > i think a possibly useful intermediate object would be the slab cache
> > itself, which could be the basis for some highlevel stats too. It would
> > probably overlap /proc/slabinfo statistics but it's a natural part of this
> > abstraction i think.
>
> Makes sense but keep in mind that this is really just an extension to
> SLUB statistics and is only good for detecting allocation hotspots, not
> for analyzing memory footprint.
>
> On Tue, 2008-12-30 at 09:16 +0100, Ingo Molnar wrote:
> > 3)
> >
> > the most lowlevel (and hence most allocation-footprint sensitive) object
> > to track would be the memory object itself. I think the best approach
> > would be to do a static, limited size hash that could track up to N memory
> > objects.
> >
> > The advantage of such an approach is that it does not impact allocation
> > patterns at all (besides the one-time allocation cost of the hash itself
> > during tracer startup).
> >
> > The disadvantage is when an overflow happens: the sizing heuristics would
> > get the size correct most of the time anyway, so it's not a practical
> > issue. There would be some sort of sizing control similar to
> > /debug/tracing/buffer_size_kb, and a special trace entry that signals an
> > 'overflow' of the hash table. (in that case we wont track certain objects
> > - but it would be clear from the trace output what happens and the hash
> > size can be adjusted.)
> >
> > Another advantage would be that it would trivially not interact with any
> > allocator - because the hash itself would never 'allocate' in any dynamic
> > way. Either there are free entries available (in which case we use it), or
> > not - in which case we emit an hash-overflow trace entry.
> >
> > And this too would be driven from ftrace mainly - the SLAB code would only
> > offer the alloc+free callbacks with the object IDs. [ and this means that
> > we could detect memory leaks by looking at the hash table and print out
> > the age of entries :-) ]
> >
> > How does this sound to you?
>
> That will probably be okay for things like analyzing memory footprint
> immediately after boot. However, as soon as the amount of active memory
> objects increases (think dentry and inode cache), the numbers might get
> skewed. One option would be to let the user exclude some of the caches
> from tracing.
well, it gets skewed only in terms of total footprint: the same way as if
you had total_ram-hash_size amount of RAM. Since there are so many RAM
sizes possible, this can be considered as if the test was done on a
slightly smaller machine - but otherwise it's an invariant. It wont impact
the micro-layout of the slab objects themselves (does not change their
size), and it shouldnt impact most workloads which behave very gradually
to small changes in total memory size.
Ingo
On Tue, Dec 30, 2008 at 09:16:00AM +0100, Ingo Molnar wrote:
>
> * Pekka Enberg <[email protected]> wrote:
>
> > Hi Frederic,
> >
> > On Mon, 2008-12-29 at 23:09 +0100, Frederic Weisbecker wrote:
> > > Pekka, note that I would be pleased to add statistical tracing on
> > > this tracer, but I would need a hashtable, or an array, or a list, or whatever
> > > iterable to insert the data into the stat tracing api.
> > >
> > > But I don't know your projects about this... whether you wanted to use a section
> > > or something else...
> >
> > It really depends on what we're tracing. If we're interested in just the
> > allocation hotspots, a section will do just fine. However, if we're
> > tracing memory footprint, we need to take into store the object pointer
> > returned from kmalloc() and kmem_cache_alloc() so we can update
> > call-site statistics properly upon kfree().
> >
> > So I suppose we need both, a section for per call-site statistics and a
> > hash table for the object -> call-site mapping.
>
> 1)
>
> i think the call_site based tracking should be a built-in capability - the
> branch tracer needs that too for example. That would also make it very
> simple on the usage place: you wouldnt have to worry about sections in
> slub.c/etc.
I think that too. Can we use sections here? The traced functions are not
directly kmalloc/kmem_cache_alloc, and to use a section which contains the
per-site allocation requests, such a thing would be required (I fear we can't
build a section of per-site allocation requests through an intermediate-level
allocation function...).
> 2)
>
> i think a possibly useful intermediate object would be the slab cache
> itself, which could be the basis for some highlevel stats too. It would
> probably overlap /proc/slabinfo statistics but it's a natural part of this
> abstraction i think.
>
> 3)
>
> the most lowlevel (and hence most allocation-footprint sensitive) object
> to track would be the memory object itself. I think the best approach
> would be to do a static, limited size hash that could track up to N memory
> objects.
> The advantage of such an approach is that it does not impact allocation
> patterns at all (besides the one-time allocation cost of the hash itself
> during tracer startup).
>
> The disadvantage is when an overflow happens: the sizing heuristics would
> get the size correct most of the time anyway, so it's not a practical
> issue. There would be some sort of sizing control similar to
> /debug/tracing/buffer_size_kb, and a special trace entry that signals an
> 'overflow' of the hash table. (in that case we wont track certain objects
> - but it would be clear from the trace output what happens and the hash
> size can be adjusted.)
>
> Another advantage would be that it would trivially not interact with any
> allocator - because the hash itself would never 'allocate' in any dynamic
> way. Either there are free entries available (in which case we use it), or
> not - in which case we emit an hash-overflow trace entry.
>
> And this too would be driven from ftrace mainly - the SLAB code would only
> offer the alloc+free callbacks with the object IDs. [ and this means that
> we could detect memory leaks by looking at the hash table and print out
> the age of entries :-) ]
>
> How does this sound to you?
>
> Ingo
That looks good. Since we can have an overflow event, it would always be possible
to enlarge it at build time for debugging purposes.
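For the build-time sizing, a hypothetical Kconfig knob would probably be enough
(name and default made up):

config KMEMTRACE_OBJ_HASH_SHIFT
        int "kmemtrace: object hash size (2^N entries)"
        depends on KMEMTRACE
        default 12
        help
          Size of the fixed hash used to map live objects back to their
          allocation call-sites. Increase this if the trace reports hash
          overflows.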
On Tue, Dec 30, 2008 at 09:49:24AM +0200, Pekka Enberg wrote:
> Hi Frederic,
>
> On Mon, 2008-12-29 at 23:09 +0100, Frederic Weisbecker wrote:
> > Pekka, note that I would be pleased to add statistical tracing on
> > this tracer, but I would need a hashtable, or an array, or a list, or whatever
> > iterable to insert the data into the stat tracing api.
> >
> > But I don't know your projects about this... whether you wanted to use a section
> > or something else...
>
> It really depends on what we're tracing. If we're interested in just the
> allocation hotspots, a section will do just fine. However, if we're
> tracing memory footprint, we need to take into store the object pointer
> returned from kmalloc() and kmem_cache_alloc() so we can update
> call-site statistics properly upon kfree().
>
> So I suppose we need both, a section for per call-site statistics and a
> hash table for the object -> call-site mapping.
>
> Pekka
>
BTW, looking at your needs, the statistical branch tracer and the rcu stat one,
it seems these tracers often (if not always) need several stat files, not only
one.
What I should actually do with stat tracing is to build a directory inside
/debugfs/tracing devoted to the several stat outputs a tracer needs.
Which means I have to make trace_stat.c completely reentrant, among other
things... plus multiple tracers should be able to run concurrently for 2.6.30...
which means one stat tracing directory for each running tracer.
* Frederic Weisbecker <[email protected]> wrote:
> On Tue, Dec 30, 2008 at 09:16:00AM +0100, Ingo Molnar wrote:
> >
> > * Pekka Enberg <[email protected]> wrote:
> >
> > > Hi Frederic,
> > >
> > > On Mon, 2008-12-29 at 23:09 +0100, Frederic Weisbecker wrote:
> > > > Pekka, note that I would be pleased to add statistical tracing on
> > > > this tracer, but I would need a hashtable, or an array, or a list, or whatever
> > > > iterable to insert the data into the stat tracing api.
> > > >
> > > > But I don't know your projects about this... whether you wanted to use a section
> > > > or something else...
> > >
> > > It really depends on what we're tracing. If we're interested in just the
> > > allocation hotspots, a section will do just fine. However, if we're
> > > tracing memory footprint, we need to take into store the object pointer
> > > returned from kmalloc() and kmem_cache_alloc() so we can update
> > > call-site statistics properly upon kfree().
> > >
> > > So I suppose we need both, a section for per call-site statistics and a
> > > hash table for the object -> call-site mapping.
> >
> > 1)
> >
> > i think the call_site based tracking should be a built-in capability - the
> > branch tracer needs that too for example. That would also make it very
> > simple on the usage place: you wouldnt have to worry about sections in
> > slub.c/etc.
>
>
> I think that too. Can we use sections here? The traced functions are not
> directly kmalloc/kmem_cache_alloc, and to use a section which contains the
> per-site allocation requests, such a thing would be required (I fear we can't
> build a section of per-site allocation requests through an intermediate-level
> allocation function...).
i think initially this should be a fixed-size allocation array + hash as
well. (like lockdep uses) The number of allocation sites is even the most
extreme case at most a few thousand - and is typically at most a couple of
hundred.
Ingo
On Tue, Dec 30, 2008 at 04:37:52PM +0100, Ingo Molnar wrote:
>
> * Frederic Weisbecker <[email protected]> wrote:
>
> > On Tue, Dec 30, 2008 at 09:16:00AM +0100, Ingo Molnar wrote:
> > >
> > > * Pekka Enberg <[email protected]> wrote:
> > >
> > > > Hi Frederic,
> > > >
> > > > On Mon, 2008-12-29 at 23:09 +0100, Frederic Weisbecker wrote:
> > > > > Pekka, note that I would be pleased to add statistical tracing on
> > > > > this tracer, but I would need a hashtable, or an array, or a list, or whatever
> > > > > iterable to insert the data into the stat tracing api.
> > > > >
> > > > > But I don't know your projects about this... whether you wanted to use a section
> > > > > or something else...
> > > >
> > > > It really depends on what we're tracing. If we're interested in just the
> > > > allocation hotspots, a section will do just fine. However, if we're
> > > > tracing memory footprint, we need to take into store the object pointer
> > > > returned from kmalloc() and kmem_cache_alloc() so we can update
> > > > call-site statistics properly upon kfree().
> > > >
> > > > So I suppose we need both, a section for per call-site statistics and a
> > > > hash table for the object -> call-site mapping.
> > >
> > > 1)
> > >
> > > i think the call_site based tracking should be a built-in capability - the
> > > branch tracer needs that too for example. That would also make it very
> > > simple on the usage place: you wouldnt have to worry about sections in
> > > slub.c/etc.
> >
> >
> > I think that too. Can we use sections here? The traced functions are not
> > directly kmalloc/kmem_cache_alloc, and to use a section which contains the
> > per-site allocation requests, such a thing would be required (I fear we can't
> > build a section of per-site allocation requests through an intermediate-level
> > allocation function...).
>
> i think initially this should be a fixed-size allocation array + hash as
> well. (like lockdep uses) The number of allocation sites is even the most
> extreme case at most a few thousand - and is typically at most a couple of
> hundred.
>
> Ingo
Why not. And if someone reports too many overruns, we could make the size of this
array a kernel option.
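For instance a boot parameter along these lines (everything here hypothetical),
with the array itself allocated once at tracer init time:

#include <linux/kernel.h>
#include <linux/init.h>

/* Override the call-site array size from the command line, e.g.
 * kmemtrace_callsites=8192, instead of rebuilding the kernel. */
static int kmemtrace_callsites = 4096;

static int __init kmemtrace_callsites_setup(char *str)
{
        get_option(&str, &kmemtrace_callsites);
        return 1;
}
__setup("kmemtrace_callsites=", kmemtrace_callsites_setup);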
On Tue, 30 Dec 2008, Ingo Molnar wrote:
>
> * Frederic Weisbecker <[email protected]> wrote:
>
> > kernel/trace/trace_kmemtrace.c | 345 ++++++++++++++++++++++++++++++++++++++++
>
> btw., i renamed this to kernel/trace/kmemtrace.c. Mentioning 'trace' twice
> is enough already ;-)
Should we rename all the kernel/trace/trace_(.*)\.c to kernel/trace/\1.c ?
Although I do like the trace in the function name, because it groups them
nicely together, and keeps ftrace.c and ring_buffer.c stand out from the
rest.
-- Steve
On Mon, Jan 05, 2009 at 11:48:56AM -0500, Steven Rostedt wrote:
>
> On Tue, 30 Dec 2008, Ingo Molnar wrote:
>
> >
> > * Frederic Weisbecker <[email protected]> wrote:
> >
> > > kernel/trace/trace_kmemtrace.c | 345 ++++++++++++++++++++++++++++++++++++++++
> >
> > btw., i renamed this to kernel/trace/kmemtrace.c. Mentioning 'trace' twice
> > is enough already ;-)
>
> Should we rename all the kernel/trace/trace_(.*)\.c to kernel/trace/\1.c ?
>
> Although I do like the trace in the function name, because it groups them
> nicely together, and keeps ftrace.c and ring_buffer.c stand out from the
> rest.
>
> -- Steve
>
Or why not a subdirectory called "tracers" inside kernel/trace to store the
tracer files. This way, the core of tracing (ftrace.c/ring_buffer.c/trace_output.c/
trace.c/trace_stat.c) can easily be told apart from the rest. Moreover, the
number of tracers is likely to keep growing over time.