Hi,
This set exports the function hashlist used by the function
profiler for more general purposes.
For now, it fixes a scalability problem in the function graph
tracer's function-start filtering (the set_graph_function file).
This is mostly the beginning of a larger effort to allow multiple
function (and function graph) tracer instances to run in parallel,
each with its own dynamic ftrace parameters. The plan is then to make
these tracers available for wider use, such as perf through the
trace event API.
This is still at the RFC stage because, with this patchset, the set
of function filters from set_graph_function only takes effect when
the function graph tracer is enabled. If set_graph_function is
changed in the middle of tracing, the filters are actually updated
only after you switch to another tracer and back to the graph tracer
(see the sketch below). That should be fixed in a further iteration.
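To make the current limitation concrete, here is a rough user-space
sketch (debugfs assumed mounted at /sys/kernel/debug; the write_str()
helper and the chosen filter functions are only for illustration):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static void write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return;
	write(fd, val, strlen(val));
	close(fd);
}

int main(void)
{
	/* The filter is picked up when the graph tracer gets enabled: */
	write_str("/sys/kernel/debug/tracing/set_graph_function", "schedule");
	write_str("/sys/kernel/debug/tracing/current_tracer", "function_graph");

	/*
	 * Changing the filter now has no effect until the tracer is
	 * re-enabled, hence the nop round-trip below:
	 */
	write_str("/sys/kernel/debug/tracing/set_graph_function", "do_fork");
	write_str("/sys/kernel/debug/tracing/current_tracer", "nop");
	write_str("/sys/kernel/debug/tracing/current_tracer", "function_graph");

	return 0;
}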
Frederic Weisbecker (9):
ftrace: Generalize the function hashlist from function profiler
ftrace: Ensure tracing has really stopped before leaving unregister_ftrace_graph
ftrace: Drop the ftrace_profile_enabled checks in tracing hot path
ftrace: Ensure buffers are visibles to tracing callbacks right away
ftrace: Drop buffer check in function profiler callbacks
ftrace: Release the function hlist if we don't need it anymore
ftrace: Make the function hashlist concurrently usable
tracing: Use the hashlist for graph function
ftrace: Factorize search and insertion in the function hashlist
Lai Jiangshan (1):
tracing: Simplify test for function_graph tracing
kernel/trace/Makefile | 1 +
kernel/trace/ftrace.c | 378 ++++++++++------------------------
kernel/trace/functions_hlist.c | 289 ++++++++++++++++++++++++++
kernel/trace/functions_hlist.h | 54 +++++
kernel/trace/trace.h | 21 +--
kernel/trace/trace_functions_graph.c | 46 ++++-
6 files changed, 493 insertions(+), 296 deletions(-)
create mode 100644 kernel/trace/functions_hlist.c
create mode 100644 kernel/trace/functions_hlist.h
Every time we enter the function profiler tracing callbacks, we first
check if the function profiling is enabled.
This check is useless because we register the function graph
callbacks only after the hashlist has been initialized.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Lai Jiangshan <[email protected]>
---
kernel/trace/ftrace.c | 7 ++-----
1 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 94117ec..f258f67 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -381,13 +381,10 @@ function_profile_call(unsigned long ip, unsigned long parent_ip)
struct func_node *rec;
unsigned long flags;
- if (!ftrace_profile_enabled)
- return;
-
local_irq_save(flags);
hlist = &__get_cpu_var(func_hlist_cpu);
- if (!hlist->hash || !ftrace_profile_enabled)
+ if (!hlist->hash)
goto out;
rec = function_find_hlist_node(hlist, ip);
@@ -418,7 +415,7 @@ static void profile_graph_return(struct ftrace_graph_ret *trace)
local_irq_save(flags);
hlist = &__get_cpu_var(func_hlist_cpu);
- if (!hlist->hash || !ftrace_profile_enabled)
+ if (!hlist->hash)
goto out;
calltime = trace->rettime - trace->calltime;
--
1.6.2.3
From: Lai Jiangshan <[email protected]>
For function_graph, a function is traced only when it is enabled
itself or it is nested in an enabled function.
The current code uses TSK_TRACE_FL_GRAPH to test whether it is
nested or not. Looking at the code, we can see that:
(trace->depth > 0) <==> (TSK_TRACE_FL_GRAPH is set)
trace->depth expresses more explicitly that the function is nested,
so use trace->depth directly and simplify the code.
No functionality is changed.
TSK_TRACE_FL_GRAPH is not removed for now; it is left for future use.
Signed-off-by: Lai Jiangshan <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Li Zefan <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/trace/trace.h | 2 +-
kernel/trace/trace_functions_graph.c | 8 ++------
2 files changed, 3 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 4df6a77..ce077fb 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -504,7 +504,7 @@ static inline int ftrace_graph_addr(unsigned long addr)
{
int i;
- if (!ftrace_graph_count || test_tsk_trace_graph(current))
+ if (!ftrace_graph_count)
return 1;
for (i = 0; i < ftrace_graph_count; i++) {
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index f225229..616b135 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -215,7 +215,8 @@ int trace_graph_entry(struct ftrace_graph_ent *trace)
if (!ftrace_trace_task(current))
return 0;
- if (!ftrace_graph_addr(trace->func))
+ /* trace it when it is-nested-in or is a function enabled. */
+ if (!(trace->depth || ftrace_graph_addr(trace->func)))
return 0;
local_irq_save(flags);
@@ -228,9 +229,6 @@ int trace_graph_entry(struct ftrace_graph_ent *trace)
} else {
ret = 0;
}
- /* Only do the atomic if it is not already set */
- if (!test_tsk_trace_graph(current))
- set_tsk_trace_graph(current);
atomic_dec(&data->disabled);
local_irq_restore(flags);
@@ -278,8 +276,6 @@ void trace_graph_return(struct ftrace_graph_ret *trace)
pc = preempt_count();
__trace_graph_return(tr, trace, flags, pc);
}
- if (!trace->depth)
- clear_tsk_trace_graph(current);
atomic_dec(&data->disabled);
local_irq_restore(flags);
}
--
1.6.2.3
After we disable the function profiler, the function hashlist stays
in memory. This is wasteful, as nobody needs it anymore until the
next use, if any.
Release it when we disable the function profiler instead of
resetting it on the next use.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Lai Jiangshan <[email protected]>
---
kernel/trace/ftrace.c | 1 +
kernel/trace/functions_hlist.c | 61 +++++++++++++++++-----------------------
kernel/trace/functions_hlist.h | 1 +
3 files changed, 28 insertions(+), 35 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index dfd8f7c..0ded01c 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -509,6 +509,7 @@ ftrace_profile_write(struct file *filp, const char __user *ubuf,
* so this acts like an synchronize_sched.
*/
unregister_ftrace_profiler();
+ function_hlist_release();
}
}
out:
diff --git a/kernel/trace/functions_hlist.c b/kernel/trace/functions_hlist.c
index 37804c4..c79c4c5 100644
--- a/kernel/trace/functions_hlist.c
+++ b/kernel/trace/functions_hlist.c
@@ -21,20 +21,23 @@ DEFINE_PER_CPU(struct func_hlist, func_hlist_cpu);
int functions_hash_bits __read_mostly;
-static void function_hlist_reset(struct func_hlist *hlist)
+static void __function_hlist_release(struct func_hlist *hlist)
{
- struct func_hlist_page *pg;
-
- pg = hlist->pages = hlist->start;
+ struct func_hlist_page *pg = hlist->start;
while (pg) {
- memset(pg->records, 0, FUNCTIONS_RECORDS_SIZE);
- pg->index = 0;
+ unsigned long tmp = (unsigned long)pg;
+
pg = pg->next;
+ free_page(tmp);
}
- memset(hlist->hash, 0,
- FUNCTIONS_HLIST_SIZE * sizeof(struct hlist_head));
+ free_page((unsigned long)hlist->pages);
+ hlist->pages = NULL;
+ hlist->start = NULL;
+
+ kfree(hlist->hash);
+ hlist->hash = NULL;
}
static int function_hlist_pages_init(struct func_hlist *hlist)
@@ -44,10 +47,6 @@ static int function_hlist_pages_init(struct func_hlist *hlist)
int pages;
int i;
- /* If we already allocated, do nothing */
- if (hlist->pages)
- return 0;
-
hlist->pages = (void *)get_zeroed_page(GFP_KERNEL);
if (!hlist->pages)
return -ENOMEM;
@@ -72,26 +71,11 @@ static int function_hlist_pages_init(struct func_hlist *hlist)
for (i = 0; i < pages; i++) {
pg->next = (void *)get_zeroed_page(GFP_KERNEL);
if (!pg->next)
- goto out_free;
+ return -ENOMEM;
pg = pg->next;
}
return 0;
-
- out_free:
- pg = hlist->start;
- while (pg) {
- unsigned long tmp = (unsigned long)pg;
-
- pg = pg->next;
- free_page(tmp);
- }
-
- free_page((unsigned long)hlist->pages);
- hlist->pages = NULL;
- hlist->start = NULL;
-
- return -ENOMEM;
}
static int function_hlist_init_cpu(int cpu)
@@ -101,11 +85,8 @@ static int function_hlist_init_cpu(int cpu)
hlist = &per_cpu(func_hlist_cpu, cpu);
- if (hlist->hash) {
- /* If the profile is already created, simply reset it */
- function_hlist_reset(hlist);
- return 0;
- }
+ if (WARN_ON_ONCE(hlist->hash))
+ return -EBUSY;
/*
* We are profiling all functions, but usually only a few thousand
@@ -127,14 +108,24 @@ static int function_hlist_init_cpu(int cpu)
/* Preallocate the function profiling pages */
if (function_hlist_pages_init(hlist) < 0) {
- kfree(hlist->hash);
- hlist->hash = NULL;
+ __function_hlist_release(hlist);
return -ENOMEM;
}
return 0;
}
+void function_hlist_release(void)
+{
+ int cpu;
+ struct func_hlist *hlist;
+
+ for_each_online_cpu(cpu) {
+ hlist = &per_cpu(func_hlist_cpu, cpu);
+ __function_hlist_release(hlist);
+ }
+}
+
int function_hlist_init(void)
{
int cpu;
diff --git a/kernel/trace/functions_hlist.h b/kernel/trace/functions_hlist.h
index 3f4e485..8001f95 100644
--- a/kernel/trace/functions_hlist.h
+++ b/kernel/trace/functions_hlist.h
@@ -36,3 +36,4 @@ struct func_node *
function_hlist_record_alloc(struct func_hlist *hlist, unsigned long ip);
int function_hlist_init(void);
+void function_hlist_release(void);
--
1.6.2.3
A tracer may need to allocate specific buffers before its function
tracing callbacks get called, and those callbacks may use the
buffers. Without any ordering guarantee, the callbacks would need to
check the state of the buffers before doing anything, which is
wasteful on the tracing hot path.
Instead, put an appropriate barrier in place before registering the
tracing callbacks.
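For reference, the generic shape of this publish step looks like the
following sketch (the names are made up for illustration, this is
not the ftrace code itself):

static void *my_buf;				/* hypothetical tracer buffer */
static void (*active_callback)(unsigned long);	/* hypothetical hook pointer */

int my_tracer_enable(void (*cb)(unsigned long))
{
	my_buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
	if (!my_buf)
		return -ENOMEM;

	smp_mb();		/* publish the buffer ... */
	active_callback = cb;	/* ... before the callback becomes reachable */

	return 0;
}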
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Lai Jiangshan <[email protected]>
---
kernel/trace/ftrace.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index f258f67..3cdb35e 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3099,6 +3099,14 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
goto out;
}
+ /*
+ * The tracing callbacks might use buffers that the tracer could
+ * have allocated just before. Ensure this is all visible before
+ * starting the tracing, so that we avoid null pointer checks
+ * in the tracing hot paths.
+ */
+ smp_mb();
+
ftrace_graph_return = retfunc;
ftrace_graph_entry = entryfunc;
--
1.6.2.3
Allow the function hashlist to be used by simultaneous users, so
that the function tracers can plug into it for their own needs.
This introduces get/put_function_hlist(), which keeps track of the
hashlist users for allocation and release.
We also introduce function_hlist_reset_profile(), which resets the
function profiling statistics on a function hashlist already in use.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Lai Jiangshan <[email protected]>
---
kernel/trace/ftrace.c | 5 ++-
kernel/trace/functions_hlist.c | 56 ++++++++++++++++++++++++++++++++++++++-
kernel/trace/functions_hlist.h | 5 ++-
3 files changed, 60 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 0ded01c..027743c 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -490,11 +490,12 @@ ftrace_profile_write(struct file *filp, const char __user *ubuf,
mutex_lock(&ftrace_profile_lock);
if (ftrace_profile_enabled ^ val) {
if (val) {
- ret = function_hlist_init();
+ ret = get_function_hlist();
if (ret < 0) {
cnt = ret;
goto out;
}
+ function_hlist_reset_profile();
ret = register_ftrace_profiler();
if (ret < 0) {
@@ -509,7 +510,7 @@ ftrace_profile_write(struct file *filp, const char __user *ubuf,
* so this acts like an synchronize_sched.
*/
unregister_ftrace_profiler();
- function_hlist_release();
+ put_function_hlist();
}
}
out:
diff --git a/kernel/trace/functions_hlist.c b/kernel/trace/functions_hlist.c
index c79c4c5..d682213 100644
--- a/kernel/trace/functions_hlist.c
+++ b/kernel/trace/functions_hlist.c
@@ -20,6 +20,39 @@
DEFINE_PER_CPU(struct func_hlist, func_hlist_cpu);
int functions_hash_bits __read_mostly;
+static atomic_t hlist_refcount;
+
+
+static void function_hlist_reset_profile_cpu(int cpu)
+{
+ struct func_hlist *hlist = &per_cpu(func_hlist_cpu, cpu);
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct func_node *rec;
+ int i;
+
+ for (i = 0; i < FUNCTIONS_HLIST_SIZE; i++) {
+ head = &hlist->hash[i];
+
+ if (hlist_empty(head))
+ continue;
+
+ hlist_for_each_entry(rec, node, head, node) {
+ rec->counter = 0;
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+ rec->time = 0;
+#endif
+ }
+ }
+}
+
+void function_hlist_reset_profile(void)
+{
+ int cpu;
+
+ for_each_online_cpu(cpu)
+ function_hlist_reset_profile_cpu(cpu);
+}
static void __function_hlist_release(struct func_hlist *hlist)
{
@@ -115,7 +148,7 @@ static int function_hlist_init_cpu(int cpu)
return 0;
}
-void function_hlist_release(void)
+static void function_hlist_release(void)
{
int cpu;
struct func_hlist *hlist;
@@ -126,7 +159,7 @@ void function_hlist_release(void)
}
}
-int function_hlist_init(void)
+static int function_hlist_init(void)
{
int cpu;
int ret = 0;
@@ -140,6 +173,25 @@ int function_hlist_init(void)
return ret;
}
+int get_function_hlist(void)
+{
+ int ret = 0;
+
+ if (atomic_inc_return(&hlist_refcount) == 1) {
+ ret = function_hlist_init();
+ if (ret < 0)
+ atomic_dec(&hlist_refcount);
+ }
+
+ return ret;
+}
+
+void put_function_hlist(void)
+{
+ if (!atomic_dec_return(&hlist_refcount))
+ function_hlist_release();
+}
+
struct func_node *
function_find_hlist_node(struct func_hlist *hlist, unsigned long ip)
{
diff --git a/kernel/trace/functions_hlist.h b/kernel/trace/functions_hlist.h
index 8001f95..a4655c7 100644
--- a/kernel/trace/functions_hlist.h
+++ b/kernel/trace/functions_hlist.h
@@ -35,5 +35,6 @@ function_find_hlist_node(struct func_hlist *hlist, unsigned long ip);
struct func_node *
function_hlist_record_alloc(struct func_hlist *hlist, unsigned long ip);
-int function_hlist_init(void);
-void function_hlist_release(void);
+int get_function_hlist(void);
+void put_function_hlist(void);
+void function_hlist_reset_profile(void);
--
1.6.2.3
When we set a filter to start tracing from a given function in the
function graph tracer, the filter is stored in a linear array. This
doesn't scale very well; we even limit the number of such functions
to 32.
Now that we have a hashlist of functions, let's put a field inside
each function node so that we can check whether a function is one of
these filters using the hashlist rather than a linear array.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Lai Jiangshan <[email protected]>
---
kernel/trace/ftrace.c | 61 ++++++++++++++++++++++++++++++++++
kernel/trace/functions_hlist.c | 29 ++++++++++++++++
kernel/trace/functions_hlist.h | 2 +
kernel/trace/trace.h | 21 +-----------
kernel/trace/trace_functions_graph.c | 41 ++++++++++++++++++++++-
5 files changed, 133 insertions(+), 21 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 027743c..d719078 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2193,6 +2193,67 @@ static DEFINE_MUTEX(graph_lock);
int ftrace_graph_count;
unsigned long ftrace_graph_funcs[FTRACE_GRAPH_MAX_FUNCS] __read_mostly;
+static int
+change_graph_function_hlist(unsigned long ip, int val, struct func_hlist *hlist)
+{
+ struct func_node *rec;
+
+ rec = function_find_hlist_node(hlist, ip);
+ if (!rec) {
+ rec = function_hlist_record_alloc(hlist, ip);
+ if (!rec)
+ return 1;
+ }
+
+ rec->graph_start = val;
+
+ return 0;
+}
+
+static int change_graph_function(unsigned long ip, int val)
+{
+ int ret;
+ int cpu;
+ struct func_hlist *hlist;
+
+ for_each_online_cpu(cpu) {
+ hlist = &per_cpu(func_hlist_cpu, cpu);
+ ret = change_graph_function_hlist(ip, val, hlist);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+static void clear_graph_functions(void)
+{
+ int i;
+
+ for (i = 0; i < ftrace_graph_count; i++)
+ change_graph_function(ftrace_graph_funcs[i], 0);
+}
+
+int set_graph_functions(void)
+{
+ int i;
+ int ret = 0;
+
+ mutex_lock(&graph_lock);
+
+ for (i = 0; i < ftrace_graph_count; i++) {
+ ret = change_graph_function(ftrace_graph_funcs[i], 1);
+ if (ret) {
+ clear_graph_functions();
+ break;
+ }
+ }
+
+ mutex_unlock(&graph_lock);
+
+ return 0;
+}
+
static void *
__g_next(struct seq_file *m, loff_t *pos)
{
diff --git a/kernel/trace/functions_hlist.c b/kernel/trace/functions_hlist.c
index d682213..7a60265 100644
--- a/kernel/trace/functions_hlist.c
+++ b/kernel/trace/functions_hlist.c
@@ -54,6 +54,35 @@ void function_hlist_reset_profile(void)
function_hlist_reset_profile_cpu(cpu);
}
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+static void function_hlist_reset_graph_cpu(int cpu)
+{
+ struct func_hlist *hlist = &per_cpu(func_hlist_cpu, cpu);
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct func_node *rec;
+ int i;
+
+ for (i = 0; i < FUNCTIONS_HLIST_SIZE; i++) {
+ head = &hlist->hash[i];
+
+ if (hlist_empty(head))
+ continue;
+
+ hlist_for_each_entry(rec, node, head, node)
+ rec->graph_start = 0;
+ }
+}
+
+void function_hlist_reset_graph(void)
+{
+ int cpu;
+
+ for_each_online_cpu(cpu)
+ function_hlist_reset_graph_cpu(cpu);
+}
+#endif
+
static void __function_hlist_release(struct func_hlist *hlist)
{
struct func_hlist_page *pg = hlist->start;
diff --git a/kernel/trace/functions_hlist.h b/kernel/trace/functions_hlist.h
index a4655c7..39d89b4 100644
--- a/kernel/trace/functions_hlist.h
+++ b/kernel/trace/functions_hlist.h
@@ -8,6 +8,7 @@ struct func_node {
unsigned long counter;
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
unsigned long long time;
+ int graph_start;
#endif
};
@@ -38,3 +39,4 @@ function_hlist_record_alloc(struct func_hlist *hlist, unsigned long ip);
int get_function_hlist(void);
void put_function_hlist(void);
void function_hlist_reset_profile(void);
+void function_hlist_reset_graph(void);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index ce077fb..ff0b01a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -499,26 +499,7 @@ trace_print_graph_duration(unsigned long long duration, struct trace_seq *s);
#define FTRACE_GRAPH_MAX_FUNCS 32
extern int ftrace_graph_count;
extern unsigned long ftrace_graph_funcs[FTRACE_GRAPH_MAX_FUNCS];
-
-static inline int ftrace_graph_addr(unsigned long addr)
-{
- int i;
-
- if (!ftrace_graph_count)
- return 1;
-
- for (i = 0; i < ftrace_graph_count; i++) {
- if (addr == ftrace_graph_funcs[i])
- return 1;
- }
-
- return 0;
-}
-#else
-static inline int ftrace_graph_addr(unsigned long addr)
-{
- return 1;
-}
+int set_graph_functions(void);
#endif /* CONFIG_DYNAMIC_FTRACE */
#else /* CONFIG_FUNCTION_GRAPH_TRACER */
static inline enum print_line_t
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 616b135..da24add 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -13,6 +13,7 @@
#include "trace.h"
#include "trace_output.h"
+#include "functions_hlist.h"
struct fgraph_cpu_data {
pid_t last_pid;
@@ -202,6 +203,33 @@ static int __trace_graph_entry(struct trace_array *tr,
return 1;
}
+static inline int ftrace_graph_addr(unsigned long addr)
+{
+ struct func_node *rec;
+ struct func_hlist *hlist;
+
+ if (!ftrace_graph_count)
+ return 0;
+
+ hlist = &__get_cpu_var(func_hlist_cpu);
+
+ rec = function_find_hlist_node(hlist, addr);
+ if (!rec) {
+ /*
+ * TODO: send a retrieval error event
+ * to keep track of this.
+ */
+ rec = function_hlist_record_alloc(hlist, addr);
+ if (!rec)
+ return 0;
+ }
+
+ if (rec->graph_start)
+ return 1;
+
+ return 0;
+}
+
int trace_graph_entry(struct ftrace_graph_ent *trace)
{
struct trace_array *tr = graph_array;
@@ -294,10 +322,20 @@ static int graph_trace_init(struct trace_array *tr)
int ret;
set_graph_array(tr);
+ ret = get_function_hlist();
+ if (ret)
+ return ret;
+
+ function_hlist_reset_graph();
+ set_graph_functions();
+
ret = register_ftrace_graph(&trace_graph_return,
&trace_graph_entry);
- if (ret)
+ if (ret) {
+ put_function_hlist();
return ret;
+ }
+
tracing_start_cmdline_record();
return 0;
@@ -307,6 +345,7 @@ static void graph_trace_reset(struct trace_array *tr)
{
tracing_stop_cmdline_record();
unregister_ftrace_graph();
+ put_function_hlist();
}
static int max_bytes_for_cpu;
--
1.6.2.3
Factor out the common search-then-insert pattern in the function
hashlist to remove some redundant code.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Lai Jiangshan <[email protected]>
---
kernel/trace/ftrace.c | 18 ++++++------------
kernel/trace/functions_hlist.h | 12 ++++++++++++
kernel/trace/trace_functions_graph.c | 17 +++++++----------
3 files changed, 25 insertions(+), 22 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index d719078..785d077 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -385,12 +385,9 @@ function_profile_call(unsigned long ip, unsigned long parent_ip)
hlist = &__get_cpu_var(func_hlist_cpu);
- rec = function_find_hlist_node(hlist, ip);
- if (!rec) {
- rec = function_hlist_record_alloc(hlist, ip);
- if (!rec)
- goto out;
- }
+ rec = function_get_hlist_node(hlist, ip);
+ if (!rec)
+ goto out;
rec->counter++;
out:
@@ -2198,12 +2195,9 @@ change_graph_function_hlist(unsigned long ip, int val, struct func_hlist *hlist)
{
struct func_node *rec;
- rec = function_find_hlist_node(hlist, ip);
- if (!rec) {
- rec = function_hlist_record_alloc(hlist, ip);
- if (!rec)
- return 1;
- }
+ rec = function_get_hlist_node(hlist, ip);
+ if (!rec)
+ return 1;
rec->graph_start = val;
diff --git a/kernel/trace/functions_hlist.h b/kernel/trace/functions_hlist.h
index 39d89b4..3baf4bb 100644
--- a/kernel/trace/functions_hlist.h
+++ b/kernel/trace/functions_hlist.h
@@ -40,3 +40,15 @@ int get_function_hlist(void);
void put_function_hlist(void);
void function_hlist_reset_profile(void);
void function_hlist_reset_graph(void);
+
+static inline struct func_node *
+function_get_hlist_node(struct func_hlist *hlist, unsigned long ip)
+{
+ struct func_node *rec;
+
+ rec = function_find_hlist_node(hlist, ip);
+ if (!rec)
+ rec = function_hlist_record_alloc(hlist, ip);
+
+ return rec;
+}
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index da24add..0fa0691 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -213,16 +213,13 @@ static inline int ftrace_graph_addr(unsigned long addr)
hlist = &__get_cpu_var(func_hlist_cpu);
- rec = function_find_hlist_node(hlist, addr);
- if (!rec) {
- /*
- * TODO: send a retrieval error event
- * to keep track of this.
- */
- rec = function_hlist_record_alloc(hlist, addr);
- if (!rec)
- return 0;
- }
+ /*
+ * TODO: send a retrieval error event
+ * to keep track of failures
+ */
+ rec = function_get_hlist_node(hlist, addr);
+ if (!rec)
+ return 0;
if (rec->graph_start)
return 1;
--
1.6.2.3
Drop the NULL check on hlist->hash. It is wasteful because we don't
register the tracer if the buffer allocation failed, and the memory
barrier in register_ftrace_graph() ensures the buffer is visible to
the callbacks right away.
Also, we know the tracing callbacks won't be called anymore after
unregister_ftrace_graph(), so subsequent buffer resets are safe too.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Lai Jiangshan <[email protected]>
---
kernel/trace/ftrace.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 3cdb35e..dfd8f7c 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -384,8 +384,6 @@ function_profile_call(unsigned long ip, unsigned long parent_ip)
local_irq_save(flags);
hlist = &__get_cpu_var(func_hlist_cpu);
- if (!hlist->hash)
- goto out;
rec = function_find_hlist_node(hlist, ip);
if (!rec) {
@@ -415,8 +413,6 @@ static void profile_graph_return(struct ftrace_graph_ret *trace)
local_irq_save(flags);
hlist = &__get_cpu_var(func_hlist_cpu);
- if (!hlist->hash)
- goto out;
calltime = trace->rettime - trace->calltime;
@@ -439,7 +435,6 @@ static void profile_graph_return(struct ftrace_graph_ret *trace)
if (rec)
rec->time += calltime;
- out:
local_irq_restore(flags);
}
--
1.6.2.3
When we run under dynamic tracing, we know that after calling
unregister_ftrace_graph(), tracing has really stopped because of the
hot patching and the use of stop_machine().
But in the static tracing case, we only set stub callbacks. On archs
that have weak memory ordering, this is not sufficient to assume
that the old callbacks won't be called right after we leave
unregister_ftrace_graph().
Insert a read/write memory barrier at the end of
unregister_ftrace_graph() so that the code that follows can safely
assume tracing has really stopped. This spares the old tracing
callbacks from checking various states, like ensuring the needed
buffers have been allocated, etc.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Lai Jiangshan <[email protected]>
---
kernel/trace/ftrace.c | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c050ce3..94117ec 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3128,6 +3128,18 @@ void unregister_ftrace_graph(void)
out:
mutex_unlock(&ftrace_lock);
+
+ /*
+ * In dynamic tracing case, tracing is really stopped here.
+ * Otherwise we need to ensure ftrace_graph_return
+ * and ftrace_graph_entry are seen uptodate, so that code
+ * that follows unregister_ftrace_graph can safely assume
+ * tracing has stopped. This avoid wasteful checks of tracing
+ * states inside tracing callbacks.
+ */
+#ifndef CONFIG_DYNAMIC_FTRACE
+ smp_mb();
+#endif
}
/* Allocate a return stack for newly created task */
--
1.6.2.3
Extract and generalize the function hashlist used by the function
profiler.
Having a general-purpose hashlist of kernel functions will help
multiplex dynamic tracing parameters for parallel users of the
function and function graph tracers. It can also help track who is
tracing which functions among concurrent tracers.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Lai Jiangshan <[email protected]>
---
kernel/trace/Makefile | 1 +
kernel/trace/ftrace.c | 289 ++++-----------------------------------
kernel/trace/functions_hlist.c | 217 ++++++++++++++++++++++++++++++
kernel/trace/functions_hlist.h | 38 ++++++
4 files changed, 286 insertions(+), 259 deletions(-)
create mode 100644 kernel/trace/functions_hlist.c
create mode 100644 kernel/trace/functions_hlist.h
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index d00c6fe..f9804f2 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -58,5 +58,6 @@ obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
obj-$(CONFIG_KPROBE_EVENT) += trace_kprobe.o
obj-$(CONFIG_KSYM_TRACER) += trace_ksym.o
obj-$(CONFIG_EVENT_TRACING) += power-traces.o
+obj-$(CONFIG_FUNCTION_PROFILER) += functions_hlist.o
libftrace-y := ftrace.o
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 1e6640f..c050ce3 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -27,13 +27,13 @@
#include <linux/sysctl.h>
#include <linux/ctype.h>
#include <linux/list.h>
-#include <linux/hash.h>
#include <trace/events/sched.h>
#include <asm/ftrace.h>
#include <asm/setup.h>
+#include "functions_hlist.h"
#include "trace_output.h"
#include "trace_stat.h"
@@ -258,52 +258,19 @@ static void ftrace_update_pid_func(void)
}
#ifdef CONFIG_FUNCTION_PROFILER
-struct ftrace_profile {
- struct hlist_node node;
- unsigned long ip;
- unsigned long counter;
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
- unsigned long long time;
-#endif
-};
-
-struct ftrace_profile_page {
- struct ftrace_profile_page *next;
- unsigned long index;
- struct ftrace_profile records[];
-};
-
-struct ftrace_profile_stat {
- atomic_t disabled;
- struct hlist_head *hash;
- struct ftrace_profile_page *pages;
- struct ftrace_profile_page *start;
- struct tracer_stat stat;
-};
-
-#define PROFILE_RECORDS_SIZE \
- (PAGE_SIZE - offsetof(struct ftrace_profile_page, records))
-#define PROFILES_PER_PAGE \
- (PROFILE_RECORDS_SIZE / sizeof(struct ftrace_profile))
-
-static int ftrace_profile_bits __read_mostly;
static int ftrace_profile_enabled __read_mostly;
/* ftrace_profile_lock - synchronize the enable and disable of the profiler */
static DEFINE_MUTEX(ftrace_profile_lock);
-static DEFINE_PER_CPU(struct ftrace_profile_stat, ftrace_profile_stats);
-
-#define FTRACE_PROFILE_HASH_SIZE 1024 /* must be power of 2 */
-
static void *
function_stat_next(void *v, int idx)
{
- struct ftrace_profile *rec = v;
- struct ftrace_profile_page *pg;
+ struct func_node *rec = v;
+ struct func_hlist_page *pg;
- pg = (struct ftrace_profile_page *)((unsigned long)rec & PAGE_MASK);
+ pg = (struct func_hlist_page *)((unsigned long)rec & PAGE_MASK);
again:
if (idx != 0)
@@ -323,21 +290,21 @@ function_stat_next(void *v, int idx)
static void *function_stat_start(struct tracer_stat *trace)
{
- struct ftrace_profile_stat *stat =
- container_of(trace, struct ftrace_profile_stat, stat);
+ struct func_hlist *hlist =
+ container_of(trace, struct func_hlist, stat);
- if (!stat || !stat->start)
+ if (!hlist || !hlist->start)
return NULL;
- return function_stat_next(&stat->start->records[0], 0);
+ return function_stat_next(&hlist->start->records[0], 0);
}
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
/* function graph compares on total time */
static int function_stat_cmp(void *p1, void *p2)
{
- struct ftrace_profile *a = p1;
- struct ftrace_profile *b = p2;
+ struct func_node *a = p1;
+ struct func_node *b = p2;
if (a->time < b->time)
return -1;
@@ -350,8 +317,8 @@ static int function_stat_cmp(void *p1, void *p2)
/* not function graph compares against hits */
static int function_stat_cmp(void *p1, void *p2)
{
- struct ftrace_profile *a = p1;
- struct ftrace_profile *b = p2;
+ struct func_node *a = p1;
+ struct func_node *b = p2;
if (a->counter < b->counter)
return -1;
@@ -378,7 +345,7 @@ static int function_stat_headers(struct seq_file *m)
static int function_stat_show(struct seq_file *m, void *v)
{
- struct ftrace_profile *rec = v;
+ struct func_node *rec = v;
char str[KSYM_SYMBOL_LEN];
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
static DEFINE_MUTEX(mutex);
@@ -407,207 +374,11 @@ static int function_stat_show(struct seq_file *m, void *v)
return 0;
}
-static void ftrace_profile_reset(struct ftrace_profile_stat *stat)
-{
- struct ftrace_profile_page *pg;
-
- pg = stat->pages = stat->start;
-
- while (pg) {
- memset(pg->records, 0, PROFILE_RECORDS_SIZE);
- pg->index = 0;
- pg = pg->next;
- }
-
- memset(stat->hash, 0,
- FTRACE_PROFILE_HASH_SIZE * sizeof(struct hlist_head));
-}
-
-int ftrace_profile_pages_init(struct ftrace_profile_stat *stat)
-{
- struct ftrace_profile_page *pg;
- int functions;
- int pages;
- int i;
-
- /* If we already allocated, do nothing */
- if (stat->pages)
- return 0;
-
- stat->pages = (void *)get_zeroed_page(GFP_KERNEL);
- if (!stat->pages)
- return -ENOMEM;
-
-#ifdef CONFIG_DYNAMIC_FTRACE
- functions = ftrace_update_tot_cnt;
-#else
- /*
- * We do not know the number of functions that exist because
- * dynamic tracing is what counts them. With past experience
- * we have around 20K functions. That should be more than enough.
- * It is highly unlikely we will execute every function in
- * the kernel.
- */
- functions = 20000;
-#endif
-
- pg = stat->start = stat->pages;
-
- pages = DIV_ROUND_UP(functions, PROFILES_PER_PAGE);
-
- for (i = 0; i < pages; i++) {
- pg->next = (void *)get_zeroed_page(GFP_KERNEL);
- if (!pg->next)
- goto out_free;
- pg = pg->next;
- }
-
- return 0;
-
- out_free:
- pg = stat->start;
- while (pg) {
- unsigned long tmp = (unsigned long)pg;
-
- pg = pg->next;
- free_page(tmp);
- }
-
- free_page((unsigned long)stat->pages);
- stat->pages = NULL;
- stat->start = NULL;
-
- return -ENOMEM;
-}
-
-static int ftrace_profile_init_cpu(int cpu)
-{
- struct ftrace_profile_stat *stat;
- int size;
-
- stat = &per_cpu(ftrace_profile_stats, cpu);
-
- if (stat->hash) {
- /* If the profile is already created, simply reset it */
- ftrace_profile_reset(stat);
- return 0;
- }
-
- /*
- * We are profiling all functions, but usually only a few thousand
- * functions are hit. We'll make a hash of 1024 items.
- */
- size = FTRACE_PROFILE_HASH_SIZE;
-
- stat->hash = kzalloc(sizeof(struct hlist_head) * size, GFP_KERNEL);
-
- if (!stat->hash)
- return -ENOMEM;
-
- if (!ftrace_profile_bits) {
- size--;
-
- for (; size; size >>= 1)
- ftrace_profile_bits++;
- }
-
- /* Preallocate the function profiling pages */
- if (ftrace_profile_pages_init(stat) < 0) {
- kfree(stat->hash);
- stat->hash = NULL;
- return -ENOMEM;
- }
-
- return 0;
-}
-
-static int ftrace_profile_init(void)
-{
- int cpu;
- int ret = 0;
-
- for_each_online_cpu(cpu) {
- ret = ftrace_profile_init_cpu(cpu);
- if (ret)
- break;
- }
-
- return ret;
-}
-
-/* interrupts must be disabled */
-static struct ftrace_profile *
-ftrace_find_profiled_func(struct ftrace_profile_stat *stat, unsigned long ip)
-{
- struct ftrace_profile *rec;
- struct hlist_head *hhd;
- struct hlist_node *n;
- unsigned long key;
-
- key = hash_long(ip, ftrace_profile_bits);
- hhd = &stat->hash[key];
-
- if (hlist_empty(hhd))
- return NULL;
-
- hlist_for_each_entry_rcu(rec, n, hhd, node) {
- if (rec->ip == ip)
- return rec;
- }
-
- return NULL;
-}
-
-static void ftrace_add_profile(struct ftrace_profile_stat *stat,
- struct ftrace_profile *rec)
-{
- unsigned long key;
-
- key = hash_long(rec->ip, ftrace_profile_bits);
- hlist_add_head_rcu(&rec->node, &stat->hash[key]);
-}
-
-/*
- * The memory is already allocated, this simply finds a new record to use.
- */
-static struct ftrace_profile *
-ftrace_profile_alloc(struct ftrace_profile_stat *stat, unsigned long ip)
-{
- struct ftrace_profile *rec = NULL;
-
- /* prevent recursion (from NMIs) */
- if (atomic_inc_return(&stat->disabled) != 1)
- goto out;
-
- /*
- * Try to find the function again since an NMI
- * could have added it
- */
- rec = ftrace_find_profiled_func(stat, ip);
- if (rec)
- goto out;
-
- if (stat->pages->index == PROFILES_PER_PAGE) {
- if (!stat->pages->next)
- goto out;
- stat->pages = stat->pages->next;
- }
-
- rec = &stat->pages->records[stat->pages->index++];
- rec->ip = ip;
- ftrace_add_profile(stat, rec);
-
- out:
- atomic_dec(&stat->disabled);
-
- return rec;
-}
-
static void
function_profile_call(unsigned long ip, unsigned long parent_ip)
{
- struct ftrace_profile_stat *stat;
- struct ftrace_profile *rec;
+ struct func_hlist *hlist;
+ struct func_node *rec;
unsigned long flags;
if (!ftrace_profile_enabled)
@@ -615,13 +386,13 @@ function_profile_call(unsigned long ip, unsigned long parent_ip)
local_irq_save(flags);
- stat = &__get_cpu_var(ftrace_profile_stats);
- if (!stat->hash || !ftrace_profile_enabled)
+ hlist = &__get_cpu_var(func_hlist_cpu);
+ if (!hlist->hash || !ftrace_profile_enabled)
goto out;
- rec = ftrace_find_profiled_func(stat, ip);
+ rec = function_find_hlist_node(hlist, ip);
if (!rec) {
- rec = ftrace_profile_alloc(stat, ip);
+ rec = function_hlist_record_alloc(hlist, ip);
if (!rec)
goto out;
}
@@ -640,14 +411,14 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace)
static void profile_graph_return(struct ftrace_graph_ret *trace)
{
- struct ftrace_profile_stat *stat;
+ struct func_hlist *hlist;
unsigned long long calltime;
- struct ftrace_profile *rec;
+ struct func_node *rec;
unsigned long flags;
local_irq_save(flags);
- stat = &__get_cpu_var(ftrace_profile_stats);
- if (!stat->hash || !ftrace_profile_enabled)
+ hlist = &__get_cpu_var(func_hlist_cpu);
+ if (!hlist->hash || !ftrace_profile_enabled)
goto out;
calltime = trace->rettime - trace->calltime;
@@ -667,7 +438,7 @@ static void profile_graph_return(struct ftrace_graph_ret *trace)
calltime = 0;
}
- rec = ftrace_find_profiled_func(stat, trace->func);
+ rec = function_find_hlist_node(hlist, trace->func);
if (rec)
rec->time += calltime;
@@ -727,7 +498,7 @@ ftrace_profile_write(struct file *filp, const char __user *ubuf,
mutex_lock(&ftrace_profile_lock);
if (ftrace_profile_enabled ^ val) {
if (val) {
- ret = ftrace_profile_init();
+ ret = function_hlist_init();
if (ret < 0) {
cnt = ret;
goto out;
@@ -785,14 +556,14 @@ static struct tracer_stat function_stats __initdata = {
static __init void ftrace_profile_debugfs(struct dentry *d_tracer)
{
- struct ftrace_profile_stat *stat;
+ struct func_hlist *hlist;
struct dentry *entry;
char *name;
int ret;
int cpu;
for_each_possible_cpu(cpu) {
- stat = &per_cpu(ftrace_profile_stats, cpu);
+ hlist = &per_cpu(func_hlist_cpu, cpu);
/* allocate enough for function name + cpu number */
name = kmalloc(32, GFP_KERNEL);
@@ -806,10 +577,10 @@ static __init void ftrace_profile_debugfs(struct dentry *d_tracer)
cpu);
return;
}
- stat->stat = function_stats;
+ hlist->stat = function_stats;
snprintf(name, 32, "function%d", cpu);
- stat->stat.name = name;
- ret = register_stat_tracer(&stat->stat);
+ hlist->stat.name = name;
+ ret = register_stat_tracer(&hlist->stat);
if (ret) {
WARN(1,
"Could not register function stat for cpu %d\n",
diff --git a/kernel/trace/functions_hlist.c b/kernel/trace/functions_hlist.c
new file mode 100644
index 0000000..37804c4
--- /dev/null
+++ b/kernel/trace/functions_hlist.c
@@ -0,0 +1,217 @@
+/*
+ * Copyright (C) 2007-2010 Steven Rostedt <[email protected]>
+ *
+ * Extracted from function profiling in ftrace.c, generalize naming:
+ * Copyright (C) 2010 Frederic Weisbecker <[email protected]>
+ */
+
+#include <linux/gfp.h>
+#include "functions_hlist.h"
+#include "trace.h"
+
+#define FUNCTIONS_RECORDS_SIZE \
+ (PAGE_SIZE - offsetof(struct func_hlist_page, records))
+
+#define RECORDS_PER_PAGE \
+ (FUNCTIONS_RECORDS_SIZE / sizeof(struct func_node))
+
+#define FUNCTIONS_HLIST_SIZE 1024 /* must be power of 2 */
+
+DEFINE_PER_CPU(struct func_hlist, func_hlist_cpu);
+
+int functions_hash_bits __read_mostly;
+
+static void function_hlist_reset(struct func_hlist *hlist)
+{
+ struct func_hlist_page *pg;
+
+ pg = hlist->pages = hlist->start;
+
+ while (pg) {
+ memset(pg->records, 0, FUNCTIONS_RECORDS_SIZE);
+ pg->index = 0;
+ pg = pg->next;
+ }
+
+ memset(hlist->hash, 0,
+ FUNCTIONS_HLIST_SIZE * sizeof(struct hlist_head));
+}
+
+static int function_hlist_pages_init(struct func_hlist *hlist)
+{
+ struct func_hlist_page *pg;
+ int functions;
+ int pages;
+ int i;
+
+ /* If we already allocated, do nothing */
+ if (hlist->pages)
+ return 0;
+
+ hlist->pages = (void *)get_zeroed_page(GFP_KERNEL);
+ if (!hlist->pages)
+ return -ENOMEM;
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+ functions = ftrace_update_tot_cnt;
+#else
+ /*
+ * We do not know the number of functions that exist because
+ * dynamic tracing is what counts them. With past experience
+ * we have around 20K functions. That should be more than enough.
+ * It is highly unlikely we will execute every function in
+ * the kernel.
+ */
+ functions = 20000;
+#endif
+
+ pg = hlist->start = hlist->pages;
+
+ pages = DIV_ROUND_UP(functions, RECORDS_PER_PAGE);
+
+ for (i = 0; i < pages; i++) {
+ pg->next = (void *)get_zeroed_page(GFP_KERNEL);
+ if (!pg->next)
+ goto out_free;
+ pg = pg->next;
+ }
+
+ return 0;
+
+ out_free:
+ pg = hlist->start;
+ while (pg) {
+ unsigned long tmp = (unsigned long)pg;
+
+ pg = pg->next;
+ free_page(tmp);
+ }
+
+ free_page((unsigned long)hlist->pages);
+ hlist->pages = NULL;
+ hlist->start = NULL;
+
+ return -ENOMEM;
+}
+
+static int function_hlist_init_cpu(int cpu)
+{
+ struct func_hlist *hlist;
+ int size;
+
+ hlist = &per_cpu(func_hlist_cpu, cpu);
+
+ if (hlist->hash) {
+ /* If the profile is already created, simply reset it */
+ function_hlist_reset(hlist);
+ return 0;
+ }
+
+ /*
+ * We are profiling all functions, but usually only a few thousand
+ * functions are hit. We'll make a hash of 1024 items.
+ */
+ size = FUNCTIONS_HLIST_SIZE;
+
+ hlist->hash = kzalloc(sizeof(struct hlist_head) * size, GFP_KERNEL);
+
+ if (!hlist->hash)
+ return -ENOMEM;
+
+ if (!functions_hash_bits) {
+ size--;
+
+ for (; size; size >>= 1)
+ functions_hash_bits++;
+ }
+
+ /* Preallocate the function profiling pages */
+ if (function_hlist_pages_init(hlist) < 0) {
+ kfree(hlist->hash);
+ hlist->hash = NULL;
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+int function_hlist_init(void)
+{
+ int cpu;
+ int ret = 0;
+
+ for_each_online_cpu(cpu) {
+ ret = function_hlist_init_cpu(cpu);
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
+
+struct func_node *
+function_find_hlist_node(struct func_hlist *hlist, unsigned long ip)
+{
+ struct func_node *rec;
+ struct hlist_head *hhd;
+ struct hlist_node *n;
+ unsigned long key;
+
+ key = hash_long(ip, functions_hash_bits);
+ hhd = &hlist->hash[key];
+
+ if (hlist_empty(hhd))
+ return NULL;
+
+ hlist_for_each_entry_rcu(rec, n, hhd, node) {
+ if (rec->ip == ip)
+ return rec;
+ }
+
+ return NULL;
+}
+
+static void function_hlist_add(struct func_hlist *hlist,
+ struct func_node *rec)
+{
+ unsigned long key;
+
+ key = hash_long(rec->ip, functions_hash_bits);
+ hlist_add_head_rcu(&rec->node, &hlist->hash[key]);
+}
+
+/*
+ * The memory is already allocated, this simply finds a new record to use.
+ */
+struct func_node *
+function_hlist_record_alloc(struct func_hlist *hlist, unsigned long ip)
+{
+ struct func_node *rec = NULL;
+
+ /* prevent recursion (from NMIs) */
+ if (atomic_inc_return(&hlist->disabled) != 1)
+ goto out;
+
+ /*
+ * Try to find the function again since an NMI
+ * could have added it
+ */
+ rec = function_find_hlist_node(hlist, ip);
+ if (rec)
+ goto out;
+
+ if (hlist->pages->index == RECORDS_PER_PAGE) {
+ if (!hlist->pages->next)
+ goto out;
+ hlist->pages = hlist->pages->next;
+ }
+
+ rec = &hlist->pages->records[hlist->pages->index++];
+ rec->ip = ip;
+ function_hlist_add(hlist, rec);
+
+ out:
+ atomic_dec(&hlist->disabled);
+
+ return rec;
+}
diff --git a/kernel/trace/functions_hlist.h b/kernel/trace/functions_hlist.h
new file mode 100644
index 0000000..3f4e485
--- /dev/null
+++ b/kernel/trace/functions_hlist.h
@@ -0,0 +1,38 @@
+#include <linux/hash.h>
+#include <linux/list.h>
+#include "trace_stat.h"
+
+struct func_node {
+ struct hlist_node node;
+ unsigned long ip;
+ unsigned long counter;
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+ unsigned long long time;
+#endif
+};
+
+struct func_hlist_page {
+ struct func_hlist_page *next;
+ unsigned long index;
+ struct func_node records[];
+};
+
+struct func_hlist {
+ atomic_t disabled;
+ struct hlist_head *hash;
+ struct func_hlist_page *pages;
+ struct func_hlist_page *start;
+ struct tracer_stat stat;
+};
+
+DECLARE_PER_CPU(struct func_hlist, func_hlist_cpu);
+
+extern int functions_hash_bits;
+
+struct func_node *
+function_find_hlist_node(struct func_hlist *hlist, unsigned long ip);
+
+struct func_node *
+function_hlist_record_alloc(struct func_hlist *hlist, unsigned long ip);
+
+int function_hlist_init(void);
--
1.6.2.3
On Fri, 2010-01-22 at 02:16 +0100, Frederic Weisbecker wrote:
> When we run under dynamic tracing, we know that after calling
> unregister_ftrace_graph(), tracing has really stopped because of
> the hot patching and use of stop_machine().
This is incorrect. Even after unregister_ftrace_graph() with
stop_machine(), we still have no guarantee that a callback is not
being called. This is the reason I use stub functions instead of
NULLs. The address of the trace function could have been loaded into
a register and the task then preempted; even after stop_machine(),
that trace function can still be called. This is also the reason I
never let modules add hooks to the function tracer (although I could
easily make a wrapper to do so).
>
> But in static tracing case, we only set stub callbacks. This is
> not sufficient on archs that have weak memory ordering to assume
> the older callbacks won't be called right after we leave
> unregister_ftrace_graph().
>
> Insert a read/write memory barrier in the end of
> unregister_ftrace_graph() so that the code that follow can safely
> assume tracing has really stopped. This can avoid its older tracing
> callbacks to perform checks about various states like ensuring
> needed buffers have been allocated, etc...
There's no guarantee, even with an smp_mb(), that a trace function
will not be called after being unregistered.
-- Steve
On Thu, Jan 21, 2010 at 08:51:48PM -0500, Steven Rostedt wrote:
> On Fri, 2010-01-22 at 02:16 +0100, Frederic Weisbecker wrote:
> > When we run under dynamic tracing, we know that after calling
> > unregister_ftrace_graph(), tracing has really stopped because of
> > the hot patching and use of stop_machine().
>
> This is incorrect. Even after unregister_ftrace_graph() with
> stop_machine(), we still have no guarantee that a call back is not being
> called. This is the reason I use sub tracing instead of NULLs. The call
> to the trace function could have been loaded in a register and then
> preempted. Even after stop_machine() that trace function can be called.
> This is also the reason that I never let modules add hooks to the
> function tracer (although I can easily make a wrapper to do so).
Ah, you are absolutely right! I forgot about all that, and it looks
like nothing can easily solve this.
I just dream of a magic synchronize_trace().
> >
> > But in static tracing case, we only set stub callbacks. This is
> > not sufficient on archs that have weak memory ordering to assume
> > the older callbacks won't be called right after we leave
> > unregister_ftrace_graph().
> >
> > Insert a read/write memory barrier in the end of
> > unregister_ftrace_graph() so that the code that follow can safely
> > assume tracing has really stopped. This can avoid its older tracing
> > callbacks to perform checks about various states like ensuring
> > needed buffers have been allocated, etc...
>
> There's no guarantee, even with a smp_mb() that a trace function will
> not be called after being unregistered.
Yeah, indeed...
On Fri, 2010-01-22 at 02:16 +0100, Frederic Weisbecker wrote:
> Every time we enter the function profiler tracing callbacks, we first
> check if the function profiling is enabled.
>
> This check is useless because we register the function graph
> callbacks only after the hashlist has been initialized.
Unfortunately, since the previous patch is incorrect, it makes this
one buggy too.
If you remove the check of ftrace_profile_enabled, a call to the
profiling code could have been preempted and still be pending.
Stop machine may remove all calls into the tracing code, but it only
affects new hits. Pending hits may still exist.
If you remove this check and the user re-enables the profiling, all
per-CPU hashes will be reset. If, while this is happening, the task
with the pending trace wakes up, it may access the per-CPU list and
corrupt it.
Now for the reason I Cc'd Paul and Mathieu...
If we had a synchronize_sched()-like function that waited and
returned once all preempted tasks have been scheduled again and have
either gone to userspace or called schedule() directly, then we
could actually do this.
After unregistering the function graph tracer, you would call this
"synchronize_tasks()" and it would guarantee that all currently
preempted tasks have either gone to userspace or called schedule()
directly. Then it would be safe to remove this check.
-- Steve
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Li Zefan <[email protected]>
> Cc: Lai Jiangshan <[email protected]>
> ---
> kernel/trace/ftrace.c | 7 ++-----
> 1 files changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 94117ec..f258f67 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -381,13 +381,10 @@ function_profile_call(unsigned long ip, unsigned long parent_ip)
> struct func_node *rec;
> unsigned long flags;
>
> - if (!ftrace_profile_enabled)
> - return;
> -
> local_irq_save(flags);
>
> hlist = &__get_cpu_var(func_hlist_cpu);
> - if (!hlist->hash || !ftrace_profile_enabled)
> + if (!hlist->hash)
> goto out;
>
> rec = function_find_hlist_node(hlist, ip);
> @@ -418,7 +415,7 @@ static void profile_graph_return(struct ftrace_graph_ret *trace)
>
> local_irq_save(flags);
> hlist = &__get_cpu_var(func_hlist_cpu);
> - if (!hlist->hash || !ftrace_profile_enabled)
> + if (!hlist->hash)
> goto out;
>
> calltime = trace->rettime - trace->calltime;
* Steven Rostedt ([email protected]) wrote:
> On Fri, 2010-01-22 at 02:16 +0100, Frederic Weisbecker wrote:
> > Every time we enter the function profiler tracing callbacks, we first
> > check if the function profiling is enabled.
> >
> > This check is useless because we register the function graph
> > callbacks only after the hashlist has been initialized.
>
> Unfortunately, since the previous patch is incorrect, it makes this one
> buggy too.
>
> If you remove the check to ftrace_profile_enabled, the call to the
> profiled code could have been preempted and pending to be called.
>
> Stop machine may remove all calls to the tracing, but it only affects
> new hits. Pending hits may still exist.
>
> If you remove this check, and the user re-enables the profiling, then
> all PER_CPU hashs will be reset. If in the process of this happening,
> the task with the pending trace wakes up, it may access the PER_CPU list
> and corrupt it.
>
> Now for the reason I Cc'd Paul and Mathieu...
>
> If we had a synchronize_sched() like function that would wait and return
> when all preempted tasks have been scheduled again and went to either
> userspace or called schedule directly, then we could actually do this.
>
> After unregistering the function graph trace, you call this
> "synchronize_tasks()" and it will guarantee that all currently preempted
> tasks have either went to userspace or have called schedule() directly.
> Then it would be safe to remove this check.
OK, so basically you need to know when you reach a quiescent state,
but preemption is enabled and no RCU read lock is taken around these
code paths, am I correct?
With tracepoints, life is easy because I disable preemption around
the calls, so I can use synchronize_sched() to know when a quiescent
state is reached.
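A minimal sketch of that scheme, for reference (my_probe and the two
helpers below are made up, not existing API):

static void (*my_probe)(unsigned long ip);

static void call_my_probe(unsigned long ip)
{
	void (*fn)(unsigned long);

	preempt_disable();
	fn = my_probe;
	if (fn)
		fn(ip);
	preempt_enable();
}

static void unregister_my_probe(void)
{
	my_probe = NULL;
	/*
	 * Once this returns, every preempt-disabled section that could
	 * still see the old probe pointer has completed.
	 */
	synchronize_sched();
}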
I recommend looking at kernel/kprobes.c:check_safety(). It uses
thaw_processes() and synchronize_sched() for this purpose. Basically,
it relies on the "refrigeration" points to detect such a quiescent
state. This trick should do the job for the function graph tracer
too.
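In rough shape, the trick looks like this (a simplified sketch only,
not a copy of check_safety(), and it assumes CONFIG_FREEZER):

static int wait_for_tracing_quiescence(void)
{
	int err;

	/*
	 * Freezing forces every freezable task to a known refrigeration
	 * point, i.e. outside the tracing callbacks.
	 */
	err = freeze_processes();
	if (!err)
		thaw_processes();

	/* Cover the preempt-disabled paths as well. */
	synchronize_sched();

	return err;
}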
I'm adding Masami on Cc. He is the one who implemented
check_safety(), and I remember discussing it with him in the past.
Thanks,
Mathieu
>
> -- Steve
>
> >
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Li Zefan <[email protected]>
> > Cc: Lai Jiangshan <[email protected]>
> > ---
> > kernel/trace/ftrace.c | 7 ++-----
> > 1 files changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index 94117ec..f258f67 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -381,13 +381,10 @@ function_profile_call(unsigned long ip, unsigned long parent_ip)
> > struct func_node *rec;
> > unsigned long flags;
> >
> > - if (!ftrace_profile_enabled)
> > - return;
> > -
> > local_irq_save(flags);
> >
> > hlist = &__get_cpu_var(func_hlist_cpu);
> > - if (!hlist->hash || !ftrace_profile_enabled)
> > + if (!hlist->hash)
> > goto out;
> >
> > rec = function_find_hlist_node(hlist, ip);
> > @@ -418,7 +415,7 @@ static void profile_graph_return(struct ftrace_graph_ret *trace)
> >
> > local_irq_save(flags);
> > hlist = &__get_cpu_var(func_hlist_cpu);
> > - if (!hlist->hash || !ftrace_profile_enabled)
> > + if (!hlist->hash)
> > goto out;
> >
> > calltime = trace->rettime - trace->calltime;
>
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
On Thu, Jan 21, 2010 at 09:05:17PM -0500, Steven Rostedt wrote:
> On Fri, 2010-01-22 at 02:16 +0100, Frederic Weisbecker wrote:
> > Every time we enter the function profiler tracing callbacks, we first
> > check if the function profiling is enabled.
> >
> > This check is useless because we register the function graph
> > callbacks only after the hashlist has been initialized.
>
> Unfortunately, since the previous patch is incorrect, it makes this one
> buggy too.
>
> If you remove the check to ftrace_profile_enabled, the call to the
> profiled code could have been preempted and pending to be called.
>
> Stop machine may remove all calls to the tracing, but it only affects
> new hits. Pending hits may still exist.
>
> If you remove this check, and the user re-enables the profiling, then
> all PER_CPU hashs will be reset. If in the process of this happening,
> the task with the pending trace wakes up, it may access the PER_CPU list
> and corrupt it.
Indeed.
> Now for the reason I Cc'd Paul and Mathieu...
>
> If we had a synchronize_sched() like function that would wait and return
> when all preempted tasks have been scheduled again and went to either
> userspace or called schedule directly, then we could actually do this.
>
> After unregistering the function graph trace, you call this
> "synchronize_tasks()" and it will guarantee that all currently preempted
> tasks have either went to userspace or have called schedule() directly.
> Then it would be safe to remove this check.
Good point!
I fear that would require heavy hooks in the scheduler though...
On Fri, 2010-01-22 at 03:43 +0100, Frederic Weisbecker wrote:
> > Now for the reason I Cc'd Paul and Mathieu...
> >
> > If we had a synchronize_sched() like function that would wait and return
> > when all preempted tasks have been scheduled again and went to either
> > userspace or called schedule directly, then we could actually do this.
> >
> > After unregistering the function graph trace, you call this
> > "synchronize_tasks()" and it will guarantee that all currently preempted
> > tasks have either went to userspace or have called schedule() directly.
> > Then it would be safe to remove this check.
>
>
>
> Good point!
>
> I fear that would require heavy hooks in the scheduler though...
>
Not a heavy one. We could add a field to the task_struct and just
call something if it is set.

At the start of schedule():

	if (unlikely(current->pcount))
		handle_pcount_waiters(current);

void handle_pcount_waiters(struct task_struct *p)
{
	p->pcount = 0;
	wake_up(&pcount_waiters);
}

And for synchronize_tasks(), just search the task list for tasks
that are on the run queue but not running, add a pcount timestamp,
and record the list of tasks (an allocated list).
After it is woken up, it checks the list of tasks, and if a task
does not have the pcount timestamp that matches what was stored, it
removes it from the list. When it is finally woken up and has no
more tasks on the list, it continues.
This is just a basic idea, I left out a bunch of details, but I'm
sure it is feasible. This type of wait may work for other types of
lockless algorithms too; see the sketch below.
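Fleshed out a little, it could look like this (still only a sketch:
->pcount, pcount_waiters and synchronize_tasks() are all made up,
and the details I skipped above are still missing):

/* Assumes a new task_struct::pcount field and the schedule() hook above. */
static DECLARE_WAIT_QUEUE_HEAD(pcount_waiters);

static int tagged_tasks_pending(void)
{
	struct task_struct *g, *p;
	int pending = 0;

	read_lock(&tasklist_lock);
	do_each_thread(g, p) {
		if (p->pcount)
			pending = 1;
	} while_each_thread(g, p);
	read_unlock(&tasklist_lock);

	return pending;
}

void synchronize_tasks(void)
{
	struct task_struct *g, *p;

	/* Tag every task that is runnable but currently preempted. */
	read_lock(&tasklist_lock);
	do_each_thread(g, p) {
		if (p->state == TASK_RUNNING && !task_curr(p))
			p->pcount = 1;
	} while_each_thread(g, p);
	read_unlock(&tasklist_lock);

	/*
	 * Each tagged task clears ->pcount and wakes us up the next time
	 * it enters schedule(); wait until none are left.
	 */
	wait_event(pcount_waiters, !tagged_tasks_pending());
}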
-- Steve
On Thu, 2010-01-21 at 21:28 -0500, Mathieu Desnoyers wrote:
> * Steven Rostedt ([email protected]) wrote:
> > Now for the reason I Cc'd Paul and Mathieu...
> >
> > If we had a synchronize_sched() like function that would wait and return
> > when all preempted tasks have been scheduled again and went to either
> > userspace or called schedule directly, then we could actually do this.
> >
> > After unregistering the function graph trace, you call this
> > "synchronize_tasks()" and it will guarantee that all currently preempted
> > tasks have either went to userspace or have called schedule() directly.
> > Then it would be safe to remove this check.
>
> OK, so basically you need to know when you reach a quiescent state, but
> preemption is enabled and there is no RCU read lock taken around these
> code paths, am I correct ?
>
> With tracepoints, life is easy because I disable preemption around the
> calls, so I can use synchronize_sched() to know when quiescent state is
> reached.
>
> I recommend looking at kernel/kprobes.c:check_safety(). It uses
> thaw_processes() and synchronize_sched() for this purpose. Basically, it
> rely on the "refrigeration" points to detect such quiescent state. This
> trick should do the job for the function graph tracer too.
>
> I'm adding Masami in CC. He is the one who implemented check_safety(),
> and I remember discussing it with him in the past.
Hmm, interesting. Maybe something like that might work. But what if
CONFIG_PREEMPT is enabled but CONFIG_FREEZER is not?
-- Steve
* Steven Rostedt ([email protected]) wrote:
> On Thu, 2010-01-21 at 21:28 -0500, Mathieu Desnoyers wrote:
> > * Steven Rostedt ([email protected]) wrote:
>
> > > Now for the reason I Cc'd Paul and Mathieu...
> > >
> > > If we had a synchronize_sched() like function that would wait and return
> > > when all preempted tasks have been scheduled again and went to either
> > > userspace or called schedule directly, then we could actually do this.
> > >
> > > After unregistering the function graph trace, you call this
> > > "synchronize_tasks()" and it will guarantee that all currently preempted
> > > tasks have either went to userspace or have called schedule() directly.
> > > Then it would be safe to remove this check.
> >
> > OK, so basically you need to know when you reach a quiescent state, but
> > preemption is enabled and there is no RCU read lock taken around these
> > code paths, am I correct ?
> >
> > With tracepoints, life is easy because I disable preemption around the
> > calls, so I can use synchronize_sched() to know when quiescent state is
> > reached.
> >
> > I recommend looking at kernel/kprobes.c:check_safety(). It uses
> > thaw_processes() and synchronize_sched() for this purpose. Basically, it
> > rely on the "refrigeration" points to detect such quiescent state. This
> > trick should do the job for the function graph tracer too.
> >
> > I'm adding Masami in CC. He is the one who implemented check_safety(),
> > and I remember discussing it with him in the past.
>
> Hmm, interesting. Maybe something like that might work. But what if
> CONFIG_PREEMPT is enabled but CONFIG_FREEZER is not?
Then you may want to make the function tracer depend on CONFIG_FREEZER,
but maybe Masami has other ideas ?
Mathieu
>
> -- Steve
>
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
On Thu, 2010-01-21 at 23:09 -0500, Mathieu Desnoyers wrote:
> * Steven Rostedt ([email protected]) wrote:
> > Hmm, interesting. Maybe something like that might work. But what if
> > CONFIG_PREEMPT is enabled but CONFIG_FREEZER is not?
>
> Then you may want to make the function tracer depend on CONFIG_FREEZER,
> but maybe Masami has other ideas ?
egad no! This is just to help add guarantees to those that use the
function tracer that when the tracing is disabled, it is guaranteed that
no more tracing will be called by the function tracer. Currently,
nothing relies on this. But we may add cases that might need this.
In fact, only those that need this requirement would need to do this
trick. Anyway, we could make those depend on CONFIG_FREEZER, but that
just seems to be a strange dependency.
-- Steve
* Steven Rostedt ([email protected]) wrote:
> On Thu, 2010-01-21 at 23:09 -0500, Mathieu Desnoyers wrote:
> > * Steven Rostedt ([email protected]) wrote:
>
> > > Hmm, interesting. Maybe something like that might work. But what if
> > > CONFIG_PREEMPT is enabled but CONFIG_FREEZER is not?
> >
> > Then you may want to make the function tracer depend on CONFIG_FREEZER,
> > but maybe Masami has other ideas ?
>
> egad no! This is just to help add guarantees to those that use the
> function tracer that when the tracing is disabled, it is guaranteed that
> no more tracing will be called by the function tracer. Currently,
> nothing relies on this. But we may add cases that might need this.
Yep, identifying tracer quiescent state can become handy.
>
> In fact, only those that need this requirement would need to do this
> trick. Anyway, we could make those depend on CONFIG_FREEZER, but that
> just seems to be a strange dependency.
This makes me wonder (question for Masami)...
static int __kprobes check_safety(void)
{
int ret = 0;
#if defined(CONFIG_PREEMPT) && defined(CONFIG_FREEZER)
ret = freeze_processes();
if (ret == 0) {
struct task_struct *p, *q;
do_each_thread(p, q) {
if (p != current && p->state == TASK_RUNNING &&
p->pid != 0) {
printk("Check failed: %s is running\n",p->comm);
ret = -1;
goto loop_end;
}
} while_each_thread(p, q);
}
loop_end:
thaw_processes();
#else
synchronize_sched();
#endif
return ret;
}
How does that deal with CONFIG_PREEMPT && !CONFIG_FREEZER exactly ?
Thanks,
Mathieu
>
> -- Steve
>
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Mathieu Desnoyers wrote:
> * Steven Rostedt ([email protected]) wrote:
>> On Thu, 2010-01-21 at 23:09 -0500, Mathieu Desnoyers wrote:
>>> * Steven Rostedt ([email protected]) wrote:
>>
>>>> Hmm, interesting. Maybe something like that might work. But what if
>>>> CONFIG_PREEMPT is enabled but CONFIG_FREEZER is not?
>>>
>>> Then you may want to make the function tracer depend on CONFIG_FREEZER,
>>> but maybe Masami has other ideas ?
>>
>> egad no! This is just to help add guarantees to those that use the
>> function tracer that when the tracing is disabled, it is guaranteed that
>> no more tracing will be called by the function tracer. Currently,
>> nothing relies on this. But we may add cases that might need this.
>
> Yep, identifying tracer quiescent state can become handy.
>
>>
>> In fact, only those that need this requirement would need to do this
>> trick. Anyway, we could make those depend on CONFIG_FREEZER, but that
>> just seems to be a strange dependency.
>
> This makes me wonder (question for Masami)...
>
> static int __kprobes check_safety(void)
> {
> int ret = 0;
> #if defined(CONFIG_PREEMPT) && defined(CONFIG_FREEZER)
> ret = freeze_processes();
> if (ret == 0) {
> struct task_struct *p, *q;
> do_each_thread(p, q) {
> if (p != current && p->state == TASK_RUNNING &&
> p->pid != 0) {
> printk("Check failed: %s is running\n",p->comm);
> ret = -1;
> goto loop_end;
> }
> } while_each_thread(p, q);
> }
> loop_end:
> thaw_processes();
> #else
> synchronize_sched();
> #endif
> return ret;
> }
>
> How does that deal with CONFIG_PREEMPT && !CONFIG_FREEZER exactly ?
In that case, it just disables the kprobe booster and never uses this
safety check.
Actually, this safety check is currently only for boosted probes, which
are transparently enabled on all kprobes. This means we can fall back to
normal kprobes if we can't use the booster (the probing functionality
still exists, it just isn't boosted and becomes slower).
I'm not so sure how much cost can be reduced by dropping the
ftrace_profile_enabled check. I agree that using freeze_processes() would
help you, but if there is an alternative (e.g. using the
ftrace_profile_enabled check only when CONFIG_PREEMPT && !CONFIG_FREEZER),
I don't recommend using it.
If we can make synchronize_tasks(), I'd like to use it in the above safety
check too :-)
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
Frederic Weisbecker wrote:
> Drop the null check on hlist->hash. It is wasteful because
> we don't register the tracer if the buffer allocation failed,
> and the memory barrier in register_ftrace_graph() ensure it
> is visible to the callbacks right away.
>
When it is on a cpu which has just come up, hlist->hash is not
allocated either.
See ftrace_profile_init().
function_profile_call(), profile_graph_entry() and profile_graph_return()
on a cpu that has just come online are wasteful. I think we need to call
ftrace_profile_init_cpu() (if ftrace_profile_enabled == 1) at CPU_UP_PREPARE time.
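For illustration, a possible hotplug hook, as a sketch only: the notifier
itself is made up here, while ftrace_profile_init_cpu() (assumed to return
0 on success) and ftrace_profile_enabled come from the existing code.

#include <linux/cpu.h>
#include <linux/notifier.h>

static int ftrace_profile_cpu_notify(struct notifier_block *nb,
				     unsigned long action, void *hcpu)
{
	long cpu = (long)hcpu;

	switch (action) {
	case CPU_UP_PREPARE:
	case CPU_UP_PREPARE_FROZEN:
		/* allocate the per-cpu hash before the cpu starts running */
		if (ftrace_profile_enabled && ftrace_profile_init_cpu(cpu))
			return NOTIFY_BAD;
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block ftrace_profile_cpu_nb = {
	.notifier_call = ftrace_profile_cpu_notify,
};

/* registered once, e.g. from ftrace_profile_init():
 *	register_cpu_notifier(&ftrace_profile_cpu_nb);
 */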
> Also we know the tracing callbacks won't be called after
> register_ftrace_graph(), so subsequent buffer resets are safe
> too.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Li Zefan <[email protected]>
> Cc: Lai Jiangshan <[email protected]>
> ---
> kernel/trace/ftrace.c | 5 -----
> 1 files changed, 0 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 3cdb35e..dfd8f7c 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -384,8 +384,6 @@ function_profile_call(unsigned long ip, unsigned long parent_ip)
> local_irq_save(flags);
>
> hlist = &__get_cpu_var(func_hlist_cpu);
> - if (!hlist->hash)
> - goto out;
>
> rec = function_find_hlist_node(hlist, ip);
> if (!rec) {
> @@ -415,8 +413,6 @@ static void profile_graph_return(struct ftrace_graph_ret *trace)
>
> local_irq_save(flags);
> hlist = &__get_cpu_var(func_hlist_cpu);
> - if (!hlist->hash)
> - goto out;
>
> calltime = trace->rettime - trace->calltime;
>
> @@ -439,7 +435,6 @@ static void profile_graph_return(struct ftrace_graph_ret *trace)
> if (rec)
> rec->time += calltime;
>
> - out:
> local_irq_restore(flags);
> }
>
Frederic Weisbecker wrote:
> After we disable the function profiler, the function hashlist
> stays in memory. This is wasteful as nobody needs it anymore,
> until the next use if any.
>
> Release it when we disable the function profiler instead of
> resetting it in the next use.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Li Zefan <[email protected]>
> Cc: Lai Jiangshan <[email protected]>
> ---
> kernel/trace/ftrace.c | 1 +
> kernel/trace/functions_hlist.c | 61 +++++++++++++++++-----------------------
> kernel/trace/functions_hlist.h | 1 +
> 3 files changed, 28 insertions(+), 35 deletions(-)
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index dfd8f7c..0ded01c 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -509,6 +509,7 @@ ftrace_profile_write(struct file *filp, const char __user *ubuf,
> /*
> * unregister_ftrace_profiler calls stop_machine
> * so this acts like an synchronize_sched.
> */
> unregister_ftrace_profiler();
Unluckily, when !CONFIG_DYNAMIC_FTRACE, it calls neither stop_machine()
nor synchronize_sched(). Is there a bug here?
> + function_hlist_release();
Frederic Weisbecker wrote:
> When we set a filter to start tracing from a given function in
> the function graph tracer, the filter is stored in a linear array.
>
> It doesn't scale very well, we even limited the number of such
> functions to 32.
>
> Now that we have a hashlist of functions, lets put a field inside
> each function node so that we can check if a function is one of
> these filters using the hashlist, not a linear array.
The linear array @ftrace_graph_funcs is still used in this patch.
Do we still limit the number of such functions to 32?
[...]
> #define FTRACE_GRAPH_MAX_FUNCS 32
> extern int ftrace_graph_count;
> extern unsigned long ftrace_graph_funcs[FTRACE_GRAPH_MAX_FUNCS];
> -
> -static inline int ftrace_graph_addr(unsigned long addr)
> -{
> - int i;
> -
> - if (!ftrace_graph_count)
> - return 1;
Here it returns 1.
[...]
> +static inline int ftrace_graph_addr(unsigned long addr)
> +{
> + struct func_node *rec;
> + struct func_hlist *hlist;
> +
> + if (!ftrace_graph_count)
> + return 0;
> +
But in this patch it returns 0 here, so the behavior is changed.
On Thu, Jan 21, 2010 at 10:05:06PM -0500, Steven Rostedt wrote:
> On Fri, 2010-01-22 at 03:43 +0100, Frederic Weisbecker wrote:
>
> > > Now for the reason I Cc'd Paul and Mathieu...
> > >
> > > If we had a synchronize_sched() like function that would wait and return
> > > when all preempted tasks have been scheduled again and went to either
> > > userspace or called schedule directly, then we could actually do this.
> > >
> > > After unregistering the function graph trace, you call this
> > > "synchronize_tasks()" and it will guarantee that all currently preempted
> > > tasks have either went to userspace or have called schedule() directly.
> > > Then it would be safe to remove this check.
> >
> >
> >
> > Good point!
> >
> > I fear that would require heavy hooks in the scheduler though...
> >
>
> Not a heavy one. We could add a field to the task_struct and just call
> something if it is set.
>
>
> At start of schedule()
>
> if (unlikely(current->pcount))
> handle_pcount_waiters(current);
>
>
> void handle_pcount_waiters(struct task_struct *p)
> {
> current->pcount = 0;
> wake_up(pcount_waiters);
> }
>
>
> and for the synchronize_tasks(), just search the task list for tasks
> that are on the run queue but not running, and add a pcount timestamp
> and record the list of tasks (allocated list).
>
> After it is woken up, it checks the list of tasks and if a task does not
> have the pcount timestamp that matches what was stored, it removes it
> from the list. When it is finally woken up and does not have any more
> tasks on the list, it continues.
>
> This is just a basic idea, i left out a bunch of details, but I'm sure
> it is feasible. This type of wait may work for other types of lockless
> algorithms too.
>
> -- Steve
Sounds like a good idea. That would avoid these wasteful checks and
resident buffers.
On Fri, Jan 22, 2010 at 07:34:51AM -0500, Mathieu Desnoyers wrote:
> * Steven Rostedt ([email protected]) wrote:
> > On Thu, 2010-01-21 at 23:09 -0500, Mathieu Desnoyers wrote:
> > > * Steven Rostedt ([email protected]) wrote:
> >
> > > > Hmm, interesting. Maybe something like that might work. But what if
> > > > CONFIG_PREEMPT is enabled but CONFIG_FREEZER is not?
> > >
> > > Then you may want to make the function tracer depend on CONFIG_FREEZER,
> > > but maybe Masami has other ideas ?
> >
> > egad no! This is just to help add guarantees to those that use the
> > function tracer that when the tracing is disabled, it is guaranteed that
> > no more tracing will be called by the function tracer. Currently,
> > nothing relies on this. But we may add cases that might need this.
>
> Yep, identifying tracer quiescent state can become handy.
>
> >
> > In fact, only those that need this requirement would need to do this
> > trick. Anyway, we could make those depend on CONFIG_FREEZER, but that
> > just seems to be a strange dependency.
>
> This makes me wonder (question for Masami)...
>
> static int __kprobes check_safety(void)
> {
> int ret = 0;
> #if defined(CONFIG_PREEMPT) && defined(CONFIG_FREEZER)
> ret = freeze_processes();
> if (ret == 0) {
> struct task_struct *p, *q;
> do_each_thread(p, q) {
> if (p != current && p->state == TASK_RUNNING &&
> p->pid != 0) {
> printk("Check failed: %s is running\n",p->comm);
> ret = -1;
> goto loop_end;
> }
> } while_each_thread(p, q);
How does that deal with kernel threads that don't freeze?
Also, freezing every process seems a bit heavy for that.
Looks like a synchronize_tasks() would be really useful.
Frederic Weisbecker wrote:
> On Fri, Jan 22, 2010 at 07:34:51AM -0500, Mathieu Desnoyers wrote:
>> * Steven Rostedt ([email protected]) wrote:
>>> On Thu, 2010-01-21 at 23:09 -0500, Mathieu Desnoyers wrote:
>>>> * Steven Rostedt ([email protected]) wrote:
>>>
>>>>> Hmm, interesting. Maybe something like that might work. But what if
>>>>> CONFIG_PREEMPT is enabled but CONFIG_FREEZER is not?
>>>>
>>>> Then you may want to make the function tracer depend on CONFIG_FREEZER,
>>>> but maybe Masami has other ideas ?
>>>
>>> egad no! This is just to help add guarantees to those that use the
>>> function tracer that when the tracing is disabled, it is guaranteed that
>>> no more tracing will be called by the function tracer. Currently,
>>> nothing relies on this. But we may add cases that might need this.
>>
>> Yep, identifying tracer quiescent state can become handy.
>>
>>>
>>> In fact, only those that need this requirement would need to do this
>>> trick. Anyway, we could make those depend on CONFIG_FREEZER, but that
>>> just seems to be a strange dependency.
>>
>> This makes me wonder (question for Masami)...
>>
>> static int __kprobes check_safety(void)
>> {
>> int ret = 0;
>> #if defined(CONFIG_PREEMPT) && defined(CONFIG_FREEZER)
>> ret = freeze_processes();
>> if (ret == 0) {
>> struct task_struct *p, *q;
>> do_each_thread(p, q) {
>> if (p != current && p->state == TASK_RUNNING &&
>> p->pid != 0) {
>> printk("Check failed: %s is running\n",p->comm);
>> ret = -1;
>> goto loop_end;
>> }
>> } while_each_thread(p, q);
>
>
>
> How does that deal with kernel threads that don't freeze?
Hmm, right. It can't handle non-freezable kernel threads.
> Also freezing every processes seems a bit of a heavy thing for that.
> Looks like a synchronize_tasks() would be really useful.
Sure :-)
Maybe I'd better remove booster support on preemptive kernels until then.
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
On Mon, Jan 25, 2010 at 05:14:16PM -0500, Masami Hiramatsu wrote:
> Frederic Weisbecker wrote:
> > On Fri, Jan 22, 2010 at 07:34:51AM -0500, Mathieu Desnoyers wrote:
> >> * Steven Rostedt ([email protected]) wrote:
> >>> On Thu, 2010-01-21 at 23:09 -0500, Mathieu Desnoyers wrote:
> >>>> * Steven Rostedt ([email protected]) wrote:
> >>>
> >>>>> Hmm, interesting. Maybe something like that might work. But what if
> >>>>> CONFIG_PREEMPT is enabled but CONFIG_FREEZER is not?
> >>>>
> >>>> Then you may want to make the function tracer depend on CONFIG_FREEZER,
> >>>> but maybe Masami has other ideas ?
> >>>
> >>> egad no! This is just to help add guarantees to those that use the
> >>> function tracer that when the tracing is disabled, it is guaranteed that
> >>> no more tracing will be called by the function tracer. Currently,
> >>> nothing relies on this. But we may add cases that might need this.
> >>
> >> Yep, identifying tracer quiescent state can become handy.
> >>
> >>>
> >>> In fact, only those that need this requirement would need to do this
> >>> trick. Anyway, we could make those depend on CONFIG_FREEZER, but that
> >>> just seems to be a strange dependency.
> >>
> >> This makes me wonder (question for Masami)...
> >>
> >> static int __kprobes check_safety(void)
> >> {
> >> int ret = 0;
> >> #if defined(CONFIG_PREEMPT) && defined(CONFIG_FREEZER)
> >> ret = freeze_processes();
> >> if (ret == 0) {
> >> struct task_struct *p, *q;
> >> do_each_thread(p, q) {
> >> if (p != current && p->state == TASK_RUNNING &&
> >> p->pid != 0) {
> >> printk("Check failed: %s is running\n",p->comm);
> >> ret = -1;
> >> goto loop_end;
> >> }
> >> } while_each_thread(p, q);
> >
> >
> >
> > How does that deal with kernel threads that don't freeze?
>
> Hmm, right. It can't handle non-freezable kernel threads.
>
> > Also freezing every processes seems a bit of a heavy thing for that.
> > Looks like a synchronize_tasks() would be really useful.
>
> Sure :-)
> Maybe, I'd better remove booster support on preemptive kernel until then.
I don't know, as I haven't looked deeper into check_safety(), but does the
fact that we have non-freezable tasks break the assumptions that make the
kprobes booster safe? If so then yeah, maybe deactivate it for now.
* Frederic Weisbecker ([email protected]) wrote:
> On Mon, Jan 25, 2010 at 05:14:16PM -0500, Masami Hiramatsu wrote:
> > Frederic Weisbecker wrote:
> > > On Fri, Jan 22, 2010 at 07:34:51AM -0500, Mathieu Desnoyers wrote:
> > >> * Steven Rostedt ([email protected]) wrote:
> > >>> On Thu, 2010-01-21 at 23:09 -0500, Mathieu Desnoyers wrote:
> > >>>> * Steven Rostedt ([email protected]) wrote:
> > >>>
> > >>>>> Hmm, interesting. Maybe something like that might work. But what if
> > >>>>> CONFIG_PREEMPT is enabled but CONFIG_FREEZER is not?
> > >>>>
> > >>>> Then you may want to make the function tracer depend on CONFIG_FREEZER,
> > >>>> but maybe Masami has other ideas ?
> > >>>
> > >>> egad no! This is just to help add guarantees to those that use the
> > >>> function tracer that when the tracing is disabled, it is guaranteed that
> > >>> no more tracing will be called by the function tracer. Currently,
> > >>> nothing relies on this. But we may add cases that might need this.
> > >>
> > >> Yep, identifying tracer quiescent state can become handy.
> > >>
> > >>>
> > >>> In fact, only those that need this requirement would need to do this
> > >>> trick. Anyway, we could make those depend on CONFIG_FREEZER, but that
> > >>> just seems to be a strange dependency.
> > >>
> > >> This makes me wonder (question for Masami)...
> > >>
> > >> static int __kprobes check_safety(void)
> > >> {
> > >> int ret = 0;
> > >> #if defined(CONFIG_PREEMPT) && defined(CONFIG_FREEZER)
> > >> ret = freeze_processes();
> > >> if (ret == 0) {
> > >> struct task_struct *p, *q;
> > >> do_each_thread(p, q) {
> > >> if (p != current && p->state == TASK_RUNNING &&
> > >> p->pid != 0) {
> > >> printk("Check failed: %s is running\n",p->comm);
> > >> ret = -1;
> > >> goto loop_end;
> > >> }
> > >> } while_each_thread(p, q);
> > >
> > >
> > >
> > > How does that deal with kernel threads that don't freeze?
> >
> > Hmm, right. It can't handle non-freezable kernel threads.
> >
> > > Also freezing every processes seems a bit of a heavy thing for that.
> > > Looks like a synchronize_tasks() would be really useful.
> >
> > Sure :-)
> > Maybe, I'd better remove booster support on preemptive kernel until then.
>
>
> I don't know as I haven't looked deeper into check_safety(), but does the
> fact we have non-freezable tasks break the assumptions that make
> kprobes booster safe? If so then yeah, may be deactivate it for now.
>
In the case of check_safety, it's not a bug per se if a task happens to
be non freezable. freeze_processes() will likely return a non-zero
value, and the whole check_safety will therefore return that value, so
standard breakpoints will be used instead.
But that doesn't fit with the function graph tracer requirements.
Thanks,
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Mathieu Desnoyers wrote:
> * Frederic Weisbecker ([email protected]) wrote:
>> On Mon, Jan 25, 2010 at 05:14:16PM -0500, Masami Hiramatsu wrote:
>>> Frederic Weisbecker wrote:
>>>> On Fri, Jan 22, 2010 at 07:34:51AM -0500, Mathieu Desnoyers wrote:
>>>>> * Steven Rostedt ([email protected]) wrote:
>>>>>> On Thu, 2010-01-21 at 23:09 -0500, Mathieu Desnoyers wrote:
>>>>>>> * Steven Rostedt ([email protected]) wrote:
>>>>>>
>>>>>>>> Hmm, interesting. Maybe something like that might work. But what if
>>>>>>>> CONFIG_PREEMPT is enabled but CONFIG_FREEZER is not?
>>>>>>>
>>>>>>> Then you may want to make the function tracer depend on CONFIG_FREEZER,
>>>>>>> but maybe Masami has other ideas ?
>>>>>>
>>>>>> egad no! This is just to help add guarantees to those that use the
>>>>>> function tracer that when the tracing is disabled, it is guaranteed that
>>>>>> no more tracing will be called by the function tracer. Currently,
>>>>>> nothing relies on this. But we may add cases that might need this.
>>>>>
>>>>> Yep, identifying tracer quiescent state can become handy.
>>>>>
>>>>>>
>>>>>> In fact, only those that need this requirement would need to do this
>>>>>> trick. Anyway, we could make those depend on CONFIG_FREEZER, but that
>>>>>> just seems to be a strange dependency.
>>>>>
>>>>> This makes me wonder (question for Masami)...
>>>>>
>>>>> static int __kprobes check_safety(void)
>>>>> {
>>>>> int ret = 0;
>>>>> #if defined(CONFIG_PREEMPT) && defined(CONFIG_FREEZER)
>>>>> ret = freeze_processes();
>>>>> if (ret == 0) {
>>>>> struct task_struct *p, *q;
>>>>> do_each_thread(p, q) {
>>>>> if (p != current && p->state == TASK_RUNNING &&
>>>>> p->pid != 0) {
>>>>> printk("Check failed: %s is running\n",p->comm);
>>>>> ret = -1;
>>>>> goto loop_end;
>>>>> }
>>>>> } while_each_thread(p, q);
>>>>
>>>>
>>>>
>>>> How does that deal with kernel threads that don't freeze?
>>>
>>> Hmm, right. It can't handle non-freezable kernel threads.
>>>
>>>> Also freezing every processes seems a bit of a heavy thing for that.
>>>> Looks like a synchronize_tasks() would be really useful.
>>>
>>> Sure :-)
>>> Maybe, I'd better remove booster support on preemptive kernel until then.
>>
>>
>> I don't know as I haven't looked deeper into check_safety(), but does the
>> fact we have non-freezable tasks break the assumptions that make
>> kprobes booster safe? If so then yeah, may be deactivate it for now.
>>
>
> In the case of check_safety, it's not a bug per se if a task happens to
> be non freezable. freeze_processes() will likely return a non-zero
> value, and the whole check_safety will therefore return that value, so
> standard breakpoints will be used instead.
Hmm, unfortunately no, because freeze_processes() doesn't count
non-freezable threads...
---
todo = 0;
read_lock(&tasklist_lock);
do_each_thread(g, p) {
if (frozen(p) || !freezeable(p))
continue;
if (!freeze_task(p, sig_only))
continue;
/*
* Now that we've done set_freeze_flag, don't
* perturb a task in TASK_STOPPED or TASK_TRACED.
* It is "frozen enough". If the task does wake
* up, it will immediately call try_to_freeze.
*/
if (!task_is_stopped_or_traced(p) &&
!freezer_should_skip(p))
todo++;
} while_each_thread(g, p);
read_unlock(&tasklist_lock);
if (!todo || time_after(jiffies, end_time))
break;
---
So, it can return 0 even if there are non-freezable threads running.
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
Disable kprobe booster when CONFIG_PREEMPT=y, because it can't ensure
that all kernel threads preempted on kprobe's boosted slot run out from
the slot even using freeze_processes().
The booster on preemptive kernel will be resumed if synchronize_tasks()
or something like that is introduced.
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jim Keniston <[email protected]>
CC: Mathieu Desnoyers <[email protected]>
Cc: Steven Rostedt <[email protected]>
---
arch/ia64/kernel/kprobes.c | 2 +-
arch/x86/kernel/kprobes.c | 2 +-
kernel/kprobes.c | 29 ++---------------------------
3 files changed, 4 insertions(+), 29 deletions(-)
diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index 9adac44..7026b29 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -870,7 +870,7 @@ static int __kprobes pre_kprobes_handler(struct die_args *args)
return 1;
ss_probe:
-#if !defined(CONFIG_PREEMPT) || defined(CONFIG_FREEZER)
+#if !defined(CONFIG_PREEMPT)
if (p->ainsn.inst_flag == INST_FLAG_BOOSTABLE && !p->post_handler) {
/* Boost up -- we can execute copied instructions directly */
ia64_psr(regs)->ri = p->ainsn.slot;
diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index 5b8c750..9453815 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -429,7 +429,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
static void __kprobes setup_singlestep(struct kprobe *p, struct pt_regs *regs,
struct kprobe_ctlblk *kcb)
{
-#if !defined(CONFIG_PREEMPT) || defined(CONFIG_FREEZER)
+#if !defined(CONFIG_PREEMPT)
if (p->ainsn.boostable == 1 && !p->post_handler) {
/* Boost up -- we can execute copied instructions directly */
reset_current_kprobe();
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index b7df302..9907a03 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -124,30 +124,6 @@ static LIST_HEAD(kprobe_insn_pages);
static int kprobe_garbage_slots;
static int collect_garbage_slots(void);
-static int __kprobes check_safety(void)
-{
- int ret = 0;
-#if defined(CONFIG_PREEMPT) && defined(CONFIG_FREEZER)
- ret = freeze_processes();
- if (ret == 0) {
- struct task_struct *p, *q;
- do_each_thread(p, q) {
- if (p != current && p->state == TASK_RUNNING &&
- p->pid != 0) {
- printk("Check failed: %s is running\n",p->comm);
- ret = -1;
- goto loop_end;
- }
- } while_each_thread(p, q);
- }
-loop_end:
- thaw_processes();
-#else
- synchronize_sched();
-#endif
- return ret;
-}
-
/**
* __get_insn_slot() - Find a slot on an executable page for an instruction.
* We allocate an executable page if there's no room on existing ones.
@@ -235,9 +211,8 @@ static int __kprobes collect_garbage_slots(void)
{
struct kprobe_insn_page *kip, *next;
- /* Ensure no-one is preepmted on the garbages */
- if (check_safety())
- return -EAGAIN;
+ /* Ensure no-one is interrupted on the garbages */
+ synchronize_sched();
list_for_each_entry_safe(kip, next, &kprobe_insn_pages, list) {
int i;
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
* Masami Hiramatsu ([email protected]) wrote:
> Disable kprobe booster when CONFIG_PREEMPT=y, because it can't ensure
> that all kernel threads preempted on kprobe's boosted slot run out from
> the slot even using freeze_processes().
>
> The booster on preemptive kernel will be resumed if synchronize_tasks()
> or something like that is introduced.
Yes, given that the freezer does not deal with non-freezable tasks (as
you pointed out in a different thread), I think we cannot rely on it
with CONFIG_PREEMPT.
Acked-by: Mathieu Desnoyers <[email protected]>
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Cc: Ananth N Mavinakayanahalli <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Jim Keniston <[email protected]>
> CC: Mathieu Desnoyers <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> ---
>
> arch/ia64/kernel/kprobes.c | 2 +-
> arch/x86/kernel/kprobes.c | 2 +-
> kernel/kprobes.c | 29 ++---------------------------
> 3 files changed, 4 insertions(+), 29 deletions(-)
>
> diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
> index 9adac44..7026b29 100644
> --- a/arch/ia64/kernel/kprobes.c
> +++ b/arch/ia64/kernel/kprobes.c
> @@ -870,7 +870,7 @@ static int __kprobes pre_kprobes_handler(struct die_args *args)
> return 1;
>
> ss_probe:
> -#if !defined(CONFIG_PREEMPT) || defined(CONFIG_FREEZER)
> +#if !defined(CONFIG_PREEMPT)
> if (p->ainsn.inst_flag == INST_FLAG_BOOSTABLE && !p->post_handler) {
> /* Boost up -- we can execute copied instructions directly */
> ia64_psr(regs)->ri = p->ainsn.slot;
> diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
> index 5b8c750..9453815 100644
> --- a/arch/x86/kernel/kprobes.c
> +++ b/arch/x86/kernel/kprobes.c
> @@ -429,7 +429,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
> static void __kprobes setup_singlestep(struct kprobe *p, struct pt_regs *regs,
> struct kprobe_ctlblk *kcb)
> {
> -#if !defined(CONFIG_PREEMPT) || defined(CONFIG_FREEZER)
> +#if !defined(CONFIG_PREEMPT)
> if (p->ainsn.boostable == 1 && !p->post_handler) {
> /* Boost up -- we can execute copied instructions directly */
> reset_current_kprobe();
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index b7df302..9907a03 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -124,30 +124,6 @@ static LIST_HEAD(kprobe_insn_pages);
> static int kprobe_garbage_slots;
> static int collect_garbage_slots(void);
>
> -static int __kprobes check_safety(void)
> -{
> - int ret = 0;
> -#if defined(CONFIG_PREEMPT) && defined(CONFIG_FREEZER)
> - ret = freeze_processes();
> - if (ret == 0) {
> - struct task_struct *p, *q;
> - do_each_thread(p, q) {
> - if (p != current && p->state == TASK_RUNNING &&
> - p->pid != 0) {
> - printk("Check failed: %s is running\n",p->comm);
> - ret = -1;
> - goto loop_end;
> - }
> - } while_each_thread(p, q);
> - }
> -loop_end:
> - thaw_processes();
> -#else
> - synchronize_sched();
> -#endif
> - return ret;
> -}
> -
> /**
> * __get_insn_slot() - Find a slot on an executable page for an instruction.
> * We allocate an executable page if there's no room on existing ones.
> @@ -235,9 +211,8 @@ static int __kprobes collect_garbage_slots(void)
> {
> struct kprobe_insn_page *kip, *next;
>
> - /* Ensure no-one is preepmted on the garbages */
> - if (check_safety())
> - return -EAGAIN;
> + /* Ensure no-one is interrupted on the garbages */
> + synchronize_sched();
>
> list_for_each_entry_safe(kip, next, &kprobe_insn_pages, list) {
> int i;
>
>
> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America), Inc.
> Software Solutions Division
>
> e-mail: [email protected]
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
On Wed, Jan 27, 2010 at 04:55:31PM -0500, Masami Hiramatsu wrote:
> Disable kprobe booster when CONFIG_PREEMPT=y, because it can't ensure
> that all kernel threads preempted on kprobe's boosted slot run out from
> the slot even using freeze_processes().
>
> The booster on preemptive kernel will be resumed if synchronize_tasks()
> or something like that is introduced.
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Ananth N Mavinakayanahalli <[email protected]>
* Masami Hiramatsu <[email protected]> wrote:
> Disable kprobe booster when CONFIG_PREEMPT=y, because it can't ensure that
> all kernel threads preempted on kprobe's boosted slot run out from the slot
> even using freeze_processes().
hm, this really sucks as it makes preemptible kernels perform worse. Is there
no better solution?
> The booster on preemptive kernel will be resumed if synchronize_tasks() or
> something like that is introduced.
such as this one?
Ingo
On Fri, 2010-01-29 at 10:21 +0100, Ingo Molnar wrote:
> * Masami Hiramatsu <[email protected]> wrote:
>
> > Disable kprobe booster when CONFIG_PREEMPT=y, because it can't ensure that
> > all kernel threads preempted on kprobe's boosted slot run out from the slot
> > even using freeze_processes().
>
> hm, this really sucks as it makes preemptible kernels perform worse. Is there
> no better solution?
We could pre-allocate 1 slot per kernel thread.
Ingo Molnar wrote:
>
> * Masami Hiramatsu <[email protected]> wrote:
>
>> Disable kprobe booster when CONFIG_PREEMPT=y, because it can't ensure that
>> all kernel threads preempted on kprobe's boosted slot run out from the slot
>> even using freeze_processes().
>
> hm, this really sucks as it makes preemptible kernels perform worse. Is there
> no better solution?
>
>> The booster on preemptive kernel will be resumed if synchronize_tasks() or
>> something like that is introduced.
>
> such as this one?
Yes, I think this synchronize_tasks(), which just sleeps and waits until
all currently preempted tasks have been woken up and scheduled, can ensure safety.
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
* Masami Hiramatsu ([email protected]) wrote:
> Ingo Molnar wrote:
> >
> > * Masami Hiramatsu <[email protected]> wrote:
> >
> >> Disable kprobe booster when CONFIG_PREEMPT=y, because it can't ensure that
> >> all kernel threads preempted on kprobe's boosted slot run out from the slot
> >> even using freeze_processes().
> >
> > hm, this really sucks as it makes preemptible kernels perform worse. Is there
> > no better solution?
> >
> >> The booster on preemptive kernel will be resumed if synchronize_tasks() or
> >> something like that is introduced.
> >
> > such as this one?
>
> Yes, I think this synchronize_tasks(), which just sleeps and waits until
> all currently preempted tasks have been woken up and scheduled, can ensure safety.
If a task is set as stopped, and then preempted before calling schedule,
can this result in a preempted task staying in that state for an
arbitrarily long period of time? Or is there some mechanism prohibiting
that in the scheduler?
Thanks,
Mathieu
>
> Thank you,
>
> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America), Inc.
> Software Solutions Division
>
> e-mail: [email protected]
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
On Fri, 2010-01-29 at 12:08 -0500, Mathieu Desnoyers wrote:
>
> If a task is set as stopped, and the preempted before calling schedule,
> can this result in a preempted task staying in that state for an
> arbitrary long period of time ? Or is there some mechanism prohibiting
> that in the scheduler ?
PREEMPT_ACTIVE does that:
preempt_schedule()
add_preempt_count(PREEMPT_ACTIVE);
schedule();
schedule()
if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
if (unlikely(signal_pending_state(prev->state, prev)))
prev->state = TASK_RUNNING;
else
deactivate_task(rq, prev, 1);
switch_count = &prev->nvcsw;
}
* Peter Zijlstra ([email protected]) wrote:
> On Fri, 2010-01-29 at 12:08 -0500, Mathieu Desnoyers wrote:
> >
> > If a task is set as stopped, and the preempted before calling schedule,
> > can this result in a preempted task staying in that state for an
> > arbitrary long period of time ? Or is there some mechanism prohibiting
> > that in the scheduler ?
>
> PREEMPT_ACTIVE does that:
>
> preempt_schedule()
> add_preempt_count(PREEMPT_ACTIVE);
> schedule();
>
>
> schedule()
> if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
> if (unlikely(signal_pending_state(prev->state, prev)))
> prev->state = TASK_RUNNING;
> else
> deactivate_task(rq, prev, 1);
> switch_count = &prev->nvcsw;
> }
OK, it looks safe for preemption. Is there any unforeseen weird way a
task can be scheduled out and stopped that would permit it to either:
- stall the algorithm forever (DoS)
- appear as quiescent to the algorithm while its stack would hold return
pointers to incorrect locations
?
I'm concerned about page faults here.
Thanks,
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Mathieu Desnoyers wrote:
> * Peter Zijlstra ([email protected]) wrote:
>> On Fri, 2010-01-29 at 12:08 -0500, Mathieu Desnoyers wrote:
>>>
>>> If a task is set as stopped, and the preempted before calling schedule,
>>> can this result in a preempted task staying in that state for an
>>> arbitrary long period of time ? Or is there some mechanism prohibiting
>>> that in the scheduler ?
>>
>> PREEMPT_ACTIVE does that:
>>
>> preempt_schedule()
>> add_preempt_count(PREEMPT_ACTIVE);
>> schedule();
>>
>>
>> schedule()
>> if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
>> if (unlikely(signal_pending_state(prev->state, prev)))
>> prev->state = TASK_RUNNING;
>> else
>> deactivate_task(rq, prev, 1);
>> switch_count = &prev->nvcsw;
>> }
>
> OK, it looks safe for preemption. Is there any unforeseen weird way a
> task can be scheduled out and stopped that would permit it to either:
>
> - stall the algorithm forever (DoS)
> - appear as quiescent to the algorithm while its stack would hold return
> pointers to incorrect locations
>
> ?
>
> I'm concerned about page faults here.
The booster also checks whether the instruction can cause a page fault;
if it can, kprobes doesn't boost it.
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
On Mon, Jan 25, 2010 at 02:17:29PM +0800, Lai Jiangshan wrote:
> Frederic Weisbecker wrote:
> > Drop the null check on hlist->hash. It is wasteful because
> > we don't register the tracer if the buffer allocation failed,
> > and the memory barrier in register_ftrace_graph() ensure it
> > is visible to the callbacks right away.
> >
>
> When it is on a cpu which is just came up, hlist->hash is not
> allocated either.
>
> See ftrace_profile_init().
>
> function_profile_call(), profile_graph_entry() and profile_graph_return()
> on a just online cpu are wasteful. I think we need call
> ftrace_profile_init_cpu()(if ftrace_profile_enabled=1) when CPU_UP_PREPARE.
Maybe, yeah. Or perhaps we should play with cpu_possible_mask()
instead of cpu_online. In any case, the patch is also buggy for the
reasons given by Steve, so I'm going to drop it until we have a sane
implementation of synchronize_preempted_tasks().
On Mon, Jan 25, 2010 at 02:41:59PM +0800, Lai Jiangshan wrote:
> Frederic Weisbecker wrote:
> > After we disable the function profiler, the function hashlist
> > stays in memory. This is wasteful as nobody needs it anymore,
> > until the next use if any.
> >
> > Release it when we disable the function profiler instead of
> > resetting it in the next use.
> >
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Li Zefan <[email protected]>
> > Cc: Lai Jiangshan <[email protected]>
> > ---
> > kernel/trace/ftrace.c | 1 +
> > kernel/trace/functions_hlist.c | 61 +++++++++++++++++-----------------------
> > kernel/trace/functions_hlist.h | 1 +
> > 3 files changed, 28 insertions(+), 35 deletions(-)
> >
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index dfd8f7c..0ded01c 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -509,6 +509,7 @@ ftrace_profile_write(struct file *filp, const char __user *ubuf,
>
> > /*
> > * unregister_ftrace_profiler calls stop_machine
> > * so this acts like an synchronize_sched.
> > */
> > unregister_ftrace_profiler();
>
> unluckily, when !CONFIG_DYNAMIC_FTRACE, it does not call stop_machine()
> nor synchronize_sched(), bug here?
Yeah. I guess the synchronize_sched() is here to ensure that if a user
re-enables the profiler, there is no pending tracing callback still
touching the hlist while it gets reset.
We indeed may need an explicit synchronize_sched() for
!CONFIG_DYNAMIC_FTRACE.
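Something along these lines, as a sketch only (the names come from the
patch, but where exactly the barrier belongs is just illustrative):

	/*
	 * unregister_ftrace_profiler() only implies a synchronize_sched()
	 * through stop_machine() when CONFIG_DYNAMIC_FTRACE is set, so
	 * wait explicitly in the other case before releasing the hlist.
	 */
	unregister_ftrace_profiler();
#ifndef CONFIG_DYNAMIC_FTRACE
	synchronize_sched();
#endif
	function_hlist_release();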
On Mon, Jan 25, 2010 at 04:50:48PM +0800, Lai Jiangshan wrote:
> Frederic Weisbecker wrote:
> > When we set a filter to start tracing from a given function in
> > the function graph tracer, the filter is stored in a linear array.
> >
> > It doesn't scale very well, we even limited the number of such
> > functions to 32.
> >
> > Now that we have a hashlist of functions, lets put a field inside
> > each function node so that we can check if a function is one of
> > these filters using the hashlist, not a linear array.
>
> The linear array @ftrace_graph_funcs is still used in this patch.
> we still limit the number of such functions to 32?
>
> [...]
Heh, that's right. I should probably make it a list. The array
is only used when we touch the set_graph_function file.
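Roughly something like this, purely as an illustrative shape (struct
graph_filter, graph_filter_list and graph_filter_add() are made-up
names, and locking against set_graph_function writers is omitted):

#include <linux/list.h>
#include <linux/slab.h>

/* One node per filtered function, with no fixed 32-entry limit. */
struct graph_filter {
	struct list_head	list;
	unsigned long		addr;
};

static LIST_HEAD(graph_filter_list);

static int graph_filter_add(unsigned long addr)
{
	struct graph_filter *f = kmalloc(sizeof(*f), GFP_KERNEL);

	if (!f)
		return -ENOMEM;
	f->addr = addr;
	list_add_tail(&f->list, &graph_filter_list);
	return 0;
}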
> > #define FTRACE_GRAPH_MAX_FUNCS 32
> > extern int ftrace_graph_count;
> > extern unsigned long ftrace_graph_funcs[FTRACE_GRAPH_MAX_FUNCS];
> > -
> > -static inline int ftrace_graph_addr(unsigned long addr)
> > -{
> > - int i;
> > -
> > - if (!ftrace_graph_count)
> > - return 1;
>
> Here return 1.
>
> [...]
> > +static inline int ftrace_graph_addr(unsigned long addr)
> > +{
> > + struct func_node *rec;
> > + struct func_hlist *hlist;
> > +
> > + if (!ftrace_graph_count)
> > + return 0;
> > +
>
> But in this patch, return 0 here, the behave will be changed.
Yeah, I've inverted the check in the tracing callback.
It seems to me that:
	if (ftrace_graph_addr(addr))
more logically means that we trace this addr. Making it
a boolean would perhaps make things clearer?
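For illustration, a boolean version that keeps the old "no filter means
trace everything" behaviour could look like the sketch below. Only the
hlist lookup follows the patch; the per-node graph_filter flag is a
made-up name for the field the patch adds, and the way the hlist is
obtained here mirrors the profiler callbacks and may differ in the real
code.

static inline bool ftrace_graph_addr(unsigned long addr)
{
	struct func_hlist *hlist;
	struct func_node *rec;

	/* No filter set: trace everything, like the old code returning 1 */
	if (!ftrace_graph_count)
		return true;

	hlist = &__get_cpu_var(func_hlist_cpu);
	rec = function_find_hlist_node(hlist, addr);

	/* graph_filter: hypothetical flag set from set_graph_function */
	return rec && rec->graph_filter;
}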