LinuxLists.cc - [PATCH 0/9] tracing: Bug fixes and minor cleanup

2017-09-22 19:58:41

Subject: [PATCH 0/9] tracing: Bug fixes and minor cleanup

Hi,

This is a set of small fixes and cleanup patches that were originally
part of V2 of the inter-event tracing patchset, along with 2 of Steve
Rostedt's patches related to that patchset.

The 'tracing: Steve's unofficial trace_recursive_lock() patch' patch
is from Steve but carries no Signed-off-by - he has an official
version under test that should replace it.

The 'tracing: Reverse the order of trace_types_lock and event_mutex'
patch is an official patch from Steve, included here for convenience.

The rest are bugfixes and cleanups from the old V2 inter-event
patchset, separated out since they stand on their own outside of the
inter-event patches.

Thanks,

Tom

The following changes since commit 170b3b1050e28d1ba0700e262f0899ffa4fccc52:

tracing: Apply trace_clock changes to instance max buffer (2017-09-06 20:52:20 -0400)

are available in the git repository at:

https://github.com/tzanussi/linux-trace-inter-event.git tzanussi/pre-inter-event-v3
https://github.com/tzanussi/linux-trace-inter-event/tree/tzanussi/pre-inter-event-v3

Steven Rostedt (1):
tracing: Steve's unofficial trace_recursive_lock() patch

Steven Rostedt (VMware) (1):
tracing: Reverse the order of trace_types_lock and event_mutex

Tom Zanussi (7):
tracing: Exclude 'generic fields' from histograms
tracing: Remove lookups from tracing_map hitcount
tracing: Increase tracing map KEYS_MAX size
tracing: Make traceprobe parsing code reusable
tracing: Clean up hist_field_flags enum
tracing: Add hist_field_name() accessor
tracing: Reimplement log2

kernel/trace/ring_buffer.c | 64 ++++++--------------
kernel/trace/trace.c | 91 +++++++++++++++++++++++++++++
kernel/trace/trace.h | 7 +++
kernel/trace/trace_events.c | 31 +++++-----
kernel/trace/trace_events_hist.c | 122 +++++++++++++++++++++++++++------------
kernel/trace/trace_kprobe.c | 18 +++---
kernel/trace/trace_probe.c | 86 ---------------------------
kernel/trace/trace_probe.h | 7 ---
kernel/trace/trace_uprobe.c | 2 +-
kernel/trace/tracing_map.c | 3 +-
kernel/trace/tracing_map.h | 2 +-
11 files changed, 227 insertions(+), 206 deletions(-)

--
1.9.3

2017-09-22 19:58:47

by Tom Zanussi

Subject: [PATCH 1/9] tracing: Steve's unofficial trace_recursive_lock() patch

From: Steven Rostedt <[email protected]>

On Tue, 5 Sep 2017 16:57:52 -0500
Tom Zanussi <[email protected]> wrote:

> Synthetic event generation requires the reservation of a second event
> while the reservation of a previous event is still in progress. The
> trace_recursive_lock() check in ring_buffer_lock_reserve() prevents
> this however.
>
> This sets up a special reserve pathway for this particular case,
> leaving existing pathways untouched, other than an additional check in
> ring_buffer_lock_reserve() and trace_event_buffer_reserve(). These
> checks could be gotten rid of as well, with copies of those functions,
> but for now try to avoid that unless necessary.
>
> Signed-off-by: Tom Zanussi <[email protected]>

I've been planning on changing that lock, which may help you here
without having to mess around with parameters. That is to simply add a
counter. Would this patch help you? You can add a patch to increment
the count to 5 with an explanation of handling synthetic events, but
even getting to 4 is extremely unlikely.

I'll make this into an official patch if this works for you, and then
you can include it in your series.

-- Steve
---
kernel/trace/ring_buffer.c | 64 ++++++++++++----------------------------------
1 file changed, 17 insertions(+), 47 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 81279c6..f6ee9b1 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2538,61 +2538,29 @@ static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
* The lock and unlock are done within a preempt disable section.
* The current_context per_cpu variable can only be modified
* by the current task between lock and unlock. But it can
- * be modified more than once via an interrupt. To pass this
- * information from the lock to the unlock without having to
- * access the 'in_interrupt()' functions again (which do show
- * a bit of overhead in something as critical as function tracing,
- * we use a bitmask trick.
+ * be modified more than once via an interrupt. There are four
+ * different contexts that we need to consider.
*
- * bit 0 = NMI context
- * bit 1 = IRQ context
- * bit 2 = SoftIRQ context
- * bit 3 = normal context.
+ * Normal context.
+ * SoftIRQ context
+ * IRQ context
+ * NMI context
*
- * This works because this is the order of contexts that can
- * preempt other contexts. A SoftIRQ never preempts an IRQ
- * context.
- *
- * When the context is determined, the corresponding bit is
- * checked and set (if it was set, then a recursion of that context
- * happened).
- *
- * On unlock, we need to clear this bit. To do so, just subtract
- * 1 from the current_context and AND it to itself.
- *
- * (binary)
- * 101 - 1 = 100
- * 101 & 100 = 100 (clearing bit zero)
- *
- * 1010 - 1 = 1001
- * 1010 & 1001 = 1000 (clearing bit 1)
- *
- * The least significant bit can be cleared this way, and it
- * just so happens that it is the same bit corresponding to
- * the current context.
+ * If for some reason the ring buffer starts to recurse, we
+ * only allow that to happen at most 4 times (one for each
+ * context). If it happens 5 times, then we consider this a
+ * recusive loop and do not let it go further.
*/

static __always_inline int
trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
{
- unsigned int val = cpu_buffer->current_context;
- int bit;
-
- if (in_interrupt()) {
- if (in_nmi())
- bit = RB_CTX_NMI;
- else if (in_irq())
- bit = RB_CTX_IRQ;
- else
- bit = RB_CTX_SOFTIRQ;
- } else
- bit = RB_CTX_NORMAL;
-
- if (unlikely(val & (1 << bit)))
+ if (cpu_buffer->current_context >= 4)
return 1;

- val |= (1 << bit);
- cpu_buffer->current_context = val;
+ cpu_buffer->current_context++;
+ /* Interrupts must see this update */
+ barrier();

return 0;
}
@@ -2600,7 +2568,9 @@ static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
static __always_inline void
trace_recursive_unlock(struct ring_buffer_per_cpu *cpu_buffer)
{
- cpu_buffer->current_context &= cpu_buffer->current_context - 1;
+ /* Don't let the dec leak out */
+ barrier();
+ cpu_buffer->current_context--;
}

/**
--
1.9.3
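
The counter scheme in the patch above can be tried outside the kernel.
What follows is only a minimal userspace sketch, not the kernel code:
struct cpu_ctx, MAX_NESTING, recursion_lock() and recursion_unlock()
are names made up for illustration, and C11 atomic_signal_fence() is
assumed here as a compiler-only stand-in for the kernel's barrier().

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_NESTING 4   /* one level per context: normal, softirq, irq, NMI */

/* Hypothetical stand-in for the per-CPU ring buffer state. */
struct cpu_ctx {
    unsigned int current_context;   /* current nesting depth */
};

/* Returns 0 if the caller may proceed, 1 if recursion went too deep. */
static int recursion_lock(struct cpu_ctx *ctx)
{
    if (ctx->current_context >= MAX_NESTING)
        return 1;

    ctx->current_context++;
    /* Compiler-only fence: keep the increment ahead of the work that follows. */
    atomic_signal_fence(memory_order_seq_cst);
    return 0;
}

static void recursion_unlock(struct cpu_ctx *ctx)
{
    /* Compiler-only fence: keep the decrement behind the work that preceded it. */
    atomic_signal_fence(memory_order_seq_cst);
    ctx->current_context--;
}
```

With a zeroed struct cpu_ctx, four nested recursion_lock() calls
succeed and a fifth is refused until one level is unlocked again.
Raising MAX_NESTING to 5 would be the sketch's equivalent of the extra
level suggested for synthetic events.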

2017-09-22 19:59:05

by Tom Zanussi

Subject: [PATCH 3/9] tracing: Exclude 'generic fields' from histograms

There are a small number of 'generic fields' (comm/COMM/cpu/CPU) that
are found by trace_find_event_field() but are only meant for
filtering. Specifically, unlike normal fields, they have a size of 0
and thus wreak havoc when used as a histogram key.

Exclude these (return -EINVAL) when used as histogram keys or values.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 1c21d0e..7eb975a 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -450,7 +450,7 @@ static int create_val_field(struct hist_trigger_data *hist_data,
}

field = trace_find_event_field(file->event_call, field_name);
- if (!field) {
+ if (!field || !field->size) {
ret = -EINVAL;
goto out;
}
@@ -548,7 +548,7 @@ static int create_key_field(struct hist_trigger_data *hist_data,
}

field = trace_find_event_field(file->event_call, field_name);
- if (!field) {
+ if (!field || !field->size) {
ret = -EINVAL;
goto out;
}
--
1.9.3
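
The effect of the added '!field->size' test can be shown with a small
userspace sketch. Everything in it is hypothetical: the field table,
find_field() and check_hist_field() are illustrative stand-ins, not
the kernel's trace_find_event_field() or its data structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct event_field {
    const char *name;
    int size;           /* 0 marks a filter-only 'generic' field */
};

/* Made-up field table: one generic field and one ordinary field. */
static const struct event_field fields[] = {
    { "cpu", 0 },
    { "bytes_req", 8 },
};

static const struct event_field *find_field(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(fields) / sizeof(fields[0]); i++)
        if (strcmp(fields[i].name, name) == 0)
            return &fields[i];
    return NULL;
}

/* Same shape as the patched check: unknown and zero-size fields are rejected. */
static int check_hist_field(const char *name)
{
    const struct event_field *field = find_field(name);

    if (!field || !field->size)
        return -EINVAL;
    return 0;
}
```

In this sketch check_hist_field("bytes_req") succeeds, while both
"cpu" (found, but size 0) and an unknown name return -EINVAL.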

2017-09-22 19:59:14

by Tom Zanussi

Subject: [PATCH 4/9] tracing: Remove lookups from tracing_map hitcount

Lookups inflate the hitcount, making it essentially useless. Only
inserts and updates should really affect the hitcount anyway, so
explicitly filter lookups out.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/tracing_map.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
index 305039b..07e7534 100644
--- a/kernel/trace/tracing_map.c
+++ b/kernel/trace/tracing_map.c
@@ -428,7 +428,8 @@ static inline bool keys_match(void *key, void *test_key, unsigned key_size)

if (test_key && test_key == key_hash && entry->val &&
keys_match(key, entry->val->key, map->key_size)) {
- atomic64_inc(&map->hits);
+ if (!lookup_only)
+ atomic64_inc(&map->hits);
return entry->val;
}

--
1.9.3
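
To see why lookups should not count as hits, here is a toy userspace
map. It is a sketch under heavy simplification: struct toy_map and
toy_map_find() are invented for illustration, it scans linearly
instead of hashing, and key 0 is reserved to mean "empty slot".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAP_SIZE 8

struct toy_map {
    unsigned long keys[MAP_SIZE];   /* 0 means "slot empty" */
    unsigned long hits;             /* inserts and updates only */
};

/*
 * Find key, inserting it unless lookup_only is set.
 * Returns the slot index, or -1 if the key is absent (lookup) or the
 * table is full (insert). Only non-lookup calls bump the hit count.
 */
static int toy_map_find(struct toy_map *map, unsigned long key,
                        bool lookup_only)
{
    size_t i;

    for (i = 0; i < MAP_SIZE; i++) {
        if (map->keys[i] == key) {
            if (!lookup_only)
                map->hits++;
            return (int)i;
        }
        if (map->keys[i] == 0) {
            if (lookup_only)
                return -1;
            map->keys[i] = key;
            map->hits++;
            return (int)i;
        }
    }
    return -1;
}
```

Inserting a key and then updating it leaves hits at 2 no matter how
many lookups happen in between, which is the behaviour the patch is
after.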

2017-09-22 19:59:24

by Tom Zanussi

Subject: [PATCH 5/9] tracing: Increase tracing map KEYS_MAX size

The current default for the number of subkeys in a compound key is 2,
which is too restrictive. Increase it to a more realistic value of 3.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/tracing_map.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/tracing_map.h b/kernel/trace/tracing_map.h
index 618838f..f097511 100644
--- a/kernel/trace/tracing_map.h
+++ b/kernel/trace/tracing_map.h
@@ -5,7 +5,7 @@
#define TRACING_MAP_BITS_MAX 17
#define TRACING_MAP_BITS_MIN 7

-#define TRACING_MAP_KEYS_MAX 2
+#define TRACING_MAP_KEYS_MAX 3
#define TRACING_MAP_VALS_MAX 3
#define TRACING_MAP_FIELDS_MAX (TRACING_MAP_KEYS_MAX + \
TRACING_MAP_VALS_MAX)
--
1.9.3

2017-09-22 19:59:36

by Tom Zanussi

Subject: [PATCH 6/9] tracing: Make traceprobe parsing code reusable

traceprobe_probes_write() and traceprobe_command() actually contain
nothing that ties them to kprobes - the code is generically useful for
similar types of parsing elsewhere, so separate it out and move it to
trace.c/trace.h.

Other than moving it, the only change is in naming:
traceprobe_probes_write() becomes trace_parse_run_command() and
traceprobe_command() becomes trace_run_command().

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace.c | 86 +++++++++++++++++++++++++++++++++++++++++++++
kernel/trace/trace.h | 7 ++++
kernel/trace/trace_kprobe.c | 18 +++++-----
kernel/trace/trace_probe.c | 86 ---------------------------------------------
kernel/trace/trace_probe.h | 7 ----
kernel/trace/trace_uprobe.c | 2 +-
6 files changed, 103 insertions(+), 103 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b0ad927..c203b4b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8268,6 +8268,92 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
}
EXPORT_SYMBOL_GPL(ftrace_dump);

+int trace_run_command(const char *buf, int (*createfn)(int, char **))
+{
+ char **argv;
+ int argc, ret;
+
+ argc = 0;
+ ret = 0;
+ argv = argv_split(GFP_KERNEL, buf, &argc);
+ if (!argv)
+ return -ENOMEM;
+
+ if (argc)
+ ret = createfn(argc, argv);
+
+ argv_free(argv);
+
+ return ret;
+}
+
+#define WRITE_BUFSIZE 4096
+
+ssize_t trace_parse_run_command(struct file *file, const char __user *buffer,
+ size_t count, loff_t *ppos,
+ int (*createfn)(int, char **))
+{
+ char *kbuf, *buf, *tmp;
+ int ret = 0;
+ size_t done = 0;
+ size_t size;
+
+ kbuf = kmalloc(WRITE_BUFSIZE, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ while (done < count) {
+ size = count - done;
+
+ if (size >= WRITE_BUFSIZE)
+ size = WRITE_BUFSIZE - 1;
+
+ if (copy_from_user(kbuf, buffer + done, size)) {
+ ret = -EFAULT;
+ goto out;
+ }
+ kbuf[size] = '\0';
+ buf = kbuf;
+ do {
+ tmp = strchr(buf, '\n');
+ if (tmp) {
+ *tmp = '\0';
+ size = tmp - buf + 1;
+ } else {
+ size = strlen(buf);
+ if (done + size < count) {
+ if (buf != kbuf)
+ break;
+ /* This can accept WRITE_BUFSIZE - 2 ('\n' + '\0') */
+ pr_warn("Line length is too long: Should be less than %d\n",
+ WRITE_BUFSIZE - 2);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+ done += size;
+
+ /* Remove comments */
+ tmp = strchr(buf, '#');
+
+ if (tmp)
+ *tmp = '\0';
+
+ ret = trace_run_command(buf, createfn);
+ if (ret)
+ goto out;
+ buf += size;
+
+ } while (done < count);
+ }
+ ret = done;
+
+out:
+ kfree(kbuf);
+
+ return ret;
+}
+
__init static int tracer_alloc_buffers(void)
{
int ring_buf_size;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index fb5d54d..61e2f35 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1752,6 +1752,13 @@ extern int trace_event_enable_disable(struct trace_event_file *file,
int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set);
int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled);

+#define MAX_EVENT_NAME_LEN 64
+
+extern int trace_run_command(const char *buf, int (*createfn)(int, char**));
+extern ssize_t trace_parse_run_command(struct file *file,
+ const char __user *buffer, size_t count, loff_t *ppos,
+ int (*createfn)(int, char**));
+
/*
* Normal trace_printk() and friends allocates special buffers
* to do the manipulation, as well as saves the print formats
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index c9b5aa1..996902a 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -907,8 +907,8 @@ static int probes_open(struct inode *inode, struct file *file)
static ssize_t probes_write(struct file *file, const char __user *buffer,
size_t count, loff_t *ppos)
{
- return traceprobe_probes_write(file, buffer, count, ppos,
- create_trace_kprobe);
+ return trace_parse_run_command(file, buffer, count, ppos,
+ create_trace_kprobe);
}

static const struct file_operations kprobe_events_ops = {
@@ -1433,9 +1433,9 @@ static __init int kprobe_trace_self_tests_init(void)

pr_info("Testing kprobe tracing: ");

- ret = traceprobe_command("p:testprobe kprobe_trace_selftest_target "
- "$stack $stack0 +0($stack)",
- create_trace_kprobe);
+ ret = trace_run_command("p:testprobe kprobe_trace_selftest_target "
+ "$stack $stack0 +0($stack)",
+ create_trace_kprobe);
if (WARN_ON_ONCE(ret)) {
pr_warn("error on probing function entry.\n");
warn++;
@@ -1455,8 +1455,8 @@ static __init int kprobe_trace_self_tests_init(void)
}
}

- ret = traceprobe_command("r:testprobe2 kprobe_trace_selftest_target "
- "$retval", create_trace_kprobe);
+ ret = trace_run_command("r:testprobe2 kprobe_trace_selftest_target "
+ "$retval", create_trace_kprobe);
if (WARN_ON_ONCE(ret)) {
pr_warn("error on probing function return.\n");
warn++;
@@ -1526,13 +1526,13 @@ static __init int kprobe_trace_self_tests_init(void)
disable_trace_kprobe(tk, file);
}

- ret = traceprobe_command("-:testprobe", create_trace_kprobe);
+ ret = trace_run_command("-:testprobe", create_trace_kprobe);
if (WARN_ON_ONCE(ret)) {
pr_warn("error on deleting a probe.\n");
warn++;
}

- ret = traceprobe_command("-:testprobe2", create_trace_kprobe);
+ ret = trace_run_command("-:testprobe2", create_trace_kprobe);
if (WARN_ON_ONCE(ret)) {
pr_warn("error on deleting a probe.\n");
warn++;
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 52478f0..d593573 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -623,92 +623,6 @@ void traceprobe_free_probe_arg(struct probe_arg *arg)
kfree(arg->comm);
}

-int traceprobe_command(const char *buf, int (*createfn)(int, char **))
-{
- char **argv;
- int argc, ret;
-
- argc = 0;
- ret = 0;
- argv = argv_split(GFP_KERNEL, buf, &argc);
- if (!argv)
- return -ENOMEM;
-
- if (argc)
- ret = createfn(argc, argv);
-
- argv_free(argv);
-
- return ret;
-}
-
-#define WRITE_BUFSIZE 4096
-
-ssize_t traceprobe_probes_write(struct file *file, const char __user *buffer,
- size_t count, loff_t *ppos,
- int (*createfn)(int, char **))
-{
- char *kbuf, *buf, *tmp;
- int ret = 0;
- size_t done = 0;
- size_t size;
-
- kbuf = kmalloc(WRITE_BUFSIZE, GFP_KERNEL);
- if (!kbuf)
- return -ENOMEM;
-
- while (done < count) {
- size = count - done;
-
- if (size >= WRITE_BUFSIZE)
- size = WRITE_BUFSIZE - 1;
-
- if (copy_from_user(kbuf, buffer + done, size)) {
- ret = -EFAULT;
- goto out;
- }
- kbuf[size] = '\0';
- buf = kbuf;
- do {
- tmp = strchr(buf, '\n');
- if (tmp) {
- *tmp = '\0';
- size = tmp - buf + 1;
- } else {
- size = strlen(buf);
- if (done + size < count) {
- if (buf != kbuf)
- break;
- /* This can accept WRITE_BUFSIZE - 2 ('\n' + '\0') */
- pr_warn("Line length is too long: Should be less than %d\n",
- WRITE_BUFSIZE - 2);
- ret = -EINVAL;
- goto out;
- }
- }
- done += size;
-
- /* Remove comments */
- tmp = strchr(buf, '#');
-
- if (tmp)
- *tmp = '\0';
-
- ret = traceprobe_command(buf, createfn);
- if (ret)
- goto out;
- buf += size;
-
- } while (done < count);
- }
- ret = done;
-
-out:
- kfree(kbuf);
-
- return ret;
-}
-
static int __set_print_fmt(struct trace_probe *tp, char *buf, int len,
bool is_return)
{
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 903273c..fb66e3e 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -42,7 +42,6 @@

#define MAX_TRACE_ARGS 128
#define MAX_ARGSTR_LEN 63
-#define MAX_EVENT_NAME_LEN 64
#define MAX_STRING_SIZE PATH_MAX

/* Reserved field names */
@@ -356,12 +355,6 @@ extern int traceprobe_conflict_field_name(const char *name,

extern int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset);

-extern ssize_t traceprobe_probes_write(struct file *file,
- const char __user *buffer, size_t count, loff_t *ppos,
- int (*createfn)(int, char**));
-
-extern int traceprobe_command(const char *buf, int (*createfn)(int, char**));
-
/* Sum up total data length for dynamic arraies (strings) */
static nokprobe_inline int
__get_data_size(struct trace_probe *tp, struct pt_regs *regs)
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index a7581fe..402120b 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -651,7 +651,7 @@ static int probes_open(struct inode *inode, struct file *file)
static ssize_t probes_write(struct file *file, const char __user *buffer,
size_t count, loff_t *ppos)
{
- return traceprobe_probes_write(file, buffer, count, ppos, create_trace_uprobe);
+ return trace_parse_run_command(file, buffer, count, ppos, create_trace_uprobe);
}

static const struct file_operations uprobe_events_ops = {
--
1.9.3
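
The parsing flow that this patch makes reusable (split the input into
lines, strip '#' comments, split each line into argv, hand it to a
callback) can be sketched in userspace. This is a simplified
illustration rather than the kernel code: run_command(),
parse_run_commands(), MAX_ARGS and the count_command() callback are
made-up names, and the copy_from_user() chunking and WRITE_BUFSIZE
line-length limit of the real function are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 16 /* arbitrary cap for this sketch; extra words are dropped */

/* Split one line on whitespace and pass it to createfn if non-empty. */
static int run_command(const char *line, int (*createfn)(int, char **))
{
    char *copy, *p, *argv[MAX_ARGS];
    int argc = 0, ret = 0;

    copy = malloc(strlen(line) + 1);
    if (!copy)
        return -1;
    strcpy(copy, line);

    p = copy;
    while (argc < MAX_ARGS) {
        while (isspace((unsigned char)*p))
            p++;
        if (!*p)
            break;
        argv[argc++] = p;
        while (*p && !isspace((unsigned char)*p))
            p++;
        if (*p)
            *p++ = '\0';
    }

    if (argc)
        ret = createfn(argc, argv);

    free(copy);
    return ret;
}

/* Run every line of text; returns bytes consumed or the first error. */
static long parse_run_commands(const char *text,
                               int (*createfn)(int, char **))
{
    size_t len = strlen(text);
    char *buf, *line, *next, *hash;
    int ret = 0;

    buf = malloc(len + 1);
    if (!buf)
        return -1;
    memcpy(buf, text, len + 1);

    for (line = buf; line && *line; line = next) {
        next = strchr(line, '\n');
        if (next)
            *next++ = '\0';

        hash = strchr(line, '#');   /* remove comments */
        if (hash)
            *hash = '\0';

        ret = run_command(line, createfn);
        if (ret)
            break;
    }

    free(buf);
    return ret ? ret : (long)len;
}

/* Example callback: just counts commands and arguments. */
static int calls, total_args;

static int count_command(int argc, char **argv)
{
    (void)argv;
    calls++;
    total_args += argc;
    return 0;
}
```

Feeding it "p:probe1 do_sys_open # comment\n\n-:probe1\n" invokes the
callback twice (two arguments, then one); the blank line and the
comment text never reach it.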

2017-09-22 19:59:45

by Tom Zanussi

Subject: [PATCH 7/9] tracing: Clean up hist_field_flags enum

As we add more flags, specifying explicit integers for the flag values
becomes more unwieldy and error-prone - switch them over to left-shift
values.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 7eb975a..4f6b640 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -110,16 +110,16 @@ static u64 hist_field_log2(struct hist_field *hist_field, void *event)
#define HIST_KEY_SIZE_MAX (MAX_FILTER_STR_VAL + HIST_STACKTRACE_SIZE)

enum hist_field_flags {
- HIST_FIELD_FL_HITCOUNT = 1,
- HIST_FIELD_FL_KEY = 2,
- HIST_FIELD_FL_STRING = 4,
- HIST_FIELD_FL_HEX = 8,
- HIST_FIELD_FL_SYM = 16,
- HIST_FIELD_FL_SYM_OFFSET = 32,
- HIST_FIELD_FL_EXECNAME = 64,
- HIST_FIELD_FL_SYSCALL = 128,
- HIST_FIELD_FL_STACKTRACE = 256,
- HIST_FIELD_FL_LOG2 = 512,
+ HIST_FIELD_FL_HITCOUNT = 1 << 0,
+ HIST_FIELD_FL_KEY = 1 << 1,
+ HIST_FIELD_FL_STRING = 1 << 2,
+ HIST_FIELD_FL_HEX = 1 << 3,
+ HIST_FIELD_FL_SYM = 1 << 4,
+ HIST_FIELD_FL_SYM_OFFSET = 1 << 5,
+ HIST_FIELD_FL_EXECNAME = 1 << 6,
+ HIST_FIELD_FL_SYSCALL = 1 << 7,
+ HIST_FIELD_FL_STACKTRACE = 1 << 8,
+ HIST_FIELD_FL_LOG2 = 1 << 9,
};

struct hist_trigger_attrs {
--
1.9.3
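
As a quick check that the shift form is only a notational change, the
sketch below repeats a few of the flags and verifies their values at
compile time. The has_flag() helper is hypothetical and exists only
for this example.

```c
#include <assert.h>

/* Subset of the flags from the patch, in left-shift form. */
enum hist_field_flags {
    HIST_FIELD_FL_HITCOUNT = 1 << 0,
    HIST_FIELD_FL_KEY      = 1 << 1,
    HIST_FIELD_FL_HEX      = 1 << 3,
    HIST_FIELD_FL_LOG2     = 1 << 9,
};

/* The shifted values match the old explicit integers. */
_Static_assert(HIST_FIELD_FL_HEX == 8, "bit 3 is 8");
_Static_assert(HIST_FIELD_FL_LOG2 == 512, "bit 9 is 512");

/* Hypothetical helper: test one flag in a flags word. */
static int has_flag(unsigned long flags, enum hist_field_flags flag)
{
    return (flags & flag) != 0;
}
```

Adding a new flag then means writing the next bit number rather than
working out the next power of two by hand.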

2017-09-22 19:59:56

by Tom Zanussi

Subject: [PATCH 8/9] tracing: Add hist_field_name() accessor

In preparation for hist_fields that won't be strictly based on
trace_event_fields, add a new hist_field_name() accessor to allow that
flexibility and update associated users.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 67 +++++++++++++++++++++++++++-------------
1 file changed, 45 insertions(+), 22 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 4f6b640..4dc39e3 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -146,6 +146,23 @@ struct hist_trigger_data {
struct tracing_map *map;
};

+static const char *hist_field_name(struct hist_field *field,
+ unsigned int level)
+{
+ const char *field_name = "";
+
+ if (level > 1)
+ return field_name;
+
+ if (field->field)
+ field_name = field->field->name;
+
+ if (field_name == NULL)
+ field_name = "";
+
+ return field_name;
+}
+
static hist_field_fn_t select_value_fn(int field_size, int field_is_signed)
{
hist_field_fn_t fn = NULL;
@@ -653,7 +670,6 @@ static int is_descending(const char *str)
static int create_sort_keys(struct hist_trigger_data *hist_data)
{
char *fields_str = hist_data->attrs->sort_key_str;
- struct ftrace_event_field *field = NULL;
struct tracing_map_sort_key *sort_key;
int descending, ret = 0;
unsigned int i, j;
@@ -670,7 +686,9 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
}

for (i = 0; i < TRACING_MAP_SORT_KEYS_MAX; i++) {
+ struct hist_field *hist_field;
char *field_str, *field_name;
+ const char *test_name;

sort_key = &hist_data->sort_keys[i];

@@ -703,8 +721,10 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
}

for (j = 1; j < hist_data->n_fields; j++) {
- field = hist_data->fields[j]->field;
- if (field && (strcmp(field_name, field->name) == 0)) {
+ hist_field = hist_data->fields[j];
+ test_name = hist_field_name(hist_field, 0);
+
+ if (strcmp(field_name, test_name) == 0) {
sort_key->field_idx = j;
descending = is_descending(field_str);
if (descending < 0) {
@@ -952,6 +972,7 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
struct hist_field *key_field;
char str[KSYM_SYMBOL_LEN];
bool multiline = false;
+ const char *field_name;
unsigned int i;
u64 uval;

@@ -963,26 +984,27 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
if (i > hist_data->n_vals)
seq_puts(m, ", ");

+ field_name = hist_field_name(key_field, 0);
+
if (key_field->flags & HIST_FIELD_FL_HEX) {
uval = *(u64 *)(key + key_field->offset);
- seq_printf(m, "%s: %llx",
- key_field->field->name, uval);
+ seq_printf(m, "%s: %llx", field_name, uval);
} else if (key_field->flags & HIST_FIELD_FL_SYM) {
uval = *(u64 *)(key + key_field->offset);
sprint_symbol_no_offset(str, uval);
- seq_printf(m, "%s: [%llx] %-45s",
- key_field->field->name, uval, str);
+ seq_printf(m, "%s: [%llx] %-45s", field_name,
+ uval, str);
} else if (key_field->flags & HIST_FIELD_FL_SYM_OFFSET) {
uval = *(u64 *)(key + key_field->offset);
sprint_symbol(str, uval);
- seq_printf(m, "%s: [%llx] %-55s",
- key_field->field->name, uval, str);
+ seq_printf(m, "%s: [%llx] %-55s", field_name,
+ uval, str);
} else if (key_field->flags & HIST_FIELD_FL_EXECNAME) {
char *comm = elt->private_data;

uval = *(u64 *)(key + key_field->offset);
- seq_printf(m, "%s: %-16s[%10llu]",
- key_field->field->name, comm, uval);
+ seq_printf(m, "%s: %-16s[%10llu]", field_name,
+ comm, uval);
} else if (key_field->flags & HIST_FIELD_FL_SYSCALL) {
const char *syscall_name;

@@ -991,8 +1013,8 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
if (!syscall_name)
syscall_name = "unknown_syscall";

- seq_printf(m, "%s: %-30s[%3llu]",
- key_field->field->name, syscall_name, uval);
+ seq_printf(m, "%s: %-30s[%3llu]", field_name,
+ syscall_name, uval);
} else if (key_field->flags & HIST_FIELD_FL_STACKTRACE) {
seq_puts(m, "stacktrace:\n");
hist_trigger_stacktrace_print(m,
@@ -1000,15 +1022,14 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
HIST_STACKTRACE_DEPTH);
multiline = true;
} else if (key_field->flags & HIST_FIELD_FL_LOG2) {
- seq_printf(m, "%s: ~ 2^%-2llu", key_field->field->name,
+ seq_printf(m, "%s: ~ 2^%-2llu", field_name,
*(u64 *)(key + key_field->offset));
} else if (key_field->flags & HIST_FIELD_FL_STRING) {
- seq_printf(m, "%s: %-50s", key_field->field->name,
+ seq_printf(m, "%s: %-50s", field_name,
(char *)(key + key_field->offset));
} else {
uval = *(u64 *)(key + key_field->offset);
- seq_printf(m, "%s: %10llu", key_field->field->name,
- uval);
+ seq_printf(m, "%s: %10llu", field_name, uval);
}
}

@@ -1021,13 +1042,13 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
tracing_map_read_sum(elt, HITCOUNT_IDX));

for (i = 1; i < hist_data->n_vals; i++) {
+ field_name = hist_field_name(hist_data->fields[i], 0);
+
if (hist_data->fields[i]->flags & HIST_FIELD_FL_HEX) {
- seq_printf(m, " %s: %10llx",
- hist_data->fields[i]->field->name,
+ seq_printf(m, " %s: %10llx", field_name,
tracing_map_read_sum(elt, i));
} else {
- seq_printf(m, " %s: %10llu",
- hist_data->fields[i]->field->name,
+ seq_printf(m, " %s: %10llu", field_name,
tracing_map_read_sum(elt, i));
}
}
@@ -1142,7 +1163,9 @@ static const char *get_hist_field_flags(struct hist_field *hist_field)

static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
{
- seq_printf(m, "%s", hist_field->field->name);
+ const char *field_name = hist_field_name(hist_field, 0);
+
+ seq_printf(m, "%s", field_name);
if (hist_field->flags) {
const char *flags_str = get_hist_field_flags(hist_field);

--
1.9.3
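
The point of the accessor is that callers always get a printable
string, even when a hist_field has no backing event field. A reduced
userspace sketch of that idea follows; the struct layouts are cut down
to what the example needs, field_name_or_empty() is a made-up name,
and the 'level' argument of the real accessor is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cut-down stand-ins for the kernel structures. */
struct event_field {
    const char *name;
};

struct hist_field {
    struct event_field *field;  /* may be NULL for non-event fields */
};

/* Never returns NULL, so it is safe to pass straight to "%s". */
static const char *field_name_or_empty(const struct hist_field *hf)
{
    if (hf && hf->field && hf->field->name)
        return hf->field->name;
    return "";
}
```

Callers such as a sort-key comparison can then strcmp() against the
result without first checking whether the underlying field exists.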

2017-09-22 20:00:05

by Tom Zanussi

Subject: [PATCH 9/9] tracing: Reimplement log2

log2 as currently implemented applies only to u64 trace_event_field
derived fields, and assumes that anything it's applied to is a u64
field.

To prepare for synthetic fields like latencies, log2 should be
applicable to those as well, so take the opportunity now to fix the
current problems as well as expand to more general uses.

log2 should be thought of as a chaining function rather than a field
type. To enable this as well as possible future function
implementations, add a hist_field operand array into the hist_field
definition for this purpose, and make use of it to implement the log2
'function'.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 4dc39e3..4eddc19 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -28,12 +28,16 @@

typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event);

+#define HIST_FIELD_OPERANDS_MAX 2
+
struct hist_field {
struct ftrace_event_field *field;
unsigned long flags;
hist_field_fn_t fn;
unsigned int size;
unsigned int offset;
+ unsigned int is_signed;
+ struct hist_field *operands[HIST_FIELD_OPERANDS_MAX];
};

static u64 hist_field_none(struct hist_field *field, void *event)
@@ -71,7 +75,9 @@ static u64 hist_field_pstring(struct hist_field *hist_field, void *event)

static u64 hist_field_log2(struct hist_field *hist_field, void *event)
{
- u64 val = *(u64 *)(event + hist_field->field->offset);
+ struct hist_field *operand = hist_field->operands[0];
+
+ u64 val = operand->fn(operand, event);

return (u64) ilog2(roundup_pow_of_two(val));
}
@@ -156,6 +162,8 @@ static const char *hist_field_name(struct hist_field *field,

if (field->field)
field_name = field->field->name;
+ else if (field->flags & HIST_FIELD_FL_LOG2)
+ field_name = hist_field_name(field->operands[0], ++level);

if (field_name == NULL)
field_name = "";
@@ -357,8 +365,20 @@ static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt)
.elt_init = hist_trigger_elt_comm_init,
};

-static void destroy_hist_field(struct hist_field *hist_field)
+static void destroy_hist_field(struct hist_field *hist_field,
+ unsigned int level)
{
+ unsigned int i;
+
+ if (level > 2)
+ return;
+
+ if (!hist_field)
+ return;
+
+ for (i = 0; i < HIST_FIELD_OPERANDS_MAX; i++)
+ destroy_hist_field(hist_field->operands[i], level + 1);
+
kfree(hist_field);
}

@@ -385,7 +405,10 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
}

if (flags & HIST_FIELD_FL_LOG2) {
+ unsigned long fl = flags & ~HIST_FIELD_FL_LOG2;
hist_field->fn = hist_field_log2;
+ hist_field->operands[0] = create_hist_field(field, fl);
+ hist_field->size = hist_field->operands[0]->size;
goto out;
}

@@ -405,7 +428,7 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
hist_field->fn = select_value_fn(field->size,
field->is_signed);
if (!hist_field->fn) {
- destroy_hist_field(hist_field);
+ destroy_hist_field(hist_field, 0);
return NULL;
}
}
@@ -422,7 +445,7 @@ static void destroy_hist_fields(struct hist_trigger_data *hist_data)

for (i = 0; i < TRACING_MAP_FIELDS_MAX; i++) {
if (hist_data->fields[i]) {
- destroy_hist_field(hist_data->fields[i]);
+ destroy_hist_field(hist_data->fields[i], 0);
hist_data->fields[i] = NULL;
}
}
--
1.9.3
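
The "chaining function" idea, where log2 evaluates its operand through
the operand's own fn instead of reading the event directly, can be
sketched in userspace as below. This is an illustration under
assumptions: the struct is reduced to three members, field_u64() and
field_log2() are made-up names, and ceil_log2() is a hand-written
stand-in for ilog2(roundup_pow_of_two(val)) that returns 0 for inputs
of 0 and 1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct hist_field;
typedef uint64_t (*hist_field_fn_t)(struct hist_field *field, void *event);

#define OPERANDS_MAX 2

/* Reduced stand-in for the kernel's struct hist_field. */
struct hist_field {
    hist_field_fn_t fn;
    unsigned int offset;
    struct hist_field *operands[OPERANDS_MAX];
};

/* Smallest n with 2^n >= v; 0 for v <= 1. */
static uint64_t ceil_log2(uint64_t v)
{
    uint64_t n = 0;

    if (v <= 1)
        return 0;
    for (v--; v; v >>= 1)
        n++;
    return n;
}

/* Leaf: read a u64 straight out of the event record. */
static uint64_t field_u64(struct hist_field *f, void *event)
{
    uint64_t v;

    memcpy(&v, (char *)event + f->offset, sizeof(v));
    return v;
}

/* Chained: evaluate operand 0, whatever it is, then take log2. */
static uint64_t field_log2(struct hist_field *f, void *event)
{
    struct hist_field *operand = f->operands[0];

    return ceil_log2(operand->fn(operand, event));
}
```

Because field_log2() only calls operand->fn, the operand could equally
be another computed field rather than a raw event field, which is what
makes the approach usable for synthetic values such as latencies.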

2017-09-22 20:11:05

by Tom Zanussi

Subject: [PATCH 2/9] tracing: Reverse the order of trace_types_lock and event_mutex

From: "Steven Rostedt (VMware)" <[email protected]>

In order to make future changes where we need to call
tracing_set_clock() from within an event command, the order of
trace_types_lock and event_mutex must be reversed, as the event command
will hold event_mutex and the trace_types_lock is taken from within
tracing_set_clock().

Requested-by: Tom Zanussi <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
---
kernel/trace/trace.c | 5 +++++
kernel/trace/trace_events.c | 31 +++++++++++++++----------------
2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5360b7a..b0ad927 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7674,6 +7674,7 @@ static int instance_mkdir(const char *name)
struct trace_array *tr;
int ret;

+ mutex_lock(&event_mutex);
mutex_lock(&trace_types_lock);

ret = -EEXIST;
@@ -7729,6 +7730,7 @@ static int instance_mkdir(const char *name)
list_add(&tr->list, &ftrace_trace_arrays);

mutex_unlock(&trace_types_lock);
+ mutex_unlock(&event_mutex);

return 0;

@@ -7740,6 +7742,7 @@ static int instance_mkdir(const char *name)

out_unlock:
mutex_unlock(&trace_types_lock);
+ mutex_unlock(&event_mutex);

return ret;

@@ -7752,6 +7755,7 @@ static int instance_rmdir(const char *name)
int ret;
int i;

+ mutex_lock(&event_mutex);
mutex_lock(&trace_types_lock);

ret = -ENODEV;
@@ -7797,6 +7801,7 @@ static int instance_rmdir(const char *name)

out_unlock:
mutex_unlock(&trace_types_lock);
+ mutex_unlock(&event_mutex);

return ret;
}
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 8746839..ec0f9aa 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1406,8 +1406,8 @@ static int subsystem_open(struct inode *inode, struct file *filp)
return -ENODEV;

/* Make sure the system still exists */
- mutex_lock(&trace_types_lock);
mutex_lock(&event_mutex);
+ mutex_lock(&trace_types_lock);
list_for_each_entry(tr, &ftrace_trace_arrays, list) {
list_for_each_entry(dir, &tr->systems, list) {
if (dir == inode->i_private) {
@@ -1421,8 +1421,8 @@ static int subsystem_open(struct inode *inode, struct file *filp)
}
}
exit_loop:
- mutex_unlock(&event_mutex);
mutex_unlock(&trace_types_lock);
+ mutex_unlock(&event_mutex);

if (!system)
return -ENODEV;
@@ -2294,15 +2294,15 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
int trace_add_event_call(struct trace_event_call *call)
{
int ret;
- mutex_lock(&trace_types_lock);
mutex_lock(&event_mutex);
+ mutex_lock(&trace_types_lock);

ret = __register_event(call, NULL);
if (ret >= 0)
__add_event_to_tracers(call);

- mutex_unlock(&event_mutex);
mutex_unlock(&trace_types_lock);
+ mutex_unlock(&event_mutex);
return ret;
}

@@ -2356,13 +2356,13 @@ int trace_remove_event_call(struct trace_event_call *call)
{
int ret;

- mutex_lock(&trace_types_lock);
mutex_lock(&event_mutex);
+ mutex_lock(&trace_types_lock);
down_write(&trace_event_sem);
ret = probe_remove_event_call(call);
up_write(&trace_event_sem);
- mutex_unlock(&event_mutex);
mutex_unlock(&trace_types_lock);
+ mutex_unlock(&event_mutex);

return ret;
}
@@ -2424,8 +2424,8 @@ static int trace_module_notify(struct notifier_block *self,
{
struct module *mod = data;

- mutex_lock(&trace_types_lock);
mutex_lock(&event_mutex);
+ mutex_lock(&trace_types_lock);
switch (val) {
case MODULE_STATE_COMING:
trace_module_add_events(mod);
@@ -2434,8 +2434,8 @@ static int trace_module_notify(struct notifier_block *self,
trace_module_remove_events(mod);
break;
}
- mutex_unlock(&event_mutex);
mutex_unlock(&trace_types_lock);
+ mutex_unlock(&event_mutex);

return 0;
}
@@ -2950,24 +2950,24 @@ static __init int setup_trace_event(char *str)
* creates the event hierachry in the @parent/events directory.
*
* Returns 0 on success.
+ *
+ * Must be called with event_mutex held.
*/
int event_trace_add_tracer(struct dentry *parent, struct trace_array *tr)
{
int ret;

- mutex_lock(&event_mutex);
+ lockdep_assert_held(&event_mutex);

ret = create_event_toplevel_files(parent, tr);
if (ret)
- goto out_unlock;
+ goto out;

down_write(&trace_event_sem);
__trace_add_event_dirs(tr);
up_write(&trace_event_sem);

- out_unlock:
- mutex_unlock(&event_mutex);
-
+ out:
return ret;
}

@@ -2996,9 +2996,10 @@ int event_trace_add_tracer(struct dentry *parent, struct trace_array *tr)
return ret;
}

+/* Must be called with event_mutex held */
int event_trace_del_tracer(struct trace_array *tr)
{
- mutex_lock(&event_mutex);
+ lockdep_assert_held(&event_mutex);

/* Disable any event triggers and associated soft-disabled events */
clear_event_triggers(tr);
@@ -3019,8 +3020,6 @@ int event_trace_del_tracer(struct trace_array *tr)

tr->event_dir = NULL;

- mutex_unlock(&event_mutex);
-
return 0;
}

--
1.9.3
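
The ordering rule this patch establishes (event_mutex first, then
trace_types_lock) and the switch from taking event_mutex to asserting
it is held can be modelled with a toy order checker. This is a
single-threaded illustration only: no real locking takes place, and
toy_lock(), toy_unlock() and needs_event_mutex() are invented names
loosely standing in for mutex_lock(), mutex_unlock() and a function
guarded by lockdep_assert_held().

```c
#include <assert.h>

/* Rank doubles as the required acquisition order: lower rank first. */
enum lock_id { EVENT_MUTEX, TRACE_TYPES_LOCK, NR_LOCKS };

static int held[NR_LOCKS];

/* Returns 0 on success, -1 if taking the lock would break the order. */
static int toy_lock(enum lock_id id)
{
    int i;

    for (i = id; i < NR_LOCKS; i++)
        if (held[i])
            return -1;  /* this lock or a later-ranked one is already held */
    held[id] = 1;
    return 0;
}

static void toy_unlock(enum lock_id id)
{
    held[id] = 0;
}

/* Analogue of a function that expects its caller to hold event_mutex. */
static int needs_event_mutex(void)
{
    return held[EVENT_MUTEX] ? 0 : -1;
}
```

Taking EVENT_MUTEX and then TRACE_TYPES_LOCK passes the check; trying
EVENT_MUTEX while TRACE_TYPES_LOCK is already held is flagged, which
is the inversion the reordered call sites avoid.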