2019-06-21 16:19:23

by Masami Hiramatsu

[permalink] [raw]
Subject: [RFC PATCH 00/11] tracing: of: Boot time tracing using devicetree

Hi,

Here is an RFC series of patches to add boot-time tracing using
devicetree.

Currently, kernel support boot-time tracing using kernel command-line
parameters. But that is very limited because of limited expressions
and limited length of command line. Recently, useful features like
histogram, synthetic events, etc. are being added to ftrace, but it is
clear that we can not expand command-line options to support these
features.

Hoever, I've found that there is a devicetree which can pass more
structured commands to kernel at boot time :) The devicetree is usually
used for dscribing hardware configuration, but I think we can expand it
for software configuration too (e.g. AOSP and OPTEE already introduced
firmware node.) Also, grub and qemu already supports loading devicetree,
so we can use it not only on embedded devices but also on x86 PC too.

With the devicetree, we can setup new kprobe and synthetic events, more
complicated event filters and trigger actions including histogram.

For example, following kernel parameters

trace_options=sym-addr trace_event=initcall:* tp_printk trace_buf_size=1M

it can be written in devicetree like below.

ftrace {
compatible = "linux,ftrace";
options = "sym-addr";
events = "initcall:*";
tp-printk;
buffer-size-kb = <0x400>; // 1024KB == 1MB
};

Moreover, now we can expand it to add filters for events, kprobe events,
and synthetic events with histogram like below.

ftrace {
compatible = "linux,ftrace";
...
event0 {
event = "task:task_newtask";
filter = "pid < 128"; // adding filters
enable;
};
event1 {
event = "kprobes:vfs_read";
probes = "vfs_read $arg1 $arg2"; // add kprobes
filter = "common_pid < 200";
enable;
};
event2 {
event = "initcall_latency"; // add synth event
fields = "unsigned long func", "u64 lat";
// with histogram
actions = "hist:keys=func.sym,lat:vals=lat:sort=lat";
};
// and synthetic event callers
event3 {
event = "initcall:initcall_start";
actions = "hist:keys=func:ts0=common_timestamp.usecs";
};
event4 {
event = "initcall:initcall_finish";
actions = "hist:keys=func:lat=common_timestamp.usecs-$ts0:onmatch(initcall.initcall_start).initcall_latency(func,$lat)";
};
};

These complex configuration can not be done by kernel parameters.
However, this is not replacing boot-time tracing by kernel parameters.
This devicetree settings are applied in fs_initcall() stage, but kernel
parameters are applied earlier stage. Anyway, this is enough useful
for debugging/tracing kernel driver initializations.

I would like to discuss on some points about this idea.

- Can we use devicetree for configuring kernel dynamically?
- Would you have any comment for the devicetree format and default
behaviors?
- Currently, kprobe and synthetic events are defined inside event
node, but it is able to define globally in ftrace node. Which is
better?
- Do we need to support "status" property on each event node so
that someone can prepare "dtsi" include file and override the status?
- Do we need instance-wide pid filter (set_event_pid) when boot-time?
- Do we need more structured tree, like spliting event and group,
event actions and probes to be a tree of node, etc?
- Do we need per group filter & enablement support?
- How to support instances? (nested tree or different tree?)
- What kind of options would we need?

Some kernel parameters are not implemented yet, like ftrace_filter,
ftrace_notrace, etc. These will be implemented afterwards.

Thank you,

---

Masami Hiramatsu (11):
tracing: Apply soft-disabled and filter to tracepoints printk
tracing: kprobes: Output kprobe event to printk buffer
tracing: Expose EXPORT_SYMBOL_GPL symbol
tracing: kprobes: Register to dynevent earlier stage
tracing: Accept different type for synthetic event fields
tracing: Add NULL trace-array check in print_synth_event()
dt-bindings: tracing: Add ftrace binding document
tracing: of: Add setup tracing by devicetree support
tracing: of: Add trace event settings
tracing: of: Add kprobe event support
tracing: of: Add synthetic event support


.../devicetree/bindings/tracing/ftrace.yaml | 170 +++++++++++
include/linux/trace_events.h | 1
kernel/trace/Kconfig | 10 +
kernel/trace/Makefile | 1
kernel/trace/trace.c | 49 ++-
kernel/trace/trace_events.c | 3
kernel/trace/trace_events_hist.c | 14 +
kernel/trace/trace_events_trigger.c | 2
kernel/trace/trace_kprobe.c | 81 +++--
kernel/trace/trace_of.c | 311 ++++++++++++++++++++
10 files changed, 589 insertions(+), 53 deletions(-)
create mode 100644 Documentation/devicetree/bindings/tracing/ftrace.yaml
create mode 100644 kernel/trace/trace_of.c

--
Masami Hiramatsu (Linaro) <[email protected]>


2019-06-21 16:19:41

by Masami Hiramatsu

[permalink] [raw]
Subject: [RFC PATCH 05/11] tracing: Accept different type for synthetic event fields

Make the synthetic event accepts a different type field to record.
However, the size and signed flag must be same.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
kernel/trace/trace_events_hist.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index ca6b0dff60c5..a7f447195143 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -4063,8 +4063,11 @@ static int check_synth_field(struct synth_event *event,

field = event->fields[field_pos];

- if (strcmp(field->type, hist_field->type) != 0)
- return -EINVAL;
+ if (strcmp(field->type, hist_field->type) != 0) {
+ if (field->size != hist_field->size ||
+ field->is_signed != hist_field->is_signed)
+ return -EINVAL;
+ }

return 0;
}

2019-06-21 16:20:08

by Masami Hiramatsu

[permalink] [raw]
Subject: [RFC PATCH 07/11] dt-bindings: tracing: Add ftrace binding document

Add a devicetree binding document for ftrace node.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
.../devicetree/bindings/tracing/ftrace.yaml | 170 ++++++++++++++++++++
1 file changed, 170 insertions(+)
create mode 100644 Documentation/devicetree/bindings/tracing/ftrace.yaml

diff --git a/Documentation/devicetree/bindings/tracing/ftrace.yaml b/Documentation/devicetree/bindings/tracing/ftrace.yaml
new file mode 100644
index 000000000000..cbd7af986c38
--- /dev/null
+++ b/Documentation/devicetree/bindings/tracing/ftrace.yaml
@@ -0,0 +1,170 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+# Copyright 2019 Linaro Ltd.
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/tracing/ftrace.yaml#"
+$schema: "http://devicetree.org/meta-schemas/core.yaml#"
+
+title: Ftrace setting devicetree binding
+
+maintainers:
+ - Masami Hiramatsu <[email protected]>
+
+description: |
+ Boot-time ftrace tracing setting via devicetree. Users can use devicetree node
+ for programming ftrace boot-time tracing.
+
+properties:
+ compatible:
+ items:
+ - const: linux,ftrace
+
+ trace-clock:
+ allOf:
+ - $ref: /schemas/types.yaml#/definitions/string
+ - enum: [ global, local, counter, uptime, perf, mono, mono_raw, boot, ppc-tb, x86-tsc ]
+ description: Specify which clock method is used for trace-clock.
+
+ dump-on-oops:
+ allOf:
+ - $ref: /schemas/types.yaml#/definitions/uint32
+ - enum: [0, 1, 2]
+ description: |
+ A neumerical flag to enable ftrace dump on Kernel Oops. 0 means no dump,
+ 1 means dump on the origin cpu of the oops, and means dump on all cpus.
+
+ traceoff-on-warning:
+ $ref: /schemas/types.yaml#/definitions/flag
+ description: A flag to stop tracing on warning.
+
+ tp-printk:
+ $ref: /schemas/types.yaml#/definitions/flag
+ description: A flag to send tracing output to printk buffer too.
+
+ alloc-snapshot:
+ $ref: /schemas/types.yaml#/definitions/flag
+ description: |
+ A flag to allocate snapshot buffer at boot. This will be required if you
+ use "snapshot" action on some events.
+
+ buffer-size-kb:
+ allOf:
+ - $ref: /schemas/types.yaml#/definitions/uint32
+ - minimum: 1
+ description: |
+ The size of per-cpu tracing buffer in KByte. Note that the size must be
+ larger than page size, and total size of buffers depends on the number
+ of CPUs.
+
+ events:
+ minItems: 1
+ $ref: /schemas/types.yaml#/definitions/string-array
+ description: |
+ A string array of enabling events (including wildcard patterns).
+ See Documentation/trace/events.rst for detail.
+
+ options:
+ minItems: 1
+ $ref: /schemas/types.yaml#/definitions/string-array
+ description: |
+ A string array of trace options for ftrace to control what gets printed
+ in the trace output or manipulate the tracers.
+ See Documentation/trace/ftrace.rst#trace_options for detail.
+
+ tracer:
+ default: nop
+ $ref: /schemas/types.yaml#/definitions/string
+ description: A string of the tracer to start up.
+
+patternProperties:
+ "event[0-9a-fA-F]+$":
+ type: object
+
+ description: |
+ event-* properties are child nodes for per-event settings. It is also
+ able to define new kprobe events and synthetic events. Note that you
+ can not define both "probes" and "fields" properties on same event.
+
+ properties:
+ event:
+ $ref: /schemas/types.yaml#/definitions/string
+ description: |
+ Event name string formatted as "GROUP:EVENT". For synthetic event,
+ you must use "synthetic" for group name. For kprobe and synthetic
+ event, you can ommit the group name.
+
+ probes:
+ minItems: 1
+ $ref: /schemas/types.yaml#/definitions/string-array
+ description: |
+ A string array of kprobe event definitions, including location
+ (symbol+offset) and event arguments.
+ See Documentation/trace/kprobetrace.rst for detail.
+
+ fields:
+ minItems: 1
+ $ref: /schemas/types.yaml#/definitions/string-array
+ description: |
+ A string of synthetic event's fields definition. Note that you
+ don't need to repeat event name.
+
+ filter:
+ $ref: /schemas/types.yaml#/definitions/string
+ description: A string of event filter rule
+
+ actions:
+ minItems: 1
+ $ref: /schemas/types.yaml#/definitions/string-array
+ description: A string array of event trigger actions.
+
+ enable:
+ type: boolean
+ description: |
+ A flag to enable event. Note that the event is not enabled by
+ default. (But actions will set the event soft-disabled)
+
+ oneOf:
+ - required:
+ - event
+ - required:
+ - event
+ - probes
+ - required:
+ - event
+ - fields
+
+required:
+ - compatible
+
+examples:
+ - |
+ ftrace {
+ compatible = "linux,ftrace";
+ events = "initcall:*";
+ tp-printk;
+ buffer-size-kb = <0x400>;
+ event0 {
+ event = "task:task_newtask";
+ filter = "pid < 128";
+ enable;
+ };
+ event1 {
+ event = "kprobes:vfs_read";
+ probes = "vfs_read $arg1 $arg2";
+ filter = "common_pid < 200";
+ };
+ event2 {
+ event = "initcall_latency";
+ fields = "unsigned long func", "u64 lat";
+ actions = "hist:keys=func.sym,lat:vals=lat:sort=lat";
+ };
+ event3 {
+ event = "initcall:initcall_start";
+ actions = "hist:keys=func:ts0=common_timestamp.usecs";
+ };
+ event4 {
+ event = "initcall:initcall_finish";
+ actions = "hist:keys=func:lat=common_timestamp.usecs-$ts0:onmatch(initcall.initcall_start).initcall_latency(func,$lat)";
+ };
+
+ };

2019-06-21 16:20:18

by Masami Hiramatsu

[permalink] [raw]
Subject: [RFC PATCH 04/11] tracing: kprobes: Register to dynevent earlier stage

Register kprobe event to dynevent in subsys_initcall level.
This will allow kernel to register new kprobe events in
fs_initcall level via trace_run_command.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
kernel/trace/trace_kprobe.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 3eb03cf880e1..5166a12a9d49 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1547,11 +1547,12 @@ static __init void setup_boot_kprobe_events(void)
enable_boot_kprobe_events();
}

-/* Make a tracefs interface for controlling probe points */
-static __init int init_kprobe_trace(void)
+/*
+ * Register dynevent at subsys_initcall. This allows kernel to setup kprobe
+ * events in fs_initcall without tracefs.
+ */
+static __init int init_kprobe_trace_early(void)
{
- struct dentry *d_tracer;
- struct dentry *entry;
int ret;

ret = dyn_event_register(&trace_kprobe_ops);
@@ -1561,6 +1562,16 @@ static __init int init_kprobe_trace(void)
if (register_module_notifier(&trace_kprobe_module_nb))
return -EINVAL;

+ return 0;
+}
+subsys_initcall(init_kprobe_trace_early);
+
+/* Make a tracefs interface for controlling probe points */
+static __init int init_kprobe_trace(void)
+{
+ struct dentry *d_tracer;
+ struct dentry *entry;
+
d_tracer = tracing_init_dentry();
if (IS_ERR(d_tracer))
return 0;

2019-06-21 16:20:26

by Masami Hiramatsu

[permalink] [raw]
Subject: [RFC PATCH 09/11] tracing: of: Add trace event settings

Add per-event settings, which includes filter and actions.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
kernel/trace/trace_events_trigger.c | 2 -
kernel/trace/trace_of.c | 81 +++++++++++++++++++++++++++++++++++
2 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 2a2912cb4533..74a19c18219f 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -208,7 +208,7 @@ static int event_trigger_regex_open(struct inode *inode, struct file *file)
return ret;
}

-static int trigger_process_regex(struct trace_event_file *file, char *buff)
+int trigger_process_regex(struct trace_event_file *file, char *buff)
{
char *command, *next = buff;
struct event_command *p;
diff --git a/kernel/trace/trace_of.c b/kernel/trace/trace_of.c
index 5e34e2475d42..696f59234e62 100644
--- a/kernel/trace/trace_of.c
+++ b/kernel/trace/trace_of.c
@@ -22,6 +22,7 @@ extern int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
extern void __init trace_init_tracepoint_printk(void);
extern ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
unsigned long size, int cpu_id);
+extern int trigger_process_regex(struct trace_event_file *file, char *buff);

static void __init
trace_of_set_ftrace_options(struct trace_array *tr, struct device_node *node)
@@ -98,6 +99,85 @@ trace_of_enable_events(struct trace_array *tr, struct device_node *node)
}
}

+static void __init
+trace_of_init_one_event(struct trace_array *tr, struct device_node *node)
+{
+ struct trace_event_file *file;
+ struct property *prop;
+ char buf[MAX_BUF_LEN];
+ char *bufp;
+ const char *p;
+ int err;
+
+ if (!of_node_name_prefix(node, "event"))
+ return;
+
+ err = of_property_read_string(node, "event", &p);
+ if (err) {
+ pr_err("Failed to find event on %s\n", of_node_full_name(node));
+ return;
+ }
+
+ err = strlcpy(buf, p, ARRAY_SIZE(buf));
+ if (err >= ARRAY_SIZE(buf)) {
+ pr_err("Event name is too long: %s\n", p);
+ return;
+ }
+ bufp = buf;
+
+ p = strsep(&bufp, ":");
+ if (!bufp) {
+ pr_err("%s has no group name\n", buf);
+ return;
+ }
+
+ mutex_lock(&event_mutex);
+ file = find_event_file(tr, buf, bufp);
+ if (!file) {
+ pr_err("Failed to find event: %s\n", buf);
+ return;
+ }
+
+ err = of_property_read_string(node, "filter", &p);
+ if (!err) {
+ if (strlcpy(buf, p, ARRAY_SIZE(buf)) >= ARRAY_SIZE(buf)) {
+ pr_err("filter string is too long: %s\n", p);
+ return;
+ }
+
+ if (apply_event_filter(file, buf) < 0) {
+ pr_err("Failed to apply filter: %s\n", buf);
+ return;
+ }
+ }
+
+ of_property_for_each_string(node, "actions", prop, p) {
+ if (strlcpy(buf, p, ARRAY_SIZE(buf)) >= ARRAY_SIZE(buf)) {
+ pr_err("action string is too long: %s\n", p);
+ continue;
+ }
+
+ if (trigger_process_regex(file, buf) < 0)
+ pr_err("Failed to apply an action: %s\n", buf);
+ }
+
+ if (of_property_read_bool(node, "enable")) {
+ if (trace_event_enable_disable(file, 1, 0) < 0)
+ pr_err("Failed to enable event node: %s\n",
+ of_node_full_name(node));
+ }
+ mutex_unlock(&event_mutex);
+}
+
+static void __init
+trace_of_init_events(struct trace_array *tr, struct device_node *node)
+{
+ struct device_node *enode;
+
+ for_each_child_of_node(node, enode)
+ trace_of_init_one_event(tr, enode);
+}
+
static void __init
trace_of_enable_tracer(struct trace_array *tr, struct device_node *trace_node)
{
@@ -126,6 +206,7 @@ static int __init trace_of_init(void)
goto end;

trace_of_set_ftrace_options(tr, trace_node);
+ trace_of_init_events(tr, trace_node);
trace_of_enable_events(tr, trace_node);
trace_of_enable_tracer(tr, trace_node);


2019-06-21 16:20:34

by Masami Hiramatsu

[permalink] [raw]
Subject: [RFC PATCH 03/11] tracing: Expose EXPORT_SYMBOL_GPL symbol

Since ftrace_set_clr_event is already exported by EXPORT_SYMBOL_GPL,
it should not be static.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
kernel/trace/trace_events.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 733ae975e7b8..c9b458dcdd36 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -796,7 +796,7 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match,
return ret;
}

-static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
+int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
{
char *event = NULL, *sub = NULL, *match;
int ret;

2019-06-21 16:21:10

by Masami Hiramatsu

[permalink] [raw]
Subject: [RFC PATCH 06/11] tracing: Add NULL trace-array check in print_synth_event()

Add NULL trace-array check in print_synth_event(), because
if we enable tp_printk option, iter->tr can be NULL.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
kernel/trace/trace_events_hist.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index a7f447195143..db973928e580 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -822,7 +822,7 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
fmt = synth_field_fmt(se->fields[i]->type);

/* parameter types */
- if (tr->trace_flags & TRACE_ITER_VERBOSE)
+ if (tr && tr->trace_flags & TRACE_ITER_VERBOSE)
trace_seq_printf(s, "%s ", fmt);

snprintf(print_fmt, sizeof(print_fmt), "%%s=%s%%s", fmt);

2019-06-21 16:21:29

by Masami Hiramatsu

[permalink] [raw]
Subject: [RFC PATCH 08/11] tracing: of: Add setup tracing by devicetree support

Setup tracing options by devicetree instead of kernel parameters.

Since the kernel parameter is limited length, sometimes there is
no space to setup the tracing options. This will read the tracing
options from devicetree "ftrace" node and setup tracers at boot.

Note that this is not replacing the kernel parameters, because
this devicetree based setting is later than that. If you want to
trace earlier boot events, you still need kernel parameters.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
kernel/trace/Kconfig | 10 +++
kernel/trace/Makefile | 1
kernel/trace/trace.c | 38 +++++++++----
kernel/trace/trace_of.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 173 insertions(+), 13 deletions(-)
create mode 100644 kernel/trace/trace_of.c

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 07d22c61b634..025198bb2764 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -796,6 +796,16 @@ config GCOV_PROFILE_FTRACE
Note that on a kernel compiled with this config, ftrace will
run significantly slower.

+config OF_TRACING
+ bool "Boot Trace Programming by Devicetree"
+ depends on OF && TRACING
+ select OF_EARLY_FLATTREE
+ default y
+ help
+ Enable developer to setup ftrace subsystem via devicetree
+ at boot time for debugging (tracing) driver initialization
+ and boot process.
+
endif # FTRACE

endif # TRACING_SUPPORT
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index c2b2148bb1d2..6f68271f5c56 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -82,6 +82,7 @@ endif
obj-$(CONFIG_DYNAMIC_EVENTS) += trace_dynevent.o
obj-$(CONFIG_PROBE_EVENTS) += trace_probe.o
obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
+obj-$(CONFIG_OF_TRACING) += trace_of.o

obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6530f6004643..7d481260bdd6 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -158,7 +158,7 @@ union trace_eval_map_item {
static union trace_eval_map_item *trace_eval_maps;
#endif /* CONFIG_TRACE_EVAL_MAP_FILE */

-static int tracing_set_tracer(struct trace_array *tr, const char *buf);
+int tracing_set_tracer(struct trace_array *tr, const char *buf);
static void ftrace_trace_userstack(struct ring_buffer *buffer,
unsigned long flags, int pc);

@@ -178,6 +178,11 @@ static int __init set_cmdline_ftrace(char *str)
}
__setup("ftrace=", set_cmdline_ftrace);

+void __init trace_init_dump_on_oops(int mode)
+{
+ ftrace_dump_on_oops = mode;
+}
+
static int __init set_ftrace_dump_on_oops(char *str)
{
if (*str++ != '=' || !*str) {
@@ -4614,7 +4619,7 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
return 0;
}

-static int trace_set_options(struct trace_array *tr, char *option)
+int trace_set_options(struct trace_array *tr, char *option)
{
char *cmp;
int neg = 0;
@@ -5505,8 +5510,8 @@ static int __tracing_resize_ring_buffer(struct trace_array *tr,
return ret;
}

-static ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
- unsigned long size, int cpu_id)
+ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
+ unsigned long size, int cpu_id)
{
int ret = size;

@@ -5585,7 +5590,7 @@ static void add_tracer_options(struct trace_array *tr, struct tracer *t)
create_trace_option_files(tr, t);
}

-static int tracing_set_tracer(struct trace_array *tr, const char *buf)
+int tracing_set_tracer(struct trace_array *tr, const char *buf)
{
struct tracer *t;
#ifdef CONFIG_TRACER_MAX_TRACE
@@ -9160,16 +9165,23 @@ __init static int tracer_alloc_buffers(void)
return ret;
}

+void __init trace_init_tracepoint_printk(void)
+{
+ tracepoint_printk = 1;
+
+ tracepoint_print_iter =
+ kmalloc(sizeof(*tracepoint_print_iter), GFP_KERNEL);
+ if (WARN_ON(!tracepoint_print_iter))
+ tracepoint_printk = 0;
+ else
+ static_key_enable(&tracepoint_printk_key.key);
+}
+
void __init early_trace_init(void)
{
- if (tracepoint_printk) {
- tracepoint_print_iter =
- kmalloc(sizeof(*tracepoint_print_iter), GFP_KERNEL);
- if (WARN_ON(!tracepoint_print_iter))
- tracepoint_printk = 0;
- else
- static_key_enable(&tracepoint_printk_key.key);
- }
+ if (tracepoint_printk)
+ trace_init_tracepoint_printk();
+
tracer_alloc_buffers();
}

diff --git a/kernel/trace/trace_of.c b/kernel/trace/trace_of.c
new file mode 100644
index 000000000000..5e34e2475d42
--- /dev/null
+++ b/kernel/trace/trace_of.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * trace_of.c
+ * devicetree tracing programming APIs
+ */
+
+#define pr_fmt(fmt) "trace_of: " fmt
+
+#include <linux/ftrace.h>
+#include <linux/init.h>
+#include <linux/of.h>
+
+#include "trace.h"
+
+#define MAX_BUF_LEN 256
+
+extern int trace_set_options(struct trace_array *tr, char *option);
+extern enum ftrace_dump_mode ftrace_dump_on_oops;
+extern int __disable_trace_on_warning;
+extern int tracing_set_tracer(struct trace_array *tr, const char *buf);
+extern int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
+extern void __init trace_init_tracepoint_printk(void);
+extern ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
+ unsigned long size, int cpu_id);
+
+static void __init
+trace_of_set_ftrace_options(struct trace_array *tr, struct device_node *node)
+{
+ struct property *prop;
+ const char *p;
+ char buf[MAX_BUF_LEN];
+ u32 v = 0;
+ int err;
+
+ /* Common ftrace options */
+ of_property_for_each_string(node, "options", prop, p) {
+ if (strlcpy(buf, p, ARRAY_SIZE(buf)) >= ARRAY_SIZE(buf)) {
+ pr_err("String is too long: %s\n", p);
+ continue;
+ }
+
+ if (trace_set_options(tr, buf) < 0)
+ pr_err("Failed to set option: %s\n", buf);
+ }
+
+ err = of_property_read_string(node, "trace-clock", &p);
+ if (!err) {
+ if (tracing_set_clock(tr, p) < 0)
+ pr_err("Failed to set trace clock: %s\n", p);
+ }
+
+ /* Command line boot options */
+ if (of_find_property(node, "dump-on-oops", NULL)) {
+ err = of_property_read_u32_index(node, "dump-on-oops", 0, &v);
+ if (err || v == 1)
+ ftrace_dump_on_oops = DUMP_ALL;
+ else if (!err && v == 2)
+ ftrace_dump_on_oops = DUMP_ORIG;
+ }
+
+ if (of_find_property(node, "traceoff-on-warning", NULL))
+ __disable_trace_on_warning = 1;
+
+ if (of_find_property(node, "tp-printk", NULL))
+ trace_init_tracepoint_printk();
+
+ /* This accepts per-cpu buffer size in KB */
+ err = of_property_read_u32_index(node, "buffer-size-kb", 0, &v);
+ if (!err) {
+ v <<= 10; /* KB to Byte */
+ if (v < PAGE_SIZE)
+ pr_err("Buffer size is too small: %d KB\n", v >> 10);
+ if (tracing_resize_ring_buffer(tr, v, RING_BUFFER_ALL_CPUS) < 0)
+ pr_err("Failed to resize trace buffer to %d KB\n",
+ v >> 10);
+ }
+
+ if (of_find_property(node, "alloc-snapshot", NULL))
+ if (tracing_alloc_snapshot() < 0)
+ pr_err("Failed to allocate snapshot buffer\n");
+}
+
+static void __init
+trace_of_enable_events(struct trace_array *tr, struct device_node *node)
+{
+ struct property *prop;
+ char buf[MAX_BUF_LEN];
+ const char *p;
+
+ of_property_for_each_string(node, "events", prop, p) {
+ if (strlcpy(buf, p, ARRAY_SIZE(buf)) >= ARRAY_SIZE(buf)) {
+ pr_err("String is too long: %s\n", p);
+ continue;
+ }
+
+ if (ftrace_set_clr_event(tr, buf, 1) < 0)
+ pr_err("Failed to enable event: %s\n", p);
+ }
+}
+
+static void __init
+trace_of_enable_tracer(struct trace_array *tr, struct device_node *trace_node)
+{
+ const char *p;
+
+ if (!of_property_read_string(trace_node, "tracer", &p)) {
+ if (tracing_set_tracer(tr, p) < 0)
+ pr_err("Failed to set given tracer: %s\n", p);
+
+ }
+}
+
+static int __init trace_of_init(void)
+{
+ struct device_node *trace_node;
+ struct trace_array *tr;
+
+ trace_node = of_find_compatible_node(NULL, NULL, "linux,ftrace");
+ if (!trace_node)
+ return 0;
+
+ trace_node = of_node_get(trace_node);
+
+ tr = top_trace_array();
+ if (!tr)
+ goto end;
+
+ trace_of_set_ftrace_options(tr, trace_node);
+ trace_of_enable_events(tr, trace_node);
+ trace_of_enable_tracer(tr, trace_node);
+
+end:
+ of_node_put(trace_node);
+ return 0;
+}
+
+fs_initcall(trace_of_init);

2019-06-21 16:21:51

by Masami Hiramatsu

[permalink] [raw]
Subject: [RFC PATCH 10/11] tracing: of: Add kprobe event support

Add kprobe event support in event node. User can add probe definitions
by "probes" property as a string array.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
kernel/trace/trace_kprobe.c | 5 +++
kernel/trace/trace_of.c | 65 ++++++++++++++++++++++++++++++++++++-------
2 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5166a12a9d49..cc5ba13028cd 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -765,6 +765,11 @@ static int create_or_delete_trace_kprobe(int argc, char **argv)
return ret == -ECANCELED ? -EINVAL : ret;
}

+int trace_kprobe_run_command(const char *command)
+{
+ return trace_run_command(command, create_or_delete_trace_kprobe);
+}
+
static int trace_kprobe_release(struct dyn_event *ev)
{
struct trace_kprobe *tk = to_trace_kprobe(ev);
diff --git a/kernel/trace/trace_of.c b/kernel/trace/trace_of.c
index 696f59234e62..43d87e5065a3 100644
--- a/kernel/trace/trace_of.c
+++ b/kernel/trace/trace_of.c
@@ -23,6 +23,7 @@ extern void __init trace_init_tracepoint_printk(void);
extern ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
unsigned long size, int cpu_id);
extern int trigger_process_regex(struct trace_event_file *file, char *buff);
+extern int trace_kprobe_run_command(const char *command);

static void __init
trace_of_set_ftrace_options(struct trace_array *tr, struct device_node *node)
@@ -99,14 +100,47 @@ trace_of_enable_events(struct trace_array *tr, struct device_node *node)
}
}

+static int __init
+trace_of_add_kprobe_event(struct device_node *node,
+ const char *group, const char *event)
+{
+ struct property *prop;
+ char buf[MAX_BUF_LEN];
+ const char *p;
+ char *q;
+ int len, ret;
+
+ len = snprintf(buf, ARRAY_SIZE(buf) - 1, "p:%s/%s ", group, event);
+ if (len >= ARRAY_SIZE(buf)) {
+ pr_err("Event name is too long: %s\n", event);
+ return -E2BIG;
+ }
+ q = buf + len;
+ len = ARRAY_SIZE(buf) - len;
+
+ of_property_for_each_string(node, "probes", prop, p) {
+ if (strlcpy(q, p, len) >= len) {
+ pr_err("Probe definition is too long: %s\n", p);
+ return -E2BIG;
+ }
+ ret = trace_kprobe_run_command(buf);
+ if (ret < 0) {
+ pr_err("Failed to add probe: %s\n", buf);
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
static void __init
trace_of_init_one_event(struct trace_array *tr, struct device_node *node)
{
struct trace_event_file *file;
struct property *prop;
char buf[MAX_BUF_LEN];
- char *bufp;
- const char *p;
+ const char *p, *group;
+ char *event;
int err;

if (!of_node_name_prefix(node, "event"))
@@ -123,18 +157,29 @@ trace_of_init_one_event(struct trace_array *tr, struct device_node *node)
pr_err("Event name is too long: %s\n", p);
return;
}
- bufp = buf;
-
- p = strsep(&bufp, ":");
- if (!bufp) {
- pr_err("%s has no group name\n", buf);
- return;
+ event = buf;
+
+ group = strsep(&event, ":");
+ /* For a kprobe event, we have to generates an event at first */
+ if (of_find_property(node, "probes", NULL)) {
+ if (!event) {
+ event = buf;
+ group = "kprobes";
+ }
+ err = trace_of_add_kprobe_event(node, group, event);
+ if (err < 0)
+ return;
+ } else {
+ if (!event) {
+ pr_err("%s has no group name\n", buf);
+ return;
+ }
}

mutex_lock(&event_mutex);
- file = find_event_file(tr, buf, bufp);
+ file = find_event_file(tr, group, event);
if (!file) {
- pr_err("Failed to find event: %s\n", buf);
+ pr_err("Failed to find event: %s:%s\n", group, event);
return;
}


2019-06-21 16:21:57

by Masami Hiramatsu

[permalink] [raw]
Subject: [RFC PATCH 11/11] tracing: of: Add synthetic event support

Add synthetic event node support. The synthetic event node must be
a child node of ftrace node, and the node must start with "synth@"
prefix. The synth node requires fields string (not string array),
which defines the fields as same as tracing/synth_events interface.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
kernel/trace/trace_events_hist.c | 5 ++++
kernel/trace/trace_of.c | 54 ++++++++++++++++++++++++++++++++++++--
2 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index db973928e580..e7f5d0a353e2 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1343,6 +1343,11 @@ static int create_or_delete_synth_event(int argc, char **argv)
return ret == -ECANCELED ? -EINVAL : ret;
}

+int synth_event_run_command(const char *command)
+{
+ return trace_run_command(command, create_or_delete_synth_event);
+}
+
static int synth_event_create(int argc, const char **argv)
{
const char *name = argv[0];
diff --git a/kernel/trace/trace_of.c b/kernel/trace/trace_of.c
index 43d87e5065a3..1b2a1306fc60 100644
--- a/kernel/trace/trace_of.c
+++ b/kernel/trace/trace_of.c
@@ -24,6 +24,7 @@ extern ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
unsigned long size, int cpu_id);
extern int trigger_process_regex(struct trace_event_file *file, char *buff);
extern int trace_kprobe_run_command(const char *command);
+extern int synth_event_run_command(const char *command);

static void __init
trace_of_set_ftrace_options(struct trace_array *tr, struct device_node *node)
@@ -133,6 +134,38 @@ trace_of_add_kprobe_event(struct device_node *node,
return 0;
}

+static int __init
+trace_of_add_synth_event(struct device_node *node, const char *event)
+{
+ struct property *prop;
+ char buf[MAX_BUF_LEN], *q;
+ const char *p;
+ int len, delta, ret;
+
+ len = ARRAY_SIZE(buf);
+ delta = snprintf(buf, len, "%s", event);
+ if (delta >= len) {
+ pr_err("Event name is too long: %s\n", event);
+ return -E2BIG;
+ }
+ len -= delta; q = buf + delta;
+
+ of_property_for_each_string(node, "fields", prop, p) {
+ delta = snprintf(q, len, " %s;", p);
+ if (delta >= len) {
+ pr_err("fields string is too long: %s\n", p);
+ return -E2BIG;
+ }
+ len -= delta; q += delta;
+ }
+
+ ret = synth_event_run_command(buf);
+ if (ret < 0)
+ pr_err("Failed to add synthetic event: %s\n", buf);
+
+ return ret;
+}
+
static void __init
trace_of_init_one_event(struct trace_array *tr, struct device_node *node)
{
@@ -160,15 +193,30 @@ trace_of_init_one_event(struct trace_array *tr, struct device_node *node)
event = buf;

group = strsep(&event, ":");
- /* For a kprobe event, we have to generates an event at first */
+
+ /* Generates kprobe/synth event at first */
if (of_find_property(node, "probes", NULL)) {
+ if (of_find_property(node, "fields", NULL)) {
+ pr_err("Error: %s node has both probes and fields\n",
+ of_node_full_name(node));
+ return;
+ }
if (!event) {
event = buf;
group = "kprobes";
}
- err = trace_of_add_kprobe_event(node, group, event);
- if (err < 0)
+ if (trace_of_add_kprobe_event(node, group, event) < 0)
+ return;
+ } else if (of_find_property(node, "fields", NULL)) {
+ if (!event)
+ event = buf;
+ else if (strcmp(group, "synthetic") != 0) {
+ pr_err("Synthetic event must be in synthetic group\n");
+ return;
+ }
+ if (trace_of_add_synth_event(node, event) < 0)
return;
+ group = "synthetic";
} else {
if (!event) {
pr_err("%s has no group name\n", buf);

2019-06-23 19:59:44

by Frank Rowand

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] tracing: of: Boot time tracing using devicetree

Hi Masami,

On 6/21/19 9:18 AM, Masami Hiramatsu wrote:
> Hi,
>
> Here is an RFC series of patches to add boot-time tracing using
> devicetree.
>
> Currently, kernel support boot-time tracing using kernel command-line
> parameters. But that is very limited because of limited expressions
> and limited length of command line. Recently, useful features like
> histogram, synthetic events, etc. are being added to ftrace, but it is
> clear that we can not expand command-line options to support these
> features.

"it is clear that we can not expand command-line options" needs a fuller
explanation. And maybe further exploration.


>
> Hoever, I've found that there is a devicetree which can pass more
> structured commands to kernel at boot time :) The devicetree is usually
> used for dscribing hardware configuration, but I think we can expand it

Devicetree is standardized and documented as hardware description.


> for software configuration too (e.g. AOSP and OPTEE already introduced
> firmware node.) Also, grub and qemu already supports loading devicetree,
> so we can use it not only on embedded devices but also on x86 PC too.

Devicetree is NOT for configuration information. This has been discussed
over and over again in mail lists, at various conferences, and was also an
entire session at plumbers a few years ago:

https://elinux.org/Device_tree_future#Linux_Plumbers_2016_Device_Tree_Track

There is one part of device tree that does allow non-hardware description,
which is the "chosen" node which is provided to allow communication between
the bootloader and the kernel.

There clearly are many use cases for providing configuration information
and other types of data to a booting kernel. I have been encouraging
people to come up with an additional boot time communication channel or
data object to support this use case. So far, no serious proposal that
I am aware of.

>
> With the devicetree, we can setup new kprobe and synthetic events, more
> complicated event filters and trigger actions including histogram.
>
> For example, following kernel parameters
>
> trace_options=sym-addr trace_event=initcall:* tp_printk trace_buf_size=1M
>
> it can be written in devicetree like below.
>
> ftrace {
> compatible = "linux,ftrace";
> options = "sym-addr";
> events = "initcall:*";
> tp-printk;
> buffer-size-kb = <0x400>; // 1024KB == 1MB
> };
>
> Moreover, now we can expand it to add filters for events, kprobe events,
> and synthetic events with histogram like below.
>
> ftrace {
> compatible = "linux,ftrace";
> ...
> event0 {
> event = "task:task_newtask";
> filter = "pid < 128"; // adding filters
> enable;
> };
> event1 {
> event = "kprobes:vfs_read";
> probes = "vfs_read $arg1 $arg2"; // add kprobes
> filter = "common_pid < 200";
> enable;
> };
> event2 {
> event = "initcall_latency"; // add synth event
> fields = "unsigned long func", "u64 lat";
> // with histogram
> actions = "hist:keys=func.sym,lat:vals=lat:sort=lat";
> };
> // and synthetic event callers
> event3 {
> event = "initcall:initcall_start";
> actions = "hist:keys=func:ts0=common_timestamp.usecs";
> };
> event4 {
> event = "initcall:initcall_finish";
> actions = "hist:keys=func:lat=common_timestamp.usecs-$ts0:onmatch(initcall.initcall_start).initcall_latency(func,$lat)";
> };
> };
>
> These complex configuration can not be done by kernel parameters.
> However, this is not replacing boot-time tracing by kernel parameters.
> This devicetree settings are applied in fs_initcall() stage, but kernel
> parameters are applied earlier stage. Anyway, this is enough useful
> for debugging/tracing kernel driver initializations.
>
> I would like to discuss on some points about this idea.
>
> - Can we use devicetree for configuring kernel dynamically?

No. Sorry.

My understanding of this proposal is that it is intended to better
support boot time kernel and driver debugging. As an alternate
implementation, could you compile the ftrace configuration information
directly into a kernel data structure? It seems like it would not be
very difficult to populate the data structure data via a few macros.

-Frank


> - Would you have any comment for the devicetree format and default
> behaviors?
> - Currently, kprobe and synthetic events are defined inside event
> node, but it is able to define globally in ftrace node. Which is
> better?
> - Do we need to support "status" property on each event node so
> that someone can prepare "dtsi" include file and override the status?
> - Do we need instance-wide pid filter (set_event_pid) when boot-time?
> - Do we need more structured tree, like spliting event and group,
> event actions and probes to be a tree of node, etc?
> - Do we need per group filter & enablement support?
> - How to support instances? (nested tree or different tree?)
> - What kind of options would we need?
>
> Some kernel parameters are not implemented yet, like ftrace_filter,
> ftrace_notrace, etc. These will be implemented afterwards.
>
> Thank you,
>
> ---
>
> Masami Hiramatsu (11):
> tracing: Apply soft-disabled and filter to tracepoints printk
> tracing: kprobes: Output kprobe event to printk buffer
> tracing: Expose EXPORT_SYMBOL_GPL symbol
> tracing: kprobes: Register to dynevent earlier stage
> tracing: Accept different type for synthetic event fields
> tracing: Add NULL trace-array check in print_synth_event()
> dt-bindings: tracing: Add ftrace binding document
> tracing: of: Add setup tracing by devicetree support
> tracing: of: Add trace event settings
> tracing: of: Add kprobe event support
> tracing: of: Add synthetic event support
>
>
> .../devicetree/bindings/tracing/ftrace.yaml | 170 +++++++++++
> include/linux/trace_events.h | 1
> kernel/trace/Kconfig | 10 +
> kernel/trace/Makefile | 1
> kernel/trace/trace.c | 49 ++-
> kernel/trace/trace_events.c | 3
> kernel/trace/trace_events_hist.c | 14 +
> kernel/trace/trace_events_trigger.c | 2
> kernel/trace/trace_kprobe.c | 81 +++--
> kernel/trace/trace_of.c | 311 ++++++++++++++++++++
> 10 files changed, 589 insertions(+), 53 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/tracing/ftrace.yaml
> create mode 100644 kernel/trace/trace_of.c
>
> --
> Masami Hiramatsu (Linaro) <[email protected]>
>

2019-06-24 02:52:48

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] tracing: of: Boot time tracing using devicetree

Hi Frank,

Thank you for your comment!

On Sun, 23 Jun 2019 12:58:45 -0700
Frank Rowand <[email protected]> wrote:

> Hi Masami,
>
> On 6/21/19 9:18 AM, Masami Hiramatsu wrote:
> > Hi,
> >
> > Here is an RFC series of patches to add boot-time tracing using
> > devicetree.
> >
> > Currently, kernel support boot-time tracing using kernel command-line
> > parameters. But that is very limited because of limited expressions
> > and limited length of command line. Recently, useful features like
> > histogram, synthetic events, etc. are being added to ftrace, but it is
> > clear that we can not expand command-line options to support these
> > features.
>
> "it is clear that we can not expand command-line options" needs a fuller
> explanation. And maybe further exploration.

Indeed. As an example of tracing settings in the first mail, even for simple
use-case, the trace command is long and complicated. I think it is hard to
express that as 1-liner kernel command line. But devicetree looks very good
for expressing structured data. That is great and I like it :)

> >
> > Hoever, I've found that there is a devicetree which can pass more
> > structured commands to kernel at boot time :) The devicetree is usually
> > used for dscribing hardware configuration, but I think we can expand it
>
> Devicetree is standardized and documented as hardware description.

Yes, at this moment. Can't we talk about some future things?

> > for software configuration too (e.g. AOSP and OPTEE already introduced
> > firmware node.) Also, grub and qemu already supports loading devicetree,
> > so we can use it not only on embedded devices but also on x86 PC too.
>
> Devicetree is NOT for configuration information. This has been discussed
> over and over again in mail lists, at various conferences, and was also an
> entire session at plumbers a few years ago:
>
> https://elinux.org/Device_tree_future#Linux_Plumbers_2016_Device_Tree_Track

Thanks, I'll check that.

>
> There is one part of device tree that does allow non-hardware description,
> which is the "chosen" node which is provided to allow communication between
> the bootloader and the kernel.

Ah, "chosen" will be suit for me :)

> There clearly are many use cases for providing configuration information
> and other types of data to a booting kernel. I have been encouraging
> people to come up with an additional boot time communication channel or
> data object to support this use case. So far, no serious proposal that
> I am aware of.

Hmm, then, can we add "ftrace" node under "chosen" node?
It seems that "chosen" is supporting some (flat) properties, and I would
like to add a tree of nodes for describing per-event setting.

What about something like below? (do we need "compatible" ?)

chosen {
linux,ftrace {
tp-printk;
buffer-size-kb = <400>;
event0 {
event = "...";
};
};
};

[..]
> >
> > I would like to discuss on some points about this idea.
> >
> > - Can we use devicetree for configuring kernel dynamically?
>
> No. Sorry.
>
> My understanding of this proposal is that it is intended to better
> support boot time kernel and driver debugging. As an alternate
> implementation, could you compile the ftrace configuration information
> directly into a kernel data structure? It seems like it would not be
> very difficult to populate the data structure data via a few macros.

No, that is not what I intended. My intention was to trace boot up
process "without recompiling kernel", but with a structured data.

For such purpose, we have to implement a tool to parse and pack the
data and a channel to load it at earlier stage in bootloader. And
those are already done by devicetree. Thus I thought I could get a
piggyback on devicetree.

Thank you,

--
Masami Hiramatsu <[email protected]>

2019-06-24 22:33:10

by Frank Rowand

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] tracing: of: Boot time tracing using devicetree

On 6/23/19 7:52 PM, Masami Hiramatsu wrote:
> Hi Frank,
>
> Thank you for your comment!
>
> On Sun, 23 Jun 2019 12:58:45 -0700
> Frank Rowand <[email protected]> wrote:
>
>> Hi Masami,
>>
>> On 6/21/19 9:18 AM, Masami Hiramatsu wrote:
>>> Hi,
>>>
>>> Here is an RFC series of patches to add boot-time tracing using
>>> devicetree.
>>>
>>> Currently, kernel support boot-time tracing using kernel command-line
>>> parameters. But that is very limited because of limited expressions
>>> and limited length of command line. Recently, useful features like
>>> histogram, synthetic events, etc. are being added to ftrace, but it is
>>> clear that we can not expand command-line options to support these
>>> features.
>>
>> "it is clear that we can not expand command-line options" needs a fuller
>> explanation. And maybe further exploration.
>
> Indeed. As an example of tracing settings in the first mail, even for simple
> use-case, the trace command is long and complicated. I think it is hard to
> express that as 1-liner kernel command line. But devicetree looks very good
> for expressing structured data. That is great and I like it :)

But you could extend the command line paradigm to meet your needs.

>
>>>
>>> Hoever, I've found that there is a devicetree which can pass more
>>> structured commands to kernel at boot time :) The devicetree is usually
>>> used for dscribing hardware configuration, but I think we can expand it
>>
>> Devicetree is standardized and documented as hardware description.
>
> Yes, at this moment. Can't we talk about some future things?>
>>> for software configuration too (e.g. AOSP and OPTEE already introduced
>>> firmware node.) Also, grub and qemu already supports loading devicetree,
>>> so we can use it not only on embedded devices but also on x86 PC too.
>>
>> Devicetree is NOT for configuration information. This has been discussed
>> over and over again in mail lists, at various conferences, and was also an
>> entire session at plumbers a few years ago:
>>
>> https://elinux.org/Device_tree_future#Linux_Plumbers_2016_Device_Tree_Track
>
> Thanks, I'll check that.
>
>>
>> There is one part of device tree that does allow non-hardware description,
>> which is the "chosen" node which is provided to allow communication between
>> the bootloader and the kernel.
>
> Ah, "chosen" will be suit for me :)

No. This is not communicating boot loader information.

>
>> There clearly are many use cases for providing configuration information
>> and other types of data to a booting kernel. I have been encouraging
>> people to come up with an additional boot time communication channel or
>> data object to support this use case. So far, no serious proposal that
>> I am aware of.
>
> Hmm, then, can we add "ftrace" node under "chosen" node?
> It seems that "chosen" is supporting some (flat) properties, and I would
> like to add a tree of nodes for describing per-event setting.
>
> What about something like below? (do we need "compatible" ?)
>
> chosen {
> linux,ftrace {
> tp-printk;
> buffer-size-kb = <400>;
> event0 {
> event = "...";
> };
> };
> };
>
> [..]
>>>
>>> I would like to discuss on some points about this idea.
>>>
>>> - Can we use devicetree for configuring kernel dynamically?
>>
>> No. Sorry.
>>
>> My understanding of this proposal is that it is intended to better
>> support boot time kernel and driver debugging. As an alternate
>> implementation, could you compile the ftrace configuration information
>> directly into a kernel data structure? It seems like it would not be
>> very difficult to populate the data structure data via a few macros.
>
> No, that is not what I intended. My intention was to trace boot up
> process "without recompiling kernel", but with a structured data.

That is debugging. Or if you want to be pedantic, a complex performance
measurement of the boot process (more than holding a stopwatch in your
hand).

Recompiling a single object file (containing the ftrace command data)
and re-linking the kernel is not a big price in that context). Or if
you create a new communication channel, you will have the cost of
creating that data object (certainly not much different than compiling
a devicetree) and have the bootloader provide the ftrace data object
to the kernel.

>
> For such purpose, we have to implement a tool to parse and pack the
> data and a channel to load it at earlier stage in bootloader. And
> those are already done by devicetree. Thus I thought I could get a
> piggyback on devicetree.

Devicetree is not the universal dumping ground for communicating
information to a booting kernel. Please create another communication
channel.

>
> Thank you,
>

2019-06-25 07:08:44

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] tracing: of: Boot time tracing using devicetree

Hi Frank,

On Mon, 24 Jun 2019 15:31:07 -0700
Frank Rowand <[email protected]> wrote:
> >>> Currently, kernel support boot-time tracing using kernel command-line
> >>> parameters. But that is very limited because of limited expressions
> >>> and limited length of command line. Recently, useful features like
> >>> histogram, synthetic events, etc. are being added to ftrace, but it is
> >>> clear that we can not expand command-line options to support these
> >>> features.
> >>
> >> "it is clear that we can not expand command-line options" needs a fuller
> >> explanation. And maybe further exploration.
> >
> > Indeed. As an example of tracing settings in the first mail, even for simple
> > use-case, the trace command is long and complicated. I think it is hard to
> > express that as 1-liner kernel command line. But devicetree looks very good
> > for expressing structured data. That is great and I like it :)
>
> But you could extend the command line paradigm to meet your needs.

But the kernel command line is a command line. Would you mean encoding the
structured setting in binary format with base64 and pass it? :(

> >> Devicetree is NOT for configuration information. This has been discussed
> >> over and over again in mail lists, at various conferences, and was also an
> >> entire session at plumbers a few years ago:
> >>
> >> https://elinux.org/Device_tree_future#Linux_Plumbers_2016_Device_Tree_Track
> >
> > Thanks, I'll check that.

I found following discussion in etherpad log, https://elinux.org/Device_tree_plumbers_2016_etherpad
----
If you have data that the kernel does not have a good way to get, that's OK to put into DT.

Operating points are OK - but should still be structured well.
----

This sounds like if it is structured well, and there are no other way,
we will be able to use DT as a channel.

> >>
> >> There is one part of device tree that does allow non-hardware description,
> >> which is the "chosen" node which is provided to allow communication between
> >> the bootloader and the kernel.
> >
> > Ah, "chosen" will be suit for me :)
>
> No. This is not communicating boot loader information.

Hmm, it's a kind of communication with the operator of the boot loader, since there
is an admin or developer behind it. I think the comminication is to communicate
with that human. Then if they intend to trace boot process, that is a kind of
communication.

[...]
> >>> - Can we use devicetree for configuring kernel dynamically?
> >>
> >> No. Sorry.
> >>
> >> My understanding of this proposal is that it is intended to better
> >> support boot time kernel and driver debugging. As an alternate
> >> implementation, could you compile the ftrace configuration information
> >> directly into a kernel data structure? It seems like it would not be
> >> very difficult to populate the data structure data via a few macros.
> >
> > No, that is not what I intended. My intention was to trace boot up
> > process "without recompiling kernel", but with a structured data.
>
> That is debugging. Or if you want to be pedantic, a complex performance
> measurement of the boot process (more than holding a stopwatch in your
> hand).

Yeah, that's right.

> Recompiling a single object file (containing the ftrace command data)
> and re-linking the kernel is not a big price in that context).

No, if I can use DT, I can choose one of them while boot up.
That will be a big difference.
(Of course for that purpose, I should work on boot loader to support
DT overlay)

> Or if
> you create a new communication channel, you will have the cost of
> creating that data object (certainly not much different than compiling
> a devicetree) and have the bootloader provide the ftrace data object
> to the kernel.

Yes, and for me, that sounds like just a reinvention of the wheel.
If I can reuse devicetree infrastructure, it is easily done (as I
implemented in this series. It's just about 500LOC (and YAML document)

I can clone drivers/of/ code only for that new communication channel,
but that makes no one happy. :(

> > For such purpose, we have to implement a tool to parse and pack the
> > data and a channel to load it at earlier stage in bootloader. And
> > those are already done by devicetree. Thus I thought I could get a
> > piggyback on devicetree.
>
> Devicetree is not the universal dumping ground for communicating
> information to a booting kernel. Please create another communication
> channel.

Why should we so limit the availability of even a small corner of existing
open source software...?

Thank you,

--
Masami Hiramatsu <[email protected]>

2019-06-26 21:59:38

by Rob Herring

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] tracing: of: Boot time tracing using devicetree

On Fri, Jun 21, 2019 at 10:18 AM Masami Hiramatsu <[email protected]> wrote:
>
> Hi,
>
> Here is an RFC series of patches to add boot-time tracing using
> devicetree.
>
> Currently, kernel support boot-time tracing using kernel command-line
> parameters. But that is very limited because of limited expressions
> and limited length of command line. Recently, useful features like
> histogram, synthetic events, etc. are being added to ftrace, but it is
> clear that we can not expand command-line options to support these
> features.
>
> Hoever, I've found that there is a devicetree which can pass more
> structured commands to kernel at boot time :) The devicetree is usually
> used for dscribing hardware configuration, but I think we can expand it
> for software configuration too (e.g. AOSP and OPTEE already introduced
> firmware node.) Also, grub and qemu already supports loading devicetree,
> so we can use it not only on embedded devices but also on x86 PC too.

Do the x86 versions of grub, qemu, EFI, any other bootloader actually
enable DT support? I didn't think so. Certainly, an x86 kernel doesn't
normally (other than OLPC and ce4100) have a defined way to even pass
a dtb from the bootloader to the kernel and the kernel doesn't
unflatten the dtb.

For arm64, the bootloader to kernel interface is DT even for ACPI
based systems. So unlike Frank, I'm not completely against DT being
the interface, but it's hardly universal across architectures and
something like this should be. Neither making DT the universal kernel
boot interface nor creating some new channel as Frank suggested seems
like an easy task.

Rob

2019-06-27 02:56:41

by Frank Rowand

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] tracing: of: Boot time tracing using devicetree

On 6/26/19 2:58 PM, Rob Herring wrote:
> On Fri, Jun 21, 2019 at 10:18 AM Masami Hiramatsu <[email protected]> wrote:
>>
>> Hi,
>>
>> Here is an RFC series of patches to add boot-time tracing using
>> devicetree.
>>
>> Currently, kernel support boot-time tracing using kernel command-line
>> parameters. But that is very limited because of limited expressions
>> and limited length of command line. Recently, useful features like
>> histogram, synthetic events, etc. are being added to ftrace, but it is
>> clear that we can not expand command-line options to support these
>> features.
>>
>> Hoever, I've found that there is a devicetree which can pass more
>> structured commands to kernel at boot time :) The devicetree is usually
>> used for dscribing hardware configuration, but I think we can expand it
>> for software configuration too (e.g. AOSP and OPTEE already introduced
>> firmware node.) Also, grub and qemu already supports loading devicetree,
>> so we can use it not only on embedded devices but also on x86 PC too.
>
> Do the x86 versions of grub, qemu, EFI, any other bootloader actually
> enable DT support? I didn't think so. Certainly, an x86 kernel doesn't
> normally (other than OLPC and ce4100) have a defined way to even pass
> a dtb from the bootloader to the kernel and the kernel doesn't
> unflatten the dtb.
>
> For arm64, the bootloader to kernel interface is DT even for ACPI
> based systems. So unlike Frank, I'm not completely against DT being
> the interface, but it's hardly universal across architectures and
> something like this should be. Neither making DT the universal kernel
> boot interface nor creating some new channel as Frank suggested seems
> like an easy task.
>
> Rob
>
Agreed, creating a new generic channel is not a small or easy task. And
if it does get added it should support several different use cases that
have been expressed over the last few years. Or have enough good
architecturfe such thatvit is easy to add support for other use cases
without impacting previously existing cases.

-Frank

2019-06-27 10:59:07

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] tracing: of: Boot time tracing using devicetree

Hi Rob,

On Wed, 26 Jun 2019 15:58:50 -0600
Rob Herring <[email protected]> wrote:

> On Fri, Jun 21, 2019 at 10:18 AM Masami Hiramatsu <[email protected]> wrote:
> >
> > Hi,
> >
> > Here is an RFC series of patches to add boot-time tracing using
> > devicetree.
> >
> > Currently, kernel support boot-time tracing using kernel command-line
> > parameters. But that is very limited because of limited expressions
> > and limited length of command line. Recently, useful features like
> > histogram, synthetic events, etc. are being added to ftrace, but it is
> > clear that we can not expand command-line options to support these
> > features.
> >
> > Hoever, I've found that there is a devicetree which can pass more
> > structured commands to kernel at boot time :) The devicetree is usually
> > used for dscribing hardware configuration, but I think we can expand it
> > for software configuration too (e.g. AOSP and OPTEE already introduced
> > firmware node.) Also, grub and qemu already supports loading devicetree,
> > so we can use it not only on embedded devices but also on x86 PC too.
>
> Do the x86 versions of grub, qemu, EFI, any other bootloader actually
> enable DT support? I didn't think so. Certainly, an x86 kernel doesn't
> normally (other than OLPC and ce4100) have a defined way to even pass
> a dtb from the bootloader to the kernel and the kernel doesn't
> unflatten the dtb.

Sorry, the grub part, I just found this entry. I need to check this
can work on x86 too.

https://www.gnu.org/software/grub/manual/grub/html_node/devicetree.html

Anyway, I've tested this series on qemu-x86 with --dtb option.
The kernel boot with ACPI and DT (hardware drivers seem initialized
by ACPI), and it seems unflatten the dtb correctly.

>
> For arm64, the bootloader to kernel interface is DT even for ACPI
> based systems. So unlike Frank, I'm not completely against DT being
> the interface, but it's hardly universal across architectures and
> something like this should be. Neither making DT the universal kernel
> boot interface nor creating some new channel as Frank suggested seems
> like an easy task.

I don't want it making this for all architectures but an option for
architecutres which supports DT already...

Thank you,


>
> Rob


--
Masami Hiramatsu <[email protected]>

2019-07-02 09:48:13

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] tracing: of: Boot time tracing using devicetree

Hello,

On Thu, 27 Jun 2019 19:58:17 +0900
Masami Hiramatsu <[email protected]> wrote:

> Hi Rob,
>
> On Wed, 26 Jun 2019 15:58:50 -0600
> Rob Herring <[email protected]> wrote:
>
> > On Fri, Jun 21, 2019 at 10:18 AM Masami Hiramatsu <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > Here is an RFC series of patches to add boot-time tracing using
> > > devicetree.
> > >
> > > Currently, kernel support boot-time tracing using kernel command-line
> > > parameters. But that is very limited because of limited expressions
> > > and limited length of command line. Recently, useful features like
> > > histogram, synthetic events, etc. are being added to ftrace, but it is
> > > clear that we can not expand command-line options to support these
> > > features.
> > >
> > > Hoever, I've found that there is a devicetree which can pass more
> > > structured commands to kernel at boot time :) The devicetree is usually
> > > used for dscribing hardware configuration, but I think we can expand it
> > > for software configuration too (e.g. AOSP and OPTEE already introduced
> > > firmware node.) Also, grub and qemu already supports loading devicetree,
> > > so we can use it not only on embedded devices but also on x86 PC too.
> >
> > Do the x86 versions of grub, qemu, EFI, any other bootloader actually
> > enable DT support? I didn't think so. Certainly, an x86 kernel doesn't
> > normally (other than OLPC and ce4100) have a defined way to even pass
> > a dtb from the bootloader to the kernel and the kernel doesn't
> > unflatten the dtb.
>
> Sorry, the grub part, I just found this entry. I need to check this
> can work on x86 too.

I've confirmed that grub-x86 doesn't support devicetree option. I tried
to add it, and tested it.

https://github.com/mhiramat/grub/commit/644c35bfd2d18c772cc353b74215344f8264923a

This works if there is ACPI, if it includes /chosen/linux,ftrace node only.
(Anyway, we don't need other nodes on x86)

At this moment, grub doesn't support DT overlay, so on arm/arm64 user must
decompile DTB, add linux,ftrace node for tracing and compile it again.
But if it supports overlay, I think we can give an overlay for tracer setting
on boot up, that will be handy on arm/arm64 too. :)

Thank you,

--
Masami Hiramatsu <[email protected]>

2019-07-15 15:11:25

by Frank Rowand

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] tracing: of: Boot time tracing using devicetree

Hi Masami,

Note that Masami has submitted v2 of the patch with further discussion. I
noticed that I had left some comments unanswered in this thread, so I am
doing this reply before moving on to reply to the v2 thread.

Feel free to move the discussion to v2 instead of replying to this email
if you find that easier.

On 6/24/19 10:00 PM, Masami Hiramatsu wrote:
> Hi Frank,
>
> On Mon, 24 Jun 2019 15:31:07 -0700
> Frank Rowand <[email protected]> wrote:
>>>>> Currently, kernel support boot-time tracing using kernel command-line
>>>>> parameters. But that is very limited because of limited expressions
>>>>> and limited length of command line. Recently, useful features like
>>>>> histogram, synthetic events, etc. are being added to ftrace, but it is
>>>>> clear that we can not expand command-line options to support these
>>>>> features.
>>>>
>>>> "it is clear that we can not expand command-line options" needs a fuller
>>>> explanation. And maybe further exploration.
>>>
>>> Indeed. As an example of tracing settings in the first mail, even for simple
>>> use-case, the trace command is long and complicated. I think it is hard to
>>> express that as 1-liner kernel command line. But devicetree looks very good
>>> for expressing structured data. That is great and I like it :)
>>
>> But you could extend the command line paradigm to meet your needs.
>
> But the kernel command line is a command line. Would you mean encoding the
> structured setting in binary format with base64 and pass it? :(

If you want to put structured data in the command line, then yes, you could use
an ascii safe form, such as base64, to contain it. If that is what you want.


>
>>>> Devicetree is NOT for configuration information. This has been discussed
>>>> over and over again in mail lists, at various conferences, and was also an
>>>> entire session at plumbers a few years ago:
>>>>
>>>> https://elinux.org/Device_tree_future#Linux_Plumbers_2016_Device_Tree_Track
>>>
>>> Thanks, I'll check that.
>
> I found following discussion in etherpad log, https://elinux.org/Device_tree_plumbers_2016_etherpad
> ----
> If you have data that the kernel does not have a good way to get, that's OK to put into DT.
>
> Operating points are OK - but should still be structured well.
> ----
>
> This sounds like if it is structured well, and there are no other way,
> we will be able to use DT as a channel.
>
>>>>
>>>> There is one part of device tree that does allow non-hardware description,
>>>> which is the "chosen" node which is provided to allow communication between
>>>> the bootloader and the kernel.
>>>
>>> Ah, "chosen" will be suit for me :)
>>
>> No. This is not communicating boot loader information.
>
> Hmm, it's a kind of communication with the operator of the boot loader, since there
> is an admin or developer behind it. I think the comminication is to communicate
> with that human. Then if they intend to trace boot process, that is a kind of
> communication.

The quote from the plumbers 2016 devicetree etherpad ("Operating points are OK ...)
conveniently ignores a sentence just a few lines later:

"If firmware is deciding the operating point, then it's OK to convey it via DT."

The operating points example is clearly communicating boot loader information to
the kernel.

Something the admin or developer wants to communicate is not boot loader
information.


>
> [...]
>>>>> - Can we use devicetree for configuring kernel dynamically?
>>>>
>>>> No. Sorry.
>>>>
>>>> My understanding of this proposal is that it is intended to better
>>>> support boot time kernel and driver debugging. As an alternate
>>>> implementation, could you compile the ftrace configuration information
>>>> directly into a kernel data structure? It seems like it would not be
>>>> very difficult to populate the data structure data via a few macros.
>>>
>>> No, that is not what I intended. My intention was to trace boot up
>>> process "without recompiling kernel", but with a structured data.
>>
>> That is debugging. Or if you want to be pedantic, a complex performance
>> measurement of the boot process (more than holding a stopwatch in your
>> hand).
>
> Yeah, that's right.
>
>> Recompiling a single object file (containing the ftrace command data)
>> and re-linking the kernel is not a big price in that context).
>
> No, if I can use DT, I can choose one of them while boot up.
> That will be a big difference.
> (Of course for that purpose, I should work on boot loader to support
> DT overlay)

This is debugging kernel drivers. I do not think that it is a big cost for
a kernel developer to re-link the kernel. On any reasonable modern
development system this should be a matter of seconds, not minutes.

Compiling a devicetree source is not significantly less time. (To be
fair, you imply that you would have various compiled devicetrees to
choose from at boot time. It may be realistic to have a library of
ftrace commands, or it may be more realistic that someone debugging
a kernel driver will create a unique ftrace command set for the
particular case they are debugging.)

>
>> Or if
>> you create a new communication channel, you will have the cost of
>> creating that data object (certainly not much different than compiling
>> a devicetree) and have the bootloader provide the ftrace data object
>> to the kernel.
>
> Yes, and for me, that sounds like just a reinvention of the wheel.
> If I can reuse devicetree infrastructure, it is easily done (as I
> implemented in this series. It's just about 500LOC (and YAML document)

Or you can just use the existing ftrace boot command line syntax.


>
> I can clone drivers/of/ code only for that new communication channel,
> but that makes no one happy. :(
>
>>> For such purpose, we have to implement a tool to parse and pack the
>>> data and a channel to load it at earlier stage in bootloader. And
>>> those are already done by devicetree. Thus I thought I could get a
>>> piggyback on devicetree.
>>
>> Devicetree is not the universal dumping ground for communicating
>> information to a booting kernel. Please create another communication
>> channel.
>
> Why should we so limit the availability of even a small corner of existing
> open source software...?

Because the proposal is a mis-use of devicetree.

>
> Thank you,
>

2019-07-17 00:59:02

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] tracing: of: Boot time tracing using devicetree

Hi Frank,

On Mon, 15 Jul 2019 06:55:40 -0700
Frank Rowand <[email protected]> wrote:

> > Hmm, it's a kind of communication with the operator of the boot loader, since there
> > is an admin or developer behind it. I think the comminication is to communicate
> > with that human. Then if they intend to trace boot process, that is a kind of
> > communication.
>
> The quote from the plumbers 2016 devicetree etherpad ("Operating points are OK ...)
> conveniently ignores a sentence just a few lines later:
>
> "If firmware is deciding the operating point, then it's OK to convey it via DT."
>
> The operating points example is clearly communicating boot loader information to
> the kernel.
>
> Something the admin or developer wants to communicate is not boot loader
> information.

But the cmdline (bootargs) is already supported, I think this is a kind of bootargs.

> > [...]
> >>>>> - Can we use devicetree for configuring kernel dynamically?
> >>>>
> >>>> No. Sorry.
> >>>>
> >>>> My understanding of this proposal is that it is intended to better
> >>>> support boot time kernel and driver debugging. As an alternate
> >>>> implementation, could you compile the ftrace configuration information
> >>>> directly into a kernel data structure? It seems like it would not be
> >>>> very difficult to populate the data structure data via a few macros.
> >>>
> >>> No, that is not what I intended. My intention was to trace boot up
> >>> process "without recompiling kernel", but with a structured data.
> >>
> >> That is debugging. Or if you want to be pedantic, a complex performance
> >> measurement of the boot process (more than holding a stopwatch in your
> >> hand).
> >
> > Yeah, that's right.
> >
> >> Recompiling a single object file (containing the ftrace command data)
> >> and re-linking the kernel is not a big price in that context).
> >
> > No, if I can use DT, I can choose one of them while boot up.
> > That will be a big difference.
> > (Of course for that purpose, I should work on boot loader to support
> > DT overlay)
>
> This is debugging kernel drivers. I do not think that it is a big cost for
> a kernel developer to re-link the kernel. On any reasonable modern
> development system this should be a matter of seconds, not minutes.

Tracing is not only debugging the kernel drivers, but also measures
performance or behavior and compare with different settings.
Also, in some case, we can not change the binary, like distro kernel.

> Compiling a devicetree source is not significantly less time. (To be
> fair, you imply that you would have various compiled devicetrees to
> choose from at boot time. It may be realistic to have a library of
> ftrace commands, or it may be more realistic that someone debugging
> a kernel driver will create a unique ftrace command set for the
> particular case they are debugging.)

Yeah, devicetree build time may not be matter, decoupling from the
kernel image is, I think, the key point.

Thank you,

--
Masami Hiramatsu <[email protected]>