2021-11-13 12:06:54

by Jeff Xie

[permalink] [raw]
Subject: [RFC][PATCH v5 1/4] trace: Add trace any kernel object

Introduce a method based on function tracer to trace any object and get
the value of the object dynamically. the object can be obtained from the
dynamic event (kprobe_event/uprobe_event) or the static event(tracepoint).

Usage:
When using the kprobe event, only need to set the objtrace(a new trigger),
we can get the value of the object. The object is from the setting of the
kprobe event.

For example:
For the function bio_add_page():

int bio_add_page(struct bio *bio, struct page *page,
unsigned int len, unsigned int offset)

Firstly, we can set the base of the object, thus the first string "arg1"
stands for the value of the first parameter of this function bio_add_gage(),

# echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events

Secondly, we can get the value dynamically base the above object.

find the offset of the bi_size in struct bio:
$ gdb vmlinux
(gdb) p &(((struct bio *)0)->bi_iter.bi_size)
$1 = (unsigned int *) 0x28

# echo 'objtrace:0x28(arg1):u32:1 if comm == "cat"' > ./events/kprobes/ \
p_bio_add_page_0/trigger

The best way to use this is that we can set the entrance event and exit
event, for example, the following example is to set the read_papes as
the entrance event, and set the blk_account_io_start as the exit event.

# dd if=/dev/zero of=/test.txt bs=4k count=3
# cd /sys/kernel/debug/tracing/
# echo 0 > ./tracing_on
# echo 'p read_pages' >> ./kprobe_events
# echo 'p blk_account_io_start' >> ./kprobe_events
# echo 'traceon if comm == "cat"' > ./events/kprobes/p_read_pages_0/trigger
# echo 'traceoff if comm == "cat"' > ./events/kprobes/p_blk_account_io_start_0/trigger
# echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
# echo 'objtrace:0x28(arg1):u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger

# du -sh /test.txt
12.0K /test.txt

# cat /test.txt > /dev/null
# cat ./trace

# tracer: nop
#
# entries-in-buffer/entries-written: 50/50 #P:1
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
cat-95 [000] ..... 1.412065: _raw_spin_unlock_irqrestore <-event_triggers_call object:0xffff888108af6328 value:0x0
cat-95 [000] ..... 1.412066: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x0
cat-95 [000] ..... 1.412066: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x0
cat-95 [000] ..... 1.412066: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x1000
cat-95 [000] ..... 1.412066: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x1000
cat-95 [000] ..... 1.412066: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x1000
cat-95 [000] ..... 1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x1000
cat-95 [000] ..... 1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x1000
cat-95 [000] ..... 1.412067: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x1000
cat-95 [000] ..... 1.412067: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x2000
cat-95 [000] ..... 1.412067: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x2000
cat-95 [000] ..... 1.412067: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x2000
cat-95 [000] ..... 1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x2000
cat-95 [000] ..... 1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x2000
cat-95 [000] ..... 1.412068: submit_bio <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412068: submit_bio_noacct <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412068: __submit_bio <-submit_bio_noacct object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412068: blk_try_enter_queue <-__submit_bio object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412068: submit_bio_checks <-__submit_bio object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412068: __cond_resched <-submit_bio_checks object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412068: rcu_all_qs <-__cond_resched object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412068: should_fail_bio <-submit_bio_checks object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412069: create_task_io_context <-submit_bio_checks object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412069: kmem_cache_alloc_node <-create_task_io_context object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412069: should_failslab <-kmem_cache_alloc_node object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412069: _raw_spin_lock <-create_task_io_context object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412069: blk_mq_submit_bio <-__submit_bio object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412069: __blk_queue_split <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412069: bvec_split_segs <-__blk_queue_split object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412069: blk_attempt_plug_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412070: __blk_mq_sched_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412070: dd_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412070: _raw_spin_lock <-dd_bio_merge object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412070: blk_mq_sched_try_merge <-dd_bio_merge object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412070: elv_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412070: elv_rqhash_find <-elv_merge object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412070: dd_request_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412070: elv_rb_find <-dd_request_merge object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412070: __blk_mq_alloc_request <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412071: dd_limit_depth <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412071: blk_mq_get_tag <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412071: __blk_mq_get_tag <-blk_mq_get_tag object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412071: blk_mq_rq_ctx_init.isra.0 <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412071: ktime_get <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412071: dd_prepare_request <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412071: blk_account_io_start <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412071: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x3000
cat-95 [000] ..... 1.412071: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x3000

Signed-off-by: Jeff Xie <[email protected]>
---
include/linux/trace_events.h | 1 +
kernel/trace/Kconfig | 10 +
kernel/trace/Makefile | 1 +
kernel/trace/trace.h | 8 +
kernel/trace/trace_entries.h | 17 +
kernel/trace/trace_events_trigger.c | 1 +
kernel/trace/trace_object.c | 532 ++++++++++++++++++++++++++++
kernel/trace/trace_output.c | 40 +++
8 files changed, 610 insertions(+)
create mode 100644 kernel/trace/trace_object.c

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 3e475eeb5a99..84b5c12c7f44 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -684,6 +684,7 @@ enum event_trigger_type {
ETT_EVENT_HIST = (1 << 4),
ETT_HIST_ENABLE = (1 << 5),
ETT_EVENT_EPROBE = (1 << 6),
+ ETT_TRACE_OBJECT = (1 << 7),
};

extern int filter_match_preds(struct event_filter *filter, void *rec);
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 420ff4bc67fd..f69761cbfaaa 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -237,6 +237,16 @@ config FUNCTION_PROFILER

If in doubt, say N.

+config TRACE_OBJECT
+ bool "Trace kernel object in function parameter"
+ depends on FUNCTION_TRACER
+ depends on HAVE_FUNCTION_ARG_ACCESS_API
+ select TRACING
+ default y
+ help
+ You can trace the kernel object in the kernel function parameter.
+ The kernel object is dynamically specified via event trigger.
+
config STACK_TRACER
bool "Trace max stack"
depends on HAVE_FUNCTION_TRACER
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 6de5d4d63165..6d9e78a488aa 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -66,6 +66,7 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += fgraph.o
+obj-$(CONFIG_TRACE_OBJECT) += trace_object.o
ifeq ($(CONFIG_BLOCK),y)
obj-$(CONFIG_EVENT_TRACING) += blktrace.o
endif
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index b7c0f8e160fb..d2508f977fe6 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -49,6 +49,7 @@ enum trace_type {
TRACE_TIMERLAT,
TRACE_RAW_DATA,
TRACE_FUNC_REPEATS,
+ TRACE_OBJECT,

__TRACE_LAST_TYPE,
};
@@ -460,6 +461,7 @@ extern void __ftrace_bad_type(void);
TRACE_GRAPH_RET); \
IF_ASSIGN(var, ent, struct func_repeats_entry, \
TRACE_FUNC_REPEATS); \
+ IF_ASSIGN(var, ent, struct trace_object_entry, TRACE_OBJECT);\
__ftrace_bad_type(); \
} while (0)

@@ -1510,6 +1512,12 @@ static inline int register_trigger_hist_cmd(void) { return 0; }
static inline int register_trigger_hist_enable_disable_cmds(void) { return 0; }
#endif

+#ifdef CONFIG_TRACE_OBJECT
+extern int register_trigger_object_cmd(void);
+#else
+static inline int register_trigger_object_cmd(void) { return 0; }
+#endif
+
extern int register_trigger_cmds(void);
extern void clear_event_triggers(struct trace_array *tr);

diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index cd41e863b51c..bb120d9498a9 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -401,3 +401,20 @@ FTRACE_ENTRY(timerlat, timerlat_entry,
__entry->context,
__entry->timer_latency)
);
+
+/*
+ * trace object entry:
+ */
+FTRACE_ENTRY(object, trace_object_entry,
+
+ TRACE_OBJECT,
+
+ F_STRUCT(
+ __field( unsigned long, ip )
+ __field( unsigned long, parent_ip )
+ __field( unsigned long, object )
+ ),
+
+ F_printk(" %ps <-- %ps object:%lx\n",
+ (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
+);
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 3d5c07239a2a..da2d871fa809 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -1687,6 +1687,7 @@ __init int register_trigger_cmds(void)
register_trigger_enable_disable_cmds();
register_trigger_hist_enable_disable_cmds();
register_trigger_hist_cmd();
+ register_trigger_object_cmd();

return 0;
}
diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
new file mode 100644
index 000000000000..69465c2ffb7e
--- /dev/null
+++ b/kernel/trace/trace_object.c
@@ -0,0 +1,532 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * trace any object
+ * Copyright (C) 2021 Jeff Xie <[email protected]>
+ */
+
+#define pr_fmt(fmt) "trace_object: " fmt
+
+#include "trace_output.h"
+#include <linux/freelist.h>
+
+static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);
+static struct trace_event_file event_trace_file;
+static const int max_args_num = 6;
+static const int max_obj_pool = 10;
+static atomic_t trace_object_ref;
+static int exit_trace_object(void);
+static int init_trace_object(void);
+
+struct object_instance {
+ void *object;
+ struct freelist_node freelist;
+};
+
+struct obj_pool {
+ int maxactive;
+ atomic_t nobject;
+ struct freelist_head product_freelist;
+ struct freelist_head customer_freelist;
+};
+static struct obj_pool *obj_pool;
+
+static bool object_exist(void *obj)
+{
+ struct freelist_node *node;
+ struct object_instance *inst;
+ bool ret = false;
+
+ node = obj_pool->customer_freelist.head;
+
+ while (node) {
+ inst = container_of(node, struct object_instance, freelist);
+ if (inst->object == obj) {
+ ret = true;
+ goto out;
+ }
+ node = node->next;
+ }
+out:
+ return ret;
+}
+
+static bool object_empty(void)
+{
+ bool ret;
+
+ ret = atomic_read(&obj_pool->nobject) == 0;
+
+ return ret;
+}
+
+
+static void set_trace_object(void *obj)
+{
+ struct freelist_node *fn;
+ struct object_instance *ins;
+
+ if (!obj)
+ return;
+
+ if (object_exist(obj))
+ goto out;
+
+ fn = freelist_try_get(&obj_pool->product_freelist);
+ if (!fn) {
+ trace_printk("object_pool is full, can't trace object:0x%px\n", obj);
+ return;
+ }
+
+ ins = container_of(fn, struct object_instance, freelist);
+ ins->object = obj;
+
+ freelist_add(&ins->freelist, &obj_pool->customer_freelist);
+ atomic_inc(&obj_pool->nobject);
+
+out:
+ return;
+}
+
+static inline void free_objects(struct freelist_node *node)
+{
+
+ struct object_instance *inst;
+
+ while (node) {
+ inst = container_of(node, struct object_instance, freelist);
+ node = node->next;
+ kfree(inst);
+ }
+}
+
+static inline void free_object_pool(void)
+{
+ free_objects(obj_pool->product_freelist.head);
+ free_objects(obj_pool->customer_freelist.head);
+ kfree(obj_pool);
+}
+
+
+static int init_object_pool(void)
+{
+ struct object_instance *inst;
+ int i, ret = 0;
+
+ obj_pool = kzalloc(sizeof(*obj_pool), GFP_KERNEL);
+ if (!obj_pool) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ obj_pool->maxactive = max_obj_pool;
+ obj_pool->product_freelist.head = NULL;
+ obj_pool->customer_freelist.head = NULL;
+
+ for (i = 0; i < obj_pool->maxactive; i++) {
+ inst = kzalloc(sizeof(*inst), GFP_KERNEL);
+ if (!inst) {
+ free_object_pool();
+ ret = -ENOMEM;
+ goto out;
+ }
+ freelist_add(&inst->freelist, &obj_pool->product_freelist);
+ }
+out:
+ return ret;
+}
+
+static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
+ unsigned long object)
+{
+
+ struct trace_buffer *buffer;
+ struct ring_buffer_event *event;
+ struct trace_object_entry *entry;
+ int pc;
+
+ pc = preempt_count();
+ event = trace_event_buffer_lock_reserve(&buffer, &event_trace_file,
+ TRACE_OBJECT, sizeof(*entry), pc);
+ if (!event)
+ return;
+ entry = ring_buffer_event_data(event);
+ entry->ip = ip;
+ entry->parent_ip = parent_ip;
+ entry->object = object;
+
+ event_trigger_unlock_commit(&event_trace_file, buffer, event,
+ entry, pc);
+}
+
+static void
+trace_object_events_call(unsigned long ip, unsigned long parent_ip,
+ struct ftrace_ops *op, struct ftrace_regs *fregs)
+{
+ struct pt_regs *pt_regs = ftrace_get_regs(fregs);
+ unsigned long obj;
+ long disabled;
+ int cpu, n;
+
+ preempt_disable_notrace();
+
+ cpu = raw_smp_processor_id();
+ disabled = atomic_inc_return(&per_cpu(trace_object_event_disable, cpu));
+
+ if (disabled != 1)
+ goto out;
+
+ if (object_empty())
+ goto out;
+
+ for (n = 0; n < max_args_num; n++) {
+ obj = regs_get_kernel_argument(pt_regs, n);
+ if (object_exist((void *)obj))
+ submit_trace_object(ip, parent_ip, obj);
+ }
+
+out:
+ atomic_dec(&per_cpu(trace_object_event_disable, cpu));
+ preempt_enable_notrace();
+}
+
+static struct ftrace_ops trace_ops = {
+ .func = trace_object_events_call,
+ .flags = FTRACE_OPS_FL_SAVE_REGS,
+};
+
+static void
+trace_object_trigger(struct event_trigger_data *data,
+ struct trace_buffer *buffer, void *rec,
+ struct ring_buffer_event *event)
+{
+
+ struct ftrace_event_field *field = data->private_data;
+ void *obj = NULL;
+
+ memcpy(&obj, rec + field->offset, sizeof(obj));
+ set_trace_object(obj);
+}
+
+static void
+trace_object_trigger_free(struct event_trigger_ops *ops,
+ struct event_trigger_data *data)
+{
+ if (WARN_ON_ONCE(data->ref <= 0))
+ return;
+
+ data->ref--;
+ if (!data->ref)
+ trigger_data_free(data);
+}
+
+static void
+trace_object_count_trigger(struct event_trigger_data *data,
+ struct trace_buffer *buffer, void *rec,
+ struct ring_buffer_event *event)
+{
+ if (!data->count)
+ return;
+
+ if (data->count != -1)
+ (data->count)--;
+
+ trace_object_trigger(data, buffer, rec, event);
+}
+
+static int event_object_trigger_init(struct event_trigger_ops *ops,
+ struct event_trigger_data *data)
+{
+ data->ref++;
+ return 0;
+}
+
+static int
+event_trigger_print(const char *name, struct seq_file *m,
+ void *data, char *filter_str)
+{
+ long count = (long)data;
+
+ seq_puts(m, name);
+
+ if (count == -1)
+ seq_puts(m, ":unlimited");
+ else
+ seq_printf(m, ":count=%ld", count);
+
+ if (filter_str)
+ seq_printf(m, " if %s\n", filter_str);
+ else
+ seq_putc(m, '\n');
+
+ return 0;
+}
+
+static int
+trace_object_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+ struct event_trigger_data *data)
+{
+ return event_trigger_print("objtrace", m, (void *)data->count,
+ data->filter_str);
+}
+
+
+static struct event_trigger_ops objecttrace_trigger_ops = {
+ .func = trace_object_trigger,
+ .print = trace_object_trigger_print,
+ .init = event_object_trigger_init,
+ .free = trace_object_trigger_free,
+};
+
+static struct event_trigger_ops objecttrace_count_trigger_ops = {
+ .func = trace_object_count_trigger,
+ .print = trace_object_trigger_print,
+ .init = event_object_trigger_init,
+ .free = trace_object_trigger_free,
+};
+
+static struct event_trigger_ops *
+objecttrace_get_trigger_ops(char *cmd, char *param)
+{
+ return param ? &objecttrace_count_trigger_ops : &objecttrace_trigger_ops;
+}
+
+static int register_object_trigger(char *glob, struct event_trigger_ops *ops,
+ struct event_trigger_data *data,
+ struct trace_event_file *file)
+{
+ struct event_trigger_data *test;
+ int ret = 0;
+
+ lockdep_assert_held(&event_mutex);
+
+ list_for_each_entry(test, &file->triggers, list) {
+ if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) {
+ ret = -EEXIST;
+ goto out;
+ }
+ }
+
+ if (data->ops->init) {
+ ret = data->ops->init(data->ops, data);
+ if (ret < 0)
+ goto out;
+ }
+
+ list_add_rcu(&data->list, &file->triggers);
+ ret++;
+
+ update_cond_flag(file);
+ if (trace_event_trigger_enable_disable(file, 1) < 0) {
+ list_del_rcu(&data->list);
+ update_cond_flag(file);
+ ret--;
+ }
+ init_trace_object();
+out:
+ return ret;
+}
+
+static void unregister_object_trigger(char *glob, struct event_trigger_ops *ops,
+ struct event_trigger_data *test,
+ struct trace_event_file *file)
+{
+ struct event_trigger_data *data;
+ bool unregistered = false;
+
+ lockdep_assert_held(&event_mutex);
+
+ list_for_each_entry(data, &file->triggers, list) {
+ if (data->cmd_ops->trigger_type == test->cmd_ops->trigger_type) {
+ unregistered = true;
+ list_del_rcu(&data->list);
+ trace_event_trigger_enable_disable(file, 0);
+ update_cond_flag(file);
+ break;
+ }
+ }
+
+ if (unregistered && data->ops->free) {
+ data->ops->free(data->ops, data);
+ exit_trace_object();
+ }
+}
+
+static int
+event_object_trigger_callback(struct event_command *cmd_ops,
+ struct trace_event_file *file,
+ char *glob, char *cmd, char *param)
+{
+ struct event_trigger_data *trigger_data;
+ struct event_trigger_ops *trigger_ops;
+ struct trace_event_call *call;
+ struct ftrace_event_field *field;
+ char *objtrace_cmd;
+ char *trigger = NULL;
+ char *arg;
+ char *number;
+ int ret;
+
+ ret = -EINVAL;
+ if (!param)
+ goto out;
+
+ /* separate the trigger from the filter (c:a:n [if filter]) */
+ trigger = strsep(&param, " \t");
+ if (!trigger)
+ goto out;
+ if (param) {
+ param = skip_spaces(param);
+ if (!*param)
+ param = NULL;
+ }
+
+ objtrace_cmd = strsep(&trigger, ":");
+ if (!objtrace_cmd || strcmp(objtrace_cmd, "add"))
+ goto out;
+
+ arg = strsep(&trigger, ":");
+ if (!arg)
+ goto out;
+ call = file->event_call;
+ field = trace_find_event_field(call, arg);
+ if (!field)
+ goto out;
+
+ trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
+
+ ret = -ENOMEM;
+ trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
+ if (!trigger_data)
+ goto out;
+
+ trigger_data->count = -1;
+ trigger_data->ops = trigger_ops;
+ trigger_data->cmd_ops = cmd_ops;
+ trigger_data->private_data = field;
+ INIT_LIST_HEAD(&trigger_data->list);
+ INIT_LIST_HEAD(&trigger_data->named_list);
+
+ if (glob[0] == '!') {
+ cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
+ kfree(trigger_data);
+ ret = 0;
+ goto out;
+ }
+
+ if (trigger) {
+ number = strsep(&trigger, ":");
+
+ ret = -EINVAL;
+ if (!strlen(number))
+ goto out_free;
+
+ /*
+ * We use the callback data field (which is a pointer)
+ * as our counter.
+ */
+ ret = kstrtoul(number, 0, &trigger_data->count);
+ if (ret)
+ goto out_free;
+ }
+
+ if (!param) /* if param is non-empty, it's supposed to be a filter */
+ goto out_reg;
+
+ if (!cmd_ops->set_filter)
+ goto out_reg;
+
+ ret = cmd_ops->set_filter(param, trigger_data, file);
+ if (ret < 0)
+ goto out_free;
+
+ out_reg:
+ /* Up the trigger_data count to make sure reg doesn't free it on failure */
+ event_object_trigger_init(trigger_ops, trigger_data);
+ ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
+ /*
+ * The above returns on success the # of functions enabled,
+ * but if it didn't find any functions it returns zero.
+ * Consider no functions a failure too.
+ */
+ if (!ret) {
+ cmd_ops->unreg(glob, trigger_ops, trigger_data, file);
+ ret = -ENOENT;
+ } else if (ret > 0)
+ ret = 0;
+
+ /* Down the counter of trigger_data or free it if not used anymore */
+ trace_object_trigger_free(trigger_ops, trigger_data);
+ out:
+ return ret;
+
+ out_free:
+ if (cmd_ops->set_filter)
+ cmd_ops->set_filter(NULL, trigger_data, NULL);
+ kfree(trigger_data);
+ goto out;
+}
+
+static struct event_command trigger_object_cmd = {
+ .name = "objtrace",
+ .trigger_type = ETT_TRACE_OBJECT,
+ .flags = EVENT_CMD_FL_NEEDS_REC,
+ .func = event_object_trigger_callback,
+ .reg = register_object_trigger,
+ .unreg = unregister_object_trigger,
+ .get_trigger_ops = objecttrace_get_trigger_ops,
+ .set_filter = set_trigger_filter,
+};
+
+__init int register_trigger_object_cmd(void)
+{
+ int ret;
+
+ ret = register_event_command(&trigger_object_cmd);
+ WARN_ON(ret < 0);
+
+ return ret;
+}
+
+static int init_trace_object(void)
+{
+ int ret;
+
+ if (atomic_inc_return(&trace_object_ref) != 1) {
+ ret = 0;
+ goto out;
+ }
+
+ ret = init_object_pool();
+ if (ret)
+ goto out;
+
+ event_trace_file.tr = top_trace_array();
+ if (WARN_ON(!event_trace_file.tr)) {
+ ret = -1;
+ goto out;
+ }
+ ret = register_ftrace_function(&trace_ops);
+out:
+ return ret;
+}
+
+static int exit_trace_object(void)
+{
+ int ret;
+
+ if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
+ goto out;
+
+ if (atomic_dec_return(&trace_object_ref) != 0) {
+ ret = 0;
+ goto out;
+ }
+
+ ret = unregister_ftrace_function(&trace_ops);
+ if (ret) {
+ pr_err("can't unregister ftrace for trace object\n");
+ goto out;
+ }
+ free_object_pool();
+out:
+ return ret;
+}
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index c2ca40e8595b..76ca560af693 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -1552,6 +1552,45 @@ static struct trace_event trace_func_repeats_event = {
.funcs = &trace_func_repeats_funcs,
};

+/* TRACE_OBJECT */
+static enum print_line_t trace_object_print(struct trace_iterator *iter, int flags,
+ struct trace_event *event)
+{
+ struct trace_object_entry *field;
+ struct trace_seq *s = &iter->seq;
+
+ trace_assign_type(field, iter->ent);
+ print_fn_trace(s, field->ip, field->parent_ip, flags);
+ trace_seq_printf(s, " object:0x%lx", field->object);
+ trace_seq_putc(s, '\n');
+
+ return trace_handle_return(s);
+}
+
+static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags,
+ struct trace_event *event)
+{
+ struct trace_object_entry *field;
+
+ trace_assign_type(field, iter->ent);
+
+ trace_seq_printf(&iter->seq, "%lx %lx\n",
+ field->ip,
+ field->parent_ip);
+
+ return trace_handle_return(&iter->seq);
+}
+
+static struct trace_event_functions trace_object_funcs = {
+ .trace = trace_object_print,
+ .raw = trace_object_raw,
+};
+
+static struct trace_event trace_object_event = {
+ .type = TRACE_OBJECT,
+ .funcs = &trace_object_funcs,
+};
+
static struct trace_event *events[] __initdata = {
&trace_fn_event,
&trace_ctx_event,
@@ -1566,6 +1605,7 @@ static struct trace_event *events[] __initdata = {
&trace_timerlat_event,
&trace_raw_data_event,
&trace_func_repeats_event,
+ &trace_object_event,
NULL
};

--
2.25.1



2021-11-13 12:07:06

by Jeff Xie

[permalink] [raw]
Subject: [RFC][PATCH v5 3/4] trace/objtrace: Add testcases for objtrace trigger parsing

Signed-off-by: Jeff Xie <[email protected]>
---
.../ftrace/test.d/trigger/trigger-objtrace.tc | 41 +++++++++++++++++++
1 file changed, 41 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc

diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
new file mode 100644
index 000000000000..4efcd7eee694
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
@@ -0,0 +1,41 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: event trigger - test objtrace-trigger
+# requires: kprobe_events
+
+fail() { #msg
+ echo $1
+ exit_fail
+}
+
+echo 'p bio_add_page arg1=$arg1' > kprobe_events
+
+FEATURE=`grep objtrace events/kprobes/p_bio_add_page_0/trigger`
+if [ -z "$FEATURE" ]; then
+ echo "objtrace trigger is not supported"
+ exit_unsupported
+fi
+
+echo "Test objtrace trigger"
+echo 'objtrace:add:0x28(arg1):u32:1 if comm == "cat"' > \
+ events/kprobes/p_bio_add_page_0/trigger
+if [ -z $? ]; then
+ fail "objtrace trigger syntax error"
+fi
+
+echo "Test objtrace semantic errors"
+
+# Being lack of type size
+! echo 'objtrace:add:0x28(arg1):1' > events/kprobes/p_bio_add_page_0/trigger
+# Being lack of objtrace command
+! echo 'objtrace:0x28(arg1):u32:1' > events/kprobes/p_bio_add_page_0/trigger
+# Bad parameter name
+! echo 'objtrace:add:argx:u32:1' > events/kprobes/p_bio_add_page_0/trigger
+
+echo "reset objtrace trigger"
+
+echo '!objtrace:add:0x28(arg1):u32' > \
+ events/kprobes/p_bio_add_page_0/trigger
+echo '-:p_bio_add_page_0' >> ./kprobe_events
+
+exit 0
--
2.25.1


2021-11-13 12:07:05

by Jeff Xie

[permalink] [raw]
Subject: [RFC][PATCH v5 2/4] trace/objtrace: get the value of the object

Signed-off-by: Jeff Xie <[email protected]>
---
kernel/trace/trace_entries.h | 5 +-
kernel/trace/trace_object.c | 121 +++++++++++++++++++++++++++++------
kernel/trace/trace_output.c | 6 +-
3 files changed, 107 insertions(+), 25 deletions(-)

diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index bb120d9498a9..2407c45a568c 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -413,8 +413,9 @@ FTRACE_ENTRY(object, trace_object_entry,
__field( unsigned long, ip )
__field( unsigned long, parent_ip )
__field( unsigned long, object )
+ __field( unsigned long, value )
),

- F_printk(" %ps <-- %ps object:%lx\n",
- (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
+ F_printk(" %ps <-- %ps object:%lx value:%lx\n", (void *)__entry->ip,
+ (void *)__entry->parent_ip, __entry->object, __entry->value)
);
diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
index 69465c2ffb7e..14993f7d0e5a 100644
--- a/kernel/trace/trace_object.c
+++ b/kernel/trace/trace_object.c
@@ -11,14 +11,25 @@

static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);
static struct trace_event_file event_trace_file;
-static const int max_args_num = 6;
static const int max_obj_pool = 10;
static atomic_t trace_object_ref;
static int exit_trace_object(void);
static int init_trace_object(void);

+struct objtrace_trigger_data {
+ struct ftrace_event_field *field;
+ long offset;
+ int type_size;
+};
+
+struct objtrace_fetch_type {
+ char *name;
+ int type_size;
+};
+
struct object_instance {
void *object;
+ int obj_type_size;
struct freelist_node freelist;
};

@@ -59,8 +70,7 @@ static bool object_empty(void)
return ret;
}

-
-static void set_trace_object(void *obj)
+static void set_trace_object(void *obj, int type_size)
{
struct freelist_node *fn;
struct object_instance *ins;
@@ -79,6 +89,7 @@ static void set_trace_object(void *obj)

ins = container_of(fn, struct object_instance, freelist);
ins->object = obj;
+ ins->obj_type_size = type_size;

freelist_add(&ins->freelist, &obj_pool->customer_freelist);
atomic_inc(&obj_pool->nobject);
@@ -135,7 +146,7 @@ static int init_object_pool(void)
}

static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
- unsigned long object)
+ unsigned long object, unsigned long value)
{

struct trace_buffer *buffer;
@@ -152,6 +163,7 @@ static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
entry->ip = ip;
entry->parent_ip = parent_ip;
entry->object = object;
+ entry->value = value;

event_trigger_unlock_commit(&event_trace_file, buffer, event,
entry, pc);
@@ -161,10 +173,11 @@ static void
trace_object_events_call(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op, struct ftrace_regs *fregs)
{
- struct pt_regs *pt_regs = ftrace_get_regs(fregs);
- unsigned long obj;
+ struct freelist_node *node;
+ struct object_instance *inst;
+ unsigned long val = 0;
long disabled;
- int cpu, n;
+ int cpu;

preempt_disable_notrace();

@@ -177,10 +190,14 @@ trace_object_events_call(unsigned long ip, unsigned long parent_ip,
if (object_empty())
goto out;

- for (n = 0; n < max_args_num; n++) {
- obj = regs_get_kernel_argument(pt_regs, n);
- if (object_exist((void *)obj))
- submit_trace_object(ip, parent_ip, obj);
+ node = obj_pool->customer_freelist.head;
+
+ while (node) {
+ inst = container_of(node, struct object_instance, freelist);
+ if (copy_from_kernel_nofault(&val, inst->object, inst->obj_type_size))
+ goto out;
+ submit_trace_object(ip, parent_ip, (unsigned long)inst->object, val);
+ node = node->next;
}

out:
@@ -198,12 +215,14 @@ trace_object_trigger(struct event_trigger_data *data,
struct trace_buffer *buffer, void *rec,
struct ring_buffer_event *event)
{
+ struct objtrace_trigger_data *obj_data = data->private_data;
+ struct ftrace_event_field *field;
+ void *obj, *val = NULL;

- struct ftrace_event_field *field = data->private_data;
- void *obj = NULL;
-
- memcpy(&obj, rec + field->offset, sizeof(obj));
- set_trace_object(obj);
+ field = obj_data->field;
+ memcpy(&val, rec + field->offset, sizeof(val));
+ obj = val + obj_data->offset;
+ set_trace_object(obj, obj_data->type_size);
}

static void
@@ -350,6 +369,22 @@ static void unregister_object_trigger(char *glob, struct event_trigger_ops *ops,
}
}

+static const struct objtrace_fetch_type objtrace_fetch_types[] = {
+ {"u8", 1},
+ {"s8", 1},
+ {"x8", 1},
+ {"u16", 2},
+ {"s16", 2},
+ {"x16", 2},
+ {"u32", 4},
+ {"s32", 4},
+ {"x32", 4},
+ {"u64", 8},
+ {"s64", 8},
+ {"x64", 8},
+ {}
+};
+
static int
event_object_trigger_callback(struct event_command *cmd_ops,
struct trace_event_file *file,
@@ -357,13 +392,15 @@ event_object_trigger_callback(struct event_command *cmd_ops,
{
struct event_trigger_data *trigger_data;
struct event_trigger_ops *trigger_ops;
+ struct objtrace_trigger_data *obj_data;
struct trace_event_call *call;
struct ftrace_event_field *field;
char *objtrace_cmd;
+ long offset = 0;
char *trigger = NULL;
- char *arg;
+ char *arg, *type, *tr, *tr_end;
char *number;
- int ret;
+ int ret, i, type_size = 0;

ret = -EINVAL;
if (!param)
@@ -386,6 +423,38 @@ event_object_trigger_callback(struct event_command *cmd_ops,
arg = strsep(&trigger, ":");
if (!arg)
goto out;
+
+ tr = strchr(arg, '(');
+ /* now force to get the value of the val. */
+ if (!tr)
+ goto out;
+ tr_end = strchr(tr, ')');
+ if (!tr_end)
+ goto out;
+ *tr++ = '\0';
+ *tr_end = '\0';
+ ret = kstrtol(arg, 0, &offset);
+ if (ret)
+ goto out;
+ arg = tr;
+ ret = -EINVAL;
+ if (!trigger)
+ goto out;
+
+ type = strsep(&trigger, ":");
+ if (!type)
+ goto out;
+ for (i = 0; objtrace_fetch_types[i].name; i++) {
+ if (strcmp(objtrace_fetch_types[i].name, type) == 0) {
+ type_size = objtrace_fetch_types[i].type_size;
+ break;
+ }
+ }
+
+ if (type_size == 0)
+ goto out;
+
+
call = file->event_call;
field = trace_find_event_field(call, arg);
if (!field)
@@ -394,19 +463,30 @@ event_object_trigger_callback(struct event_command *cmd_ops,
trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);

ret = -ENOMEM;
+ obj_data = kzalloc(sizeof(*obj_data), GFP_KERNEL);
+ if (!obj_data)
+ goto out;
+
+ obj_data->field = field;
+ obj_data->offset = offset;
+ obj_data->type_size = type_size;
+
trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
- if (!trigger_data)
+ if (!trigger_data) {
+ kfree(obj_data);
goto out;
+ }

trigger_data->count = -1;
trigger_data->ops = trigger_ops;
trigger_data->cmd_ops = cmd_ops;
- trigger_data->private_data = field;
+ trigger_data->private_data = obj_data;
INIT_LIST_HEAD(&trigger_data->list);
INIT_LIST_HEAD(&trigger_data->named_list);

if (glob[0] == '!') {
cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
+ kfree(obj_data);
kfree(trigger_data);
ret = 0;
goto out;
@@ -461,6 +541,7 @@ event_object_trigger_callback(struct event_command *cmd_ops,
out_free:
if (cmd_ops->set_filter)
cmd_ops->set_filter(NULL, trigger_data, NULL);
+ kfree(obj_data);
kfree(trigger_data);
goto out;
}
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 76ca560af693..c8c427c23127 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -1562,6 +1562,7 @@ static enum print_line_t trace_object_print(struct trace_iterator *iter, int fla
trace_assign_type(field, iter->ent);
print_fn_trace(s, field->ip, field->parent_ip, flags);
trace_seq_printf(s, " object:0x%lx", field->object);
+ trace_seq_printf(s, " value:0x%lx", field->value);
trace_seq_putc(s, '\n');

return trace_handle_return(s);
@@ -1574,9 +1575,8 @@ static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags

trace_assign_type(field, iter->ent);

- trace_seq_printf(&iter->seq, "%lx %lx\n",
- field->ip,
- field->parent_ip);
+ trace_seq_printf(&iter->seq, "%lx %lx %lx %lx\n", field->ip,
+ field->parent_ip, field->object, field->value);

return trace_handle_return(&iter->seq);
}
--
2.25.1


2021-11-13 12:07:06

by Jeff Xie

[permalink] [raw]
Subject: [RFC][PATCH v5 4/4] trace/objtrace: Add documentation for objtrace trigger

Signed-off-by: Jeff Xie <[email protected]>
---
Documentation/trace/events.rst | 114 +++++++++++++++++++++++++++++++++
1 file changed, 114 insertions(+)

diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index 8ddb9b09451c..d1ccb9ac5e7c 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -527,6 +527,120 @@ The following commands are supported:

See Documentation/trace/histogram.rst for details and examples.

+- objtrace
+
+ This command provides a way to get the value of any object, The object
+ can be obtained from the dynamic event(kprobe_event/uprobe_event) or the
+ static event(tracepoint).
+
+ Usage:
+ When using the kprobe event, only need to set the objtrace(a new trigger),
+ we can get the value of the object. The object is from the setting of the
+ kprobe event.
+
+ For example:
+ For the function bio_add_page():
+
+ int bio_add_page(struct bio *bio, struct page *page,
+ unsigned int len, unsigned int offset)
+
+ Firstly, we can set the base of the object, thus the first string "arg1"
+ stands for the value of the first parameter of this function bio_add_gage(),
+
+ # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
+
+ Secondly, we can get the value dynamically base the above object.
+
+ find the offset of the bi_size in struct bio:
+ $ gdb vmlinux
+ (gdb) p &(((struct bio *)0)->bi_iter.bi_size)
+ $1 = (unsigned int *) 0x28
+
+ # echo 'objtrace:0x28(arg1):u32:1 if comm == "cat"' > ./events/kprobes/ \
+ p_bio_add_page_0/trigger
+
+ The best way to use this is that we can set the entrance event and exit
+ event, for example, the following example is to set the read_papes as
+ the entrance event, and set the blk_account_io_start as the exit event.
+
+ # cd /sys/kernel/debug/tracing/
+ # echo 0 > ./tracing_on
+ # echo 'p read_pages' >> ./kprobe_events
+ # echo 'p blk_account_io_start' >> ./kprobe_events
+ # echo 'traceon if comm == "cat"' > ./events/kprobes/p_read_pages_0/trigger
+ # echo 'traceoff if comm == "cat"' > ./events/kprobes/p_blk_account_io_start_0/trigger
+ # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
+ # echo 'objtrace:0x28(arg1):u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger
+
+ # du -sh /test.txt
+ 12.0K /test.txt
+
+ # cat /test.txt > /dev/null
+ # cat ./trace
+
+ # tracer: nop
+ #
+ # entries-in-buffer/entries-written: 50/50 #P:1
+ #
+ # _-----=> irqs-off
+ # / _----=> need-resched
+ # | / _---=> hardirq/softirq
+ # || / _--=> preempt-depth
+ # ||| / _-=> migrate-disable
+ # |||| / delay
+ # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
+ # | | | ||||| | |
+ cat-95 [000] ..... 1.412065: _raw_spin_unlock_irqrestore <-event_triggers_call object:0xffff888108af6328 value:0x0
+ cat-95 [000] ..... 1.412066: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x0
+ cat-95 [000] ..... 1.412066: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x0
+ cat-95 [000] ..... 1.412066: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x1000
+ cat-95 [000] ..... 1.412066: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x1000
+ cat-95 [000] ..... 1.412066: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x1000
+ cat-95 [000] ..... 1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x1000
+ cat-95 [000] ..... 1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x1000
+ cat-95 [000] ..... 1.412067: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x1000
+ cat-95 [000] ..... 1.412067: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x2000
+ cat-95 [000] ..... 1.412067: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x2000
+ cat-95 [000] ..... 1.412067: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x2000
+ cat-95 [000] ..... 1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x2000
+ cat-95 [000] ..... 1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x2000
+ cat-95 [000] ..... 1.412068: submit_bio <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412068: submit_bio_noacct <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412068: __submit_bio <-submit_bio_noacct object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412068: blk_try_enter_queue <-__submit_bio object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412068: submit_bio_checks <-__submit_bio object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412068: __cond_resched <-submit_bio_checks object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412068: rcu_all_qs <-__cond_resched object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412068: should_fail_bio <-submit_bio_checks object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412069: create_task_io_context <-submit_bio_checks object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412069: kmem_cache_alloc_node <-create_task_io_context object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412069: should_failslab <-kmem_cache_alloc_node object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412069: _raw_spin_lock <-create_task_io_context object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412069: blk_mq_submit_bio <-__submit_bio object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412069: __blk_queue_split <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412069: bvec_split_segs <-__blk_queue_split object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412069: blk_attempt_plug_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412070: __blk_mq_sched_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412070: dd_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412070: _raw_spin_lock <-dd_bio_merge object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412070: blk_mq_sched_try_merge <-dd_bio_merge object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412070: elv_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412070: elv_rqhash_find <-elv_merge object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412070: dd_request_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412070: elv_rb_find <-dd_request_merge object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412070: __blk_mq_alloc_request <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412071: dd_limit_depth <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412071: blk_mq_get_tag <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412071: __blk_mq_get_tag <-blk_mq_get_tag object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412071: blk_mq_rq_ctx_init.isra.0 <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412071: ktime_get <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412071: dd_prepare_request <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412071: blk_account_io_start <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412071: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x3000
+ cat-95 [000] ..... 1.412071: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x3000
+
7. In-kernel trace event API
============================

--
2.25.1


2021-11-19 14:01:00

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC][PATCH v5 1/4] trace: Add trace any kernel object

Hi Jeff,

Sorry for replying late!

On Sat, 13 Nov 2021 20:06:29 +0800
Jeff Xie <[email protected]> wrote:

> Introduce a method based on function tracer to trace any object and get
> the value of the object dynamically. the object can be obtained from the
> dynamic event (kprobe_event/uprobe_event) or the static event(tracepoint).
>
> Usage:
> When using the kprobe event, only need to set the objtrace(a new trigger),
> we can get the value of the object. The object is from the setting of the
> kprobe event.
>
> For example:
> For the function bio_add_page():
>
> int bio_add_page(struct bio *bio, struct page *page,
> unsigned int len, unsigned int offset)
>
> Firstly, we can set the base of the object, thus the first string "arg1"
> stands for the value of the first parameter of this function bio_add_gage(),
>
> # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
>
> Secondly, we can get the value dynamically base the above object.
>
> find the offset of the bi_size in struct bio:
> $ gdb vmlinux
> (gdb) p &(((struct bio *)0)->bi_iter.bi_size)
> $1 = (unsigned int *) 0x28
>
> # echo 'objtrace:0x28(arg1):u32:1 if comm == "cat"' > ./events/kprobes/ \
> p_bio_add_page_0/trigger

You may need "objtrace:add:..." right? :-)

BTW, this value support is done in [2/4], you should focus on what this patch
does, not the entire series does. It seems what you need is a cover mail
for this series. The description like this must be done in the cover mail.
(For example, long trace output etc.)

>
> The best way to use this is that we can set the entrance event and exit
> event, for example, the following example is to set the read_papes as
> the entrance event, and set the blk_account_io_start as the exit event.
>
> # dd if=/dev/zero of=/test.txt bs=4k count=3
> # cd /sys/kernel/debug/tracing/
> # echo 0 > ./tracing_on
> # echo 'p read_pages' >> ./kprobe_events
> # echo 'p blk_account_io_start' >> ./kprobe_events
> # echo 'traceon if comm == "cat"' > ./events/kprobes/p_read_pages_0/trigger
> # echo 'traceoff if comm == "cat"' > ./events/kprobes/p_blk_account_io_start_0/trigger
> # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
> # echo 'objtrace:0x28(arg1):u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger
>
> # du -sh /test.txt
> 12.0K /test.txt
>
> # cat /test.txt > /dev/null
> # cat ./trace
>
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 50/50 #P:1
> #
> # _-----=> irqs-off
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / _-=> migrate-disable
> # |||| / delay
> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
> # | | | ||||| | |
> cat-95 [000] ..... 1.412065: _raw_spin_unlock_irqrestore <-event_triggers_call object:0xffff888108af6328 value:0x0
> cat-95 [000] ..... 1.412066: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x0
> cat-95 [000] ..... 1.412066: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x0
> cat-95 [000] ..... 1.412066: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x1000
> cat-95 [000] ..... 1.412066: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x1000
> cat-95 [000] ..... 1.412066: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x1000
> cat-95 [000] ..... 1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x1000
> cat-95 [000] ..... 1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x1000
> cat-95 [000] ..... 1.412067: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x1000
> cat-95 [000] ..... 1.412067: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x2000
> cat-95 [000] ..... 1.412067: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x2000
> cat-95 [000] ..... 1.412067: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x2000
> cat-95 [000] ..... 1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x2000
> cat-95 [000] ..... 1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x2000
> cat-95 [000] ..... 1.412068: submit_bio <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412068: submit_bio_noacct <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412068: __submit_bio <-submit_bio_noacct object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412068: blk_try_enter_queue <-__submit_bio object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412068: submit_bio_checks <-__submit_bio object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412068: __cond_resched <-submit_bio_checks object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412068: rcu_all_qs <-__cond_resched object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412068: should_fail_bio <-submit_bio_checks object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412069: create_task_io_context <-submit_bio_checks object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412069: kmem_cache_alloc_node <-create_task_io_context object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412069: should_failslab <-kmem_cache_alloc_node object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412069: _raw_spin_lock <-create_task_io_context object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412069: blk_mq_submit_bio <-__submit_bio object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412069: __blk_queue_split <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412069: bvec_split_segs <-__blk_queue_split object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412069: blk_attempt_plug_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412070: __blk_mq_sched_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412070: dd_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412070: _raw_spin_lock <-dd_bio_merge object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412070: blk_mq_sched_try_merge <-dd_bio_merge object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412070: elv_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412070: elv_rqhash_find <-elv_merge object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412070: dd_request_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412070: elv_rb_find <-dd_request_merge object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412070: __blk_mq_alloc_request <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412071: dd_limit_depth <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412071: blk_mq_get_tag <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412071: __blk_mq_get_tag <-blk_mq_get_tag object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412071: blk_mq_rq_ctx_init.isra.0 <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412071: ktime_get <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412071: dd_prepare_request <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412071: blk_account_io_start <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412071: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x3000
> cat-95 [000] ..... 1.412071: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x3000
>
> Signed-off-by: Jeff Xie <[email protected]>
> ---
> include/linux/trace_events.h | 1 +
> kernel/trace/Kconfig | 10 +
> kernel/trace/Makefile | 1 +
> kernel/trace/trace.h | 8 +
> kernel/trace/trace_entries.h | 17 +
> kernel/trace/trace_events_trigger.c | 1 +
> kernel/trace/trace_object.c | 532 ++++++++++++++++++++++++++++
> kernel/trace/trace_output.c | 40 +++
> 8 files changed, 610 insertions(+)
> create mode 100644 kernel/trace/trace_object.c
>
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 3e475eeb5a99..84b5c12c7f44 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -684,6 +684,7 @@ enum event_trigger_type {
> ETT_EVENT_HIST = (1 << 4),
> ETT_HIST_ENABLE = (1 << 5),
> ETT_EVENT_EPROBE = (1 << 6),
> + ETT_TRACE_OBJECT = (1 << 7),
> };
>
> extern int filter_match_preds(struct event_filter *filter, void *rec);
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index 420ff4bc67fd..f69761cbfaaa 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -237,6 +237,16 @@ config FUNCTION_PROFILER
>
> If in doubt, say N.
>
> +config TRACE_OBJECT
> + bool "Trace kernel object in function parameter"
> + depends on FUNCTION_TRACER
> + depends on HAVE_FUNCTION_ARG_ACCESS_API
> + select TRACING
> + default y
> + help
> + You can trace the kernel object in the kernel function parameter.
> + The kernel object is dynamically specified via event trigger.
> +
> config STACK_TRACER
> bool "Trace max stack"
> depends on HAVE_FUNCTION_TRACER
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index 6de5d4d63165..6d9e78a488aa 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -66,6 +66,7 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
> obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
> obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
> obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += fgraph.o
> +obj-$(CONFIG_TRACE_OBJECT) += trace_object.o
> ifeq ($(CONFIG_BLOCK),y)
> obj-$(CONFIG_EVENT_TRACING) += blktrace.o
> endif
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index b7c0f8e160fb..d2508f977fe6 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -49,6 +49,7 @@ enum trace_type {
> TRACE_TIMERLAT,
> TRACE_RAW_DATA,
> TRACE_FUNC_REPEATS,
> + TRACE_OBJECT,
>
> __TRACE_LAST_TYPE,
> };
> @@ -460,6 +461,7 @@ extern void __ftrace_bad_type(void);
> TRACE_GRAPH_RET); \
> IF_ASSIGN(var, ent, struct func_repeats_entry, \
> TRACE_FUNC_REPEATS); \
> + IF_ASSIGN(var, ent, struct trace_object_entry, TRACE_OBJECT);\
> __ftrace_bad_type(); \
> } while (0)
>
> @@ -1510,6 +1512,12 @@ static inline int register_trigger_hist_cmd(void) { return 0; }
> static inline int register_trigger_hist_enable_disable_cmds(void) { return 0; }
> #endif
>
> +#ifdef CONFIG_TRACE_OBJECT
> +extern int register_trigger_object_cmd(void);
> +#else
> +static inline int register_trigger_object_cmd(void) { return 0; }
> +#endif
> +
> extern int register_trigger_cmds(void);
> extern void clear_event_triggers(struct trace_array *tr);
>
> diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> index cd41e863b51c..bb120d9498a9 100644
> --- a/kernel/trace/trace_entries.h
> +++ b/kernel/trace/trace_entries.h
> @@ -401,3 +401,20 @@ FTRACE_ENTRY(timerlat, timerlat_entry,
> __entry->context,
> __entry->timer_latency)
> );
> +
> +/*
> + * trace object entry:
> + */
> +FTRACE_ENTRY(object, trace_object_entry,
> +
> + TRACE_OBJECT,
> +
> + F_STRUCT(
> + __field( unsigned long, ip )
> + __field( unsigned long, parent_ip )
> + __field( unsigned long, object )
> + ),
> +
> + F_printk(" %ps <-- %ps object:%lx\n",
> + (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> +);
> diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> index 3d5c07239a2a..da2d871fa809 100644
> --- a/kernel/trace/trace_events_trigger.c
> +++ b/kernel/trace/trace_events_trigger.c
> @@ -1687,6 +1687,7 @@ __init int register_trigger_cmds(void)
> register_trigger_enable_disable_cmds();
> register_trigger_hist_enable_disable_cmds();
> register_trigger_hist_cmd();
> + register_trigger_object_cmd();
>
> return 0;
> }
> diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
> new file mode 100644
> index 000000000000..69465c2ffb7e
> --- /dev/null
> +++ b/kernel/trace/trace_object.c
> @@ -0,0 +1,532 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * trace any object
> + * Copyright (C) 2021 Jeff Xie <[email protected]>
> + */
> +
> +#define pr_fmt(fmt) "trace_object: " fmt
> +
> +#include "trace_output.h"
> +#include <linux/freelist.h>
> +
> +static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);
> +static struct trace_event_file event_trace_file;
> +static const int max_args_num = 6;
> +static const int max_obj_pool = 10;
> +static atomic_t trace_object_ref;
> +static int exit_trace_object(void);
> +static int init_trace_object(void);
> +
> +struct object_instance {
> + void *object;
> + struct freelist_node freelist;
> +};
> +
> +struct obj_pool {
> + int maxactive;

It seems that 'maxactive' is used only in the init_object_pool(), so
we may not need this.

> + atomic_t nobject;
> + struct freelist_head product_freelist;
> + struct freelist_head customer_freelist;

Sorry, I might confuse you. You shouldn't use the freelist for active
list, which searches a specific entry in the list. The freelist is good
for pooling the unused object_instances but not good for search an entry.
So please use normal "list_head" for the list for active objects. And
in that case, we can use rcu_list operation on it.

BTW, I rather like calling those "active_list" and "free_list", because
product/customer are unclear what it means.

> +};
> +static struct obj_pool *obj_pool;
> +
> +static bool object_exist(void *obj)
> +{
> + struct freelist_node *node;
> + struct object_instance *inst;
> + bool ret = false;
> +
> + node = obj_pool->customer_freelist.head;
> +
> + while (node) {

Then, you can use list_for_each_entry_rcu() here.

> + inst = container_of(node, struct object_instance, freelist);
> + if (inst->object == obj) {
> + ret = true;
> + goto out;
> + }
> + node = node->next;
> + }
> +out:
> + return ret;
> +}
> +
> +static bool object_empty(void)
> +{
> + bool ret;
> +
> + ret = atomic_read(&obj_pool->nobject) == 0;
> +
> + return ret;

Instead of use atomic_read(), you can check the list itself by

return !list_empty_rcu(&obj_pool->active_list);

> +}
> +
> +
> +static void set_trace_object(void *obj)
> +{
> + struct freelist_node *fn;
> + struct object_instance *ins;
> +
> + if (!obj)
> + return;
> +
> + if (object_exist(obj))
> + goto out;

No need to use goto. Just return is OK.

> +
> + fn = freelist_try_get(&obj_pool->product_freelist);
> + if (!fn) {
> + trace_printk("object_pool is full, can't trace object:0x%px\n", obj);
> + return;
> + }
> +
> + ins = container_of(fn, struct object_instance, freelist);
> + ins->object = obj;
> +
> + freelist_add(&ins->freelist, &obj_pool->customer_freelist);
> + atomic_inc(&obj_pool->nobject);

Here, you can use list_add_rcu().

> +
> +out:
> + return;
> +}
> +
> +static inline void free_objects(struct freelist_node *node)

Usually, this kind of function will get the freelist_head, since
node can be a node in the middle of the list.

> +{
> +
> + struct object_instance *inst;
> +
> + while (node) {
> + inst = container_of(node, struct object_instance, freelist);
> + node = node->next;
> + kfree(inst);
> + }
> +}
> +
> +static inline void free_object_pool(void)
> +{
> + free_objects(obj_pool->product_freelist.head);
> + free_objects(obj_pool->customer_freelist.head);
> + kfree(obj_pool);
> +}
> +
> +
> +static int init_object_pool(void)
> +{
> + struct object_instance *inst;
> + int i, ret = 0;
> +
> + obj_pool = kzalloc(sizeof(*obj_pool), GFP_KERNEL);
> + if (!obj_pool) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + obj_pool->maxactive = max_obj_pool;
> + obj_pool->product_freelist.head = NULL;
> + obj_pool->customer_freelist.head = NULL;

You don't need to these because you already allocated a zero-filled
object by kzalloc().

> +
> + for (i = 0; i < obj_pool->maxactive; i++) {
> + inst = kzalloc(sizeof(*inst), GFP_KERNEL);
> + if (!inst) {
> + free_object_pool();
> + ret = -ENOMEM;
> + goto out;
> + }
> + freelist_add(&inst->freelist, &obj_pool->product_freelist);
> + }
> +out:
> + return ret;
> +}
> +
> +static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> + unsigned long object)
> +{
> +
> + struct trace_buffer *buffer;
> + struct ring_buffer_event *event;
> + struct trace_object_entry *entry;
> + int pc;
> +
> + pc = preempt_count();
> + event = trace_event_buffer_lock_reserve(&buffer, &event_trace_file,
> + TRACE_OBJECT, sizeof(*entry), pc);
> + if (!event)
> + return;
> + entry = ring_buffer_event_data(event);
> + entry->ip = ip;
> + entry->parent_ip = parent_ip;
> + entry->object = object;
> +
> + event_trigger_unlock_commit(&event_trace_file, buffer, event,
> + entry, pc);
> +}
> +
> +static void
> +trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> + struct ftrace_ops *op, struct ftrace_regs *fregs)
> +{
> + struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> + unsigned long obj;
> + long disabled;
> + int cpu, n;
> +
> + preempt_disable_notrace();
> +
> + cpu = raw_smp_processor_id();
> + disabled = atomic_inc_return(&per_cpu(trace_object_event_disable, cpu));

What is the purpose of this?
(BTW, if you use per_cpu variable, you don't need atomic ops, because there is
no racing with other cpus.)

> +
> + if (disabled != 1)
> + goto out;
> +
> + if (object_empty())
> + goto out;
> +
> + for (n = 0; n < max_args_num; n++) {
> + obj = regs_get_kernel_argument(pt_regs, n);
> + if (object_exist((void *)obj))
> + submit_trace_object(ip, parent_ip, obj);
> + }
> +
> +out:
> + atomic_dec(&per_cpu(trace_object_event_disable, cpu));
> + preempt_enable_notrace();
> +}
> +
> +static struct ftrace_ops trace_ops = {
> + .func = trace_object_events_call,
> + .flags = FTRACE_OPS_FL_SAVE_REGS,
> +};
> +
> +static void
> +trace_object_trigger(struct event_trigger_data *data,
> + struct trace_buffer *buffer, void *rec,
> + struct ring_buffer_event *event)
> +{
> +
> + struct ftrace_event_field *field = data->private_data;
> + void *obj = NULL;
> +
> + memcpy(&obj, rec + field->offset, sizeof(obj));
> + set_trace_object(obj);
> +}
> +
> +static void
> +trace_object_trigger_free(struct event_trigger_ops *ops,
> + struct event_trigger_data *data)
> +{
> + if (WARN_ON_ONCE(data->ref <= 0))
> + return;
> +
> + data->ref--;
> + if (!data->ref)
> + trigger_data_free(data);
> +}
> +
> +static void
> +trace_object_count_trigger(struct event_trigger_data *data,
> + struct trace_buffer *buffer, void *rec,
> + struct ring_buffer_event *event)
> +{
> + if (!data->count)
> + return;
> +
> + if (data->count != -1)
> + (data->count)--;
> +
> + trace_object_trigger(data, buffer, rec, event);
> +}
> +
> +static int event_object_trigger_init(struct event_trigger_ops *ops,
> + struct event_trigger_data *data)
> +{
> + data->ref++;
> + return 0;
> +}
> +
> +static int
> +event_trigger_print(const char *name, struct seq_file *m,
> + void *data, char *filter_str)
> +{
> + long count = (long)data;
> +
> + seq_puts(m, name);
> +
> + if (count == -1)
> + seq_puts(m, ":unlimited");
> + else
> + seq_printf(m, ":count=%ld", count);
> +
> + if (filter_str)
> + seq_printf(m, " if %s\n", filter_str);
> + else
> + seq_putc(m, '\n');
> +
> + return 0;
> +}
> +
> +static int
> +trace_object_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
> + struct event_trigger_data *data)
> +{
> + return event_trigger_print("objtrace", m, (void *)data->count,
> + data->filter_str);
> +}
> +
> +
> +static struct event_trigger_ops objecttrace_trigger_ops = {
> + .func = trace_object_trigger,
> + .print = trace_object_trigger_print,
> + .init = event_object_trigger_init,
> + .free = trace_object_trigger_free,
> +};
> +
> +static struct event_trigger_ops objecttrace_count_trigger_ops = {
> + .func = trace_object_count_trigger,
> + .print = trace_object_trigger_print,
> + .init = event_object_trigger_init,
> + .free = trace_object_trigger_free,
> +};
> +
> +static struct event_trigger_ops *
> +objecttrace_get_trigger_ops(char *cmd, char *param)
> +{
> + return param ? &objecttrace_count_trigger_ops : &objecttrace_trigger_ops;
> +}
> +
> +static int register_object_trigger(char *glob, struct event_trigger_ops *ops,
> + struct event_trigger_data *data,
> + struct trace_event_file *file)
> +{
> + struct event_trigger_data *test;
> + int ret = 0;
> +
> + lockdep_assert_held(&event_mutex);
> +
> + list_for_each_entry(test, &file->triggers, list) {
> + if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) {
> + ret = -EEXIST;
> + goto out;
> + }
> + }
> +
> + if (data->ops->init) {
> + ret = data->ops->init(data->ops, data);
> + if (ret < 0)
> + goto out;
> + }
> +
> + list_add_rcu(&data->list, &file->triggers);
> + ret++;
> +
> + update_cond_flag(file);
> + if (trace_event_trigger_enable_disable(file, 1) < 0) {
> + list_del_rcu(&data->list);
> + update_cond_flag(file);
> + ret--;
> + }
> + init_trace_object();
> +out:
> + return ret;
> +}
> +
> +static void unregister_object_trigger(char *glob, struct event_trigger_ops *ops,
> + struct event_trigger_data *test,
> + struct trace_event_file *file)
> +{
> + struct event_trigger_data *data;
> + bool unregistered = false;
> +
> + lockdep_assert_held(&event_mutex);
> +
> + list_for_each_entry(data, &file->triggers, list) {
> + if (data->cmd_ops->trigger_type == test->cmd_ops->trigger_type) {
> + unregistered = true;
> + list_del_rcu(&data->list);
> + trace_event_trigger_enable_disable(file, 0);
> + update_cond_flag(file);
> + break;
> + }
> + }
> +
> + if (unregistered && data->ops->free) {
> + data->ops->free(data->ops, data);
> + exit_trace_object();
> + }
> +}
> +
> +static int
> +event_object_trigger_callback(struct event_command *cmd_ops,
> + struct trace_event_file *file,
> + char *glob, char *cmd, char *param)
> +{
> + struct event_trigger_data *trigger_data;
> + struct event_trigger_ops *trigger_ops;
> + struct trace_event_call *call;
> + struct ftrace_event_field *field;
> + char *objtrace_cmd;
> + char *trigger = NULL;
> + char *arg;
> + char *number;
> + int ret;
> +
> + ret = -EINVAL;
> + if (!param)
> + goto out;
> +
> + /* separate the trigger from the filter (c:a:n [if filter]) */
> + trigger = strsep(&param, " \t");
> + if (!trigger)
> + goto out;
> + if (param) {
> + param = skip_spaces(param);
> + if (!*param)
> + param = NULL;
> + }
> +
> + objtrace_cmd = strsep(&trigger, ":");
> + if (!objtrace_cmd || strcmp(objtrace_cmd, "add"))
> + goto out;
> +
> + arg = strsep(&trigger, ":");
> + if (!arg)
> + goto out;
> + call = file->event_call;
> + field = trace_find_event_field(call, arg);
> + if (!field)
> + goto out;

As I pointed in v3, please ensure the field->size == sizeof(void *).
(Note that memcpy() is available only if the both size is same.
For example, on the big endian architecture, copying 8bit 0x12 on 64bit
pointer will be 0x1200000000000000.)

Thank you,


> +
> + trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
> +
> + ret = -ENOMEM;
> + trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
> + if (!trigger_data)
> + goto out;
> +
> + trigger_data->count = -1;
> + trigger_data->ops = trigger_ops;
> + trigger_data->cmd_ops = cmd_ops;
> + trigger_data->private_data = field;
> + INIT_LIST_HEAD(&trigger_data->list);
> + INIT_LIST_HEAD(&trigger_data->named_list);
> +
> + if (glob[0] == '!') {
> + cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
> + kfree(trigger_data);
> + ret = 0;
> + goto out;
> + }
> +
> + if (trigger) {
> + number = strsep(&trigger, ":");
> +
> + ret = -EINVAL;
> + if (!strlen(number))
> + goto out_free;
> +
> + /*
> + * We use the callback data field (which is a pointer)
> + * as our counter.
> + */
> + ret = kstrtoul(number, 0, &trigger_data->count);
> + if (ret)
> + goto out_free;
> + }
> +
> + if (!param) /* if param is non-empty, it's supposed to be a filter */
> + goto out_reg;
> +
> + if (!cmd_ops->set_filter)
> + goto out_reg;
> +
> + ret = cmd_ops->set_filter(param, trigger_data, file);
> + if (ret < 0)
> + goto out_free;
> +
> + out_reg:
> + /* Up the trigger_data count to make sure reg doesn't free it on failure */
> + event_object_trigger_init(trigger_ops, trigger_data);
> + ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
> + /*
> + * The above returns on success the # of functions enabled,
> + * but if it didn't find any functions it returns zero.
> + * Consider no functions a failure too.
> + */
> + if (!ret) {
> + cmd_ops->unreg(glob, trigger_ops, trigger_data, file);
> + ret = -ENOENT;
> + } else if (ret > 0)
> + ret = 0;
> +
> + /* Down the counter of trigger_data or free it if not used anymore */
> + trace_object_trigger_free(trigger_ops, trigger_data);
> + out:
> + return ret;
> +
> + out_free:
> + if (cmd_ops->set_filter)
> + cmd_ops->set_filter(NULL, trigger_data, NULL);
> + kfree(trigger_data);
> + goto out;
> +}
> +
> +static struct event_command trigger_object_cmd = {
> + .name = "objtrace",
> + .trigger_type = ETT_TRACE_OBJECT,
> + .flags = EVENT_CMD_FL_NEEDS_REC,
> + .func = event_object_trigger_callback,
> + .reg = register_object_trigger,
> + .unreg = unregister_object_trigger,
> + .get_trigger_ops = objecttrace_get_trigger_ops,
> + .set_filter = set_trigger_filter,
> +};
> +
> +__init int register_trigger_object_cmd(void)
> +{
> + int ret;
> +
> + ret = register_event_command(&trigger_object_cmd);
> + WARN_ON(ret < 0);
> +
> + return ret;
> +}
> +
> +static int init_trace_object(void)
> +{
> + int ret;
> +
> + if (atomic_inc_return(&trace_object_ref) != 1) {
> + ret = 0;
> + goto out;
> + }
> +
> + ret = init_object_pool();
> + if (ret)
> + goto out;
> +
> + event_trace_file.tr = top_trace_array();
> + if (WARN_ON(!event_trace_file.tr)) {
> + ret = -1;
> + goto out;
> + }
> + ret = register_ftrace_function(&trace_ops);
> +out:
> + return ret;
> +}
> +
> +static int exit_trace_object(void)
> +{
> + int ret;
> +
> + if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
> + goto out;
> +
> + if (atomic_dec_return(&trace_object_ref) != 0) {
> + ret = 0;
> + goto out;
> + }
> +
> + ret = unregister_ftrace_function(&trace_ops);
> + if (ret) {
> + pr_err("can't unregister ftrace for trace object\n");
> + goto out;
> + }
> + free_object_pool();
> +out:
> + return ret;
> +}
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index c2ca40e8595b..76ca560af693 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -1552,6 +1552,45 @@ static struct trace_event trace_func_repeats_event = {
> .funcs = &trace_func_repeats_funcs,
> };
>
> +/* TRACE_OBJECT */
> +static enum print_line_t trace_object_print(struct trace_iterator *iter, int flags,
> + struct trace_event *event)
> +{
> + struct trace_object_entry *field;
> + struct trace_seq *s = &iter->seq;
> +
> + trace_assign_type(field, iter->ent);
> + print_fn_trace(s, field->ip, field->parent_ip, flags);
> + trace_seq_printf(s, " object:0x%lx", field->object);
> + trace_seq_putc(s, '\n');
> +
> + return trace_handle_return(s);
> +}
> +
> +static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags,
> + struct trace_event *event)
> +{
> + struct trace_object_entry *field;
> +
> + trace_assign_type(field, iter->ent);
> +
> + trace_seq_printf(&iter->seq, "%lx %lx\n",
> + field->ip,
> + field->parent_ip);
> +
> + return trace_handle_return(&iter->seq);
> +}
> +
> +static struct trace_event_functions trace_object_funcs = {
> + .trace = trace_object_print,
> + .raw = trace_object_raw,
> +};
> +
> +static struct trace_event trace_object_event = {
> + .type = TRACE_OBJECT,
> + .funcs = &trace_object_funcs,
> +};
> +
> static struct trace_event *events[] __initdata = {
> &trace_fn_event,
> &trace_ctx_event,
> @@ -1566,6 +1605,7 @@ static struct trace_event *events[] __initdata = {
> &trace_timerlat_event,
> &trace_raw_data_event,
> &trace_func_repeats_event,
> + &trace_object_event,
> NULL
> };
>
> --
> 2.25.1
>


--
Masami Hiramatsu <[email protected]>

2021-11-19 14:01:25

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC][PATCH v5 2/4] trace/objtrace: get the value of the object

Hi Jeff,

On Sat, 13 Nov 2021 20:06:30 +0800
Jeff Xie <[email protected]> wrote:

Please describe here what feature this patch adds.
How to use, and new syntax, etc.

BTW, the syntax for this value trace is a bit confusing.

objtrace:add:OFFS(OBJ):TYPE[:COUNT]

This trace "OBJ", but from the user point of view, this seems to trace
"OFFS(OBJ)".

I rather like make it optional and split from OBJ as below;

objtrace:add:OBJ[,OFFS:TYPE][:COUNT]

(Note that the part braced by [] is optional.)

Thank you,

> Signed-off-by: Jeff Xie <[email protected]>
> ---
> kernel/trace/trace_entries.h | 5 +-
> kernel/trace/trace_object.c | 121 +++++++++++++++++++++++++++++------
> kernel/trace/trace_output.c | 6 +-
> 3 files changed, 107 insertions(+), 25 deletions(-)
>
> diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> index bb120d9498a9..2407c45a568c 100644
> --- a/kernel/trace/trace_entries.h
> +++ b/kernel/trace/trace_entries.h
> @@ -413,8 +413,9 @@ FTRACE_ENTRY(object, trace_object_entry,
> __field( unsigned long, ip )
> __field( unsigned long, parent_ip )
> __field( unsigned long, object )
> + __field( unsigned long, value )
> ),
>
> - F_printk(" %ps <-- %ps object:%lx\n",
> - (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> + F_printk(" %ps <-- %ps object:%lx value:%lx\n", (void *)__entry->ip,
> + (void *)__entry->parent_ip, __entry->object, __entry->value)
> );
> diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
> index 69465c2ffb7e..14993f7d0e5a 100644
> --- a/kernel/trace/trace_object.c
> +++ b/kernel/trace/trace_object.c
> @@ -11,14 +11,25 @@
>
> static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);
> static struct trace_event_file event_trace_file;
> -static const int max_args_num = 6;
> static const int max_obj_pool = 10;
> static atomic_t trace_object_ref;
> static int exit_trace_object(void);
> static int init_trace_object(void);
>
> +struct objtrace_trigger_data {
> + struct ftrace_event_field *field;
> + long offset;
> + int type_size;
> +};
> +
> +struct objtrace_fetch_type {
> + char *name;
> + int type_size;
> +};
> +
> struct object_instance {
> void *object;
> + int obj_type_size;
> struct freelist_node freelist;
> };
>
> @@ -59,8 +70,7 @@ static bool object_empty(void)
> return ret;
> }
>
> -
> -static void set_trace_object(void *obj)
> +static void set_trace_object(void *obj, int type_size)
> {
> struct freelist_node *fn;
> struct object_instance *ins;
> @@ -79,6 +89,7 @@ static void set_trace_object(void *obj)
>
> ins = container_of(fn, struct object_instance, freelist);
> ins->object = obj;
> + ins->obj_type_size = type_size;
>
> freelist_add(&ins->freelist, &obj_pool->customer_freelist);
> atomic_inc(&obj_pool->nobject);
> @@ -135,7 +146,7 @@ static int init_object_pool(void)
> }
>
> static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> - unsigned long object)
> + unsigned long object, unsigned long value)
> {
>
> struct trace_buffer *buffer;
> @@ -152,6 +163,7 @@ static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> entry->ip = ip;
> entry->parent_ip = parent_ip;
> entry->object = object;
> + entry->value = value;
>
> event_trigger_unlock_commit(&event_trace_file, buffer, event,
> entry, pc);
> @@ -161,10 +173,11 @@ static void
> trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> struct ftrace_ops *op, struct ftrace_regs *fregs)
> {
> - struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> - unsigned long obj;
> + struct freelist_node *node;
> + struct object_instance *inst;
> + unsigned long val = 0;
> long disabled;
> - int cpu, n;
> + int cpu;
>
> preempt_disable_notrace();
>
> @@ -177,10 +190,14 @@ trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> if (object_empty())
> goto out;
>
> - for (n = 0; n < max_args_num; n++) {
> - obj = regs_get_kernel_argument(pt_regs, n);
> - if (object_exist((void *)obj))
> - submit_trace_object(ip, parent_ip, obj);
> + node = obj_pool->customer_freelist.head;
> +
> + while (node) {
> + inst = container_of(node, struct object_instance, freelist);
> + if (copy_from_kernel_nofault(&val, inst->object, inst->obj_type_size))
> + goto out;
> + submit_trace_object(ip, parent_ip, (unsigned long)inst->object, val);
> + node = node->next;
> }
>
> out:
> @@ -198,12 +215,14 @@ trace_object_trigger(struct event_trigger_data *data,
> struct trace_buffer *buffer, void *rec,
> struct ring_buffer_event *event)
> {
> + struct objtrace_trigger_data *obj_data = data->private_data;
> + struct ftrace_event_field *field;
> + void *obj, *val = NULL;
>
> - struct ftrace_event_field *field = data->private_data;
> - void *obj = NULL;
> -
> - memcpy(&obj, rec + field->offset, sizeof(obj));
> - set_trace_object(obj);
> + field = obj_data->field;
> + memcpy(&val, rec + field->offset, sizeof(val));
> + obj = val + obj_data->offset;
> + set_trace_object(obj, obj_data->type_size);
> }
>
> static void
> @@ -350,6 +369,22 @@ static void unregister_object_trigger(char *glob, struct event_trigger_ops *ops,
> }
> }
>
> +static const struct objtrace_fetch_type objtrace_fetch_types[] = {
> + {"u8", 1},
> + {"s8", 1},
> + {"x8", 1},
> + {"u16", 2},
> + {"s16", 2},
> + {"x16", 2},
> + {"u32", 4},
> + {"s32", 4},
> + {"x32", 4},
> + {"u64", 8},
> + {"s64", 8},
> + {"x64", 8},
> + {}
> +};
> +
> static int
> event_object_trigger_callback(struct event_command *cmd_ops,
> struct trace_event_file *file,
> @@ -357,13 +392,15 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> {
> struct event_trigger_data *trigger_data;
> struct event_trigger_ops *trigger_ops;
> + struct objtrace_trigger_data *obj_data;
> struct trace_event_call *call;
> struct ftrace_event_field *field;
> char *objtrace_cmd;
> + long offset = 0;
> char *trigger = NULL;
> - char *arg;
> + char *arg, *type, *tr, *tr_end;
> char *number;
> - int ret;
> + int ret, i, type_size = 0;
>
> ret = -EINVAL;
> if (!param)
> @@ -386,6 +423,38 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> arg = strsep(&trigger, ":");
> if (!arg)
> goto out;
> +
> + tr = strchr(arg, '(');
> + /* now force to get the value of the val. */
> + if (!tr)
> + goto out;
> + tr_end = strchr(tr, ')');
> + if (!tr_end)
> + goto out;
> + *tr++ = '\0';
> + *tr_end = '\0';
> + ret = kstrtol(arg, 0, &offset);
> + if (ret)
> + goto out;
> + arg = tr;
> + ret = -EINVAL;
> + if (!trigger)
> + goto out;
> +
> + type = strsep(&trigger, ":");
> + if (!type)
> + goto out;
> + for (i = 0; objtrace_fetch_types[i].name; i++) {
> + if (strcmp(objtrace_fetch_types[i].name, type) == 0) {
> + type_size = objtrace_fetch_types[i].type_size;
> + break;
> + }
> + }
> +
> + if (type_size == 0)
> + goto out;
> +
> +
> call = file->event_call;
> field = trace_find_event_field(call, arg);
> if (!field)
> @@ -394,19 +463,30 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
>
> ret = -ENOMEM;
> + obj_data = kzalloc(sizeof(*obj_data), GFP_KERNEL);
> + if (!obj_data)
> + goto out;
> +
> + obj_data->field = field;
> + obj_data->offset = offset;
> + obj_data->type_size = type_size;
> +
> trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
> - if (!trigger_data)
> + if (!trigger_data) {
> + kfree(obj_data);
> goto out;
> + }
>
> trigger_data->count = -1;
> trigger_data->ops = trigger_ops;
> trigger_data->cmd_ops = cmd_ops;
> - trigger_data->private_data = field;
> + trigger_data->private_data = obj_data;
> INIT_LIST_HEAD(&trigger_data->list);
> INIT_LIST_HEAD(&trigger_data->named_list);
>
> if (glob[0] == '!') {
> cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
> + kfree(obj_data);
> kfree(trigger_data);
> ret = 0;
> goto out;
> @@ -461,6 +541,7 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> out_free:
> if (cmd_ops->set_filter)
> cmd_ops->set_filter(NULL, trigger_data, NULL);
> + kfree(obj_data);
> kfree(trigger_data);
> goto out;
> }
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 76ca560af693..c8c427c23127 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -1562,6 +1562,7 @@ static enum print_line_t trace_object_print(struct trace_iterator *iter, int fla
> trace_assign_type(field, iter->ent);
> print_fn_trace(s, field->ip, field->parent_ip, flags);
> trace_seq_printf(s, " object:0x%lx", field->object);
> + trace_seq_printf(s, " value:0x%lx", field->value);
> trace_seq_putc(s, '\n');
>
> return trace_handle_return(s);
> @@ -1574,9 +1575,8 @@ static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags
>
> trace_assign_type(field, iter->ent);
>
> - trace_seq_printf(&iter->seq, "%lx %lx\n",
> - field->ip,
> - field->parent_ip);
> + trace_seq_printf(&iter->seq, "%lx %lx %lx %lx\n", field->ip,
> + field->parent_ip, field->object, field->value);
>
> return trace_handle_return(&iter->seq);
> }
> --
> 2.25.1
>


--
Masami Hiramatsu <[email protected]>

2021-11-19 14:09:46

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC][PATCH v5 3/4] trace/objtrace: Add testcases for objtrace trigger parsing

On Sat, 13 Nov 2021 20:06:31 +0800
Jeff Xie <[email protected]> wrote:

Please add a patch description here.

> Signed-off-by: Jeff Xie <[email protected]>
> ---
> .../ftrace/test.d/trigger/trigger-objtrace.tc | 41 +++++++++++++++++++
> 1 file changed, 41 insertions(+)
> create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
>
> diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
> new file mode 100644
> index 000000000000..4efcd7eee694
> --- /dev/null
> +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
> @@ -0,0 +1,41 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +# description: event trigger - test objtrace-trigger
> +# requires: kprobe_events

Hmm, didn't you add something for this in <tracefs>/README?
Instead of checking trigger file, I recommend to update

There is a text data in the kernel/trace/trace.c;
static const char readme_msg[] =
...
"\t trigger: traceon, traceoff\n"
"\t enable_event:<system>:<event>\n"
"\t disable_event:<system>:<event>\n"
#ifdef CONFIG_HIST_TRIGGERS
"\t enable_hist:<system>:<event>\n"
"\t disable_hist:<system>:<event>\n"
#endif

You can add somewhere near there as

#ifdef CONFIG_TRACE_OBJECT
"\t objtrace:add:..." // fill syntax.
#endif

And then, you can write the pattern in the requires line, like

# requires: kprobe_events "objtrace":README

Thank you,

> +
> +fail() { #msg
> + echo $1
> + exit_fail
> +}
> +
> +echo 'p bio_add_page arg1=$arg1' > kprobe_events
> +
> +FEATURE=`grep objtrace events/kprobes/p_bio_add_page_0/trigger`
> +if [ -z "$FEATURE" ]; then
> + echo "objtrace trigger is not supported"
> + exit_unsupported
> +fi


> +
> +echo "Test objtrace trigger"
> +echo 'objtrace:add:0x28(arg1):u32:1 if comm == "cat"' > \
> + events/kprobes/p_bio_add_page_0/trigger
> +if [ -z $? ]; then
> + fail "objtrace trigger syntax error"
> +fi
> +
> +echo "Test objtrace semantic errors"
> +
> +# Being lack of type size
> +! echo 'objtrace:add:0x28(arg1):1' > events/kprobes/p_bio_add_page_0/trigger
> +# Being lack of objtrace command
> +! echo 'objtrace:0x28(arg1):u32:1' > events/kprobes/p_bio_add_page_0/trigger
> +# Bad parameter name
> +! echo 'objtrace:add:argx:u32:1' > events/kprobes/p_bio_add_page_0/trigger
> +
> +echo "reset objtrace trigger"
> +
> +echo '!objtrace:add:0x28(arg1):u32' > \
> + events/kprobes/p_bio_add_page_0/trigger
> +echo '-:p_bio_add_page_0' >> ./kprobe_events
> +
> +exit 0
> --
> 2.25.1
>


--
Masami Hiramatsu <[email protected]>

2021-11-19 14:58:19

by Jeff Xie

[permalink] [raw]
Subject: Re: [RFC][PATCH v5 1/4] trace: Add trace any kernel object

Hi Masami,

Thank you so much for your patient comments!
I have been following the kernel mailing list every day recently. and
know that you and Steve are both super busy.

I have to say that I am really embarrassed (too many mistakes;-) ).
But now I just want to say that I will spend time
to correct all the mistakes seriously.

On Fri, Nov 19, 2021 at 10:00 PM Masami Hiramatsu <[email protected]> wrote:
>
> Hi Jeff,
>
> Sorry for replying late!
>
> On Sat, 13 Nov 2021 20:06:29 +0800
> Jeff Xie <[email protected]> wrote:
>
> > Introduce a method based on function tracer to trace any object and get
> > the value of the object dynamically. the object can be obtained from the
> > dynamic event (kprobe_event/uprobe_event) or the static event(tracepoint).
> >
> > Usage:
> > When using the kprobe event, only need to set the objtrace(a new trigger),
> > we can get the value of the object. The object is from the setting of the
> > kprobe event.
> >
> > For example:
> > For the function bio_add_page():
> >
> > int bio_add_page(struct bio *bio, struct page *page,
> > unsigned int len, unsigned int offset)
> >
> > Firstly, we can set the base of the object, thus the first string "arg1"
> > stands for the value of the first parameter of this function bio_add_gage(),
> >
> > # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
> >
> > Secondly, we can get the value dynamically base the above object.
> >
> > find the offset of the bi_size in struct bio:
> > $ gdb vmlinux
> > (gdb) p &(((struct bio *)0)->bi_iter.bi_size)
> > $1 = (unsigned int *) 0x28
> >
> > # echo 'objtrace:0x28(arg1):u32:1 if comm == "cat"' > ./events/kprobes/ \
> > p_bio_add_page_0/trigger
>
> You may need "objtrace:add:..." right? :-)
>
> BTW, this value support is done in [2/4], you should focus on what this patch
> does, not the entire series does. It seems what you need is a cover mail
> for this series. The description like this must be done in the cover mail.
> (For example, long trace output etc.)
>
> >
> > The best way to use this is that we can set the entrance event and exit
> > event, for example, the following example is to set the read_papes as
> > the entrance event, and set the blk_account_io_start as the exit event.
> >
> > # dd if=/dev/zero of=/test.txt bs=4k count=3
> > # cd /sys/kernel/debug/tracing/
> > # echo 0 > ./tracing_on
> > # echo 'p read_pages' >> ./kprobe_events
> > # echo 'p blk_account_io_start' >> ./kprobe_events
> > # echo 'traceon if comm == "cat"' > ./events/kprobes/p_read_pages_0/trigger
> > # echo 'traceoff if comm == "cat"' > ./events/kprobes/p_blk_account_io_start_0/trigger
> > # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
> > # echo 'objtrace:0x28(arg1):u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger
> >
> > # du -sh /test.txt
> > 12.0K /test.txt
> >
> > # cat /test.txt > /dev/null
> > # cat ./trace
> >
> > # tracer: nop
> > #
> > # entries-in-buffer/entries-written: 50/50 #P:1
> > #
> > # _-----=> irqs-off
> > # / _----=> need-resched
> > # | / _---=> hardirq/softirq
> > # || / _--=> preempt-depth
> > # ||| / _-=> migrate-disable
> > # |||| / delay
> > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
> > # | | | ||||| | |
> > cat-95 [000] ..... 1.412065: _raw_spin_unlock_irqrestore <-event_triggers_call object:0xffff888108af6328 value:0x0
> > cat-95 [000] ..... 1.412066: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x0
> > cat-95 [000] ..... 1.412066: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x0
> > cat-95 [000] ..... 1.412066: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x1000
> > cat-95 [000] ..... 1.412066: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x1000
> > cat-95 [000] ..... 1.412066: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x1000
> > cat-95 [000] ..... 1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x1000
> > cat-95 [000] ..... 1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x1000
> > cat-95 [000] ..... 1.412067: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x1000
> > cat-95 [000] ..... 1.412067: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x2000
> > cat-95 [000] ..... 1.412067: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x2000
> > cat-95 [000] ..... 1.412067: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x2000
> > cat-95 [000] ..... 1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x2000
> > cat-95 [000] ..... 1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x2000
> > cat-95 [000] ..... 1.412068: submit_bio <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412068: submit_bio_noacct <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412068: __submit_bio <-submit_bio_noacct object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412068: blk_try_enter_queue <-__submit_bio object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412068: submit_bio_checks <-__submit_bio object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412068: __cond_resched <-submit_bio_checks object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412068: rcu_all_qs <-__cond_resched object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412068: should_fail_bio <-submit_bio_checks object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412069: create_task_io_context <-submit_bio_checks object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412069: kmem_cache_alloc_node <-create_task_io_context object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412069: should_failslab <-kmem_cache_alloc_node object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412069: _raw_spin_lock <-create_task_io_context object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412069: blk_mq_submit_bio <-__submit_bio object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412069: __blk_queue_split <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412069: bvec_split_segs <-__blk_queue_split object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412069: blk_attempt_plug_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412070: __blk_mq_sched_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412070: dd_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412070: _raw_spin_lock <-dd_bio_merge object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412070: blk_mq_sched_try_merge <-dd_bio_merge object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412070: elv_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412070: elv_rqhash_find <-elv_merge object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412070: dd_request_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412070: elv_rb_find <-dd_request_merge object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412070: __blk_mq_alloc_request <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412071: dd_limit_depth <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412071: blk_mq_get_tag <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412071: __blk_mq_get_tag <-blk_mq_get_tag object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412071: blk_mq_rq_ctx_init.isra.0 <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412071: ktime_get <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412071: dd_prepare_request <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412071: blk_account_io_start <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412071: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x3000
> > cat-95 [000] ..... 1.412071: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x3000
> >
> > Signed-off-by: Jeff Xie <[email protected]>
> > ---
> > include/linux/trace_events.h | 1 +
> > kernel/trace/Kconfig | 10 +
> > kernel/trace/Makefile | 1 +
> > kernel/trace/trace.h | 8 +
> > kernel/trace/trace_entries.h | 17 +
> > kernel/trace/trace_events_trigger.c | 1 +
> > kernel/trace/trace_object.c | 532 ++++++++++++++++++++++++++++
> > kernel/trace/trace_output.c | 40 +++
> > 8 files changed, 610 insertions(+)
> > create mode 100644 kernel/trace/trace_object.c
> >
> > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> > index 3e475eeb5a99..84b5c12c7f44 100644
> > --- a/include/linux/trace_events.h
> > +++ b/include/linux/trace_events.h
> > @@ -684,6 +684,7 @@ enum event_trigger_type {
> > ETT_EVENT_HIST = (1 << 4),
> > ETT_HIST_ENABLE = (1 << 5),
> > ETT_EVENT_EPROBE = (1 << 6),
> > + ETT_TRACE_OBJECT = (1 << 7),
> > };
> >
> > extern int filter_match_preds(struct event_filter *filter, void *rec);
> > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > index 420ff4bc67fd..f69761cbfaaa 100644
> > --- a/kernel/trace/Kconfig
> > +++ b/kernel/trace/Kconfig
> > @@ -237,6 +237,16 @@ config FUNCTION_PROFILER
> >
> > If in doubt, say N.
> >
> > +config TRACE_OBJECT
> > + bool "Trace kernel object in function parameter"
> > + depends on FUNCTION_TRACER
> > + depends on HAVE_FUNCTION_ARG_ACCESS_API
> > + select TRACING
> > + default y
> > + help
> > + You can trace the kernel object in the kernel function parameter.
> > + The kernel object is dynamically specified via event trigger.
> > +
> > config STACK_TRACER
> > bool "Trace max stack"
> > depends on HAVE_FUNCTION_TRACER
> > diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> > index 6de5d4d63165..6d9e78a488aa 100644
> > --- a/kernel/trace/Makefile
> > +++ b/kernel/trace/Makefile
> > @@ -66,6 +66,7 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
> > obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
> > obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
> > obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += fgraph.o
> > +obj-$(CONFIG_TRACE_OBJECT) += trace_object.o
> > ifeq ($(CONFIG_BLOCK),y)
> > obj-$(CONFIG_EVENT_TRACING) += blktrace.o
> > endif
> > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > index b7c0f8e160fb..d2508f977fe6 100644
> > --- a/kernel/trace/trace.h
> > +++ b/kernel/trace/trace.h
> > @@ -49,6 +49,7 @@ enum trace_type {
> > TRACE_TIMERLAT,
> > TRACE_RAW_DATA,
> > TRACE_FUNC_REPEATS,
> > + TRACE_OBJECT,
> >
> > __TRACE_LAST_TYPE,
> > };
> > @@ -460,6 +461,7 @@ extern void __ftrace_bad_type(void);
> > TRACE_GRAPH_RET); \
> > IF_ASSIGN(var, ent, struct func_repeats_entry, \
> > TRACE_FUNC_REPEATS); \
> > + IF_ASSIGN(var, ent, struct trace_object_entry, TRACE_OBJECT);\
> > __ftrace_bad_type(); \
> > } while (0)
> >
> > @@ -1510,6 +1512,12 @@ static inline int register_trigger_hist_cmd(void) { return 0; }
> > static inline int register_trigger_hist_enable_disable_cmds(void) { return 0; }
> > #endif
> >
> > +#ifdef CONFIG_TRACE_OBJECT
> > +extern int register_trigger_object_cmd(void);
> > +#else
> > +static inline int register_trigger_object_cmd(void) { return 0; }
> > +#endif
> > +
> > extern int register_trigger_cmds(void);
> > extern void clear_event_triggers(struct trace_array *tr);
> >
> > diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> > index cd41e863b51c..bb120d9498a9 100644
> > --- a/kernel/trace/trace_entries.h
> > +++ b/kernel/trace/trace_entries.h
> > @@ -401,3 +401,20 @@ FTRACE_ENTRY(timerlat, timerlat_entry,
> > __entry->context,
> > __entry->timer_latency)
> > );
> > +
> > +/*
> > + * trace object entry:
> > + */
> > +FTRACE_ENTRY(object, trace_object_entry,
> > +
> > + TRACE_OBJECT,
> > +
> > + F_STRUCT(
> > + __field( unsigned long, ip )
> > + __field( unsigned long, parent_ip )
> > + __field( unsigned long, object )
> > + ),
> > +
> > + F_printk(" %ps <-- %ps object:%lx\n",
> > + (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> > +);
> > diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> > index 3d5c07239a2a..da2d871fa809 100644
> > --- a/kernel/trace/trace_events_trigger.c
> > +++ b/kernel/trace/trace_events_trigger.c
> > @@ -1687,6 +1687,7 @@ __init int register_trigger_cmds(void)
> > register_trigger_enable_disable_cmds();
> > register_trigger_hist_enable_disable_cmds();
> > register_trigger_hist_cmd();
> > + register_trigger_object_cmd();
> >
> > return 0;
> > }
> > diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
> > new file mode 100644
> > index 000000000000..69465c2ffb7e
> > --- /dev/null
> > +++ b/kernel/trace/trace_object.c
> > @@ -0,0 +1,532 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * trace any object
> > + * Copyright (C) 2021 Jeff Xie <[email protected]>
> > + */
> > +
> > +#define pr_fmt(fmt) "trace_object: " fmt
> > +
> > +#include "trace_output.h"
> > +#include <linux/freelist.h>
> > +
> > +static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);
> > +static struct trace_event_file event_trace_file;
> > +static const int max_args_num = 6;
> > +static const int max_obj_pool = 10;
> > +static atomic_t trace_object_ref;
> > +static int exit_trace_object(void);
> > +static int init_trace_object(void);
> > +
> > +struct object_instance {
> > + void *object;
> > + struct freelist_node freelist;
> > +};
> > +
> > +struct obj_pool {
> > + int maxactive;
>
> It seems that 'maxactive' is used only in the init_object_pool(), so
> we may not need this.
>
> > + atomic_t nobject;
> > + struct freelist_head product_freelist;
> > + struct freelist_head customer_freelist;
>
> Sorry, I might confuse you. You shouldn't use the freelist for active
> list, which searches a specific entry in the list. The freelist is good
> for pooling the unused object_instances but not good for search an entry.
> So please use normal "list_head" for the list for active objects. And
> in that case, we can use rcu_list operation on it.
>
> BTW, I rather like calling those "active_list" and "free_list", because
> product/customer are unclear what it means.
>
> > +};
> > +static struct obj_pool *obj_pool;
> > +
> > +static bool object_exist(void *obj)
> > +{
> > + struct freelist_node *node;
> > + struct object_instance *inst;
> > + bool ret = false;
> > +
> > + node = obj_pool->customer_freelist.head;
> > +
> > + while (node) {
>
> Then, you can use list_for_each_entry_rcu() here.
>
> > + inst = container_of(node, struct object_instance, freelist);
> > + if (inst->object == obj) {
> > + ret = true;
> > + goto out;
> > + }
> > + node = node->next;
> > + }
> > +out:
> > + return ret;
> > +}
> > +
> > +static bool object_empty(void)
> > +{
> > + bool ret;
> > +
> > + ret = atomic_read(&obj_pool->nobject) == 0;
> > +
> > + return ret;
>
> Instead of use atomic_read(), you can check the list itself by
>
> return !list_empty_rcu(&obj_pool->active_list);
>
> > +}
> > +
> > +
> > +static void set_trace_object(void *obj)
> > +{
> > + struct freelist_node *fn;
> > + struct object_instance *ins;
> > +
> > + if (!obj)
> > + return;
> > +
> > + if (object_exist(obj))
> > + goto out;
>
> No need to use goto. Just return is OK.
>
> > +
> > + fn = freelist_try_get(&obj_pool->product_freelist);
> > + if (!fn) {
> > + trace_printk("object_pool is full, can't trace object:0x%px\n", obj);
> > + return;
> > + }
> > +
> > + ins = container_of(fn, struct object_instance, freelist);
> > + ins->object = obj;
> > +
> > + freelist_add(&ins->freelist, &obj_pool->customer_freelist);
> > + atomic_inc(&obj_pool->nobject);
>
> Here, you can use list_add_rcu().
>
> > +
> > +out:
> > + return;
> > +}
> > +
> > +static inline void free_objects(struct freelist_node *node)
>
> Usually, this kind of function will get the freelist_head, since
> node can be a node in the middle of the list.
>
> > +{
> > +
> > + struct object_instance *inst;
> > +
> > + while (node) {
> > + inst = container_of(node, struct object_instance, freelist);
> > + node = node->next;
> > + kfree(inst);
> > + }
> > +}
> > +
> > +static inline void free_object_pool(void)
> > +{
> > + free_objects(obj_pool->product_freelist.head);
> > + free_objects(obj_pool->customer_freelist.head);
> > + kfree(obj_pool);
> > +}
> > +
> > +
> > +static int init_object_pool(void)
> > +{
> > + struct object_instance *inst;
> > + int i, ret = 0;
> > +
> > + obj_pool = kzalloc(sizeof(*obj_pool), GFP_KERNEL);
> > + if (!obj_pool) {
> > + ret = -ENOMEM;
> > + goto out;
> > + }
> > + obj_pool->maxactive = max_obj_pool;
> > + obj_pool->product_freelist.head = NULL;
> > + obj_pool->customer_freelist.head = NULL;
>
> You don't need to these because you already allocated a zero-filled
> object by kzalloc().
>
> > +
> > + for (i = 0; i < obj_pool->maxactive; i++) {
> > + inst = kzalloc(sizeof(*inst), GFP_KERNEL);
> > + if (!inst) {
> > + free_object_pool();
> > + ret = -ENOMEM;
> > + goto out;
> > + }
> > + freelist_add(&inst->freelist, &obj_pool->product_freelist);
> > + }
> > +out:
> > + return ret;
> > +}
> > +
> > +static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > + unsigned long object)
> > +{
> > +
> > + struct trace_buffer *buffer;
> > + struct ring_buffer_event *event;
> > + struct trace_object_entry *entry;
> > + int pc;
> > +
> > + pc = preempt_count();
> > + event = trace_event_buffer_lock_reserve(&buffer, &event_trace_file,
> > + TRACE_OBJECT, sizeof(*entry), pc);
> > + if (!event)
> > + return;
> > + entry = ring_buffer_event_data(event);
> > + entry->ip = ip;
> > + entry->parent_ip = parent_ip;
> > + entry->object = object;
> > +
> > + event_trigger_unlock_commit(&event_trace_file, buffer, event,
> > + entry, pc);
> > +}
> > +
> > +static void
> > +trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > + struct ftrace_ops *op, struct ftrace_regs *fregs)
> > +{
> > + struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> > + unsigned long obj;
> > + long disabled;
> > + int cpu, n;
> > +
> > + preempt_disable_notrace();
> > +
> > + cpu = raw_smp_processor_id();
> > + disabled = atomic_inc_return(&per_cpu(trace_object_event_disable, cpu));
>
> What is the purpose of this?
> (BTW, if you use per_cpu variable, you don't need atomic ops, because there is
> no racing with other cpus.)
>
> > +
> > + if (disabled != 1)
> > + goto out;
> > +
> > + if (object_empty())
> > + goto out;
> > +
> > + for (n = 0; n < max_args_num; n++) {
> > + obj = regs_get_kernel_argument(pt_regs, n);
> > + if (object_exist((void *)obj))
> > + submit_trace_object(ip, parent_ip, obj);
> > + }
> > +
> > +out:
> > + atomic_dec(&per_cpu(trace_object_event_disable, cpu));
> > + preempt_enable_notrace();
> > +}
> > +
> > +static struct ftrace_ops trace_ops = {
> > + .func = trace_object_events_call,
> > + .flags = FTRACE_OPS_FL_SAVE_REGS,
> > +};
> > +
> > +static void
> > +trace_object_trigger(struct event_trigger_data *data,
> > + struct trace_buffer *buffer, void *rec,
> > + struct ring_buffer_event *event)
> > +{
> > +
> > + struct ftrace_event_field *field = data->private_data;
> > + void *obj = NULL;
> > +
> > + memcpy(&obj, rec + field->offset, sizeof(obj));
> > + set_trace_object(obj);
> > +}
> > +
> > +static void
> > +trace_object_trigger_free(struct event_trigger_ops *ops,
> > + struct event_trigger_data *data)
> > +{
> > + if (WARN_ON_ONCE(data->ref <= 0))
> > + return;
> > +
> > + data->ref--;
> > + if (!data->ref)
> > + trigger_data_free(data);
> > +}
> > +
> > +static void
> > +trace_object_count_trigger(struct event_trigger_data *data,
> > + struct trace_buffer *buffer, void *rec,
> > + struct ring_buffer_event *event)
> > +{
> > + if (!data->count)
> > + return;
> > +
> > + if (data->count != -1)
> > + (data->count)--;
> > +
> > + trace_object_trigger(data, buffer, rec, event);
> > +}
> > +
> > +static int event_object_trigger_init(struct event_trigger_ops *ops,
> > + struct event_trigger_data *data)
> > +{
> > + data->ref++;
> > + return 0;
> > +}
> > +
> > +static int
> > +event_trigger_print(const char *name, struct seq_file *m,
> > + void *data, char *filter_str)
> > +{
> > + long count = (long)data;
> > +
> > + seq_puts(m, name);
> > +
> > + if (count == -1)
> > + seq_puts(m, ":unlimited");
> > + else
> > + seq_printf(m, ":count=%ld", count);
> > +
> > + if (filter_str)
> > + seq_printf(m, " if %s\n", filter_str);
> > + else
> > + seq_putc(m, '\n');
> > +
> > + return 0;
> > +}
> > +
> > +static int
> > +trace_object_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
> > + struct event_trigger_data *data)
> > +{
> > + return event_trigger_print("objtrace", m, (void *)data->count,
> > + data->filter_str);
> > +}
> > +
> > +
> > +static struct event_trigger_ops objecttrace_trigger_ops = {
> > + .func = trace_object_trigger,
> > + .print = trace_object_trigger_print,
> > + .init = event_object_trigger_init,
> > + .free = trace_object_trigger_free,
> > +};
> > +
> > +static struct event_trigger_ops objecttrace_count_trigger_ops = {
> > + .func = trace_object_count_trigger,
> > + .print = trace_object_trigger_print,
> > + .init = event_object_trigger_init,
> > + .free = trace_object_trigger_free,
> > +};
> > +
> > +static struct event_trigger_ops *
> > +objecttrace_get_trigger_ops(char *cmd, char *param)
> > +{
> > + return param ? &objecttrace_count_trigger_ops : &objecttrace_trigger_ops;
> > +}
> > +
> > +static int register_object_trigger(char *glob, struct event_trigger_ops *ops,
> > + struct event_trigger_data *data,
> > + struct trace_event_file *file)
> > +{
> > + struct event_trigger_data *test;
> > + int ret = 0;
> > +
> > + lockdep_assert_held(&event_mutex);
> > +
> > + list_for_each_entry(test, &file->triggers, list) {
> > + if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) {
> > + ret = -EEXIST;
> > + goto out;
> > + }
> > + }
> > +
> > + if (data->ops->init) {
> > + ret = data->ops->init(data->ops, data);
> > + if (ret < 0)
> > + goto out;
> > + }
> > +
> > + list_add_rcu(&data->list, &file->triggers);
> > + ret++;
> > +
> > + update_cond_flag(file);
> > + if (trace_event_trigger_enable_disable(file, 1) < 0) {
> > + list_del_rcu(&data->list);
> > + update_cond_flag(file);
> > + ret--;
> > + }
> > + init_trace_object();
> > +out:
> > + return ret;
> > +}
> > +
> > +static void unregister_object_trigger(char *glob, struct event_trigger_ops *ops,
> > + struct event_trigger_data *test,
> > + struct trace_event_file *file)
> > +{
> > + struct event_trigger_data *data;
> > + bool unregistered = false;
> > +
> > + lockdep_assert_held(&event_mutex);
> > +
> > + list_for_each_entry(data, &file->triggers, list) {
> > + if (data->cmd_ops->trigger_type == test->cmd_ops->trigger_type) {
> > + unregistered = true;
> > + list_del_rcu(&data->list);
> > + trace_event_trigger_enable_disable(file, 0);
> > + update_cond_flag(file);
> > + break;
> > + }
> > + }
> > +
> > + if (unregistered && data->ops->free) {
> > + data->ops->free(data->ops, data);
> > + exit_trace_object();
> > + }
> > +}
> > +
> > +static int
> > +event_object_trigger_callback(struct event_command *cmd_ops,
> > + struct trace_event_file *file,
> > + char *glob, char *cmd, char *param)
> > +{
> > + struct event_trigger_data *trigger_data;
> > + struct event_trigger_ops *trigger_ops;
> > + struct trace_event_call *call;
> > + struct ftrace_event_field *field;
> > + char *objtrace_cmd;
> > + char *trigger = NULL;
> > + char *arg;
> > + char *number;
> > + int ret;
> > +
> > + ret = -EINVAL;
> > + if (!param)
> > + goto out;
> > +
> > + /* separate the trigger from the filter (c:a:n [if filter]) */
> > + trigger = strsep(&param, " \t");
> > + if (!trigger)
> > + goto out;
> > + if (param) {
> > + param = skip_spaces(param);
> > + if (!*param)
> > + param = NULL;
> > + }
> > +
> > + objtrace_cmd = strsep(&trigger, ":");
> > + if (!objtrace_cmd || strcmp(objtrace_cmd, "add"))
> > + goto out;
> > +
> > + arg = strsep(&trigger, ":");
> > + if (!arg)
> > + goto out;
> > + call = file->event_call;
> > + field = trace_find_event_field(call, arg);
> > + if (!field)
> > + goto out;
>
> As I pointed in v3, please ensure the field->size == sizeof(void *).
> (Note that memcpy() is available only if the both size is same.
> For example, on the big endian architecture, copying 8bit 0x12 on 64bit
> pointer will be 0x1200000000000000.)
>
> Thank you,
>
>
> > +
> > + trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
> > +
> > + ret = -ENOMEM;
> > + trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
> > + if (!trigger_data)
> > + goto out;
> > +
> > + trigger_data->count = -1;
> > + trigger_data->ops = trigger_ops;
> > + trigger_data->cmd_ops = cmd_ops;
> > + trigger_data->private_data = field;
> > + INIT_LIST_HEAD(&trigger_data->list);
> > + INIT_LIST_HEAD(&trigger_data->named_list);
> > +
> > + if (glob[0] == '!') {
> > + cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
> > + kfree(trigger_data);
> > + ret = 0;
> > + goto out;
> > + }
> > +
> > + if (trigger) {
> > + number = strsep(&trigger, ":");
> > +
> > + ret = -EINVAL;
> > + if (!strlen(number))
> > + goto out_free;
> > +
> > + /*
> > + * We use the callback data field (which is a pointer)
> > + * as our counter.
> > + */
> > + ret = kstrtoul(number, 0, &trigger_data->count);
> > + if (ret)
> > + goto out_free;
> > + }
> > +
> > + if (!param) /* if param is non-empty, it's supposed to be a filter */
> > + goto out_reg;
> > +
> > + if (!cmd_ops->set_filter)
> > + goto out_reg;
> > +
> > + ret = cmd_ops->set_filter(param, trigger_data, file);
> > + if (ret < 0)
> > + goto out_free;
> > +
> > + out_reg:
> > + /* Up the trigger_data count to make sure reg doesn't free it on failure */
> > + event_object_trigger_init(trigger_ops, trigger_data);
> > + ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
> > + /*
> > + * The above returns on success the # of functions enabled,
> > + * but if it didn't find any functions it returns zero.
> > + * Consider no functions a failure too.
> > + */
> > + if (!ret) {
> > + cmd_ops->unreg(glob, trigger_ops, trigger_data, file);
> > + ret = -ENOENT;
> > + } else if (ret > 0)
> > + ret = 0;
> > +
> > + /* Down the counter of trigger_data or free it if not used anymore */
> > + trace_object_trigger_free(trigger_ops, trigger_data);
> > + out:
> > + return ret;
> > +
> > + out_free:
> > + if (cmd_ops->set_filter)
> > + cmd_ops->set_filter(NULL, trigger_data, NULL);
> > + kfree(trigger_data);
> > + goto out;
> > +}
> > +
> > +static struct event_command trigger_object_cmd = {
> > + .name = "objtrace",
> > + .trigger_type = ETT_TRACE_OBJECT,
> > + .flags = EVENT_CMD_FL_NEEDS_REC,
> > + .func = event_object_trigger_callback,
> > + .reg = register_object_trigger,
> > + .unreg = unregister_object_trigger,
> > + .get_trigger_ops = objecttrace_get_trigger_ops,
> > + .set_filter = set_trigger_filter,
> > +};
> > +
> > +__init int register_trigger_object_cmd(void)
> > +{
> > + int ret;
> > +
> > + ret = register_event_command(&trigger_object_cmd);
> > + WARN_ON(ret < 0);
> > +
> > + return ret;
> > +}
> > +
> > +static int init_trace_object(void)
> > +{
> > + int ret;
> > +
> > + if (atomic_inc_return(&trace_object_ref) != 1) {
> > + ret = 0;
> > + goto out;
> > + }
> > +
> > + ret = init_object_pool();
> > + if (ret)
> > + goto out;
> > +
> > + event_trace_file.tr = top_trace_array();
> > + if (WARN_ON(!event_trace_file.tr)) {
> > + ret = -1;
> > + goto out;
> > + }
> > + ret = register_ftrace_function(&trace_ops);
> > +out:
> > + return ret;
> > +}
> > +
> > +static int exit_trace_object(void)
> > +{
> > + int ret;
> > +
> > + if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
> > + goto out;
> > +
> > + if (atomic_dec_return(&trace_object_ref) != 0) {
> > + ret = 0;
> > + goto out;
> > + }
> > +
> > + ret = unregister_ftrace_function(&trace_ops);
> > + if (ret) {
> > + pr_err("can't unregister ftrace for trace object\n");
> > + goto out;
> > + }
> > + free_object_pool();
> > +out:
> > + return ret;
> > +}
> > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> > index c2ca40e8595b..76ca560af693 100644
> > --- a/kernel/trace/trace_output.c
> > +++ b/kernel/trace/trace_output.c
> > @@ -1552,6 +1552,45 @@ static struct trace_event trace_func_repeats_event = {
> > .funcs = &trace_func_repeats_funcs,
> > };
> >
> > +/* TRACE_OBJECT */
> > +static enum print_line_t trace_object_print(struct trace_iterator *iter, int flags,
> > + struct trace_event *event)
> > +{
> > + struct trace_object_entry *field;
> > + struct trace_seq *s = &iter->seq;
> > +
> > + trace_assign_type(field, iter->ent);
> > + print_fn_trace(s, field->ip, field->parent_ip, flags);
> > + trace_seq_printf(s, " object:0x%lx", field->object);
> > + trace_seq_putc(s, '\n');
> > +
> > + return trace_handle_return(s);
> > +}
> > +
> > +static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags,
> > + struct trace_event *event)
> > +{
> > + struct trace_object_entry *field;
> > +
> > + trace_assign_type(field, iter->ent);
> > +
> > + trace_seq_printf(&iter->seq, "%lx %lx\n",
> > + field->ip,
> > + field->parent_ip);
> > +
> > + return trace_handle_return(&iter->seq);
> > +}
> > +
> > +static struct trace_event_functions trace_object_funcs = {
> > + .trace = trace_object_print,
> > + .raw = trace_object_raw,
> > +};
> > +
> > +static struct trace_event trace_object_event = {
> > + .type = TRACE_OBJECT,
> > + .funcs = &trace_object_funcs,
> > +};
> > +
> > static struct trace_event *events[] __initdata = {
> > &trace_fn_event,
> > &trace_ctx_event,
> > @@ -1566,6 +1605,7 @@ static struct trace_event *events[] __initdata = {
> > &trace_timerlat_event,
> > &trace_raw_data_event,
> > &trace_func_repeats_event,
> > + &trace_object_event,
> > NULL
> > };
> >
> > --
> > 2.25.1
> >
>
>
> --
> Masami Hiramatsu <[email protected]>

Thanks,
---
JeffXie

2021-11-21 17:13:40

by Jeff Xie

[permalink] [raw]
Subject: Re: [RFC][PATCH v5 1/4] trace: Add trace any kernel object

Hi Masami,

Thank you again for you patient guidance!

On Fri, Nov 19, 2021 at 10:58 PM Jeff Xie <[email protected]> wrote:
>
> Hi Masami,
>
> Thank you so much for your patient comments!
> I have been following the kernel mailing list every day recently. and
> know that you and Steve are both super busy.
>
> I have to say that I am really embarrassed (too many mistakes;-) ).
> But now I just want to say that I will spend time
> to correct all the mistakes seriously.
>
> On Fri, Nov 19, 2021 at 10:00 PM Masami Hiramatsu <[email protected]> wrote:
> >
> > Hi Jeff,
> >
> > Sorry for replying late!
> >
> > On Sat, 13 Nov 2021 20:06:29 +0800
> > Jeff Xie <[email protected]> wrote:
> >
> > > Introduce a method based on function tracer to trace any object and get
> > > the value of the object dynamically. the object can be obtained from the
> > > dynamic event (kprobe_event/uprobe_event) or the static event(tracepoint).
> > >
> > > Usage:
> > > When using the kprobe event, only need to set the objtrace(a new trigger),
> > > we can get the value of the object. The object is from the setting of the
> > > kprobe event.
> > >
> > > For example:
> > > For the function bio_add_page():
> > >
> > > int bio_add_page(struct bio *bio, struct page *page,
> > > unsigned int len, unsigned int offset)
> > >
> > > Firstly, we can set the base of the object, thus the first string "arg1"
> > > stands for the value of the first parameter of this function bio_add_gage(),
> > >
> > > # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
> > >
> > > Secondly, we can get the value dynamically base the above object.
> > >
> > > find the offset of the bi_size in struct bio:
> > > $ gdb vmlinux
> > > (gdb) p &(((struct bio *)0)->bi_iter.bi_size)
> > > $1 = (unsigned int *) 0x28
> > >
> > > # echo 'objtrace:0x28(arg1):u32:1 if comm == "cat"' > ./events/kprobes/ \
> > > p_bio_add_page_0/trigger
> >
> > You may need "objtrace:add:..." right? :-)

Yes, I will correct it. ;-)

> >
> > BTW, this value support is done in [2/4], you should focus on what this patch
> > does, not the entire series does. It seems what you need is a cover mail
> > for this series. The description like this must be done in the cover mail.
> > (For example, long trace output etc.)

I agree, I will add the cover mail.

> > >
> > > The best way to use this is that we can set the entrance event and exit
> > > event, for example, the following example is to set the read_papes as
> > > the entrance event, and set the blk_account_io_start as the exit event.
> > >
> > > # dd if=/dev/zero of=/test.txt bs=4k count=3
> > > # cd /sys/kernel/debug/tracing/
> > > # echo 0 > ./tracing_on
> > > # echo 'p read_pages' >> ./kprobe_events
> > > # echo 'p blk_account_io_start' >> ./kprobe_events
> > > # echo 'traceon if comm == "cat"' > ./events/kprobes/p_read_pages_0/trigger
> > > # echo 'traceoff if comm == "cat"' > ./events/kprobes/p_blk_account_io_start_0/trigger
> > > # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
> > > # echo 'objtrace:0x28(arg1):u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger
> > >
> > > # du -sh /test.txt
> > > 12.0K /test.txt
> > >
> > > # cat /test.txt > /dev/null
> > > # cat ./trace
> > >
> > > # tracer: nop
> > > #
> > > # entries-in-buffer/entries-written: 50/50 #P:1
> > > #
> > > # _-----=> irqs-off
> > > # / _----=> need-resched
> > > # | / _---=> hardirq/softirq
> > > # || / _--=> preempt-depth
> > > # ||| / _-=> migrate-disable
> > > # |||| / delay
> > > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
> > > # | | | ||||| | |
> > > cat-95 [000] ..... 1.412065: _raw_spin_unlock_irqrestore <-event_triggers_call object:0xffff888108af6328 value:0x0
> > > cat-95 [000] ..... 1.412066: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x0
> > > cat-95 [000] ..... 1.412066: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x0
> > > cat-95 [000] ..... 1.412066: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x1000
> > > cat-95 [000] ..... 1.412066: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x1000
> > > cat-95 [000] ..... 1.412066: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x1000
> > > cat-95 [000] ..... 1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x1000
> > > cat-95 [000] ..... 1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x1000
> > > cat-95 [000] ..... 1.412067: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x1000
> > > cat-95 [000] ..... 1.412067: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x2000
> > > cat-95 [000] ..... 1.412067: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x2000
> > > cat-95 [000] ..... 1.412067: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x2000
> > > cat-95 [000] ..... 1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x2000
> > > cat-95 [000] ..... 1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x2000
> > > cat-95 [000] ..... 1.412068: submit_bio <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412068: submit_bio_noacct <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412068: __submit_bio <-submit_bio_noacct object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412068: blk_try_enter_queue <-__submit_bio object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412068: submit_bio_checks <-__submit_bio object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412068: __cond_resched <-submit_bio_checks object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412068: rcu_all_qs <-__cond_resched object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412068: should_fail_bio <-submit_bio_checks object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412069: create_task_io_context <-submit_bio_checks object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412069: kmem_cache_alloc_node <-create_task_io_context object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412069: should_failslab <-kmem_cache_alloc_node object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412069: _raw_spin_lock <-create_task_io_context object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412069: blk_mq_submit_bio <-__submit_bio object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412069: __blk_queue_split <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412069: bvec_split_segs <-__blk_queue_split object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412069: blk_attempt_plug_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412070: __blk_mq_sched_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412070: dd_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412070: _raw_spin_lock <-dd_bio_merge object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412070: blk_mq_sched_try_merge <-dd_bio_merge object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412070: elv_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412070: elv_rqhash_find <-elv_merge object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412070: dd_request_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412070: elv_rb_find <-dd_request_merge object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412070: __blk_mq_alloc_request <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412071: dd_limit_depth <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412071: blk_mq_get_tag <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412071: __blk_mq_get_tag <-blk_mq_get_tag object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412071: blk_mq_rq_ctx_init.isra.0 <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412071: ktime_get <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412071: dd_prepare_request <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412071: blk_account_io_start <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412071: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x3000
> > > cat-95 [000] ..... 1.412071: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x3000
> > >
> > > Signed-off-by: Jeff Xie <[email protected]>
> > > ---
> > > include/linux/trace_events.h | 1 +
> > > kernel/trace/Kconfig | 10 +
> > > kernel/trace/Makefile | 1 +
> > > kernel/trace/trace.h | 8 +
> > > kernel/trace/trace_entries.h | 17 +
> > > kernel/trace/trace_events_trigger.c | 1 +
> > > kernel/trace/trace_object.c | 532 ++++++++++++++++++++++++++++
> > > kernel/trace/trace_output.c | 40 +++
> > > 8 files changed, 610 insertions(+)
> > > create mode 100644 kernel/trace/trace_object.c
> > >
> > > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> > > index 3e475eeb5a99..84b5c12c7f44 100644
> > > --- a/include/linux/trace_events.h
> > > +++ b/include/linux/trace_events.h
> > > @@ -684,6 +684,7 @@ enum event_trigger_type {
> > > ETT_EVENT_HIST = (1 << 4),
> > > ETT_HIST_ENABLE = (1 << 5),
> > > ETT_EVENT_EPROBE = (1 << 6),
> > > + ETT_TRACE_OBJECT = (1 << 7),
> > > };
> > >
> > > extern int filter_match_preds(struct event_filter *filter, void *rec);
> > > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > > index 420ff4bc67fd..f69761cbfaaa 100644
> > > --- a/kernel/trace/Kconfig
> > > +++ b/kernel/trace/Kconfig
> > > @@ -237,6 +237,16 @@ config FUNCTION_PROFILER
> > >
> > > If in doubt, say N.
> > >
> > > +config TRACE_OBJECT
> > > + bool "Trace kernel object in function parameter"
> > > + depends on FUNCTION_TRACER
> > > + depends on HAVE_FUNCTION_ARG_ACCESS_API
> > > + select TRACING
> > > + default y
> > > + help
> > > + You can trace the kernel object in the kernel function parameter.
> > > + The kernel object is dynamically specified via event trigger.
> > > +
> > > config STACK_TRACER
> > > bool "Trace max stack"
> > > depends on HAVE_FUNCTION_TRACER
> > > diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> > > index 6de5d4d63165..6d9e78a488aa 100644
> > > --- a/kernel/trace/Makefile
> > > +++ b/kernel/trace/Makefile
> > > @@ -66,6 +66,7 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
> > > obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
> > > obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
> > > obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += fgraph.o
> > > +obj-$(CONFIG_TRACE_OBJECT) += trace_object.o
> > > ifeq ($(CONFIG_BLOCK),y)
> > > obj-$(CONFIG_EVENT_TRACING) += blktrace.o
> > > endif
> > > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > > index b7c0f8e160fb..d2508f977fe6 100644
> > > --- a/kernel/trace/trace.h
> > > +++ b/kernel/trace/trace.h
> > > @@ -49,6 +49,7 @@ enum trace_type {
> > > TRACE_TIMERLAT,
> > > TRACE_RAW_DATA,
> > > TRACE_FUNC_REPEATS,
> > > + TRACE_OBJECT,
> > >
> > > __TRACE_LAST_TYPE,
> > > };
> > > @@ -460,6 +461,7 @@ extern void __ftrace_bad_type(void);
> > > TRACE_GRAPH_RET); \
> > > IF_ASSIGN(var, ent, struct func_repeats_entry, \
> > > TRACE_FUNC_REPEATS); \
> > > + IF_ASSIGN(var, ent, struct trace_object_entry, TRACE_OBJECT);\
> > > __ftrace_bad_type(); \
> > > } while (0)
> > >
> > > @@ -1510,6 +1512,12 @@ static inline int register_trigger_hist_cmd(void) { return 0; }
> > > static inline int register_trigger_hist_enable_disable_cmds(void) { return 0; }
> > > #endif
> > >
> > > +#ifdef CONFIG_TRACE_OBJECT
> > > +extern int register_trigger_object_cmd(void);
> > > +#else
> > > +static inline int register_trigger_object_cmd(void) { return 0; }
> > > +#endif
> > > +
> > > extern int register_trigger_cmds(void);
> > > extern void clear_event_triggers(struct trace_array *tr);
> > >
> > > diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> > > index cd41e863b51c..bb120d9498a9 100644
> > > --- a/kernel/trace/trace_entries.h
> > > +++ b/kernel/trace/trace_entries.h
> > > @@ -401,3 +401,20 @@ FTRACE_ENTRY(timerlat, timerlat_entry,
> > > __entry->context,
> > > __entry->timer_latency)
> > > );
> > > +
> > > +/*
> > > + * trace object entry:
> > > + */
> > > +FTRACE_ENTRY(object, trace_object_entry,
> > > +
> > > + TRACE_OBJECT,
> > > +
> > > + F_STRUCT(
> > > + __field( unsigned long, ip )
> > > + __field( unsigned long, parent_ip )
> > > + __field( unsigned long, object )
> > > + ),
> > > +
> > > + F_printk(" %ps <-- %ps object:%lx\n",
> > > + (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> > > +);
> > > diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> > > index 3d5c07239a2a..da2d871fa809 100644
> > > --- a/kernel/trace/trace_events_trigger.c
> > > +++ b/kernel/trace/trace_events_trigger.c
> > > @@ -1687,6 +1687,7 @@ __init int register_trigger_cmds(void)
> > > register_trigger_enable_disable_cmds();
> > > register_trigger_hist_enable_disable_cmds();
> > > register_trigger_hist_cmd();
> > > + register_trigger_object_cmd();
> > >
> > > return 0;
> > > }
> > > diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
> > > new file mode 100644
> > > index 000000000000..69465c2ffb7e
> > > --- /dev/null
> > > +++ b/kernel/trace/trace_object.c
> > > @@ -0,0 +1,532 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * trace any object
> > > + * Copyright (C) 2021 Jeff Xie <[email protected]>
> > > + */
> > > +
> > > +#define pr_fmt(fmt) "trace_object: " fmt
> > > +
> > > +#include "trace_output.h"
> > > +#include <linux/freelist.h>
> > > +
> > > +static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);
> > > +static struct trace_event_file event_trace_file;
> > > +static const int max_args_num = 6;
> > > +static const int max_obj_pool = 10;
> > > +static atomic_t trace_object_ref;
> > > +static int exit_trace_object(void);
> > > +static int init_trace_object(void);
> > > +
> > > +struct object_instance {
> > > + void *object;
> > > + struct freelist_node freelist;
> > > +};
> > > +
> > > +struct obj_pool {
> > > + int maxactive;
> >
> > It seems that 'maxactive' is used only in the init_object_pool(), so
> > we may not need this.

I agree, thanks.

> >
> > > + atomic_t nobject;
> > > + struct freelist_head product_freelist;
> > > + struct freelist_head customer_freelist;
> >
> > Sorry, I might confuse you. You shouldn't use the freelist for active
> > list, which searches a specific entry in the list. The freelist is good
> > for pooling the unused object_instances but not good for search an entry.
> > So please use normal "list_head" for the list for active objects. And
> > in that case, we can use rcu_list operation on it.
> >
> > BTW, I rather like calling those "active_list" and "free_list", because
> > product/customer are unclear what it means.

I agree, thanks.

> >
> > > +};
> > > +static struct obj_pool *obj_pool;
> > > +
> > > +static bool object_exist(void *obj)
> > > +{
> > > + struct freelist_node *node;
> > > + struct object_instance *inst;
> > > + bool ret = false;
> > > +
> > > + node = obj_pool->customer_freelist.head;
> > > +
> > > + while (node) {
> >
> > Then, you can use list_for_each_entry_rcu() here.
> >
> > > + inst = container_of(node, struct object_instance, freelist);
> > > + if (inst->object == obj) {
> > > + ret = true;
> > > + goto out;
> > > + }
> > > + node = node->next;
> > > + }
> > > +out:
> > > + return ret;
> > > +}
> > > +
> > > +static bool object_empty(void)
> > > +{
> > > + bool ret;
> > > +
> > > + ret = atomic_read(&obj_pool->nobject) == 0;
> > > +
> > > + return ret;
> >
> > Instead of use atomic_read(), you can check the list itself by
> >
> > return !list_empty_rcu(&obj_pool->active_list);

Maybe should be list_empty_rcu(&obj_pool->active_list) for object_empty() -;)

> >
> > > +}
> > > +
> > > +
> > > +static void set_trace_object(void *obj)
> > > +{
> > > + struct freelist_node *fn;
> > > + struct object_instance *ins;
> > > +
> > > + if (!obj)
> > > + return;
> > > +
> > > + if (object_exist(obj))
> > > + goto out;
> >
> > No need to use goto. Just return is OK.

I agree ,thanks.

> >
> > > +
> > > + fn = freelist_try_get(&obj_pool->product_freelist);
> > > + if (!fn) {
> > > + trace_printk("object_pool is full, can't trace object:0x%px\n", obj);
> > > + return;
> > > + }
> > > +
> > > + ins = container_of(fn, struct object_instance, freelist);
> > > + ins->object = obj;
> > > +
> > > + freelist_add(&ins->freelist, &obj_pool->customer_freelist);
> > > + atomic_inc(&obj_pool->nobject);
> >
> > Here, you can use list_add_rcu().

This place may need to add a
raw_spin_lock_irqsave/raw_spin_unlock_irqrestore to protect
list_add_rcu
and should exit when in nmi context for the function set_trace_object.

> >
> > > +
> > > +out:
> > > + return;
> > > +}
> > > +
> > > +static inline void free_objects(struct freelist_node *node)
> >
> > Usually, this kind of function will get the freelist_head, since
> > node can be a node in the middle of the list.

I agree, thanks.

> >
> > > +{
> > > +
> > > + struct object_instance *inst;
> > > +
> > > + while (node) {
> > > + inst = container_of(node, struct object_instance, freelist);
> > > + node = node->next;
> > > + kfree(inst);
> > > + }
> > > +}
> > > +
> > > +static inline void free_object_pool(void)
> > > +{
> > > + free_objects(obj_pool->product_freelist.head);
> > > + free_objects(obj_pool->customer_freelist.head);
> > > + kfree(obj_pool);
> > > +}
> > > +
> > > +
> > > +static int init_object_pool(void)
> > > +{
> > > + struct object_instance *inst;
> > > + int i, ret = 0;
> > > +
> > > + obj_pool = kzalloc(sizeof(*obj_pool), GFP_KERNEL);
> > > + if (!obj_pool) {
> > > + ret = -ENOMEM;
> > > + goto out;
> > > + }
> > > + obj_pool->maxactive = max_obj_pool;
> > > + obj_pool->product_freelist.head = NULL;
> > > + obj_pool->customer_freelist.head = NULL;
> >
> > You don't need to these because you already allocated a zero-filled
> > object by kzalloc().

Thanks for your reminder.

> >
> > > +
> > > + for (i = 0; i < obj_pool->maxactive; i++) {
> > > + inst = kzalloc(sizeof(*inst), GFP_KERNEL);
> > > + if (!inst) {
> > > + free_object_pool();
> > > + ret = -ENOMEM;
> > > + goto out;
> > > + }
> > > + freelist_add(&inst->freelist, &obj_pool->product_freelist);
> > > + }
> > > +out:
> > > + return ret;
> > > +}
> > > +
> > > +static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > > + unsigned long object)
> > > +{
> > > +
> > > + struct trace_buffer *buffer;
> > > + struct ring_buffer_event *event;
> > > + struct trace_object_entry *entry;
> > > + int pc;
> > > +
> > > + pc = preempt_count();
> > > + event = trace_event_buffer_lock_reserve(&buffer, &event_trace_file,
> > > + TRACE_OBJECT, sizeof(*entry), pc);
> > > + if (!event)
> > > + return;
> > > + entry = ring_buffer_event_data(event);
> > > + entry->ip = ip;
> > > + entry->parent_ip = parent_ip;
> > > + entry->object = object;
> > > +
> > > + event_trigger_unlock_commit(&event_trace_file, buffer, event,
> > > + entry, pc);
> > > +}
> > > +
> > > +static void
> > > +trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > > + struct ftrace_ops *op, struct ftrace_regs *fregs)
> > > +{
> > > + struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> > > + unsigned long obj;
> > > + long disabled;
> > > + int cpu, n;
> > > +
> > > + preempt_disable_notrace();
> > > +
> > > + cpu = raw_smp_processor_id();
> > > + disabled = atomic_inc_return(&per_cpu(trace_object_event_disable, cpu));
> >
> > What is the purpose of this?
> > (BTW, if you use per_cpu variable, you don't need atomic ops, because there is
> > no racing with other cpus.)

The atomic ops can avoid the interrupt context or the nmi context on
the same cpu to preempt it.

> >
> > > +
> > > + if (disabled != 1)
> > > + goto out;
> > > +
> > > + if (object_empty())
> > > + goto out;
> > > +
> > > + for (n = 0; n < max_args_num; n++) {
> > > + obj = regs_get_kernel_argument(pt_regs, n);
> > > + if (object_exist((void *)obj))
> > > + submit_trace_object(ip, parent_ip, obj);
> > > + }
> > > +
> > > +out:
> > > + atomic_dec(&per_cpu(trace_object_event_disable, cpu));
> > > + preempt_enable_notrace();
> > > +}
> > > +
> > > +static struct ftrace_ops trace_ops = {
> > > + .func = trace_object_events_call,
> > > + .flags = FTRACE_OPS_FL_SAVE_REGS,
> > > +};
> > > +
> > > +static void
> > > +trace_object_trigger(struct event_trigger_data *data,
> > > + struct trace_buffer *buffer, void *rec,
> > > + struct ring_buffer_event *event)
> > > +{
> > > +
> > > + struct ftrace_event_field *field = data->private_data;
> > > + void *obj = NULL;
> > > +
> > > + memcpy(&obj, rec + field->offset, sizeof(obj));
> > > + set_trace_object(obj);
> > > +}
> > > +
> > > +static void
> > > +trace_object_trigger_free(struct event_trigger_ops *ops,
> > > + struct event_trigger_data *data)
> > > +{
> > > + if (WARN_ON_ONCE(data->ref <= 0))
> > > + return;
> > > +
> > > + data->ref--;
> > > + if (!data->ref)
> > > + trigger_data_free(data);
> > > +}
> > > +
> > > +static void
> > > +trace_object_count_trigger(struct event_trigger_data *data,
> > > + struct trace_buffer *buffer, void *rec,
> > > + struct ring_buffer_event *event)
> > > +{
> > > + if (!data->count)
> > > + return;
> > > +
> > > + if (data->count != -1)
> > > + (data->count)--;
> > > +
> > > + trace_object_trigger(data, buffer, rec, event);
> > > +}
> > > +
> > > +static int event_object_trigger_init(struct event_trigger_ops *ops,
> > > + struct event_trigger_data *data)
> > > +{
> > > + data->ref++;
> > > + return 0;
> > > +}
> > > +
> > > +static int
> > > +event_trigger_print(const char *name, struct seq_file *m,
> > > + void *data, char *filter_str)
> > > +{
> > > + long count = (long)data;
> > > +
> > > + seq_puts(m, name);
> > > +
> > > + if (count == -1)
> > > + seq_puts(m, ":unlimited");
> > > + else
> > > + seq_printf(m, ":count=%ld", count);
> > > +
> > > + if (filter_str)
> > > + seq_printf(m, " if %s\n", filter_str);
> > > + else
> > > + seq_putc(m, '\n');
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int
> > > +trace_object_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
> > > + struct event_trigger_data *data)
> > > +{
> > > + return event_trigger_print("objtrace", m, (void *)data->count,
> > > + data->filter_str);
> > > +}
> > > +
> > > +
> > > +static struct event_trigger_ops objecttrace_trigger_ops = {
> > > + .func = trace_object_trigger,
> > > + .print = trace_object_trigger_print,
> > > + .init = event_object_trigger_init,
> > > + .free = trace_object_trigger_free,
> > > +};
> > > +
> > > +static struct event_trigger_ops objecttrace_count_trigger_ops = {
> > > + .func = trace_object_count_trigger,
> > > + .print = trace_object_trigger_print,
> > > + .init = event_object_trigger_init,
> > > + .free = trace_object_trigger_free,
> > > +};
> > > +
> > > +static struct event_trigger_ops *
> > > +objecttrace_get_trigger_ops(char *cmd, char *param)
> > > +{
> > > + return param ? &objecttrace_count_trigger_ops : &objecttrace_trigger_ops;
> > > +}
> > > +
> > > +static int register_object_trigger(char *glob, struct event_trigger_ops *ops,
> > > + struct event_trigger_data *data,
> > > + struct trace_event_file *file)
> > > +{
> > > + struct event_trigger_data *test;
> > > + int ret = 0;
> > > +
> > > + lockdep_assert_held(&event_mutex);
> > > +
> > > + list_for_each_entry(test, &file->triggers, list) {
> > > + if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) {
> > > + ret = -EEXIST;
> > > + goto out;
> > > + }
> > > + }
> > > +
> > > + if (data->ops->init) {
> > > + ret = data->ops->init(data->ops, data);
> > > + if (ret < 0)
> > > + goto out;
> > > + }
> > > +
> > > + list_add_rcu(&data->list, &file->triggers);
> > > + ret++;
> > > +
> > > + update_cond_flag(file);
> > > + if (trace_event_trigger_enable_disable(file, 1) < 0) {
> > > + list_del_rcu(&data->list);
> > > + update_cond_flag(file);
> > > + ret--;
> > > + }
> > > + init_trace_object();
> > > +out:
> > > + return ret;
> > > +}
> > > +
> > > +static void unregister_object_trigger(char *glob, struct event_trigger_ops *ops,
> > > + struct event_trigger_data *test,
> > > + struct trace_event_file *file)
> > > +{
> > > + struct event_trigger_data *data;
> > > + bool unregistered = false;
> > > +
> > > + lockdep_assert_held(&event_mutex);
> > > +
> > > + list_for_each_entry(data, &file->triggers, list) {
> > > + if (data->cmd_ops->trigger_type == test->cmd_ops->trigger_type) {
> > > + unregistered = true;
> > > + list_del_rcu(&data->list);
> > > + trace_event_trigger_enable_disable(file, 0);
> > > + update_cond_flag(file);
> > > + break;
> > > + }
> > > + }
> > > +
> > > + if (unregistered && data->ops->free) {
> > > + data->ops->free(data->ops, data);
> > > + exit_trace_object();
> > > + }
> > > +}
> > > +
> > > +static int
> > > +event_object_trigger_callback(struct event_command *cmd_ops,
> > > + struct trace_event_file *file,
> > > + char *glob, char *cmd, char *param)
> > > +{
> > > + struct event_trigger_data *trigger_data;
> > > + struct event_trigger_ops *trigger_ops;
> > > + struct trace_event_call *call;
> > > + struct ftrace_event_field *field;
> > > + char *objtrace_cmd;
> > > + char *trigger = NULL;
> > > + char *arg;
> > > + char *number;
> > > + int ret;
> > > +
> > > + ret = -EINVAL;
> > > + if (!param)
> > > + goto out;
> > > +
> > > + /* separate the trigger from the filter (c:a:n [if filter]) */
> > > + trigger = strsep(&param, " \t");
> > > + if (!trigger)
> > > + goto out;
> > > + if (param) {
> > > + param = skip_spaces(param);
> > > + if (!*param)
> > > + param = NULL;
> > > + }
> > > +
> > > + objtrace_cmd = strsep(&trigger, ":");
> > > + if (!objtrace_cmd || strcmp(objtrace_cmd, "add"))
> > > + goto out;
> > > +
> > > + arg = strsep(&trigger, ":");
> > > + if (!arg)
> > > + goto out;
> > > + call = file->event_call;
> > > + field = trace_find_event_field(call, arg);
> > > + if (!field)
> > > + goto out;
> >
> > As I pointed in v3, please ensure the field->size == sizeof(void *).
> > (Note that memcpy() is available only if the both size is same.
> > For example, on the big endian architecture, copying 8bit 0x12 on 64bit
> > pointer will be 0x1200000000000000.)
> >

I agree, thanks for your detailed explanation.

> > Thank you,
> >

Thank you again.

> >
> > > +
> > > + trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
> > > +
> > > + ret = -ENOMEM;
> > > + trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
> > > + if (!trigger_data)
> > > + goto out;
> > > +
> > > + trigger_data->count = -1;
> > > + trigger_data->ops = trigger_ops;
> > > + trigger_data->cmd_ops = cmd_ops;
> > > + trigger_data->private_data = field;
> > > + INIT_LIST_HEAD(&trigger_data->list);
> > > + INIT_LIST_HEAD(&trigger_data->named_list);
> > > +
> > > + if (glob[0] == '!') {
> > > + cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
> > > + kfree(trigger_data);
> > > + ret = 0;
> > > + goto out;
> > > + }
> > > +
> > > + if (trigger) {
> > > + number = strsep(&trigger, ":");
> > > +
> > > + ret = -EINVAL;
> > > + if (!strlen(number))
> > > + goto out_free;
> > > +
> > > + /*
> > > + * We use the callback data field (which is a pointer)
> > > + * as our counter.
> > > + */
> > > + ret = kstrtoul(number, 0, &trigger_data->count);
> > > + if (ret)
> > > + goto out_free;
> > > + }
> > > +
> > > + if (!param) /* if param is non-empty, it's supposed to be a filter */
> > > + goto out_reg;
> > > +
> > > + if (!cmd_ops->set_filter)
> > > + goto out_reg;
> > > +
> > > + ret = cmd_ops->set_filter(param, trigger_data, file);
> > > + if (ret < 0)
> > > + goto out_free;
> > > +
> > > + out_reg:
> > > + /* Up the trigger_data count to make sure reg doesn't free it on failure */
> > > + event_object_trigger_init(trigger_ops, trigger_data);
> > > + ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
> > > + /*
> > > + * The above returns on success the # of functions enabled,
> > > + * but if it didn't find any functions it returns zero.
> > > + * Consider no functions a failure too.
> > > + */
> > > + if (!ret) {
> > > + cmd_ops->unreg(glob, trigger_ops, trigger_data, file);
> > > + ret = -ENOENT;
> > > + } else if (ret > 0)
> > > + ret = 0;
> > > +
> > > + /* Down the counter of trigger_data or free it if not used anymore */
> > > + trace_object_trigger_free(trigger_ops, trigger_data);
> > > + out:
> > > + return ret;
> > > +
> > > + out_free:
> > > + if (cmd_ops->set_filter)
> > > + cmd_ops->set_filter(NULL, trigger_data, NULL);
> > > + kfree(trigger_data);
> > > + goto out;
> > > +}
> > > +
> > > +static struct event_command trigger_object_cmd = {
> > > + .name = "objtrace",
> > > + .trigger_type = ETT_TRACE_OBJECT,
> > > + .flags = EVENT_CMD_FL_NEEDS_REC,
> > > + .func = event_object_trigger_callback,
> > > + .reg = register_object_trigger,
> > > + .unreg = unregister_object_trigger,
> > > + .get_trigger_ops = objecttrace_get_trigger_ops,
> > > + .set_filter = set_trigger_filter,
> > > +};
> > > +
> > > +__init int register_trigger_object_cmd(void)
> > > +{
> > > + int ret;
> > > +
> > > + ret = register_event_command(&trigger_object_cmd);
> > > + WARN_ON(ret < 0);
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +static int init_trace_object(void)
> > > +{
> > > + int ret;
> > > +
> > > + if (atomic_inc_return(&trace_object_ref) != 1) {
> > > + ret = 0;
> > > + goto out;
> > > + }
> > > +
> > > + ret = init_object_pool();
> > > + if (ret)
> > > + goto out;
> > > +
> > > + event_trace_file.tr = top_trace_array();
> > > + if (WARN_ON(!event_trace_file.tr)) {
> > > + ret = -1;
> > > + goto out;
> > > + }
> > > + ret = register_ftrace_function(&trace_ops);
> > > +out:
> > > + return ret;
> > > +}
> > > +
> > > +static int exit_trace_object(void)
> > > +{
> > > + int ret;
> > > +
> > > + if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
> > > + goto out;
> > > +
> > > + if (atomic_dec_return(&trace_object_ref) != 0) {
> > > + ret = 0;
> > > + goto out;
> > > + }
> > > +
> > > + ret = unregister_ftrace_function(&trace_ops);
> > > + if (ret) {
> > > + pr_err("can't unregister ftrace for trace object\n");
> > > + goto out;
> > > + }
> > > + free_object_pool();
> > > +out:
> > > + return ret;
> > > +}
> > > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> > > index c2ca40e8595b..76ca560af693 100644
> > > --- a/kernel/trace/trace_output.c
> > > +++ b/kernel/trace/trace_output.c
> > > @@ -1552,6 +1552,45 @@ static struct trace_event trace_func_repeats_event = {
> > > .funcs = &trace_func_repeats_funcs,
> > > };
> > >
> > > +/* TRACE_OBJECT */
> > > +static enum print_line_t trace_object_print(struct trace_iterator *iter, int flags,
> > > + struct trace_event *event)
> > > +{
> > > + struct trace_object_entry *field;
> > > + struct trace_seq *s = &iter->seq;
> > > +
> > > + trace_assign_type(field, iter->ent);
> > > + print_fn_trace(s, field->ip, field->parent_ip, flags);
> > > + trace_seq_printf(s, " object:0x%lx", field->object);
> > > + trace_seq_putc(s, '\n');
> > > +
> > > + return trace_handle_return(s);
> > > +}
> > > +
> > > +static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags,
> > > + struct trace_event *event)
> > > +{
> > > + struct trace_object_entry *field;
> > > +
> > > + trace_assign_type(field, iter->ent);
> > > +
> > > + trace_seq_printf(&iter->seq, "%lx %lx\n",
> > > + field->ip,
> > > + field->parent_ip);
> > > +
> > > + return trace_handle_return(&iter->seq);
> > > +}
> > > +
> > > +static struct trace_event_functions trace_object_funcs = {
> > > + .trace = trace_object_print,
> > > + .raw = trace_object_raw,
> > > +};
> > > +
> > > +static struct trace_event trace_object_event = {
> > > + .type = TRACE_OBJECT,
> > > + .funcs = &trace_object_funcs,
> > > +};
> > > +
> > > static struct trace_event *events[] __initdata = {
> > > &trace_fn_event,
> > > &trace_ctx_event,
> > > @@ -1566,6 +1605,7 @@ static struct trace_event *events[] __initdata = {
> > > &trace_timerlat_event,
> > > &trace_raw_data_event,
> > > &trace_func_repeats_event,
> > > + &trace_object_event,
> > > NULL
> > > };
> > >
> > > --
> > > 2.25.1
> > >
> >
> >
> > --
> > Masami Hiramatsu <[email protected]>
>
> Thanks,
> ---
> JeffXie

2021-11-21 17:15:47

by Jeff Xie

[permalink] [raw]
Subject: Re: [RFC][PATCH v5 2/4] trace/objtrace: get the value of the object

Hi Masami,

On Fri, Nov 19, 2021 at 10:01 PM Masami Hiramatsu <[email protected]> wrote:
>
> Hi Jeff,
>
> On Sat, 13 Nov 2021 20:06:30 +0800
> Jeff Xie <[email protected]> wrote:
>
> Please describe here what feature this patch adds.
> How to use, and new syntax, etc.
>
> BTW, the syntax for this value trace is a bit confusing.
>
> objtrace:add:OFFS(OBJ):TYPE[:COUNT]
>
> This trace "OBJ", but from the user point of view, this seems to trace
> "OFFS(OBJ)".
>
> I rather like make it optional and split from OBJ as below;
>
> objtrace:add:OBJ[,OFFS:TYPE][:COUNT]
>
> (Note that the part braced by [] is optional.)

Thank you for your suggestion, it does seem clearer, I will modify it like this.

> Thank you,
>
> > Signed-off-by: Jeff Xie <[email protected]>
> > ---
> > kernel/trace/trace_entries.h | 5 +-
> > kernel/trace/trace_object.c | 121 +++++++++++++++++++++++++++++------
> > kernel/trace/trace_output.c | 6 +-
> > 3 files changed, 107 insertions(+), 25 deletions(-)
> >
> > diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> > index bb120d9498a9..2407c45a568c 100644
> > --- a/kernel/trace/trace_entries.h
> > +++ b/kernel/trace/trace_entries.h
> > @@ -413,8 +413,9 @@ FTRACE_ENTRY(object, trace_object_entry,
> > __field( unsigned long, ip )
> > __field( unsigned long, parent_ip )
> > __field( unsigned long, object )
> > + __field( unsigned long, value )
> > ),
> >
> > - F_printk(" %ps <-- %ps object:%lx\n",
> > - (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> > + F_printk(" %ps <-- %ps object:%lx value:%lx\n", (void *)__entry->ip,
> > + (void *)__entry->parent_ip, __entry->object, __entry->value)
> > );
> > diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
> > index 69465c2ffb7e..14993f7d0e5a 100644
> > --- a/kernel/trace/trace_object.c
> > +++ b/kernel/trace/trace_object.c
> > @@ -11,14 +11,25 @@
> >
> > static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);
> > static struct trace_event_file event_trace_file;
> > -static const int max_args_num = 6;
> > static const int max_obj_pool = 10;
> > static atomic_t trace_object_ref;
> > static int exit_trace_object(void);
> > static int init_trace_object(void);
> >
> > +struct objtrace_trigger_data {
> > + struct ftrace_event_field *field;
> > + long offset;
> > + int type_size;
> > +};
> > +
> > +struct objtrace_fetch_type {
> > + char *name;
> > + int type_size;
> > +};
> > +
> > struct object_instance {
> > void *object;
> > + int obj_type_size;
> > struct freelist_node freelist;
> > };
> >
> > @@ -59,8 +70,7 @@ static bool object_empty(void)
> > return ret;
> > }
> >
> > -
> > -static void set_trace_object(void *obj)
> > +static void set_trace_object(void *obj, int type_size)
> > {
> > struct freelist_node *fn;
> > struct object_instance *ins;
> > @@ -79,6 +89,7 @@ static void set_trace_object(void *obj)
> >
> > ins = container_of(fn, struct object_instance, freelist);
> > ins->object = obj;
> > + ins->obj_type_size = type_size;
> >
> > freelist_add(&ins->freelist, &obj_pool->customer_freelist);
> > atomic_inc(&obj_pool->nobject);
> > @@ -135,7 +146,7 @@ static int init_object_pool(void)
> > }
> >
> > static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > - unsigned long object)
> > + unsigned long object, unsigned long value)
> > {
> >
> > struct trace_buffer *buffer;
> > @@ -152,6 +163,7 @@ static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > entry->ip = ip;
> > entry->parent_ip = parent_ip;
> > entry->object = object;
> > + entry->value = value;
> >
> > event_trigger_unlock_commit(&event_trace_file, buffer, event,
> > entry, pc);
> > @@ -161,10 +173,11 @@ static void
> > trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > struct ftrace_ops *op, struct ftrace_regs *fregs)
> > {
> > - struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> > - unsigned long obj;
> > + struct freelist_node *node;
> > + struct object_instance *inst;
> > + unsigned long val = 0;
> > long disabled;
> > - int cpu, n;
> > + int cpu;
> >
> > preempt_disable_notrace();
> >
> > @@ -177,10 +190,14 @@ trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > if (object_empty())
> > goto out;
> >
> > - for (n = 0; n < max_args_num; n++) {
> > - obj = regs_get_kernel_argument(pt_regs, n);
> > - if (object_exist((void *)obj))
> > - submit_trace_object(ip, parent_ip, obj);
> > + node = obj_pool->customer_freelist.head;
> > +
> > + while (node) {
> > + inst = container_of(node, struct object_instance, freelist);
> > + if (copy_from_kernel_nofault(&val, inst->object, inst->obj_type_size))
> > + goto out;
> > + submit_trace_object(ip, parent_ip, (unsigned long)inst->object, val);
> > + node = node->next;
> > }
> >
> > out:
> > @@ -198,12 +215,14 @@ trace_object_trigger(struct event_trigger_data *data,
> > struct trace_buffer *buffer, void *rec,
> > struct ring_buffer_event *event)
> > {
> > + struct objtrace_trigger_data *obj_data = data->private_data;
> > + struct ftrace_event_field *field;
> > + void *obj, *val = NULL;
> >
> > - struct ftrace_event_field *field = data->private_data;
> > - void *obj = NULL;
> > -
> > - memcpy(&obj, rec + field->offset, sizeof(obj));
> > - set_trace_object(obj);
> > + field = obj_data->field;
> > + memcpy(&val, rec + field->offset, sizeof(val));
> > + obj = val + obj_data->offset;
> > + set_trace_object(obj, obj_data->type_size);
> > }
> >
> > static void
> > @@ -350,6 +369,22 @@ static void unregister_object_trigger(char *glob, struct event_trigger_ops *ops,
> > }
> > }
> >
> > +static const struct objtrace_fetch_type objtrace_fetch_types[] = {
> > + {"u8", 1},
> > + {"s8", 1},
> > + {"x8", 1},
> > + {"u16", 2},
> > + {"s16", 2},
> > + {"x16", 2},
> > + {"u32", 4},
> > + {"s32", 4},
> > + {"x32", 4},
> > + {"u64", 8},
> > + {"s64", 8},
> > + {"x64", 8},
> > + {}
> > +};
> > +
> > static int
> > event_object_trigger_callback(struct event_command *cmd_ops,
> > struct trace_event_file *file,
> > @@ -357,13 +392,15 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > {
> > struct event_trigger_data *trigger_data;
> > struct event_trigger_ops *trigger_ops;
> > + struct objtrace_trigger_data *obj_data;
> > struct trace_event_call *call;
> > struct ftrace_event_field *field;
> > char *objtrace_cmd;
> > + long offset = 0;
> > char *trigger = NULL;
> > - char *arg;
> > + char *arg, *type, *tr, *tr_end;
> > char *number;
> > - int ret;
> > + int ret, i, type_size = 0;
> >
> > ret = -EINVAL;
> > if (!param)
> > @@ -386,6 +423,38 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > arg = strsep(&trigger, ":");
> > if (!arg)
> > goto out;
> > +
> > + tr = strchr(arg, '(');
> > + /* now force to get the value of the val. */
> > + if (!tr)
> > + goto out;
> > + tr_end = strchr(tr, ')');
> > + if (!tr_end)
> > + goto out;
> > + *tr++ = '\0';
> > + *tr_end = '\0';
> > + ret = kstrtol(arg, 0, &offset);
> > + if (ret)
> > + goto out;
> > + arg = tr;
> > + ret = -EINVAL;
> > + if (!trigger)
> > + goto out;
> > +
> > + type = strsep(&trigger, ":");
> > + if (!type)
> > + goto out;
> > + for (i = 0; objtrace_fetch_types[i].name; i++) {
> > + if (strcmp(objtrace_fetch_types[i].name, type) == 0) {
> > + type_size = objtrace_fetch_types[i].type_size;
> > + break;
> > + }
> > + }
> > +
> > + if (type_size == 0)
> > + goto out;
> > +
> > +
> > call = file->event_call;
> > field = trace_find_event_field(call, arg);
> > if (!field)
> > @@ -394,19 +463,30 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
> >
> > ret = -ENOMEM;
> > + obj_data = kzalloc(sizeof(*obj_data), GFP_KERNEL);
> > + if (!obj_data)
> > + goto out;
> > +
> > + obj_data->field = field;
> > + obj_data->offset = offset;
> > + obj_data->type_size = type_size;
> > +
> > trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
> > - if (!trigger_data)
> > + if (!trigger_data) {
> > + kfree(obj_data);
> > goto out;
> > + }
> >
> > trigger_data->count = -1;
> > trigger_data->ops = trigger_ops;
> > trigger_data->cmd_ops = cmd_ops;
> > - trigger_data->private_data = field;
> > + trigger_data->private_data = obj_data;
> > INIT_LIST_HEAD(&trigger_data->list);
> > INIT_LIST_HEAD(&trigger_data->named_list);
> >
> > if (glob[0] == '!') {
> > cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
> > + kfree(obj_data);
> > kfree(trigger_data);
> > ret = 0;
> > goto out;
> > @@ -461,6 +541,7 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > out_free:
> > if (cmd_ops->set_filter)
> > cmd_ops->set_filter(NULL, trigger_data, NULL);
> > + kfree(obj_data);
> > kfree(trigger_data);
> > goto out;
> > }
> > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> > index 76ca560af693..c8c427c23127 100644
> > --- a/kernel/trace/trace_output.c
> > +++ b/kernel/trace/trace_output.c
> > @@ -1562,6 +1562,7 @@ static enum print_line_t trace_object_print(struct trace_iterator *iter, int fla
> > trace_assign_type(field, iter->ent);
> > print_fn_trace(s, field->ip, field->parent_ip, flags);
> > trace_seq_printf(s, " object:0x%lx", field->object);
> > + trace_seq_printf(s, " value:0x%lx", field->value);
> > trace_seq_putc(s, '\n');
> >
> > return trace_handle_return(s);
> > @@ -1574,9 +1575,8 @@ static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags
> >
> > trace_assign_type(field, iter->ent);
> >
> > - trace_seq_printf(&iter->seq, "%lx %lx\n",
> > - field->ip,
> > - field->parent_ip);
> > + trace_seq_printf(&iter->seq, "%lx %lx %lx %lx\n", field->ip,
> > + field->parent_ip, field->object, field->value);
> >
> > return trace_handle_return(&iter->seq);
> > }
> > --
> > 2.25.1
> >
>
>
> --
> Masami Hiramatsu <[email protected]>

Thanks,
---
JeffXie

2021-11-21 17:16:37

by Jeff Xie

[permalink] [raw]
Subject: Re: [RFC][PATCH v5 3/4] trace/objtrace: Add testcases for objtrace trigger parsing

Hi Masami,

On Fri, Nov 19, 2021 at 10:09 PM Masami Hiramatsu <[email protected]> wrote:
>
> On Sat, 13 Nov 2021 20:06:31 +0800
> Jeff Xie <[email protected]> wrote:
>
> Please add a patch description here.
>
> > Signed-off-by: Jeff Xie <[email protected]>
> > ---
> > .../ftrace/test.d/trigger/trigger-objtrace.tc | 41 +++++++++++++++++++
> > 1 file changed, 41 insertions(+)
> > create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
> >
> > diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
> > new file mode 100644
> > index 000000000000..4efcd7eee694
> > --- /dev/null
> > +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
> > @@ -0,0 +1,41 @@
> > +#!/bin/sh
> > +# SPDX-License-Identifier: GPL-2.0
> > +# description: event trigger - test objtrace-trigger
> > +# requires: kprobe_events
>
> Hmm, didn't you add something for this in <tracefs>/README?

Sorry, I ignored it.

> Instead of checking trigger file, I recommend to update
>
> There is a text data in the kernel/trace/trace.c;
> static const char readme_msg[] =
> ...
> "\t trigger: traceon, traceoff\n"
> "\t enable_event:<system>:<event>\n"
> "\t disable_event:<system>:<event>\n"
> #ifdef CONFIG_HIST_TRIGGERS
> "\t enable_hist:<system>:<event>\n"
> "\t disable_hist:<system>:<event>\n"
> #endif
>
> You can add somewhere near there as
>
> #ifdef CONFIG_TRACE_OBJECT
> "\t objtrace:add:..." // fill syntax.
> #endif
>
> And then, you can write the pattern in the requires line, like
>
> # requires: kprobe_events "objtrace":README
>
> Thank you,

Thanks, I will add them to <tracefs>/README, and modify the
trigger-objtrace.tc file.

> > +
> > +fail() { #msg
> > + echo $1
> > + exit_fail
> > +}
> > +
> > +echo 'p bio_add_page arg1=$arg1' > kprobe_events
> > +
> > +FEATURE=`grep objtrace events/kprobes/p_bio_add_page_0/trigger`
> > +if [ -z "$FEATURE" ]; then
> > + echo "objtrace trigger is not supported"
> > + exit_unsupported
> > +fi
>
>
> > +
> > +echo "Test objtrace trigger"
> > +echo 'objtrace:add:0x28(arg1):u32:1 if comm == "cat"' > \
> > + events/kprobes/p_bio_add_page_0/trigger
> > +if [ -z $? ]; then
> > + fail "objtrace trigger syntax error"
> > +fi
> > +
> > +echo "Test objtrace semantic errors"
> > +
> > +# Being lack of type size
> > +! echo 'objtrace:add:0x28(arg1):1' > events/kprobes/p_bio_add_page_0/trigger
> > +# Being lack of objtrace command
> > +! echo 'objtrace:0x28(arg1):u32:1' > events/kprobes/p_bio_add_page_0/trigger
> > +# Bad parameter name
> > +! echo 'objtrace:add:argx:u32:1' > events/kprobes/p_bio_add_page_0/trigger
> > +
> > +echo "reset objtrace trigger"
> > +
> > +echo '!objtrace:add:0x28(arg1):u32' > \
> > + events/kprobes/p_bio_add_page_0/trigger
> > +echo '-:p_bio_add_page_0' >> ./kprobe_events
> > +
> > +exit 0
> > --
> > 2.25.1
> >
>
>
> --
> Masami Hiramatsu <[email protected]>
Thanks,
---
JeffXie

2021-11-26 14:32:38

by Jeff Xie

[permalink] [raw]
Subject: Re: [RFC][PATCH v5 2/4] trace/objtrace: get the value of the object

Hi Masami,

On Mon, Nov 22, 2021 at 1:15 AM Jeff Xie <[email protected]> wrote:
>
> Hi Masami,
>
> On Fri, Nov 19, 2021 at 10:01 PM Masami Hiramatsu <[email protected]> wrote:
> >
> > Hi Jeff,
> >
> > On Sat, 13 Nov 2021 20:06:30 +0800
> > Jeff Xie <[email protected]> wrote:
> >
> > Please describe here what feature this patch adds.
> > How to use, and new syntax, etc.
> >
> > BTW, the syntax for this value trace is a bit confusing.
> >
> > objtrace:add:OFFS(OBJ):TYPE[:COUNT]
> >
> > This trace "OBJ", but from the user point of view, this seems to trace
> > "OFFS(OBJ)".
> >
> > I rather like make it optional and split from OBJ as below;
> >
> > objtrace:add:OBJ[,OFFS:TYPE][:COUNT]

I have been thinking about this place for a long time, would it be
better if it like below?
objtrace:add:OBJ[,OFFS][:TYPE][:COUNT]

If the user does not specify the type, the default type will be u64.

> >
> > (Note that the part braced by [] is optional.)
>
> Thank you for your suggestion, it does seem clearer, I will modify it like this.
>
> > Thank you,
> >
> > > Signed-off-by: Jeff Xie <[email protected]>
> > > ---
> > > kernel/trace/trace_entries.h | 5 +-
> > > kernel/trace/trace_object.c | 121 +++++++++++++++++++++++++++++------
> > > kernel/trace/trace_output.c | 6 +-
> > > 3 files changed, 107 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> > > index bb120d9498a9..2407c45a568c 100644
> > > --- a/kernel/trace/trace_entries.h
> > > +++ b/kernel/trace/trace_entries.h
> > > @@ -413,8 +413,9 @@ FTRACE_ENTRY(object, trace_object_entry,
> > > __field( unsigned long, ip )
> > > __field( unsigned long, parent_ip )
> > > __field( unsigned long, object )
> > > + __field( unsigned long, value )
> > > ),
> > >
> > > - F_printk(" %ps <-- %ps object:%lx\n",
> > > - (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> > > + F_printk(" %ps <-- %ps object:%lx value:%lx\n", (void *)__entry->ip,
> > > + (void *)__entry->parent_ip, __entry->object, __entry->value)
> > > );
> > > diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
> > > index 69465c2ffb7e..14993f7d0e5a 100644
> > > --- a/kernel/trace/trace_object.c
> > > +++ b/kernel/trace/trace_object.c
> > > @@ -11,14 +11,25 @@
> > >
> > > static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);
> > > static struct trace_event_file event_trace_file;
> > > -static const int max_args_num = 6;
> > > static const int max_obj_pool = 10;
> > > static atomic_t trace_object_ref;
> > > static int exit_trace_object(void);
> > > static int init_trace_object(void);
> > >
> > > +struct objtrace_trigger_data {
> > > + struct ftrace_event_field *field;
> > > + long offset;
> > > + int type_size;
> > > +};
> > > +
> > > +struct objtrace_fetch_type {
> > > + char *name;
> > > + int type_size;
> > > +};
> > > +
> > > struct object_instance {
> > > void *object;
> > > + int obj_type_size;
> > > struct freelist_node freelist;
> > > };
> > >
> > > @@ -59,8 +70,7 @@ static bool object_empty(void)
> > > return ret;
> > > }
> > >
> > > -
> > > -static void set_trace_object(void *obj)
> > > +static void set_trace_object(void *obj, int type_size)
> > > {
> > > struct freelist_node *fn;
> > > struct object_instance *ins;
> > > @@ -79,6 +89,7 @@ static void set_trace_object(void *obj)
> > >
> > > ins = container_of(fn, struct object_instance, freelist);
> > > ins->object = obj;
> > > + ins->obj_type_size = type_size;
> > >
> > > freelist_add(&ins->freelist, &obj_pool->customer_freelist);
> > > atomic_inc(&obj_pool->nobject);
> > > @@ -135,7 +146,7 @@ static int init_object_pool(void)
> > > }
> > >
> > > static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > > - unsigned long object)
> > > + unsigned long object, unsigned long value)
> > > {
> > >
> > > struct trace_buffer *buffer;
> > > @@ -152,6 +163,7 @@ static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > > entry->ip = ip;
> > > entry->parent_ip = parent_ip;
> > > entry->object = object;
> > > + entry->value = value;
> > >
> > > event_trigger_unlock_commit(&event_trace_file, buffer, event,
> > > entry, pc);
> > > @@ -161,10 +173,11 @@ static void
> > > trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > > struct ftrace_ops *op, struct ftrace_regs *fregs)
> > > {
> > > - struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> > > - unsigned long obj;
> > > + struct freelist_node *node;
> > > + struct object_instance *inst;
> > > + unsigned long val = 0;
> > > long disabled;
> > > - int cpu, n;
> > > + int cpu;
> > >
> > > preempt_disable_notrace();
> > >
> > > @@ -177,10 +190,14 @@ trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > > if (object_empty())
> > > goto out;
> > >
> > > - for (n = 0; n < max_args_num; n++) {
> > > - obj = regs_get_kernel_argument(pt_regs, n);
> > > - if (object_exist((void *)obj))
> > > - submit_trace_object(ip, parent_ip, obj);
> > > + node = obj_pool->customer_freelist.head;
> > > +
> > > + while (node) {
> > > + inst = container_of(node, struct object_instance, freelist);
> > > + if (copy_from_kernel_nofault(&val, inst->object, inst->obj_type_size))
> > > + goto out;
> > > + submit_trace_object(ip, parent_ip, (unsigned long)inst->object, val);
> > > + node = node->next;
> > > }
> > >
> > > out:
> > > @@ -198,12 +215,14 @@ trace_object_trigger(struct event_trigger_data *data,
> > > struct trace_buffer *buffer, void *rec,
> > > struct ring_buffer_event *event)
> > > {
> > > + struct objtrace_trigger_data *obj_data = data->private_data;
> > > + struct ftrace_event_field *field;
> > > + void *obj, *val = NULL;
> > >
> > > - struct ftrace_event_field *field = data->private_data;
> > > - void *obj = NULL;
> > > -
> > > - memcpy(&obj, rec + field->offset, sizeof(obj));
> > > - set_trace_object(obj);
> > > + field = obj_data->field;
> > > + memcpy(&val, rec + field->offset, sizeof(val));
> > > + obj = val + obj_data->offset;
> > > + set_trace_object(obj, obj_data->type_size);
> > > }
> > >
> > > static void
> > > @@ -350,6 +369,22 @@ static void unregister_object_trigger(char *glob, struct event_trigger_ops *ops,
> > > }
> > > }
> > >
> > > +static const struct objtrace_fetch_type objtrace_fetch_types[] = {
> > > + {"u8", 1},
> > > + {"s8", 1},
> > > + {"x8", 1},
> > > + {"u16", 2},
> > > + {"s16", 2},
> > > + {"x16", 2},
> > > + {"u32", 4},
> > > + {"s32", 4},
> > > + {"x32", 4},
> > > + {"u64", 8},
> > > + {"s64", 8},
> > > + {"x64", 8},
> > > + {}
> > > +};
> > > +
> > > static int
> > > event_object_trigger_callback(struct event_command *cmd_ops,
> > > struct trace_event_file *file,
> > > @@ -357,13 +392,15 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > > {
> > > struct event_trigger_data *trigger_data;
> > > struct event_trigger_ops *trigger_ops;
> > > + struct objtrace_trigger_data *obj_data;
> > > struct trace_event_call *call;
> > > struct ftrace_event_field *field;
> > > char *objtrace_cmd;
> > > + long offset = 0;
> > > char *trigger = NULL;
> > > - char *arg;
> > > + char *arg, *type, *tr, *tr_end;
> > > char *number;
> > > - int ret;
> > > + int ret, i, type_size = 0;
> > >
> > > ret = -EINVAL;
> > > if (!param)
> > > @@ -386,6 +423,38 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > > arg = strsep(&trigger, ":");
> > > if (!arg)
> > > goto out;
> > > +
> > > + tr = strchr(arg, '(');
> > > + /* now force to get the value of the val. */
> > > + if (!tr)
> > > + goto out;
> > > + tr_end = strchr(tr, ')');
> > > + if (!tr_end)
> > > + goto out;
> > > + *tr++ = '\0';
> > > + *tr_end = '\0';
> > > + ret = kstrtol(arg, 0, &offset);
> > > + if (ret)
> > > + goto out;
> > > + arg = tr;
> > > + ret = -EINVAL;
> > > + if (!trigger)
> > > + goto out;
> > > +
> > > + type = strsep(&trigger, ":");
> > > + if (!type)
> > > + goto out;
> > > + for (i = 0; objtrace_fetch_types[i].name; i++) {
> > > + if (strcmp(objtrace_fetch_types[i].name, type) == 0) {
> > > + type_size = objtrace_fetch_types[i].type_size;
> > > + break;
> > > + }
> > > + }
> > > +
> > > + if (type_size == 0)
> > > + goto out;
> > > +
> > > +
> > > call = file->event_call;
> > > field = trace_find_event_field(call, arg);
> > > if (!field)
> > > @@ -394,19 +463,30 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > > trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
> > >
> > > ret = -ENOMEM;
> > > + obj_data = kzalloc(sizeof(*obj_data), GFP_KERNEL);
> > > + if (!obj_data)
> > > + goto out;
> > > +
> > > + obj_data->field = field;
> > > + obj_data->offset = offset;
> > > + obj_data->type_size = type_size;
> > > +
> > > trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
> > > - if (!trigger_data)
> > > + if (!trigger_data) {
> > > + kfree(obj_data);
> > > goto out;
> > > + }
> > >
> > > trigger_data->count = -1;
> > > trigger_data->ops = trigger_ops;
> > > trigger_data->cmd_ops = cmd_ops;
> > > - trigger_data->private_data = field;
> > > + trigger_data->private_data = obj_data;
> > > INIT_LIST_HEAD(&trigger_data->list);
> > > INIT_LIST_HEAD(&trigger_data->named_list);
> > >
> > > if (glob[0] == '!') {
> > > cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
> > > + kfree(obj_data);
> > > kfree(trigger_data);
> > > ret = 0;
> > > goto out;
> > > @@ -461,6 +541,7 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > > out_free:
> > > if (cmd_ops->set_filter)
> > > cmd_ops->set_filter(NULL, trigger_data, NULL);
> > > + kfree(obj_data);
> > > kfree(trigger_data);
> > > goto out;
> > > }
> > > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> > > index 76ca560af693..c8c427c23127 100644
> > > --- a/kernel/trace/trace_output.c
> > > +++ b/kernel/trace/trace_output.c
> > > @@ -1562,6 +1562,7 @@ static enum print_line_t trace_object_print(struct trace_iterator *iter, int fla
> > > trace_assign_type(field, iter->ent);
> > > print_fn_trace(s, field->ip, field->parent_ip, flags);
> > > trace_seq_printf(s, " object:0x%lx", field->object);
> > > + trace_seq_printf(s, " value:0x%lx", field->value);
> > > trace_seq_putc(s, '\n');
> > >
> > > return trace_handle_return(s);
> > > @@ -1574,9 +1575,8 @@ static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags
> > >
> > > trace_assign_type(field, iter->ent);
> > >
> > > - trace_seq_printf(&iter->seq, "%lx %lx\n",
> > > - field->ip,
> > > - field->parent_ip);
> > > + trace_seq_printf(&iter->seq, "%lx %lx %lx %lx\n", field->ip,
> > > + field->parent_ip, field->object, field->value);
> > >
> > > return trace_handle_return(&iter->seq);
> > > }
> > > --
> > > 2.25.1
> > >
> >
> >
> > --
> > Masami Hiramatsu <[email protected]>
>
> Thanks,
> ---
> JeffXie

---
JeffXie

2021-11-26 15:28:46

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC][PATCH v5 2/4] trace/objtrace: get the value of the object

Hello Jeff,

On Fri, 26 Nov 2021 21:47:24 +0800
Jeff Xie <[email protected]> wrote:

> Hi Masami,
>
> On Mon, Nov 22, 2021 at 1:15 AM Jeff Xie <[email protected]> wrote:
> >
> > Hi Masami,
> >
> > On Fri, Nov 19, 2021 at 10:01 PM Masami Hiramatsu <[email protected]> wrote:
> > >
> > > Hi Jeff,
> > >
> > > On Sat, 13 Nov 2021 20:06:30 +0800
> > > Jeff Xie <[email protected]> wrote:
> > >
> > > Please describe here what feature this patch adds.
> > > How to use, and new syntax, etc.
> > >
> > > BTW, the syntax for this value trace is a bit confusing.
> > >
> > > objtrace:add:OFFS(OBJ):TYPE[:COUNT]
> > >
> > > This trace "OBJ", but from the user point of view, this seems to trace
> > > "OFFS(OBJ)".
> > >
> > > I rather like make it optional and split from OBJ as below;
> > >
> > > objtrace:add:OBJ[,OFFS:TYPE][:COUNT]
>
> I have been thinking about this place for a long time, would it be
> better if it like below?
> objtrace:add:OBJ[,OFFS][:TYPE][:COUNT]
>
> If the user does not specify the type, the default type will be u64.

u64 or ulong? In kprobe event, I chose ulong (u32 for 32bit arch and
u64 for 64bit arch). If the object or the data at offs is a pointer,
it is better to use appropriate type for each arch.

Thank you,



>
> > >
> > > (Note that the part braced by [] is optional.)
> >
> > Thank you for your suggestion, it does seem clearer, I will modify it like this.
> >
> > > Thank you,
> > >
> > > > Signed-off-by: Jeff Xie <[email protected]>
> > > > ---
> > > > kernel/trace/trace_entries.h | 5 +-
> > > > kernel/trace/trace_object.c | 121 +++++++++++++++++++++++++++++------
> > > > kernel/trace/trace_output.c | 6 +-
> > > > 3 files changed, 107 insertions(+), 25 deletions(-)
> > > >
> > > > diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> > > > index bb120d9498a9..2407c45a568c 100644
> > > > --- a/kernel/trace/trace_entries.h
> > > > +++ b/kernel/trace/trace_entries.h
> > > > @@ -413,8 +413,9 @@ FTRACE_ENTRY(object, trace_object_entry,
> > > > __field( unsigned long, ip )
> > > > __field( unsigned long, parent_ip )
> > > > __field( unsigned long, object )
> > > > + __field( unsigned long, value )
> > > > ),
> > > >
> > > > - F_printk(" %ps <-- %ps object:%lx\n",
> > > > - (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> > > > + F_printk(" %ps <-- %ps object:%lx value:%lx\n", (void *)__entry->ip,
> > > > + (void *)__entry->parent_ip, __entry->object, __entry->value)
> > > > );
> > > > diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
> > > > index 69465c2ffb7e..14993f7d0e5a 100644
> > > > --- a/kernel/trace/trace_object.c
> > > > +++ b/kernel/trace/trace_object.c
> > > > @@ -11,14 +11,25 @@
> > > >
> > > > static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);
> > > > static struct trace_event_file event_trace_file;
> > > > -static const int max_args_num = 6;
> > > > static const int max_obj_pool = 10;
> > > > static atomic_t trace_object_ref;
> > > > static int exit_trace_object(void);
> > > > static int init_trace_object(void);
> > > >
> > > > +struct objtrace_trigger_data {
> > > > + struct ftrace_event_field *field;
> > > > + long offset;
> > > > + int type_size;
> > > > +};
> > > > +
> > > > +struct objtrace_fetch_type {
> > > > + char *name;
> > > > + int type_size;
> > > > +};
> > > > +
> > > > struct object_instance {
> > > > void *object;
> > > > + int obj_type_size;
> > > > struct freelist_node freelist;
> > > > };
> > > >
> > > > @@ -59,8 +70,7 @@ static bool object_empty(void)
> > > > return ret;
> > > > }
> > > >
> > > > -
> > > > -static void set_trace_object(void *obj)
> > > > +static void set_trace_object(void *obj, int type_size)
> > > > {
> > > > struct freelist_node *fn;
> > > > struct object_instance *ins;
> > > > @@ -79,6 +89,7 @@ static void set_trace_object(void *obj)
> > > >
> > > > ins = container_of(fn, struct object_instance, freelist);
> > > > ins->object = obj;
> > > > + ins->obj_type_size = type_size;
> > > >
> > > > freelist_add(&ins->freelist, &obj_pool->customer_freelist);
> > > > atomic_inc(&obj_pool->nobject);
> > > > @@ -135,7 +146,7 @@ static int init_object_pool(void)
> > > > }
> > > >
> > > > static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > > > - unsigned long object)
> > > > + unsigned long object, unsigned long value)
> > > > {
> > > >
> > > > struct trace_buffer *buffer;
> > > > @@ -152,6 +163,7 @@ static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > > > entry->ip = ip;
> > > > entry->parent_ip = parent_ip;
> > > > entry->object = object;
> > > > + entry->value = value;
> > > >
> > > > event_trigger_unlock_commit(&event_trace_file, buffer, event,
> > > > entry, pc);
> > > > @@ -161,10 +173,11 @@ static void
> > > > trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > > > struct ftrace_ops *op, struct ftrace_regs *fregs)
> > > > {
> > > > - struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> > > > - unsigned long obj;
> > > > + struct freelist_node *node;
> > > > + struct object_instance *inst;
> > > > + unsigned long val = 0;
> > > > long disabled;
> > > > - int cpu, n;
> > > > + int cpu;
> > > >
> > > > preempt_disable_notrace();
> > > >
> > > > @@ -177,10 +190,14 @@ trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > > > if (object_empty())
> > > > goto out;
> > > >
> > > > - for (n = 0; n < max_args_num; n++) {
> > > > - obj = regs_get_kernel_argument(pt_regs, n);
> > > > - if (object_exist((void *)obj))
> > > > - submit_trace_object(ip, parent_ip, obj);
> > > > + node = obj_pool->customer_freelist.head;
> > > > +
> > > > + while (node) {
> > > > + inst = container_of(node, struct object_instance, freelist);
> > > > + if (copy_from_kernel_nofault(&val, inst->object, inst->obj_type_size))
> > > > + goto out;
> > > > + submit_trace_object(ip, parent_ip, (unsigned long)inst->object, val);
> > > > + node = node->next;
> > > > }
> > > >
> > > > out:
> > > > @@ -198,12 +215,14 @@ trace_object_trigger(struct event_trigger_data *data,
> > > > struct trace_buffer *buffer, void *rec,
> > > > struct ring_buffer_event *event)
> > > > {
> > > > + struct objtrace_trigger_data *obj_data = data->private_data;
> > > > + struct ftrace_event_field *field;
> > > > + void *obj, *val = NULL;
> > > >
> > > > - struct ftrace_event_field *field = data->private_data;
> > > > - void *obj = NULL;
> > > > -
> > > > - memcpy(&obj, rec + field->offset, sizeof(obj));
> > > > - set_trace_object(obj);
> > > > + field = obj_data->field;
> > > > + memcpy(&val, rec + field->offset, sizeof(val));
> > > > + obj = val + obj_data->offset;
> > > > + set_trace_object(obj, obj_data->type_size);
> > > > }
> > > >
> > > > static void
> > > > @@ -350,6 +369,22 @@ static void unregister_object_trigger(char *glob, struct event_trigger_ops *ops,
> > > > }
> > > > }
> > > >
> > > > +static const struct objtrace_fetch_type objtrace_fetch_types[] = {
> > > > + {"u8", 1},
> > > > + {"s8", 1},
> > > > + {"x8", 1},
> > > > + {"u16", 2},
> > > > + {"s16", 2},
> > > > + {"x16", 2},
> > > > + {"u32", 4},
> > > > + {"s32", 4},
> > > > + {"x32", 4},
> > > > + {"u64", 8},
> > > > + {"s64", 8},
> > > > + {"x64", 8},
> > > > + {}
> > > > +};
> > > > +
> > > > static int
> > > > event_object_trigger_callback(struct event_command *cmd_ops,
> > > > struct trace_event_file *file,
> > > > @@ -357,13 +392,15 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > > > {
> > > > struct event_trigger_data *trigger_data;
> > > > struct event_trigger_ops *trigger_ops;
> > > > + struct objtrace_trigger_data *obj_data;
> > > > struct trace_event_call *call;
> > > > struct ftrace_event_field *field;
> > > > char *objtrace_cmd;
> > > > + long offset = 0;
> > > > char *trigger = NULL;
> > > > - char *arg;
> > > > + char *arg, *type, *tr, *tr_end;
> > > > char *number;
> > > > - int ret;
> > > > + int ret, i, type_size = 0;
> > > >
> > > > ret = -EINVAL;
> > > > if (!param)
> > > > @@ -386,6 +423,38 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > > > arg = strsep(&trigger, ":");
> > > > if (!arg)
> > > > goto out;
> > > > +
> > > > + tr = strchr(arg, '(');
> > > > + /* now force to get the value of the val. */
> > > > + if (!tr)
> > > > + goto out;
> > > > + tr_end = strchr(tr, ')');
> > > > + if (!tr_end)
> > > > + goto out;
> > > > + *tr++ = '\0';
> > > > + *tr_end = '\0';
> > > > + ret = kstrtol(arg, 0, &offset);
> > > > + if (ret)
> > > > + goto out;
> > > > + arg = tr;
> > > > + ret = -EINVAL;
> > > > + if (!trigger)
> > > > + goto out;
> > > > +
> > > > + type = strsep(&trigger, ":");
> > > > + if (!type)
> > > > + goto out;
> > > > + for (i = 0; objtrace_fetch_types[i].name; i++) {
> > > > + if (strcmp(objtrace_fetch_types[i].name, type) == 0) {
> > > > + type_size = objtrace_fetch_types[i].type_size;
> > > > + break;
> > > > + }
> > > > + }
> > > > +
> > > > + if (type_size == 0)
> > > > + goto out;
> > > > +
> > > > +
> > > > call = file->event_call;
> > > > field = trace_find_event_field(call, arg);
> > > > if (!field)
> > > > @@ -394,19 +463,30 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > > > trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
> > > >
> > > > ret = -ENOMEM;
> > > > + obj_data = kzalloc(sizeof(*obj_data), GFP_KERNEL);
> > > > + if (!obj_data)
> > > > + goto out;
> > > > +
> > > > + obj_data->field = field;
> > > > + obj_data->offset = offset;
> > > > + obj_data->type_size = type_size;
> > > > +
> > > > trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
> > > > - if (!trigger_data)
> > > > + if (!trigger_data) {
> > > > + kfree(obj_data);
> > > > goto out;
> > > > + }
> > > >
> > > > trigger_data->count = -1;
> > > > trigger_data->ops = trigger_ops;
> > > > trigger_data->cmd_ops = cmd_ops;
> > > > - trigger_data->private_data = field;
> > > > + trigger_data->private_data = obj_data;
> > > > INIT_LIST_HEAD(&trigger_data->list);
> > > > INIT_LIST_HEAD(&trigger_data->named_list);
> > > >
> > > > if (glob[0] == '!') {
> > > > cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
> > > > + kfree(obj_data);
> > > > kfree(trigger_data);
> > > > ret = 0;
> > > > goto out;
> > > > @@ -461,6 +541,7 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > > > out_free:
> > > > if (cmd_ops->set_filter)
> > > > cmd_ops->set_filter(NULL, trigger_data, NULL);
> > > > + kfree(obj_data);
> > > > kfree(trigger_data);
> > > > goto out;
> > > > }
> > > > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> > > > index 76ca560af693..c8c427c23127 100644
> > > > --- a/kernel/trace/trace_output.c
> > > > +++ b/kernel/trace/trace_output.c
> > > > @@ -1562,6 +1562,7 @@ static enum print_line_t trace_object_print(struct trace_iterator *iter, int fla
> > > > trace_assign_type(field, iter->ent);
> > > > print_fn_trace(s, field->ip, field->parent_ip, flags);
> > > > trace_seq_printf(s, " object:0x%lx", field->object);
> > > > + trace_seq_printf(s, " value:0x%lx", field->value);
> > > > trace_seq_putc(s, '\n');
> > > >
> > > > return trace_handle_return(s);
> > > > @@ -1574,9 +1575,8 @@ static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags
> > > >
> > > > trace_assign_type(field, iter->ent);
> > > >
> > > > - trace_seq_printf(&iter->seq, "%lx %lx\n",
> > > > - field->ip,
> > > > - field->parent_ip);
> > > > + trace_seq_printf(&iter->seq, "%lx %lx %lx %lx\n", field->ip,
> > > > + field->parent_ip, field->object, field->value);
> > > >
> > > > return trace_handle_return(&iter->seq);
> > > > }
> > > > --
> > > > 2.25.1
> > > >
> > >
> > >
> > > --
> > > Masami Hiramatsu <[email protected]>
> >
> > Thanks,
> > ---
> > JeffXie
>
> ---
> JeffXie


--
Masami Hiramatsu <[email protected]>

2021-11-26 15:41:34

by Jeff Xie

[permalink] [raw]
Subject: Re: [RFC][PATCH v5 2/4] trace/objtrace: get the value of the object

Hi Masami,

On Fri, Nov 26, 2021 at 11:19 PM Masami Hiramatsu <[email protected]> wrote:
>
> Hello Jeff,
>
> On Fri, 26 Nov 2021 21:47:24 +0800
> Jeff Xie <[email protected]> wrote:
>
> > Hi Masami,
> >
> > On Mon, Nov 22, 2021 at 1:15 AM Jeff Xie <[email protected]> wrote:
> > >
> > > Hi Masami,
> > >
> > > On Fri, Nov 19, 2021 at 10:01 PM Masami Hiramatsu <[email protected]> wrote:
> > > >
> > > > Hi Jeff,
> > > >
> > > > On Sat, 13 Nov 2021 20:06:30 +0800
> > > > Jeff Xie <[email protected]> wrote:
> > > >
> > > > Please describe here what feature this patch adds.
> > > > How to use, and new syntax, etc.
> > > >
> > > > BTW, the syntax for this value trace is a bit confusing.
> > > >
> > > > objtrace:add:OFFS(OBJ):TYPE[:COUNT]
> > > >
> > > > This trace "OBJ", but from the user point of view, this seems to trace
> > > > "OFFS(OBJ)".
> > > >
> > > > I rather like make it optional and split from OBJ as below;
> > > >
> > > > objtrace:add:OBJ[,OFFS:TYPE][:COUNT]
> >
> > I have been thinking about this place for a long time, would it be
> > better if it like below?
> > objtrace:add:OBJ[,OFFS][:TYPE][:COUNT]
> >
> > If the user does not specify the type, the default type will be u64.
>
> u64 or ulong? In kprobe event, I chose ulong (u32 for 32bit arch and
> u64 for 64bit arch). If the object or the data at offs is a pointer,
> it is better to use appropriate type for each arch.

It is ulong, I will never forget a 32-bit architecture ;-)

Thanks.

> Thank you,
>
>
>
> >
> > > >
> > > > (Note that the part braced by [] is optional.)
> > >
> > > Thank you for your suggestion, it does seem clearer, I will modify it like this.
> > >
> > > > Thank you,
> > > >
> > > > > Signed-off-by: Jeff Xie <[email protected]>
> > > > > ---
> > > > > kernel/trace/trace_entries.h | 5 +-
> > > > > kernel/trace/trace_object.c | 121 +++++++++++++++++++++++++++++------
> > > > > kernel/trace/trace_output.c | 6 +-
> > > > > 3 files changed, 107 insertions(+), 25 deletions(-)
> > > > >
> > > > > diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> > > > > index bb120d9498a9..2407c45a568c 100644
> > > > > --- a/kernel/trace/trace_entries.h
> > > > > +++ b/kernel/trace/trace_entries.h
> > > > > @@ -413,8 +413,9 @@ FTRACE_ENTRY(object, trace_object_entry,
> > > > > __field( unsigned long, ip )
> > > > > __field( unsigned long, parent_ip )
> > > > > __field( unsigned long, object )
> > > > > + __field( unsigned long, value )
> > > > > ),
> > > > >
> > > > > - F_printk(" %ps <-- %ps object:%lx\n",
> > > > > - (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> > > > > + F_printk(" %ps <-- %ps object:%lx value:%lx\n", (void *)__entry->ip,
> > > > > + (void *)__entry->parent_ip, __entry->object, __entry->value)
> > > > > );
> > > > > diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
> > > > > index 69465c2ffb7e..14993f7d0e5a 100644
> > > > > --- a/kernel/trace/trace_object.c
> > > > > +++ b/kernel/trace/trace_object.c
> > > > > @@ -11,14 +11,25 @@
> > > > >
> > > > > static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);
> > > > > static struct trace_event_file event_trace_file;
> > > > > -static const int max_args_num = 6;
> > > > > static const int max_obj_pool = 10;
> > > > > static atomic_t trace_object_ref;
> > > > > static int exit_trace_object(void);
> > > > > static int init_trace_object(void);
> > > > >
> > > > > +struct objtrace_trigger_data {
> > > > > + struct ftrace_event_field *field;
> > > > > + long offset;
> > > > > + int type_size;
> > > > > +};
> > > > > +
> > > > > +struct objtrace_fetch_type {
> > > > > + char *name;
> > > > > + int type_size;
> > > > > +};
> > > > > +
> > > > > struct object_instance {
> > > > > void *object;
> > > > > + int obj_type_size;
> > > > > struct freelist_node freelist;
> > > > > };
> > > > >
> > > > > @@ -59,8 +70,7 @@ static bool object_empty(void)
> > > > > return ret;
> > > > > }
> > > > >
> > > > > -
> > > > > -static void set_trace_object(void *obj)
> > > > > +static void set_trace_object(void *obj, int type_size)
> > > > > {
> > > > > struct freelist_node *fn;
> > > > > struct object_instance *ins;
> > > > > @@ -79,6 +89,7 @@ static void set_trace_object(void *obj)
> > > > >
> > > > > ins = container_of(fn, struct object_instance, freelist);
> > > > > ins->object = obj;
> > > > > + ins->obj_type_size = type_size;
> > > > >
> > > > > freelist_add(&ins->freelist, &obj_pool->customer_freelist);
> > > > > atomic_inc(&obj_pool->nobject);
> > > > > @@ -135,7 +146,7 @@ static int init_object_pool(void)
> > > > > }
> > > > >
> > > > > static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > > > > - unsigned long object)
> > > > > + unsigned long object, unsigned long value)
> > > > > {
> > > > >
> > > > > struct trace_buffer *buffer;
> > > > > @@ -152,6 +163,7 @@ static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > > > > entry->ip = ip;
> > > > > entry->parent_ip = parent_ip;
> > > > > entry->object = object;
> > > > > + entry->value = value;
> > > > >
> > > > > event_trigger_unlock_commit(&event_trace_file, buffer, event,
> > > > > entry, pc);
> > > > > @@ -161,10 +173,11 @@ static void
> > > > > trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > > > > struct ftrace_ops *op, struct ftrace_regs *fregs)
> > > > > {
> > > > > - struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> > > > > - unsigned long obj;
> > > > > + struct freelist_node *node;
> > > > > + struct object_instance *inst;
> > > > > + unsigned long val = 0;
> > > > > long disabled;
> > > > > - int cpu, n;
> > > > > + int cpu;
> > > > >
> > > > > preempt_disable_notrace();
> > > > >
> > > > > @@ -177,10 +190,14 @@ trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > > > > if (object_empty())
> > > > > goto out;
> > > > >
> > > > > - for (n = 0; n < max_args_num; n++) {
> > > > > - obj = regs_get_kernel_argument(pt_regs, n);
> > > > > - if (object_exist((void *)obj))
> > > > > - submit_trace_object(ip, parent_ip, obj);
> > > > > + node = obj_pool->customer_freelist.head;
> > > > > +
> > > > > + while (node) {
> > > > > + inst = container_of(node, struct object_instance, freelist);
> > > > > + if (copy_from_kernel_nofault(&val, inst->object, inst->obj_type_size))
> > > > > + goto out;
> > > > > + submit_trace_object(ip, parent_ip, (unsigned long)inst->object, val);
> > > > > + node = node->next;
> > > > > }
> > > > >
> > > > > out:
> > > > > @@ -198,12 +215,14 @@ trace_object_trigger(struct event_trigger_data *data,
> > > > > struct trace_buffer *buffer, void *rec,
> > > > > struct ring_buffer_event *event)
> > > > > {
> > > > > + struct objtrace_trigger_data *obj_data = data->private_data;
> > > > > + struct ftrace_event_field *field;
> > > > > + void *obj, *val = NULL;
> > > > >
> > > > > - struct ftrace_event_field *field = data->private_data;
> > > > > - void *obj = NULL;
> > > > > -
> > > > > - memcpy(&obj, rec + field->offset, sizeof(obj));
> > > > > - set_trace_object(obj);
> > > > > + field = obj_data->field;
> > > > > + memcpy(&val, rec + field->offset, sizeof(val));
> > > > > + obj = val + obj_data->offset;
> > > > > + set_trace_object(obj, obj_data->type_size);
> > > > > }
> > > > >
> > > > > static void
> > > > > @@ -350,6 +369,22 @@ static void unregister_object_trigger(char *glob, struct event_trigger_ops *ops,
> > > > > }
> > > > > }
> > > > >
> > > > > +static const struct objtrace_fetch_type objtrace_fetch_types[] = {
> > > > > + {"u8", 1},
> > > > > + {"s8", 1},
> > > > > + {"x8", 1},
> > > > > + {"u16", 2},
> > > > > + {"s16", 2},
> > > > > + {"x16", 2},
> > > > > + {"u32", 4},
> > > > > + {"s32", 4},
> > > > > + {"x32", 4},
> > > > > + {"u64", 8},
> > > > > + {"s64", 8},
> > > > > + {"x64", 8},
> > > > > + {}
> > > > > +};
> > > > > +
> > > > > static int
> > > > > event_object_trigger_callback(struct event_command *cmd_ops,
> > > > > struct trace_event_file *file,
> > > > > @@ -357,13 +392,15 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > > > > {
> > > > > struct event_trigger_data *trigger_data;
> > > > > struct event_trigger_ops *trigger_ops;
> > > > > + struct objtrace_trigger_data *obj_data;
> > > > > struct trace_event_call *call;
> > > > > struct ftrace_event_field *field;
> > > > > char *objtrace_cmd;
> > > > > + long offset = 0;
> > > > > char *trigger = NULL;
> > > > > - char *arg;
> > > > > + char *arg, *type, *tr, *tr_end;
> > > > > char *number;
> > > > > - int ret;
> > > > > + int ret, i, type_size = 0;
> > > > >
> > > > > ret = -EINVAL;
> > > > > if (!param)
> > > > > @@ -386,6 +423,38 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > > > > arg = strsep(&trigger, ":");
> > > > > if (!arg)
> > > > > goto out;
> > > > > +
> > > > > + tr = strchr(arg, '(');
> > > > > + /* now force to get the value of the val. */
> > > > > + if (!tr)
> > > > > + goto out;
> > > > > + tr_end = strchr(tr, ')');
> > > > > + if (!tr_end)
> > > > > + goto out;
> > > > > + *tr++ = '\0';
> > > > > + *tr_end = '\0';
> > > > > + ret = kstrtol(arg, 0, &offset);
> > > > > + if (ret)
> > > > > + goto out;
> > > > > + arg = tr;
> > > > > + ret = -EINVAL;
> > > > > + if (!trigger)
> > > > > + goto out;
> > > > > +
> > > > > + type = strsep(&trigger, ":");
> > > > > + if (!type)
> > > > > + goto out;
> > > > > + for (i = 0; objtrace_fetch_types[i].name; i++) {
> > > > > + if (strcmp(objtrace_fetch_types[i].name, type) == 0) {
> > > > > + type_size = objtrace_fetch_types[i].type_size;
> > > > > + break;
> > > > > + }
> > > > > + }
> > > > > +
> > > > > + if (type_size == 0)
> > > > > + goto out;
> > > > > +
> > > > > +
> > > > > call = file->event_call;
> > > > > field = trace_find_event_field(call, arg);
> > > > > if (!field)
> > > > > @@ -394,19 +463,30 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > > > > trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
> > > > >
> > > > > ret = -ENOMEM;
> > > > > + obj_data = kzalloc(sizeof(*obj_data), GFP_KERNEL);
> > > > > + if (!obj_data)
> > > > > + goto out;
> > > > > +
> > > > > + obj_data->field = field;
> > > > > + obj_data->offset = offset;
> > > > > + obj_data->type_size = type_size;
> > > > > +
> > > > > trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
> > > > > - if (!trigger_data)
> > > > > + if (!trigger_data) {
> > > > > + kfree(obj_data);
> > > > > goto out;
> > > > > + }
> > > > >
> > > > > trigger_data->count = -1;
> > > > > trigger_data->ops = trigger_ops;
> > > > > trigger_data->cmd_ops = cmd_ops;
> > > > > - trigger_data->private_data = field;
> > > > > + trigger_data->private_data = obj_data;
> > > > > INIT_LIST_HEAD(&trigger_data->list);
> > > > > INIT_LIST_HEAD(&trigger_data->named_list);
> > > > >
> > > > > if (glob[0] == '!') {
> > > > > cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
> > > > > + kfree(obj_data);
> > > > > kfree(trigger_data);
> > > > > ret = 0;
> > > > > goto out;
> > > > > @@ -461,6 +541,7 @@ event_object_trigger_callback(struct event_command *cmd_ops,
> > > > > out_free:
> > > > > if (cmd_ops->set_filter)
> > > > > cmd_ops->set_filter(NULL, trigger_data, NULL);
> > > > > + kfree(obj_data);
> > > > > kfree(trigger_data);
> > > > > goto out;
> > > > > }
> > > > > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> > > > > index 76ca560af693..c8c427c23127 100644
> > > > > --- a/kernel/trace/trace_output.c
> > > > > +++ b/kernel/trace/trace_output.c
> > > > > @@ -1562,6 +1562,7 @@ static enum print_line_t trace_object_print(struct trace_iterator *iter, int fla
> > > > > trace_assign_type(field, iter->ent);
> > > > > print_fn_trace(s, field->ip, field->parent_ip, flags);
> > > > > trace_seq_printf(s, " object:0x%lx", field->object);
> > > > > + trace_seq_printf(s, " value:0x%lx", field->value);
> > > > > trace_seq_putc(s, '\n');
> > > > >
> > > > > return trace_handle_return(s);
> > > > > @@ -1574,9 +1575,8 @@ static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags
> > > > >
> > > > > trace_assign_type(field, iter->ent);
> > > > >
> > > > > - trace_seq_printf(&iter->seq, "%lx %lx\n",
> > > > > - field->ip,
> > > > > - field->parent_ip);
> > > > > + trace_seq_printf(&iter->seq, "%lx %lx %lx %lx\n", field->ip,
> > > > > + field->parent_ip, field->object, field->value);
> > > > >
> > > > > return trace_handle_return(&iter->seq);
> > > > > }
> > > > > --
> > > > > 2.25.1
> > > > >
> > > >
> > > >
> > > > --
> > > > Masami Hiramatsu <[email protected]>
> > >
> > > Thanks,
> > > ---
> > > JeffXie
> >
> > ---
> > JeffXie
>
>
> --
> Masami Hiramatsu <[email protected]>
---
JeffXie

2022-01-06 00:55:47

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC][PATCH v5 3/4] trace/objtrace: Add testcases for objtrace trigger parsing

On Thu, 6 Jan 2022 00:02:53 +0000
fuyufan <[email protected]> wrote:

> From: xuhuijie <[email protected]>

Needs a change log.

-- Steve

>
> Signed-off-by: xuhuijie <[email protected]>
> ---
> .../ftrace/test.d/trigger/trigger-objtrace.tc | 41 +++++++++++++++++++
> 1 file changed, 41 insertions(+)
> create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
>
> diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
> new file mode 100644
> index 000000000000..4efcd7eee694
> --- /dev/null
> +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
> @@ -0,0 +1,41 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +# description: event trigger - test objtrace-trigger
> +# requires: kprobe_events
> +
> +fail() { #msg
> + echo $1
> + exit_fail
> +}
> +
> +echo 'p bio_add_page arg1=$arg1' > kprobe_events
> +
> +FEATURE=`grep objtrace events/kprobes/p_bio_add_page_0/trigger`
> +if [ -z "$FEATURE" ]; then
> + echo "objtrace trigger is not supported"
> + exit_unsupported
> +fi
> +
> +echo "Test objtrace trigger"
> +echo 'objtrace:add:0x28(arg1):u32:1 if comm == "cat"' > \
> + events/kprobes/p_bio_add_page_0/trigger
> +if [ -z $? ]; then
> + fail "objtrace trigger syntax error"
> +fi
> +
> +echo "Test objtrace semantic errors"
> +
> +# Being lack of type size
> +! echo 'objtrace:add:0x28(arg1):1' > events/kprobes/p_bio_add_page_0/trigger
> +# Being lack of objtrace command
> +! echo 'objtrace:0x28(arg1):u32:1' > events/kprobes/p_bio_add_page_0/trigger
> +# Bad parameter name
> +! echo 'objtrace:add:argx:u32:1' > events/kprobes/p_bio_add_page_0/trigger
> +
> +echo "reset objtrace trigger"
> +
> +echo '!objtrace:add:0x28(arg1):u32' > \
> + events/kprobes/p_bio_add_page_0/trigger
> +echo '-:p_bio_add_page_0' >> ./kprobe_events
> +
> +exit 0