Hi,
Here are 2 bugfixes and 1 testcase related to the ftrace snapshot
feature. The 1st one fixes a double free bug which happens very
rarely. It hit my PC once while I was cleaning up ftracetest
(but was not reproducible). Anyway, it was easy to find the root
cause because I had a backtrace.
The 2nd one fixes an odd snapshotting behavior. Since it is
easy to reproduce, I also made a kselftest testcase for this bug.
Thank you,
---
Masami Hiramatsu (3):
[BUGFIX] tracing: Fix double free of event_trigger_data
[BUGFIX] ring_buffer: tracing: Inherit the tracing setting to next ring buffer
selftests/ftrace: Add snapshot and tracing_on test case
include/linux/ring_buffer.h | 1 +
kernel/trace/ring_buffer.c | 12 +++++++++
kernel/trace/trace.c | 11 ++++++++
kernel/trace/trace.h | 2 +
kernel/trace/trace_events_trigger.c | 10 ++++---
.../selftests/ftrace/test.d/00basic/snapshot.tc | 28 ++++++++++++++++++++
6 files changed, 60 insertions(+), 4 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/00basic/snapshot.tc
--
Masami Hiramatsu (Linaro) <[email protected]>
Fix a double free bug of event_trigger_data caused by
calling unregister_trigger() from register_snapshot_trigger().
This hits a kernel BUG if the double-free checker is enabled,
as shown below:
kernel BUG at /home/mhiramat/ksrc/linux/mm/slub.c:296!
invalid opcode: 0000 [#1] SMP PTI
CPU: 2 PID: 4312 Comm: ftracetest Not tainted 4.18.0-rc1+ #44
Hardware name: ASUS All Series/B85M-G, BIOS 2108 08/11/2014
RIP: 0010:set_freepointer.part.37+0x0/0x10
Code: 41 b8 01 00 00 00 29 c8 4d 8d 0c 0c b9 10 00 00 00 50 e8 e3 28 23 00 8b 53 08 5e 5f 89 d1 81 e1 00 04 00 00 e9 e9 fe ff ff 90 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 48 c7 c6 90 7f 0d
RSP: 0018:ffffa799caa3bd90 EFLAGS: 00010246
RAX: ffff9b825f8c8e80 RBX: ffff9b825f8c8e80 RCX: ffff9b825f8c8e80
RDX: 0000000000021562 RSI: ffff9b830e9e70e0 RDI: 0000000000000202
RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9b830e0072c0
R13: ffffeb8e0d7e3200 R14: ffffffff961db7af R15: 00000000fffffffe
FS: 00007f135ba9f700(0000) GS:ffff9b830e800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563736b5f3a2 CR3: 0000000295916005 CR4: 00000000001606e0
Call Trace:
kfree+0x35d/0x380
event_trigger_callback+0x13f/0x1c0
event_trigger_write+0xf2/0x1a0
? lock_acquire+0x9f/0x200
__vfs_write+0x26/0x170
? rcu_read_lock_sched_held+0x6b/0x80
? rcu_sync_lockdep_assert+0x2e/0x60
? __sb_start_write+0x13e/0x1a0
? vfs_write+0x18a/0x1b0
vfs_write+0xc1/0x1b0
ksys_write+0x45/0xa0
do_syscall_64+0x60/0x200
entry_SYSCALL_64_after_hwframe+0x49/0xbe
unregister_trigger() frees the given event_trigger_data at the
end. But that event_trigger_data will be freed again in
event_trigger_callback() if register_snapshot_trigger() fails,
which causes a double free bug.
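Roughly, the failing path is (a simplified sketch; names as in the
current code):
  event_trigger_callback()
    ret = cmd_ops->reg()                  /* register_snapshot_trigger() */
      register_trigger()                  /* succeeds, returns > 0 */
      tracing_alloc_snapshot_instance()   /* fails */
      unregister_trigger()                /* frees event_trigger_data: 1st free */
      return 0
    if (!ret)
      goto out_free                       /* kfree()s it again: 2nd free */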
Registering the data should be the final operation on the normal
path of the register function, because the trigger must be ready
to take action right after it is registered.
Fixes: commit 93e31ffbf417 ("tracing: Add 'snapshot' event trigger command")
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: [email protected]
---
kernel/trace/trace.c | 5 +++++
kernel/trace/trace.h | 2 ++
kernel/trace/trace_events_trigger.c | 10 ++++++----
3 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f054bd6a1c66..2556d8c097d2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -980,6 +980,11 @@ static void free_snapshot(struct trace_array *tr)
tr->allocated_snapshot = false;
}
+void tracing_free_snapshot_instance(struct trace_array *tr)
+{
+ free_snapshot(tr);
+}
+
/**
* tracing_alloc_snapshot - allocate snapshot buffer.
*
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f8f86231ad90..03468bb8a79a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1823,12 +1823,14 @@ static inline void trace_event_eval_update(struct trace_eval_map **map, int len)
#ifdef CONFIG_TRACER_SNAPSHOT
void tracing_snapshot_instance(struct trace_array *tr);
int tracing_alloc_snapshot_instance(struct trace_array *tr);
+void tracing_free_snapshot_instance(struct trace_array *tr);
#else
static inline void tracing_snapshot_instance(struct trace_array *tr) { }
static inline int tracing_alloc_snapshot_instance(struct trace_array *tr)
{
return 0;
}
+static inline void tracing_free_snapshot_instance(struct trace_array *tr) { }
#endif
extern struct trace_iterator *tracepoint_print_iter;
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index d18249683682..40e2f4406b2c 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -1079,11 +1079,13 @@ register_snapshot_trigger(char *glob, struct event_trigger_ops *ops,
struct event_trigger_data *data,
struct trace_event_file *file)
{
- int ret = register_trigger(glob, ops, data, file);
+ int free_if_fail = !file->tr->allocated_snapshot;
+ int ret = 0;
- if (ret > 0 && tracing_alloc_snapshot_instance(file->tr) != 0) {
- unregister_trigger(glob, ops, data, file);
- ret = 0;
+ if (!tracing_alloc_snapshot_instance(file->tr)) {
+ ret = register_trigger(glob, ops, data, file);
+ if (ret == 0 && free_if_fail)
+ tracing_free_snapshot_instance(file->tr);
}
return ret;
Inherit the tracing on/off setting of the ring buffer to the next
trace buffer when taking a snapshot.
Taking a snapshot is done by swapping the current ring buffer with
the backup ring buffer (max_tr_buffer). But since the tracing
on/off setting is stored in the ring buffer itself, swapping the
buffers can also change the tracing on/off setting. This causes a
strange result like below:
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 0 > tracing_on
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
0
We never touched tracing_on, but the snapshot changes the
tracing_on setting each time. This must be a bug, because the user
never knows that each "ring_buffer" stores the tracing-enable state
and that a snapshot is taken by swapping the ring buffers.
This patch fixes the above strange behavior.
Fixes: commit debdd57f5145 ("tracing: Make a snapshot feature available from userspace")
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Hiraku Toyooka <[email protected]>
Cc: [email protected]
---
include/linux/ring_buffer.h | 1 +
kernel/trace/ring_buffer.c | 12 ++++++++++++
kernel/trace/trace.c | 6 ++++++
3 files changed, 19 insertions(+)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index b72ebdff0b77..003d09ab308d 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -165,6 +165,7 @@ void ring_buffer_record_enable(struct ring_buffer *buffer);
void ring_buffer_record_off(struct ring_buffer *buffer);
void ring_buffer_record_on(struct ring_buffer *buffer);
int ring_buffer_record_is_on(struct ring_buffer *buffer);
+int ring_buffer_record_is_set_on(struct ring_buffer *buffer);
void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu);
void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 6a46af21765c..4038ed74ab95 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3227,6 +3227,18 @@ int ring_buffer_record_is_on(struct ring_buffer *buffer)
}
/**
+ * ring_buffer_record_is_set_on - return true if the ring buffer is set writable
+ * @buffer: The ring buffer to see if write is set enabled
+ *
+ * Returns true if the ring buffer is set writable by ring_buffer_record_on().
+ * Note that this does NOT mean it is in a writable state.
+ */
+int ring_buffer_record_is_set_on(struct ring_buffer *buffer)
+{
+ return !(atomic_read(&buffer->record_disabled) & RB_BUFFER_OFF);
+}
+
+/**
* ring_buffer_record_disable_cpu - stop all writes into the cpu_buffer
* @buffer: The ring buffer to stop writes to.
* @cpu: The CPU buffer to stop
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2556d8c097d2..bbd5a94a7ef1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1378,6 +1378,12 @@ update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
arch_spin_lock(&tr->max_lock);
+ /* Inherit the recordable setting from trace_buffer */
+ if (ring_buffer_record_is_set_on(tr->trace_buffer.buffer))
+ ring_buffer_record_on(tr->max_buffer.buffer);
+ else
+ ring_buffer_record_off(tr->max_buffer.buffer);
+
swap(tr->trace_buffer.buffer, tr->max_buffer.buffer);
__update_max_tr(tr, tsk, cpu);
Add a testcase for checking the relationship between snapshot and
tracing_on. This ensures that snapshotting doesn't affect the
current tracing on/off setting.
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: [email protected]
---
.../selftests/ftrace/test.d/00basic/snapshot.tc | 28 ++++++++++++++++++++
1 file changed, 28 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/00basic/snapshot.tc
diff --git a/tools/testing/selftests/ftrace/test.d/00basic/snapshot.tc b/tools/testing/selftests/ftrace/test.d/00basic/snapshot.tc
new file mode 100644
index 000000000000..3b1f45e13a2e
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/00basic/snapshot.tc
@@ -0,0 +1,28 @@
+#!/bin/sh
+# description: Snapshot and tracing setting
+# flags: instance
+
+[ ! -f snapshot ] && exit_unsupported
+
+echo "Set tracing off"
+echo 0 > tracing_on
+
+echo "Allocate and take a snapshot"
+echo 1 > snapshot
+
+# Since trace buffer is empty, snapshot is also empty, but allocated
+grep -q "Snapshot is allocated" snapshot
+
+echo "Ensure keep tracing off"
+test `cat tracing_on` -eq 0
+
+echo "Set tracing on"
+echo 1 > tracing_on
+
+echo "Take a snapshot again"
+echo 1 > snapshot
+
+echo "Ensure keep tracing on"
+test `cat tracing_on` -eq 1
+
+exit 0
On Sat, 14 Jul 2018 01:27:47 +0900
Masami Hiramatsu <[email protected]> wrote:
> Fix a double free bug of event_trigger_data caused by
> calling unregister_trigger() from register_snapshot_trigger().
> This kicks a kernel BUG if double free checker is enabled
> as below;
>
> kernel BUG at /home/mhiramat/ksrc/linux/mm/slub.c:296!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 2 PID: 4312 Comm: ftracetest Not tainted 4.18.0-rc1+ #44
> Hardware name: ASUS All Series/B85M-G, BIOS 2108 08/11/2014
> RIP: 0010:set_freepointer.part.37+0x0/0x10
> Code: 41 b8 01 00 00 00 29 c8 4d 8d 0c 0c b9 10 00 00 00 50 e8 e3 28 23 00 8b 53 08 5e 5f 89 d1 81 e1 00 04 00 00 e9 e9 fe ff ff 90 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 48 c7 c6 90 7f 0d
> RSP: 0018:ffffa799caa3bd90 EFLAGS: 00010246
> RAX: ffff9b825f8c8e80 RBX: ffff9b825f8c8e80 RCX: ffff9b825f8c8e80
> RDX: 0000000000021562 RSI: ffff9b830e9e70e0 RDI: 0000000000000202
> RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff9b830e0072c0
> R13: ffffeb8e0d7e3200 R14: ffffffff961db7af R15: 00000000fffffffe
> FS: 00007f135ba9f700(0000) GS:ffff9b830e800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000563736b5f3a2 CR3: 0000000295916005 CR4: 00000000001606e0
> Call Trace:
> kfree+0x35d/0x380
> event_trigger_callback+0x13f/0x1c0
> event_trigger_write+0xf2/0x1a0
> ? lock_acquire+0x9f/0x200
> __vfs_write+0x26/0x170
> ? rcu_read_lock_sched_held+0x6b/0x80
> ? rcu_sync_lockdep_assert+0x2e/0x60
> ? __sb_start_write+0x13e/0x1a0
> ? vfs_write+0x18a/0x1b0
> vfs_write+0xc1/0x1b0
> ksys_write+0x45/0xa0
> do_syscall_64+0x60/0x200
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> unregister_trigger() will free given event_trigger_data
> at last. But that event_trigger_data will be freed again
> in event_trigger_callback() if register_snapshot_trigger()
> is failed, and causes a double free bug.
>
> Registering the data should be the final operation in the
> register function on normal path, because the trigger
> must be ready for taking action right after it is
> registered.
Nice catch. I can reproduce this. But this patch is fixing the symptom
and not the disease.
>
> Fixes: commit 93e31ffbf417 ("tracing: Add 'snapshot' event trigger command")
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Tom Zanussi <[email protected]>
> Cc: [email protected]
> ---
> kernel/trace/trace.c | 5 +++++
> kernel/trace/trace.h | 2 ++
> kernel/trace/trace_events_trigger.c | 10 ++++++----
> 3 files changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index f054bd6a1c66..2556d8c097d2 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -980,6 +980,11 @@ static void free_snapshot(struct trace_array *tr)
> tr->allocated_snapshot = false;
> }
>
> +void tracing_free_snapshot_instance(struct trace_array *tr)
> +{
> + free_snapshot(tr);
> +}
> +
> /**
> * tracing_alloc_snapshot - allocate snapshot buffer.
> *
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index f8f86231ad90..03468bb8a79a 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -1823,12 +1823,14 @@ static inline void trace_event_eval_update(struct trace_eval_map **map, int len)
> #ifdef CONFIG_TRACER_SNAPSHOT
> void tracing_snapshot_instance(struct trace_array *tr);
> int tracing_alloc_snapshot_instance(struct trace_array *tr);
> +void tracing_free_snapshot_instance(struct trace_array *tr);
> #else
> static inline void tracing_snapshot_instance(struct trace_array *tr) { }
> static inline int tracing_alloc_snapshot_instance(struct trace_array *tr)
> {
> return 0;
> }
> +static inline void tracing_free_snapshot_instance(struct trace_array *tr) { }
> #endif
>
> extern struct trace_iterator *tracepoint_print_iter;
> diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> index d18249683682..40e2f4406b2c 100644
> --- a/kernel/trace/trace_events_trigger.c
> +++ b/kernel/trace/trace_events_trigger.c
> @@ -1079,11 +1079,13 @@ register_snapshot_trigger(char *glob, struct event_trigger_ops *ops,
> struct event_trigger_data *data,
> struct trace_event_file *file)
> {
> - int ret = register_trigger(glob, ops, data, file);
> + int free_if_fail = !file->tr->allocated_snapshot;
> + int ret = 0;
>
> - if (ret > 0 && tracing_alloc_snapshot_instance(file->tr) != 0) {
> - unregister_trigger(glob, ops, data, file);
> - ret = 0;
> + if (!tracing_alloc_snapshot_instance(file->tr)) {
> + ret = register_trigger(glob, ops, data, file);
> + if (ret == 0 && free_if_fail)
> + tracing_free_snapshot_instance(file->tr);
> }
>
> return ret;
Really, when we register_trigger() we should be able to freely call
unregister_trigger() without side effects like the above happening.
Instead of doing the above, I believe this is a better approach:
Thoughts?
P.S. This brings up another minor bug. The failure should return ENOMEM
not ENOENT.
-- Steve
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index d18249683682..d15a746bcdfb 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -679,6 +679,7 @@ event_trigger_callback(struct event_command *cmd_ops,
goto out_free;
out_reg:
+ event_trigger_init(trigger_ops, trigger_data);
ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
/*
* The above returns on success the # of functions enabled,
@@ -687,9 +688,9 @@ event_trigger_callback(struct event_command *cmd_ops,
*/
if (!ret) {
ret = -ENOENT;
- goto out_free;
+ goto out_free_trigger;
} else if (ret < 0)
- goto out_free;
+ goto out_free_trigger;
ret = 0;
out:
return ret;
@@ -699,6 +700,10 @@ event_trigger_callback(struct event_command *cmd_ops,
cmd_ops->set_filter(NULL, trigger_data, NULL);
kfree(trigger_data);
goto out;
+
+ out_free_trigger:
+ event_trigger_free(trigger_ops, trigger_data);
+ goto out;
}
/**
On Sat, 14 Jul 2018 01:28:15 +0900
Masami Hiramatsu <[email protected]> wrote:
> Inherit the tracing on/off setting on ring_buffer to next
> trace buffer when taking a snapshot.
>
> Taking a snapshot is done by swapping with backup ring buffer
> (max_tr_buffer). But since the tracing on/off setting is set
> in the ring buffer, when swapping it, tracing on/off setting
> can also be changed. This causes a strange result like below;
>
> /sys/kernel/debug/tracing # cat tracing_on
> 1
> /sys/kernel/debug/tracing # echo 0 > tracing_on
> /sys/kernel/debug/tracing # echo 1 > snapshot
> /sys/kernel/debug/tracing # cat tracing_on
> 1
> /sys/kernel/debug/tracing # echo 1 > snapshot
> /sys/kernel/debug/tracing # cat tracing_on
> 0
>
> We don't touch tracing_on, but snapshot changes tracing_on
> setting each time. This must be a bug, because user never know
> that each "ring_buffer" stores tracing-enable state and
> snapshot is done by swapping ring buffers.
>
> This patch fixes above strange behavior.
>
OK, so the issue is that the tracing_on state is saved with the buffer
itself. And when we swap the buffer, we also swap the state.
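Roughly (a simplified sketch of the two paths; the RB_BUFFER_OFF bit
is the one the new helper below checks):
  echo 0 > tracing_on
    tracer_tracing_off(tr)
      ring_buffer_record_off(tr->trace_buffer.buffer)
        -> sets RB_BUFFER_OFF in buffer->record_disabled
  echo 1 > snapshot
    update_max_tr(tr, ...)
      swap(tr->trace_buffer.buffer, tr->max_buffer.buffer)
        -> the RB_BUFFER_OFF bit goes with the swapped-out buffer, and
           the now-live buffer keeps whatever state max_buffer had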
> Fixes: commit debdd57f5145 ("tracing: Make a snapshot feature available from userspace")
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Hiraku Toyooka <[email protected]>
> Cc: [email protected]
> ---
> include/linux/ring_buffer.h | 1 +
> kernel/trace/ring_buffer.c | 12 ++++++++++++
> kernel/trace/trace.c | 6 ++++++
> 3 files changed, 19 insertions(+)
>
> diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> index b72ebdff0b77..003d09ab308d 100644
> --- a/include/linux/ring_buffer.h
> +++ b/include/linux/ring_buffer.h
> @@ -165,6 +165,7 @@ void ring_buffer_record_enable(struct ring_buffer *buffer);
> void ring_buffer_record_off(struct ring_buffer *buffer);
> void ring_buffer_record_on(struct ring_buffer *buffer);
> int ring_buffer_record_is_on(struct ring_buffer *buffer);
> +int ring_buffer_record_is_set_on(struct ring_buffer *buffer);
> void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu);
> void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 6a46af21765c..4038ed74ab95 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -3227,6 +3227,18 @@ int ring_buffer_record_is_on(struct ring_buffer *buffer)
> }
>
> /**
> + * ring_buffer_record_is_set_on - return true if the ring buffer is set writable
> + * @buffer: The ring buffer to see if write is set enabled
> + *
> + * Returns true if the ring buffer is set writable by ring_buffer_record_on().
> + * Note that this does NOT mean it is in a writable state.
I'd add a little more context here. Something like:
* It may return true when the ring buffer has been disabled by
* ring_buffer_record_disable(), as that is a temporary disabling of
* the ring buffer.
-- Steve
> + */
> +int ring_buffer_record_is_set_on(struct ring_buffer *buffer)
> +{
> + return !(atomic_read(&buffer->record_disabled) & RB_BUFFER_OFF);
> +}
> +
> +/**
> * ring_buffer_record_disable_cpu - stop all writes into the cpu_buffer
> * @buffer: The ring buffer to stop writes to.
> * @cpu: The CPU buffer to stop
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 2556d8c097d2..bbd5a94a7ef1 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -1378,6 +1378,12 @@ update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
>
> arch_spin_lock(&tr->max_lock);
>
> + /* Inherit the recordable setting from trace_buffer */
> + if (ring_buffer_record_is_set_on(tr->trace_buffer.buffer))
> + ring_buffer_record_on(tr->max_buffer.buffer);
> + else
> + ring_buffer_record_off(tr->max_buffer.buffer);
> +
> swap(tr->trace_buffer.buffer, tr->max_buffer.buffer);
>
> __update_max_tr(tr, tsk, cpu);
On Mon, 23 Jul 2018 22:25:34 -0400
Steven Rostedt <[email protected]> wrote:
> On Sat, 14 Jul 2018 01:28:15 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> > Inherit the tracing on/off setting on ring_buffer to next
> > trace buffer when taking a snapshot.
> >
> > Taking a snapshot is done by swapping with backup ring buffer
> > (max_tr_buffer). But since the tracing on/off setting is set
> > in the ring buffer, when swapping it, tracing on/off setting
> > can also be changed. This causes a strange result like below;
> >
> > /sys/kernel/debug/tracing # cat tracing_on
> > 1
> > /sys/kernel/debug/tracing # echo 0 > tracing_on
> > /sys/kernel/debug/tracing # echo 1 > snapshot
> > /sys/kernel/debug/tracing # cat tracing_on
> > 1
> > /sys/kernel/debug/tracing # echo 1 > snapshot
> > /sys/kernel/debug/tracing # cat tracing_on
> > 0
> >
> > We don't touch tracing_on, but snapshot changes tracing_on
> > setting each time. This must be a bug, because user never know
> > that each "ring_buffer" stores tracing-enable state and
> > snapshot is done by swapping ring buffers.
> >
> > This patch fixes above strange behavior.
> >
>
>
> OK, so the issue is that the tracing_on state is saved with the buffer
> itself. And when we swap the buffer, we also swap the state.
>
>
> > Fixes: commit debdd57f5145 ("tracing: Make a snapshot feature available from userspace")
> > Signed-off-by: Masami Hiramatsu <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Hiraku Toyooka <[email protected]>
> > Cc: [email protected]
> > ---
> > include/linux/ring_buffer.h | 1 +
> > kernel/trace/ring_buffer.c | 12 ++++++++++++
> > kernel/trace/trace.c | 6 ++++++
> > 3 files changed, 19 insertions(+)
> >
> > diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> > index b72ebdff0b77..003d09ab308d 100644
> > --- a/include/linux/ring_buffer.h
> > +++ b/include/linux/ring_buffer.h
> > @@ -165,6 +165,7 @@ void ring_buffer_record_enable(struct ring_buffer *buffer);
> > void ring_buffer_record_off(struct ring_buffer *buffer);
> > void ring_buffer_record_on(struct ring_buffer *buffer);
> > int ring_buffer_record_is_on(struct ring_buffer *buffer);
> > +int ring_buffer_record_is_set_on(struct ring_buffer *buffer);
> > void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu);
> > void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);
> >
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index 6a46af21765c..4038ed74ab95 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -3227,6 +3227,18 @@ int ring_buffer_record_is_on(struct ring_buffer *buffer)
> > }
> >
> > /**
> > + * ring_buffer_record_is_set_on - return true if the ring buffer is set writable
> > + * @buffer: The ring buffer to see if write is set enabled
> > + *
> > + * Returns true if the ring buffer is set writable by ring_buffer_record_on().
> > + * Note that this does NOT mean it is in a writable state.
>
> I'd add a little more context here. Something like:
>
> * It may return true when the ring buffer has been disabled by
> * ring_buffer_record_disable(), as that is a temporary disabling of
> * the ring buffer.
Looks nice to me :) It makes the meaning clearer.
Thanks!
>
> -- Steve
>
> > + */
> > +int ring_buffer_record_is_set_on(struct ring_buffer *buffer)
> > +{
> > + return !(atomic_read(&buffer->record_disabled) & RB_BUFFER_OFF);
> > +}
> > +
> > +/**
> > * ring_buffer_record_disable_cpu - stop all writes into the cpu_buffer
> > * @buffer: The ring buffer to stop writes to.
> > * @cpu: The CPU buffer to stop
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 2556d8c097d2..bbd5a94a7ef1 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -1378,6 +1378,12 @@ update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
> >
> > arch_spin_lock(&tr->max_lock);
> >
> > + /* Inherit the recordable setting from trace_buffer */
> > + if (ring_buffer_record_is_set_on(tr->trace_buffer.buffer))
> > + ring_buffer_record_on(tr->max_buffer.buffer);
> > + else
> > + ring_buffer_record_off(tr->max_buffer.buffer);
> > +
> > swap(tr->trace_buffer.buffer, tr->max_buffer.buffer);
> >
> > __update_max_tr(tr, tsk, cpu);
>
--
Masami Hiramatsu <[email protected]>
On Mon, 23 Jul 2018 22:10:06 -0400
Steven Rostedt <[email protected]> wrote:
> On Sat, 14 Jul 2018 01:27:47 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> > Fix a double free bug of event_trigger_data caused by
> > calling unregister_trigger() from register_snapshot_trigger().
> > This kicks a kernel BUG if double free checker is enabled
> > as below;
> >
> > kernel BUG at /home/mhiramat/ksrc/linux/mm/slub.c:296!
> > invalid opcode: 0000 [#1] SMP PTI
> > CPU: 2 PID: 4312 Comm: ftracetest Not tainted 4.18.0-rc1+ #44
> > Hardware name: ASUS All Series/B85M-G, BIOS 2108 08/11/2014
> > RIP: 0010:set_freepointer.part.37+0x0/0x10
> > Code: 41 b8 01 00 00 00 29 c8 4d 8d 0c 0c b9 10 00 00 00 50 e8 e3 28 23 00 8b 53 08 5e 5f 89 d1 81 e1 00 04 00 00 e9 e9 fe ff ff 90 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 48 c7 c6 90 7f 0d
> > RSP: 0018:ffffa799caa3bd90 EFLAGS: 00010246
> > RAX: ffff9b825f8c8e80 RBX: ffff9b825f8c8e80 RCX: ffff9b825f8c8e80
> > RDX: 0000000000021562 RSI: ffff9b830e9e70e0 RDI: 0000000000000202
> > RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffff9b830e0072c0
> > R13: ffffeb8e0d7e3200 R14: ffffffff961db7af R15: 00000000fffffffe
> > FS: 00007f135ba9f700(0000) GS:ffff9b830e800000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000563736b5f3a2 CR3: 0000000295916005 CR4: 00000000001606e0
> > Call Trace:
> > kfree+0x35d/0x380
> > event_trigger_callback+0x13f/0x1c0
> > event_trigger_write+0xf2/0x1a0
> > ? lock_acquire+0x9f/0x200
> > __vfs_write+0x26/0x170
> > ? rcu_read_lock_sched_held+0x6b/0x80
> > ? rcu_sync_lockdep_assert+0x2e/0x60
> > ? __sb_start_write+0x13e/0x1a0
> > ? vfs_write+0x18a/0x1b0
> > vfs_write+0xc1/0x1b0
> > ksys_write+0x45/0xa0
> > do_syscall_64+0x60/0x200
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > unregister_trigger() will free given event_trigger_data
> > at last. But that event_trigger_data will be freed again
> > in event_trigger_callback() if register_snapshot_trigger()
> > is failed, and causes a double free bug.
> >
> > Registering the data should be the final operation in the
> > register function on normal path, because the trigger
> > must be ready for taking action right after it is
> > registered.
>
> Nice catch. I can reproduce this. But this patch is fixing the symptom
> and not the disease.
>
> >
> > Fixes: commit 93e31ffbf417 ("tracing: Add 'snapshot' event trigger command")
> > Signed-off-by: Masami Hiramatsu <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Tom Zanussi <[email protected]>
> > Cc: [email protected]
> > ---
> > kernel/trace/trace.c | 5 +++++
> > kernel/trace/trace.h | 2 ++
> > kernel/trace/trace_events_trigger.c | 10 ++++++----
> > 3 files changed, 13 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index f054bd6a1c66..2556d8c097d2 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -980,6 +980,11 @@ static void free_snapshot(struct trace_array *tr)
> > tr->allocated_snapshot = false;
> > }
> >
> > +void tracing_free_snapshot_instance(struct trace_array *tr)
> > +{
> > + free_snapshot(tr);
> > +}
> > +
> > /**
> > * tracing_alloc_snapshot - allocate snapshot buffer.
> > *
> > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > index f8f86231ad90..03468bb8a79a 100644
> > --- a/kernel/trace/trace.h
> > +++ b/kernel/trace/trace.h
> > @@ -1823,12 +1823,14 @@ static inline void trace_event_eval_update(struct trace_eval_map **map, int len)
> > #ifdef CONFIG_TRACER_SNAPSHOT
> > void tracing_snapshot_instance(struct trace_array *tr);
> > int tracing_alloc_snapshot_instance(struct trace_array *tr);
> > +void tracing_free_snapshot_instance(struct trace_array *tr);
> > #else
> > static inline void tracing_snapshot_instance(struct trace_array *tr) { }
> > static inline int tracing_alloc_snapshot_instance(struct trace_array *tr)
> > {
> > return 0;
> > }
> > +static inline void tracing_free_snapshot_instance(struct trace_array *tr) { }
> > #endif
> >
> > extern struct trace_iterator *tracepoint_print_iter;
> > diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> > index d18249683682..40e2f4406b2c 100644
> > --- a/kernel/trace/trace_events_trigger.c
> > +++ b/kernel/trace/trace_events_trigger.c
> > @@ -1079,11 +1079,13 @@ register_snapshot_trigger(char *glob, struct event_trigger_ops *ops,
> > struct event_trigger_data *data,
> > struct trace_event_file *file)
> > {
> > - int ret = register_trigger(glob, ops, data, file);
> > + int free_if_fail = !file->tr->allocated_snapshot;
> > + int ret = 0;
> >
> > - if (ret > 0 && tracing_alloc_snapshot_instance(file->tr) != 0) {
> > - unregister_trigger(glob, ops, data, file);
> > - ret = 0;
> > + if (!tracing_alloc_snapshot_instance(file->tr)) {
> > + ret = register_trigger(glob, ops, data, file);
> > + if (ret == 0 && free_if_fail)
> > + tracing_free_snapshot_instance(file->tr);
> > }
> >
> > return ret;
>
> Really, when we register_trigger() we should be able to freely call
> unresgister_trigger() with no side effects like the above happens.
> Instead of doing the above, I believe this is a better approach:
>
> Thoughts?
Hmm, your patch seems to leak memory since event_trigger_init() will
be called twice on the same trigger_data (note that event_trigger_init()
does not initialize the ref counter, it increments it). So we should
decrement it once we know the registration succeeded. Moreover, if
register_trigger() fails before calling data->ops->init() (see the
-EEXIST case), the ref counter will be 0 (-1 +1). But if it fails
after data->ops->init(), the ref counter will be 1 (-1 +1 +1). It is
still unstable.
(Ah, that means we may have another problem...)
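(For reference, the pair looks roughly like this, so _free() only
kfree()s the data once the counter drops to zero:)

static int event_trigger_init(struct event_trigger_ops *ops,
			      struct event_trigger_data *data)
{
	data->ref++;
	return 0;
}

static void event_trigger_free(struct event_trigger_ops *ops,
			       struct event_trigger_data *data)
{
	if (WARN_ON_ONCE(data->ref <= 0))
		return;

	data->ref--;
	if (!data->ref)
		trigger_data_free(data);	/* kfree()s trigger_data */
}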
>
> P.S. This brings up another minor bug. The failure should return ENOMEM
> not ENOENT.
Hmm, it seems we should review the register_trigger() implementation.
It should return the return value of trace_event_trigger_enable_disable(),
shouldn't it?
Thank you,
>
> -- Steve
>
> diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> index d18249683682..d15a746bcdfb 100644
> --- a/kernel/trace/trace_events_trigger.c
> +++ b/kernel/trace/trace_events_trigger.c
> @@ -679,6 +679,7 @@ event_trigger_callback(struct event_command *cmd_ops,
> goto out_free;
>
> out_reg:
> + event_trigger_init(trigger_ops, trigger_data);
> ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
> /*
> * The above returns on success the # of functions enabled,
> @@ -687,9 +688,9 @@ event_trigger_callback(struct event_command *cmd_ops,
> */
> if (!ret) {
> ret = -ENOENT;
> - goto out_free;
> + goto out_free_trigger;
> } else if (ret < 0)
> - goto out_free;
> + goto out_free_trigger;
> ret = 0;
> out:
> return ret;
> @@ -699,6 +700,10 @@ event_trigger_callback(struct event_command *cmd_ops,
> cmd_ops->set_filter(NULL, trigger_data, NULL);
> kfree(trigger_data);
> goto out;
> +
> + out_free_trigger:
> + event_trigger_free(trigger_ops, trigger_data);
> + goto out;
> }
>
> /**
--
Masami Hiramatsu <[email protected]>
On Wed, 25 Jul 2018 00:09:09 +0900
Masami Hiramatsu <[email protected]> wrote:
> Hmm, your patch seems to leak a memory since event_trigger_init() will
> be called twice on same trigger_data (Note that event_trigger_init()
> does not init ref counter, but increment it.) So we should decrement
> it when we find it is succeeded. Moreover, if register_trigger()
Good catch, and easily fixed.
> fails before calling data->ops->init() (see -EEXIST case), the ref
> counter will be 0 (-1 +1). But if it fails after data->ops->init(),
> the ref counter will be 1 (-1 +1 +1). It still be unstable.
> (Ah, that means we may have another trouble...)
I'm not sure there's a problem here. I now have:
out_reg:
/* Up the trigger_data count to make sure reg doesn't free it on failure */
event_trigger_init(trigger_ops, trigger_data);
ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
/*
* The above returns on success the # of functions enabled,
* but if it didn't find any functions it returns zero.
* Consider no functions a failure too.
*/
if (!ret) {
ret = -ENOENT;
} else if (ret > 0)
ret = 0;
/* Down the counter of trigger_data or free it if not used anymore */
event_trigger_free(trigger_ops, trigger_data);
out:
return ret;
Thus we increment the trigger_data count before calling reg(), and
free it afterward. But if reg() did an init too, then
event_trigger_free() just decrements the ref counter.
As for register_trigger()
>
> >
> > P.S. This brings up another minor bug. The failure should return ENOMEM
> > not ENOENT.
>
> Hmm it seems we should review the register_trigger() implementation.
> It should return the return value of trace_event_trigger_enable_disable(),
> shouldn't it?
>
Yeah, that's not done well. I'll fix it up.
Thanks for pointing it out.
-- Steve
On Tue, 24 Jul 2018 16:49:59 -0400
Steven Rostedt <[email protected]> wrote:
> > Hmm it seems we should review the register_trigger() implementation.
> > It should return the return value of trace_event_trigger_enable_disable(),
> > shouldn't it?
> >
>
> Yeah, that's not done well. I'll fix it up.
>
> Thanks for pointing it out.
Tom,
register_trigger() is messed up. I should have caught this when it was
first submitted, but I'm totally confused. The comments don't match the
code.
First we have this:
ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
/*
* The above returns on success the # of functions enabled,
* but if it didn't find any functions it returns zero.
* Consider no functions a failure too.
*/
Which looks to be total BS.
As we have this:
/**
* register_trigger - Generic event_command @reg implementation
* @glob: The raw string used to register the trigger
* @ops: The trigger ops associated with the trigger
* @data: Trigger-specific data to associate with the trigger
* @file: The trace_event_file associated with the event
*
* Common implementation for event trigger registration.
*
* Usually used directly as the @reg method in event command
* implementations.
*
* Return: 0 on success, errno otherwise
*/
static int register_trigger(char *glob, struct event_trigger_ops *ops,
struct event_trigger_data *data,
struct trace_event_file *file)
{
struct event_trigger_data *test;
int ret = 0;
list_for_each_entry_rcu(test, &file->triggers, list) {
if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) {
ret = -EEXIST;
goto out;
}
}
if (data->ops->init) {
ret = data->ops->init(data->ops, data);
if (ret < 0)
goto out;
}
list_add_rcu(&data->list, &file->triggers);
ret++;
update_cond_flag(file);
if (trace_event_trigger_enable_disable(file, 1) < 0) {
list_del_rcu(&data->list);
update_cond_flag(file);
ret--;
}
out:
return ret;
}
Where the comment is totally wrong. It doesn't return 0 on success, it
returns 1. And if trace_event_trigger_enable_disable() fails, it returns
zero.
And that can fail with the call->class->reg() return value, which could
fail for various strange reasons. I don't know why we would want to
return 0 when it fails.
I don't see where ->reg() would return anything but 1 on success. Maybe
I'm missing something. I'll look some more, but I'm thinking of changing
->reg() to return zero on all successes and negative on all errors, and
just checking those results.
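Something along these lines, i.e. just propagate the error (an
untested sketch of the direction, not a real patch):

static int register_trigger(char *glob, struct event_trigger_ops *ops,
			    struct event_trigger_data *data,
			    struct trace_event_file *file)
{
	struct event_trigger_data *test;
	int ret;

	list_for_each_entry_rcu(test, &file->triggers, list) {
		if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type)
			return -EEXIST;
	}

	if (data->ops->init) {
		ret = data->ops->init(data->ops, data);
		if (ret < 0)
			return ret;
	}

	list_add_rcu(&data->list, &file->triggers);
	update_cond_flag(file);

	ret = trace_event_trigger_enable_disable(file, 1);
	if (ret < 0) {
		list_del_rcu(&data->list);
		update_cond_flag(file);
		/* drop the reference that ->init() took above */
		if (data->ops->free)
			data->ops->free(data->ops, data);
		return ret;
	}

	return 0;
}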
-- Steve
On Tue, 24 Jul 2018 17:30:08 -0400
Steven Rostedt <[email protected]> wrote:
> I don't see where ->reg() would return anything but 1 on success. Maybe
> I'm missing something. I'll look some more, but I'm thinking of changing
> ->reg() to return zero on all success, and negative on all errors and
> just check those results.
Yeah, I'm going to make these changes. That is, all struct
event_command .reg functions will return negative on error, and
zero or positive on everything else. All the checks are going to be
only for the negative value.
But for the bug fix itself, I believe this should do it. Masami, what
do you think?
-- Steve
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index d18249683682..8771a27ca60f 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -679,6 +679,8 @@ event_trigger_callback(struct event_command *cmd_ops,
goto out_free;
out_reg:
+ /* Up the trigger_data count to make sure reg doesn't free it on failure */
+ event_trigger_init(trigger_ops, trigger_data);
ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
/*
* The above returns on success the # of functions enabled,
@@ -687,10 +689,11 @@ event_trigger_callback(struct event_command *cmd_ops,
*/
if (!ret) {
ret = -ENOENT;
- goto out_free;
- } else if (ret < 0)
- goto out_free;
- ret = 0;
+ } else if (ret > 0)
+ ret = 0;
+
+ /* Down the counter of trigger_data or free it if not used anymore */
+ event_trigger_free(trigger_ops, trigger_data);
out:
return ret;
On Tue, 24 Jul 2018 16:49:59 -0400
Steven Rostedt <[email protected]> wrote:
> On Wed, 25 Jul 2018 00:09:09 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> > Hmm, your patch seems to leak a memory since event_trigger_init() will
> > be called twice on same trigger_data (Note that event_trigger_init()
> > does not init ref counter, but increment it.) So we should decrement
> > it when we find it is succeeded. Moreover, if register_trigger()
>
> Good catch, and easily fixed.
>
> > fails before calling data->ops->init() (see -EEXIST case), the ref
> > counter will be 0 (-1 +1). But if it fails after data->ops->init(),
> > the ref counter will be 1 (-1 +1 +1). It still be unstable.
> > (Ah, that means we may have another trouble...)
>
> I'm not sure there's a problem here. I now have:
>
> out_reg:
> /* Up the trigger_data count to make sure reg doesn't free it on failuer */
> event_trigger_init(trigger_ops, trigger_data);
> ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
> /*
> * The above returns on success the # of functions enabled,
> * but if it didn't find any functions it returns zero.
> * Consider no functions a failure too.
> */
> if (!ret) {
> ret = -ENOENT;
> } else if (ret > 0)
> ret = 0;
Can we handle ret == 0 and ret > 0 the same way here? It seems
cmd_ops->reg() == 0 is a failure case.
>
> /* Down the counter of trigger_data or free it if not used anymore */
> event_trigger_free(trigger_ops, trigger_data);
> out:
> return ret;
>
> Thus we increment trigger_data before calling reg, and free it
> afterward. But if reg() did an init too, then the event_trigger_free()
> just decs the ref counter.
To avoid confusion, I would like to suggest renaming that pair to
event_trigger_data_get/put(). :)
>
> As for register_trigger()
>
>
> >
> > >
> > > P.S. This brings up another minor bug. The failure should return ENOMEM
> > > not ENOENT.
> >
> > Hmm it seems we should review the register_trigger() implementation.
> > It should return the return value of trace_event_trigger_enable_disable(),
> > shouldn't it?
> >
>
> Yeah, that's not done well. I'll fix it up.
Thanks!
>
> Thanks for pointing it out.
>
> -- Steve
--
Masami Hiramatsu <[email protected]>
On Tue, 24 Jul 2018 19:13:31 -0400
Steven Rostedt <[email protected]> wrote:
> On Tue, 24 Jul 2018 17:30:08 -0400
> Steven Rostedt <[email protected]> wrote:
>
> > I don't see where ->reg() would return anything but 1 on success. Maybe
> > I'm missing something. I'll look some more, but I'm thinking of changing
> > ->reg() to return zero on all success, and negative on all errors and
> > just check those results.
>
> Yeah, I'm going to make these changes. That is, all struct
> event_command .reg functions will return negative on error, and
> zero or positive on everything else. All the checks are going to be
> only for the negative value.
>
> But for the bug fix itself, I believe this should do it. Masami, what
> do you think?
Hm, as far as I can see, when register_trigger() returns >= 0, it has
already called ->init on the trigger_data. This means its refcount was
incremented, and in that case the patch below will fail to free the
trigger_data.
How about the below as a tentative fix?
if (!ret) {
ret = -ENOENT;
/* We have to forcibly free the trigger_data */
goto out_free;
} else if (ret > 0)
ret = 0;
Thank you,
>
> -- Steve
>
> diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> index d18249683682..8771a27ca60f 100644
> --- a/kernel/trace/trace_events_trigger.c
> +++ b/kernel/trace/trace_events_trigger.c
> @@ -679,6 +679,8 @@ event_trigger_callback(struct event_command *cmd_ops,
> goto out_free;
>
> out_reg:
> + /* Up the trigger_data count to make sure reg doesn't free it on failure */
> + event_trigger_init(trigger_ops, trigger_data);
> ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
> /*
> * The above returns on success the # of functions enabled,
> @@ -687,10 +689,11 @@ event_trigger_callback(struct event_command *cmd_ops,
> */
> if (!ret) {
> ret = -ENOENT;
> - goto out_free;
> - } else if (ret < 0)
> - goto out_free;
> - ret = 0;
> + } else if (ret > 0)
> + ret = 0;
> +
> + /* Down the counter of trigger_data or free it if not used anymore */
> + event_trigger_free(trigger_ops, trigger_data);
> out:
> return ret;
>
--
Masami Hiramatsu <[email protected]>
On Wed, 25 Jul 2018 10:16:53 +0900
Masami Hiramatsu <[email protected]> wrote:
> Hm, as far as I can see, when register_trigger() returns >= 0, it already
> calls ->init the trigger_data. This means its refcount++, and in that
> case, below patch will miss to free the trigger_data.
> How about below for tentative fix?
>
> if (!ret) {
> ret = -ENOENT;
> /* We have to forcibly free the trigger_data */
> goto out_free;
> } else if (ret > 0)
> ret = 0;
Or better yet, match it properly:
out_reg:
/* Up the trigger_data count to make sure reg doesn't free it on failure */
event_trigger_init(trigger_ops, trigger_data);
ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
/*
* The above returns on success the # of functions enabled,
* but if it didn't find any functions it returns zero.
* Consider no functions a failure too.
*/
if (!ret) {
cmd_ops->unreg(glob, trigger_ops, trigger_data, file);
ret = -ENOENT;
} else if (ret > 0)
ret = 0;
/* Down the counter of trigger_data or free it if not used anymore */
event_trigger_free(trigger_ops, trigger_data);
out:
return ret;
-- Steve
On Tue, 24 Jul 2018 22:41:49 -0400
Steven Rostedt <[email protected]> wrote:
> On Wed, 25 Jul 2018 10:16:53 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> > Hm, as far as I can see, when register_trigger() returns >= 0, it already
> > calls ->init the trigger_data. This means its refcount++, and in that
> > case, below patch will miss to free the trigger_data.
> > How about below for tentative fix?
> >
> > if (!ret) {
> > ret = -ENOENT;
> > /* We have to forcibly free the trigger_data */
> > goto out_free;
> > } else if (ret > 0)
> > ret = 0;
>
> Or better yet, match it properly:
OK, this looks good to me :)
Reviewed-by: Masami Hiramatsu <[email protected]>
Thanks!
>
> out_reg:
> /* Up the trigger_data count to make sure reg doesn't free it on failure */
> event_trigger_init(trigger_ops, trigger_data);
> ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
> /*
> * The above returns on success the # of functions enabled,
> * but if it didn't find any functions it returns zero.
> * Consider no functions a failure too.
> */
> if (!ret) {
> cmd_ops->unreg(glob, trigger_ops, trigger_data, file);
> ret = -ENOENT;
> } else if (ret > 0)
> ret = 0;
>
> /* Down the counter of trigger_data or free it if not used anymore */
> event_trigger_free(trigger_ops, trigger_data);
> out:
> return ret;
>
> -- Steve
--
Masami Hiramatsu <[email protected]>
Hi Steve,
On 7/24/2018 4:30 PM, Steven Rostedt wrote:
> On Tue, 24 Jul 2018 16:49:59 -0400
> Steven Rostedt <[email protected]> wrote:
>
>>> Hmm it seems we should review the register_trigger() implementation.
>>> It should return the return value of trace_event_trigger_enable_disable(),
>>> shouldn't it?
>>>
>>
>> Yeah, that's not done well. I'll fix it up.
>>
>> Thanks for pointing it out.
>
> Tom,
>
> register_trigger() is messed up. I should have caught this when it was
> first submitted, but I'm totally confused. The comments don't match the
> code.
>
> First we have this:
>
> ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
> /*
> * The above returns on success the # of functions enabled,
> * but if it didn't find any functions it returns zero.
> * Consider no functions a failure too.
> */
>
> Which looks to be total BS.
Yes, it is BS in the case of event triggers. This was taken from the
ftrace function trigger code, where it does make sense. I think I left
it in thinking the code would at some point later converge.
>
> As we have this:
>
> /**
> * register_trigger - Generic event_command @reg implementation
> * @glob: The raw string used to register the trigger
> * @ops: The trigger ops associated with the trigger
> * @data: Trigger-specific data to associate with the trigger
> * @file: The trace_event_file associated with the event
> *
> * Common implementation for event trigger registration.
> *
> * Usually used directly as the @reg method in event command
> * implementations.
> *
> * Return: 0 on success, errno otherwise
And this is how it should work.
> */
> static int register_trigger(char *glob, struct event_trigger_ops *ops,
> struct event_trigger_data *data,
> struct trace_event_file *file)
> {
> struct event_trigger_data *test;
> int ret = 0;
>
> list_for_each_entry_rcu(test, &file->triggers, list) {
> if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) {
> ret = -EEXIST;
> goto out;
> }
> }
>
> if (data->ops->init) {
> ret = data->ops->init(data->ops, data);
> if (ret < 0)
> goto out;
> }
>
> list_add_rcu(&data->list, &file->triggers);
> ret++;
>
> update_cond_flag(file);
> if (trace_event_trigger_enable_disable(file, 1) < 0) {
> list_del_rcu(&data->list);
> update_cond_flag(file);
> ret--;
> }
> out:
> return ret;
> }
>
> Where the comment is total wrong. It doesn't return 0 on success, it
> returns 1. And if trace_event_trigger_enable_disable() fails it returns
> zero.
>
> And that can fail with the call->class->reg() return value, which could
> fail for various strange reasons. I don't know why we would want to
> return 0 when it fails?
>
> I don't see where ->reg() would return anything but 1 on success. Maybe
> I'm missing something. I'll look some more, but I'm thinking of changing
> ->reg() to return zero on all success, and negative on all errors and
> just check those results.
>
Right, in the case of event triggers we only register one at a time,
whereas with the trace function triggers, globbing can register
triggers on multiple functions at the same time, so it makes sense
there to have reg() return a count and use the more convoluted error
handling.
So I agree, simplifying things here by using the standard error handling
would be an improvement.
Tom
> -- Steve
>
On Wed, 25 Jul 2018 11:01:22 -0500
Tom Zanussi <[email protected]> wrote:
> > First we have this:
> >
> > ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
> > /*
> > * The above returns on success the # of functions enabled,
> > * but if it didn't find any functions it returns zero.
> > * Consider no functions a failure too.
> > */
> >
> > Which looks to be total BS.
>
> Yes, it is BS in the case of event triggers. This was taken from the
> ftrace function trigger code, where it does make sense. I think I left
> it in thinking the code would at some point later converge.
OK, that makes a little more sense.
>
> >
> > As we have this:
> >
> > /**
> > * register_trigger - Generic event_command @reg implementation
> > * @glob: The raw string used to register the trigger
> > * @ops: The trigger ops associated with the trigger
> > * @data: Trigger-specific data to associate with the trigger
> > * @file: The trace_event_file associated with the event
> > *
> > * Common implementation for event trigger registration.
> > *
> > * Usually used directly as the @reg method in event command
> > * implementations.
> > *
> > * Return: 0 on success, errno otherwise
>
> And this is how it should work.
>
> > */
> > static int register_trigger(char *glob, struct event_trigger_ops *ops,
> > struct event_trigger_data *data,
> > struct trace_event_file *file)
> > {
> > struct event_trigger_data *test;
> > int ret = 0;
> >
> > list_for_each_entry_rcu(test, &file->triggers, list) {
> > if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) {
> > ret = -EEXIST;
> > goto out;
> > }
> > }
> >
> > if (data->ops->init) {
> > ret = data->ops->init(data->ops, data);
> > if (ret < 0)
> > goto out;
> > }
> >
> > list_add_rcu(&data->list, &file->triggers);
> > ret++;
> >
> > update_cond_flag(file);
> > if (trace_event_trigger_enable_disable(file, 1) < 0) {
> > list_del_rcu(&data->list);
> > update_cond_flag(file);
> > ret--;
> > }
> > out:
> > return ret;
> > }
> >
> > Where the comment is total wrong. It doesn't return 0 on success, it
> > returns 1. And if trace_event_trigger_enable_disable() fails it returns
> > zero.
> >
> > And that can fail with the call->class->reg() return value, which could
> > fail for various strange reasons. I don't know why we would want to
> > return 0 when it fails?
> >
> > I don't see where ->reg() would return anything but 1 on success. Maybe
> > I'm missing something. I'll look some more, but I'm thinking of changing
> > ->reg() to return zero on all success, and negative on all errors and
> > just check those results.
> >
>
> Right, in the case of event triggers, we only register one at a time,
> whereas with the trace function triggers, with globbing multiple
> functions can register triggers at the same time, so it makes sense
> there to have reg() return a count and the more convoluted error handling.
OK, reg in function probes will be handled differently.
>
> So I agree, simplifying things here by using the standard error handling
> would be an improvement.
I'll start working on something for 4.19 to simplify it.
Thanks for confirming!
-- Steve