Hello,
It took me several hours to realize that the strange bug I hit was
caused by the change I have nacked in the past ;)
But it appears that I should take the blame. I was cc'ed, but I missed
that email or forgot to reply, so another attempt to push this "trivial"
change was successful.
I think that 1/4 should go to stable. 2-4 are "while at it" fixes.
Oleg.
kernel/events/uprobes.c | 6 ++--
kernel/trace/trace_uprobe.c | 46 +++++++++++++++++++++++++-----------------
2 files changed, 30 insertions(+), 22 deletions(-)
The usage of uprobe_buffer_enable() added by dcad1a20 is very wrong,
1. uprobe_buffer_enable() and uprobe_buffer_disable() are not balanced,
_enable() should be called only if !enabled.
2. If uprobe_buffer_enable() fails probe_event_enable() should clear
tp.flags and free event_file_link.
3. If uprobe_register() fails it should do uprobe_buffer_disable().
Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_uprobe.c | 31 +++++++++++++++++++------------
1 files changed, 19 insertions(+), 12 deletions(-)
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index c4cf0ab..3c9b97e 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -911,26 +911,33 @@ probe_event_enable(struct trace_uprobe *tu, struct ftrace_event_file *file,
tu->tp.flags |= TP_FLAG_PROFILE;
}
- ret = uprobe_buffer_enable();
- if (ret < 0)
- return ret;
-
WARN_ON(!uprobe_filter_is_empty(&tu->filter));
if (enabled)
return 0;
+ ret = uprobe_buffer_enable();
+ if (ret)
+ goto err_flags;
+
tu->consumer.filter = filter;
ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
- if (ret) {
- if (file) {
- list_del(&link->list);
- kfree(link);
- tu->tp.flags &= ~TP_FLAG_TRACE;
- } else
- tu->tp.flags &= ~TP_FLAG_PROFILE;
- }
+ if (ret)
+ goto err_buffer;
+ return 0;
+
+ err_buffer:
+ uprobe_buffer_disable();
+
+ err_flags:
+ if (file) {
+ list_del(&link->list);
+ kfree(link);
+ tu->tp.flags &= ~TP_FLAG_TRACE;
+ } else {
+ tu->tp.flags &= ~TP_FLAG_PROFILE;
+ }
return ret;
}
--
1.5.5.1
Add WARN_ON's into uprobe_unregister() and uprobe_apply() to ensure
that nobody tries to play with the dead uprobe/consumer. This helps
to catch the bugs like the one fixed by the previous patch.
In the longer term we should fix this poorly designed interface.
uprobe_register() should return "struct uprobe *" which should be
passed to apply/unregister. Plus other semantic changes, see the
changelog in commit 41ccba029e94.
Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/uprobes.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c445e39..6f3254e 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -846,7 +846,7 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u
{
int err;
- if (!consumer_del(uprobe, uc)) /* WARN? */
+ if (WARN_ON(!consumer_del(uprobe, uc)))
return;
err = register_for_each_vma(uprobe, NULL);
@@ -927,7 +927,7 @@ int uprobe_apply(struct inode *inode, loff_t offset,
int ret = -ENOENT;
uprobe = find_uprobe(inode, offset);
- if (!uprobe)
+ if (WARN_ON(!uprobe))
return ret;
down_write(&uprobe->register_rwsem);
@@ -952,7 +952,7 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume
struct uprobe *uprobe;
uprobe = find_uprobe(inode, offset);
- if (!uprobe)
+ if (WARN_ON(!uprobe))
return;
down_write(&uprobe->register_rwsem);
--
1.5.5.1
This reverts commit 43fe98913c9f67e3b523615ee3316f9520a623e0.
This patch is very wrong. Firstly, this change leads to unbalanced
uprobe_unregister(). Just for example,
# perf probe -x /lib/libc.so.6 syscall
# echo 1 >> /sys/kernel/debug/tracing/events/probe_libc/enable
# perf record -e probe_libc:syscall whatever
after that uprobe is dead (unregistered) but the user of ftrace/perf
can't know this, and it looks as if nobody hits this probe.
This would be easy to fix, but there are other reasons why it is not
simple to mix ftrace and perf. If nothing else, they can't share the
same ->consumer.filter. This is fixable too, but probably we need to
fix the poorly designed uprobe_register() interface first. At least
"register" and "apply" should be clearly separated.
Cc: [email protected] # v3.14
Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_uprobe.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 04fdb5d..08e7970 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -893,6 +893,9 @@ probe_event_enable(struct trace_uprobe *tu, struct ftrace_event_file *file,
int ret;
if (file) {
+ if (tu->tp.flags & TP_FLAG_PROFILE)
+ return -EINTR;
+
link = kmalloc(sizeof(*link), GFP_KERNEL);
if (!link)
return -ENOMEM;
@@ -901,8 +904,12 @@ probe_event_enable(struct trace_uprobe *tu, struct ftrace_event_file *file,
list_add_tail_rcu(&link->list, &tu->tp.files);
tu->tp.flags |= TP_FLAG_TRACE;
- } else
+ } else {
+ if (tu->tp.flags & TP_FLAG_TRACE)
+ return -EINTR;
+
tu->tp.flags |= TP_FLAG_PROFILE;
+ }
ret = uprobe_buffer_enable();
if (ret < 0)
--
1.5.5.1
I do not know why dd9fa555d7bb "tracing/uprobes: Move argument fetching
to uprobe_dispatcher()" added the UPROBE_HANDLER_REMOVE, but it looks
wrong.
OK, perhaps it makes sense to avoid store_trace_args() if the tracee is
nacked by uprobe_perf_filter(). But then we should kill the same code
in uprobe_perf_func() and unify the TRACE/PROFILE filtering (we need to
do this anyway to mix perf/ftrace). Until then this code actually adds
the pessimization because uprobe_perf_filter() will be called twice and
return T in likely case.
Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_uprobe.c | 6 ------
1 files changed, 0 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 08e7970..c4cf0ab 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1208,12 +1208,6 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
current->utask->vaddr = (unsigned long) &udd;
-#ifdef CONFIG_PERF_EVENTS
- if ((tu->tp.flags & TP_FLAG_TRACE) == 0 &&
- !uprobe_perf_filter(&tu->consumer, 0, current->mm))
- return UPROBE_HANDLER_REMOVE;
-#endif
-
if (WARN_ON_ONCE(!uprobe_cpu_buffer))
return 0;
--
1.5.5.1
On Fri, 27 Jun 2014 19:01:40 +0200, Oleg Nesterov wrote:
> Add WARN_ON's into uprobe_unregister() and uprobe_apply() to ensure
> that nobody tries to play with the dead uprobe/consumer. This helps
> to catch the bugs like the one fixed by the previous patch.
>
> In the longer term we should fix this poorly designed interface.
> uprobe_register() should return "struct uprobe *" which should be
> passed to apply/unregister. Plus other semantic changes, see the
> changelog in commit 41ccba029e94.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Thanks,
Namhyung
> ---
> kernel/events/uprobes.c | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index c445e39..6f3254e 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -846,7 +846,7 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u
> {
> int err;
>
> - if (!consumer_del(uprobe, uc)) /* WARN? */
> + if (WARN_ON(!consumer_del(uprobe, uc)))
> return;
>
> err = register_for_each_vma(uprobe, NULL);
> @@ -927,7 +927,7 @@ int uprobe_apply(struct inode *inode, loff_t offset,
> int ret = -ENOENT;
>
> uprobe = find_uprobe(inode, offset);
> - if (!uprobe)
> + if (WARN_ON(!uprobe))
> return ret;
>
> down_write(&uprobe->register_rwsem);
> @@ -952,7 +952,7 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume
> struct uprobe *uprobe;
>
> uprobe = find_uprobe(inode, offset);
> - if (!uprobe)
> + if (WARN_ON(!uprobe))
> return;
>
> down_write(&uprobe->register_rwsem);
On Fri, 27 Jun 2014 19:01:43 +0200, Oleg Nesterov wrote:
> I do not know why dd9fa555d7bb "tracing/uprobes: Move argument fetching
> to uprobe_dispatcher()" added the UPROBE_HANDLER_REMOVE, but it looks
> wrong.
>
> OK, perhaps it makes sense to avoid store_trace_args() if the tracee is
> nacked by uprobe_perf_filter(). But then we should kill the same code
> in uprobe_perf_func() and unify the TRACE/PROFILE filtering (we need to
> do this anyway to mix perf/ftrace). Until then this code actually adds
> the pessimization because uprobe_perf_filter() will be called twice and
> return T in likely case.
Right, I wanted to avoid to call the store_trace_args() which might be
costly if possible. But it seems not necessary since it doesn't get
called once handler returns UPROBE_HANDLER_REMOVE. And we need to fix
the filtering first..
So I'm okay with removing this "pessimization". :)
Acked-by: Namhyung Kim <[email protected]>
Thanks,
Namhyung
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_uprobe.c | 6 ------
> 1 files changed, 0 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 08e7970..c4cf0ab 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -1208,12 +1208,6 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
>
> current->utask->vaddr = (unsigned long) &udd;
>
> -#ifdef CONFIG_PERF_EVENTS
> - if ((tu->tp.flags & TP_FLAG_TRACE) == 0 &&
> - !uprobe_perf_filter(&tu->consumer, 0, current->mm))
> - return UPROBE_HANDLER_REMOVE;
> -#endif
> -
> if (WARN_ON_ONCE(!uprobe_cpu_buffer))
> return 0;
Hi Oleg,
On Fri, 27 Jun 2014 19:01:36 +0200, Oleg Nesterov wrote:
> This reverts commit 43fe98913c9f67e3b523615ee3316f9520a623e0.
>
> This patch is very wrong. Firstly, this change leads to unbalanced
> uprobe_unregister(). Just for example,
>
> # perf probe -x /lib/libc.so.6 syscall
> # echo 1 >> /sys/kernel/debug/tracing/events/probe_libc/enable
> # perf record -e probe_libc:syscall whatever
>
> after that uprobe is dead (unregistered) but the user of ftrace/perf
> can't know this, and it looks as if nobody hits this probe.
Nah, I missed to check the unregister path.. :-/
>
> This would be easy to fix, but there are other reasons why it is not
> simple to mix ftrace and perf. If nothing else, they can't share the
> same ->consumer.filter. This is fixable too, but probably we need to
> fix the poorly designed uprobe_register() interface first. At least
> "register" and "apply" should be clearly separated.
Hmm.. right. It seems the current filter logic only cares about the
perf. If ftrace comes after perf, it might not see some events due to
the filter, right?
>
> Cc: [email protected] # v3.14
> Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Thanks,
Namhyung
> ---
> kernel/trace/trace_uprobe.c | 9 ++++++++-
> 1 files changed, 8 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 04fdb5d..08e7970 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -893,6 +893,9 @@ probe_event_enable(struct trace_uprobe *tu, struct ftrace_event_file *file,
> int ret;
>
> if (file) {
> + if (tu->tp.flags & TP_FLAG_PROFILE)
> + return -EINTR;
> +
> link = kmalloc(sizeof(*link), GFP_KERNEL);
> if (!link)
> return -ENOMEM;
> @@ -901,8 +904,12 @@ probe_event_enable(struct trace_uprobe *tu, struct ftrace_event_file *file,
> list_add_tail_rcu(&link->list, &tu->tp.files);
>
> tu->tp.flags |= TP_FLAG_TRACE;
> - } else
> + } else {
> + if (tu->tp.flags & TP_FLAG_TRACE)
> + return -EINTR;
> +
> tu->tp.flags |= TP_FLAG_PROFILE;
> + }
>
> ret = uprobe_buffer_enable();
> if (ret < 0)
On Fri, 27 Jun 2014 19:01:46 +0200, Oleg Nesterov wrote:
> The usage of uprobe_buffer_enable() added by dcad1a20 is very wrong,
>
> 1. uprobe_buffer_enable() and uprobe_buffer_disable() are not balanced,
> _enable() should be called only if !enabled.
>
> 2. If uprobe_buffer_enable() fails probe_event_enable() should clear
> tp.flags and free event_file_link.
>
> 3. If uprobe_register() fails it should do uprobe_buffer_disable().
>
> Signed-off-by: Oleg Nesterov <[email protected]>
Thanks for the fix.
Acked-by: Namhyung Kim <[email protected]>
> ---
> kernel/trace/trace_uprobe.c | 31 +++++++++++++++++++------------
> 1 files changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index c4cf0ab..3c9b97e 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -911,26 +911,33 @@ probe_event_enable(struct trace_uprobe *tu, struct ftrace_event_file *file,
> tu->tp.flags |= TP_FLAG_PROFILE;
> }
>
> - ret = uprobe_buffer_enable();
> - if (ret < 0)
> - return ret;
> -
> WARN_ON(!uprobe_filter_is_empty(&tu->filter));
>
> if (enabled)
> return 0;
>
> + ret = uprobe_buffer_enable();
> + if (ret)
> + goto err_flags;
> +
> tu->consumer.filter = filter;
> ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
> - if (ret) {
> - if (file) {
> - list_del(&link->list);
> - kfree(link);
> - tu->tp.flags &= ~TP_FLAG_TRACE;
> - } else
> - tu->tp.flags &= ~TP_FLAG_PROFILE;
> - }
> + if (ret)
> + goto err_buffer;
>
> + return 0;
> +
> + err_buffer:
> + uprobe_buffer_disable();
> +
> + err_flags:
> + if (file) {
> + list_del(&link->list);
> + kfree(link);
> + tu->tp.flags &= ~TP_FLAG_TRACE;
> + } else {
> + tu->tp.flags &= ~TP_FLAG_PROFILE;
> + }
> return ret;
> }
(2014/06/28 2:01), Oleg Nesterov wrote:
> The usage of uprobe_buffer_enable() added by dcad1a20 is very wrong,
>
> 1. uprobe_buffer_enable() and uprobe_buffer_disable() are not balanced,
> _enable() should be called only if !enabled.
>
> 2. If uprobe_buffer_enable() fails probe_event_enable() should clear
> tp.flags and free event_file_link.
>
> 3. If uprobe_register() fails it should do uprobe_buffer_disable().
Looks good to me ;)
Reviewed-by: Masami Hiramatsu <[email protected]>
Thanks,
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_uprobe.c | 31 +++++++++++++++++++------------
> 1 files changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index c4cf0ab..3c9b97e 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -911,26 +911,33 @@ probe_event_enable(struct trace_uprobe *tu, struct ftrace_event_file *file,
> tu->tp.flags |= TP_FLAG_PROFILE;
> }
>
> - ret = uprobe_buffer_enable();
> - if (ret < 0)
> - return ret;
> -
> WARN_ON(!uprobe_filter_is_empty(&tu->filter));
>
> if (enabled)
> return 0;
>
> + ret = uprobe_buffer_enable();
> + if (ret)
> + goto err_flags;
> +
> tu->consumer.filter = filter;
> ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
> - if (ret) {
> - if (file) {
> - list_del(&link->list);
> - kfree(link);
> - tu->tp.flags &= ~TP_FLAG_TRACE;
> - } else
> - tu->tp.flags &= ~TP_FLAG_PROFILE;
> - }
> + if (ret)
> + goto err_buffer;
>
> + return 0;
> +
> + err_buffer:
> + uprobe_buffer_disable();
> +
> + err_flags:
> + if (file) {
> + list_del(&link->list);
> + kfree(link);
> + tu->tp.flags &= ~TP_FLAG_TRACE;
> + } else {
> + tu->tp.flags &= ~TP_FLAG_PROFILE;
> + }
> return ret;
> }
>
>
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
(2014/06/28 2:01), Oleg Nesterov wrote:
> This reverts commit 43fe98913c9f67e3b523615ee3316f9520a623e0.
>
> This patch is very wrong. Firstly, this change leads to unbalanced
> uprobe_unregister(). Just for example,
>
> # perf probe -x /lib/libc.so.6 syscall
> # echo 1 >> /sys/kernel/debug/tracing/events/probe_libc/enable
> # perf record -e probe_libc:syscall whatever
>
> after that uprobe is dead (unregistered) but the user of ftrace/perf
> can't know this, and it looks as if nobody hits this probe.
>
> This would be easy to fix, but there are other reasons why it is not
> simple to mix ftrace and perf. If nothing else, they can't share the
> same ->consumer.filter. This is fixable too, but probably we need to
> fix the poorly designed uprobe_register() interface first. At least
> "register" and "apply" should be clearly separated.
Hm, it should be a uprobes original issue. it's different from kprobes.
Reviewed-by: Masami Hiramatsu <[email protected]>
Thank you,
>
> Cc: [email protected] # v3.14
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_uprobe.c | 9 ++++++++-
> 1 files changed, 8 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 04fdb5d..08e7970 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -893,6 +893,9 @@ probe_event_enable(struct trace_uprobe *tu, struct ftrace_event_file *file,
> int ret;
>
> if (file) {
> + if (tu->tp.flags & TP_FLAG_PROFILE)
> + return -EINTR;
> +
> link = kmalloc(sizeof(*link), GFP_KERNEL);
> if (!link)
> return -ENOMEM;
> @@ -901,8 +904,12 @@ probe_event_enable(struct trace_uprobe *tu, struct ftrace_event_file *file,
> list_add_tail_rcu(&link->list, &tu->tp.files);
>
> tu->tp.flags |= TP_FLAG_TRACE;
> - } else
> + } else {
> + if (tu->tp.flags & TP_FLAG_TRACE)
> + return -EINTR;
> +
> tu->tp.flags |= TP_FLAG_PROFILE;
> + }
>
> ret = uprobe_buffer_enable();
> if (ret < 0)
>
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
On Fri, 27 Jun 2014 19:01:16 +0200
Oleg Nesterov <[email protected]> wrote:
> Hello,
>
> It took me several hours to realize that the strange bug I hit was
> caused by the change I have nacked in the past ;)
>
> But it appears that I should take the blame. I was cc'ed, but I missed
> that email or forgot to reply, so another attempt to push this "trivial"
> change was successful.
>
> I think that 1/4 should go to stable. 2-4 are "while at it" fixes.
>
> Oleg.
>
> kernel/events/uprobes.c | 6 ++--
> kernel/trace/trace_uprobe.c | 46 +++++++++++++++++++++++++-----------------
> 2 files changed, 30 insertions(+), 22 deletions(-)
I pulled these into my urgent branch and I'm testing them now.
Thanks Oleg!
-- Steve
* Oleg Nesterov <[email protected]> [2014-06-27 19:01:36]:
> This reverts commit 43fe98913c9f67e3b523615ee3316f9520a623e0.
>
> This patch is very wrong. Firstly, this change leads to unbalanced
> uprobe_unregister(). Just for example,
>
> # perf probe -x /lib/libc.so.6 syscall
> # echo 1 >> /sys/kernel/debug/tracing/events/probe_libc/enable
> # perf record -e probe_libc:syscall whatever
>
> after that uprobe is dead (unregistered) but the user of ftrace/perf
> can't know this, and it looks as if nobody hits this probe.
>
> This would be easy to fix, but there are other reasons why it is not
> simple to mix ftrace and perf. If nothing else, they can't share the
> same ->consumer.filter. This is fixable too, but probably we need to
> fix the poorly designed uprobe_register() interface first. At least
> "register" and "apply" should be clearly separated.
>
> Cc: [email protected] # v3.14
> Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
--
Thanks and Regards
Srikar Dronamraju
* Oleg Nesterov <[email protected]> [2014-06-27 19:01:40]:
> Add WARN_ON's into uprobe_unregister() and uprobe_apply() to ensure
> that nobody tries to play with the dead uprobe/consumer. This helps
> to catch the bugs like the one fixed by the previous patch.
>
> In the longer term we should fix this poorly designed interface.
> uprobe_register() should return "struct uprobe *" which should be
> passed to apply/unregister. Plus other semantic changes, see the
> changelog in commit 41ccba029e94.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
--
Thanks and Regards
Srikar Dronamraju
* Oleg Nesterov <[email protected]> [2014-06-27 19:01:43]:
> I do not know why dd9fa555d7bb "tracing/uprobes: Move argument fetching
> to uprobe_dispatcher()" added the UPROBE_HANDLER_REMOVE, but it looks
> wrong.
>
> OK, perhaps it makes sense to avoid store_trace_args() if the tracee is
> nacked by uprobe_perf_filter(). But then we should kill the same code
> in uprobe_perf_func() and unify the TRACE/PROFILE filtering (we need to
> do this anyway to mix perf/ftrace). Until then this code actually adds
> the pessimization because uprobe_perf_filter() will be called twice and
> return T in likely case.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
--
Thanks and Regards
Srikar Dronamraju
> The usage of uprobe_buffer_enable() added by dcad1a20 is very wrong,
>
> 1. uprobe_buffer_enable() and uprobe_buffer_disable() are not balanced,
> _enable() should be called only if !enabled.
>
> 2. If uprobe_buffer_enable() fails probe_event_enable() should clear
> tp.flags and free event_file_link.
>
> 3. If uprobe_register() fails it should do uprobe_buffer_disable().
>
> Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
(one nit .. )
> + ret = uprobe_buffer_enable();
> + if (ret)
> + goto err_flags;
> +
> tu->consumer.filter = filter;
> ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
> - if (ret) {
> - if (file) {
> - list_del(&link->list);
> - kfree(link);
> - tu->tp.flags &= ~TP_FLAG_TRACE;
> - } else
> - tu->tp.flags &= ~TP_FLAG_PROFILE;
> - }
> + if (ret)
> + goto err_buffer;
>
> + return 0;
> +
> + err_buffer:
> + uprobe_buffer_disable();
> +
How about avoiding err_buffer label?
+ if (!ret)
+ return 0;
+ uprobe_buffer_disable();
+
> + err_flags:
> + if (file) {
> + list_del(&link->list);
>
> --
> 1.5.5.1
>
--
Thanks and Regards
Srikar Dronamraju
On Mon, 30 Jun 2014 22:34:09 +0530
Srikar Dronamraju <[email protected]> wrote:
> Acked-by: Srikar Dronamraju <[email protected]>
> (one nit .. )
>
> > + ret = uprobe_buffer_enable();
> > + if (ret)
> > + goto err_flags;
> > +
> > tu->consumer.filter = filter;
> > ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
> > - if (ret) {
> > - if (file) {
> > - list_del(&link->list);
> > - kfree(link);
> > - tu->tp.flags &= ~TP_FLAG_TRACE;
> > - } else
> > - tu->tp.flags &= ~TP_FLAG_PROFILE;
> > - }
> > + if (ret)
> > + goto err_buffer;
> >
> > + return 0;
> > +
> > + err_buffer:
> > + uprobe_buffer_disable();
> > +
>
> How about avoiding err_buffer label?
> + if (!ret)
> + return 0;
>
> + uprobe_buffer_disable();
> +
>
Oleg, you OK with this update?
I can kill my tests and restart with this update. Or you can resend this
patch. Or we can just push it as is, and have this be a patch that
get's queued as a cleanup for 3.17?
-- Steve
>
> > + err_flags:
> > + if (file) {
> > + list_del(&link->list);
> >
> > --
> > 1.5.5.1
> >
>
On 06/30, Srikar Dronamraju wrote:
>
> > The usage of uprobe_buffer_enable() added by dcad1a20 is very wrong,
> >
> > 1. uprobe_buffer_enable() and uprobe_buffer_disable() are not balanced,
> > _enable() should be called only if !enabled.
> >
> > 2. If uprobe_buffer_enable() fails probe_event_enable() should clear
> > tp.flags and free event_file_link.
> >
> > 3. If uprobe_register() fails it should do uprobe_buffer_disable().
> >
> > Signed-off-by: Oleg Nesterov <[email protected]>
>
> Acked-by: Srikar Dronamraju <[email protected]>
Thanks!
> (one nit .. )
>
> > + ret = uprobe_buffer_enable();
> > + if (ret)
> > + goto err_flags;
> > +
> > tu->consumer.filter = filter;
> > ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
> > - if (ret) {
> > - if (file) {
> > - list_del(&link->list);
> > - kfree(link);
> > - tu->tp.flags &= ~TP_FLAG_TRACE;
> > - } else
> > - tu->tp.flags &= ~TP_FLAG_PROFILE;
> > - }
> > + if (ret)
> > + goto err_buffer;
> >
> > + return 0;
> > +
> > + err_buffer:
> > + uprobe_buffer_disable();
> > +
>
> How about avoiding err_buffer label?
> + if (!ret)
> + return 0;
>
> + uprobe_buffer_disable();
> +
Well, I do not really mind. But to me it looks more consistent this way,
if-something-fail-goto-err_label.
IOW, I think that the code should either not use err-labels, or always
use them like above.
Besides, perhaps we will add "if (file) uprobe_apply()" after _register()
to mix perf/ftrace, then we will need to change this "if (!ret)" code again.
Oleg.
On 06/30, Steven Rostedt wrote:
>
> On Mon, 30 Jun 2014 22:34:09 +0530
> Srikar Dronamraju <[email protected]> wrote:
>
> > > + if (ret)
> > > + goto err_buffer;
> > >
> > > + return 0;
> > > +
> > > + err_buffer:
> > > + uprobe_buffer_disable();
> > > +
> >
> > How about avoiding err_buffer label?
> > + if (!ret)
> > + return 0;
> >
> > + uprobe_buffer_disable();
> > +
> >
>
> Oleg, you OK with this update?
>
> I can kill my tests and restart with this update. Or you can resend this
> patch. Or we can just push it as is, and have this be a patch that
> get's queued as a cleanup for 3.17?
Well, if you too think that this change can make the code cleaner I should
probably make it ;)
But, to me
err = init_1();
if (err)
goto err_1;
err = init_2();
if (err)
goto err_2;
return 0;
err_2:
cleanup_2();
err_1:
cleanup_1();
looks better than
err = init_1();
if (err)
goto err_1;
err = init_2();
if (!err)
return 0;
cleanup_2();
err_1:
cleanup_1();
just because the 1st variant is more symmetrical. And in fact it is more
flexible, we might add init_3/etc.
But I won't insist, this is subjective. So please let me know if you still
think it would be better to add this change, I'll send v2.
Oleg.
On Mon, 30 Jun 2014 19:50:25 +0200
Oleg Nesterov <[email protected]> wrote:
> Well, I do not really mind. But to me it looks more consistent this way,
> if-something-fail-goto-err_label.
>
> IOW, I think that the code should either not use err-labels, or always
> use them like above.
Ah I missed the other error labels. Yeah, we should keep the patch as
is to be consistent (also it means I don't need to restart a test
that's already been running for hours).
-- Steve
>
> Besides, perhaps we will add "if (file) uprobe_apply()" after _register()
> to mix perf/ftrace, then we will need to change this "if (!ret)" code again.
>
> Oleg.
On Mon, 30 Jun 2014 19:58:36 +0200
Oleg Nesterov <[email protected]> wrote:
> But I won't insist, this is subjective. So please let me know if you still
> think it would be better to add this change, I'll send v2.
Don't bother. I didn't look at the patch in context to make that reply.
I think your original patch looks better than the alternative. I'll
keep it as is.
Thanks,
-- Steve
Hi Namhyung,
On 06/30, Namhyung Kim wrote:
>
> On Fri, 27 Jun 2014 19:01:36 +0200, Oleg Nesterov wrote:
> >
> > This would be easy to fix, but there are other reasons why it is not
> > simple to mix ftrace and perf. If nothing else, they can't share the
> > same ->consumer.filter. This is fixable too, but probably we need to
> > fix the poorly designed uprobe_register() interface first. At least
> > "register" and "apply" should be clearly separated.
>
> Hmm.. right. It seems the current filter logic only cares about the
> perf. If ftrace comes after perf, it might not see some events due to
> the filter, right?
Yes. Or vice versa, ftrace can miss the events because perf can return
UPROBE_HANDLER_REMOVE. Or ftrace can come after perf, but in this case
it should call uprobe_apply() or it won't add the new breakpoints.
Actually, I'll probably try to make the patch tomorrow. It looks simple
enough, the main complication is CONFIG_PERF. And, to keep this patch
simple, I won't try to optimize (say) the TP_FLAG_TRACE-comes-first
case which could avoid uprobe_apply().
Yes, I still think it would be better to change the register/unregister
API first, but I do not know when I do this ;)
> > Cc: [email protected] # v3.14
> > Signed-off-by: Oleg Nesterov <[email protected]>
>
> Acked-by: Namhyung Kim <[email protected]>
Thanks!
Oleg.
Namhyung, Masami,
Please look at the question below. Perhaps we discussed this before,
but I can recall nothing.
On 06/30, Oleg Nesterov wrote:
>
> Actually, I'll probably try to make the patch tomorrow. It looks simple
> enough, the main complication is CONFIG_PERF. And, to keep this patch
> simple, I won't try to optimize (say) the TP_FLAG_TRACE-comes-first
> case which could avoid uprobe_apply().
I regret very much I said this ;) OK, I'll probably try anyway, but...
> Yes, I still think it would be better to change the register/unregister
> API first, but I do not know when I do this ;)
OK, we can do this later.
But it turns out that trace_uprobe.c needs other cleanups, and I simply
can't uglify this code more without these cleanups... Starting from
set/clear TP_FLAG_*. I'll recheck, but it seems that there is no reason
to carefully set (say) TP_FLAG_TRACE before uprobe_register() and then
clear it if _register() fails. And uprobe_dispatcher() is very ugly if
is_ret_probe(). And more. So it needs a series.
-------------------------------------------------------------------------
And. I am puzzled by probe_event_disable()->synchronize_sched(). Why
do we need it? I mean, why we can't use call_rcu() ? The comment says
"synchronize with u{,ret}probe_trace_func" but it doesn't explain _why_
do we need to sync.
I thought that perhaps the caller needs to synch with the callbacks.
Say, __trace_remove_event_call() can destroy the data which can be used
by the callbacks. But no, this is only possible if we are going to call
uprobe_unregister(), and this adds the necessary serialization.
So why? Looks like, the only reason is instance_rmdir() which can do
TRACE_REG_UNREGISTER and after that destroy this ftrace_event_file?
But event_trace_del_tracer() also has synchronize_sched() right after
__ftrace_set_clr_event_nolock() ?
So please tell me why do we need this synchronize_sched ;) And imo
this should be documented.
Oleg.
Hi Oleg,
On Tue, 1 Jul 2014 21:31:47 +0200, Oleg Nesterov wrote:
> Namhyung, Masami,
>
> Please look at the question below. Perhaps we discussed this before,
> but I can recall nothing.
I'm not sure I grok the code enough to answer your question, but...
>
>
> On 06/30, Oleg Nesterov wrote:
>>
>> Actually, I'll probably try to make the patch tomorrow. It looks simple
>> enough, the main complication is CONFIG_PERF. And, to keep this patch
>> simple, I won't try to optimize (say) the TP_FLAG_TRACE-comes-first
>> case which could avoid uprobe_apply().
>
> I regret very much I said this ;) OK, I'll probably try anyway, but...
>
>> Yes, I still think it would be better to change the register/unregister
>> API first, but I do not know when I do this ;)
>
> OK, we can do this later.
>
> But it turns out that trace_uprobe.c needs other cleanups, and I simply
> can't uglify this code more without these cleanups... Starting from
> set/clear TP_FLAG_*. I'll recheck, but it seems that there is no reason
> to carefully set (say) TP_FLAG_TRACE before uprobe_register() and then
> clear it if _register() fails. And uprobe_dispatcher() is very ugly if
> is_ret_probe(). And more. So it needs a series.
Okay, I'd like to see it. :)
>
> -------------------------------------------------------------------------
> And. I am puzzled by probe_event_disable()->synchronize_sched(). Why
> do we need it? I mean, why we can't use call_rcu() ? The comment says
> "synchronize with u{,ret}probe_trace_func" but it doesn't explain _why_
> do we need to sync.
It looks like the code was copied from trace_kprobe.c file. But IIUC,
unlike kprobes, uprobe events are always called in a process context.
Also u{,ret}probe_trace_func() call handlers under rcu_read_lock() not
rcu_read_lock_sched() so I guess the synchronize_sched() can go.
Thanks,
Namhyung
>
> I thought that perhaps the caller needs to synch with the callbacks.
> Say, __trace_remove_event_call() can destroy the data which can be used
> by the callbacks. But no, this is only possible if we are going to call
> uprobe_unregister(), and this adds the necessary serialization.
>
> So why? Looks like, the only reason is instance_rmdir() which can do
> TRACE_REG_UNREGISTER and after that destroy this ftrace_event_file?
> But event_trace_del_tracer() also has synchronize_sched() right after
> __ftrace_set_clr_event_nolock() ?
>
> So please tell me why do we need this synchronize_sched ;) And imo
> this should be documented.
>
> Oleg.
(2014/07/02 4:31), Oleg Nesterov wrote:
> Namhyung, Masami,
>
> Please look at the question below. Perhaps we discussed this before,
> but I can recall nothing.
>
>
> On 06/30, Oleg Nesterov wrote:
>>
>> Actually, I'll probably try to make the patch tomorrow. It looks simple
>> enough, the main complication is CONFIG_PERF. And, to keep this patch
>> simple, I won't try to optimize (say) the TP_FLAG_TRACE-comes-first
>> case which could avoid uprobe_apply().
>
> I regret very much I said this ;) OK, I'll probably try anyway, but...
>
>> Yes, I still think it would be better to change the register/unregister
>> API first, but I do not know when I do this ;)
>
> OK, we can do this later.
>
> But it turns out that trace_uprobe.c needs other cleanups, and I simply
> can't uglify this code more without these cleanups... Starting from
> set/clear TP_FLAG_*. I'll recheck, but it seems that there is no reason
> to carefully set (say) TP_FLAG_TRACE before uprobe_register() and then
> clear it if _register() fails. And uprobe_dispatcher() is very ugly if
> is_ret_probe(). And more. So it needs a series.
>
> -------------------------------------------------------------------------
> And. I am puzzled by probe_event_disable()->synchronize_sched(). Why
> do we need it? I mean, why we can't use call_rcu() ? The comment says
> "synchronize with u{,ret}probe_trace_func" but it doesn't explain _why_
> do we need to sync.
>
> I thought that perhaps the caller needs to synch with the callbacks.
> Say, __trace_remove_event_call() can destroy the data which can be used
> by the callbacks. But no, this is only possible if we are going to call
> uprobe_unregister(), and this adds the necessary serialization.
Hmm, similar code in the trace_kprobe.c said that trace_remove_event_call()
will remove something, so it should be synchronized.
Here, I tracked down the path from trace_remove_event_call().
1) trace_remove_event_call() locks mutexes to protect event status from
others, and calls probe_remove_event_call()
2) probe_remove_event_call() checks call's refcount and state of files which
related to the call. If there is any enabled file or reference, it returns
EBUSY.
If there is no active user of the call, it calls __trace_remove_event_call()
3) __trace_remove_event_call() calls event_remove(call) and after that it
releases all objects related to the call [*1].
4) Finally, event_remove() tries to disable each file on the call and soon
after that calls destroy_preds() [*2]. Here, to disable the file, it calls
ftrace_event_enable_disable() and it finally calls probe_event_disable(),
only if the event is *enabled*.
And as above described, probe_remove_event_call() already checked that
no user enables this event. This means that nobody calls probe_event_disable()
on the path from trace_remove_event_call().
Hmm, however, for [*1] and [*2], both are free event_file related objects
right after disabling the event. This means those expects that calling
ftrace_event_enable_disable() ensures no event handlers running.
synchronize_sched makes it true, but as long as calling
__trace_remove_event_call() from probe_remove_event_call(),
probe_event_disable() is not called at all because the event has to be
disabled when probe_remove_event_call() is called.
One possible scenario is here; someone disables an event and tries to remove
it (both will be done by different syscalls). If we don't synchronize
the first disabling, the event flag set disabled, but the event itself
is not disabled. Thus event handler is still possible to be running
somewhere when it is removed.
The other path of __trace_remove_event_call() is trace_module_remove_events()
which is not related to kprobes/uprobes (Even so, there is no obvious check of
that.)
> So why? Looks like, the only reason is instance_rmdir() which can do
> TRACE_REG_UNREGISTER and after that destroy this ftrace_event_file?
> But event_trace_del_tracer() also has synchronize_sched() right after
> __ftrace_set_clr_event_nolock() ?
I think it doesn't need to call synchronize_sched() because
event_trace_del_tracer() ensures that all events are disabled
(handlers are not running anymore)
Thank you,
>
> So please tell me why do we need this synchronize_sched ;) And imo
> this should be documented.
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
(2014/07/02 4:31), Oleg Nesterov wrote:
> Namhyung, Masami,
>
> Please look at the question below. Perhaps we discussed this before,
> but I can recall nothing.
>
>
> On 06/30, Oleg Nesterov wrote:
>>
>> Actually, I'll probably try to make the patch tomorrow. It looks simple
>> enough, the main complication is CONFIG_PERF. And, to keep this patch
>> simple, I won't try to optimize (say) the TP_FLAG_TRACE-comes-first
>> case which could avoid uprobe_apply().
>
> I regret very much I said this ;) OK, I'll probably try anyway, but...
>
>> Yes, I still think it would be better to change the register/unregister
>> API first, but I do not know when I do this ;)
>
> OK, we can do this later.
>
> But it turns out that trace_uprobe.c needs other cleanups, and I simply
> can't uglify this code more without these cleanups... Starting from
> set/clear TP_FLAG_*. I'll recheck, but it seems that there is no reason
> to carefully set (say) TP_FLAG_TRACE before uprobe_register() and then
> clear it if _register() fails. And uprobe_dispatcher() is very ugly if
> is_ret_probe(). And more. So it needs a series.
>
> -------------------------------------------------------------------------
> And. I am puzzled by probe_event_disable()->synchronize_sched(). Why
> do we need it? I mean, why we can't use call_rcu() ? The comment says
> "synchronize with u{,ret}probe_trace_func" but it doesn't explain _why_
> do we need to sync.
>
> I thought that perhaps the caller needs to synch with the callbacks.
> Say, __trace_remove_event_call() can destroy the data which can be used
> by the callbacks. But no, this is only possible if we are going to call
> uprobe_unregister(), and this adds the necessary serialization.
Hmm, similar code in the trace_kprobe.c said that trace_remove_event_call()
will remove something, so it should be synchronized.
Here, I tracked down the path from trace_remove_event_call().
1) trace_remove_event_call() locks mutexes to protect event status from
others, and calls probe_remove_event_call()
2) probe_remove_event_call() checks call's refcount and state of files which
related to the call. If there is any enabled file or reference, it returns
EBUSY.
If there is no active user of the call, it calls __trace_remove_event_call()
3) __trace_remove_event_call() calls event_remove(call) and after that it
releases all objects related to the call [*1].
4) Finally, event_remove() tries to disable each file on the call and soon
after that calls destroy_preds() [*2]. Here, to disable the file, it calls
ftrace_event_enable_disable() and it finally calls probe_event_disable(),
only if the event is *enabled*.
And as above described, probe_remove_event_call() already checked that
no user enables this event. This means that nobody calls probe_event_disable()
on the path from trace_remove_event_call().
Hmm, however, for [*1] and [*2], both are free event_file related objects
right after disabling the event. This means those expects that calling
ftrace_event_enable_disable() ensures no event handlers running.
synchronize_sched makes it true, but as long as calling
__trace_remove_event_call() from probe_remove_event_call(),
probe_event_disable() is not called at all because the event has to be
disabled when probe_remove_event_call() is called.
One possible scenario is here; someone disables an event and tries to remove
it (both will be done by different syscalls). If we don't synchronize
the first disabling, the event flag set disabled, but the event itself
is not disabled. Thus event handler is still possible to be running
somewhere when it is removed.
The other path of __trace_remove_event_call() is trace_module_remove_events()
which is not related to kprobes/uprobes (Even so, there is no obvious check of
that.)
> So why? Looks like, the only reason is instance_rmdir() which can do
> TRACE_REG_UNREGISTER and after that destroy this ftrace_event_file?
> But event_trace_del_tracer() also has synchronize_sched() right after
> __ftrace_set_clr_event_nolock() ?
I think it doesn't need to call synchronize_sched() because
event_trace_del_tracer() ensures that all events are disabled
(handlers are not running anymore)
Thank you,
>
> So please tell me why do we need this synchronize_sched ;) And imo
> this should be documented.
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
Hi Masami,
On Thu, 03 Jul 2014 14:46:09 +0900, Masami Hiramatsu wrote:
> One possible scenario is here; someone disables an event and tries to remove
> it (both will be done by different syscalls). If we don't synchronize
> the first disabling, the event flag set disabled, but the event itself
> is not disabled. Thus event handler is still possible to be running
> somewhere when it is removed.
But, IIUC both of disable and remove path are protected by event_mutex.
So one cannot see the case of disabled event flag but enabled event, no?
Thanks,
Namhyung
On 07/03, Namhyung Kim wrote:
>
> On Tue, 1 Jul 2014 21:31:47 +0200, Oleg Nesterov wrote:
> >
> > And. I am puzzled by probe_event_disable()->synchronize_sched(). Why
> > do we need it? I mean, why we can't use call_rcu() ? The comment says
> > "synchronize with u{,ret}probe_trace_func" but it doesn't explain _why_
> > do we need to sync.
>
> It looks like the code was copied from trace_kprobe.c file. But IIUC,
> unlike kprobes, uprobe events are always called in a process context.
>
> Also u{,ret}probe_trace_func() call handlers under rcu_read_lock() not
> rcu_read_lock_sched() so I guess the synchronize_sched() can go.
Heh ;) I didn't even notice that "synchronize" and "lock" do not match.
So this should be fixed anyway. But lets discuss other issues first.
Oleg.
On 07/03, Masami Hiramatsu wrote:
>
> (2014/07/02 4:31), Oleg Nesterov wrote:
> > And. I am puzzled by probe_event_disable()->synchronize_sched(). Why
> > do we need it? I mean, why we can't use call_rcu() ? The comment says
> > "synchronize with u{,ret}probe_trace_func" but it doesn't explain _why_
> > do we need to sync.
> >
> > I thought that perhaps the caller needs to synch with the callbacks.
> > Say, __trace_remove_event_call() can destroy the data which can be used
> > by the callbacks. But no, this is only possible if we are going to call
> > uprobe_unregister(), and this adds the necessary serialization.
>
> Hmm, similar code in the trace_kprobe.c said that trace_remove_event_call()
> will remove something, so it should be synchronized.
>
> Here, I tracked down the path from trace_remove_event_call().
>
> 1) trace_remove_event_call() locks mutexes to protect event status from
> others, and calls probe_remove_event_call()
>
> 2) probe_remove_event_call() checks call's refcount and state of files which
> related to the call. If there is any enabled file or reference, it returns
> EBUSY.
Yep. So this path is fine. uprobe_unregister() was already called
(FTRACE_EVENT_FL_ENABLED is not set), there are no callbacks in flight.
> One possible scenario is here; someone disables an event and tries to remove
> it (both will be done by different syscalls). If we don't synchronize
> the first disabling, the event flag set disabled, but the event itself
> is not disabled. Thus event handler is still possible to be running
> somewhere when it is removed.
See above. "remove" can't succed until all ftrace_event_file's are inactive.
And probe_event_disable() does uprobe_unregister() in this case which (again)
serializes with the callbacks itself.
> The other path of __trace_remove_event_call() is trace_module_remove_events()
> which is not related to kprobes/uprobes (Even so, there is no obvious check of
> that.)
Yes, uprobe can ignore modules ;)
> > So why? Looks like, the only reason is instance_rmdir() which can do
> > TRACE_REG_UNREGISTER and after that destroy this ftrace_event_file?
> > But event_trace_del_tracer() also has synchronize_sched() right after
> > __ftrace_set_clr_event_nolock() ?
>
> I think it doesn't need to call synchronize_sched() because
> event_trace_del_tracer() ensures that all events are disabled
> (handlers are not running anymore)
Not really, afaics... Well yes, it calls __ftrace_set_clr_event_nolock(),
but this can race with the callbacks; this doesn't necessary call
uprobe_unregister() because there can be other active ftrace_event_file's.
So we need to synchronize before we start to destroy the data.
So do you agree that we can remove that synchronize_sched() in trace_uprobe.c
and replace it with call_rcu?
Hmm. Off-topic, but it seems that instance_rmdir() leaks the memory? Say,
file->filter?
Oleg.
On 07/03, Oleg Nesterov wrote:
>
> Hmm. Off-topic, but it seems that instance_rmdir() leaks the memory? Say,
> file->filter?
Perhaps I am totally confused, but don't we need something like the patch
below? I'll try to recheck later...
Better yet, we can probably move destroy_preds() from event_remove() to
remove_event_file_dir()... not sure, need to recheck.
Oleg.
--- x/kernel/trace/trace_events.c
+++ x/kernel/trace/trace_events.c
@@ -470,6 +470,7 @@ static void remove_event_file_dir(struct ftrace_event_file *file)
list_del(&file->list);
remove_subsystem(file->system);
+ destroy_file_preds(file);
kmem_cache_free(file_cachep, file);
}
(2014/07/03 16:44), Namhyung Kim wrote:
> Hi Masami,
>
> On Thu, 03 Jul 2014 14:46:09 +0900, Masami Hiramatsu wrote:
>> One possible scenario is here; someone disables an event and tries to remove
>> it (both will be done by different syscalls). If we don't synchronize
>> the first disabling, the event flag set disabled, but the event itself
>> is not disabled. Thus event handler is still possible to be running
>> somewhere when it is removed.
>
> But, IIUC both of disable and remove path are protected by event_mutex.
> So one cannot see the case of disabled event flag but enabled event, no?
No, the flag is not protect the trace event handler itself.
I meant that running handlers and the flag was not synchronized.
Thank you,
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
(2014/07/04 1:22), Oleg Nesterov wrote:
>> One possible scenario is here; someone disables an event and tries to remove
>> it (both will be done by different syscalls). If we don't synchronize
>> the first disabling, the event flag set disabled, but the event itself
>> is not disabled. Thus event handler is still possible to be running
>> somewhere when it is removed.
>
> See above. "remove" can't succed until all ftrace_event_file's are inactive.
> And probe_event_disable() does uprobe_unregister() in this case which (again)
> serializes with the callbacks itself.
Ah, I see. kprobes uses disable_kprobe() instead of unregister, and that
is not serialized.
>> The other path of __trace_remove_event_call() is trace_module_remove_events()
>> which is not related to kprobes/uprobes (Even so, there is no obvious check of
>> that.)
>
> Yes, uprobe can ignore modules ;)
>
>>> So why? Looks like, the only reason is instance_rmdir() which can do
>>> TRACE_REG_UNREGISTER and after that destroy this ftrace_event_file?
>>> But event_trace_del_tracer() also has synchronize_sched() right after
>>> __ftrace_set_clr_event_nolock() ?
>>
>> I think it doesn't need to call synchronize_sched() because
>> event_trace_del_tracer() ensures that all events are disabled
>> (handlers are not running anymore)
>
> Not really, afaics... Well yes, it calls __ftrace_set_clr_event_nolock(),
> but this can race with the callbacks; this doesn't necessary call
> uprobe_unregister() because there can be other active ftrace_event_file's.
> So we need to synchronize before we start to destroy the data.
>
> So do you agree that we can remove that synchronize_sched() in trace_uprobe.c
> and replace it with call_rcu?
Yes for uprobes, kprobes still need it. :)
Thank you,
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
(2014/07/04 2:01), Oleg Nesterov wrote:
> On 07/03, Oleg Nesterov wrote:
>>
>> Hmm. Off-topic, but it seems that instance_rmdir() leaks the memory? Say,
>> file->filter?
>
> Perhaps I am totally confused, but don't we need something like the patch
> below? I'll try to recheck later...
>
> Better yet, we can probably move destroy_preds() from event_remove() to
> remove_event_file_dir()... not sure, need to recheck.
Ah, yes, it seems event_remove releases preds, and remove_event_file_dir()
called from event_trace_del_tracer() doesn't release it.
BTW, with following patch, we'd better remove destroy_preds() from
event_remove() and add destroy_call_preds() at the very last of the
function.
Thank you,
>
> Oleg.
>
> --- x/kernel/trace/trace_events.c
> +++ x/kernel/trace/trace_events.c
> @@ -470,6 +470,7 @@ static void remove_event_file_dir(struct ftrace_event_file *file)
>
> list_del(&file->list);
> remove_subsystem(file->system);
> + destroy_file_preds(file);
> kmem_cache_free(file_cachep, file);
> }
>
>
>
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
Hi Masami,
On Fri, 04 Jul 2014 10:00:51 +0900, Masami Hiramatsu wrote:
> (2014/07/03 16:44), Namhyung Kim wrote:
>> Hi Masami,
>>
>> On Thu, 03 Jul 2014 14:46:09 +0900, Masami Hiramatsu wrote:
>>> One possible scenario is here; someone disables an event and tries to remove
>>> it (both will be done by different syscalls). If we don't synchronize
>>> the first disabling, the event flag set disabled, but the event itself
>>> is not disabled. Thus event handler is still possible to be running
>>> somewhere when it is removed.
>>
>> But, IIUC both of disable and remove path are protected by event_mutex.
>> So one cannot see the case of disabled event flag but enabled event, no?
>
> No, the flag is not protect the trace event handler itself.
> I meant that running handlers and the flag was not synchronized.
Ah, right. Thanks for explanation. :)
Thanks,
Namhyung
On 07/04, Masami Hiramatsu wrote:
>
> (2014/07/04 2:01), Oleg Nesterov wrote:
> > On 07/03, Oleg Nesterov wrote:
> >>
> >> Hmm. Off-topic, but it seems that instance_rmdir() leaks the memory? Say,
> >> file->filter?
> >
> > Perhaps I am totally confused, but don't we need something like the patch
> > below? I'll try to recheck later...
> >
> > Better yet, we can probably move destroy_preds() from event_remove() to
> > remove_event_file_dir()... not sure, need to recheck.
>
> Ah, yes, it seems event_remove releases preds, and remove_event_file_dir()
> called from event_trace_del_tracer() doesn't release it.
>
> BTW, with following patch, we'd better remove destroy_preds() from
> event_remove() and add destroy_call_preds() at the very last of the
> function.
Yes, please see "Better yet" above. And probably we can simply remove
destroy_preds() after that.
But I still need to reccheck, didn't have time today. And you know what?
call->filter logic looks broken too ;)
Oleg.