2013-07-02 19:39:29

by Oleg Nesterov

[permalink] [raw]
Subject: PATCH? trace_remove_event_call() should fail if call is active

To remind/summarise, the usage of unregister_trace_probe() in
trace_kprobe.c and trace_uprobe.c is racy.

unregister_trace_probe() checks trace_probe_is_enabled() but
we can't trust the result, we can race with kprobe_register()
which is going to set TP_FLAG_TRACE/PROFILE.

And we can't add the new lock (or reuse probe_lock) to avoid
the race. kprobe_register() is called under event_mutex, so
unregister_trace_probe() can't hold this new lock around
unregister_probe_event() which takes event_mutex.

And we shouldn't shift event_mutex from trace_remove_event_call()
to its callers, this lock should be private to the core tracing
code.


Masami, Steven. What do you think about the patch below?

To simplify, lets ignore trace_module_remove_events() which calls
__trace_remove_event_call() too, although at first glance this
case should be fine too.

It really seems to me that trace_remove_event_call() should not
succeed if this ftrace_event_call is "active". Even if we forget
about perf, ftrace_event_enable_disable(file, 0) doesn't guarantee
reg(call, TRACE_REG_UNREGISTER) if SOFT_MODE is set.



If something like this patch can work then we can fix trace_kprobe.
unregister_trace_probe() can do unregister_probe_event() first and
abort if trace_remove_event_call() fails.

As for release_all_trace_probes(), we lose the all-or-nothing
semantics, but probably this is fine.

Or I misunderstood this logic completely?

Oleg.
-------------------------------------------------------------------------------

In essence this change is one-liner, it does

- ftrace_event_enable_disable(file);
+ if (file->flags & FTRACE_EVENT_FL_ENABLED)
+ return -EBUSY;

and propagates the error.

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 4372658..f98ab06 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -332,7 +332,7 @@ extern int trace_define_field(struct ftrace_event_call *call, const char *type,
const char *name, int offset, int size,
int is_signed, int filter_type);
extern int trace_add_event_call(struct ftrace_event_call *call);
-extern void trace_remove_event_call(struct ftrace_event_call *call);
+extern int trace_remove_event_call(struct ftrace_event_call *call);

#define is_signed_type(type) (((type)(-1)) < (type)1)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 27963e2..876957c 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1473,15 +1473,20 @@ static void remove_event_from_tracers(struct ftrace_event_call *call)
} while_for_each_event_file();
}

-static void event_remove(struct ftrace_event_call *call)
+static int event_remove(struct ftrace_event_call *call)
{
struct trace_array *tr;
struct ftrace_event_file *file;

+#ifdef CONFIG_PERF_EVENTS
+ if (call->perf_refcount)
+ return -EBUSY;
+#endif
do_for_each_event_file(tr, file) {
if (file->event_call != call)
continue;
- ftrace_event_enable_disable(file, 0);
+ if (file->flags & FTRACE_EVENT_FL_ENABLED)
+ return -EBUSY;
/*
* The do_for_each_event_file() is
* a double loop. After finding the call for this
@@ -1495,6 +1500,7 @@ static void event_remove(struct ftrace_event_call *call)
__unregister_ftrace_event(&call->event);
remove_event_from_tracers(call);
list_del(&call->list);
+ return 0;
}

static int event_init(struct ftrace_event_call *call)
@@ -1604,21 +1610,30 @@ int trace_add_event_call(struct ftrace_event_call *call)
/*
* Must be called under locking both of event_mutex and trace_event_sem.
*/
-static void __trace_remove_event_call(struct ftrace_event_call *call)
+static int __trace_remove_event_call(struct ftrace_event_call *call)
{
- event_remove(call);
- trace_destroy_fields(call);
- destroy_preds(call);
+ int err = event_remove(call);
+
+ if (!err) {
+ trace_destroy_fields(call);
+ destroy_preds(call);
+ }
+
+ return err;
}

/* Remove an event_call */
-void trace_remove_event_call(struct ftrace_event_call *call)
+int trace_remove_event_call(struct ftrace_event_call *call)
{
+ int err;
+
mutex_lock(&event_mutex);
down_write(&trace_event_sem);
- __trace_remove_event_call(call);
+ err = __trace_remove_event_call(call);
up_write(&trace_event_sem);
mutex_unlock(&event_mutex);
+
+ return err;
}

#define for_each_event(event, start, end) \


2013-07-02 21:04:50

by Steven Rostedt

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On Tue, 2013-07-02 at 21:34 +0200, Oleg Nesterov wrote:
> To remind/summarise, the usage of unregister_trace_probe() in
> trace_kprobe.c and trace_uprobe.c is racy.
>
> unregister_trace_probe() checks trace_probe_is_enabled() but
> we can't trust the result, we can race with kprobe_register()
> which is going to set TP_FLAG_TRACE/PROFILE.
>
> And we can't add the new lock (or reuse probe_lock) to avoid
> the race. kprobe_register() is called under event_mutex, so
> unregister_trace_probe() can't hold this new lock around
> unregister_probe_event() which takes event_mutex.
>
> And we shouldn't shift event_mutex from trace_remove_event_call()
> to its callers, this lock should be private to the core tracing
> code.
>
>
> Masami, Steven. What do you think about the patch below?
>
> To simplify, lets ignore trace_module_remove_events() which calls
> __trace_remove_event_call() too, although at first glance this
> case should be fine too.

We can't ignore modules. If it fails to be removed, then it will leak
memory.

>
> It really seems to me that trace_remove_event_call() should not
> succeed if this ftrace_event_call is "active". Even if we forget
> about perf, ftrace_event_enable_disable(file, 0) doesn't guarantee
> reg(call, TRACE_REG_UNREGISTER) if SOFT_MODE is set.
>
>
>
> If something like this patch can work then we can fix trace_kprobe.
> unregister_trace_probe() can do unregister_probe_event() first and
> abort if trace_remove_event_call() fails.
>
> As for release_all_trace_probes(), we lose the all-or-nothing
> semantics, but probably this is fine.
>
> Or I misunderstood this logic completely?

I'm actually worried about this change.

But what if we simply add a new event file flag called
FTRACE_EVENT_FL_SHUTDOWN. Which will not allow the event to be enabled
anymore. What do you think about this patch?

We set the SHUTDOWN flag, which is done under the event_mutex, and then
we can check the trace_probe_is_enabled(), as that gets set under the
event_mutex, and only if the SHUTDOWN flag was not previously set. If we
fail, then we just re-enable the event and return. Otherwise, we
shouldn't have to worry about the event being set again.

-- Steve

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 4372658..5b525d6 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -253,6 +253,7 @@ enum {
FTRACE_EVENT_FL_RECORDED_CMD_BIT,
FTRACE_EVENT_FL_SOFT_MODE_BIT,
FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
+ FTRACE_EVENT_FL_SHUTDOWN_BIT,
};

/*
@@ -262,12 +263,15 @@ enum {
* SOFT_MODE - The event is enabled/disabled by SOFT_DISABLED
* SOFT_DISABLED - When set, do not trace the event (even though its
* tracepoint may be enabled)
+ * SHUTDOWN - The event is going away, don't allow it to be enabled
+ * anymore.
*/
enum {
FTRACE_EVENT_FL_ENABLED = (1 << FTRACE_EVENT_FL_ENABLED_BIT),
FTRACE_EVENT_FL_RECORDED_CMD = (1 << FTRACE_EVENT_FL_RECORDED_CMD_BIT),
FTRACE_EVENT_FL_SOFT_MODE = (1 << FTRACE_EVENT_FL_SOFT_MODE_BIT),
FTRACE_EVENT_FL_SOFT_DISABLED = (1 << FTRACE_EVENT_FL_SOFT_DISABLED_BIT),
+ FTRACE_EVENT_FL_SHUTDOWN = (1 << FTRACE_EVENT_FL_SHUTDOWN_BIT),
};

struct ftrace_event_file {
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index c7fbf93..dd5f16d 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -229,6 +229,9 @@ extern struct mutex trace_types_lock;
extern int trace_array_get(struct trace_array *tr);
extern void trace_array_put(struct trace_array *tr);

+extern void ftrace_event_shutdown(struct ftrace_event_call *call);
+extern void ftrace_event_restart(struct ftrace_event_call *call);
+
/*
* The global tracer (top) should be the first trace array added,
* but we check the flag anyway.
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 734901d..ef282402 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -299,6 +299,13 @@ static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
break;
case 1:
/*
+ * If the event is set to go away, do not allow anyone to
+ * enable it anymore.
+ */
+ if (file->flags & FTRACE_EVENT_FL_SHUTDOWN)
+ return -EBUSY;
+
+ /*
* When soft_disable is set and enable is set, we want to
* register the tracepoint for the event, but leave the event
* as is. That means, if the event was already enabled, we do
@@ -342,6 +349,56 @@ static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
return ret;
}

+void ftrace_event_shutdown(struct ftrace_event_call *call)
+{
+ struct trace_array *tr;
+ struct ftrace_event_file *file;
+
+ mutex_lock(&trace_types_lock);
+ mutex_lock(&event_mutex);
+
+ do_for_each_event_file(tr, file) {
+ if (file->event_call != call)
+ continue;
+ set_bit(FTRACE_EVENT_FL_SHUTDOWN, &file->flags);
+ /*
+ * The do_for_each_event_file() is
+ * a double loop. After finding the call for this
+ * trace_array, we use break to jump to the next
+ * trace_array.
+ */
+ break;
+ } while_for_each_event_file();
+
+ mutex_unlock(&event_mutex);
+ mutex_unlock(&trace_types_lock);
+}
+
+void ftrace_event_restart(struct ftrace_event_call *call)
+{
+ struct trace_array *tr;
+ struct ftrace_event_file *file;
+
+ mutex_lock(&trace_types_lock);
+ mutex_lock(&event_mutex);
+
+ do_for_each_event_file(tr, file) {
+ if (file->event_call != call)
+ continue;
+ clear_bit(FTRACE_EVENT_FL_SHUTDOWN, &file->flags);
+ /*
+ * The do_for_each_event_file() is
+ * a double loop. After finding the call for this
+ * trace_array, we use break to jump to the next
+ * trace_array.
+ */
+ break;
+ } while_for_each_event_file();
+
+ mutex_unlock(&event_mutex);
+ mutex_unlock(&trace_types_lock);
+}
+
static int ftrace_event_enable_disable(struct ftrace_event_file *file,
int enable)
{
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 7ed6976..0408b9f 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -336,9 +336,14 @@ static void __unregister_trace_probe(struct trace_probe *tp)
/* Unregister a trace_probe and probe_event: call with locking probe_lock */
static int unregister_trace_probe(struct trace_probe *tp)
{
+ /* Prevent the event from being enabled */
+ ftrace_event_shutdown(tp->call);
+
/* Enabled event can not be unregistered */
- if (trace_probe_is_enabled(tp))
+ if (trace_probe_is_enabled(tp)) {
+ ftrace_event_restart(tp->call);
return -EBUSY;
+ }

__unregister_trace_probe(tp);
list_del(&tp->list);

2013-07-02 21:35:42

by Steven Rostedt

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On Tue, 2013-07-02 at 17:04 -0400, Steven Rostedt wrote:

> But what if we simply add a new event file flag called
> FTRACE_EVENT_FL_SHUTDOWN. Which will not allow the event to be enabled
> anymore. What do you think about this patch?

Would be helpful to try to at least compile the patch ;-)

This one has been complied *and* tested.

-- Steve

Index: linux-trace.git/include/linux/ftrace_event.h
===================================================================
--- linux-trace.git.orig/include/linux/ftrace_event.h
+++ linux-trace.git/include/linux/ftrace_event.h
@@ -253,6 +253,7 @@ enum {
FTRACE_EVENT_FL_RECORDED_CMD_BIT,
FTRACE_EVENT_FL_SOFT_MODE_BIT,
FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
+ FTRACE_EVENT_FL_SHUTDOWN_BIT,
};

/*
@@ -262,12 +263,15 @@ enum {
* SOFT_MODE - The event is enabled/disabled by SOFT_DISABLED
* SOFT_DISABLED - When set, do not trace the event (even though its
* tracepoint may be enabled)
+ * SHUTDOWN - The event is going away, don't allow it to be enabled
+ * anymore.
*/
enum {
FTRACE_EVENT_FL_ENABLED = (1 << FTRACE_EVENT_FL_ENABLED_BIT),
FTRACE_EVENT_FL_RECORDED_CMD = (1 << FTRACE_EVENT_FL_RECORDED_CMD_BIT),
FTRACE_EVENT_FL_SOFT_MODE = (1 << FTRACE_EVENT_FL_SOFT_MODE_BIT),
FTRACE_EVENT_FL_SOFT_DISABLED = (1 << FTRACE_EVENT_FL_SOFT_DISABLED_BIT),
+ FTRACE_EVENT_FL_SHUTDOWN = (1 << FTRACE_EVENT_FL_SHUTDOWN_BIT),
};

struct ftrace_event_file {
Index: linux-trace.git/kernel/trace/trace.h
===================================================================
--- linux-trace.git.orig/kernel/trace/trace.h
+++ linux-trace.git/kernel/trace/trace.h
@@ -229,6 +229,9 @@ extern struct mutex trace_types_lock;
extern int trace_array_get(struct trace_array *tr);
extern void trace_array_put(struct trace_array *tr);

+extern void ftrace_event_shutdown(struct ftrace_event_call *call);
+extern void ftrace_event_restart(struct ftrace_event_call *call);
+
/*
* The global tracer (top) should be the first trace array added,
* but we check the flag anyway.
Index: linux-trace.git/kernel/trace/trace_events.c
===================================================================
--- linux-trace.git.orig/kernel/trace/trace_events.c
+++ linux-trace.git/kernel/trace/trace_events.c
@@ -299,6 +299,13 @@ static int __ftrace_event_enable_disable
break;
case 1:
/*
+ * If the event is set to go away, do not allow anyone to
+ * enable it anymore.
+ */
+ if (file->flags & FTRACE_EVENT_FL_SHUTDOWN)
+ return -EBUSY;
+
+ /*
* When soft_disable is set and enable is set, we want to
* register the tracepoint for the event, but leave the event
* as is. That means, if the event was already enabled, we do
@@ -342,6 +349,56 @@ static int __ftrace_event_enable_disable
return ret;
}

+void ftrace_event_shutdown(struct ftrace_event_call *call)
+{
+ struct trace_array *tr;
+ struct ftrace_event_file *file;
+
+ mutex_lock(&trace_types_lock);
+ mutex_lock(&event_mutex);
+
+ do_for_each_event_file(tr, file) {
+ if (file->event_call != call)
+ continue;
+ set_bit(FTRACE_EVENT_FL_SHUTDOWN, &file->flags);
+ /*
+ * The do_for_each_event_file() is
+ * a double loop. After finding the call for this
+ * trace_array, we use break to jump to the next
+ * trace_array.
+ */
+ break;
+ } while_for_each_event_file();
+
+ mutex_unlock(&event_mutex);
+ mutex_unlock(&trace_types_lock);
+}
+
+void ftrace_event_restart(struct ftrace_event_call *call)
+{
+ struct trace_array *tr;
+ struct ftrace_event_file *file;
+
+ mutex_lock(&trace_types_lock);
+ mutex_lock(&event_mutex);
+
+ do_for_each_event_file(tr, file) {
+ if (file->event_call != call)
+ continue;
+ clear_bit(FTRACE_EVENT_FL_SHUTDOWN, &file->flags);
+ /*
+ * The do_for_each_event_file() is
+ * a double loop. After finding the call for this
+ * trace_array, we use break to jump to the next
+ * trace_array.
+ */
+ break;
+ } while_for_each_event_file();
+
+ mutex_unlock(&event_mutex);
+ mutex_unlock(&trace_types_lock);
+}
+
static int ftrace_event_enable_disable(struct ftrace_event_file *file,
int enable)
{
Index: linux-trace.git/kernel/trace/trace_kprobe.c
===================================================================
--- linux-trace.git.orig/kernel/trace/trace_kprobe.c
+++ linux-trace.git/kernel/trace/trace_kprobe.c
@@ -336,9 +336,14 @@ static void __unregister_trace_probe(str
/* Unregister a trace_probe and probe_event: call with locking probe_lock */
static int unregister_trace_probe(struct trace_probe *tp)
{
+ /* Prevent the event from being enabled */
+ ftrace_event_shutdown(&tp->call);
+
/* Enabled event can not be unregistered */
- if (trace_probe_is_enabled(tp))
+ if (trace_probe_is_enabled(tp)) {
+ ftrace_event_restart(&tp->call);
return -EBUSY;
+ }

__unregister_trace_probe(tp);
list_del(&tp->list);

2013-07-02 21:42:58

by Oleg Nesterov

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On 07/02, Steven Rostedt wrote:
>
> On Tue, 2013-07-02 at 21:34 +0200, Oleg Nesterov wrote:
> >
> > To simplify, lets ignore trace_module_remove_events() which calls
> > __trace_remove_event_call() too, although at first glance this
> > case should be fine too.
>
> We can't ignore modules. If it fails to be removed, then it will leak
> memory.

Ah, but I only meant "ignore for the moment", to simplify the discussion.
We can add the "force" argument for trace_module_remove_events(), or
we can add another do_for_each_event_file/FTRACE_EVENT_FL_ENABLED loop
into trace_remove_event_call() before __trace_remove_event_call().
This is trivial.

So please ignore modules ;)

> > It really seems to me that trace_remove_event_call() should not
> > succeed if this ftrace_event_call is "active". Even if we forget
> > about perf, ftrace_event_enable_disable(file, 0) doesn't guarantee
> > reg(call, TRACE_REG_UNREGISTER) if SOFT_MODE is set.
> >
> >
> >
> > If something like this patch can work then we can fix trace_kprobe.
> > unregister_trace_probe() can do unregister_probe_event() first and
> > abort if trace_remove_event_call() fails.
> >
> > As for release_all_trace_probes(), we lose the all-or-nothing
> > semantics, but probably this is fine.
> >
> > Or I misunderstood this logic completely?
>
> I'm actually worried about this change.

Me too.

But again. Any reason trace_remove_event_call() should succeed if
this this ftrace_event_call is active?

> But what if we simply add a new event file flag called
> FTRACE_EVENT_FL_SHUTDOWN. Which will not allow the event to be enabled
> anymore. What do you think about this patch?

It is too late for me to even try to read this patch...

But at first glance, how this patch can protect from the subsequent
perf_trace_event_open/perf_trace_add which can use this ->tp_event?

> We set the SHUTDOWN flag, which is done under the event_mutex, and then
> we can check the trace_probe_is_enabled(), as that gets set under the
> event_mutex, and only if the SHUTDOWN flag was not previously set. If we
> fail, then we just re-enable the event and return.

Heh. As a from-time-to-time reader of this code I am not worried ;)
This doesn't look simple. And if nothing else FL_SHUTDOWN doesn't
cover perf_trace_init() ?

Oleg.

2013-07-02 21:45:58

by Oleg Nesterov

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On 07/02, Oleg Nesterov wrote:
>
> On 07/02, Steven Rostedt wrote:
> >
> > I'm actually worried about this change.
>
> Me too.
>
> But again. Any reason trace_remove_event_call() should succeed if
> this this ftrace_event_call is active?

And note that kprobes/uprobes are the only users of trace_remove_event_call(),
and both need the same semantics: kill it only if it is not active.

Oleg.

2013-07-02 22:28:49

by Oleg Nesterov

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On 07/02, Oleg Nesterov wrote:
>
> So please ignore modules ;)

Or lets discuss the change above.

Oleg.

--- x/kernel/trace/trace_events.c
+++ x/kernel/trace/trace_events.c
@@ -1611,14 +1611,40 @@ static void __trace_remove_event_call(st
destroy_preds(call);
}

+static int event_can_remove(struct ftrace_event_call *call)
+{
+ struct trace_array *tr;
+ struct ftrace_event_file *file;
+
+#ifdef CONFIG_PERF_EVENTS
+ if (call->perf_refcount)
+ return -EBUSY;
+#endif
+ do_for_each_event_file(tr, file) {
+ if (file->event_call != call)
+ continue;
+ if (file->flags & FTRACE_EVENT_FL_ENABLED)
+ return -EBUSY;
+ break;
+ } while_for_each_event_file();
+
+ return 0;
+}
+
/* Remove an event_call */
-void trace_remove_event_call(struct ftrace_event_call *call)
+int trace_remove_event_call(struct ftrace_event_call *call)
{
+ int err;
+
mutex_lock(&event_mutex);
down_write(&trace_event_sem);
- __trace_remove_event_call(call);
+ err = event_can_remove(call);
+ if (!err)
+ __trace_remove_event_call(call);
up_write(&trace_event_sem);
mutex_unlock(&event_mutex);
+
+ return err;
}

#define for_each_event(event, start, end) \

2013-07-02 22:49:31

by Steven Rostedt

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On Wed, 2013-07-03 at 00:23 +0200, Oleg Nesterov wrote:
> On 07/02, Oleg Nesterov wrote:
> >
> > So please ignore modules ;)
>
> Or lets discuss the change above.

I like this better than your first change, as the module code really
can't fail to remove events, as that will complicate things more than
they already are ;-)

>
> Oleg.
>
> --- x/kernel/trace/trace_events.c
> +++ x/kernel/trace/trace_events.c
> @@ -1611,14 +1611,40 @@ static void __trace_remove_event_call(st
> destroy_preds(call);
> }
>
> +static int event_can_remove(struct ftrace_event_call *call)

Should rename this to "event_is_busy" as we are only seeing if the event
is active or not, as we do allow for events to be removed when active.
This may confuse new reviewers.

> +{
> + struct trace_array *tr;
> + struct ftrace_event_file *file;
> +
> +#ifdef CONFIG_PERF_EVENTS
> + if (call->perf_refcount)
> + return -EBUSY;
> +#endif
> + do_for_each_event_file(tr, file) {
> + if (file->event_call != call)
> + continue;
> + if (file->flags & FTRACE_EVENT_FL_ENABLED)
> + return -EBUSY;
> + break;
> + } while_for_each_event_file();
> +
> + return 0;
> +}
> +
> /* Remove an event_call */
> -void trace_remove_event_call(struct ftrace_event_call *call)
> +int trace_remove_event_call(struct ftrace_event_call *call)

Probably should rename this to trace_probe_remove_event_call() as this
deals only with the probe callers (kprobe and uprobe). Also this code
will need to be documented a bit more.

Other than that, sure, go with it. Masami, you have any comments?

-- Steve

> {
> + int err;
> +
> mutex_lock(&event_mutex);
> down_write(&trace_event_sem);
> - __trace_remove_event_call(call);
> + err = event_can_remove(call);
> + if (!err)
> + __trace_remove_event_call(call);
> up_write(&trace_event_sem);
> mutex_unlock(&event_mutex);
> +
> + return err;
> }
>
> #define for_each_event(event, start, end) \

2013-07-02 22:53:54

by Steven Rostedt

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On Wed, 2013-07-03 at 00:23 +0200, Oleg Nesterov wrote:

> /* Remove an event_call */
> -void trace_remove_event_call(struct ftrace_event_call *call)
> +int trace_remove_event_call(struct ftrace_event_call *call)
> {
> + int err;
> +
> mutex_lock(&event_mutex);
> down_write(&trace_event_sem);

Oh, BTW, can you rebase this on top of my for-next branch. That's what
I'll be pushing off to Linus, and this will conflict with those changes.

-- Steve

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git


> - __trace_remove_event_call(call);
> + err = event_can_remove(call);
> + if (!err)
> + __trace_remove_event_call(call);
> up_write(&trace_event_sem);
> mutex_unlock(&event_mutex);
> +
> + return err;
> }
>
> #define for_each_event(event, start, end) \

Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

(2013/07/03 7:23), Oleg Nesterov wrote:
> On 07/02, Oleg Nesterov wrote:
>>
>> So please ignore modules ;)
>
> Or lets discuss the change above.

No, I think this still doesn't ensure that we can remove dynamic
event safely. Since the event is related to several files under
events/ dir and buffer instances, someone can just stay open the
files while the event is removed and read/write it.
Perhaps, we need per-event_call refcounter not per-trace_array
one, and do as below.

Open file:
-> lock event_mutex
-> find event_call and event_file
-> get event_call
-> unlock event_mutex

Close:
-> lock event_mutex
-> put event_call
-> unlock event_mutex

Remove event (via kprobe_events):
-> lock event_mutex
-> find event_call
-> -EBUSY if event is enabled or refcount != 0
(here, no one accessing the event and not enabled)
-> unregister_kprobe
-> remove event
-> unlock event_mutex

The key is holding event_mutex *and* getting refcount
under any operation.
And of course, we can unregister the kprobe outside of
the event_mutex, but it still need a synchronize_sched
for safety.

-> lock event_mutex
-> wait_for_rcu (to ensure no disabled kprobe accesses the event)
-> find event_call
-> -EBUSY if event is enabled or refcount != 0
(here, no one accessing the event and not enabled)
-> remove event
-> unlock event_mutex
-> unregister_kprobe

Thank you,


--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-07-03 02:57:36

by Steven Rostedt

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On Wed, 2013-07-03 at 11:42 +0900, Masami Hiramatsu wrote:
> (2013/07/03 7:23), Oleg Nesterov wrote:
> > On 07/02, Oleg Nesterov wrote:
> >>
> >> So please ignore modules ;)
> >
> > Or lets discuss the change above.
>
> No, I think this still doesn't ensure that we can remove dynamic
> event safely. Since the event is related to several files under
> events/ dir and buffer instances, someone can just stay open the
> files while the event is removed and read/write it.
> Perhaps, we need per-event_call refcounter not per-trace_array
> one, and do as below.

Or both. I need the trace-array counter for rmdir, and don't want to
check every event. But that doesn't mean we can't have a event counter.


>
> Open file:
> -> lock event_mutex
> -> find event_call and event_file
> -> get event_call
> -> unlock event_mutex
>
> Close:
> -> lock event_mutex
> -> put event_call
> -> unlock event_mutex
>
> Remove event (via kprobe_events):
> -> lock event_mutex
> -> find event_call
> -> -EBUSY if event is enabled or refcount != 0
> (here, no one accessing the event and not enabled)
> -> unregister_kprobe
> -> remove event
> -> unlock event_mutex
>
> The key is holding event_mutex *and* getting refcount
> under any operation.

And this would be to the event call, not the event file itself, right?

-- Steve

> And of course, we can unregister the kprobe outside of
> the event_mutex, but it still need a synchronize_sched
> for safety.
>
> -> lock event_mutex
> -> wait_for_rcu (to ensure no disabled kprobe accesses the event)
> -> find event_call
> -> -EBUSY if event is enabled or refcount != 0
> (here, no one accessing the event and not enabled)
> -> remove event
> -> unlock event_mutex
> -> unregister_kprobe
>
> Thank you,
>
>

Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

(2013/07/03 11:57), Steven Rostedt wrote:
> On Wed, 2013-07-03 at 11:42 +0900, Masami Hiramatsu wrote:
>> (2013/07/03 7:23), Oleg Nesterov wrote:
>>> On 07/02, Oleg Nesterov wrote:
>>>>
>>>> So please ignore modules ;)
>>>
>>> Or lets discuss the change above.
>>
>> No, I think this still doesn't ensure that we can remove dynamic
>> event safely. Since the event is related to several files under
>> events/ dir and buffer instances, someone can just stay open the
>> files while the event is removed and read/write it.
>> Perhaps, we need per-event_call refcounter not per-trace_array
>> one, and do as below.
>
> Or both. I need the trace-array counter for rmdir, and don't want to
> check every event. But that doesn't mean we can't have a event counter.

Agreed.

>> Open file:
>> -> lock event_mutex
>> -> find event_call and event_file
>> -> get event_call
>> -> unlock event_mutex
>>
>> Close:
>> -> lock event_mutex
>> -> put event_call
>> -> unlock event_mutex
>>
>> Remove event (via kprobe_events):
>> -> lock event_mutex
>> -> find event_call
>> -> -EBUSY if event is enabled or refcount != 0
>> (here, no one accessing the event and not enabled)
>> -> unregister_kprobe
>> -> remove event
>> -> unlock event_mutex
>>
>> The key is holding event_mutex *and* getting refcount
>> under any operation.
>
> And this would be to the event call, not the event file itself, right?

Right, because an event file just indicates an instance of the
event. This "Remove event" totally removes the event including
all instances.

Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-07-03 17:25:24

by Oleg Nesterov

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

Masami,

I am not sure I understand. And please do not forget I am not
familiar with this code ;)

In short: do you think that the patch I sent "can't help" or
"not enough" ?

If "not enough" then I fully agree.

On 07/03, Masami Hiramatsu wrote:
>
> (2013/07/03 7:23), Oleg Nesterov wrote:
> > On 07/02, Oleg Nesterov wrote:
> >>
> >> So please ignore modules ;)
> >
> > Or lets discuss the change above.
>
> No, I think this still doesn't ensure that we can remove dynamic
> event safely. Since the event is related to several files under
> events/ dir and buffer instances, someone can just stay open the
> files while the event is removed and read/write it.

So, for example, event_enable_write() can happily play with
ftrace_event_file after unregister_trace_probe/free_trace_probe.

Did you mean this?

Sure, but this is another problem? And we already discussed it a
bit, an application can keep the file we need to remove opened.

As for event_enable_write() in particular, we can probably mark
file/call as dead somewhere in trace_remove_event_call(). But
I simply do not understand this code enough, I do not know what
else we should do.

IOW. So far _I think_ we just need the additional changes in
trace_remove_event_call() if it succeeds (with the patch I sent)
to prevent the races like above, but I didn't even try to think
about this problem.

Or I missed your point completely?

Oleg.

2013-07-03 17:59:11

by Oleg Nesterov

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On 07/03, Oleg Nesterov wrote:
>
> IOW. So far _I think_ we just need the additional changes in
> trace_remove_event_call() if it succeeds (with the patch I sent)
> to prevent the races like above, but I didn't even try to think
> about this problem.

And I guess greatly underestimated the problem(s). When I look at
this code now, it seems that, say, event_enable_write() will use
the already freed ftrace_event_file in this case.

Still I think this is another (although closely related) problem.

> Or I missed your point completely?

Yes?

Oleg.

2013-07-03 18:02:32

by Steven Rostedt

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On Wed, 2013-07-03 at 19:54 +0200, Oleg Nesterov wrote:
> On 07/03, Oleg Nesterov wrote:
> >
> > IOW. So far _I think_ we just need the additional changes in
> > trace_remove_event_call() if it succeeds (with the patch I sent)
> > to prevent the races like above, but I didn't even try to think
> > about this problem.
>
> And I guess greatly underestimated the problem(s). When I look at
> this code now, it seems that, say, event_enable_write() will use
> the already freed ftrace_event_file in this case.
>
> Still I think this is another (although closely related) problem.

Correct, and I think if we fix that problem, it will encapsulate fixing
the kprobe race too.

-- Steve

2013-07-03 19:22:39

by Oleg Nesterov

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On 07/03, Steven Rostedt wrote:
>
> On Wed, 2013-07-03 at 19:54 +0200, Oleg Nesterov wrote:
> > On 07/03, Oleg Nesterov wrote:
> > >
> > > IOW. So far _I think_ we just need the additional changes in
> > > trace_remove_event_call() if it succeeds (with the patch I sent)
> > > to prevent the races like above, but I didn't even try to think
> > > about this problem.
> >
> > And I guess greatly underestimated the problem(s). When I look at
> > this code now, it seems that, say, event_enable_write() will use
> > the already freed ftrace_event_file in this case.
> >
> > Still I think this is another (although closely related) problem.
>
> Correct, and I think if we fix that problem, it will encapsulate fixing
> the kprobe race too.

I do not think so, but I can be easily wrong. Again, we shouldn't
destroy the event if there is a perf_event attached to this tp_event.
And we can't (afaics!) rely on TRACE_REG_UNREGISTER from event_remove()
paths, FTRACE_EVENT_FL_SOFT_MODE can nack it.

So I still think that we also need something like the patch I sent.
But please forget about this for the moment.

Can't we do something like below? Just in case, of course this change
is incomplete, just to explain what I mean... And of course I how no
idea if the change in debugfs is safe, I never looked into fs/debugfs
before. But, perhaps, somehow we can clear i_private under event_mutex
and kernel/trace can use file_inode() instead of filp->private_data ?

Oleg.


diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 4888cb3..c23d41e 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -475,6 +475,7 @@ static int __debugfs_remove(struct dentry *dentry, struct dentry *parent)
kfree(dentry->d_inode->i_private);
/* fall through */
default:
+ dentry->d_inode->i_private = NULL;
simple_unlink(parent->d_inode, dentry);
break;
}
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 27963e2..bdfd161 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -643,13 +643,10 @@ static ssize_t
event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
loff_t *ppos)
{
- struct ftrace_event_file *file = filp->private_data;
+ struct ftrace_event_file *file;
unsigned long val;
int ret;

- if (!file)
- return -EINVAL;
-
ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
if (ret)
return ret;
@@ -661,8 +658,11 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
switch (val) {
case 0:
case 1:
+ ret = -EINVAL;
mutex_lock(&event_mutex);
- ret = ftrace_event_enable_disable(file, val);
+ file = file_inode(filp)->i_private;
+ if (file)
+ ret = ftrace_event_enable_disable(file, val);
mutex_unlock(&event_mutex);
break;

2013-07-03 20:34:08

by Steven Rostedt

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On Wed, 2013-07-03 at 21:17 +0200, Oleg Nesterov wrote:
> On 07/03, Steven Rostedt wrote:
> >
> > On Wed, 2013-07-03 at 19:54 +0200, Oleg Nesterov wrote:
> > > On 07/03, Oleg Nesterov wrote:
> > > >
> > > > IOW. So far _I think_ we just need the additional changes in
> > > > trace_remove_event_call() if it succeeds (with the patch I sent)
> > > > to prevent the races like above, but I didn't even try to think
> > > > about this problem.
> > >
> > > And I guess greatly underestimated the problem(s). When I look at
> > > this code now, it seems that, say, event_enable_write() will use
> > > the already freed ftrace_event_file in this case.
> > >
> > > Still I think this is another (although closely related) problem.
> >
> > Correct, and I think if we fix that problem, it will encapsulate fixing
> > the kprobe race too.
>
> I do not think so, but I can be easily wrong. Again, we shouldn't
> destroy the event if there is a perf_event attached to this tp_event.
> And we can't (afaics!) rely on TRACE_REG_UNREGISTER from event_remove()
> paths, FTRACE_EVENT_FL_SOFT_MODE can nack it.
>
> So I still think that we also need something like the patch I sent.
> But please forget about this for the moment.
>
> Can't we do something like below? Just in case, of course this change
> is incomplete, just to explain what I mean... And of course I how no
> idea if the change in debugfs is safe, I never looked into fs/debugfs
> before. But, perhaps, somehow we can clear i_private under event_mutex
> and kernel/trace can use file_inode() instead of filp->private_data ?
>
> Oleg.
>
>
> diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
> index 4888cb3..c23d41e 100644
> --- a/fs/debugfs/inode.c
> +++ b/fs/debugfs/inode.c
> @@ -475,6 +475,7 @@ static int __debugfs_remove(struct dentry *dentry, struct dentry *parent)
> kfree(dentry->d_inode->i_private);
> /* fall through */
> default:
> + dentry->d_inode->i_private = NULL;
> simple_unlink(parent->d_inode, dentry);
> break;
> }

No, I would avoid any changes to the debugfs infrastructure.

OK, what about the below patch, followed by an updated version of your
patch. I'll send that as a reply to this one.

-- Steve

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 4372658..72ff2c6 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -195,6 +195,7 @@ extern int ftrace_event_reg(struct ftrace_event_call *event,
enum trace_reg type, void *data);

enum {
+ TRACE_EVENT_FL_REF_MAX_BIT = 20,
TRACE_EVENT_FL_FILTERED_BIT,
TRACE_EVENT_FL_CAP_ANY_BIT,
TRACE_EVENT_FL_NO_SET_FILTER_BIT,
@@ -203,6 +204,8 @@ enum {
};

/*
+ * Event ref count uses the first 20 bits of the flags field.
+ *
* Event flags:
* FILTERED - The event has a filter attached
* CAP_ANY - Any user can enable for perf
@@ -213,6 +216,7 @@ enum {
* it is best to clear the buffers that used it).
*/
enum {
+ TRACE_EVENT_FL_REF_MAX = (1 << TRACE_EVENT_FL_REF_MAX_BIT),
TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
TRACE_EVENT_FL_CAP_ANY = (1 << TRACE_EVENT_FL_CAP_ANY_BIT),
TRACE_EVENT_FL_NO_SET_FILTER = (1 << TRACE_EVENT_FL_NO_SET_FILTER_BIT),
@@ -220,6 +224,8 @@ enum {
TRACE_EVENT_FL_WAS_ENABLED = (1 << TRACE_EVENT_FL_WAS_ENABLED_BIT),
};

+#define TRACE_EVENT_FL_REF_MASK (TRACE_EVENT_FL_REF_MAX - 1)
+
struct ftrace_event_call {
struct list_head list;
struct ftrace_event_class *class;
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 7d85429..16f3236 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -391,6 +391,23 @@ static void __get_system_dir(struct ftrace_subsystem_dir *dir)
__get_system(dir->subsystem);
}

+static int ftrace_event_call_get(struct ftrace_event_call *call)
+{
+ if ((call->flags & TRACE_EVENT_FL_REF_MASK) == TRACE_EVENT_FL_REF_MAX - 1)
+ return -EBUSY;
+
+ call->flags++;
+ return 0;
+}
+
+static void ftrace_event_call_put(struct ftrace_event_call *call)
+{
+ if (WARN_ON_ONCE(!(call->flags & TRACE_EVENT_FL_REF_MASK)))
+ return;
+
+ call->flags--;
+}
+
static void __put_system_dir(struct ftrace_subsystem_dir *dir)
{
WARN_ON_ONCE(dir->ref_count == 0);
@@ -424,7 +441,15 @@ static int tracing_open_generic_file(struct inode *inode, struct file *filp)

ret = tracing_open_generic(inode, filp);
if (ret < 0)
- trace_array_put(tr);
+ goto fail;
+
+ ret = ftrace_event_call_get(file->event_call);
+ if (ret < 0)
+ goto fail;
+
+ return 0;
+ fail:
+ trace_array_put(tr);
return ret;
}

@@ -433,12 +458,40 @@ static int tracing_release_generic_file(struct inode *inode, struct file *filp)
struct ftrace_event_file *file = inode->i_private;
struct trace_array *tr = file->tr;

+ ftrace_event_call_put(file->event_call);
trace_array_put(tr);

return 0;
}

/*
+ * Open and update call ref count.
+ * Must have ftrace_event_call passed in.
+ */
+static int tracing_open_generic_call(struct inode *inode, struct file *filp)
+{
+ struct ftrace_event_call *call = inode->i_private;
+
+ return ftrace_event_call_get(call);
+}
+
+static int tracing_release_generic_call(struct inode *inode, struct file *filp)
+{
+ struct ftrace_event_call *call = inode->i_private;
+
+ ftrace_event_call_put(call);
+ return 0;
+}
+
+static int tracing_seq_release_call(struct inode *inode, struct file *filp)
+{
+ struct ftrace_event_call *call = inode->i_private;
+
+ ftrace_event_call_put(call);
+ return seq_release(inode, filp);
+}
+
+/*
* __ftrace_set_clr_event(NULL, NULL, NULL, set) will set/unset all events.
*/
static int
@@ -949,10 +1002,16 @@ static int trace_format_open(struct inode *inode, struct file *file)
struct seq_file *m;
int ret;

- ret = seq_open(file, &trace_format_seq_ops);
+ ret = ftrace_event_call_get(call);
if (ret < 0)
return ret;

+ ret = seq_open(file, &trace_format_seq_ops);
+ if (ret < 0) {
+ ftrace_event_call_put(call);
+ return ret;
+ }
+
m = file->private_data;
m->private = call;

@@ -1260,20 +1319,22 @@ static const struct file_operations ftrace_event_format_fops = {
.open = trace_format_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release,
+ .release = tracing_seq_release_call,
};

static const struct file_operations ftrace_event_id_fops = {
- .open = tracing_open_generic,
+ .open = tracing_open_generic_call,
.read = event_id_read,
.llseek = default_llseek,
+ .release = tracing_release_generic_call,
};

static const struct file_operations ftrace_event_filter_fops = {
- .open = tracing_open_generic,
+ .open = tracing_open_generic_call,
.read = event_filter_read,
.write = event_filter_write,
.llseek = default_llseek,
+ .release = tracing_release_generic_call,
};

static const struct file_operations ftrace_subsystem_filter_fops = {

2013-07-03 20:36:30

by Steven Rostedt

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On Wed, 2013-07-03 at 16:34 -0400, Steven Rostedt wrote:
> OK, what about the below patch, followed by an updated version of your
> patch. I'll send that as a reply to this one.

This is a modification of your patch:

If you like it, please add a proper change log and SOB tag. Oh, and we
need to still update trace_kprobes.c

-- Steve

From: Oleg Nesterov <[email protected]>

Index: linux-trace.git/kernel/trace/trace_events.c
===================================================================
--- linux-trace.git.orig/kernel/trace/trace_events.c
+++ linux-trace.git/kernel/trace/trace_events.c
@@ -1761,16 +1761,45 @@ static void __trace_remove_event_call(st
destroy_preds(call);
}

+static int probe_remove_event_call(struct ftrace_event_call *call)
+{
+ struct trace_array *tr;
+ struct ftrace_event_file *file;
+
+ if (call->flags & TRACE_EVENT_FL_REF_MASK)
+ return -EBUSY;
+
+#ifdef CONFIG_PERF_EVENTS
+ if (call->perf_refcount)
+ return -EBUSY;
+#endif
+ do_for_each_event_file(tr, file) {
+ if (file->event_call != call)
+ continue;
+ if (file->flags & FTRACE_EVENT_FL_ENABLED)
+ return -EBUSY;
+ break;
+ } while_for_each_event_file();
+
+ __trace_remove_event_call(call);
+
+ return 0;
+}
+
/* Remove an event_call */
-void trace_remove_event_call(struct ftrace_event_call *call)
+int trace_remove_event_call(struct ftrace_event_call *call)
{
+ int ret;
+
mutex_lock(&trace_types_lock);
mutex_lock(&event_mutex);
down_write(&trace_event_sem);
- __trace_remove_event_call(call);
+ ret = probe_remove_event_call(call);
up_write(&trace_event_sem);
mutex_unlock(&event_mutex);
mutex_unlock(&trace_types_lock);
+
+ return ret;
}

#define for_each_event(event, start, end) \
Index: linux-trace.git/include/linux/ftrace_event.h
===================================================================
--- linux-trace.git.orig/include/linux/ftrace_event.h
+++ linux-trace.git/include/linux/ftrace_event.h
@@ -338,7 +338,7 @@ extern int trace_define_field(struct ftr
const char *name, int offset, int size,
int is_signed, int filter_type);
extern int trace_add_event_call(struct ftrace_event_call *call);
-extern void trace_remove_event_call(struct ftrace_event_call *call);
+extern int trace_remove_event_call(struct ftrace_event_call *call);

#define is_signed_type(type) (((type)(-1)) < (type)1)



2013-07-03 21:02:48

by Steven Rostedt

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On Wed, 2013-07-03 at 21:17 +0200, Oleg Nesterov wrote:

> diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
> index 4888cb3..c23d41e 100644
> --- a/fs/debugfs/inode.c
> +++ b/fs/debugfs/inode.c
> @@ -475,6 +475,7 @@ static int __debugfs_remove(struct dentry *dentry, struct dentry *parent)
> kfree(dentry->d_inode->i_private);
> /* fall through */
> default:
> + dentry->d_inode->i_private = NULL;
> simple_unlink(parent->d_inode, dentry);
> break;
> }

Ah, I see what you are saying. If the file is being opened just as it is
being deleted, it can up the dentry refcount and prevent it from
actually being deleted at that point. :-p

There's a slight race that we can get to the open call when the dentry
was deleted. But can it? Seems that there would be other places that
have issues as I would think it would be common to do something like:

data = kmalloc(size, GFP_KERNEL);
dentry = debugfs_create_file("file", 0644, parent, data, &ops);

[...]

debugfs_remove(dentry);
kfree(data);

Any thing like this would have issues if the open referenced the data.

-- Steve

2013-07-03 22:24:40

by Oleg Nesterov

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On 07/03, Steven Rostedt wrote:
>
> No, I would avoid any changes to the debugfs infrastructure.

YEs, agreed.

> OK, what about the below patch, followed by an updated version of your
> patch. I'll send that as a reply to this one.

Steven, you do understand that I can't review the changes in this area.

But at first glance, _I think_ this should work. And this is much simpler,
->open() blocks trace_remove_event_call() (you added TRACE_EVENT_FL_REF_MASK
check into the next patch).

Which tree this patch is based on? I have pulled linux-trace.git#for-next
and I do not see tracing_open_generic_file/etc in trace_events.c.

I do not understand what protects call->flags, I guess there is another
lock which I do not see in my tree?

Oleg.

2013-07-03 23:16:09

by Oleg Nesterov

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On 07/03, Steven Rostedt wrote:
>
> On Wed, 2013-07-03 at 16:34 -0400, Steven Rostedt wrote:
> > OK, what about the below patch, followed by an updated version of your
> > patch. I'll send that as a reply to this one.
>
> This is a modification of your patch:

Thanks.

> If you like it, please add a proper change log and SOB tag. Oh, and we
> need to still update trace_kprobes.c

and trace_uprobes.c. Please find the changelog below.


tracing: trace_remove_event_call() should fail if call/file is in use

Change trace_remove_event_call(call) to return the error if this
call is active. This is what the callers assume but can't verify
outside of the tracing locks. Both trace_kprobe.c/trace_uprobe.c
need the additional changes, unregister_trace_probe() should abort
if trace_remove_event_call() fails.

We also check TRACE_EVENT_FL_REF_MASK to ensure that nobody opened
the files we are going to remove, these means that nobody can access
the soon-to-be-freed ftrace_event_file/call via filp->private_data.

Signed-off-by: Oleg Nesterov <[email protected]>

2013-07-04 00:19:41

by Steven Rostedt

[permalink] [raw]
Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

On Thu, 2013-07-04 at 00:18 +0200, Oleg Nesterov wrote:
> On 07/03, Steven Rostedt wrote:
> >
> > No, I would avoid any changes to the debugfs infrastructure.
>
> YEs, agreed.
>
> > OK, what about the below patch, followed by an updated version of your
> > patch. I'll send that as a reply to this one.
>
> Steven, you do understand that I can't review the changes in this area.

I have more faith in you than you do ;-)

>
> But at first glance, _I think_ this should work. And this is much simpler,
> ->open() blocks trace_remove_event_call() (you added TRACE_EVENT_FL_REF_MASK
> check into the next patch).

Yep.

>
> Which tree this patch is based on? I have pulled linux-trace.git#for-next
> and I do not see tracing_open_generic_file/etc in trace_events.c.

Ug! Thanks! I posted my [for-next] series but never pushed it to my git
tree. I just pushed it now. I'm glad you told me this because I was
under the assumption that the code was already in my kernel.org repo,
and I would have pushed to Linus thinking it was already in linux-next
and would have been embarrassed if something went wrong.

>
> I do not understand what protects call->flags, I guess there is another
> lock which I do not see in my tree?

Those flags should only be set under the event_mutex lock. But I see I
didn't do that :-) Yeah, I need to add locks for that.

See, you can review my patch and provide valuable feedback!

-- Steve

Subject: Re: PATCH? trace_remove_event_call() should fail if call is active

(2013/07/04 6:02), Steven Rostedt wrote:
> On Wed, 2013-07-03 at 21:17 +0200, Oleg Nesterov wrote:
>
>> diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
>> index 4888cb3..c23d41e 100644
>> --- a/fs/debugfs/inode.c
>> +++ b/fs/debugfs/inode.c
>> @@ -475,6 +475,7 @@ static int __debugfs_remove(struct dentry *dentry, struct dentry *parent)
>> kfree(dentry->d_inode->i_private);
>> /* fall through */
>> default:
>> + dentry->d_inode->i_private = NULL;
>> simple_unlink(parent->d_inode, dentry);
>> break;
>> }
>
> Ah, I see what you are saying. If the file is being opened just as it is
> being deleted, it can up the dentry refcount and prevent it from
> actually being deleted at that point. :-p

Yeah, that's actually what I'd like to point out.,,

>
> There's a slight race that we can get to the open call when the dentry
> was deleted. But can it? Seems that there would be other places that
> have issues as I would think it would be common to do something like:
>
> data = kmalloc(size, GFP_KERNEL);
> dentry = debugfs_create_file("file", 0644, parent, data, &ops);
>
> [...]
>
> debugfs_remove(dentry);
> kfree(data);
>
> Any thing like this would have issues if the open referenced the data.

I'm not so sure about vfs layer, but yes, I think that pattern may be
always unsafe. Perhaps, we need to wait the entry is surely removed
before freeing the data.

Thanks,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]