2013-07-26 17:31:51

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v2 0/6] tracing: open/delete fixes

Hello.

Testing: seems to work, but assumes that debugs was fixed too. Hopefully
"debugfs: debugfs_remove_recursive() must not rely on list_empty(d_subdirs)"
is fine.

(We should probably make debugfs_remove_recursive() to return the error
and trace_events.c should warn if it fails)

However: I am still testing this patches, so this is still mostly for
review. I'll report if I find anything, but to remind we have other
bugs which need more fixes. And in fact I need to re-read this series
to convince myself it should work, I was distracted by the unexpected
problem in debugfs.

Changes: a lot but all cosmetic, addresses the comments from Steven.

Steven, 1/6 was not changed, I assume you agreed with my reply.

Masami, I'll appreciate your review ;)

Oleg.

kernel/trace/trace_events.c | 155 +++++++++++++++++++-----------------
kernel/trace/trace_events_filter.c | 17 ++---
2 files changed, 87 insertions(+), 85 deletions(-)


2013-07-26 17:30:57

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v2 1/6] tracing: Turn event/id->i_private into call->event.type

event_id_read() is racy, ftrace_event_call can be already freed
by trace_remove_event_call() callers.

Change event_create_dir() to pass "data = call->event.type", this
is all event_id_read() needs. ftrace_event_id_fops no longer needs
tracing_open_generic().

We add the new helper, event_file_data(), to read ->i_private, it
will have more users.

Note: currently ACCESS_ONCE() and "id != 0" check are not needed,
but we are going to change event_remove/rmdir to clear ->i_private.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_events.c | 17 ++++++++++++-----
1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 76eed53..77990c4 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -409,6 +409,11 @@ static void put_system(struct ftrace_subsystem_dir *dir)
mutex_unlock(&event_mutex);
}

+static void *event_file_data(struct file *filp)
+{
+ return ACCESS_ONCE(file_inode(filp)->i_private);
+}
+
/*
* Open and update trace_array ref count.
* Must have the current trace_array passed to it.
@@ -946,14 +951,17 @@ static int trace_format_open(struct inode *inode, struct file *file)
static ssize_t
event_id_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
{
- struct ftrace_event_call *call = filp->private_data;
+ int id = (long)event_file_data(filp);
char buf[32];
int len;

if (*ppos)
return 0;

- len = sprintf(buf, "%d\n", call->event.type);
+ if (unlikely(!id))
+ return -ENODEV;
+
+ len = sprintf(buf, "%d\n", id);
return simple_read_from_buffer(ubuf, cnt, ppos, buf, len);
}

@@ -1240,7 +1248,6 @@ static const struct file_operations ftrace_event_format_fops = {
};

static const struct file_operations ftrace_event_id_fops = {
- .open = tracing_open_generic,
.read = event_id_read,
.llseek = default_llseek,
};
@@ -1488,8 +1495,8 @@ event_create_dir(struct dentry *parent,

#ifdef CONFIG_PERF_EVENTS
if (call->event.type && call->class->reg)
- trace_create_file("id", 0444, file->dir, call,
- id);
+ trace_create_file("id", 0444, file->dir,
+ (void *)(long)call->event.type, id);
#endif

/*
--
1.5.5.1

2013-07-26 17:31:08

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v2 3/6] tracing: Change event_filter_read/write to verify i_private != NULL

event_filter_read/write() are racy, ftrace_event_call can be already
freed by trace_remove_event_call() callers.

1. Shift mutex_lock(event_mutex) from print/apply_event_filter to
the callers.

2. Change the callers, event_filter_read() and event_filter_write()
to read i_private under this mutex and abort if it is NULL.

This fixes nothing, but now we can change debugfs_remove("filter")
callers to nullify ->i_private and fix the the problem.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_events.c | 28 +++++++++++++++++++---------
kernel/trace/trace_events_filter.c | 17 ++++++-----------
2 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 39cb049..b5144c4 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -978,24 +978,30 @@ static ssize_t
event_filter_read(struct file *filp, char __user *ubuf, size_t cnt,
loff_t *ppos)
{
- struct ftrace_event_call *call = filp->private_data;
+ struct ftrace_event_call *call;
struct trace_seq *s;
- int r;
+ int r = -ENODEV;

if (*ppos)
return 0;

s = kmalloc(sizeof(*s), GFP_KERNEL);
+
if (!s)
return -ENOMEM;

trace_seq_init(s);

- print_event_filter(call, s);
- r = simple_read_from_buffer(ubuf, cnt, ppos, s->buffer, s->len);
+ mutex_lock(&event_mutex);
+ call = event_file_data(filp);
+ if (call)
+ print_event_filter(call, s);
+ mutex_unlock(&event_mutex);

- kfree(s);
+ if (call)
+ r = simple_read_from_buffer(ubuf, cnt, ppos, s->buffer, s->len);

+ kfree(s);
return r;
}

@@ -1003,9 +1009,9 @@ static ssize_t
event_filter_write(struct file *filp, const char __user *ubuf, size_t cnt,
loff_t *ppos)
{
- struct ftrace_event_call *call = filp->private_data;
+ struct ftrace_event_call *call;
char *buf;
- int err;
+ int err = -ENODEV;

if (cnt >= PAGE_SIZE)
return -EINVAL;
@@ -1020,13 +1026,17 @@ event_filter_write(struct file *filp, const char __user *ubuf, size_t cnt,
}
buf[cnt] = '\0';

- err = apply_event_filter(call, buf);
+ mutex_lock(&event_mutex);
+ call = event_file_data(filp);
+ if (call)
+ err = apply_event_filter(call, buf);
+ mutex_unlock(&event_mutex);
+
free_page((unsigned long) buf);
if (err < 0)
return err;

*ppos += cnt;
-
return cnt;
}

diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 0c7b75a..97daa8c 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -637,17 +637,15 @@ static void append_filter_err(struct filter_parse_state *ps,
free_page((unsigned long) buf);
}

+/* caller must hold event_mutex */
void print_event_filter(struct ftrace_event_call *call, struct trace_seq *s)
{
- struct event_filter *filter;
+ struct event_filter *filter = call->filter;

- mutex_lock(&event_mutex);
- filter = call->filter;
if (filter && filter->filter_string)
trace_seq_printf(s, "%s\n", filter->filter_string);
else
trace_seq_puts(s, "none\n");
- mutex_unlock(&event_mutex);
}

void print_subsystem_event_filter(struct event_subsystem *system,
@@ -1841,23 +1839,22 @@ static int create_system_filter(struct event_subsystem *system,
return err;
}

+/* caller must hold event_mutex */
int apply_event_filter(struct ftrace_event_call *call, char *filter_string)
{
struct event_filter *filter;
- int err = 0;
-
- mutex_lock(&event_mutex);
+ int err;

if (!strcmp(strstrip(filter_string), "0")) {
filter_disable(call);
filter = call->filter;
if (!filter)
- goto out_unlock;
+ return 0;
RCU_INIT_POINTER(call->filter, NULL);
/* Make sure the filter is not being used */
synchronize_sched();
__free_filter(filter);
- goto out_unlock;
+ return 0;
}

err = create_filter(call, filter_string, true, &filter);
@@ -1884,8 +1881,6 @@ int apply_event_filter(struct ftrace_event_call *call, char *filter_string)
__free_filter(tmp);
}
}
-out_unlock:
- mutex_unlock(&event_mutex);

return err;
}
--
1.5.5.1

2013-07-26 17:31:12

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v2 4/6] tracing: Change f_start() to take event_mutex and verify i_private != NULL

trace_format_open() and trace_format_seq_ops are racy, nothing
protects ftrace_event_call from trace_remove_event_call().

Change f_start() to take event_mutex and verify i_private != NULL,
change f_stop() to drop this lock.

This fixes nothing, but now we can change debugfs_remove("format")
callers to nullify ->i_private and fix the the problem.

Note: the usage of event_mutex is sub-optimal but simple, we can
change this later.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_events.c | 13 +++++++++----
1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index b5144c4..3de2aca 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -839,7 +839,7 @@ enum {

static void *f_next(struct seq_file *m, void *v, loff_t *pos)
{
- struct ftrace_event_call *call = m->private;
+ struct ftrace_event_call *call = event_file_data(m->private);
struct list_head *common_head = &ftrace_common_fields;
struct list_head *head = trace_get_fields(call);
struct list_head *node = v;
@@ -871,7 +871,7 @@ static void *f_next(struct seq_file *m, void *v, loff_t *pos)

static int f_show(struct seq_file *m, void *v)
{
- struct ftrace_event_call *call = m->private;
+ struct ftrace_event_call *call = event_file_data(m->private);
struct ftrace_event_field *field;
const char *array_descriptor;

@@ -924,6 +924,11 @@ static void *f_start(struct seq_file *m, loff_t *pos)
void *p = (void *)FORMAT_HEADER;
loff_t l = 0;

+ /* ->stop() is called even if ->start() fails */
+ mutex_lock(&event_mutex);
+ if (!event_file_data(m->private))
+ return ERR_PTR(-ENODEV);
+
while (l < *pos && p)
p = f_next(m, p, &l);

@@ -932,6 +937,7 @@ static void *f_start(struct seq_file *m, loff_t *pos)

static void f_stop(struct seq_file *m, void *p)
{
+ mutex_unlock(&event_mutex);
}

static const struct seq_operations trace_format_seq_ops = {
@@ -943,7 +949,6 @@ static const struct seq_operations trace_format_seq_ops = {

static int trace_format_open(struct inode *inode, struct file *file)
{
- struct ftrace_event_call *call = inode->i_private;
struct seq_file *m;
int ret;

@@ -952,7 +957,7 @@ static int trace_format_open(struct inode *inode, struct file *file)
return ret;

m = file->private_data;
- m->private = call;
+ m->private = file;

return 0;
}
--
1.5.5.1

2013-07-26 17:31:15

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v2 6/6] tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private

Change remove_event_file_dir() to clear ->i_private for every
file we are going to remove.

tracing_open_generic_file() and tracing_release_generic_file()
can go away, ftrace_enable_fops and ftrace_event_filter_fops()
use tracing_open_generic() but only to check tracing_disabled.

This fixes all races with event_remove() or instance_delete().
f_op->read/write/whatever can never use the freed file/call,
all event/* files were changed to check and use ->i_private
under event_mutex.

Note: this doesn't not fix other problems, event_remove() can
destroy the active ftrace_event_call, we need more changes but
those changes are completely orthogonal.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_events.c | 41 +++++++++--------------------------------
1 files changed, 9 insertions(+), 32 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 2a4f68a..1112fa0 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -428,42 +428,20 @@ static void *event_file_data(struct file *filp)

static void remove_event_file_dir(struct ftrace_event_file *file)
{
+ struct dentry *dir = file->dir;
+ struct dentry *child;
+
+ /* ->i_mutex is not needed, nobody can create/remove a file */
+ list_for_each_entry(child, &dir->d_subdirs, d_u.d_child)
+ child->d_inode->i_private = NULL;
+
list_del(&file->list);
- debugfs_remove_recursive(file->dir);
+ debugfs_remove_recursive(dir);
remove_subsystem(file->system);
kmem_cache_free(file_cachep, file);
}

/*
- * Open and update trace_array ref count.
- * Must have the current trace_array passed to it.
- */
-static int tracing_open_generic_file(struct inode *inode, struct file *filp)
-{
- struct ftrace_event_file *file = inode->i_private;
- struct trace_array *tr = file->tr;
- int ret;
-
- if (trace_array_get(tr) < 0)
- return -ENODEV;
-
- ret = tracing_open_generic(inode, filp);
- if (ret < 0)
- trace_array_put(tr);
- return ret;
-}
-
-static int tracing_release_generic_file(struct inode *inode, struct file *filp)
-{
- struct ftrace_event_file *file = inode->i_private;
- struct trace_array *tr = file->tr;
-
- trace_array_put(tr);
-
- return 0;
-}
-
-/*
* __ftrace_set_clr_event(NULL, NULL, NULL, set) will set/unset all events.
*/
static int
@@ -1277,10 +1255,9 @@ static const struct file_operations ftrace_set_event_fops = {
};

static const struct file_operations ftrace_enable_fops = {
- .open = tracing_open_generic_file,
+ .open = tracing_open_generic,
.read = event_enable_read,
.write = event_enable_write,
- .release = tracing_release_generic_file,
.llseek = default_llseek,
};

--
1.5.5.1

2013-07-26 17:31:33

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v2 2/6] tracing: Change event_enable/disable_read() to verify i_private != NULL

tracing_open_generic_file() is racy, ftrace_event_file can be
already freed by rmdir or trace_remove_event_call().

Change event_enable_read() and event_disable_read() to read and
verify "file = i_private" under event_mutex.

This fixes nothing, but now we can change debugfs_remove("enable")
callers to nullify ->i_private and fix the the problem.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_events.c | 31 ++++++++++++++++++++-----------
1 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 77990c4..39cb049 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -684,19 +684,28 @@ static ssize_t
event_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
loff_t *ppos)
{
- struct ftrace_event_file *file = filp->private_data;
+ struct ftrace_event_file *file;
+ unsigned long flags;
char buf[4] = "0";

- if (file->flags & FTRACE_EVENT_FL_ENABLED &&
- !(file->flags & FTRACE_EVENT_FL_SOFT_DISABLED))
+ mutex_lock(&event_mutex);
+ file = event_file_data(filp);
+ if (likely(file))
+ flags = file->flags;
+ mutex_unlock(&event_mutex);
+
+ if (!file)
+ return -ENODEV;
+
+ if (flags & FTRACE_EVENT_FL_ENABLED &&
+ !(flags & FTRACE_EVENT_FL_SOFT_DISABLED))
strcpy(buf, "1");

- if (file->flags & FTRACE_EVENT_FL_SOFT_DISABLED ||
- file->flags & FTRACE_EVENT_FL_SOFT_MODE)
+ if (flags & FTRACE_EVENT_FL_SOFT_DISABLED ||
+ flags & FTRACE_EVENT_FL_SOFT_MODE)
strcat(buf, "*");

strcat(buf, "\n");
-
return simple_read_from_buffer(ubuf, cnt, ppos, buf, strlen(buf));
}

@@ -704,13 +713,10 @@ static ssize_t
event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
loff_t *ppos)
{
- struct ftrace_event_file *file = filp->private_data;
+ struct ftrace_event_file *file;
unsigned long val;
int ret;

- if (!file)
- return -EINVAL;
-
ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
if (ret)
return ret;
@@ -722,8 +728,11 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
switch (val) {
case 0:
case 1:
+ ret = -ENODEV;
mutex_lock(&event_mutex);
- ret = ftrace_event_enable_disable(file, val);
+ file = event_file_data(filp);
+ if (likely(file))
+ ret = ftrace_event_enable_disable(file, val);
mutex_unlock(&event_mutex);
break;

--
1.5.5.1

2013-07-26 17:32:10

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v2 5/6] tracing: Introduce remove_event_file_dir()

Preparation for the next patch. Extract the common code from
remove_event_from_tracers() and __trace_remove_event_dirs()
into the new helper, remove_event_file_dir().

The patch looks more complicated than it actually is, it also
moves remove_subsystem() up to avoid the forward declaration.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_events.c | 47 +++++++++++++++++++++----------------------
1 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 3de2aca..2a4f68a 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -409,11 +409,31 @@ static void put_system(struct ftrace_subsystem_dir *dir)
mutex_unlock(&event_mutex);
}

+static void remove_subsystem(struct ftrace_subsystem_dir *dir)
+{
+ if (!dir)
+ return;
+
+ if (!--dir->nr_events) {
+ debugfs_remove_recursive(dir->entry);
+ list_del(&dir->list);
+ __put_system_dir(dir);
+ }
+}
+
static void *event_file_data(struct file *filp)
{
return ACCESS_ONCE(file_inode(filp)->i_private);
}

+static void remove_event_file_dir(struct ftrace_event_file *file)
+{
+ list_del(&file->list);
+ debugfs_remove_recursive(file->dir);
+ remove_subsystem(file->system);
+ kmem_cache_free(file_cachep, file);
+}
+
/*
* Open and update trace_array ref count.
* Must have the current trace_array passed to it.
@@ -1545,33 +1565,16 @@ event_create_dir(struct dentry *parent,
return 0;
}

-static void remove_subsystem(struct ftrace_subsystem_dir *dir)
-{
- if (!dir)
- return;
-
- if (!--dir->nr_events) {
- debugfs_remove_recursive(dir->entry);
- list_del(&dir->list);
- __put_system_dir(dir);
- }
-}
-
static void remove_event_from_tracers(struct ftrace_event_call *call)
{
struct ftrace_event_file *file;
struct trace_array *tr;

do_for_each_event_file_safe(tr, file) {
-
if (file->event_call != call)
continue;

- list_del(&file->list);
- debugfs_remove_recursive(file->dir);
- remove_subsystem(file->system);
- kmem_cache_free(file_cachep, file);
-
+ remove_event_file_dir(file);
/*
* The do_for_each_event_file_safe() is
* a double loop. After finding the call for this
@@ -2301,12 +2304,8 @@ __trace_remove_event_dirs(struct trace_array *tr)
{
struct ftrace_event_file *file, *next;

- list_for_each_entry_safe(file, next, &tr->events, list) {
- list_del(&file->list);
- debugfs_remove_recursive(file->dir);
- remove_subsystem(file->system);
- kmem_cache_free(file_cachep, file);
- }
+ list_for_each_entry_safe(file, next, &tr->events, list)
+ remove_event_file_dir(file);
}

static void
--
1.5.5.1

2013-07-28 18:39:45

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] tracing: open/delete fixes

On 07/26, Oleg Nesterov wrote:
>
> Testing: seems to work, but assumes that debugs was fixed too. Hopefully
> "debugfs: debugfs_remove_recursive() must not rely on list_empty(d_subdirs)"
> is fine.
>
> (We should probably make debugfs_remove_recursive() to return the error
> and trace_events.c should warn if it fails)
>
> However: I am still testing this patches, so this is still mostly for
> review. I'll report if I find anything,

I tried to play more with these patches, and didn't find anything wrong.

However, after I re-read 6/6 I think it should be updated:

- remove_event_file_dir() needs to check file->dir != NULL

it can be NULL if event_create_dir()->debugfs_create_dir()
fails, or if event_create_dir() fails before.

- I've also added spin_lock(d_lock) and d_inode != NULL check.

I do not think this is needed. Afaics __create_file() can't
leave the !debugfs_positive() file on ->d_subdirs list if
debugfs_get_inode() fails, dput()->d_kill() should remove it.

But I do not understand this code enough, and debugfs checks
d_inode != NULL all the time.

I am sending "PATCH v3 6/6" in reply to "PATCH v2 6/6". See the diff
below.

So I am waiting for your review. If you and Masami agree with these
changes I'll resend other fixes (1 from me and 2 from Steven) on top
of this series.

Oleg.

--- x/kernel/trace/trace_events.c
+++ x/kernel/trace/trace_events.c
@@ -431,12 +431,18 @@ static void remove_event_file_dir(struct
struct dentry *dir = file->dir;
struct dentry *child;

- /* ->i_mutex is not needed, nobody can create/remove a file */
- list_for_each_entry(child, &dir->d_subdirs, d_u.d_child)
- child->d_inode->i_private = NULL;
+ if (dir) {
+ spin_lock(&dir->d_lock); /* probably unneeded */
+ list_for_each_entry(child, &dir->d_subdirs, d_u.d_child) {
+ if (child->d_inode) /* probably unneeded */
+ child->d_inode->i_private = NULL;
+ }
+ spin_unlock(&dir->d_lock);
+
+ debugfs_remove_recursive(dir);
+ }

list_del(&file->list);
- debugfs_remove_recursive(dir);
remove_subsystem(file->system);
kmem_cache_free(file_cachep, file);
}

2013-07-28 18:40:56

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v3 6/6] tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private

Change remove_event_file_dir() to clear ->i_private for every
file we are going to remove.

We need to check file->dir != NULL because event_create_dir()
can fail. debugfs_remove_recursive(NULL) is fine but the patch
moves it under the same check anyway for readability.

spin_lock(d_lock) and "d_inode != NULL" check are not needed
afaics, but I do not understand this code enough.

tracing_open_generic_file() and tracing_release_generic_file()
can go away, ftrace_enable_fops and ftrace_event_filter_fops()
use tracing_open_generic() but only to check tracing_disabled.

This fixes all races with event_remove() or instance_delete().
f_op->read/write/whatever can never use the freed file/call,
all event/* files were changed to check and use ->i_private
under event_mutex.

Note: this doesn't not fix other problems, event_remove() can
destroy the active ftrace_event_call, we need more changes but
those changes are completely orthogonal.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_events.c | 47 +++++++++++++-----------------------------
1 files changed, 15 insertions(+), 32 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 2a4f68a..79a2743 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -428,42 +428,26 @@ static void *event_file_data(struct file *filp)

static void remove_event_file_dir(struct ftrace_event_file *file)
{
+ struct dentry *dir = file->dir;
+ struct dentry *child;
+
+ if (dir) {
+ spin_lock(&dir->d_lock); /* probably unneeded */
+ list_for_each_entry(child, &dir->d_subdirs, d_u.d_child) {
+ if (child->d_inode) /* probably unneeded */
+ child->d_inode->i_private = NULL;
+ }
+ spin_unlock(&dir->d_lock);
+
+ debugfs_remove_recursive(dir);
+ }
+
list_del(&file->list);
- debugfs_remove_recursive(file->dir);
remove_subsystem(file->system);
kmem_cache_free(file_cachep, file);
}

/*
- * Open and update trace_array ref count.
- * Must have the current trace_array passed to it.
- */
-static int tracing_open_generic_file(struct inode *inode, struct file *filp)
-{
- struct ftrace_event_file *file = inode->i_private;
- struct trace_array *tr = file->tr;
- int ret;
-
- if (trace_array_get(tr) < 0)
- return -ENODEV;
-
- ret = tracing_open_generic(inode, filp);
- if (ret < 0)
- trace_array_put(tr);
- return ret;
-}
-
-static int tracing_release_generic_file(struct inode *inode, struct file *filp)
-{
- struct ftrace_event_file *file = inode->i_private;
- struct trace_array *tr = file->tr;
-
- trace_array_put(tr);
-
- return 0;
-}
-
-/*
* __ftrace_set_clr_event(NULL, NULL, NULL, set) will set/unset all events.
*/
static int
@@ -1277,10 +1261,9 @@ static const struct file_operations ftrace_set_event_fops = {
};

static const struct file_operations ftrace_enable_fops = {
- .open = tracing_open_generic_file,
+ .open = tracing_open_generic,
.read = event_enable_read,
.write = event_enable_write,
- .release = tracing_release_generic_file,
.llseek = default_llseek,
};

--
1.5.5.1

Subject: Re: [PATCH v2 6/6] tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private

(2013/07/27 2:25), Oleg Nesterov wrote:
> Change remove_event_file_dir() to clear ->i_private for every
> file we are going to remove.

Oleg, I think this should be done first.

AFAICS, your former patches depend strongly on this change.
Without this, they don't work as you expected, and it may
prevent git-bisect.

I see, including this patch also breaks current ftrace.
Thus, IMHO, it would better to make a separate patch which just
nullifies i_private with adding some NULL checks where accessing
i_private to avoid the breakages.

This looks a bit long way round, but it ensures bisecting.

Thank you,

> tracing_open_generic_file() and tracing_release_generic_file()
> can go away, ftrace_enable_fops and ftrace_event_filter_fops()
> use tracing_open_generic() but only to check tracing_disabled.
>
> This fixes all races with event_remove() or instance_delete().
> f_op->read/write/whatever can never use the freed file/call,
> all event/* files were changed to check and use ->i_private
> under event_mutex.
>
> Note: this doesn't not fix other problems, event_remove() can
> destroy the active ftrace_event_call, we need more changes but
> those changes are completely orthogonal.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_events.c | 41 +++++++++--------------------------------
> 1 files changed, 9 insertions(+), 32 deletions(-)
>
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 2a4f68a..1112fa0 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -428,42 +428,20 @@ static void *event_file_data(struct file *filp)
>
> static void remove_event_file_dir(struct ftrace_event_file *file)
> {
> + struct dentry *dir = file->dir;
> + struct dentry *child;
> +
> + /* ->i_mutex is not needed, nobody can create/remove a file */
> + list_for_each_entry(child, &dir->d_subdirs, d_u.d_child)
> + child->d_inode->i_private = NULL;
> +
> list_del(&file->list);
> - debugfs_remove_recursive(file->dir);
> + debugfs_remove_recursive(dir);
> remove_subsystem(file->system);
> kmem_cache_free(file_cachep, file);
> }
>
> /*
> - * Open and update trace_array ref count.
> - * Must have the current trace_array passed to it.
> - */
> -static int tracing_open_generic_file(struct inode *inode, struct file *filp)
> -{
> - struct ftrace_event_file *file = inode->i_private;
> - struct trace_array *tr = file->tr;
> - int ret;
> -
> - if (trace_array_get(tr) < 0)
> - return -ENODEV;
> -
> - ret = tracing_open_generic(inode, filp);
> - if (ret < 0)
> - trace_array_put(tr);
> - return ret;
> -}
> -
> -static int tracing_release_generic_file(struct inode *inode, struct file *filp)
> -{
> - struct ftrace_event_file *file = inode->i_private;
> - struct trace_array *tr = file->tr;
> -
> - trace_array_put(tr);
> -
> - return 0;
> -}
> -
> -/*
> * __ftrace_set_clr_event(NULL, NULL, NULL, set) will set/unset all events.
> */
> static int
> @@ -1277,10 +1255,9 @@ static const struct file_operations ftrace_set_event_fops = {
> };
>
> static const struct file_operations ftrace_enable_fops = {
> - .open = tracing_open_generic_file,
> + .open = tracing_open_generic,
> .read = event_enable_read,
> .write = event_enable_write,
> - .release = tracing_release_generic_file,
> .llseek = default_llseek,
> };
>
>


--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [PATCH v2 0/6] tracing: open/delete fixes

(2013/07/27 2:25), Oleg Nesterov wrote:
> Hello.
>
> Testing: seems to work, but assumes that debugs was fixed too. Hopefully
> "debugfs: debugfs_remove_recursive() must not rely on list_empty(d_subdirs)"
> is fine.
>
> (We should probably make debugfs_remove_recursive() to return the error
> and trace_events.c should warn if it fails)
>
> However: I am still testing this patches, so this is still mostly for
> review. I'll report if I find anything, but to remind we have other
> bugs which need more fixes. And in fact I need to re-read this series
> to convince myself it should work, I was distracted by the unexpected
> problem in debugfs.
>
> Changes: a lot but all cosmetic, addresses the comments from Steven.
>
> Steven, 1/6 was not changed, I assume you agreed with my reply.
>
> Masami, I'll appreciate your review ;)

I think this is OK except for the bisecting issue, as I pointed in
reply to 6/6. ;)

Thank you!

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-07-29 14:27:31

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 6/6] tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private

On 07/29, Masami Hiramatsu wrote:
>
> (2013/07/27 2:25), Oleg Nesterov wrote:
> > Change remove_event_file_dir() to clear ->i_private for every
> > file we are going to remove.
>
> Oleg, I think this should be done first.
>
> AFAICS, your former patches depend strongly on this change.
> Without this, they don't work as you expected, and it may
> prevent git-bisect.

Why?

v2 specially does this in the last change for bisecting/etc.

1-4 change f_op->read/write to check i_private != NULL but until
this final patch i_private == NULL is not possible.

> I see, including this patch also breaks current ftrace.

Could you clarify?

Oleg.

2013-07-29 14:49:32

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] tracing: open/delete fixes

On 07/29, Masami Hiramatsu wrote:
>
> (2013/07/27 2:25), Oleg Nesterov wrote:
> >
> > Masami, I'll appreciate your review ;)
>
> I think this is OK

Great, thanks.

> except for the bisecting issue, as I pointed in
> reply to 6/6. ;)

Perhaps, but so far the problem is not clear to me.

Anyway, if you agree with this approach we can fix the details I
missed.


Btw. I constantly forget to mention this, but it seems that with
these changes we can also remove the PITA code added by 701970b3.

Or is there another reason for fops_get() I missed?

Oleg.

Subject: Re: [PATCH v2 6/6] tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private

(2013/07/29 23:21), Oleg Nesterov wrote:
> On 07/29, Masami Hiramatsu wrote:
>>
>> (2013/07/27 2:25), Oleg Nesterov wrote:
>>> Change remove_event_file_dir() to clear ->i_private for every
>>> file we are going to remove.
>>
>> Oleg, I think this should be done first.
>>
>> AFAICS, your former patches depend strongly on this change.
>> Without this, they don't work as you expected, and it may
>> prevent git-bisect.
>
> Why?
>
> v2 specially does this in the last change for bisecting/etc.
>
> 1-4 change f_op->read/write to check i_private != NULL but until
> this final patch i_private == NULL is not possible.

Ah, OK. So 1-4 changes still remain refcount code, then
it is safe, just i_private != NULL are added. I misread.


>> I see, including this patch also breaks current ftrace.
>
> Could you clarify?

Yes, let me add reviewed-by tags ;)

Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [PATCH v2 1/6] tracing: Turn event/id->i_private into call->event.type

(2013/07/27 2:25), Oleg Nesterov wrote:
> event_id_read() is racy, ftrace_event_call can be already freed
> by trace_remove_event_call() callers.
>
> Change event_create_dir() to pass "data = call->event.type", this
> is all event_id_read() needs. ftrace_event_id_fops no longer needs
> tracing_open_generic().
>
> We add the new helper, event_file_data(), to read ->i_private, it
> will have more users.
>
> Note: currently ACCESS_ONCE() and "id != 0" check are not needed,
> but we are going to change event_remove/rmdir to clear ->i_private.
>

Looks OK for me.

Reviewed-by: Masami Hiramatsu <[email protected]>

> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_events.c | 17 ++++++++++++-----
> 1 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 76eed53..77990c4 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -409,6 +409,11 @@ static void put_system(struct ftrace_subsystem_dir *dir)
> mutex_unlock(&event_mutex);
> }
>
> +static void *event_file_data(struct file *filp)
> +{
> + return ACCESS_ONCE(file_inode(filp)->i_private);
> +}
> +
> /*
> * Open and update trace_array ref count.
> * Must have the current trace_array passed to it.
> @@ -946,14 +951,17 @@ static int trace_format_open(struct inode *inode, struct file *file)
> static ssize_t
> event_id_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
> {
> - struct ftrace_event_call *call = filp->private_data;
> + int id = (long)event_file_data(filp);
> char buf[32];
> int len;
>
> if (*ppos)
> return 0;
>
> - len = sprintf(buf, "%d\n", call->event.type);
> + if (unlikely(!id))
> + return -ENODEV;
> +
> + len = sprintf(buf, "%d\n", id);
> return simple_read_from_buffer(ubuf, cnt, ppos, buf, len);
> }
>
> @@ -1240,7 +1248,6 @@ static const struct file_operations ftrace_event_format_fops = {
> };
>
> static const struct file_operations ftrace_event_id_fops = {
> - .open = tracing_open_generic,
> .read = event_id_read,
> .llseek = default_llseek,
> };
> @@ -1488,8 +1495,8 @@ event_create_dir(struct dentry *parent,
>
> #ifdef CONFIG_PERF_EVENTS
> if (call->event.type && call->class->reg)
> - trace_create_file("id", 0444, file->dir, call,
> - id);
> + trace_create_file("id", 0444, file->dir,
> + (void *)(long)call->event.type, id);
> #endif
>
> /*
>


--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [PATCH v2 2/6] tracing: Change event_enable/disable_read() to verify i_private != NULL

(2013/07/27 2:25), Oleg Nesterov wrote:
> tracing_open_generic_file() is racy, ftrace_event_file can be
> already freed by rmdir or trace_remove_event_call().
>
> Change event_enable_read() and event_disable_read() to read and
> verify "file = i_private" under event_mutex.
>
> This fixes nothing, but now we can change debugfs_remove("enable")
> callers to nullify ->i_private and fix the the problem.


Reviewed-by: Masami Hiramatsu <[email protected]>

BTW, this still has a cosmetic change, is that OK for Steven?

>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_events.c | 31 ++++++++++++++++++++-----------
> 1 files changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 77990c4..39cb049 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -684,19 +684,28 @@ static ssize_t
> event_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
> loff_t *ppos)
> {
> - struct ftrace_event_file *file = filp->private_data;
> + struct ftrace_event_file *file;
> + unsigned long flags;
> char buf[4] = "0";
>
> - if (file->flags & FTRACE_EVENT_FL_ENABLED &&
> - !(file->flags & FTRACE_EVENT_FL_SOFT_DISABLED))
> + mutex_lock(&event_mutex);
> + file = event_file_data(filp);
> + if (likely(file))
> + flags = file->flags;
> + mutex_unlock(&event_mutex);
> +
> + if (!file)
> + return -ENODEV;
> +
> + if (flags & FTRACE_EVENT_FL_ENABLED &&
> + !(flags & FTRACE_EVENT_FL_SOFT_DISABLED))
> strcpy(buf, "1");
>
> - if (file->flags & FTRACE_EVENT_FL_SOFT_DISABLED ||
> - file->flags & FTRACE_EVENT_FL_SOFT_MODE)
> + if (flags & FTRACE_EVENT_FL_SOFT_DISABLED ||
> + flags & FTRACE_EVENT_FL_SOFT_MODE)
> strcat(buf, "*");
>
> strcat(buf, "\n");
> -
> return simple_read_from_buffer(ubuf, cnt, ppos, buf, strlen(buf));
> }
>
> @@ -704,13 +713,10 @@ static ssize_t
> event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
> loff_t *ppos)
> {
> - struct ftrace_event_file *file = filp->private_data;
> + struct ftrace_event_file *file;
> unsigned long val;
> int ret;
>
> - if (!file)
> - return -EINVAL;
> -
> ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
> if (ret)
> return ret;
> @@ -722,8 +728,11 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
> switch (val) {
> case 0:
> case 1:
> + ret = -ENODEV;
> mutex_lock(&event_mutex);
> - ret = ftrace_event_enable_disable(file, val);
> + file = event_file_data(filp);
> + if (likely(file))
> + ret = ftrace_event_enable_disable(file, val);
> mutex_unlock(&event_mutex);
> break;
>
>


--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [PATCH v2 5/6] tracing: Introduce remove_event_file_dir()

(2013/07/27 2:25), Oleg Nesterov wrote:
> Preparation for the next patch. Extract the common code from
> remove_event_from_tracers() and __trace_remove_event_dirs()
> into the new helper, remove_event_file_dir().
>
> The patch looks more complicated than it actually is, it also
> moves remove_subsystem() up to avoid the forward declaration.

Looks good for me.

Reviewed-by: Masami Hiramatsu <[email protected]>

Thanks!

>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_events.c | 47 +++++++++++++++++++++----------------------
> 1 files changed, 23 insertions(+), 24 deletions(-)
>
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 3de2aca..2a4f68a 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -409,11 +409,31 @@ static void put_system(struct ftrace_subsystem_dir *dir)
> mutex_unlock(&event_mutex);
> }
>
> +static void remove_subsystem(struct ftrace_subsystem_dir *dir)
> +{
> + if (!dir)
> + return;
> +
> + if (!--dir->nr_events) {
> + debugfs_remove_recursive(dir->entry);
> + list_del(&dir->list);
> + __put_system_dir(dir);
> + }
> +}
> +
> static void *event_file_data(struct file *filp)
> {
> return ACCESS_ONCE(file_inode(filp)->i_private);
> }
>
> +static void remove_event_file_dir(struct ftrace_event_file *file)
> +{
> + list_del(&file->list);
> + debugfs_remove_recursive(file->dir);
> + remove_subsystem(file->system);
> + kmem_cache_free(file_cachep, file);
> +}
> +
> /*
> * Open and update trace_array ref count.
> * Must have the current trace_array passed to it.
> @@ -1545,33 +1565,16 @@ event_create_dir(struct dentry *parent,
> return 0;
> }
>
> -static void remove_subsystem(struct ftrace_subsystem_dir *dir)
> -{
> - if (!dir)
> - return;
> -
> - if (!--dir->nr_events) {
> - debugfs_remove_recursive(dir->entry);
> - list_del(&dir->list);
> - __put_system_dir(dir);
> - }
> -}
> -
> static void remove_event_from_tracers(struct ftrace_event_call *call)
> {
> struct ftrace_event_file *file;
> struct trace_array *tr;
>
> do_for_each_event_file_safe(tr, file) {
> -
> if (file->event_call != call)
> continue;
>
> - list_del(&file->list);
> - debugfs_remove_recursive(file->dir);
> - remove_subsystem(file->system);
> - kmem_cache_free(file_cachep, file);
> -
> + remove_event_file_dir(file);
> /*
> * The do_for_each_event_file_safe() is
> * a double loop. After finding the call for this
> @@ -2301,12 +2304,8 @@ __trace_remove_event_dirs(struct trace_array *tr)
> {
> struct ftrace_event_file *file, *next;
>
> - list_for_each_entry_safe(file, next, &tr->events, list) {
> - list_del(&file->list);
> - debugfs_remove_recursive(file->dir);
> - remove_subsystem(file->system);
> - kmem_cache_free(file_cachep, file);
> - }
> + list_for_each_entry_safe(file, next, &tr->events, list)
> + remove_event_file_dir(file);
> }
>
> static void
>


--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [PATCH v2 4/6] tracing: Change f_start() to take event_mutex and verify i_private != NULL

(2013/07/27 2:25), Oleg Nesterov wrote:
> trace_format_open() and trace_format_seq_ops are racy, nothing
> protects ftrace_event_call from trace_remove_event_call().
>
> Change f_start() to take event_mutex and verify i_private != NULL,
> change f_stop() to drop this lock.
>
> This fixes nothing, but now we can change debugfs_remove("format")
> callers to nullify ->i_private and fix the the problem.
>
> Note: the usage of event_mutex is sub-optimal but simple, we can
> change this later.
>

Looks good for me.

Reviewed-by: Masami Hiramatsu <[email protected]>

Thanks!

> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_events.c | 13 +++++++++----
> 1 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index b5144c4..3de2aca 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -839,7 +839,7 @@ enum {
>
> static void *f_next(struct seq_file *m, void *v, loff_t *pos)
> {
> - struct ftrace_event_call *call = m->private;
> + struct ftrace_event_call *call = event_file_data(m->private);
> struct list_head *common_head = &ftrace_common_fields;
> struct list_head *head = trace_get_fields(call);
> struct list_head *node = v;
> @@ -871,7 +871,7 @@ static void *f_next(struct seq_file *m, void *v, loff_t *pos)
>
> static int f_show(struct seq_file *m, void *v)
> {
> - struct ftrace_event_call *call = m->private;
> + struct ftrace_event_call *call = event_file_data(m->private);
> struct ftrace_event_field *field;
> const char *array_descriptor;
>
> @@ -924,6 +924,11 @@ static void *f_start(struct seq_file *m, loff_t *pos)
> void *p = (void *)FORMAT_HEADER;
> loff_t l = 0;
>
> + /* ->stop() is called even if ->start() fails */
> + mutex_lock(&event_mutex);
> + if (!event_file_data(m->private))
> + return ERR_PTR(-ENODEV);
> +
> while (l < *pos && p)
> p = f_next(m, p, &l);
>
> @@ -932,6 +937,7 @@ static void *f_start(struct seq_file *m, loff_t *pos)
>
> static void f_stop(struct seq_file *m, void *p)
> {
> + mutex_unlock(&event_mutex);
> }
>
> static const struct seq_operations trace_format_seq_ops = {
> @@ -943,7 +949,6 @@ static const struct seq_operations trace_format_seq_ops = {
>
> static int trace_format_open(struct inode *inode, struct file *file)
> {
> - struct ftrace_event_call *call = inode->i_private;
> struct seq_file *m;
> int ret;
>
> @@ -952,7 +957,7 @@ static int trace_format_open(struct inode *inode, struct file *file)
> return ret;
>
> m = file->private_data;
> - m->private = call;
> + m->private = file;
>
> return 0;
> }
>


--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [PATCH v2 3/6] tracing: Change event_filter_read/write to verify i_private != NULL

(2013/07/27 2:25), Oleg Nesterov wrote:
> event_filter_read/write() are racy, ftrace_event_call can be already
> freed by trace_remove_event_call() callers.
>
> 1. Shift mutex_lock(event_mutex) from print/apply_event_filter to
> the callers.
>
> 2. Change the callers, event_filter_read() and event_filter_write()
> to read i_private under this mutex and abort if it is NULL.
>
> This fixes nothing, but now we can change debugfs_remove("filter")
> callers to nullify ->i_private and fix the the problem.

Looks good for me. (Note: this also has some cosmetic changes)

Reviewed-by: Masami Hiramatsu <[email protected]>

Thanks!

>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_events.c | 28 +++++++++++++++++++---------
> kernel/trace/trace_events_filter.c | 17 ++++++-----------
> 2 files changed, 25 insertions(+), 20 deletions(-)
>
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 39cb049..b5144c4 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -978,24 +978,30 @@ static ssize_t
> event_filter_read(struct file *filp, char __user *ubuf, size_t cnt,
> loff_t *ppos)
> {
> - struct ftrace_event_call *call = filp->private_data;
> + struct ftrace_event_call *call;
> struct trace_seq *s;
> - int r;
> + int r = -ENODEV;
>
> if (*ppos)
> return 0;
>
> s = kmalloc(sizeof(*s), GFP_KERNEL);
> +
> if (!s)
> return -ENOMEM;
>
> trace_seq_init(s);
>
> - print_event_filter(call, s);
> - r = simple_read_from_buffer(ubuf, cnt, ppos, s->buffer, s->len);
> + mutex_lock(&event_mutex);
> + call = event_file_data(filp);
> + if (call)
> + print_event_filter(call, s);
> + mutex_unlock(&event_mutex);
>
> - kfree(s);
> + if (call)
> + r = simple_read_from_buffer(ubuf, cnt, ppos, s->buffer, s->len);
>
> + kfree(s);
> return r;
> }
>
> @@ -1003,9 +1009,9 @@ static ssize_t
> event_filter_write(struct file *filp, const char __user *ubuf, size_t cnt,
> loff_t *ppos)
> {
> - struct ftrace_event_call *call = filp->private_data;
> + struct ftrace_event_call *call;
> char *buf;
> - int err;
> + int err = -ENODEV;
>
> if (cnt >= PAGE_SIZE)
> return -EINVAL;
> @@ -1020,13 +1026,17 @@ event_filter_write(struct file *filp, const char __user *ubuf, size_t cnt,
> }
> buf[cnt] = '\0';
>
> - err = apply_event_filter(call, buf);
> + mutex_lock(&event_mutex);
> + call = event_file_data(filp);
> + if (call)
> + err = apply_event_filter(call, buf);
> + mutex_unlock(&event_mutex);
> +
> free_page((unsigned long) buf);
> if (err < 0)
> return err;
>
> *ppos += cnt;
> -
> return cnt;
> }
>
> diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
> index 0c7b75a..97daa8c 100644
> --- a/kernel/trace/trace_events_filter.c
> +++ b/kernel/trace/trace_events_filter.c
> @@ -637,17 +637,15 @@ static void append_filter_err(struct filter_parse_state *ps,
> free_page((unsigned long) buf);
> }
>
> +/* caller must hold event_mutex */
> void print_event_filter(struct ftrace_event_call *call, struct trace_seq *s)
> {
> - struct event_filter *filter;
> + struct event_filter *filter = call->filter;
>
> - mutex_lock(&event_mutex);
> - filter = call->filter;
> if (filter && filter->filter_string)
> trace_seq_printf(s, "%s\n", filter->filter_string);
> else
> trace_seq_puts(s, "none\n");
> - mutex_unlock(&event_mutex);
> }
>
> void print_subsystem_event_filter(struct event_subsystem *system,
> @@ -1841,23 +1839,22 @@ static int create_system_filter(struct event_subsystem *system,
> return err;
> }
>
> +/* caller must hold event_mutex */
> int apply_event_filter(struct ftrace_event_call *call, char *filter_string)
> {
> struct event_filter *filter;
> - int err = 0;
> -
> - mutex_lock(&event_mutex);
> + int err;
>
> if (!strcmp(strstrip(filter_string), "0")) {
> filter_disable(call);
> filter = call->filter;
> if (!filter)
> - goto out_unlock;
> + return 0;
> RCU_INIT_POINTER(call->filter, NULL);
> /* Make sure the filter is not being used */
> synchronize_sched();
> __free_filter(filter);
> - goto out_unlock;
> + return 0;
> }
>
> err = create_filter(call, filter_string, true, &filter);
> @@ -1884,8 +1881,6 @@ int apply_event_filter(struct ftrace_event_call *call, char *filter_string)
> __free_filter(tmp);
> }
> }
> -out_unlock:
> - mutex_unlock(&event_mutex);
>
> return err;
> }
>


--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [PATCH v3 6/6] tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private

(2013/07/29 3:35), Oleg Nesterov wrote:
> Change remove_event_file_dir() to clear ->i_private for every
> file we are going to remove.
>
> We need to check file->dir != NULL because event_create_dir()
> can fail. debugfs_remove_recursive(NULL) is fine but the patch
> moves it under the same check anyway for readability.
>
> spin_lock(d_lock) and "d_inode != NULL" check are not needed
> afaics, but I do not understand this code enough.
>
> tracing_open_generic_file() and tracing_release_generic_file()
> can go away, ftrace_enable_fops and ftrace_event_filter_fops()
> use tracing_open_generic() but only to check tracing_disabled.
>
> This fixes all races with event_remove() or instance_delete().
> f_op->read/write/whatever can never use the freed file/call,
> all event/* files were changed to check and use ->i_private
> under event_mutex.
>
> Note: this doesn't not fix other problems, event_remove() can
> destroy the active ftrace_event_call, we need more changes but
> those changes are completely orthogonal.

Looks good for me.

Reviewed-by: Masami Hiramatsu <[email protected]>

Thanks!

>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_events.c | 47 +++++++++++++-----------------------------
> 1 files changed, 15 insertions(+), 32 deletions(-)
>
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 2a4f68a..79a2743 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -428,42 +428,26 @@ static void *event_file_data(struct file *filp)
>
> static void remove_event_file_dir(struct ftrace_event_file *file)
> {
> + struct dentry *dir = file->dir;
> + struct dentry *child;
> +
> + if (dir) {
> + spin_lock(&dir->d_lock); /* probably unneeded */
> + list_for_each_entry(child, &dir->d_subdirs, d_u.d_child) {
> + if (child->d_inode) /* probably unneeded */
> + child->d_inode->i_private = NULL;
> + }
> + spin_unlock(&dir->d_lock);
> +
> + debugfs_remove_recursive(dir);
> + }
> +
> list_del(&file->list);
> - debugfs_remove_recursive(file->dir);
> remove_subsystem(file->system);
> kmem_cache_free(file_cachep, file);
> }
>
> /*
> - * Open and update trace_array ref count.
> - * Must have the current trace_array passed to it.
> - */
> -static int tracing_open_generic_file(struct inode *inode, struct file *filp)
> -{
> - struct ftrace_event_file *file = inode->i_private;
> - struct trace_array *tr = file->tr;
> - int ret;
> -
> - if (trace_array_get(tr) < 0)
> - return -ENODEV;
> -
> - ret = tracing_open_generic(inode, filp);
> - if (ret < 0)
> - trace_array_put(tr);
> - return ret;
> -}
> -
> -static int tracing_release_generic_file(struct inode *inode, struct file *filp)
> -{
> - struct ftrace_event_file *file = inode->i_private;
> - struct trace_array *tr = file->tr;
> -
> - trace_array_put(tr);
> -
> - return 0;
> -}
> -
> -/*
> * __ftrace_set_clr_event(NULL, NULL, NULL, set) will set/unset all events.
> */
> static int
> @@ -1277,10 +1261,9 @@ static const struct file_operations ftrace_set_event_fops = {
> };
>
> static const struct file_operations ftrace_enable_fops = {
> - .open = tracing_open_generic_file,
> + .open = tracing_open_generic,
> .read = event_enable_read,
> .write = event_enable_write,
> - .release = tracing_release_generic_file,
> .llseek = default_llseek,
> };
>
>


--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-07-30 01:42:55

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] tracing: Change event_enable/disable_read() to verify i_private != NULL

On Tue, 2013-07-30 at 10:31 +0900, Masami Hiramatsu wrote:

> BTW, this still has a cosmetic change, is that OK for Steven?
>

> > strcat(buf, "\n");
> > -
> > return simple_read_from_buffer(ubuf, cnt, ppos, buf, strlen(buf));
> > }
> >

You mean the above? Don't worry, I removed it ;-)

I don't usually touch patches, but whitespace changes like this are not
worthy of even a mention.

-- Steve

2013-07-30 01:59:58

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 6/6] tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private

On Tue, 2013-07-30 at 10:28 +0900, Masami Hiramatsu wrote:
> (2013/07/29 23:21), Oleg Nesterov wrote:
> > On 07/29, Masami Hiramatsu wrote:
> >>
> >> (2013/07/27 2:25), Oleg Nesterov wrote:
> >>> Change remove_event_file_dir() to clear ->i_private for every
> >>> file we are going to remove.
> >>
> >> Oleg, I think this should be done first.
> >>
> >> AFAICS, your former patches depend strongly on this change.
> >> Without this, they don't work as you expected, and it may
> >> prevent git-bisect.
> >
> > Why?
> >
> > v2 specially does this in the last change for bisecting/etc.
> >
> > 1-4 change f_op->read/write to check i_private != NULL but until
> > this final patch i_private == NULL is not possible.
>
> Ah, OK. So 1-4 changes still remain refcount code, then
> it is safe, just i_private != NULL are added. I misread.

I just ran them all individually applied through my ftrace stress tests.
All passed, thus they are as bisectible as I would like them to be.

-- Steve

2013-07-30 02:01:42

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v3 6/6] tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private

On Tue, 2013-07-30 at 10:36 +0900, Masami Hiramatsu wrote:

> Looks good for me.
>
> Reviewed-by: Masami Hiramatsu <[email protected]>

I've applied them all locally, but I'll rebase and add your reviewed-by
tags. (also make some other checks) Then I need to rerun them through
my main tests before I post them to next, and then onto Linus.

Thanks to both of you,

-- Steve