Currently there exists a race with deleting a kprobe or uprobe and
a user opening the probe event file or using perf events.
The problem stems from not being able to take the probe_lock from the
unregister code because we may have the event_mutex at the time, and
the event mutex may be taken with the probe_lock held.
To solve this, the events get a ref count (using the flags field), where
when an event file is opened, the ftrace_event_call ref count increments.
Then this is checked under event_mutex and if set, the unregistering
of the probe will fail.
Here's a test that shows how things break:
# cd /sys/kernel/debug/tracing
# echo 'p:sigprocmask sigprocmask' > kprobe_events || exit -1
# enable_probe() {
sleep 10
echo 1
}
# file=events/kprobes/sigprocmask/enable
# enable_probe > $file &
> kprobe_events
The above will corrupt the kprobe system, as the write to the enable
file will happen after the kprobe was deleted.
Trying to create the probe again fails:
# echo 'p:sigprocmask sigprocmask' > kprobe_events
# cat kprobe_events
p:kprobes/sigprocmask sigprocmask
# ls events/kprobes/
enable filter
After applying these patches, the "> kprobe_events" fails due to the
event being busy.
Masami, please review these patches and give your ack.
Srikar, can you please review the last patch. I didn't test uprobes
with this. I'll do that after the 4th.
Thanks,
-- Steve
Oleg Nesterov (1):
tracing: trace_remove_event_call() should fail if call/file is in use
Steven Rostedt (Red Hat) (3):
tracing: Add ref count to ftrace_event_call
tracing/kprobes: Fail to unregister if probe event files are open
tracing/uprobes: Fail to unregister if probe event files are open
----
include/linux/ftrace_event.h | 8 +++-
kernel/trace/trace_events.c | 109 +++++++++++++++++++++++++++++++++++++++---
kernel/trace/trace_kprobe.c | 21 +++++---
kernel/trace/trace_uprobe.c | 48 ++++++++++++++-----
4 files changed, 160 insertions(+), 26 deletions(-)
(2013/07/04 12:33), Steven Rostedt wrote:
> Currently there exists a race with deleting a kprobe or uprobe and
> a user opening the probe event file or using perf events.
>
> The problem stems from not being able to take the probe_lock from the
> unregister code because we may have the event_mutex at the time, and
> the event mutex may be taken with the probe_lock held.
>
> To solve this, the events get a ref count (using the flags field), where
> when an event file is opened, the ftrace_event_call ref count increments.
> Then this is checked under event_mutex and if set, the unregistering
> of the probe will fail.
>
> Here's a test that shows how things break:
>
> # cd /sys/kernel/debug/tracing
> # echo 'p:sigprocmask sigprocmask' > kprobe_events || exit -1
> # enable_probe() {
> sleep 10
> echo 1
> }
> # file=events/kprobes/sigprocmask/enable
> # enable_probe > $file &
> > kprobe_events
>
> The above will corrupt the kprobe system, as the write to the enable
> file will happen after the kprobe was deleted.
>
> Trying to create the probe again fails:
>
> # echo 'p:sigprocmask sigprocmask' > kprobe_events
> # cat kprobe_events
> p:kprobes/sigprocmask sigprocmask
> # ls events/kprobes/
> enable filter
>
> After applying these patches, the "> kprobe_events" fails due to the
> event being busy.
>
> Masami, please review these patches and give your ack.
Thanks Steven!
> Oleg Nesterov (1):
> tracing: trace_remove_event_call() should fail if call/file is in use
>
> Steven Rostedt (Red Hat) (3):
> tracing: Add ref count to ftrace_event_call
> tracing/kprobes: Fail to unregister if probe event files are open
> tracing/uprobes: Fail to unregister if probe event files are open
I just started to look into the series, but the 3/4 and 4/4 seems same...
Which one is good to go?
Thank you,
--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
(2013/07/04 13:00), Masami Hiramatsu wrote:
>> Steven Rostedt (Red Hat) (3):
>> tracing: Add ref count to ftrace_event_call
>> tracing/kprobes: Fail to unregister if probe event files are open
>> tracing/uprobes: Fail to unregister if probe event files are open
>
> I just started to look into the series, but the 3/4 and 4/4 seems same...
> Which one is good to go?
Ah, now I see, those are for kprobes and uprobes...
Thank you,
--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
(2013/07/04 12:33), Steven Rostedt wrote:
> Currently there exists a race with deleting a kprobe or uprobe and
> a user opening the probe event file or using perf events.
>
> The problem stems from not being able to take the probe_lock from the
> unregister code because we may have the event_mutex at the time, and
> the event mutex may be taken with the probe_lock held.
>
> To solve this, the events get a ref count (using the flags field), where
> when an event file is opened, the ftrace_event_call ref count increments.
> Then this is checked under event_mutex and if set, the unregistering
> of the probe will fail.
>
> Here's a test that shows how things break:
>
> # cd /sys/kernel/debug/tracing
> # echo 'p:sigprocmask sigprocmask' > kprobe_events || exit -1
> # enable_probe() {
> sleep 10
> echo 1
> }
> # file=events/kprobes/sigprocmask/enable
> # enable_probe > $file &
> > kprobe_events
>
> The above will corrupt the kprobe system, as the write to the enable
> file will happen after the kprobe was deleted.
>
> Trying to create the probe again fails:
>
> # echo 'p:sigprocmask sigprocmask' > kprobe_events
> # cat kprobe_events
> p:kprobes/sigprocmask sigprocmask
> # ls events/kprobes/
> enable filter
>
> After applying these patches, the "> kprobe_events" fails due to the
> event being busy.
>
> Masami, please review these patches and give your ack.
Steve, Oleg, could you also take a look on my additional 2 patches too?
I think both this series and my patches are required for
fixing most of the problem. (what we need is a patch to ensure
finishing all running uprobe handlers at disabling it *if needed*)
Thank you,
--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
On 07/12, Masami Hiramatsu wrote:
>
> Steve, Oleg, could you also take a look on my additional 2 patches too?
Sorry, sorry, I was a bit busy.
Masami, I am not sure I can help, but so far I _feel_ that perhaps we
should do something else. I'll try to recheck and report on Monday.
But "Wait for disabling all running kprobe handlers" looks "obviously
fine" to me.
Oleg.
On 07/03, Steven Rostedt wrote:
>
> Currently there exists a race with deleting a kprobe or uprobe and
> a user opening the probe event file or using perf events.
>
> The problem stems from not being able to take the probe_lock from the
> unregister code because we may have the event_mutex at the time, and
> the event mutex may be taken with the probe_lock held.
>
> To solve this, the events get a ref count (using the flags field), where
> when an event file is opened, the ftrace_event_call ref count increments.
> Then this is checked under event_mutex and if set, the unregistering
> of the probe will fail.
So. As Masami pointed out, this is not enough. Probably we can add more
hacks, but I'd like to discuss the alternative approach.
Note also that this ref count has the unfortunate property, if someone
keeps the file opened we can't remove an event.
Not sure you will like it, not sure this can actually work, but let me
try ;)
Note:
- the "patch" below is only for illustration, it is intentionally
incomplete, it only "fixes" ftrace_enable_fops and ignores
id/format/filter. However I they could be fixed too. The most
problematic is ftrace_event_format_fops, it needs to update
trace_format_seq_ops.
- we still need "trace_remove_event_call() should fail if call/file
is in use" which everyone seems to agree with
- and we still need the discussed changes in trace_kpobes/uprobes
What this patch does:
- add the new "topmost" rw_semaphore, file_sem.
- trace_remove_event_call() takes this sem for writing and
cleares enable/id/format/filter->i_private
- removes tracing_open_generic_file/tracing_release_generic_file,
we do not use file->f_private any longer
- changes event_enable_read/event_enable_write to read
ftrace_event_file = i_private under read_lock(file_sem) and
abort if it is null.
Question: why event_enable_write() needs trace_array_get() ?
Steven, Masami, do you think this can make any sense?
Oleg.
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -410,35 +410,6 @@ static void put_system(struct ftrace_subsystem_dir *dir)
}
/*
- * Open and update trace_array ref count.
- * Must have the current trace_array passed to it.
- */
-static int tracing_open_generic_file(struct inode *inode, struct file *filp)
-{
- struct ftrace_event_file *file = inode->i_private;
- struct trace_array *tr = file->tr;
- int ret;
-
- if (trace_array_get(tr) < 0)
- return -ENODEV;
-
- ret = tracing_open_generic(inode, filp);
- if (ret < 0)
- trace_array_put(tr);
- return ret;
-}
-
-static int tracing_release_generic_file(struct inode *inode, struct file *filp)
-{
- struct ftrace_event_file *file = inode->i_private;
- struct trace_array *tr = file->tr;
-
- trace_array_put(tr);
-
- return 0;
-}
-
-/*
* __ftrace_set_clr_event(NULL, NULL, NULL, set) will set/unset all events.
*/
static int
@@ -675,19 +646,51 @@ static void t_stop(struct seq_file *m, void *p)
mutex_unlock(&event_mutex);
}
+static DECLARE_RWSEM(file_sem);
+
+static void *get_i_private(struct file *filp)
+{
+ void *data;
+ down_read(&file_sem);
+ data = file_inode(filp)->i_private;
+ if (!data)
+ up_read(&file_sem);
+ return data;
+}
+
+static void put_i_private(struct file *filp)
+{
+ WARN_ON(!file_inode(filp)->i_private);
+ up_read(&file_sem);
+}
+
+static void clear_i_private(struct dentry *dir)
+{
+ struct dentry *child;
+ list_for_each_entry(child, &dir->d_subdirs, d_u.d_child)
+ child->d_inode->i_private = NULL;
+}
+
static ssize_t
event_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
loff_t *ppos)
{
- struct ftrace_event_file *file = filp->private_data;
+ struct ftrace_event_file *file;
+ unsigned long flags;
char buf[4] = "0";
- if (file->flags & FTRACE_EVENT_FL_ENABLED &&
- !(file->flags & FTRACE_EVENT_FL_SOFT_DISABLED))
+ file = get_i_private(filp);
+ if (!file)
+ return -ENODEV;
+ flags = file->flags;
+ put_i_private(filp);
+
+ if (flags & FTRACE_EVENT_FL_ENABLED &&
+ !(flags & FTRACE_EVENT_FL_SOFT_DISABLED))
strcpy(buf, "1");
- if (file->flags & FTRACE_EVENT_FL_SOFT_DISABLED ||
- file->flags & FTRACE_EVENT_FL_SOFT_MODE)
+ if (flags & FTRACE_EVENT_FL_SOFT_DISABLED ||
+ flags & FTRACE_EVENT_FL_SOFT_MODE)
strcat(buf, "*");
strcat(buf, "\n");
@@ -699,13 +702,10 @@ static ssize_t
event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
loff_t *ppos)
{
- struct ftrace_event_file *file = filp->private_data;
+ struct ftrace_event_file *file;
unsigned long val;
int ret;
- if (!file)
- return -EINVAL;
-
ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
if (ret)
return ret;
@@ -717,9 +717,18 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
switch (val) {
case 0:
case 1:
- mutex_lock(&event_mutex);
- ret = ftrace_event_enable_disable(file, val);
- mutex_unlock(&event_mutex);
+ file = get_i_private(filp);
+ if (!file)
+ return -ENODEV;
+
+ ret = trace_array_get(file->tr);
+ if (!ret) {
+ mutex_lock(&event_mutex);
+ ret = ftrace_event_enable_disable(file, val);
+ mutex_unlock(&event_mutex);
+ trace_array_put(file->tr);
+ }
+ put_i_private(filp);
break;
default:
@@ -1249,10 +1258,8 @@ static const struct file_operations ftrace_set_event_fops = {
};
static const struct file_operations ftrace_enable_fops = {
- .open = tracing_open_generic_file,
.read = event_enable_read,
.write = event_enable_write,
- .release = tracing_release_generic_file,
.llseek = default_llseek,
};
@@ -1545,6 +1552,7 @@ static void remove_event_from_tracers(struct ftrace_event_call *call)
continue;
list_del(&file->list);
+ clear_i_private(file->dir);
debugfs_remove_recursive(file->dir);
remove_subsystem(file->system);
kmem_cache_free(file_cachep, file);
@@ -1703,6 +1711,7 @@ static void __trace_remove_event_call(struct ftrace_event_call *call)
/* Remove an event_call */
void trace_remove_event_call(struct ftrace_event_call *call)
{
+ down_write(&file_sem);
mutex_lock(&trace_types_lock);
mutex_lock(&event_mutex);
down_write(&trace_event_sem);
@@ -1710,6 +1719,7 @@ void trace_remove_event_call(struct ftrace_event_call *call)
up_write(&trace_event_sem);
mutex_unlock(&event_mutex);
mutex_unlock(&trace_types_lock);
+ up_write(&file_sem);
}
#define for_each_event(event, start, end) \
On 07/15, Oleg Nesterov wrote:
>
> So. As Masami pointed out, this is not enough. Probably we can add more
> hacks, but I'd like to discuss the alternative approach.
>
> Note also that this ref count has the unfortunate property, if someone
> keeps the file opened we can't remove an event.
And please correct me, but afaics there are similar problems with
rmdir instances/xxx.
> What this patch does:
>
> - add the new "topmost" rw_semaphore, file_sem.
probably unneeded...
> - trace_remove_event_call() takes this sem for writing and
> cleares enable/id/format/filter->i_private
>
> - removes tracing_open_generic_file/tracing_release_generic_file,
> we do not use file->f_private any longer
>
> - changes event_enable_read/event_enable_write to read
> ftrace_event_file = i_private under read_lock(file_sem) and
> abort if it is null.
>
> Question: why event_enable_write() needs trace_array_get() ?
probably it doesn't...
> Steven, Masami, do you think this can make any sense?
Oleg.
On Tue, 2013-07-16 at 18:38 +0200, Oleg Nesterov wrote:
> On 07/15, Oleg Nesterov wrote:
> >
> > So. As Masami pointed out, this is not enough. Probably we can add more
> > hacks, but I'd like to discuss the alternative approach.
> >
> > Note also that this ref count has the unfortunate property, if someone
> > keeps the file opened we can't remove an event.
>
> And please correct me, but afaics there are similar problems with
> rmdir instances/xxx.
The instances have their own refcount. And if a file is open, it wont
remove the directory. Just try it...
# cd /sys/kernel/debug/tracing/instances
# mkdir foo
# sleep 10 < foo/events/ext3/enable &
[1] 2087
# rmdir foo
rmdir: failed to remove `foo': Device or resource busy
#
[1]+ Done sleep 10 < foo/events/ext3/enable
# rmdir foo
#
>
> > What this patch does:
> >
> > - add the new "topmost" rw_semaphore, file_sem.
>
> probably unneeded...
>
> > - trace_remove_event_call() takes this sem for writing and
> > cleares enable/id/format/filter->i_private
> >
> > - removes tracing_open_generic_file/tracing_release_generic_file,
> > we do not use file->f_private any longer
> >
> > - changes event_enable_read/event_enable_write to read
> > ftrace_event_file = i_private under read_lock(file_sem) and
> > abort if it is null.
> >
> > Question: why event_enable_write() needs trace_array_get() ?
>
> probably it doesn't...
I'm confused. Which code has event_enable_write doing a
trace_array_get()?
>
> > Steven, Masami, do you think this can make any sense?
I need to sort these patches out. They are all over the place. What
exactly do we have for a solution here. Did we figure out which patches
are needed?
I'll have to go back and re-read the entire thread. I've been all over
the place lately with different topics, and don't remember all that has
been said here.
-- Steve
On 07/16, Steven Rostedt wrote:
>
> On Tue, 2013-07-16 at 18:38 +0200, Oleg Nesterov wrote:
> > On 07/15, Oleg Nesterov wrote:
> > >
> > > So. As Masami pointed out, this is not enough. Probably we can add more
> > > hacks, but I'd like to discuss the alternative approach.
> > >
> > > Note also that this ref count has the unfortunate property, if someone
> > > keeps the file opened we can't remove an event.
> >
> > And please correct me, but afaics there are similar problems with
> > rmdir instances/xxx.
>
> The instances have their own refcount. And if a file is open, it wont
> remove the directory. Just try it...
I guess you mean "if (tr->ref)" check.
But tracing_open_generic_file()->trace_array_get() is racy, no?
Somehow we need to ensure it is safe to use file / file->tr.
> > > Question: why event_enable_write() needs trace_array_get() ?
> >
> > probably it doesn't...
>
> I'm confused. Which code has event_enable_write doing a
> trace_array_get()?
I meant tracing_open_generic_file(), sorry. I _think_ it can die,
please see RFC I sent.
But let me repeat once again, I do not pretend I understand this
code ;)
> I need to sort these patches out. They are all over the place. What
> exactly do we have for a solution here. Did we figure out which patches
> are needed?
Well, see above. But I think:
1. We should close open/delete races
2. We still need probe_remove_event_call() which return -EBUSY
if perf_refcount || FTRACE_EVENT_FL_ENABLED
2/4 in your series but without TRACE_EVENT_FL_REF_MASK check
3. We need the changes in trace_kprobes.c (and uprobes but lets
ignore it for now). The patch from you (3/4) and another one
from Masami (Wait for disabling all running kprobe handlers).
However, I think it would be better to discuss 3. later.
Btw, Steven, what about other pending (and orthogonal changes) ?
>From Jovi, from me. I see nothing new in linux-trace.git#for-next...
Oleg.
On Tue, 2013-07-16 at 21:22 +0200, Oleg Nesterov wrote:
> But tracing_open_generic_file()->trace_array_get() is racy, no?
> Somehow we need to ensure it is safe to use file / file->tr.
Ah crap. Yeah, I see what you mean. It's the race that I described with
deleting private data and using it on open. :-p
> Btw, Steven, what about other pending (and orthogonal changes) ?
> >From Jovi, from me. I see nothing new in linux-trace.git#for-next...
My INBOX is full with too much crap. I thought I grabbed the patches
from you and Jovi, but they may have been something else.
Can you just tell me what the subjects were. My INBOX currently has
>10,000 emails, and it's just getting worse.
-- Steve
On Tue, 2013-07-16 at 15:38 -0400, Steven Rostedt wrote:
>
> > Btw, Steven, what about other pending (and orthogonal changes) ?
> > >From Jovi, from me. I see nothing new in linux-trace.git#for-next...
>
> My INBOX is full with too much crap. I thought I grabbed the patches
> from you and Jovi, but they may have been something else.
>
> Can you just tell me what the subjects were. My INBOX currently has
> >10,000 emails, and it's just getting worse.
Ping?
-- Steve
On 07/17, Steven Rostedt wrote:
>
> On Tue, 2013-07-16 at 15:38 -0400, Steven Rostedt wrote:
>
> > > Btw, Steven, what about other pending (and orthogonal changes) ?
> > > >From Jovi, from me. I see nothing new in linux-trace.git#for-next...
> >
> > My INBOX is full with too much crap. I thought I grabbed the patches
> > from you and Jovi, but they may have been something else.
> >
> > Can you just tell me what the subjects were. My INBOX currently has
> > >10,000 emails, and it's just getting worse.
Please see the attached mbox. It contains 2 series with
perf_trace_buf/perf_xxx hacks.
1-3 looks very simple and straightforward.
4-6 reimplement TP_perf_assign(), we already discussed this (and you even
did the performance testing), but I am still not sure if you agree with
this hack or not.
And 2 patches from Jovi, acked by Masami and me:
[PATCH 1/2 v6] tracing/uprobes: Support ftrace_event_file base multibuffer
[PATCH 2/2 v6] tracing/uprobes: Support soft-mode disabling
Oleg.