It seems that alloc_retstack_tasklist() can also take a lockless
approach for scanning the tasklist, instead of using the big global
tasklist_lock. For this we also kill another deprecated and RCU-unsafe
tsk->thread_group user, replacing it with for_each_process_thread(),
maintaining the same semantics.
Here tasklist_lock does not protect anything other than the list
against concurrent fork/exit. And considering that the whole thing
is capped by FTRACE_RETSTACK_ALLOC_SIZE (32), it should not be a
problem to have a potentially stale, yet stable, list. The task cannot
go away either, so we don't risk racing with ftrace_graph_exit_task()
which clears the retstack.
The tsk->ret_stack management is not protected by tasklist_lock, being
serialized with the corresponding publish/subscribe barriers against
concurrent ftrace_push_return_trace(). In addition, this plays nicer
with cachelines by avoiding two atomic ops in the uncontended case.
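For reference, this is the pairing being relied upon, in rough sketch
form (abridged; the reader side lives in ftrace_push_return_trace()):

	/* writer, alloc_retstack_tasklist(), abridged: */
	if (t->ret_stack == NULL) {
		/* ... initialize the per-task state ... */
		/* make the initialized state visible before the pointer: */
		smp_wmb();
		t->ret_stack = ret_stack_list[start++];
	}

	/* reader, ftrace_push_return_trace(), abridged: */
	if (!current->ret_stack)
		return -EBUSY;
	/* pairs with the smp_wmb() above */
	smp_rmb();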
Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/trace/fgraph.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 1af321dec0f1..5658f13037b3 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -387,8 +387,8 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
}
}
- read_lock(&tasklist_lock);
- do_each_thread(g, t) {
+ rcu_read_lock();
+ for_each_process_thread(g, t) {
if (start == end) {
ret = -EAGAIN;
goto unlock;
@@ -403,10 +403,10 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
smp_wmb();
t->ret_stack = ret_stack_list[start++];
}
- } while_each_thread(g, t);
+ }
unlock:
- read_unlock(&tasklist_lock);
+ rcu_read_unlock();
free:
for (i = start; i < end; i++)
kfree(ret_stack_list[i]);
--
2.26.2
On 09/06, Davidlohr Bueso wrote:
>
> Here tasklist_lock does not protect anything other than the list
> against concurrent fork/exit. And considering that the whole thing
> is capped by FTRACE_RETSTACK_ALLOC_SIZE (32), it should not be a
> problem to have a potentially stale, yet stable, list. The task cannot
> go away either, so we don't risk racing with ftrace_graph_exit_task()
> which clears the retstack.
I don't understand this code but I think you are right, tasklist_lock
buys nothing.
Afaics, with or without this change alloc_retstack_tasklist() can race
with copy_process() and miss the new child; ftrace_graph_init_task()
can't help, ftrace_graph_active can be set right after the check and
for_each_process_thread() can't see the new process yet.
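To spell out the window I mean (a sketch of the interleaving):

	forking task				register_ftrace_graph()

	ftrace_graph_init_task():
	  ftrace_graph_active == 0,
	  so no ret_stack is allocated
						ftrace_graph_active++;
						alloc_retstack_tasklist():
						  for_each_process_thread(g, t)
						    /* new child not visible yet */

	/* child runs with ret_stack == NULL and is never traced */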
This can't race with ftrace_graph_exit_task(), it is called after the
full gp pass. But this function looks very confusing to me, I don't
understand the barrier and the "NULL must become visible to IRQs before
we free it" comment.
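For reference, the function currently reads (roughly):

	void ftrace_graph_exit_task(struct task_struct *t)
	{
		struct ftrace_ret_stack	*ret_stack = t->ret_stack;

		t->ret_stack = NULL;
		/* NULL must become visible to IRQs before we free it: */
		barrier();

		kfree(ret_stack);
	}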
Looks like, ftrace_graph_exit_task() was called by the exiting task
in the past? Indeed, see 65afa5e603d50 ("tracing/function-return-tracer:
free the return stack on free_task()"). I think it makes sense to
simplify this function now, it can simply do kfree(t->ret_stack) and
nothing more.
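IOW, something like this (untested):

	void ftrace_graph_exit_task(struct task_struct *t)
	{
		kfree(t->ret_stack);
	}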
ACK, but ...
> @@ -387,8 +387,8 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
> }
> }
>
> - read_lock(&tasklist_lock);
then you should probably rename alloc_retstack_tasklist() ?
Oleg.
[ Back from my PTO and still digging out emails ]
On Mon, 7 Sep 2020 13:43:02 +0200
Oleg Nesterov <[email protected]> wrote:
> On 09/06, Davidlohr Bueso wrote:
> >
> > Here tasklist_lock does not protect anything other than the list
> > against concurrent fork/exit. And considering that the whole thing
> > is capped by FTRACE_RETSTACK_ALLOC_SIZE (32), it should not be a
> > problem to have a potentially stale, yet stable, list. The task cannot
> > go away either, so we don't risk racing with ftrace_graph_exit_task()
> > which clears the retstack.
>
> I don't understand this code but I think you are right, tasklist_lock buys
> nothing.
When I first wrote this code, I didn't want to take tasklist_lock, but
there were questions about whether rcu_read_lock() was enough. And since this code is
far from a fast path, I decided it was better to be safe than sorry, and
took the tasklist_lock as a paranoid measure.
>
> Afaics, with or without this change alloc_retstack_tasklist() can race
> with copy_process() and miss the new child; ftrace_graph_init_task()
> can't help, ftrace_graph_active can be set right after the check and
> for_each_process_thread() can't see the new process yet.
There's a call in copy_process(): ftrace_graph_init_task() that initializes
a new task's ret_stack, and this loop will ignore it because it first checks
to see if the task has a ret_stack before adding one to it. And the child
gets one before being added to the list.
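That is, roughly (eliding some of the field initialization):

	/* called from copy_process(), before the child is on the task list */
	void ftrace_graph_init_task(struct task_struct *t)
	{
		/* Make sure we do not use the parent ret_stack */
		t->ret_stack = NULL;

		if (ftrace_graph_active) {
			struct ftrace_ret_stack *ret_stack;

			ret_stack = kmalloc_array(FTRACE_RETFUNC_DEPTH,
						  sizeof(struct ftrace_ret_stack),
						  GFP_KERNEL);
			if (!ret_stack)
				return;
			graph_init_task(t, ret_stack);
		}
	}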
>
> This can't race with ftrace_graph_exit_task(), it is called after the
> full gp pass. But this function looks very confusing to me, I don't
> understand the barrier and the "NULL must become visible to IRQs before
> we free it" comment.
Probably not needed, but again, being very paranoid, so as not to crash
anything. If this is called on a task that is running, and an interrupt
comes in after it is freed, but before the ret_stack variable is set to
NULL, then it will try to use it. I don't think this is possible, but it
may have been in the past.
>
> Looks like, ftrace_graph_exit_task() was called by the exiting task
> in the past? Indeed, see 65afa5e603d50 ("tracing/function-return-tracer:
> free the return stack on free_task()"). I think it makes sense to
> simplify this function now, it can simply do kfree(t->ret_stack) and
> nothing more.
Ah, yeah, then you are right. If it can't be called on a running task then
it can be simplified. Probably need a:
WARN_ON_ONCE(t->on_rq);
just to make sure this never happens.
>
> ACK, but ...
>
> > @@ -387,8 +387,8 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
> > }
> > }
> >
> > - read_lock(&tasklist_lock);
>
> then you should probably rename alloc_retstack_tasklist() ?
>
tasklist, process thread? Is there a difference?
Thanks for reviewing this!
-- Steve
On 09/18, Steven Rostedt wrote:
>
> On Mon, 7 Sep 2020 13:43:02 +0200
> Oleg Nesterov <[email protected]> wrote:
>
> > Afaics, with or without this change alloc_retstack_tasklist() can race
> > with copy_process() and miss the new child; ftrace_graph_init_task()
> > can't help, ftrace_graph_active can be set right after the check and
> > for_each_process_thread() can't see the new process yet.
>
> There's a call in copy_process(): ftrace_graph_init_task() that initializes
> a new task's ret_stack,
Only if ftrace_graph_active != 0.
register_ftrace_graph() can increment ftrace_graph_active and call
alloc_retstack_tasklist() right after ftrace_graph_init_task() checks
ftrace_graph_active.
> and this loop will ignore it
and this loop won't see it unless the forking process finishes copy_process()
and does list_add_tail_rcu(&p->tasks, &init_task.tasks) which makes it
visible to for_each_process(). Yes, this is very unlikely.
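That is, the relevant ordering in copy_process() is (abridged):

	ftrace_graph_init_task(p);	/* checks ftrace_graph_active */
	/* ... lots of other setup ... */
	list_add_tail_rcu(&p->tasks, &init_task.tasks);	/* p becomes visible */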
> > Looks like, ftrace_graph_exit_task() was called by the exiting task
> > in the past? Indeed, see 65afa5e603d50 ("tracing/function-return-tracer:
> > free the return stack on free_task()"). I think it makes sense to
> > simplify this function now, it can simply do kfree(t->ret_stack) and
> > nothing more.
>
> Ah, yeah, then you are right. If it can't be called on a running task then
> it can be simplified. Probably need a:
>
> WARN_ON_ONCE(t->on_rq);
>
> just to make sure this never happens.
Well, ftrace_graph_exit_task(t) is called by free_task(t), right before
kmem_cache_free(t).
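That is (kernel/fork.c, abridged):

	void free_task(struct task_struct *tsk)
	{
		/* ... */
		ftrace_graph_exit_task(tsk);
		/* ... */
		free_task_struct(tsk);	/* kmem_cache_free() */
	}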
> > ACK, but ...
> >
> > > @@ -387,8 +387,8 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
> > > }
> > > }
> > >
> > > - read_lock(&tasklist_lock);
> >
> > then you should probably rename alloc_retstack_tasklist() ?
> >
>
> tasklist, process thread? Is there a difference?
Aah, please ignore. Somehow I misinterpreted the _tasklist suffix, as if it
refers to tasklist_lock.
Oleg.
On Sat, 19 Sep 2020 13:24:39 +0200
Oleg Nesterov <[email protected]> wrote:
> On 09/18, Steven Rostedt wrote:
> >
> > On Mon, 7 Sep 2020 13:43:02 +0200
> > Oleg Nesterov <[email protected]> wrote:
> >
> > > Afaics, with or without this change alloc_retstack_tasklist() can race
> > > with copy_process() and miss the new child; ftrace_graph_init_task()
> > > can't help, ftrace_graph_active can be set right after the check and
> > > for_each_process_thread() can't see the new process yet.
> >
> > There's a call in copy_process(): ftrace_graph_init_task() that initializes
> > a new tasks ret_stack,
>
> Only if ftrace_graph_active != 0.
>
> register_ftrace_graph() can increment ftrace_graph_active and call
> alloc_retstack_tasklist() right after ftrace_graph_init_task() checks
> ftrace_graph_active.
>
> > and this loop will ignore it
>
> and this loop won't see it unless the forking process finishes copy_process()
> and does list_add_tail_rcu(&p->tasks, &init_task.tasks) which makes it
> visible to for_each_process(). Yes, this is very unlikely.
Ah, I see what you mean. Hmm, not sure of the best way to fix this. It
would be very rare to trigger, and the only thing it would do is
fail to trace the new task. But that could be frustrating if it
happens. I guess we could put the hook after the child gets added to
the list, and use a cmpxchg to update the ret_stack, to make sure we
don't race between initialization and copy_process().
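Something like this (completely untested, just to show the idea):

	/* in alloc_retstack_tasklist(), instead of the plain store: */
	if (t->ret_stack == NULL) {
		/* ... initialize the per-task state ... */
		smp_wmb();
		/* don't overwrite a stack the fork path just installed */
		if (cmpxchg(&t->ret_stack, NULL, ret_stack_list[start]) == NULL)
			start++;
	}

with the fork side doing the same cmpxchg dance after the child is on
the list.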
-- Steve