On 11/14/23 11:32 AM, Oleg Nesterov wrote:
> Lockless use of next_thread() should be avoided, kernel/bpf/task_iter.c
> is the last user and the usage is wrong.
>
> task_group_seq_get_next() can return the group leader twice if it races
> with mt-thread exec which changes the group->leader's pid.
>
> Change the main loop to use __next_thread(), kill "next_tid == common->pid"
> check.
>
> __next_thread() can't loop forever, we can also change this code to retry
> if next_tid == 0.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/bpf/task_iter.c | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 26082b97894d..51ae15e2b290 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -70,15 +70,13 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
> return NULL;
>
> retry:
> - task = next_thread(task);
> + task = __next_thread(task);
> + if (!task)
> + return NULL;
>
> next_tid = __task_pid_nr_ns(task, PIDTYPE_PID, common->ns);
> - if (!next_tid || next_tid == common->pid) {
> - /* Run out of tasks of a process. The tasks of a
> - * thread_group are linked as circular linked list.
> - */
> - return NULL;
> - }
> + if (!next_tid)
> + goto retry;

Look at the code. Looks like next_tid should never be 0 unless some
task is migrated to another namespace, which I think is not possible.
common->ns is assigned as follows:

	common->ns = get_pid_ns(task_active_pid_ns(current));

so we are searching for tasks in the *current* namespace.
Look at:

pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns)
{
	struct upid *upid;
	pid_t nr = 0;

	if (pid && ns->level <= pid->level) {
		upid = &pid->numbers[ns->level];
		if (upid->ns == ns)
			nr = upid->nr;
	}
	return nr;
}

pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
			struct pid_namespace *ns)
{
	pid_t nr = 0;

	rcu_read_lock();
	if (!ns)
		ns = task_active_pid_ns(current);
	nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
	rcu_read_unlock();

	return nr;
}

In pid_nr_ns(), ns->level should equal pid->level if the pid belongs to
the input namespace 'ns', and in that case the returned 'nr' should be
non-zero.
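The following userspace sketch copies the level check in pid_nr_ns() onto simplified stand-ins for struct pid, struct upid and struct pid_namespace. It is a hypothetical model, not kernel code. It keeps only the fields pid_nr_ns() reads, and it assumes a fixed-size numbers[] array and plain int pid numbers. A pid that is visible in 'ns' yields a non-zero number. A pid from an unrelated namespace, or a NULL pid, yields 0.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified userspace stand-ins for the kernel structures, keeping only the
 * fields pid_nr_ns() reads. The fixed-size numbers[] array (levels 0..3) and
 * the int pid numbers are assumptions of this sketch.
 */
struct pid_namespace {
	unsigned int level;		/* 0 for the initial namespace */
};

struct upid {
	int nr;				/* pid number as seen in ns */
	struct pid_namespace *ns;
};

struct pid {
	unsigned int level;		/* level of the namespace it was allocated in */
	struct upid numbers[4];		/* numbers[i] is the view from level i */
};

/* Same logic as pid_nr_ns(): 0 for a NULL pid or a pid not visible in ns. */
static int model_pid_nr_ns(const struct pid *pid, const struct pid_namespace *ns)
{
	int nr = 0;

	assert(ns != NULL);	/* the kernel version dereferences ns unconditionally */
	if (pid && ns->level <= pid->level) {
		const struct upid *upid = &pid->numbers[ns->level];

		if (upid->ns == ns)
			nr = upid->nr;
	}
	return nr;
}
```

Consider a root namespace with one child. A pid allocated in the child reports its child-level number when viewed from the child and its root-level number when viewed from the root. The NULL-pid case is the one this thread ends up turning on.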

If next_tid can indeed never be 0, could you remove the check below?

	if (!next_tid)
		goto retry;

Other than above, the change looks good to me.
>
> if (skip_if_dup_files && task->files == task->group_leader->files)
> goto retry;
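The patch relies on __next_thread() reporting the end of the thread group by returning NULL, whereas next_thread() wraps around. The sketch below is a hypothetical userspace model of that difference. It assumes that threads are linked through a thread_node list anchored at a separate head (signal->thread_head in the kernel), and that next_thread() falls back to the group leader when __next_thread() returns NULL. All model_* names and the list helpers are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: circular doubly linked list with a head that is not a task. */
struct list_node {
	struct list_node *next, *prev;
};

struct model_task {
	int pid;
	struct list_node thread_node;
};

/* container_of()-style conversion from the embedded node to its task */
#define node_to_task(n) \
	((struct model_task *)((char *)(n) - offsetof(struct model_task, thread_node)))

static void group_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void group_add_tail(struct list_node *head, struct model_task *t)
{
	t->thread_node.prev = head->prev;
	t->thread_node.next = head;
	head->prev->next = &t->thread_node;
	head->prev = &t->thread_node;
}

/* __next_thread()-style step: NULL once the walk reaches the list head. */
static struct model_task *model_next_thread_or_null(struct list_node *head,
						    struct model_task *t)
{
	struct list_node *n;

	assert(head && t);
	n = t->thread_node.next;
	return n == head ? NULL : node_to_task(n);
}

/*
 * next_thread()-style step: wraps around to the leader instead of ending,
 * so the caller needs its own stop condition (the old code compared the
 * next pid against common->pid).
 */
static struct model_task *model_next_thread(struct list_node *head,
					    struct model_task *leader,
					    struct model_task *t)
{
	struct model_task *next = model_next_thread_or_null(head, t);

	return next ? next : leader;
}

/* Count the threads after 'start'; terminates on NULL, no pid checks. */
static int count_threads_after(struct list_node *head, struct model_task *start)
{
	int n = 0;
	struct model_task *t = start;

	while ((t = model_next_thread_or_null(head, t)) != NULL)
		n++;
	return n;
}
```

The NULL-terminated walk needs no pid comparison, so the "next_tid == common->pid" stop condition goes away. So does its dependence on the leader's pid staying stable while the group is being walked.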
On 11/15, Yonghong Song wrote:
>
> On 11/14/23 11:32 AM, Oleg Nesterov wrote:
> >@@ -70,15 +70,13 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
> > return NULL;
> > retry:
> >- task = next_thread(task);
> >+ task = __next_thread(task);
> >+ if (!task)
> >+ return NULL;
> > next_tid = __task_pid_nr_ns(task, PIDTYPE_PID, common->ns);
> >- if (!next_tid || next_tid == common->pid) {
> >- /* Run out of tasks of a process. The tasks of a
> >- * thread_group are linked as circular linked list.
> >- */
> >- return NULL;
> >- }
> >+ if (!next_tid)
> >+ goto retry;
>
> Look at the code. Looks like next_tid should never be 0
...
> pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
>                       struct pid_namespace *ns)
> {
>       pid_t nr = 0;
>
>       rcu_read_lock();
>       if (!ns)
>               ns = task_active_pid_ns(current);
>       nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^

Please note that *task_pid_ptr(task, type) can be NULL if this
task has already exited and called detach_pid().

detach_pid() does __change_pid(task, type, NULL); please note the

	*pid_ptr = new;	// NULL in this case

assignment in __change_pid().

IOW, the problem is not that ns can change; the problem is that
task->thread_pid (and the other pid links) can be NULL, and in that
case pid_nr_ns() returns zero.

This code should be rewritten from scratch; it should not rely on
pid_nr. If nothing else, common->pid and/or pid_visiting could be
reused. But for now my only concern is next_thread().

> Other than above, the change looks good to me.
Thanks for review!
Oleg.
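Oleg's explanation implies that the lockless walk can still reach a thread after detach_pid() has cleared its pid link. For such a thread __task_pid_nr_ns() returns 0, which is why the patch retries instead of stopping. The sketch below models that loop in userspace. It assumes a plain NULL-terminated singly linked list in place of the thread list, and a NULL pid pointer in place of a detached pid. model_thread, model_pid, model_pid_nr() and next_live_thread() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pid: only the number matters here. */
struct model_pid {
	int nr;
};

struct model_thread {
	struct model_pid *pid;		/* NULL once the thread has been detached */
	struct model_thread *next;	/* NULL after the last thread */
};

/* Stand-in for __task_pid_nr_ns(): 0 when the pid link is gone. */
static int model_pid_nr(const struct model_thread *t)
{
	return t->pid ? t->pid->nr : 0;
}

/*
 * Return the next thread after 't' that still has a pid, or NULL at the end
 * of the group; *tid receives its pid number. Mirrors the structure of the
 * patched loop: __next_thread(), NULL check, then "if (!next_tid) goto retry".
 */
static struct model_thread *next_live_thread(struct model_thread *t, int *tid)
{
	int next_tid;

	assert(t != NULL && tid != NULL);
retry:
	t = t->next;			/* plays the role of __next_thread() */
	if (!t)
		return NULL;		/* end of the thread group */

	next_tid = model_pid_nr(t);
	if (!next_tid)
		goto retry;		/* detached (exiting) thread, skip it */

	*tid = next_tid;
	return t;
}
```

The retry always terminates because every step advances along the list and the walk ends at NULL. This matches the commit message's remark that __next_thread() cannot loop forever.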
On 11/16/23 4:34 AM, Oleg Nesterov wrote:
> On 11/15, Yonghong Song wrote:
>> On 11/14/23 11:32 AM, Oleg Nesterov wrote:
>>> @@ -70,15 +70,13 @@ static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_comm
>>> return NULL;
>>> retry:
>>> - task = next_thread(task);
>>> + task = __next_thread(task);
>>> + if (!task)
>>> + return NULL;
>>> next_tid = __task_pid_nr_ns(task, PIDTYPE_PID, common->ns);
>>> - if (!next_tid || next_tid == common->pid) {
>>> - /* Run out of tasks of a process. The tasks of a
>>> - * thread_group are linked as circular linked list.
>>> - */
>>> - return NULL;
>>> - }
>>> + if (!next_tid)
>>> + goto retry;
>> Look at the code. Looks like next_tid should never be 0
> ...
>
>> pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
>>                      struct pid_namespace *ns)
>> {
>>      pid_t nr = 0;
>>
>>      rcu_read_lock();
>>      if (!ns)
>>              ns = task_active_pid_ns(current);
>>      nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
>                                       ^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Please note that *task_pid_ptr(task, type) can be NULL if this
> task has already exited and called detach_pid().
>
> detach_pid() does __change_pid(task, type, NULL); please note the
>
> 	*pid_ptr = new;	// NULL in this case
>
> assignment in __change_pid().
>
> IOW, the problem is not that ns can change; the problem is that
> task->thread_pid (and the other pid links) can be NULL, and in that
> case pid_nr_ns() returns zero.

Thanks for the explanation. I certainly missed the race between the task
iterator and __change_pid(). With that, the patch looks good to me.
Acked-by: Yonghong Song <[email protected]>
>
>
> This code should be rewritten from scratch; it should not rely on
> pid_nr. If nothing else, common->pid and/or pid_visiting could be
> reused. But for now my only concern is next_thread().
>
>> Other than above, the change looks good to me.
> Thanks for review!
>
> Oleg.
>