2006-08-08 15:51:19

by Kirill Korotaev

[permalink] [raw]
Subject: [PATCH] sys_getppid oopses on debug kernel (v2)

sys_getppid() optimization can access a freed memory.
On kernels with DEBUG_SLAB turned ON, this results in Oops.
As Dave Hansen noted, this optimization is also unsafe
for memory hotplug.

So this patch always takes the lock to be safe.

Signed-Off-By: Kirill Korotaev <[email protected]>


--- ./kernel/timer.c.ppiddbg 2006-07-14 19:11:06.000000000 +0400
+++ ./kernel/timer.c 2006-08-08 19:45:57.000000000 +0400
@@ -1342,28 +1342,11 @@ asmlinkage long sys_getpid(void)
asmlinkage long sys_getppid(void)
{
int pid;
- struct task_struct *me = current;
- struct task_struct *parent;

- parent = me->group_leader->real_parent;
- for (;;) {
- pid = parent->tgid;
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
-{
- struct task_struct *old = parent;
+ read_lock(&tasklist_lock);
+ pid = current->group_leader->real_parent->tgid;
+ read_unlock(&tasklist_lock);

- /*
- * Make sure we read the pid before re-reading the
- * parent pointer:
- */
- smp_rmb();
- parent = me->group_leader->real_parent;
- if (old != parent)
- continue;
-}
-#endif
- break;
- }
return pid;
}


2006-08-08 16:09:50

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] sys_getppid oopses on debug kernel (v2)

On Tue, 2006-08-08 at 19:50 +0400, Kirill Korotaev wrote:
> sys_getppid() optimization can access a freed memory.
> On kernels with DEBUG_SLAB turned ON, this results in Oops.
> As Dave Hansen noted, this optimization is also unsafe
> for memory hotplug.

Yeah, I just tried coding up something to do a seqlock to note when
tasks are freed, but it doesn't work. Unless somebody is really going
crazy with getppid() on a very large system, this should be just fine.

-- Dave


2006-08-09 10:14:39

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] sys_getppid oopses on debug kernel (v2)

Andrew Morton wrote:
>
> Although I'm not sure it's needed for this problem. A getppid() which does
>
> asmlinkage long sys_getppid(void)
> {
> int pid;
>
> read_lock(&tasklist_lock);
> pid = current->group_leader->real_parent->tgid;
> read_unlock(&tasklist_lock);
>
> return pid;
> }
>
> seems like a fine implementation to me ;)

Why do we need to use ->group_leader? All threads should have the same
->real_parent.

Why do we need tasklist_lock? I think rcu_read_lock() is enough.

In other words, do you see any problems with this code

smlinkage long sys_getppid(void)
{
int pid;

rcu_read_lock();
pid = rcu_dereference(current->real_parent)->tgid;
rcu_read_unlock();

return pid;
}

? Yes, we may read a stale value for ->real_parent, but the memory
can't be freed while we are under rcu_read_lock(). And in this case
the returned value is ok because the task could be reparented just
after return anyway.

Oleg.

2006-08-09 12:06:59

by Kirill Korotaev

[permalink] [raw]
Subject: Re: [PATCH] sys_getppid oopses on debug kernel (v2)

>>Although I'm not sure it's needed for this problem. A getppid() which does
>>
>>asmlinkage long sys_getppid(void)
>>{
>> int pid;
>>
>> read_lock(&tasklist_lock);
>> pid = current->group_leader->real_parent->tgid;
>> read_unlock(&tasklist_lock);
>>
>> return pid;
>>}
>>
>>seems like a fine implementation to me ;)
>
>
> Why do we need to use ->group_leader? All threads should have the same
> ->real_parent.
I'm not sure this is true for old LinuxThreads...

> Why do we need tasklist_lock? I think rcu_read_lock() is enough.
>
> In other words, do you see any problems with this code
>
> smlinkage long sys_getppid(void)
> {
> int pid;
>
> rcu_read_lock();
> pid = rcu_dereference(current->real_parent)->tgid;
> rcu_read_unlock();
>
> return pid;
> }
>
> ? Yes, we may read a stale value for ->real_parent, but the memory
> can't be freed while we are under rcu_read_lock(). And in this case
> the returned value is ok because the task could be reparented just
> after return anyway.
Your patch doesn't cure the problem.
rcu_read_lock just disables preemtion and rcu_dereference
introduces memory barrier. _None_ of this _prevents_
another CPU from freeing old real_parent in parallel with your dereference.

You can minimize the probability very much by making local_irq_disable()/enable()
around the code in question, but still it won't be a real fix (at least due to NMIs).

Kirill

2006-08-09 12:31:07

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] sys_getppid oopses on debug kernel (v2)

On 08/09, Kirill Korotaev wrote:
>
> >Why do we need to use ->group_leader? All threads should have the same
> >->real_parent.
> I'm not sure this is true for old LinuxThreads...

Yes, from the user-space pov it may be not true, but ->group_leader->real_parent
should be equal ->real_parent anyway.

> >Why do we need tasklist_lock? I think rcu_read_lock() is enough.
> >
> >In other words, do you see any problems with this code
> >
> > smlinkage long sys_getppid(void)
> > {
> > int pid;
> >
> > rcu_read_lock();
> > pid = rcu_dereference(current->real_parent)->tgid;
> > rcu_read_unlock();
> >
> > return pid;
> > }
> >
> >? Yes, we may read a stale value for ->real_parent, but the memory
> >can't be freed while we are under rcu_read_lock(). And in this case
> >the returned value is ok because the task could be reparented just
> >after return anyway.
> Your patch doesn't cure the problem.
> rcu_read_lock just disables preemtion and rcu_dereference
> introduces memory barrier. _None_ of this _prevents_
> another CPU from freeing old real_parent in parallel with your dereference.

How so? Note that release_task() doesn't call put_task_struct(), it does
call_rcu(&p->rcu, delayed_put_task_struct) instead. When delayed_put_task_struct()
is called, all CPUs must see the new value of ->real_parent (otherwise
RCU is just broken). If CPU sees the old value of ->real_parent, rcu_read_lock()
protects us from delayed_put_task_struct() on another CPU.

Ok, I think this is the same "classic" pattern as:

old = global_ptr;
global_ptr = new;
call_rcu(..free_old...);
vs
rcu_read_lock();
use(global_ptr);
rcu_read_unlock();

Do you agree?

Oleg.

2006-08-09 13:01:26

by Kirill Korotaev

[permalink] [raw]
Subject: Re: [PATCH] sys_getppid oopses on debug kernel (v2)

>>Your patch doesn't cure the problem.
>>rcu_read_lock just disables preemtion and rcu_dereference
>>introduces memory barrier. _None_ of this _prevents_
>>another CPU from freeing old real_parent in parallel with your dereference.
>
>
> How so? Note that release_task() doesn't call put_task_struct(), it does
> call_rcu(&p->rcu, delayed_put_task_struct) instead. When delayed_put_task_struct()
> is called, all CPUs must see the new value of ->real_parent (otherwise
> RCU is just broken). If CPU sees the old value of ->real_parent, rcu_read_lock()
> protects us from delayed_put_task_struct() on another CPU.
>
> Ok, I think this is the same "classic" pattern as:
>
> old = global_ptr;
> global_ptr = new;
> call_rcu(..free_old...);
> vs
> rcu_read_lock();
> use(global_ptr);
> rcu_read_unlock();
>
> Do you agree?
Agree :)
Sorry, I didn't notice that task structs are freed now with RCU :)))
In this case your patch really helps the problem. Care to prepare it?

Thanks,
Kirill

2006-08-09 14:00:38

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] sys_getppid-oopses-on-debug-kernel-v2-simplify

On top of Kirill's sys_getppid-oopses-on-debug-kernel-v2.patch

- We don't need ->group_leader->real_parent, all threads should
have the same ->real_parent.

- We don't need tasklist_lock, task_struct is freed by RCU, so
rcu_read_lock() should be enough.

(Compile tested)

Signed-off-by: Oleg Nesterov <[email protected]>

--- 2.6.18-rc3/kernel/timer.c~ 2006-08-09 22:08:51.000000000 +0400
+++ 2.6.18-rc3/kernel/timer.c 2006-08-09 22:15:35.000000000 +0400
@@ -1324,28 +1324,18 @@ asmlinkage long sys_getpid(void)
}

/*
- * Accessing ->group_leader->real_parent is not SMP-safe, it could
- * change from under us. However, rather than getting any lock
- * we can use an optimistic algorithm: get the parent
- * pid, and go back and check that the parent is still
- * the same. If it has changed (which is extremely unlikely
- * indeed), we just try again..
- *
- * NOTE! This depends on the fact that even if we _do_
- * get an old value of "parent", we can happily dereference
- * the pointer (it was and remains a dereferencable kernel pointer
- * no matter what): we just can't necessarily trust the result
- * until we know that the parent pointer is valid.
- *
- * NOTE2: ->group_leader never changes from under us.
+ * Accessing ->real_parent is not SMP-safe, it could
+ * change from under us. However, we can use a stale
+ * value of ->real_parent under rcu_read_lock(), see
+ * release_task()->call_rcu(delayed_put_task_struct).
*/
asmlinkage long sys_getppid(void)
{
int pid;

- read_lock(&tasklist_lock);
- pid = current->group_leader->real_parent->tgid;
- read_unlock(&tasklist_lock);
+ rcu_read_lock();
+ pid = rcu_dereference(current->real_parent)->tgid;
+ rcu_read_unlock();

return pid;
}