2010-06-07 18:14:29

by Miles Lane

[permalink] [raw]
Subject: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!

Hi All,

I just reproduced a warning I reported quite a while ago. Is a patch
for this in the pipeline?

[    0.167267] [ INFO: suspicious rcu_dereference_check() usage. ]
[    0.167396] ---------------------------------------------------
[    0.167526] include/linux/cgroup.h:534 invoked
rcu_dereference_check() without protection!
[    0.167728]
[    0.167729] other info that might help us debug this:
[    0.167731]
[    0.168092]
[    0.168093] rcu_scheduler_active = 1, debug_locks = 1
[    0.168337] no locks held by watchdog/0/5.
[    0.168462]
[    0.168463] stack backtrace:
[    0.168704] Pid: 5, comm: watchdog/0 Not tainted 2.6.35-rc2-git1 #8
[    0.168834] Call Trace:
[    0.168965]  [<ffffffff81064e9c>] lockdep_rcu_dereference+0x9d/0xa5
[    0.169100]  [<ffffffff8102c1ce>] task_subsys_state+0x59/0x70
[    0.169232]  [<ffffffff8103189b>] __sched_setscheduler+0x19d/0x2f8
[    0.169365]  [<ffffffff8102a5ef>] ? need_resched+0x1e/0x28
[    0.169497]  [<ffffffff813c7d01>] ? schedule+0x586/0x619
[    0.169628]  [<ffffffff81081c33>] ? watchdog+0x0/0x8c
[    0.169758]  [<ffffffff81031a11>] sched_setscheduler+0xe/0x10
[    0.169889]  [<ffffffff81081c5d>] watchdog+0x2a/0x8c
[    0.170010]  [<ffffffff81081c33>] ? watchdog+0x0/0x8c
[    0.170141]  [<ffffffff81054a82>] kthread+0x89/0x91
[    0.170274]  [<ffffffff81003054>] kernel_thread_helper+0x4/0x10
[    0.170405]  [<ffffffff813ca480>] ? restore_args+0x0/0x30
[    0.170536]  [<ffffffff810549f9>] ? kthread+0x0/0x91
[    0.170667]  [<ffffffff81003050>] ? kernel_thread_helper+0x0/0x10
[    0.176751] lockdep: fixing up alternatives.


2010-06-08 00:19:38

by Paul E. McKenney

[permalink] [raw]
Subject: Re: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!

On Mon, Jun 07, 2010 at 02:14:25PM -0400, Miles Lane wrote:
> Hi All,
>
> I just reproduced a warning I reported quite a while ago. Is a patch
> for this in the pipeline?

I proposed a patch, thinking that it was a false positive. Peter Zijlstra
pointed out that there was a real race, and proposed an alternative patch,
which may be found at http://lkml.org/lkml/2010/4/22/603.

Could you please test Peter's patch and let us know if it cures the problem?

Thanx, Paul

> [    0.167267] [ INFO: suspicious rcu_dereference_check() usage. ]
> [    0.167396] ---------------------------------------------------
> [    0.167526] include/linux/cgroup.h:534 invoked
> rcu_dereference_check() without protection!
> [    0.167728]
> [    0.167729] other info that might help us debug this:
> [    0.167731]
> [    0.168092]
> [    0.168093] rcu_scheduler_active = 1, debug_locks = 1
> [    0.168337] no locks held by watchdog/0/5.
> [    0.168462]
> [    0.168463] stack backtrace:
> [    0.168704] Pid: 5, comm: watchdog/0 Not tainted 2.6.35-rc2-git1 #8
> [    0.168834] Call Trace:
> [    0.168965]  [<ffffffff81064e9c>] lockdep_rcu_dereference+0x9d/0xa5
> [    0.169100]  [<ffffffff8102c1ce>] task_subsys_state+0x59/0x70
> [    0.169232]  [<ffffffff8103189b>] __sched_setscheduler+0x19d/0x2f8
> [    0.169365]  [<ffffffff8102a5ef>] ? need_resched+0x1e/0x28
> [    0.169497]  [<ffffffff813c7d01>] ? schedule+0x586/0x619
> [    0.169628]  [<ffffffff81081c33>] ? watchdog+0x0/0x8c
> [    0.169758]  [<ffffffff81031a11>] sched_setscheduler+0xe/0x10
> [    0.169889]  [<ffffffff81081c5d>] watchdog+0x2a/0x8c
> [    0.170010]  [<ffffffff81081c33>] ? watchdog+0x0/0x8c
> [    0.170141]  [<ffffffff81054a82>] kthread+0x89/0x91
> [    0.170274]  [<ffffffff81003054>] kernel_thread_helper+0x4/0x10
> [    0.170405]  [<ffffffff813ca480>] ? restore_args+0x0/0x30
> [    0.170536]  [<ffffffff810549f9>] ? kthread+0x0/0x91
> [    0.170667]  [<ffffffff81003050>] ? kernel_thread_helper+0x0/0x10
> [    0.176751] lockdep: fixing up alternatives.

2010-06-08 04:16:20

by Miles Lane

[permalink] [raw]
Subject: Re: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!

On Mon, Jun 7, 2010 at 8:19 PM, Paul E. McKenney
<[email protected]> wrote:
> On Mon, Jun 07, 2010 at 02:14:25PM -0400, Miles Lane wrote:
>> Hi All,
>>
>> I just reproduced a warning I reported quite a while ago. Is a patch
>> for this in the pipeline?
>
> I proposed a patch, thinking that it was a false positive. Peter Zijlstra
> pointed out that there was a real race, and proposed an alternative patch,
> which may be found at http://lkml.org/lkml/2010/4/22/603.
>
> Could you please test Peter's patch and let us know if it cures the problem?
>
>                                                       Thanx, Paul
>
>> [trace snipped]
>

With the patch, I get:

[ 0.151274] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 0.151390] ---------------------------------------------------
[ 0.151520] include/linux/cgroup.h:534 invoked
rcu_dereference_check() without protection!
[ 0.151723]
[ 0.151724] other info that might help us debug this:
[ 0.151726]
[ 0.151999]
[ 0.151999] rcu_scheduler_active = 1, debug_locks = 1
[ 0.151999] 2 locks held by kthreadd/10:
[ 0.151999] #0: (key){......}, at: [<ffffffff81036578>] complete+0x1c/0x4e
[ 0.151999] #1: (&rq->lock){-.-...}, at: [<ffffffff81037875>]
select_task_rq_fair+0x21f/0x791
[ 0.151999]
[ 0.151999] stack backtrace:
[ 0.151999] Pid: 10, comm: kthreadd Not tainted 2.6.35-rc2-git1 #11
[ 0.151999] Call Trace:
[ 0.151999] [<ffffffff81070a45>] lockdep_rcu_dereference+0x9d/0xa5
[ 0.151999] [<ffffffff8103675e>] task_subsys_state+0x59/0x70
[ 0.151999] [<ffffffff8103799a>] select_task_rq_fair+0x344/0x791
[ 0.151999] [<ffffffff81037335>] ? task_rq_lock+0x68/0x9d
[ 0.151999] [<ffffffff811d62f3>] ? do_raw_spin_lock+0x79/0x13e
[ 0.151999] [<ffffffff81037335>] ? task_rq_lock+0x68/0x9d
[ 0.151999] [<ffffffff8103ac1e>] select_task_rq+0x13/0x44
[ 0.151999] [<ffffffff810417c3>] try_to_wake_up+0xf2/0x37d
[ 0.151999] [<ffffffff81041a5b>] default_wake_function+0xd/0xf
[ 0.151999] [<ffffffff81034272>] __wake_up_common+0x49/0x7f
[ 0.151999] [<ffffffff81036596>] complete+0x3a/0x4e
[ 0.151999] [<ffffffff8105b598>] ? worker_thread+0x0/0x3a7
[ 0.151999] [<ffffffff8105f7d0>] kthread+0x73/0x91
[ 0.151999] [<ffffffff8100aba4>] kernel_thread_helper+0x4/0x10
[ 0.151999] [<ffffffff813e3694>] ? restore_args+0x0/0x30
[ 0.151999] [<ffffffff8105f75d>] ? kthread+0x0/0x91
[ 0.151999] [<ffffffff8100aba0>] ? kernel_thread_helper+0x0/0x10

2010-06-08 09:02:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!

On Tue, 2010-06-08 at 00:16 -0400, Miles Lane wrote:
> On Mon, Jun 7, 2010 at 8:19 PM, Paul E. McKenney
> <[email protected]> wrote:
> > On Mon, Jun 07, 2010 at 02:14:25PM -0400, Miles Lane wrote:
> >> Hi All,
> >>
> >> I just reproduced a warning I reported quite a while ago. Is a patch
> >> for this in the pipeline?
> >
> > I proposed a patch, thinking that it was a false positive. Peter Zijlstra
> > pointed out that there was a real race, and proposed an alternative patch,
> > which may be found at http://lkml.org/lkml/2010/4/22/603.
> >
> > Could you please test Peter's patch and let us know if it cures the problem?
> >

Gah, this task_group() stuff is annoying, how about something like the
below which teaches task_group() about the task_rq()->lock rule?

---
include/linux/cgroup.h | 20 +++++++++++----
kernel/sched.c | 61 +++++++++++++++++++++++++----------------------
2 files changed, 46 insertions(+), 35 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 0c62160..1efd212 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -525,13 +525,21 @@ static inline struct cgroup_subsys_state *cgroup_subsys_state(
return cgrp->subsys[subsys_id];
}

-static inline struct cgroup_subsys_state *task_subsys_state(
- struct task_struct *task, int subsys_id)
+/*
+ * function to get the cgroup_subsys_state which allows for extra
+ * rcu_dereference_check() conditions, such as locks used during the
+ * cgroup_subsys::attach() methods.
+ */
+#define task_subsys_state_check(task, subsys_id, __c) \
+ rcu_dereference_check(task->cgroups->subsys[subsys_id], \
+ rcu_read_lock_held() || \
+ lockdep_is_held(&task->alloc_lock) || \
+ cgroup_lock_is_held() || (__c))
+
+static inline struct cgroup_subsys_state *
+task_subsys_state(struct task_struct *task, int subsys_id)
{
- return rcu_dereference_check(task->cgroups->subsys[subsys_id],
- rcu_read_lock_held() ||
- lockdep_is_held(&task->alloc_lock) ||
- cgroup_lock_is_held());
+ return task_subsys_state_check(task, subsys_id, false);
}

static inline struct cgroup* task_cgroup(struct task_struct *task,
diff --git a/kernel/sched.c b/kernel/sched.c
index f8b8996..e01bb45 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -306,32 +306,26 @@ static int init_task_group_load = INIT_TASK_GROUP_LOAD;
*/
struct task_group init_task_group;

-/* return group to which a task belongs */
+/*
+ * Return the group to which this tasks belongs.
+ *
+ * We use task_subsys_state_check() and extend the RCU verification
+ * with lockdep_is_held(&task_rq(p)->lock) because cpu_cgroup_attach()
+ * holds that lock for each task it moves into the cgroup. Therefore
+ * by holding that lock, we pin the task to the current cgroup.
+ */
static inline struct task_group *task_group(struct task_struct *p)
{
- struct task_group *tg;
+ struct cgroup_subsys_state *css;

-#ifdef CONFIG_CGROUP_SCHED
- tg = container_of(task_subsys_state(p, cpu_cgroup_subsys_id),
- struct task_group, css);
-#else
- tg = &init_task_group;
-#endif
- return tg;
+ css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
+ lockdep_is_held(&task_rq(p)->lock));
+ return container_of(css, struct task_group, css);
}

/* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
{
- /*
- * Strictly speaking this rcu_read_lock() is not needed since the
- * task_group is tied to the cgroup, which in turn can never go away
- * as long as there are tasks attached to it.
- *
- * However since task_group() uses task_subsys_state() which is an
- * rcu_dereference() user, this quiets CONFIG_PROVE_RCU.
- */
- rcu_read_lock();
#ifdef CONFIG_FAIR_GROUP_SCHED
p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
p->se.parent = task_group(p)->se[cpu];
@@ -341,7 +335,6 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
p->rt.rt_rq = task_group(p)->rt_rq[cpu];
p->rt.parent = task_group(p)->rt_se[cpu];
#endif
- rcu_read_unlock();
}

#else
@@ -4465,16 +4458,6 @@ recheck:
}

if (user) {
-#ifdef CONFIG_RT_GROUP_SCHED
- /*
- * Do not allow realtime tasks into groups that have no runtime
- * assigned.
- */
- if (rt_bandwidth_enabled() && rt_policy(policy) &&
- task_group(p)->rt_bandwidth.rt_runtime == 0)
- return -EPERM;
-#endif
-
retval = security_task_setscheduler(p, policy, param);
if (retval)
return retval;
@@ -4490,6 +4473,26 @@ recheck:
* runqueue lock must be held.
*/
rq = __task_rq_lock(p);
+
+ retval = 0;
+#ifdef CONFIG_RT_GROUP_SCHED
+ if (user) {
+ /*
+ * Do not allow realtime tasks into groups that have no runtime
+ * assigned.
+ */
+ if (rt_bandwidth_enabled() && rt_policy(policy) &&
+ task_group(p)->rt_bandwidth.rt_runtime == 0)
+ retval = -EPERM;
+
+ if (retval) {
+ __task_rq_unlock(rq);
+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+ return retval;
+ }
+ }
+#endif
+
/* recheck policy now with rq lock held */
if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
policy = oldpolicy = -1;

2010-06-08 13:14:25

by Miles Lane

[permalink] [raw]
Subject: Re: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!

On Tue, Jun 8, 2010 at 4:40 AM, Peter Zijlstra <[email protected]> wrote:
> On Tue, 2010-06-08 at 00:16 -0400, Miles Lane wrote:
>> On Mon, Jun 7, 2010 at 8:19 PM, Paul E. McKenney
>> <[email protected]> wrote:
>> > On Mon, Jun 07, 2010 at 02:14:25PM -0400, Miles Lane wrote:
>> >> Hi All,
>> >>
>> >> I just reproduced a warning I reported quite a while ago. Is a patch
>> >> for this in the pipeline?
>> >
>> > I proposed a patch, thinking that it was a false positive. Peter Zijlstra
>> > pointed out that there was a real race, and proposed an alternative patch,
>> > which may be found at http://lkml.org/lkml/2010/4/22/603.
>> >
>> > Could you please test Peter's patch and let us know if it cures the problem?
>> >
>
> Gah, this task_group() stuff is annoying, how about something like the
> below which teaches task_group() about the task_rq()->lock rule?
>
> [patch snipped]
>

  CC      kernel/sched.o
kernel/sched.c: In function ‘task_group’:
kernel/sched.c:321: error: implicit declaration of function ‘task_rq’
kernel/sched.c:321: error: invalid type argument of ‘->’ (have ‘int’)
make[1]: *** [kernel/sched.o] Error 1

I had to apply with fuzz. Did it mess up?

static inline struct task_group *task_group(struct task_struct *p)
{
struct cgroup_subsys_state *css;

css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
lockdep_is_held(&task_rq(p)->lock));
return container_of(css, struct task_group, css);
}

2010-06-08 13:34:57

by Peter Zijlstra

[permalink] [raw]
Subject: Re: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!

On Tue, 2010-06-08 at 09:14 -0400, Miles Lane wrote:

> CC kernel/sched.o
> kernel/sched.c: In function ‘task_group’:
> kernel/sched.c:321: error: implicit declaration of function ‘task_rq’
> kernel/sched.c:321: error: invalid type argument of ‘->’ (have ‘int’)
> make[1]: *** [kernel/sched.o] Error 1
>
> I had to apply with fuzz. Did it mess up?


No, I probably did.. task_rq() is defined on line 636 or thereabouts,
and this function landed around line 320.

Ahh, and it compiled here because I have CGROUP_SCHED=y, but
PROVE_RCU=n, so that whole check expression disappears and is never
evaluated...
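
For reference, a simplified sketch of why the bad expression slipped through: with CONFIG_PROVE_RCU=n the condition argument of rcu_dereference_check() is discarded by the preprocessor, so the task_rq(p) reference inside it is never even seen by the compiler. This is only an approximation of the 2.6.35-era definitions, not the exact kernel code:

#ifdef CONFIG_PROVE_RCU
#define rcu_dereference_check(p, c)					\
	({								\
		if (debug_locks && !(c))				\
			lockdep_rcu_dereference(__FILE__, __LINE__);	\
		rcu_dereference_raw(p);					\
	})
#else
/* the condition (c) vanishes here, so nothing inside it is ever compiled */
#define rcu_dereference_check(p, c) rcu_dereference_raw(p)
#endif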

/me fixes

---
Subject: sched: PROVE_RCU vs cpu_cgroup
From: Peter Zijlstra <[email protected]>
Date: Tue Jun 08 11:40:42 CEST 2010

PROVE_RCU has a few issues with the cpu_cgroup because the scheduler
typically holds rq->lock around the css rcu derefs but the generic
cgroup code doesn't (and can't) know about that lock.

Provide means to add extra checks to the css dereference and use that
in the scheduler to annotate its users.

The addition of rq->lock to these checks is correct because the
cgroup_subsys::attach() method takes the rq->lock for each task it
moves, therefore by holding that lock, we ensure the task is pinned to
the current cgroup and the RCU dereference is valid.

That leaves one genuine race in __sched_setscheduler() where we used
task_group() without holding any of the required locks and thus raced
with the cgroup code. Solve this by moving the check under the rq->lock.

Signed-off-by: Peter Zijlstra <[email protected]>
---
include/linux/cgroup.h | 20 +++++---
kernel/sched.c | 115 +++++++++++++++++++++++++------------------------
2 files changed, 73 insertions(+), 62 deletions(-)

Index: linux-2.6/include/linux/cgroup.h
===================================================================
--- linux-2.6.orig/include/linux/cgroup.h
+++ linux-2.6/include/linux/cgroup.h
@@ -525,13 +525,21 @@ static inline struct cgroup_subsys_state
return cgrp->subsys[subsys_id];
}

-static inline struct cgroup_subsys_state *task_subsys_state(
- struct task_struct *task, int subsys_id)
+/*
+ * function to get the cgroup_subsys_state which allows for extra
+ * rcu_dereference_check() conditions, such as locks used during the
+ * cgroup_subsys::attach() methods.
+ */
+#define task_subsys_state_check(task, subsys_id, __c) \
+ rcu_dereference_check(task->cgroups->subsys[subsys_id], \
+ rcu_read_lock_held() || \
+ lockdep_is_held(&task->alloc_lock) || \
+ cgroup_lock_is_held() || (__c))
+
+static inline struct cgroup_subsys_state *
+task_subsys_state(struct task_struct *task, int subsys_id)
{
- return rcu_dereference_check(task->cgroups->subsys[subsys_id],
- rcu_read_lock_held() ||
- lockdep_is_held(&task->alloc_lock) ||
- cgroup_lock_is_held());
+ return task_subsys_state_check(task, subsys_id, false);
}

static inline struct cgroup* task_cgroup(struct task_struct *task,
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -306,52 +306,6 @@ static int init_task_group_load = INIT_T
*/
struct task_group init_task_group;

-/* return group to which a task belongs */
-static inline struct task_group *task_group(struct task_struct *p)
-{
- struct task_group *tg;
-
-#ifdef CONFIG_CGROUP_SCHED
- tg = container_of(task_subsys_state(p, cpu_cgroup_subsys_id),
- struct task_group, css);
-#else
- tg = &init_task_group;
-#endif
- return tg;
-}
-
-/* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
-static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
-{
- /*
- * Strictly speaking this rcu_read_lock() is not needed since the
- * task_group is tied to the cgroup, which in turn can never go away
- * as long as there are tasks attached to it.
- *
- * However since task_group() uses task_subsys_state() which is an
- * rcu_dereference() user, this quiets CONFIG_PROVE_RCU.
- */
- rcu_read_lock();
-#ifdef CONFIG_FAIR_GROUP_SCHED
- p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
- p->se.parent = task_group(p)->se[cpu];
-#endif
-
-#ifdef CONFIG_RT_GROUP_SCHED
- p->rt.rt_rq = task_group(p)->rt_rq[cpu];
- p->rt.parent = task_group(p)->rt_se[cpu];
-#endif
- rcu_read_unlock();
-}
-
-#else
-
-static inline void set_task_rq(struct task_struct *p, unsigned int cpu) { }
-static inline struct task_group *task_group(struct task_struct *p)
-{
- return NULL;
-}
-
#endif /* CONFIG_CGROUP_SCHED */

/* CFS-related fields in a runqueue */
@@ -644,6 +598,49 @@ static inline int cpu_of(struct rq *rq)
#define cpu_curr(cpu) (cpu_rq(cpu)->curr)
#define raw_rq() (&__raw_get_cpu_var(runqueues))

+#ifdef CONFIG_CGROUP_SCHED
+
+/*
+ * Return the group to which this tasks belongs.
+ *
+ * We use task_subsys_state_check() and extend the RCU verification
+ * with lockdep_is_held(&task_rq(p)->lock) because cpu_cgroup_attach()
+ * holds that lock for each task it moves into the cgroup. Therefore
+ * by holding that lock, we pin the task to the current cgroup.
+ */
+static inline struct task_group *task_group(struct task_struct *p)
+{
+ struct cgroup_subsys_state *css;
+
+ css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
+ lockdep_is_held(&task_rq(p)->lock));
+ return container_of(css, struct task_group, css);
+}
+
+/* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
+static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
+{
+#ifdef CONFIG_FAIR_GROUP_SCHED
+ p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
+ p->se.parent = task_group(p)->se[cpu];
+#endif
+
+#ifdef CONFIG_RT_GROUP_SCHED
+ p->rt.rt_rq = task_group(p)->rt_rq[cpu];
+ p->rt.parent = task_group(p)->rt_se[cpu];
+#endif
+}
+
+#else /* CONFIG_CGROUP_SCHED */
+
+static inline void set_task_rq(struct task_struct *p, unsigned int cpu) { }
+static inline struct task_group *task_group(struct task_struct *p)
+{
+ return NULL;
+}
+
+#endif /* CONFIG_CGROUP_SCHED */
+
inline void update_rq_clock(struct rq *rq)
{
if (!rq->skip_clock_update)
@@ -4465,16 +4462,6 @@ recheck:
}

if (user) {
-#ifdef CONFIG_RT_GROUP_SCHED
- /*
- * Do not allow realtime tasks into groups that have no runtime
- * assigned.
- */
- if (rt_bandwidth_enabled() && rt_policy(policy) &&
- task_group(p)->rt_bandwidth.rt_runtime == 0)
- return -EPERM;
-#endif
-
retval = security_task_setscheduler(p, policy, param);
if (retval)
return retval;
@@ -4490,6 +4477,22 @@ recheck:
* runqueue lock must be held.
*/
rq = __task_rq_lock(p);
+
+#ifdef CONFIG_RT_GROUP_SCHED
+ if (user) {
+ /*
+ * Do not allow realtime tasks into groups that have no runtime
+ * assigned.
+ */
+ if (rt_bandwidth_enabled() && rt_policy(policy) &&
+ task_group(p)->rt_bandwidth.rt_runtime == 0) {
+ __task_rq_unlock(rq);
+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+ return -EPERM;
+ }
+ }
+#endif
+
/* recheck policy now with rq lock held */
if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
policy = oldpolicy = -1;
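
To make the locking rule in the patch concrete, a minimal sketch (not part of the patch; rq, tg and p stand for the usual struct rq *, struct task_group * and struct task_struct * locals): a path that already holds the task's runqueue lock may call task_group() without rcu_read_lock(), because the lockdep_is_held(&task_rq(p)->lock) term of the check is then satisfied. This mirrors what __sched_setscheduler() does after the change:

	rq = __task_rq_lock(p);	/* acquires task_rq(p)->lock */
	tg = task_group(p);	/* no PROVE_RCU splat: rq->lock pins the task's cgroup */
	/* ... e.g. inspect tg->rt_bandwidth.rt_runtime under the lock ... */
	__task_rq_unlock(rq);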

2010-06-09 15:11:48

by Miles Lane

[permalink] [raw]
Subject: Re: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!

On Tue, Jun 8, 2010 at 9:34 AM, Peter Zijlstra <[email protected]> wrote:
> On Tue, 2010-06-08 at 09:14 -0400, Miles Lane wrote:
>
>>   CC      kernel/sched.o
>> kernel/sched.c: In function ‘task_group’:
>> kernel/sched.c:321: error: implicit declaration of function ‘task_rq’
>> kernel/sched.c:321: error: invalid type argument of ‘->’ (have ‘int’)
>> make[1]: *** [kernel/sched.o] Error 1
>>
>> I had to apply with fuzz. Did it mess up?
>
>
> No, I probably did.. task_rq() is defined on line 636 or thereabouts,
> and this function landed around line 320.
>
> Ahh, and it compiled here because I have CGROUP_SCHED=y, but
> PROVE_RCU=n, so that whole check expression disappears and is never
> evaluated...
>
> /me fixes
>
> ---
> Subject: sched: PROVE_RCU vs cpu_cgroup
> From: Peter Zijlstra <[email protected]>
> Date: Tue Jun 08 11:40:42 CEST 2010
>
> PROVE_RCU has a few issues with the cpu_cgroup because the scheduler
> typically holds rq->lock around the css rcu derefs but the generic
> cgroup code doesn't (and can't) know about that lock.
>
> Provide means to add extra checks to the css dereference and use that
> in the scheduler to annotate its users.
>
> The addition of rq->lock to these checks is correct because the
> cgroup_subsys::attach() method takes the rq->lock for each task it
> moves, therefore by holding that lock, we ensure the task is pinned to
> the current cgroup and the RCU dereference is valid.
>
> That leaves one genuine race in __sched_setscheduler() where we used
> task_group() without holding any of the required locks and thus raced
> with the cgroup code. Solve this by moving the check under the rq->lock.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> [patch snipped]
>

Sorry. I misunderstood this message when I first read it. I didn't
realize this message included a new version of the patch.
Anyhow, I just tried to apply the patch to 2.6.35-rc2-git3 and got this:

# patch -p1 -l -F 20 --dry-run < ../5.patch
patching file include/linux/cgroup.h
patching file kernel/sched.c
Hunk #1 succeeded at 306 with fuzz 1.
Hunk #3 FAILED at 4462.
Hunk #4 succeeded at 4487 with fuzz 3.
1 out of 4 hunks FAILED -- saving rejects to file kernel/sched.c.rej

2010-06-09 15:24:50

by Peter Zijlstra

[permalink] [raw]
Subject: Re: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!

On Wed, 2010-06-09 at 11:11 -0400, Miles Lane wrote:
>
> Sorry. I misunderstood this message when I first read it. I didn't
> realize this message included a new version of the patch.
> Anyhow, I just tried to apply the patch to 2.6.35-rc2-git3 and got this:
>
> # patch -p1 -l -F 20 --dry-run < ../5.patch
> patching file include/linux/cgroup.h
> patching file kernel/sched.c
> Hunk #1 succeeded at 306 with fuzz 1.
> Hunk #3 FAILED at 4462.
> Hunk #4 succeeded at 4487 with fuzz 3.
> 1 out of 4 hunks FAILED -- saving rejects to file kernel/sched.c.rej

Weird.. it seems to apply without trouble to Linus' git tree.

root@twins:/usr/src/linux-2.6# git checkout -f origin/master
HEAD is now at 84f7586... Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
root@twins:/usr/src/linux-2.6# quilt push
Applying patch patches/sched-rcu-validation.patch
patching file include/linux/cgroup.h
patching file kernel/sched.c

Now at patch patches/sched-rcu-validation.patch
root@twins:/usr/src/linux-2.6# git describe
v2.6.35-rc2-54-g84f7586

2010-06-09 16:28:19

by Miles Lane

[permalink] [raw]
Subject: Re: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!

On Wed, Jun 9, 2010 at 11:24 AM, Peter Zijlstra <[email protected]> wrote:
> On Wed, 2010-06-09 at 11:11 -0400, Miles Lane wrote:
>>
>> Sorry. I misunderstood this message when I first read it. I didn't
>> realize this message included a new version of the patch.
>> Anyhow, I just tried to apply the patch to 2.6.35-rc2-git3 and got this:
>>
>> # patch -p1 -l -F 20 --dry-run < ../5.patch
>> patching file include/linux/cgroup.h
>> patching file kernel/sched.c
>> Hunk #1 succeeded at 306 with fuzz 1.
>> Hunk #3 FAILED at 4462.
>> Hunk #4 succeeded at 4487 with fuzz 3.
>> 1 out of 4 hunks FAILED -- saving rejects to file kernel/sched.c.rej
>
> Weird.. it seems to apply without trouble to Linus' git tree.
>
> root@twins:/usr/src/linux-2.6# git checkout -f origin/master
> HEAD is now at 84f7586... Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
> root@twins:/usr/src/linux-2.6# quilt push
> Applying patch patches/sched-rcu-validation.patch
> patching file include/linux/cgroup.h
> patching file kernel/sched.c
>
> Now at patch patches/sched-rcu-validation.patch
> root@twins:/usr/src/linux-2.6# git describe
> v2.6.35-rc2-54-g84f7586

Oh. Duh. I know what is going on. I had received another patch to
sched.c. They must conflict. I will test with just your patch now.

2010-06-09 17:00:28

by Miles Lane

[permalink] [raw]
Subject: Re: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!

On Wed, Jun 9, 2010 at 11:29 AM, Miles Lane <[email protected]> wrote:
> On Wed, Jun 9, 2010 at 11:24 AM, Peter Zijlstra <[email protected]> wrote:
>> On Wed, 2010-06-09 at 11:11 -0400, Miles Lane wrote:
>>>
>>> Sorry. I misunderstood this message when I first read it. I didn't
>>> realize this message included a new version of the patch.
>>> Anyhow, I just tried to apply the patch to 2.6.35-rc2-git3 and got this:
>>>
>>> # patch -p1 -l -F 20 --dry-run < ../5.patch
>>> patching file include/linux/cgroup.h
>>> patching file kernel/sched.c
>>> Hunk #1 succeeded at 306 with fuzz 1.
>>> Hunk #3 FAILED at 4462.
>>> Hunk #4 succeeded at 4487 with fuzz 3.
>>> 1 out of 4 hunks FAILED -- saving rejects to file kernel/sched.c.rej
>>
>> Weird.. it seems to apply without trouble to Linus' git tree.
>>
>> root@twins:/usr/src/linux-2.6# git checkout -f origin/master
>> HEAD is now at 84f7586... Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
>> root@twins:/usr/src/linux-2.6# quilt push
>> Applying patch patches/sched-rcu-validation.patch
>> patching file include/linux/cgroup.h
>> patching file kernel/sched.c
>>
>> Now at patch patches/sched-rcu-validation.patch
>> root@twins:/usr/src/linux-2.6# git describe
>> v2.6.35-rc2-54-g84f7586
>
>> Oh. Duh. I know what is going on. I had received another patch to
>> sched.c. They must conflict. I will test with just your patch now.
>

ACPI: Core revision 20100428
[ 0.061088]
[ 0.061090] ===================================================
[ 0.062009] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 0.062138] ---------------------------------------------------
[ 0.062268] kernel/sched.c:616 invoked rcu_dereference_check()
without protection!
[ 0.062470]
[ 0.062471] other info that might help us debug this:
[ 0.062472]
[ 0.062835]
[ 0.062836] rcu_scheduler_active = 1, debug_locks = 1
[ 0.063009] no locks held by swapper/0.
[ 0.063134]
[ 0.063135] stack backtrace:
[ 0.063378] Pid: 0, comm: swapper Not tainted 2.6.35-rc2-git3 #3
[ 0.063507] Call Trace:
[ 0.063638] [<ffffffff81072205>] lockdep_rcu_dereference+0x9d/0xa5
[ 0.063773] [<ffffffff810379f9>] task_group+0x7b/0x8a
[ 0.064012] [<ffffffff81037a1d>] set_task_rq+0x15/0x6e
[ 0.064143] [<ffffffff8103e50f>] set_task_cpu+0xa9/0xba
[ 0.064274] [<ffffffff81042dbb>] sched_fork+0x10a/0x1b3
[ 0.064405] [<ffffffff810446f9>] copy_process+0x617/0x10e6
[ 0.064537] [<ffffffff8104533d>] do_fork+0x175/0x39b
[ 0.064670] [<ffffffff8106589b>] ? up+0xf/0x39
[ 0.064800] [<ffffffff8106589b>] ? up+0xf/0x39
[ 0.065013] [<ffffffff811dbf73>] ? do_raw_spin_lock+0x79/0x13e
[ 0.065148] [<ffffffff81011526>] kernel_thread+0x70/0x72
[ 0.065279] [<ffffffff816cc5e4>] ? kernel_init+0x0/0x1ce
[ 0.065411] [<ffffffff8100aba0>] ? kernel_thread_helper+0x0/0x10
[ 0.065545] [<ffffffff81096bea>] ? rcu_scheduler_starting+0x2a/0x4c
[ 0.065679] [<ffffffff813a8a4d>] rest_init+0x21/0xde
[ 0.065810] [<ffffffff816cce28>] start_kernel+0x448/0x453
[ 0.066013] [<ffffffff816cc2c8>] x86_64_start_reservations+0xb3/0xb7
[ 0.066148] [<ffffffff816cc418>] x86_64_start_kernel+0x14c/0x15b
[ 0.066499] Setting APIC routing to flat

2010-06-09 17:57:37

by Peter Zijlstra

[permalink] [raw]
Subject: Re: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!

On Wed, 2010-06-09 at 13:00 -0400, Miles Lane wrote:

> ACPI: Core revision 20100428
> [ 0.061088]
> [ 0.061090] ===================================================
> [ 0.062009] [ INFO: suspicious rcu_dereference_check() usage. ]
> [ 0.062138] ---------------------------------------------------
> [ 0.062268] kernel/sched.c:616 invoked rcu_dereference_check()
> without protection!
> [ 0.062470]
> [ 0.062471] other info that might help us debug this:
> [ 0.062472]
> [ 0.062835]
> [ 0.062836] rcu_scheduler_active = 1, debug_locks = 1
> [ 0.063009] no locks held by swapper/0.
> [ 0.063134]
> [ 0.063135] stack backtrace:
> [ 0.063378] Pid: 0, comm: swapper Not tainted 2.6.35-rc2-git3 #3
> [ 0.063507] Call Trace:
> [ 0.063638] [<ffffffff81072205>] lockdep_rcu_dereference+0x9d/0xa5
> [ 0.063773] [<ffffffff810379f9>] task_group+0x7b/0x8a
> [ 0.064012] [<ffffffff81037a1d>] set_task_rq+0x15/0x6e
> [ 0.064143] [<ffffffff8103e50f>] set_task_cpu+0xa9/0xba
> [ 0.064274] [<ffffffff81042dbb>] sched_fork+0x10a/0x1b3
> [ 0.064405] [<ffffffff810446f9>] copy_process+0x617/0x10e6
> [ 0.064537] [<ffffffff8104533d>] do_fork+0x175/0x39b
> [ 0.064670] [<ffffffff8106589b>] ? up+0xf/0x39
> [ 0.064800] [<ffffffff8106589b>] ? up+0xf/0x39
> [ 0.065013] [<ffffffff811dbf73>] ? do_raw_spin_lock+0x79/0x13e
> [ 0.065148] [<ffffffff81011526>] kernel_thread+0x70/0x72
> [ 0.065279] [<ffffffff816cc5e4>] ? kernel_init+0x0/0x1ce
> [ 0.065411] [<ffffffff8100aba0>] ? kernel_thread_helper+0x0/0x10
> [ 0.065545] [<ffffffff81096bea>] ? rcu_scheduler_starting+0x2a/0x4c
> [ 0.065679] [<ffffffff813a8a4d>] rest_init+0x21/0xde
> [ 0.065810] [<ffffffff816cce28>] start_kernel+0x448/0x453
> [ 0.066013] [<ffffffff816cc2c8>] x86_64_start_reservations+0xb3/0xb7
> [ 0.066148] [<ffffffff816cc418>] x86_64_start_kernel+0x14c/0x15b
> [ 0.066499] Setting APIC routing to flat

Argh, moar funkeh stuff..

Either we do something like the below, or add something like (p->flags &
PF_STARTING) to the task_subsys_state_check(), opinions?

---
kernel/sched.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 19b3c5d..bfd3128 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2564,7 +2564,15 @@ void sched_fork(struct task_struct *p, int clone_flags)
if (p->sched_class->task_fork)
p->sched_class->task_fork(p);

+ /*
+ * We're not in the pid-hash yet so no cgroup attach races, and the
+ * cgroup is pinned by the parent running this.
+ *
+ * Silence PROVE_RCU.
+ */
+ rcu_read_lock();
set_task_cpu(p, cpu);
+ rcu_read_unlock();

#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
if (likely(sched_info_on()))
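
For comparison, the other option mentioned above -- adding a PF_STARTING test to task_subsys_state_check() instead of wrapping the caller in rcu_read_lock() -- would presumably amount to widening the lockdep condition. A rough, untested illustration built on the macro introduced earlier in this thread, not a patch anyone posted:

#define task_subsys_state_check(task, subsys_id, __c)			\
	rcu_dereference_check(task->cgroups->subsys[subsys_id],	\
			      rcu_read_lock_held() ||			\
			      lockdep_is_held(&task->alloc_lock) ||	\
			      cgroup_lock_is_held() ||			\
			      /* task still being forked, cannot be attached yet */ \
			      ((task)->flags & PF_STARTING) || (__c))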

2010-06-09 18:15:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!

On Wed, 2010-06-09 at 19:57 +0200, Peter Zijlstra wrote:
> + /*
> + * We're not in the pid-hash yet so no cgroup attach races, and the
> + * cgroup is pinned by the parent running this.
> + *
> + * Silence PROVE_RCU.
> + */

Hum,.. not sure that's actually true though, the parent itself is still
susceptible to races afaict..

2010-06-09 21:15:31

by Miles Lane

[permalink] [raw]
Subject: Re: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!

On Wed, Jun 9, 2010 at 2:15 PM, Peter Zijlstra <[email protected]> wrote:
> On Wed, 2010-06-09 at 19:57 +0200, Peter Zijlstra wrote:
>> + /*
>> + * We're not in the pid-hash yet so no cgroup attach races, and the
>> + * cgroup is pinned by the parent running this.
>> + *
>> + * Silence PROVE_RCU.
>> + */
>
> Hum,.. not sure that's actually true though, the parent itself is still
> susceptible to races afaict..

Do you want what you sent earlier tested, or should I wait for another patch?

2010-06-09 21:21:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: 2.6.35-rc2-git1 - include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!

On Wed, 2010-06-09 at 17:15 -0400, Miles Lane wrote:
> On Wed, Jun 9, 2010 at 2:15 PM, Peter Zijlstra <[email protected]> wrote:
> > On Wed, 2010-06-09 at 19:57 +0200, Peter Zijlstra wrote:
> >> + /*
> >> + * We're not in the pid-hash yet so no cgroup attach races, and the
> >> + * cgroup is pinned by the parent running this.
> >> + *
> >> + * Silence PROVE_RCU.
> >> + */
> >
> > Hum,.. not sure that's actually true though, the parent itself is still
> > susceptible to races afaict..
>
> Do you want what you sent earlier tested, or should I wait for another patch?

please wait, I need to sit down and sort out that code.