2009-03-26 03:47:27

by Piet Delaney

[permalink] [raw]
Subject: [PATCH} - There appears to be a minor race condition in sched.c

Ingo, Peter:

There appears to be a minor race condition in sched.c where
you can get a division by zero. I suspect that it only shows
up when the kernel is compiled without optimization and the code
loads rq->nr_running from memory twice.

It's part of our SMP stabilization changes that I just posted to:

git://git.kernel.org/pub/scm/linux/kernel/git/piet/xtensa-2.6.27-smp.git

I mentioned it to Johannes the other day and he suggested passing it on to you ASAP.

-------------------------------- Begin kernel/sched.c --------------------------------
index 9a1ddb8..607ee38 100644
@@ -1388,9 +1388,11 @@ static int task_hot(struct task_struct *p, u64 now, struct sched_domain *sd);
static unsigned long cpu_avg_load_per_task(int cpu)
{
struct rq *rq = cpu_rq(cpu);
+ unsigned long nr_running = rq->nr_running;

- if (rq->nr_running)
- rq->avg_load_per_task = rq->load.weight / rq->nr_running;
+ /* Local copy of nr_running used to avoid a possible div by zero */
+ if (nr_running)
+ rq->avg_load_per_task = rq->load.weight / nr_running;

return rq->avg_load_per_task;
}
-------------------------------- End kernel/sched.c --------------------------------

-piet


2009-03-26 07:51:37

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH} - There appears to be a minor race condition in sched.c

* Piet Delaney <[email protected]> [2009-03-25 20:46:11]:

> Ingo, Peter:
>
> There appears to be a minor race condition in sched.c where
> you can get a division by zero. I suspect that it only shows
> up when the kernel is compiled without optimization and the code
> loads rq->nr_running from memory twice.
>
> It's part of our SMP stabilization changes that I just posted to:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/piet/xtensa-2.6.27-smp.git
>
> I mentioned it to Johannes the other day and he suggested passing it on to you ASAP.
>

The latest version uses ACCESS_ONCE to get rq->nr_running and then
uses that value. I am not sure what version you are talking about, if
it is older, you should consider backporting from the current version.


--
Balbir

2009-03-26 20:46:38

by Piet Delaney

[permalink] [raw]
Subject: Re: [PATCH} - There appears to be a minor race condition in sched.c

Balbir Singh wrote:
> * Piet Delaney <[email protected]> [2009-03-25 20:46:11]:
>
>> Ingo, Peter:
>>
>> There appears to be a minor race condition in sched.c where
>> you can get a division by zero. I suspect that it only shows
>> up when the kernel is compiled without optimization and the code
>> loads rq->nr_running from memory twice.
>>
>> It's part of our SMP stabilization changes that I just posted to:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/piet/xtensa-2.6.27-smp.git
>>
>> I mentioned it to Johannes the other day and he suggested passing it on to you ASAP.
>>
>
> The latest version uses ACCESS_ONCE to get rq->nr_running and then
> uses that value. I am not sure what version you are talking about, if
> it is older, you should consider backporting from the current version.

Hi Balbir:

It appears that Steven Rostedt changed cpu_ave_load_per_task() to use a local
variable nr_running, just as I suggested, apparently back in 2.6.28-rc5
last Nov; well after the 2.6.27 that I mentioned above.

A few days later Ingo added the ACCESS_ONCE() after Linus pointed out
that nothing prevented the compiler from reloading rg->rn_running.
Linus was right, adding the volatile is necessary to prevent gcc
from doing forward substitution.

I'll check Linus's current repo next time before suggesting bug fixes.

-piet