From: Dmitry Adamushko <[email protected]>
Subject: sched, hotplug: ensure a task is on the valid cpu after
set_cpus_allowed_ptr()
---
sched, hotplug: ensure a task is on the valid cpu after set_cpus_allowed_ptr()
The 'new_mask' may not include task_cpu(p) so we migrate 'p' on another 'cpu'.
In case it can't be placed on this 'cpu' immediately, we submit a request
to the migration thread and wait for its completion.
Now, by the moment this request gets handled by the migration_thread,
'cpu' may well be offline/non-active. As a result, 'p' continues
running on its old cpu which is not in the 'new_mask'.
Fix it: ensure 'p' ends up on a valid cpu.
Theoreticaly (but unlikely), we may get an endless loop if someone cpu_down()'s
a new cpu we have choosen on each iteration.
Alternatively, we may introduce a special type of request to migration_thread,
namely "move_to_any_allowed_cpu" (e.g. by specifying dest_cpu == -1).
Note, any_active_cpu() instead of any_online_cpu() would be better here.
Signed-off-by: Dmitry Adamushko <[email protected]>
diff --git a/kernel/sched.c b/kernel/sched.c
index b4ccc8b..c3bd78a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5774,21 +5774,23 @@ int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)
}
/* Can the task run on the task's current CPU? If so, we're done */
- if (cpu_isset(task_cpu(p), *new_mask))
- goto out;
+ while (!cpu_isset(task_cpu(p), p->cpus_allowed)) {
+ int cpu = any_online_cpu(p->cpus_allowed);
- if (migrate_task(p, any_online_cpu(*new_mask), &req)) {
- /* Need to wait for migration thread (might exit: take ref). */
- struct task_struct *mt = rq->migration_thread;
+ if (migrate_task(p, cpu, &req)) {
+ /* Need to wait for migration thread (might exit: take ref). */
+ struct task_struct *mt = rq->migration_thread;
- get_task_struct(mt);
- task_rq_unlock(rq, &flags);
- wake_up_process(mt);
- put_task_struct(mt);
+ get_task_struct(mt);
+ task_rq_unlock(rq, &flags);
+ wake_up_process(mt);
+ put_task_struct(mt);
- wait_for_completion(&req.done);
- tlb_migrate_finish(p->mm);
- return 0;
+ wait_for_completion(&req.done);
+ tlb_migrate_finish(p->mm);
+
+ rq = task_rq_lock(p, &flags);
+ }
}
out:
task_rq_unlock(rq, &flags);
On Fri, 2008-07-25 at 00:15 +0200, Dmitry Adamushko wrote:
>
> From: Dmitry Adamushko <[email protected]>
> Subject: sched, hotplug: ensure a task is on the valid cpu after
> set_cpus_allowed_ptr()
>
> ---
> sched, hotplug: ensure a task is on the valid cpu after set_cpus_allowed_ptr()
>
> The 'new_mask' may not include task_cpu(p) so we migrate 'p' on another 'cpu'.
> In case it can't be placed on this 'cpu' immediately, we submit a request
> to the migration thread and wait for its completion.
>
> Now, by the moment this request gets handled by the migration_thread,
> 'cpu' may well be offline/non-active. As a result, 'p' continues
> running on its old cpu which is not in the 'new_mask'.
>
> Fix it: ensure 'p' ends up on a valid cpu.
>
> Theoreticaly (but unlikely), we may get an endless loop if someone cpu_down()'s
> a new cpu we have choosen on each iteration.
>
> Alternatively, we may introduce a special type of request to migration_thread,
> namely "move_to_any_allowed_cpu" (e.g. by specifying dest_cpu == -1).
>
> Note, any_active_cpu() instead of any_online_cpu() would be better here.
Hrmm,.. this is all growing into something of a mess.. defeating the
whole purpose of introducing that cpu_active_map stuff.
Would the suggested SRCU logic simplify all this?
> Signed-off-by: Dmitry Adamushko <[email protected]>
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index b4ccc8b..c3bd78a 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -5774,21 +5774,23 @@ int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)
> }
>
> /* Can the task run on the task's current CPU? If so, we're done */
> - if (cpu_isset(task_cpu(p), *new_mask))
> - goto out;
> + while (!cpu_isset(task_cpu(p), p->cpus_allowed)) {
> + int cpu = any_online_cpu(p->cpus_allowed);
>
> - if (migrate_task(p, any_online_cpu(*new_mask), &req)) {
> - /* Need to wait for migration thread (might exit: take ref). */
> - struct task_struct *mt = rq->migration_thread;
> + if (migrate_task(p, cpu, &req)) {
> + /* Need to wait for migration thread (might exit: take ref). */
> + struct task_struct *mt = rq->migration_thread;
>
> - get_task_struct(mt);
> - task_rq_unlock(rq, &flags);
> - wake_up_process(mt);
> - put_task_struct(mt);
> + get_task_struct(mt);
> + task_rq_unlock(rq, &flags);
> + wake_up_process(mt);
> + put_task_struct(mt);
>
> - wait_for_completion(&req.done);
> - tlb_migrate_finish(p->mm);
> - return 0;
> + wait_for_completion(&req.done);
> + tlb_migrate_finish(p->mm);
> +
> + rq = task_rq_lock(p, &flags);
> + }
> }
> out:
> task_rq_unlock(rq, &flags);
>
>
2008/7/25 Peter Zijlstra <[email protected]>:
> On Fri, 2008-07-25 at 00:15 +0200, Dmitry Adamushko wrote:
>>
>> From: Dmitry Adamushko <[email protected]>
>> Subject: sched, hotplug: ensure a task is on the valid cpu after
>> set_cpus_allowed_ptr()
>>
>> ---
>> sched, hotplug: ensure a task is on the valid cpu after set_cpus_allowed_ptr()
>>
>> The 'new_mask' may not include task_cpu(p) so we migrate 'p' on another 'cpu'.
>> In case it can't be placed on this 'cpu' immediately, we submit a request
>> to the migration thread and wait for its completion.
>>
>> Now, by the moment this request gets handled by the migration_thread,
>> 'cpu' may well be offline/non-active. As a result, 'p' continues
>> running on its old cpu which is not in the 'new_mask'.
>>
>> Fix it: ensure 'p' ends up on a valid cpu.
>>
>> Theoreticaly (but unlikely), we may get an endless loop if someone cpu_down()'s
>> a new cpu we have choosen on each iteration.
>>
>> Alternatively, we may introduce a special type of request to migration_thread,
>> namely "move_to_any_allowed_cpu" (e.g. by specifying dest_cpu == -1).
>>
>> Note, any_active_cpu() instead of any_online_cpu() would be better here.
>
> Hrmm,.. this is all growing into something of a mess.. defeating the
> whole purpose of introducing that cpu_active_map stuff.
>
> Would the suggested SRCU logic simplify all this?
Ah, wait a second.
sched_setaffinity() -> set_cpus_allowed_ptr() is ok vs. cpu_down() as
it does use get_online_cpus(). So none of the cpus can become offline
while we are in set_cpus_allowed_ptr().
but there are numerous calls to set_cpus_allowed_ptr() from other
places and not all of them seem to call get_online_cpus()...
yeah, I should check this issue again..
btw., indeed all these different sync. cases are a bit of mess.
---
btw., I was wondering about this change:
ba42059fbd0aa1ac91b582412b5fedb1258f241f
sched: hrtick_enabled() should use cpu_active()
Peter pointed out that hrtick_enabled() should use cpu_active().
>
>> Signed-off-by: Dmitry Adamushko <[email protected]>
>>
>> diff --git a/kernel/sched.c b/kernel/sched.c
>> index b4ccc8b..c3bd78a 100644
>> --- a/kernel/sched.c
>> +++ b/kernel/sched.c
>> @@ -5774,21 +5774,23 @@ int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)
>> }
>>
>> /* Can the task run on the task's current CPU? If so, we're done */
>> - if (cpu_isset(task_cpu(p), *new_mask))
>> - goto out;
>> + while (!cpu_isset(task_cpu(p), p->cpus_allowed)) {
>> + int cpu = any_online_cpu(p->cpus_allowed);
>>
>> - if (migrate_task(p, any_online_cpu(*new_mask), &req)) {
>> - /* Need to wait for migration thread (might exit: take ref). */
>> - struct task_struct *mt = rq->migration_thread;
>> + if (migrate_task(p, cpu, &req)) {
>> + /* Need to wait for migration thread (might exit: take ref). */
>> + struct task_struct *mt = rq->migration_thread;
>>
>> - get_task_struct(mt);
>> - task_rq_unlock(rq, &flags);
>> - wake_up_process(mt);
>> - put_task_struct(mt);
>> + get_task_struct(mt);
>> + task_rq_unlock(rq, &flags);
>> + wake_up_process(mt);
>> + put_task_struct(mt);
>>
>> - wait_for_completion(&req.done);
>> - tlb_migrate_finish(p->mm);
>> - return 0;
>> + wait_for_completion(&req.done);
>> + tlb_migrate_finish(p->mm);
>> +
>> + rq = task_rq_lock(p, &flags);
>> + }
>> }
>> out:
>> task_rq_unlock(rq, &flags);
>>
>>
>
>
--
Best regards,
Dmitry Adamushko
On Fri, 2008-07-25 at 15:20 +0200, Dmitry Adamushko wrote:
> 2008/7/25 Peter Zijlstra <[email protected]>:
> > On Fri, 2008-07-25 at 00:15 +0200, Dmitry Adamushko wrote:
> >>
> >> From: Dmitry Adamushko <[email protected]>
> >> Subject: sched, hotplug: ensure a task is on the valid cpu after
> >> set_cpus_allowed_ptr()
> >>
> >> ---
> >> sched, hotplug: ensure a task is on the valid cpu after set_cpus_allowed_ptr()
> >>
> >> The 'new_mask' may not include task_cpu(p) so we migrate 'p' on another 'cpu'.
> >> In case it can't be placed on this 'cpu' immediately, we submit a request
> >> to the migration thread and wait for its completion.
> >>
> >> Now, by the moment this request gets handled by the migration_thread,
> >> 'cpu' may well be offline/non-active. As a result, 'p' continues
> >> running on its old cpu which is not in the 'new_mask'.
> >>
> >> Fix it: ensure 'p' ends up on a valid cpu.
> >>
> >> Theoreticaly (but unlikely), we may get an endless loop if someone cpu_down()'s
> >> a new cpu we have choosen on each iteration.
> >>
> >> Alternatively, we may introduce a special type of request to migration_thread,
> >> namely "move_to_any_allowed_cpu" (e.g. by specifying dest_cpu == -1).
> >>
> >> Note, any_active_cpu() instead of any_online_cpu() would be better here.
> >
> > Hrmm,.. this is all growing into something of a mess.. defeating the
> > whole purpose of introducing that cpu_active_map stuff.
> >
> > Would the suggested SRCU logic simplify all this?
>
> Ah, wait a second.
>
> sched_setaffinity() -> set_cpus_allowed_ptr() is ok vs. cpu_down() as
> it does use get_online_cpus(). So none of the cpus can become offline
> while we are in set_cpus_allowed_ptr().
>
> but there are numerous calls to set_cpus_allowed_ptr() from other
> places and not all of them seem to call get_online_cpus()...
>
> yeah, I should check this issue again..
>
> btw., indeed all these different sync. cases are a bit of mess.
Will ponder it a bit more, but my brain can't seem to let go of SRCU
now.. I'll go concentrate on making the swap-over-nfs patches prettier,
maybe that will induce a brainwave ;-)
> ---
>
> btw., I was wondering about this change:
>
> ba42059fbd0aa1ac91b582412b5fedb1258f241f
>
> sched: hrtick_enabled() should use cpu_active()
>
> Peter pointed out that hrtick_enabled() should use cpu_active().
What exactly were you wondering about?
It seemed a good idea to stop starting hrtimers before we migrate them
to another cpu (one of the things done later in cpu_down), thereby
avoiding spurious fires on remote cpus.
On Fri, Jul 25, 2008 at 12:15:30AM +0200, Dmitry Adamushko wrote:
>
> From: Dmitry Adamushko <[email protected]>
> Subject: sched, hotplug: ensure a task is on the valid cpu after
> set_cpus_allowed_ptr()
>
> ---
> sched, hotplug: ensure a task is on the valid cpu after set_cpus_allowed_ptr()
>
> The 'new_mask' may not include task_cpu(p) so we migrate 'p' on another 'cpu'.
> In case it can't be placed on this 'cpu' immediately, we submit a request
> to the migration thread and wait for its completion.
>
> Now, by the moment this request gets handled by the migration_thread,
> 'cpu' may well be offline/non-active. As a result, 'p' continues
> running on its old cpu which is not in the 'new_mask'.
>
> Fix it: ensure 'p' ends up on a valid cpu.
>
> Theoreticaly (but unlikely), we may get an endless loop if someone cpu_down()'s
> a new cpu we have choosen on each iteration.
Hi Dmitry,
Wasn't set_cpus_allowed_ptr() created from set_cpus_allowed() code
path ?
that means this bug (if it really is one....) has been around for quite
some time now!!
>
> Alternatively, we may introduce a special type of request to migration_thread,
> namely "move_to_any_allowed_cpu" (e.g. by specifying dest_cpu == -1).
>
> Note, any_active_cpu() instead of any_online_cpu() would be better here.
>
> Signed-off-by: Dmitry Adamushko <[email protected]>
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index b4ccc8b..c3bd78a 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -5774,21 +5774,23 @@ int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)
> }
>
> /* Can the task run on the task's current CPU? If so, we're done */
> - if (cpu_isset(task_cpu(p), *new_mask))
> - goto out;
> + while (!cpu_isset(task_cpu(p), p->cpus_allowed)) {
> + int cpu = any_online_cpu(p->cpus_allowed);
>
> - if (migrate_task(p, any_online_cpu(*new_mask), &req)) {
> - /* Need to wait for migration thread (might exit: take ref). */
> - struct task_struct *mt = rq->migration_thread;
> + if (migrate_task(p, cpu, &req)) {
> + /* Need to wait for migration thread (might exit: take ref). */
> + struct task_struct *mt = rq->migration_thread;
>
> - get_task_struct(mt);
> - task_rq_unlock(rq, &flags);
> - wake_up_process(mt);
> - put_task_struct(mt);
> + get_task_struct(mt);
> + task_rq_unlock(rq, &flags);
> + wake_up_process(mt);
> + put_task_struct(mt);
>
> - wait_for_completion(&req.done);
> - tlb_migrate_finish(p->mm);
> - return 0;
> + wait_for_completion(&req.done);
> + tlb_migrate_finish(p->mm);
> +
> + rq = task_rq_lock(p, &flags);
> + }
> }
> out:
> task_rq_unlock(rq, &flags);
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
Thanks and Regards
gautham
2008/7/25 Peter Zijlstra <[email protected]>:
> On Fri, 2008-07-25 at 15:20 +0200, Dmitry Adamushko wrote:
>> 2008/7/25 Peter Zijlstra <[email protected]>:
>> > On Fri, 2008-07-25 at 00:15 +0200, Dmitry Adamushko wrote:
>> >>
>> >> From: Dmitry Adamushko <[email protected]>
>> >> Subject: sched, hotplug: ensure a task is on the valid cpu after
>> >> set_cpus_allowed_ptr()
>> >>
>> >> ---
>> >> sched, hotplug: ensure a task is on the valid cpu after set_cpus_allowed_ptr()
>> >>
>> >> The 'new_mask' may not include task_cpu(p) so we migrate 'p' on another 'cpu'.
>> >> In case it can't be placed on this 'cpu' immediately, we submit a request
>> >> to the migration thread and wait for its completion.
>> >>
>> >> Now, by the moment this request gets handled by the migration_thread,
>> >> 'cpu' may well be offline/non-active. As a result, 'p' continues
>> >> running on its old cpu which is not in the 'new_mask'.
>> >>
>> >> Fix it: ensure 'p' ends up on a valid cpu.
>> >>
>> >> Theoreticaly (but unlikely), we may get an endless loop if someone cpu_down()'s
>> >> a new cpu we have choosen on each iteration.
>> >>
>> >> Alternatively, we may introduce a special type of request to migration_thread,
>> >> namely "move_to_any_allowed_cpu" (e.g. by specifying dest_cpu == -1).
>> >>
>> >> Note, any_active_cpu() instead of any_online_cpu() would be better here.
>> >
>> > Hrmm,.. this is all growing into something of a mess.. defeating the
>> > whole purpose of introducing that cpu_active_map stuff.
>> >
>> > Would the suggested SRCU logic simplify all this?
>>
>> Ah, wait a second.
>>
>> sched_setaffinity() -> set_cpus_allowed_ptr() is ok vs. cpu_down() as
>> it does use get_online_cpus(). So none of the cpus can become offline
>> while we are in set_cpus_allowed_ptr().
>>
>> but there are numerous calls to set_cpus_allowed_ptr() from other
>> places and not all of them seem to call get_online_cpus()...
>>
>> yeah, I should check this issue again..
>>
>> btw., indeed all these different sync. cases are a bit of mess.
>
> Will ponder it a bit more, but my brain can't seem to let go of SRCU
> now..
I like it too.
> I'll go concentrate on making the swap-over-nfs patches prettier,
> maybe that will induce a brainwave ;-)
what's about task-migration over NFS? ;-)
>> btw., I was wondering about this change:
>>
>> ba42059fbd0aa1ac91b582412b5fedb1258f241f
>>
>> sched: hrtick_enabled() should use cpu_active()
>>
>> Peter pointed out that hrtick_enabled() should use cpu_active().
>
> What exactly were you wondering about?
>
> It seemed a good idea to stop starting hrtimers before we migrate them
> to another cpu (one of the things done later in cpu_down), thereby
> avoiding spurious fires on remote cpus.
>
Yeah, I thought that it's likely cpu_down() related.
I looked at it from the point of cpu_up(), e.g. a cpu is online ->
tasks get queued and start running (while cpu is still _not_ active
for a while). So when they get enqueued first time, hrtick_enabled()
wil give 0 and hr-timer won't be used.
Actually, cpu_active_map has already broken expectations/assumptions -
http://lkml.org/lkml/2008/7/24/260 (in case you have missed it). But
this particular "microcode"s behavior is really bad, I think.
--
Best regards,
Dmitry Adamushko