2002-02-13 07:39:07

by Paul Mackerras

[permalink] [raw]
Subject: smp_send_reschedule vs. smp_migrate_task

I am looking at the updates for PPC that are needed because of the
changes to the scheduler in 2.5.x. I need to implement
smp_migrate_task(), but I do not have another IPI easily available;
the Open PIC interrupt controller used in a lot of SMP PPC machines
supports 4 IPIs in hardware and we are already using all of them.

Thus I was thinking of using the same IPI for smp_migrate_task and
smp_send_reschedule. The idea is that smp_send_reschedule(cpu) will
be effectively smp_migrate_task(cpu, NULL), and the code that receives
that IPI will check for the NULL and do set_need_resched() instead of
sched_task_migrated().
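
On the receiving side that would look something like this (an
untested sketch; the handler name and the migration_task[] slot are
invented for illustration, only sched_task_migrated() and
set_need_resched() are the real interfaces in question):

/* Untested sketch.  migration_task[] and the handler name are
 * invented; the sending side would fill migration_task[cpu] (under
 * the migration lock) before raising the shared IPI. */
static struct task_struct *migration_task[NR_CPUS];

static void resched_or_migrate_ipi(void)
{
	int cpu = smp_processor_id();
	struct task_struct *p = migration_task[cpu];

	migration_task[cpu] = NULL;
	if (p)
		sched_task_migrated(p);	/* a task was handed to us */
	else
		set_need_resched();	/* plain reschedule request */
}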

At present the i386 version of smp_migrate_task uses a single global
spinlock, thus only one task can be migrating at a time. If I make
smp_send_reschedule and smp_migrate_task both use the same global
spinlock, is that likely to cause deadlocks or unacceptable
contention? In fact it would not be hard to have a spinlock per cpu.
Would we ever be likely to do smp_migrate_task and set_need_resched
for the same target cpu at the same time?
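
For the per-cpu variant I am picturing something along these lines
(again just a sketch; send_ipi() and the array names are made up):

/* Sketch only; all names invented.  One lock per *target* cpu, so
 * migrations to different cpus never contend.  As on i386, the lock
 * is taken here and released by the receiving cpu once it has
 * consumed its slot (locks initialised to SPIN_LOCK_UNLOCKED). */
static spinlock_t migrate_lock[NR_CPUS];
static struct task_struct *migrate_slot[NR_CPUS];

void smp_migrate_task(int cpu, struct task_struct *p)
{
	spin_lock(&migrate_lock[cpu]);
	migrate_slot[cpu] = p;		/* NULL means "just resched" */
	send_ipi(cpu, RESCHED_OR_MIGRATE_IPI);
	/* no unlock here: the target cpu drops migrate_lock[cpu] */
}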

Paul.


2002-02-13 16:59:39

by James Bottomley

[permalink] [raw]
Subject: Re: smp_send_reschedule vs. smp_migrate_task

> I am looking at the updates for PPC that are needed because of the
> changes to the scheduler in 2.5.x. I need to implement
> smp_migrate_task(), but I do not have another IPI easily available;
> the Open PIC interrupt controller used in a lot of SMP PPC machines
> supports 4 IPIs in hardware and we are already using all of them.

I have this problem with the older Voyager architectures, which
effectively have only one IPI. I solved it by using a per-cpu area
mailbox for IPIs: to send an IPI, set the bit in the per-CPU area of
every CPU you want to send it to, then send off the single global IPI.
Each receiving CPU clears the bit in its own area before it begins
processing that particular IPI. It's not very efficient, though
(especially with the cache line transfers from sender to recipient),
so using what you have is probably better.
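
In outline it looked something like this (a from-memory sketch, not
the actual voyager code; send_hw_ipi(), handle_soft_ipi() and
NR_SOFT_IPIS are stand-in names):

/* One word of pending bits per cpu; the single hardware IPI just
 * means "look at your mailbox". */
static unsigned long ipi_pending[NR_CPUS];

static void send_soft_ipi(int cpu, int which)
{
	set_bit(which, &ipi_pending[cpu]);	/* atomic set */
	send_hw_ipi(cpu);			/* the one real IPI */
}

/* Run from the hardware IPI handler on the receiving cpu. */
static void soft_ipi_interrupt(void)
{
	unsigned long *mbox = &ipi_pending[smp_processor_id()];
	int which;

	for (which = 0; which < NR_SOFT_IPIS; which++)
		/* clear before handling so a concurrent send of the
		 * same message isn't lost */
		if (test_and_clear_bit(which, mbox))
			handle_soft_ipi(which);
}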

> Thus I was thinking of using the same IPI for smp_migrate_task and
> smp_send_reschedule. The idea is that smp_send_reschedule(cpu) will
> be effectively smp_migrate_task(cpu, NULL), and the code that receives
> that IPI will check for the NULL and do set_need_resched() instead of
> sched_task_migrated().

I wouldn't necessarily do this. smp_send_reschedule is used a lot by
the scheduler and is designed to execute fast (i.e. you just send out
the IPI and continue). Adding spinlocks to this code will slow it
down and add cache-thrashing contention.

smp_migrate_task is also designed to execute fast, but it is only
used in set_cpus_allowed(), which is rarely called and is not time
critical. Why not use the smp_call_function interface instead? The
semantics aren't entirely the same, but I don't believe the impact on
set_cpus_allowed() would be felt at all.
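
Something along these lines, say (a sketch: struct migrate_info and
migrate_task_func() are invented, but the four-argument
smp_call_function() is the stock interface):

/* Migration via the generic cross-call.  Note the semantic
 * difference: smp_call_function() runs the function on *every*
 * other cpu, so each one checks whether it is the destination. */
struct migrate_info {
	struct task_struct *task;
	int dest_cpu;
};

static void migrate_task_func(void *info)
{
	struct migrate_info *m = info;

	if (smp_processor_id() == m->dest_cpu)
		sched_task_migrated(m->task);
}

	/* from set_cpus_allowed(), with a struct migrate_info m on
	 * the stack; wait=1 keeps m valid until every cpu is done */
	smp_call_function(migrate_task_func, &m, 0, 1);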

James Bottomley


2002-02-14 22:03:50

by Valerie Henson

[permalink] [raw]
Subject: Re: smp_send_reschedule vs. smp_migrate_task

On Wed, Feb 13, 2002 at 06:37:14PM +1100, Paul Mackerras wrote:
> I am looking at the updates for PPC that are needed because of the
> changes to the scheduler in 2.5.x. I need to implement
> smp_migrate_task(), but I do not have another IPI easily available;
> the Open PIC interrupt controller used in a lot of SMP PPC machines
> supports 4 IPIs in hardware and we are already using all of them.

I had only one IPI for the RPIC (an interrupt controller only used on
Synergy PPC boards) and I implemented a little message queue to
simulate all 4 IPIs. The mailbox implementation suggested by James
Bottomley ended up having race conditions on our board. It's probably
not the most elegant solution, but it works and required no change to
the PowerPC SMP code. See my "Make Gemini boot" patch to linuxppc-dev
and take a look at the files rpic.c and rpic.h.
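
The shape of the idea, from memory (this is not the actual rpic.c
code; send_hw_ipi() is a stand-in, and overflow checking is
omitted):

/* A small per-cpu FIFO of message numbers behind the single
 * hardware IPI.  Unlike a bitmask mailbox, back-to-back messages
 * of the same type are queued separately rather than collapsed. */
#define IPI_QUEUE_LEN 16

struct ipi_queue {
	spinlock_t lock;
	int head, tail;
	int msg[IPI_QUEUE_LEN];
};
static struct ipi_queue ipi_queue[NR_CPUS];

static void queue_ipi(int cpu, int msg)
{
	struct ipi_queue *q = &ipi_queue[cpu];
	unsigned long flags;

	/* the lock is only held briefly on one cpu, so irqsave here
	 * is safe */
	spin_lock_irqsave(&q->lock, flags);
	q->msg[q->tail] = msg;
	q->tail = (q->tail + 1) % IPI_QUEUE_LEN;
	spin_unlock_irqrestore(&q->lock, flags);
	send_hw_ipi(cpu);
}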

-VAL

2002-02-14 22:30:49

by Paul Mackerras

[permalink] [raw]
Subject: Re: smp_send_reschedule vs. smp_migrate_task

Val Henson writes:

> I had only one IPI for the RPIC (an interrupt controller only used on
> Synergy PPC boards) and I implemented a little message queue to
> simulate all 4 IPIs. The mailbox implementation suggested by James
> Bottomley ended up having race conditions on our board. It's probably
> not the most elegant solution, but it works and required no change to
> the PowerPC SMP code. See my "Make Gemini boot" patch to linuxppc-dev
> and take a look at the files rpic.c and rpic.h.

In that post I was really asking the following questions:

* how often does smp_send_reschedule get called?
* how often does smp_migrate_task get called?
* if smp_send_reschedule and smp_migrate_task were mutually exclusive,
i.e. both used the same spinlock, could that lead to deadlock?

James Bottomley answered the first two for me but not the third.

Paul.

2002-02-14 23:28:33

by James Bottomley

[permalink] [raw]
Subject: Re: smp_send_reschedule vs. smp_migrate_task

> In that post I was really asking the following questions:
>
> * how often does smp_send_reschedule get called?
> * how often does smp_migrate_task get called?
> * if smp_send_reschedule and smp_migrate_task were mutually exclusive,
> i.e. both used the same spinlock, could that lead to deadlock?

> James Bottomley answered the first two for me but not the third.

I think the answer to the third is yes.

The potential deadlock is inherent in smp_migrate_task(). Any code which
takes a spinlock on one CPU and unlocks it on another via an IPI is asking for
a deadlock.

Here's the scenario:

CPU 1 does an smp_migrate_task() to CPU 2 at the same time as CPU 2
does the same thing to CPU 1. They both contend for the migration
lock. CPU 1 wins, acquires the migration lock and sends the IPI to
CPU 2. If CPU 2 is spinning on the migration lock *with interrupts
disabled* then you have a deadlock (it can never accept the IPI, and
it is the receiving IPI handler that releases the lock).

The way out is to make sure interrupts are always enabled when taking
the migration lock (which is true for all the task migration code
paths). This, in turn, imposes a condition: the lock may never be
taken from an interrupt (otherwise it can deadlock against itself).
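
In code terms the constraint looks like this (illustration only, all
names invented):

/* The lock travels across cpus: the sender takes it, the receiving
 * cpu's IPI handler drops it. */
void smp_migrate_task(int cpu, struct task_struct *p)
{
	/* caller must have interrupts enabled: while we spin here
	 * waiting for a previous migration to finish, we may have to
	 * service the very IPI that releases the lock */
	spin_lock(&migration_lock);
	migration_task = p;
	send_ipi(cpu, MIGRATE_IPI);
	/* no unlock here -- the target cpu does it */
}

void migrate_ipi_interrupt(void)
{
	struct task_struct *p = migration_task;

	spin_unlock(&migration_lock);	/* release the sender's lock */
	sched_task_migrated(p);
}

Taking migration_lock from interrupt context would break the same
rule, since the spin would then happen with interrupts off.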

Since smp_send_reschedule() is called on many process wake-up code
paths, I'm not sure you can meet the no-calling-from-interrupt
requirement if you make it share a lock with smp_migrate_task().

James


2002-02-15 05:56:11

by Valerie Henson

[permalink] [raw]
Subject: Re: smp_send_reschedule vs. smp_migrate_task

On Fri, Feb 15, 2002 at 09:28:56AM +1100, Paul Mackerras wrote:
> Val Henson writes:
>
> > I had only one IPI for the RPIC (an interrupt controller only used on
> > Synergy PPC boards) and I implemented a little message queue to
> > simulate all 4 IPIs. The mailbox implementation suggested by James
> > Bottomley ended up having race conditions on our board. It's probably
> > not the most elegant solution, but it works and required no change to
> > the PowerPC SMP code. See my "Make Gemini boot" patch to linuxppc-dev
> > and take a look at the files rpic.c and rpic.h.
>
> In that post I was really asking the following questions:
>
> * how often does smp_send_reschedule get called?
> * how often does smp_migrate_task get called?
> * if smp_send_reschedule and smp_migrate_task were mutually exclusive,
> i.e. both used the same spinlock, could that lead to deadlock?
>
> James Bottomley answered the first two for me but not the third.

Understood.

I'm still a little disgusted by a system that works for 4
smp_<whatever> functions but not 5. :)

-VAL