Changes from V0:
a) Fix grammar and add summary for commit reference; (Sergei Shtylyov)
b) Collect Acks
Since commit 5fbd036b [sched: Cleanup cpu_active madness] and
commit 2baab4e9 [sched: Fix select_fallback_rq() vs cpu_active/cpu_online],
it's more safe to put notify_cpu_starting() and set_cpu_online() with
irq disabled, otherwise we will have a typical race condition which
above two commits try to resolve:
CPU1 CPU2
__cpu_up();
mp_ops->boot_secondary();
start_secondary();
->init_secondary();
local_irq_enable();
<IRQ>
do something;
wake up softirqd;
try_to_wake_up();
select_fallback_rq();
/* select wrong cpu */
set_cpu_online();
This patchset fix the above issue as well as set_cpu_online is
called on the caller cpu.
BTW, I'm only running it on Cavium board because of limited source,
so if anyone is interested to test it on other board, that's great :)
Yong Zhang (8):
MIPS: Octeon: delay enable irq to ->smp_finish()
MIPS: BMIPS: delay irq enable to ->smp_finish()
MIPS: SMTC: delay irq enable to ->smp_finish()
MIPS: Yosemite: delay irq enable to ->smp_finish()
MIPS: call ->smp_finish() a little late
MIPS: call set_cpu_online() on the uping cpu with irq disabled
MIPS: smp: Warn on too early irq enable
MIPS: sync-r4k: remove redundant irq operation
arch/mips/cavium-octeon/smp.c | 2 +-
arch/mips/kernel/smp-bmips.c | 14 +++++++-------
arch/mips/kernel/smp.c | 12 +++++++++---
arch/mips/kernel/smtc.c | 3 ++-
arch/mips/kernel/sync-r4k.c | 5 -----
arch/mips/pmc-sierra/yosemite/smp.c | 2 +-
6 files changed, 20 insertions(+), 18 deletions(-)
--
1.7.5.4
From: Yong Zhang <[email protected]>
To prepare for smoothing set_cpu_[active|online]() mess up
Signed-off-by: Yong Zhang <[email protected]>
Cc: Ralf Baechle <[email protected]>
Acked-by: David Daney <[email protected]>
---
arch/mips/kernel/smp-bmips.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/mips/kernel/smp-bmips.c b/arch/mips/kernel/smp-bmips.c
index 3046e29..298b437 100644
--- a/arch/mips/kernel/smp-bmips.c
+++ b/arch/mips/kernel/smp-bmips.c
@@ -197,13 +197,6 @@ static void bmips_init_secondary(void)
write_c0_brcm_action(ACTION_CLR_IPI(smp_processor_id(), 0));
#endif
-
- /* make sure there won't be a timer interrupt for a little while */
- write_c0_compare(read_c0_count() + mips_hpt_frequency / HZ);
-
- irq_enable_hazard();
- set_c0_status(IE_SW0 | IE_SW1 | IE_IRQ1 | IE_IRQ5 | ST0_IE);
- irq_enable_hazard();
}
/*
@@ -212,6 +205,13 @@ static void bmips_init_secondary(void)
static void bmips_smp_finish(void)
{
pr_info("SMP: CPU%d is running\n", smp_processor_id());
+
+ /* make sure there won't be a timer interrupt for a little while */
+ write_c0_compare(read_c0_count() + mips_hpt_frequency / HZ);
+
+ irq_enable_hazard();
+ set_c0_status(IE_SW0 | IE_SW1 | IE_IRQ1 | IE_IRQ5 | ST0_IE);
+ irq_enable_hazard();
}
/*
--
1.7.5.4
From: Yong Zhang <[email protected]>
To prepare for smoothing set_cpu_[active|online]() mess up
Signed-off-by: Yong Zhang <[email protected]>
Acked-by: David Daney <[email protected]>
---
arch/mips/kernel/smtc.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/arch/mips/kernel/smtc.c b/arch/mips/kernel/smtc.c
index f5dd38f..af46fdd 100644
--- a/arch/mips/kernel/smtc.c
+++ b/arch/mips/kernel/smtc.c
@@ -615,7 +615,6 @@ void __cpuinit smtc_boot_secondary(int cpu, struct task_struct *idle)
void smtc_init_secondary(void)
{
- local_irq_enable();
}
void smtc_smp_finish(void)
@@ -631,6 +630,8 @@ void smtc_smp_finish(void)
if (cpu > 0 && (cpu_data[cpu].vpe_id != cpu_data[cpu - 1].vpe_id))
write_c0_compare(read_c0_count() + mips_hpt_frequency/HZ);
+ local_irq_enable();
+
printk("TC %d going on-line as CPU %d\n",
cpu_data[smp_processor_id()].tc_id, smp_processor_id());
}
--
1.7.5.4
From: Yong Zhang <[email protected]>
To prepare for smoothing set_cpu_[active|online]() mess up
Signed-off-by: Yong Zhang <[email protected]>
Acked-by: David Daney <[email protected]>
---
arch/mips/pmc-sierra/yosemite/smp.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/mips/pmc-sierra/yosemite/smp.c b/arch/mips/pmc-sierra/yosemite/smp.c
index b71fae2..5edab2b 100644
--- a/arch/mips/pmc-sierra/yosemite/smp.c
+++ b/arch/mips/pmc-sierra/yosemite/smp.c
@@ -115,11 +115,11 @@ static void yos_send_ipi_mask(const struct cpumask *mask, unsigned int action)
*/
static void __cpuinit yos_init_secondary(void)
{
- set_c0_status(ST0_CO | ST0_IE | ST0_IM);
}
static void __cpuinit yos_smp_finish(void)
{
+ set_c0_status(ST0_CO | ST0_IM | ST0_IE);
}
/* Hook for after all CPUs are online */
--
1.7.5.4
From: Yong Zhang <[email protected]>
We have move irq enable to ->smp_finish. Place ->smp_finish() a little
late to prepare for move set_cpu_online() into start_secondary.
And it's not necessary to call cpu_set(cpu, cpu_callin_map) and
synchronise_count_slave() with irq enabled.
Signed-off-by: Yong Zhang <[email protected]>
Acked-by: David Daney <[email protected]>
---
arch/mips/kernel/smp.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index ba9376b..73a268a 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -122,13 +122,14 @@ asmlinkage __cpuinit void start_secondary(void)
notify_cpu_starting(cpu);
- mp_ops->smp_finish();
set_cpu_sibling_map(cpu);
cpu_set(cpu, cpu_callin_map);
synchronise_count_slave();
+ mp_ops->smp_finish();
+
cpu_idle();
}
--
1.7.5.4
From: Yong Zhang <[email protected]>
To prepare for smoothing set_cpu_[active|online]() mess up
Signed-off-by: Yong Zhang <[email protected]>
Cc: Ralf Baechle <[email protected]>
Acked-by: David Daney <[email protected]>
---
arch/mips/cavium-octeon/smp.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/mips/cavium-octeon/smp.c b/arch/mips/cavium-octeon/smp.c
index 97e7ce9..ef9c34a 100644
--- a/arch/mips/cavium-octeon/smp.c
+++ b/arch/mips/cavium-octeon/smp.c
@@ -185,7 +185,6 @@ static void __cpuinit octeon_init_secondary(void)
octeon_init_cvmcount();
octeon_irq_setup_secondary();
- raw_local_irq_enable();
}
/**
@@ -233,6 +232,7 @@ static void octeon_smp_finish(void)
/* to generate the first CPU timer interrupt */
write_c0_compare(read_c0_count() + mips_hpt_frequency / HZ);
+ local_irq_enable();
}
/**
--
1.7.5.4
From: Yong Zhang <[email protected]>
To prevent a problem as commit 5fbd036b [sched: Cleanup cpu_active madness]
and commit 2baab4e9 [sched: Fix select_fallback_rq() vs cpu_active/cpu_online]
try to resolve, move set_cpu_online() to the brought up CPU and with irq
disabled.
Signed-off-by: Yong Zhang <[email protected]>
Acked-by: David Daney <[email protected]>
---
arch/mips/kernel/smp.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 73a268a..042145f 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -122,6 +122,8 @@ asmlinkage __cpuinit void start_secondary(void)
notify_cpu_starting(cpu);
+ set_cpu_online(cpu, true);
+
set_cpu_sibling_map(cpu);
cpu_set(cpu, cpu_callin_map);
@@ -249,8 +251,6 @@ int __cpuinit __cpu_up(unsigned int cpu)
while (!cpu_isset(cpu, cpu_callin_map))
udelay(100);
- set_cpu_online(cpu, true);
-
return 0;
}
--
1.7.5.4
From: Yong Zhang <[email protected]>
Just to catch a potential issue.
Signed-off-by: Yong Zhang <[email protected]>
Acked-by: David Daney <[email protected]>
---
arch/mips/kernel/smp.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 042145f..4fbafa8 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -130,6 +130,11 @@ asmlinkage __cpuinit void start_secondary(void)
synchronise_count_slave();
+ /*
+ * irq will be enabled in ->smp_finish(), enabling it too early
+ * is dangerous.
+ */
+ WARN_ON_ONCE(!irqs_disabled());
mp_ops->smp_finish();
cpu_idle();
--
1.7.5.4
From: Yong Zhang <[email protected]>
Since we have delayed irq enabling to ->smp_finish()
Signed-off-by: Yong Zhang <[email protected]>
Acked-by: David Daney <[email protected]>
---
arch/mips/kernel/sync-r4k.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)
diff --git a/arch/mips/kernel/sync-r4k.c b/arch/mips/kernel/sync-r4k.c
index 99f913c..842d55e 100644
--- a/arch/mips/kernel/sync-r4k.c
+++ b/arch/mips/kernel/sync-r4k.c
@@ -111,7 +111,6 @@ void __cpuinit synchronise_count_master(void)
void __cpuinit synchronise_count_slave(void)
{
int i;
- unsigned long flags;
unsigned int initcount;
int ncpus;
@@ -123,8 +122,6 @@ void __cpuinit synchronise_count_slave(void)
return;
#endif
- local_irq_save(flags);
-
/*
* Not every cpu is online at the time this gets called,
* so we first wait for the master to say everyone is ready
@@ -154,7 +151,5 @@ void __cpuinit synchronise_count_slave(void)
}
/* Arrange for an interrupt in a short while */
write_c0_compare(read_c0_count() + COUNTON);
-
- local_irq_restore(flags);
}
#undef NR_LOOPS
--
1.7.5.4
Hello.
On 21-05-2012 10:00, Yong Zhang wrote:
> From: Yong Zhang<[email protected]>
> To prevent a problem as commit 5fbd036b [sched: Cleanup cpu_active madness]
> and commit 2baab4e9 [sched: Fix select_fallback_rq() vs cpu_active/cpu_online]
> try to resolve, move set_cpu_online() to the brought up CPU and with irq
^^^^^^^^^^
Now the same change in the subject please.
> disabled.
> Signed-off-by: Yong Zhang <[email protected]>
> Acked-by: David Daney <[email protected]>
WBR, Sergei
On 05/21/2012 11:30 AM, Yong Zhang wrote:
> From: Yong Zhang <[email protected]>
>
> To prevent a problem as commit 5fbd036b [sched: Cleanup cpu_active madness]
> and commit 2baab4e9 [sched: Fix select_fallback_rq() vs cpu_active/cpu_online]
> try to resolve, move set_cpu_online() to the brought up CPU and with irq
> disabled.
>
> Signed-off-by: Yong Zhang <[email protected]>
> Acked-by: David Daney <[email protected]>
> ---
> arch/mips/kernel/smp.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
> index 73a268a..042145f 100644
> --- a/arch/mips/kernel/smp.c
> +++ b/arch/mips/kernel/smp.c
> @@ -122,6 +122,8 @@ asmlinkage __cpuinit void start_secondary(void)
>
> notify_cpu_starting(cpu);
>
> + set_cpu_online(cpu, true);
> +
You will also need to use ipi_call_lock/unlock() around this.
See how x86 does it. (MIPS also selects USE_GENERIC_SMP_HELPERS).
Regards,
Srivatsa S. Bhat
> set_cpu_sibling_map(cpu);
>
> cpu_set(cpu, cpu_callin_map);
> @@ -249,8 +251,6 @@ int __cpuinit __cpu_up(unsigned int cpu)
> while (!cpu_isset(cpu, cpu_callin_map))
> udelay(100);
>
> - set_cpu_online(cpu, true);
> -
> return 0;
> }
>
On Mon, May 21, 2012 at 02:30:57PM +0400, Sergei Shtylyov wrote:
> Hello.
>
> On 21-05-2012 10:00, Yong Zhang wrote:
>
> >From: Yong Zhang<[email protected]>
>
> >To prevent a problem as commit 5fbd036b [sched: Cleanup cpu_active madness]
> >and commit 2baab4e9 [sched: Fix select_fallback_rq() vs cpu_active/cpu_online]
> >try to resolve, move set_cpu_online() to the brought up CPU and with irq
> ^^^^^^^^^^
> Now the same change in the subject please.
Ah, yes, forgot that.
Thanks,
Yong
On Mon, May 21, 2012 at 04:09:16PM +0530, Srivatsa S. Bhat wrote:
> On 05/21/2012 11:30 AM, Yong Zhang wrote:
>
> > From: Yong Zhang <[email protected]>
> >
> > To prevent a problem as commit 5fbd036b [sched: Cleanup cpu_active madness]
> > and commit 2baab4e9 [sched: Fix select_fallback_rq() vs cpu_active/cpu_online]
> > try to resolve, move set_cpu_online() to the brought up CPU and with irq
> > disabled.
> >
> > Signed-off-by: Yong Zhang <[email protected]>
> > Acked-by: David Daney <[email protected]>
> > ---
> > arch/mips/kernel/smp.c | 4 ++--
> > 1 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
> > index 73a268a..042145f 100644
> > --- a/arch/mips/kernel/smp.c
> > +++ b/arch/mips/kernel/smp.c
> > @@ -122,6 +122,8 @@ asmlinkage __cpuinit void start_secondary(void)
> >
> > notify_cpu_starting(cpu);
> >
> > + set_cpu_online(cpu, true);
> > +
>
>
> You will also need to use ipi_call_lock/unlock() around this.
> See how x86 does it. (MIPS also selects USE_GENERIC_SMP_HELPERS).
Hmm... But look at the comments in arch/x86/kernel/smpboot.c::start_secondary()
start_secondary()
{
...
/*
* We need to hold call_lock, so there is no inconsistency
* between the time smp_call_function() determines number of
* IPI recipients, and the time when the determination is made
* for which cpus receive the IPI. Holding this
* lock helps us to not include this cpu in a currently in progress
* smp_call_function().
*
* We need to hold vector_lock so there the set of online cpus
* does not change while we are assigning vectors to cpus. Holding
* this lock ensures we don't half assign or remove an irq from a cpu.
*/
ipi_call_lock();
lock_vector_lock();
set_cpu_online(smp_processor_id(), true);
unlock_vector_lock();
ipi_call_unlock();
...
}
which ipi_call_lock()/ipi_call_unlock() is to pretect race with concurrent
smp_call_function(), but it seems that is already broken, because
1) The comments is alread there before we switch to generic smp helper(commit
3b16cf87), and at that time the comments is true because
smp_call_function_interrupt() doesn't test if a cpu should handle the
IPI interrupt.
But in the gereric smp helper, we have checked if a cpu should handle the IPI
in generic_smp_call_function_interrupt():
if (!cpumask_test_cpu(cpu, data->cpumask))
continue;
2) call_function.lock used in smp_call_function_many() is just to protect
call_function.queue and &data->refs, cpu_online_mask is outside of the
lock. And I don't think it's necessary to protect cpu_online_mask,
because data->cpumask is pre-calculate and even if a cpu is brougt up
when calling arch_send_call_function_ipi_mask(), it's harmless because
validation test in generic_smp_call_function_interrupt() will take care
of it.
3) For cpu down issue, stop_machine() will guarantee that no concurrent
smp_call_fuction() is processing.
So it seems ipi_call_lock()/ipi_call_unlock() is not needed and could be
removed IMHO.
Or am I missing something?
Thanks,
Yong
On 05/22/2012 11:51 AM, Yong Zhang wrote:
> On Mon, May 21, 2012 at 04:09:16PM +0530, Srivatsa S. Bhat wrote:
>> On 05/21/2012 11:30 AM, Yong Zhang wrote:
>>
>>> From: Yong Zhang <[email protected]>
>>>
>>> To prevent a problem as commit 5fbd036b [sched: Cleanup cpu_active madness]
>>> and commit 2baab4e9 [sched: Fix select_fallback_rq() vs cpu_active/cpu_online]
>>> try to resolve, move set_cpu_online() to the brought up CPU and with irq
>>> disabled.
>>>
>>> Signed-off-by: Yong Zhang <[email protected]>
>>> Acked-by: David Daney <[email protected]>
>>> ---
>>> arch/mips/kernel/smp.c | 4 ++--
>>> 1 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
>>> index 73a268a..042145f 100644
>>> --- a/arch/mips/kernel/smp.c
>>> +++ b/arch/mips/kernel/smp.c
>>> @@ -122,6 +122,8 @@ asmlinkage __cpuinit void start_secondary(void)
>>>
>>> notify_cpu_starting(cpu);
>>>
>>> + set_cpu_online(cpu, true);
>>> +
>>
>>
>> You will also need to use ipi_call_lock/unlock() around this.
>> See how x86 does it. (MIPS also selects USE_GENERIC_SMP_HELPERS).
>
> Hmm... But look at the comments in arch/x86/kernel/smpboot.c::start_secondary()
>
> start_secondary()
> {
> ...
> /*
> * We need to hold call_lock, so there is no inconsistency
> * between the time smp_call_function() determines number of
> * IPI recipients, and the time when the determination is made
> * for which cpus receive the IPI. Holding this
> * lock helps us to not include this cpu in a currently in progress
> * smp_call_function().
> *
> * We need to hold vector_lock so there the set of online cpus
> * does not change while we are assigning vectors to cpus. Holding
> * this lock ensures we don't half assign or remove an irq from a cpu.
> */
> ipi_call_lock();
> lock_vector_lock();
> set_cpu_online(smp_processor_id(), true);
> unlock_vector_lock();
> ipi_call_unlock();
>
> ...
> }
>
> which ipi_call_lock()/ipi_call_unlock() is to pretect race with concurrent
> smp_call_function(), but it seems that is already broken, because
>
> 1) The comments is alread there before we switch to generic smp helper(commit
> 3b16cf87), and at that time the comments is true because
> smp_call_function_interrupt() doesn't test if a cpu should handle the
> IPI interrupt.
> But in the gereric smp helper, we have checked if a cpu should handle the IPI
> in generic_smp_call_function_interrupt():
> if (!cpumask_test_cpu(cpu, data->cpumask))
> continue;
>
> 2) call_function.lock used in smp_call_function_many() is just to protect
> call_function.queue and &data->refs, cpu_online_mask is outside of the
> lock. And I don't think it's necessary to protect cpu_online_mask,
> because data->cpumask is pre-calculate and even if a cpu is brougt up
> when calling arch_send_call_function_ipi_mask(), it's harmless because
> validation test in generic_smp_call_function_interrupt() will take care
> of it.
>
> 3) For cpu down issue, stop_machine() will guarantee that no concurrent
> smp_call_fuction() is processing.
>
> So it seems ipi_call_lock()/ipi_call_unlock() is not needed and could be
> removed IMHO.
> Or am I missing something?
>
No, I think you are right. Sorry for the delay in replying.
It indeed looks like we need not use ipi_call_lock/unlock() in CPU bringup
code..
However, it does make me wonder about this:
commit 3d4422332 introduced the generic ipi helpers, and reduced the scope
of call_function.lock and also added the check in
generic_smp_call_function_interrupt() to proceed only if the cpu is present
in data->cpumask.
Then, commit 3b16cf8748 converted x86 to the generic ipi helpers, but while
doing that, it explicitly retained ipi_call_lock/unlock(), which is kind of
surprising.. I guess it was a mistake rather than intentional.
Regards,
Srivatsa S. Bhat
On Mon, May 28, 2012 at 05:34:51PM +0530, Srivatsa S. Bhat wrote:
> No, I think you are right. Sorry for the delay in replying.
> It indeed looks like we need not use ipi_call_lock/unlock() in CPU bringup
> code..
>
> However, it does make me wonder about this:
> commit 3d4422332 introduced the generic ipi helpers, and reduced the scope
> of call_function.lock and also added the check in
> generic_smp_call_function_interrupt() to proceed only if the cpu is present
> in data->cpumask.
>
> Then, commit 3b16cf8748 converted x86 to the generic ipi helpers, but while
> doing that, it explicitly retained ipi_call_lock/unlock(), which is kind of
> surprising.. I guess it was a mistake rather than intentional.
Agree. I think it's a mistake(or leftover) too :)
Anyway, let me cook a patch to throw a stone to clear the road.
Thanks,
Yong