apic_icr_write and its users in smpboot.c were apparently written under
the assumption that this code would only run during early boot. But
nowadays we also execute it when onlining a CPU later on while the
system is fully running. That will make wakeup_cpu_via_init_nmi and,
thus, also native_apic_icr_write run in plain process context. If we
migrate the caller to a different CPU at the wrong time or interrupt it
and write to ICR/ICR2 to send unrelated IPIs, we can end up sending
INIT, SIPI or NMIs to wrong CPUs.
Fix this by disabling interrupts during the write to the ICR halves and by
disabling preemption around waiting for ICR availability and using it.
Signed-off-by: Jan Kiszka <[email protected]>
---
arch/x86/kernel/apic/apic.c | 4 ++++
arch/x86/kernel/smpboot.c | 11 +++++++++--
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 7f26c9a..06f90b8 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -283,8 +283,12 @@ u32 native_safe_apic_wait_icr_idle(void)
 
 void native_apic_icr_write(u32 low, u32 id)
 {
+	unsigned long flags;
+
+	local_irq_save(flags);
 	apic_write(APIC_ICR2, SET_APIC_DEST_FIELD(id));
 	apic_write(APIC_ICR, low);
+	local_irq_restore(flags);
 }
 
 u64 native_apic_icr_read(void)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index a32da80..37e11e5 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -701,11 +701,15 @@ wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid,
 	int id;
 	int boot_error;
 
+	preempt_disable();
+
 	/*
 	 * Wake up AP by INIT, INIT, STARTUP sequence.
 	 */
-	if (cpu)
-		return wakeup_secondary_cpu_via_init(apicid, start_ip);
+	if (cpu) {
+		boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip);
+		goto out;
+	}
 
 	/*
 	 * Wake up BSP by nmi.
@@ -725,6 +729,9 @@ wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid,
 		boot_error = wakeup_secondary_cpu_via_nmi(id, start_ip);
 	}
 
+out:
+	preempt_enable();
+
 	return boot_error;
 }
--
1.8.1.1.298.ge7eed54
On Mon, Jan 27, 2014 at 08:14:06PM +0100, Jan Kiszka wrote:
> apic_icr_write and its users in smpboot.c were apparently written under
> the assumption that this code would only run during early boot. But
> nowadays we also execute it when onlining a CPU later on while the
> system is fully running. That will make wakeup_cpu_via_init_nmi and,
> thus, also native_apic_icr_write run in plain process context. If we
> migrate the caller to a different CPU at the wrong time or interrupt it
> and write to ICR/ICR2 to send unrelated IPIs, we can end up sending
> INIT, SIPI or NMIs to wrong CPUs.
>
> Fix this by disabling interrupts during the write to the ICR halves and
> disable preemption around waiting for ICR availability and using it.
If you just want to disable migration use get_cpu()/put_cpu()
-Andi
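For reference, a minimal sketch of what Andi's suggestion would look like in the
smpboot.c hunk above (illustration only, not a posted patch; get_cpu() disables
preemption and returns the CPU we are now pinned to, put_cpu() re-enables it;
wakeup_secondary_cpu_via_init() is the existing helper, while wakeup_ap_pinned()
and the pr_debug() message are made up for the example):

	#include <linux/smp.h>	/* get_cpu(), put_cpu() */

	/*
	 * Illustration only: the preempt_disable()/preempt_enable() pair
	 * replaced by get_cpu()/put_cpu().  The returned CPU number is not
	 * needed for correctness here; using get_cpu() mainly documents
	 * that the caller must not migrate while the IPIs are issued.
	 */
	static int wakeup_ap_pinned(int apicid, unsigned long start_ip)
	{
		int this_cpu = get_cpu();	/* pins us, like preempt_disable() */
		int boot_error;

		pr_debug("CPU %d waking APIC 0x%x\n", this_cpu, apicid);
		boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip);

		put_cpu();			/* re-enable preemption */
		return boot_error;
	}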
On 2014-01-27 21:22, Andi Kleen wrote:
> On Mon, Jan 27, 2014 at 08:14:06PM +0100, Jan Kiszka wrote:
>> apic_icr_write and its users in smpboot.c were apparently written under
>> the assumption that this code would only run during early boot. But
>> nowadays we also execute it when onlining a CPU later on while the
>> system is fully running. That will make wakeup_cpu_via_init_nmi and,
>> thus, also native_apic_icr_write run in plain process context. If we
>> migrate the caller to a different CPU at the wrong time or interrupt it
>> and write to ICR/ICR2 to send unrelated IPIs, we can end up sending
>> INIT, SIPI or NMIs to wrong CPUs.
>>
>> Fix this by disabling interrupts during the write to the ICR halves and
>> disable preemption around waiting for ICR availability and using it.
>
> If you just want to disable migration use get_cpu()/put_cpu()
Fine with me if that is now preferred. Will that be the upstream way of
-rt's migrate_disable()?
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
* Jan Kiszka <[email protected]> wrote:
> On 2014-01-27 21:22, Andi Kleen wrote:
> > On Mon, Jan 27, 2014 at 08:14:06PM +0100, Jan Kiszka wrote:
> >> apic_icr_write and its users in smpboot.c were apparently written under
> >> the assumption that this code would only run during early boot. But
> >> nowadays we also execute it when onlining a CPU later on while the
> >> system is fully running. That will make wakeup_cpu_via_init_nmi and,
> >> thus, also native_apic_icr_write run in plain process context. If we
> >> migrate the caller to a different CPU at the wrong time or interrupt it
> >> and write to ICR/ICR2 to send unrelated IPIs, we can end up sending
> >> INIT, SIPI or NMIs to wrong CPUs.
> >>
> >> Fix this by disabling interrupts during the write to the ICR halves and
> >> disable preemption around waiting for ICR availability and using it.
> >
> > If you just want to disable migration use get_cpu()/put_cpu()
>
> Fine with me if that is now preferred. Will that be the upstream way of
> -rt's migrate_disable()?
Your original patch is fine, the suggestion to do ICR accesses with
just preemption disabled is crap and is really asking for trouble: if
some IRQ comes in at that point after all then it might cause all
sorts of hard to debug problems (hangs, delays, missed IPIs, etc.).
Thanks,
Ingo
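To make the failure mode concrete, here is a sketch of the window Ingo is
pointing at (illustration only, using the same apic_write()/SET_APIC_DEST_FIELD()
helpers as the patch; the function name is made up):

	/*
	 * Illustration only: what can happen if this runs with preemption
	 * disabled but interrupts still enabled.
	 */
	static void racy_icr_write(u32 low, u32 id)
	{
		apic_write(APIC_ICR2, SET_APIC_DEST_FIELD(id));
		/*
		 * <-- an IRQ here may send its own IPI and rewrite APIC_ICR2
		 *     with a different destination ...
		 */
		apic_write(APIC_ICR, low);	/* ... so INIT/SIPI/NMI hits the wrong CPU */
	}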
On 2014-01-28 12:55, Ingo Molnar wrote:
>
> * Jan Kiszka <[email protected]> wrote:
>
>> On 2014-01-27 21:22, Andi Kleen wrote:
>>> On Mon, Jan 27, 2014 at 08:14:06PM +0100, Jan Kiszka wrote:
>>>> apic_icr_write and its users in smpboot.c were apparently written under
>>>> the assumption that this code would only run during early boot. But
>>>> nowadays we also execute it when onlining a CPU later on while the
>>>> system is fully running. That will make wakeup_cpu_via_init_nmi and,
>>>> thus, also native_apic_icr_write run in plain process context. If we
>>>> migrate the caller to a different CPU at the wrong time or interrupt it
>>>> and write to ICR/ICR2 to send unrelated IPIs, we can end up sending
>>>> INIT, SIPI or NMIs to wrong CPUs.
>>>>
>>>> Fix this by disabling interrupts during the write to the ICR halves and
>>>> disable preemption around waiting for ICR availability and using it.
>>>
>>> If you just want to disable migration use get_cpu()/put_cpu()
>>
>> Fine with me if that is now preferred. Will that be the upstream way of
>> -rt's migrate_disable()?
>
> Your original patch is fine, the suggestion to do ICR accesses with
> just preemption disabled is crap and is really asking for trouble: if
> some IRQ comes in at that point after all then it might cause all
> sorts of hard to debug problems (hangs, delays, missed IPIs, etc.).
Of course, we still need irqs off during ICR writes. I thought Andi was
just suggesting to replace preempt_disable with get_cpu, maybe to
document why we are disabling preemption here.
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
> Of course, we still need irqs off during ICR writes. I thought Andi was
> just suggesting to replace preempt_disable with get_cpu, maybe to
> document why we are disabling preemption here.
Yes that was my point.
Also with irq-off you of course still always have races against
the NMI-level machine check.
-Andi
On 2014-01-28 22:17, Andi Kleen wrote:
> Also with irq-off you of course still always have races against
> the NMI-level machine check.
The self-IPI triggered over NMI won't touch the high-part of the ICR and
will properly wait for ICR to become free again. So we are safe.
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
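For context, a sketch of why that self-IPI is harmless (illustration only,
paraphrasing the argument rather than quoting the kernel's IPI code; register
constants as in <asm/apicdef.h>):

	/*
	 * Illustration only: a self-IPI uses the "self" destination
	 * shorthand in the low ICR word, so APIC_ICR2 is never written,
	 * and the send first waits for any in-flight ICR command to
	 * complete.
	 */
	static void sketch_send_self_ipi(unsigned int vector)
	{
		/* Wait until a previously issued ICR command has been accepted. */
		while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
			cpu_relax();

		/* Destination shorthand "self": no APIC_ICR2 write needed. */
		apic_write(APIC_ICR, APIC_DEST_SELF | APIC_DM_FIXED | vector);
	}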
On 2014-01-27 20:14, Jan Kiszka wrote:
> apic_icr_write and its users in smpboot.c were apparently written under
> the assumption that this code would only run during early boot. But
> nowadays we also execute it when onlining a CPU later on while the
> system is fully running. That will make wakeup_cpu_via_init_nmi and,
> thus, also native_apic_icr_write run in plain process context. If we
> migrate the caller to a different CPU at the wrong time or interrupt it
> and write to ICR/ICR2 to send unrelated IPIs, we can end up sending
> INIT, SIPI or NMIs to wrong CPUs.
>
> Fix this by disabling interrupts during the write to the ICR halves and
> disable preemption around waiting for ICR availability and using it.
>
> Signed-off-by: Jan Kiszka <[email protected]>
> ---
> arch/x86/kernel/apic/apic.c | 4 ++++
> arch/x86/kernel/smpboot.c | 11 +++++++++--
> 2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 7f26c9a..06f90b8 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -283,8 +283,12 @@ u32 native_safe_apic_wait_icr_idle(void)
>
> void native_apic_icr_write(u32 low, u32 id)
> {
> + unsigned long flags;
> +
> + local_irq_save(flags);
> apic_write(APIC_ICR2, SET_APIC_DEST_FIELD(id));
> apic_write(APIC_ICR, low);
> + local_irq_restore(flags);
> }
>
> u64 native_apic_icr_read(void)
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index a32da80..37e11e5 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -701,11 +701,15 @@ wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid,
> int id;
> int boot_error;
>
> + preempt_disable();
> +
> /*
> * Wake up AP by INIT, INIT, STARTUP sequence.
> */
> - if (cpu)
> - return wakeup_secondary_cpu_via_init(apicid, start_ip);
> + if (cpu) {
> + boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip);
> + goto out;
> + }
>
> /*
> * Wake up BSP by nmi.
> @@ -725,6 +729,9 @@ wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid,
> boot_error = wakeup_secondary_cpu_via_nmi(id, start_ip);
> }
>
> +out:
> + preempt_enable();
> +
> return boot_error;
> }
>
>
What's the status of this? Waiting for further review, or is it queued
somewhere by now? Would be good to have in 3.14, and then also in stable
kernels.
Jan