2014-04-02 17:58:00

by Andi Kleen

Subject: Re: [PATCH v2 1/5] x86: replace timeouts when booting secondary CPU with infinite wait loop

Igor Mammedov <[email protected]> writes:

> Hang is observed on virtual machines during CPU hotplug,
> especially in big guests with many CPUs. (It is reproducible
> more often if the host is over-committed.)
>
> It happens because the master CPU gives up waiting on the
> secondary CPU and allows it to run wild. As a result, the
> AP locks up or crashes the system. For example,
> as described here: https://lkml.org/lkml/2014/3/6/257
>
> If the master CPU has sent the STARTUP IPI successfully,
> make it wait indefinitely till the AP boots.


But what happens on a real machine when the other CPU is dead?

I've seen that. Kernel still boots. With your patch it would
hang.

I don't think you can do that. It needs to have some timeout.
Maybe a longer or configurable one?
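
To make the "configurable" option concrete, here is a minimal user-space sketch. It is not taken from the patch series: parse_timeout_ms(), timeout_to_polls() and wait_for_flag() are hypothetical names, and the ready flag merely stands in for whatever the AP sets once it is alive. A real kernel version would delay between polls (for example with udelay()) and read the value from a boot parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse a timeout in milliseconds from a string (hypothetical helper).
 * Malformed, empty, or non-positive input falls back to fallback_ms.
 */
static long parse_timeout_ms(const char *arg, long fallback_ms)
{
	char *end;
	long val;

	if (arg == NULL || *arg == '\0')
		return fallback_ms;

	errno = 0;
	val = strtol(arg, &end, 10);
	if (errno != 0 || *end != '\0' || val <= 0)
		return fallback_ms;

	return val;
}

/* Convert a timeout into a number of polls spaced poll_us apart. */
static long timeout_to_polls(long timeout_ms, long poll_us)
{
	assert(timeout_ms > 0 && poll_us > 0);
	/* Note: no overflow check; very large timeouts would need one. */
	return timeout_ms * 1000 / poll_us;
}

/*
 * Poll *ready up to max_polls times. Returns true as soon as the flag
 * is set, false if the poll budget runs out first.
 */
static bool wait_for_flag(const volatile int *ready, long max_polls)
{
	for (long i = 0; i < max_polls; i++) {
		if (*ready)
			return true;
		/* A kernel loop would delay here, e.g. udelay(poll_us). */
	}
	return false;
}
```

With a 100 us poll interval, a 5000 ms timeout gives 50000 polls; a user hitting the timeout on an over-committed host could raise the value instead of the kernel giving up early.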

-Andi

--
[email protected] -- Speaking for myself only


2014-04-02 21:31:14

by Igor Mammedov

Subject: Re: [PATCH v2 1/5] x86: replace timeouts when booting secondary CPU with infinite wait loop

On Wed, 02 Apr 2014 10:15:29 -0700
Andi Kleen <[email protected]> wrote:

> Igor Mammedov <[email protected]> writes:
>
> > Hang is observed on virtual machines during CPU hotplug,
> > especially in big guests with many CPUs. (It is reproducible
> > more often if the host is over-committed.)
> >
> > It happens because the master CPU gives up waiting on the
> > secondary CPU and allows it to run wild. As a result, the
> > AP locks up or crashes the system. For example,
> > as described here: https://lkml.org/lkml/2014/3/6/257
> >
> > If the master CPU has sent the STARTUP IPI successfully,
> > make it wait indefinitely till the AP boots.
>
>
> But what happens on a real machine when the other CPU is dead?
One possible way to boot such a machine would be to disable the dead CPU
via kernel parameters.

> I've seen that. Kernel still boots. With your patch it would
> hang.
>
> I don't think you can do that. It needs to have some timeout.
> Maybe a longer or configurable one?
There was a patch that tried to keep the timeouts and 'gracefully'
cancel the AP boot if the master timed out on it:
https://lkml.org/lkml/2014/3/6/257

It's possible to keep the timeouts in do_boot_cpu(). Is setting
trampoline_status a sufficient indication that the AP is not dead
and is worth waiting for?

Then it could be rewritten like this:

	if (!boot_error) {
		boot_error = 1;
		for (timeout = 0; timeout < 50000; timeout++) {
			/* Wait till AP signals that it's ready to start initialization */
			if (*trampoline_status == 0xA5A5A5A5) {
				boot_error = 0;
				/* Allow AP to start initializing. */
				cpumask_set_cpu(cpu, cpu_callout_mask);

				/* Wait till AP reaches the cpu_callin_mask point */
				while (!cpumask_test_cpu(cpu, cpu_callin_mask))
					schedule();

				break; /* It has booted */
			}
			udelay(100);
		}
	}

It will provide a timeout if the AP is dead, and still keep the AP from
running wild if the master CPU timed out on it.
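
The two-phase idea can be exercised outside the kernel. Below is a minimal user-space simulation, assuming a hypothetical struct fake_ap that models the AP as a state machine advanced once per poll; fake_ap_tick() and boot_ap() are illustrative names, not kernel functions. Phase 1 is bounded by max_polls, phase 2 (after the AP has shown signs of life) is unbounded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TRAMPOLINE_READY 0xA5A5A5A5u

enum boot_result { BOOT_OK, BOOT_AP_DEAD };

/* Hypothetical stand-in for a secondary CPU; not a kernel structure. */
struct fake_ap {
	long alive_at_poll;        /* poll on which the AP signals; < 0 = never */
	long callin_delay;         /* polls needed after release to call in */
	uint32_t trampoline_status;
	bool callout;              /* set by master: AP may start initializing */
	bool callin;               /* set by AP: initialization finished */
	long polls;
	long polls_since_release;
};

/* Advance the simulated AP by one poll interval. */
static void fake_ap_tick(struct fake_ap *ap)
{
	ap->polls++;
	if (ap->alive_at_poll >= 0 && ap->polls >= ap->alive_at_poll)
		ap->trampoline_status = TRAMPOLINE_READY;

	/* The AP only proceeds once the master has released it. */
	if (ap->callout && ap->trampoline_status == TRAMPOLINE_READY) {
		if (ap->polls_since_release++ >= ap->callin_delay)
			ap->callin = true;
	}
}

static enum boot_result boot_ap(struct fake_ap *ap, long max_polls)
{
	assert(max_polls > 0);

	/* Phase 1: bounded wait for the first sign of life. */
	for (long i = 0; i < max_polls; i++) {
		fake_ap_tick(ap);
		if (ap->trampoline_status != TRAMPOLINE_READY)
			continue;

		/* Phase 2: release the AP and wait without a timeout. */
		ap->callout = true;
		while (!ap->callin)
			fake_ap_tick(ap);
		return BOOT_OK;
	}

	/* The AP was never released, so it cannot run wild later. */
	return BOOT_AP_DEAD;
}
```

In this model a dead AP costs at most max_polls polls, while a live but slow AP is waited for as long as it needs. An AP that signals only after the master has given up never sees callout set, so it stays parked.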


>
> -Andi
>
> --
> [email protected] -- Speaking for myself only


--
Regards,
Igor

2014-04-02 23:48:14

by Andi Kleen

Subject: Re: [PATCH v2 1/5] x86: replace timeouts when booting secondary CPU with infinite wait loop

On Wed, Apr 02, 2014 at 11:29:56PM +0200, Igor Mammedov wrote:
> On Wed, 02 Apr 2014 10:15:29 -0700
> Andi Kleen <[email protected]> wrote:
>
> > Igor Mammedov <[email protected]> writes:
> >
> > > Hang is observed on virtual machines during CPU hotplug,
> > > especially in big guests with many CPUs. (It is reproducible
> > > more often if the host is over-committed.)
> > >
> > > It happens because the master CPU gives up waiting on the
> > > secondary CPU and allows it to run wild. As a result, the
> > > AP locks up or crashes the system. For example,
> > > as described here: https://lkml.org/lkml/2014/3/6/257
> > >
> > > If the master CPU has sent the STARTUP IPI successfully,
> > > make it wait indefinitely till the AP boots.
> >
> >
> > But what happens on a real machine when the other CPU is dead?
> One possible way to boot such a machine would be to disable the dead CPU
> via kernel parameters.

That would need explicit user action. It's much better to recover
automatically, even if somewhat crippled.

-Andi

2014-04-03 06:43:44

by Ingo Molnar

Subject: Re: [PATCH v2 1/5] x86: replace timeouts when booting secondary CPU with infinite wait loop


* Igor Mammedov <[email protected]> wrote:

> > I've seen that. Kernel still boots. With your patch it would hang.

Nonsense, not booting is OK when critical hardware is genuinely bad -
this isn't a disk drive or networking where bad IO 'happens sometimes'
and failure is something we have to engineer for - this is the CPU!

If a critical piece of hardware like the CPU or RAM is non-functional
then it should be excluded by the user explicitly, not worked around
after some ugly, non-deterministic and fragile timeout.

The timeout in the SMP bringup code was really an ancient property,
introduced more than a decade ago, when hardware makers were
ignorant of Linux and we were ignorant of how to properly interface
with SMP hardware.

Today a 'timeout' means one of 3 things:

- bad, fragile hardware - this we don't want to hide, unless
explicitly told so by the user. I've seen such symptoms related to
overclocking for example - so not booting is perfectly justified,
it can prevent reporting a bogus kernel crash down the line.

- buggy SMP bringup. That is a bug that needs to be fixed, not
worked around.

- timeout fragility in virtualized environments

I'm not aware of any genuine case where timing out is the correct
thing to do.

So the patches look fine to me as-is, I planned on looking at them
more closely after the merge window.

Thanks,

Ingo

2014-04-03 21:03:07

by Andi Kleen

Subject: Re: [PATCH v2 1/5] x86: replace timeouts when booting secondary CPU with infinite wait loop

On Thu, Apr 03, 2014 at 08:43:37AM +0200, Ingo Molnar wrote:
>
> * Igor Mammedov <[email protected]> wrote:
>
> > > I've seen that. Kernel still boots. With your patch it would hang.
>
> Nonsense, not booting is OK when critical hardware is genuinely bad -
> this isn't a disk drive or networking where bad IO 'happens sometimes'
> and failure is something we have to engineer for - this is the CPU!
>
> If a critical piece of hardware like the CPU or RAM is non-functional
> then it should be excluded by the user explicitly, not worked around
> after some ugly, non-deterministic and fragile timeout.

That's generally not true. We try to recover as best as we can
and continue.

That's true for RCU stalls, and RAM errors (hwpoison) and
other error conditions. It's true for kernel problems
(we try to oops and continue, not to panic etc.)

Hanging forever is not recovering, it's just poor and broken
error handling and generally not acceptable these days.

-Andi