2021-09-01 05:20:24

by Longpeng(Mike)

Subject: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup again

The cpu's cpu_hotplug_state is set to CPU_UP_PREPARE before the cpu
is woken up, but it is not reset if the bringup fails. The user then
cannot bring the cpu online anymore, because the stale
CPU_UP_PREPARE state makes cpu_check_up_prepare() reject the request.

We should allow the user to try again in this case.

Signed-off-by: Longpeng(Mike) <[email protected]>
---
kernel/smpboot.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index f6bc0bc..d18f8ff 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu)
 		 */
 		return -EAGAIN;

+	case CPU_UP_PREPARE:
+		/*
+		 * The CPU failed to bringup last time, allow the user
+		 * continue to try to start it up.
+		 */
+		return 0;
+
 	default:

 		/* Should not happen. Famous last words. */
--
1.8.3.1


Subject: Re: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup again

On 2021-09-01 13:11:43 [+0800], Longpeng(Mike) wrote:
> The cpu's cpu_hotplug_state is set to CPU_UP_PREPARE before the cpu
> is woken up, but it is not reset if the bringup fails. The user then
> cannot bring the cpu online anymore, because the stale
> CPU_UP_PREPARE state makes cpu_check_up_prepare() reject the request.
>
> We should allow the user to try again in this case.

Can you please describe where it failed / how you reached that state?

> Signed-off-by: Longpeng(Mike) <[email protected]>
> ---
> kernel/smpboot.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index f6bc0bc..d18f8ff 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu)
>  		 */
>  		return -EAGAIN;
>
> +	case CPU_UP_PREPARE:
> +		/*
> +		 * The CPU failed to bringup last time, allow the user
> +		 * continue to try to start it up.
> +		 */
> +		return 0;
> +
>  	default:
>
>  		/* Should not happen. Famous last words. */
> --
> 1.8.3.1

Sebastian

2021-10-08 03:13:13

by Longpeng(Mike)

Subject: RE: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup again



> -----Original Message-----
> From: Sebastian Andrzej Siewior [mailto:[email protected]]
> Sent: Thursday, September 30, 2021 10:01 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Gonglei (Arei)
> <[email protected]>
> Subject: Re: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup
> again
>
> On 2021-09-01 13:11:43 [+0800], Longpeng(Mike) wrote:
> > The cpu's cpu_hotplug_state is set to CPU_UP_PREPARE before the cpu
> > is woken up, but it is not reset if the bringup fails. The user then
> > cannot bring the cpu online anymore, because the stale
> > CPU_UP_PREPARE state makes cpu_check_up_prepare() reject the request.
> >
> > We should allow the user to try again in this case.
>
> Can you please describe where it failed / how you reached that state?
>

native_cpu_up
  cpu_check_up_prepare
  do_boot_cpu
    /* Wait 10s total for first sign of life from AP */

It will fail if the AP doesn't respond within 10s, and cpu_hotplug_state
will then stay in the CPU_UP_PREPARE state.
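
As a rough illustration of the failure described above, the following is a
userspace model, not kernel code. The MODEL_* states, model_check_up_prepare()
and model_cpu_up() are hypothetical names; the -ETIMEDOUT return for a silent
AP is an illustrative choice, and -EIO for an unexpected state is assumed to
match the default case of cpu_check_up_prepare(). The allow_retry flag stands
in for the behaviour proposed by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical, reduced set of hotplug states for illustration only. */
enum model_state {
	MODEL_POST_DEAD,	/* offline, may be started */
	MODEL_UP_PREPARE,	/* bring-up started (or stale after a failure) */
	MODEL_ONLINE,
};

/* Zero-initialised, so every modelled CPU starts as MODEL_POST_DEAD. */
static enum model_state states[MODEL_NR_CPUS];

/* Models the state check; allow_retry mimics the proposed patch. */
static int model_check_up_prepare(int cpu, bool allow_retry)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	switch (states[cpu]) {
	case MODEL_POST_DEAD:
		states[cpu] = MODEL_UP_PREPARE;
		return 0;
	case MODEL_UP_PREPARE:
		/* Stale state left behind by a failed bring-up. */
		return allow_retry ? 0 : -EIO;
	default:
		return -EIO;
	}
}

/* Models one online attempt; ap_responds == false models the 10s timeout. */
static int model_cpu_up(int cpu, bool ap_responds, bool allow_retry)
{
	int ret = model_check_up_prepare(cpu, allow_retry);

	if (ret)
		return ret;
	if (!ap_responds)
		return -ETIMEDOUT;	/* state is NOT reset here */
	states[cpu] = MODEL_ONLINE;
	return 0;
}
```

In this model, a first attempt with ap_responds == false leaves the CPU in
MODEL_UP_PREPARE. A second attempt then fails in the state check unless
allow_retry is set, which is the situation the patch is meant to address.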

This could happen on a virtualized system, especially in some special usages,
e.g. Software Enclaves [1][2]

[1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
[2] https://www.alibabacloud.com/help/doc-detail/203433.htm?spm=a3c0i.23986742.6981761520.1.7e30715eZCRXmk


> > Signed-off-by: Longpeng(Mike) <[email protected]>
> > ---
> > kernel/smpboot.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> > index f6bc0bc..d18f8ff 100644
> > --- a/kernel/smpboot.c
> > +++ b/kernel/smpboot.c
> > @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu)
> >  		 */
> >  		return -EAGAIN;
> >
> > +	case CPU_UP_PREPARE:
> > +		/*
> > +		 * The CPU failed to bringup last time, allow the user
> > +		 * continue to try to start it up.
> > +		 */
> > +		return 0;
> > +
> >  	default:
> >
> >  		/* Should not happen. Famous last words. */
> > --
> > 1.8.3.1
>
> Sebastian

Subject: Re: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup again

Sorry for forgetting…

On 2021-10-08 03:10:34 [+0000], Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
> > -----Original Message-----
> > From: Sebastian Andrzej Siewior [mailto:[email protected]]
> > Sent: Thursday, September 30, 2021 10:01 PM
> > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; Gonglei (Arei)
> > <[email protected]>
> > Subject: Re: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup
> > again
> >
> > On 2021-09-01 13:11:43 [+0800], Longpeng(Mike) wrote:
> > > The cpu's cpu_hotplug_state is set to CPU_UP_PREPARE before the cpu
> > > is woken up, but it is not reset if the bringup fails. The user then
> > > cannot bring the cpu online anymore, because the stale
> > > CPU_UP_PREPARE state makes cpu_check_up_prepare() reject the request.
> > >
> > > We should allow the user to try again in this case.
> >
> > Can you please describe where it failed / how you reached that state?
> >
>
> native_cpu_up
>   cpu_check_up_prepare
>   do_boot_cpu
>     /* Wait 10s total for first sign of life from AP */
>
> It will fail if the AP doesn't respond within 10s, and cpu_hotplug_state
> will then stay in the CPU_UP_PREPARE state.
>
> This could happen on a virtualized system, especially in some special usages,
> e.g. Software Enclaves [1][2]

So wakeup_cpu_via_init_nmi() / wakeup_secondary_cpu() succeeds but the
CPU does not show up within 10 seconds.
Does the CPU come in later and spin in wait_for_master_cpu() or is the
CPU completely missing?

> [1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
> [2] https://www.alibabacloud.com/help/doc-detail/203433.htm?spm=a3c0i.23986742.6981761520.1.7e30715eZCRXmk
>
>
> > > Signed-off-by: Longpeng(Mike) <[email protected]>
> > > ---
> > > kernel/smpboot.c | 7 +++++++
> > > 1 file changed, 7 insertions(+)
> > >
> > > diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> > > index f6bc0bc..d18f8ff 100644
> > > --- a/kernel/smpboot.c
> > > +++ b/kernel/smpboot.c
> > > @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu)
> > >  		 */
> > >  		return -EAGAIN;
> > >
> > > +	case CPU_UP_PREPARE:
> > > +		/*
> > > +		 * The CPU failed to bringup last time, allow the user
> > > +		 * continue to try to start it up.
> > > +		 */
> > > +		return 0;
> > > +
> > >  	default:
> > >
> > >  		/* Should not happen. Famous last words. */

Sebastian

2021-11-22 00:26:10

by Longpeng(Mike)

Subject: RE: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup again



> -----Original Message-----
> From: Sebastian Andrzej Siewior [mailto:[email protected]]
> Sent: Saturday, November 20, 2021 1:37 AM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Gonglei (Arei)
> <[email protected]>
> Subject: Re: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup
> again
>
> Sorry for forgetting…
>
> On 2021-10-08 03:10:34 [+0000], Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
> > > -----Original Message-----
> > > From: Sebastian Andrzej Siewior [mailto:[email protected]]
> > > Sent: Thursday, September 30, 2021 10:01 PM
> > > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > > <[email protected]>
> > > Cc: [email protected]; [email protected]; [email protected];
> > > [email protected]; [email protected]; Gonglei (Arei)
> > > <[email protected]>
> > > Subject: Re: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup
> > > again
> > >
> > > On 2021-09-01 13:11:43 [+0800], Longpeng(Mike) wrote:
> > > > The cpu's cpu_hotplug_state is set to CPU_UP_PREPARE before the cpu
> > > > is woken up, but it is not reset if the bringup fails. The user then
> > > > cannot bring the cpu online anymore, because the stale
> > > > CPU_UP_PREPARE state makes cpu_check_up_prepare() reject the request.
> > > >
> > > > We should allow the user to try again in this case.
> > >
> > > Can you please describe where it failed / how you reached that state?
> > >
> >
> > native_cpu_up
> >   cpu_check_up_prepare
> >   do_boot_cpu
> >     /* Wait 10s total for first sign of life from AP */
> >
> > It will fail if the AP doesn't respond within 10s, and cpu_hotplug_state
> > will then stay in the CPU_UP_PREPARE state.
> >
> > This could happen on a virtualized system, especially in some special usages,
> > e.g. Software Enclaves [1][2]
>
> So wakeup_cpu_via_init_nmi() / wakeup_secondary_cpu() succeeds but the
> CPU does not show up within 10 seconds.
> Does the CPU come in later and spin in wait_for_master_cpu() or is the
> CPU completely missing?
>

The cpu is completely missing at that point, since the hypervisor can reject
all events that are sent to this cpu while the enclave vm is running.

But the cpu can receive events and be brought up again once the enclave vm
is terminated.
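
For illustration, the later retry from userspace could look roughly like the
sketch below. It assumes the usual sysfs control file
/sys/devices/system/cpu/cpuN/online; write_online() and online_with_retry()
are hypothetical helper names, and the attempt count and delay are arbitrary.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write "1" to an "online" control file; returns 0 or a negative errno. */
static int write_online(const char *path)
{
	int fd = open(path, O_WRONLY);
	ssize_t n;
	int err;

	if (fd < 0)
		return -errno;

	n = write(fd, "1", 1);
	if (n == 1)
		err = 0;
	else if (n < 0)
		err = -errno;	/* error reported by the kernel */
	else
		err = -EIO;	/* short write, treated as a failure here */

	close(fd);
	return err;
}

/* Retry up to 'attempts' times, sleeping 'delay_s' seconds between tries. */
static int online_with_retry(const char *path, int attempts,
			     unsigned int delay_s)
{
	int err = -EINVAL;	/* returned if attempts <= 0 */

	for (int i = 0; i < attempts; i++) {
		err = write_online(path);
		if (err == 0)
			break;
		if (i + 1 < attempts)
			sleep(delay_s);
	}
	return err;
}
```

A caller would pass something like "/sys/devices/system/cpu/cpu1/online"
after the enclave vm has been terminated. Without the proposed change, such a
retry would still be expected to fail as long as the state remains
CPU_UP_PREPARE.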


> > [1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
> > [2] https://www.alibabacloud.com/help/doc-detail/203433.htm?spm=a3c0i.23986742.6981761520.1.7e30715eZCRXmk
> >
> >
> > > > Signed-off-by: Longpeng(Mike) <[email protected]>
> > > > ---
> > > > kernel/smpboot.c | 7 +++++++
> > > > 1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> > > > index f6bc0bc..d18f8ff 100644
> > > > --- a/kernel/smpboot.c
> > > > +++ b/kernel/smpboot.c
> > > > @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu)
> > > >  		 */
> > > >  		return -EAGAIN;
> > > >
> > > > +	case CPU_UP_PREPARE:
> > > > +		/*
> > > > +		 * The CPU failed to bringup last time, allow the user
> > > > +		 * continue to try to start it up.
> > > > +		 */
> > > > +		return 0;
> > > > +
> > > >  	default:
> > > >
> > > >  		/* Should not happen. Famous last words. */
>
> Sebastian