To online a new CPU, the master CPU signals the secondary and waits for at
most 10 seconds until cpu_initialized_mask is set by the secondary CPU.
There is a case that the master CPU fails the online when it takes 10+
seconds for schedule() to return (although the cpu_initialized_mask is
already set by the secondary CPU).
1. The master CPU signals the secondary CPU in do_boot_cpu().
2. As the cpu_initialized_mask is still not set, the master CPU reschedules
via schedule().
3. Suppose it takes 10+ seconds until schedule() returns due to performance
issue. The secondary CPU sets the cpu_initialized_mask during those 10+
seconds.
4. Once the schedule() at the master CPU returns, although the
cpu_initialized_mask is set, the time_before(jiffies, timeout) fails. As a
result, the master CPU regards this as failure.
[ 51.983296] smpboot: do_boot_cpu failed(-1) to wakeup CPU#4
5. Since the secondary CPU is stuck at state CPU_UP_PREPARE, any further
online operation will fail by cpu_check_up_prepare(), unless the below
patch set is applied.
https://lore.kernel.org/all/[email protected]/
This issue is resolved by always first checking whether the secondary CPU
has set cpu_initialized_mask.
Cc: Longpeng(Mike) <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Joe Jin <[email protected]>
Signed-off-by: Dongli Zhang <[email protected]>
---
To resend by Cc [email protected] as well.
arch/x86/kernel/smpboot.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 617012f4619f..faad4fcf67eb 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1145,7 +1145,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
*/
boot_error = -1;
timeout = jiffies + 10*HZ;
- while (time_before(jiffies, timeout)) {
+ while (true) {
if (cpumask_test_cpu(cpu, cpu_initialized_mask)) {
/*
* Tell AP to proceed with initialization
@@ -1154,6 +1154,10 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
boot_error = 0;
break;
}
+
+ if (time_after_eq(jiffies, timeout))
+ break;
+
schedule();
}
}
--
2.17.1
On Thu, 2021-12-23 at 13:03 -0800, Dongli Zhang wrote:
> To online a new CPU, the master CPU signals the secondary and waits
> for at
> most 10 seconds until cpu_initialized_mask is set by the secondary
> CPU.
> There is a case that the master CPU fails the online when it takes
> 10+
> seconds for schedule() to return (although the cpu_initialized_mask
> is
> already set by the secondary CPU).
>
> 1. The master CPU signals the secondary CPU in do_boot_cpu().
>
> 2. As the cpu_initialized_mask is still not set, the master CPU
> reschedules
> via schedule().
>
> 3. Suppose it takes 10+ seconds until schedule() returns due to
> performance
> issue. The secondary CPU sets the cpu_initialized_mask during those
> 10+
> seconds.
The patch seems reasonable. But do you know what other task got run
and caused such long delay (>10 sec), preventing switch
back to in the master CPU? It seems like there could be some problem
causing the long delay. It is better if we understand the root cause
of that.
Tim
>
> 4. Once the schedule() at the master CPU returns, although the
> cpu_initialized_mask is set, the time_before(jiffies, timeout) fails.
> As a
> result, the master CPU regards this as failure.
>
> [ 51.983296] smpboot: do_boot_cpu failed(-1) to wakeup CPU#4
>
> 5. Since the secondary CPU is stuck at state CPU_UP_PREPARE, any
> further
> online operation will fail by cpu_check_up_prepare(), unless the
> below
> patch set is applied.
>
> https://lore.kernel.org/all/[email protected]/
>
> This issue is resolved by always first checking whether the secondary
> CPU
> has set cpu_initialized_mask.
>
> Cc: Longpeng(Mike) <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Cc: Joe Jin <[email protected]>
> Signed-off-by: Dongli Zhang <[email protected]>
> ---
> To resend by Cc [email protected] as well.
>
> arch/x86/kernel/smpboot.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 617012f4619f..faad4fcf67eb 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1145,7 +1145,7 @@ static int do_boot_cpu(int apicid, int cpu,
> struct task_struct *idle,
> */
> boot_error = -1;
> timeout = jiffies + 10*HZ;
> - while (time_before(jiffies, timeout)) {
> + while (true) {
> if (cpumask_test_cpu(cpu,
> cpu_initialized_mask)) {
> /*
> * Tell AP to proceed with
> initialization
> @@ -1154,6 +1154,10 @@ static int do_boot_cpu(int apicid, int cpu,
> struct task_struct *idle,
> boot_error = 0;
> break;
> }
> +
> + if (time_after_eq(jiffies, timeout))
> + break;
> +
> schedule();
> }
> }
Hi Tim,
On 1/5/22 11:19 AM, Tim Chen wrote:
> On Thu, 2021-12-23 at 13:03 -0800, Dongli Zhang wrote:
>> To online a new CPU, the master CPU signals the secondary and waits
>> for at
>> most 10 seconds until cpu_initialized_mask is set by the secondary
>> CPU.
>> There is a case that the master CPU fails the online when it takes
>> 10+
>> seconds for schedule() to return (although the cpu_initialized_mask
>> is
>> already set by the secondary CPU).
>>
>> 1. The master CPU signals the secondary CPU in do_boot_cpu().
>>
>> 2. As the cpu_initialized_mask is still not set, the master CPU
>> reschedules
>> via schedule().
>>
>> 3. Suppose it takes 10+ seconds until schedule() returns due to
>> performance
>> issue. The secondary CPU sets the cpu_initialized_mask during those
>> 10+
>> seconds.
>
> The patch seems reasonable. But do you know what other task got run
> and caused such long delay (>10 sec), preventing switch
> back to in the master CPU? It seems like there could be some problem
> causing the long delay. It is better if we understand the root cause
> of that.
So far I do not have a consistent repro to verify which kernel function (or
kernel thread) is making the trouble.
However, it is not necessary for other kernel task to take >10 sec to reproduce
the issue, e.g.:
1. The master CPU signals the secondary CPU.
2. Due to the issue at KVM or hardware level (let's assume there is an issue),
the secondary CPU is waken up at the 8th second.
3. The schedule() returns after 3 seconds (due to the 3-second delay by another
kernel task).
4. The 3+8=11 seconds are more than 10 seconds.
Therefore, we should first check whether the secondary CPU is waken up, instead
of timeout. Otherwise, the secondary CPU will be put into an non-recoverable
state, until the OS reboots,
Thank you very much!
Dongli Zhang
>
> Tim
>
>>
>> 4. Once the schedule() at the master CPU returns, although the
>> cpu_initialized_mask is set, the time_before(jiffies, timeout) fails.
>> As a
>> result, the master CPU regards this as failure.
>>
>> [ 51.983296] smpboot: do_boot_cpu failed(-1) to wakeup CPU#4
>>
>> 5. Since the secondary CPU is stuck at state CPU_UP_PREPARE, any
>> further
>> online operation will fail by cpu_check_up_prepare(), unless the
>> below
>> patch set is applied.
>>
>> https://urldefense.com/v3/__https://lore.kernel.org/all/[email protected]/__;!!ACWV5N9M2RV99hQ!d7NY8MgLj7W5ZGWS_0HHsvE52WNh2WJbJwLNBJYLGzGIY9BzKg-PUYiIYMmrwud36Ys$
>>
>> This issue is resolved by always first checking whether the secondary
>> CPU
>> has set cpu_initialized_mask.
>>
>> Cc: Longpeng(Mike) <[email protected]>
>> Cc: Sebastian Andrzej Siewior <[email protected]>
>> Cc: Joe Jin <[email protected]>
>> Signed-off-by: Dongli Zhang <[email protected]>
>> ---
>> To resend by Cc [email protected] as well.
>>
>> arch/x86/kernel/smpboot.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index 617012f4619f..faad4fcf67eb 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -1145,7 +1145,7 @@ static int do_boot_cpu(int apicid, int cpu,
>> struct task_struct *idle,
>> */
>> boot_error = -1;
>> timeout = jiffies + 10*HZ;
>> - while (time_before(jiffies, timeout)) {
>> + while (true) {
>> if (cpumask_test_cpu(cpu,
>> cpu_initialized_mask)) {
>> /*
>> * Tell AP to proceed with
>> initialization
>> @@ -1154,6 +1154,10 @@ static int do_boot_cpu(int apicid, int cpu,
>> struct task_struct *idle,
>> boot_error = 0;
>> break;
>> }
>> +
>> + if (time_after_eq(jiffies, timeout))
>> + break;
>> +
>> schedule();
>> }
>> }
>
May I have feedback for this patch? This may mitigate a CPU hotplug issue which
is only recoverable across OS reboot, unless the below patch is available.
https://lore.kernel.org/all/[email protected]/
I see there is a patchset that may rework this part. That patch set does not
change the logic here.
https://lore.kernel.org/all/[email protected]/
Thank you very much!
Dongli Zhang
On 12/23/21 1:03 PM, Dongli Zhang wrote:
> To online a new CPU, the master CPU signals the secondary and waits for at
> most 10 seconds until cpu_initialized_mask is set by the secondary CPU.
> There is a case that the master CPU fails the online when it takes 10+
> seconds for schedule() to return (although the cpu_initialized_mask is
> already set by the secondary CPU).
>
> 1. The master CPU signals the secondary CPU in do_boot_cpu().
>
> 2. As the cpu_initialized_mask is still not set, the master CPU reschedules
> via schedule().
>
> 3. Suppose it takes 10+ seconds until schedule() returns due to performance
> issue. The secondary CPU sets the cpu_initialized_mask during those 10+
> seconds.
>
> 4. Once the schedule() at the master CPU returns, although the
> cpu_initialized_mask is set, the time_before(jiffies, timeout) fails. As a
> result, the master CPU regards this as failure.
>
> [ 51.983296] smpboot: do_boot_cpu failed(-1) to wakeup CPU#4
>
> 5. Since the secondary CPU is stuck at state CPU_UP_PREPARE, any further
> online operation will fail by cpu_check_up_prepare(), unless the below
> patch set is applied.
>
> https://lore.kernel.org/all/[email protected]/
>
> This issue is resolved by always first checking whether the secondary CPU
> has set cpu_initialized_mask.
>
> Cc: Longpeng(Mike) <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Cc: Joe Jin <[email protected]>
> Signed-off-by: Dongli Zhang <[email protected]>
> ---
> To resend by Cc [email protected] as well.
>
> arch/x86/kernel/smpboot.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 617012f4619f..faad4fcf67eb 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1145,7 +1145,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
> */
> boot_error = -1;
> timeout = jiffies + 10*HZ;
> - while (time_before(jiffies, timeout)) {
> + while (true) {
> if (cpumask_test_cpu(cpu, cpu_initialized_mask)) {
> /*
> * Tell AP to proceed with initialization
> @@ -1154,6 +1154,10 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
> boot_error = 0;
> break;
> }
> +
> + if (time_after_eq(jiffies, timeout))
> + break;
> +
> schedule();
> }
> }
>