The cpu hotplug path may be utilized while hotplug is disabled for a
brief moment leading to failures. As an example, attempts to perform
cpu hotplug by userspace soon after boot may race with pci_device_probe
leading to inconsistent results.
Proposed idea:
Call restart_syscall instead of returning -EBUSY since
cpu_hotplug_disabled seems to only have a positive value
for short, temporary amounts of time.
Does anyone see any serious problems with this?
Signed-off-by: Joel Savitz <[email protected]>
---
kernel/cpu.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 5797c2a7a93f..2992c7d1d24e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -35,6 +35,7 @@
#include <linux/percpu-rwsem.h>
#include <linux/cpuset.h>
#include <linux/random.h>
+#include <linux/delay.h>
#include <trace/events/power.h>
#define CREATE_TRACE_POINTS
@@ -1401,7 +1402,9 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
cpu_maps_update_begin();
if (cpu_hotplug_disabled) {
- err = -EBUSY;
+ /* avoid busy looping (5ms of sleep should be enough) */
+ msleep(5);
+ err = restart_syscall();
goto out;
}
if (!cpu_smt_allowed(cpu)) {
--
2.27.0
On 18.04.22 21:54, Joel Savitz wrote:
> The cpu hotplug path may be utilized while hotplug is disabled for a
> brief moment leading to failures. As an example, attempts to perform
> cpu hotplug by userspace soon after boot may race with pci_device_probe
> leading to inconsistent results.
You might want to extend a bit in which situation we observed that issue
fairly reliably.
When restricting the number of boot cpus on the kernel cmdline, e.g.,
via "maxcpus=2", udev will find the offline cpus when enumerating all
cpus and try onlining them. Due to the race, onlining of some cpus fails
e.g., when racing with pci_device_probe().
While teaching udev to not online coldplugged CPUs when "maxcpus" was
specified ("policy"), it revealed the underlying issue that onlining a
CPU can fail with -EBUSY in corner cases when cpu hotplug is temporarily
disabled.
>
> Proposed idea:
> Call restart_syscall instead of returning -EBUSY since
> cpu_hotplug_disabled seems to only have a positive value
> for short, temporary amounts of time.
>
> Does anyone see any serious problems with this?
>
> Signed-off-by: Joel Savitz <[email protected]>
> ---
> kernel/cpu.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 5797c2a7a93f..2992c7d1d24e 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -35,6 +35,7 @@
> #include <linux/percpu-rwsem.h>
> #include <linux/cpuset.h>
> #include <linux/random.h>
> +#include <linux/delay.h>
>
> #include <trace/events/power.h>
> #define CREATE_TRACE_POINTS
> @@ -1401,7 +1402,9 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
> cpu_maps_update_begin();
>
> if (cpu_hotplug_disabled) {
> - err = -EBUSY;
> + /* avoid busy looping (5ms of sleep should be enough) */
> + msleep(5);
> + err = restart_syscall();
It's worth noting that we use the same approach in
lock_device_hotplug_sysfs(). It's far from perfect I would say, but we
really wanted to avoid letting user space having to deal with retry logic.
For example, while memory onlining can fail with -EBUSY, it's not
expected to fail during memory onlining (we only fail in very rare
cases, when a memory notifier fails -- for example when kasan fails to
allocate memory).
--
Thanks,
David / dhildenb
>> It's far from perfect I would say, but we really wanted to avoid
>> letting user space having to deal with retry logic.
>
> What's so hard with retry logic in user space?
>
> If you can come up with a reasonable argument why user space cannot be
> fixed, then there is certainly a better solution than slapping a
> msleep(5) at some random place into the code.
Most probably you're right and we should just retry in udev. Staring at
the history, it looks like the -EBUSY might have been returned forever,
so user space just never really triggered it on actual CPU hotplug
because it doesn't usually happen that cpu hotplug is disabled.
--
Thanks,
David / dhildenb
On Tue, Apr 19 2022 at 14:34, David Hildenbrand wrote:
> On 18.04.22 21:54, Joel Savitz wrote:
>> The cpu hotplug path may be utilized while hotplug is disabled for a
>> brief moment leading to failures. As an example, attempts to perform
>> cpu hotplug by userspace soon after boot may race with pci_device_probe
>> leading to inconsistent results.
>
> You might want to extend a bit in which situation we observed that issue
> fairly reliably.
>
> When restricting the number of boot cpus on the kernel cmdline, e.g.,
> via "maxcpus=2", udev will find the offline cpus when enumerating all
> cpus and try onlining them. Due to the race, onlining of some cpus fails
> e.g., when racing with pci_device_probe().
maxcpus is a horrible hack and broken vs. MCE broadcasting on x86.
> While teaching udev to not online coldplugged CPUs when "maxcpus" was
> specified ("policy"), it revealed the underlying issue that onlining a
> CPU can fail with -EBUSY in corner cases when cpu hotplug is temporarily
> disabled.
Right. It can fail with -EBUSY and because userspace fails to handle it
gracefully we need to hack around it?
>> Proposed idea:
>> Call restart_syscall instead of returning -EBUSY since
>> cpu_hotplug_disabled seems to only have a positive value
>> for short, temporary amounts of time.
>>
>> Does anyone see any serious problems with this?
Yes. It's a horrible hack and wrong...
>> if (cpu_hotplug_disabled) {
>> - err = -EBUSY;
>> + /* avoid busy looping (5ms of sleep should be enough) */
>> + msleep(5);
>> + err = restart_syscall();
... as it sleeps with cpu_add_remove_lock held, which protects
cpu_hotplug_disabled. IOW, cpu_hotplug_enable() is blocked until
msleep() returns.
> It's worth noting that we use the same approach in
> lock_device_hotplug_sysfs().
That does not make it any better, really.
> It's far from perfect I would say, but we really wanted to avoid
> letting user space having to deal with retry logic.
What's so hard with retry logic in user space?
If you can come up with a reasonable argument why user space cannot be
fixed, then there is certainly a better solution than slapping a
msleep(5) at some random place into the code.
Thanks,
tglx
On 21.04.22 16:31, David Hildenbrand wrote:
>>> It's far from perfect I would say, but we really wanted to avoid
>>> letting user space having to deal with retry logic.
>>
>> What's so hard with retry logic in user space?
>>
>> If you can come up with a reasonable argument why user space cannot be
>> fixed, then there is certainly a better solution than slapping a
>> msleep(5) at some random place into the code.
> Most probably you're right and we should just retry in udev. Staring at
> the history, it looks like the -EBUSY might have been returned forever,
> so user space just never really triggered it on actual CPU hotplug
> because it doesn't usually happen that cpu hotplug is disabled.
The last part was confusing: "on actual CPU hotplug because it doesn't
usually happen that cpu hotplug is disabled when onlining a CPU that was
just hotplugged."
--
Thanks,
David / dhildenb