Recent changes in logical package management (Commit 9d85eb9119f4
("x86/smpboot: Make logical package management more robust") and its
predecessor) caused boot failures for some Xen guests. E.g. when I try to
boot a 10-CPU guest on an AMD Opteron 4284 system, I see the following crash:
[ 0.116104] smpboot: Max logical packages: 1
...
[ 0.590068] #8
[ 0.001000] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
[ 0.001000] ------------[ cut here ]------------
[ 0.001000] kernel BUG at arch/x86/kernel/cpu/common.c:1020!
This is happening because total_cpus is 10 and x86_max_cores is 16(!).
Turns out, the number of CPUs (vCPUs in our case) in each logical package
doesn't have to be exactly x86_max_cores: a package can hold any number of
CPUs <= x86_max_cores, and the counts don't have to match across logical
packages. This breaks the current concept of __max_logical_packages.
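To illustrate the failing arithmetic (a minimal userspace sketch, not
kernel code; the numbers are the ones from the guest above):

#include <stdio.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	int total_cpus = 10;	/* vCPUs given to the guest */
	int ncpus = 16;		/* x86_max_cores as reported via CPUID */

	/* the current computation: DIV_ROUND_UP(10, 16) == 1 */
	int max_pkgs = DIV_ROUND_UP(total_cpus, ncpus);

	/* ...but CPU 8 actually sits in physical package 1 */
	int cpu8_pkg = 1;

	printf("__max_logical_packages = %d\n", max_pkgs);
	if (cpu8_pkg >= max_pkgs)
		printf("package %d of CPU 8 exceeds package data -> BUG()\n",
		       cpu8_pkg);
	return 0;
}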
In this patch I suggest we set __max_logical_packages based on the
max_physical_pkg_id and total_cpus; this should be safe and cover all
possible cases. Alternatively, we may think about eliminating the concept
of __max_logical_packages completely and relying on max_physical_pkg_id/
total_cpus where we currently use topology_max_packages().
The issue could also be solved on the Xen side, I guess: the x86_max_cores
value returned via CPUID could be tweaked to be the lowest CPU count among
all logical packages of the guest.
Fixes: 9d85eb9119f4 ("x86/smpboot: Make logical package management more robust")
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kernel/smpboot.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index bd1f1ad..85f41cd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -359,7 +359,6 @@ static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
 		ncpus = 1;
 	}
 
-	__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
 	logical_packages = 0;
 
 	/*
@@ -367,6 +366,15 @@ static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
 	 * package can be smaller than the actual used apic ids.
 	 */
 	max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);
+
+	/*
+	 * Each logical package has no more than x86_max_cores CPUs, but
+	 * it can have fewer, e.g. there may be 1 CPU per logical
+	 * package regardless of what's in x86_max_cores. This is seen on some
+	 * Xen setups with AMD processors.
+	 */
+	__max_logical_packages = min(max_physical_pkg_id, total_cpus);
+
 	size = max_physical_pkg_id * sizeof(unsigned int);
 	physical_to_logical_pkg = kmalloc(size, GFP_KERNEL);
 	memset(physical_to_logical_pkg, 0xff, size);
--
2.9.3
On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
> In this patch I suggest we set __max_logical_packages based on the
> max_physical_pkg_id and total_cpus,
So my 4 socket 144 CPU system will then get __max_logical_packages=144,
instead of 4.
This wastes quite a bit of memory for the per-node arrays. Luckily most
are just pointer arrays, but still, wasting 140*8 bytes for each of
them.
> this should be safe and cover all
> possible cases. Alternatively, we may think about eliminating the concept
> of __max_logical_packages completely and relying on max_physical_pkg_id/
> total_cpus where we currently use topology_max_packages().
>
> The issue could also be solved on the Xen side, I guess: the x86_max_cores
> value returned via CPUID could be tweaked to be the lowest CPU count among
> all logical packages of the guest.
This is getting ludicrous. Xen is plain broken, and instead of fixing
it, you propose to somehow deal with its obviously crack induced
behaviour :-(
Peter Zijlstra <[email protected]> writes:
> On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
>> In this patch I suggest we set __max_logical_packages based on the
>> max_physical_pkg_id and total_cpus,
>
> So my 4 socket 144 CPU system will then get __max_logical_packages=144,
> instead of 4.
>
> This wastes quite a bit of memory for the per-node arrays. Luckily most
> are just pointer arrays, but still, wasting 140*8 bytes for each of
> them.
>
>> this should be safe and cover all
>> possible cases. Alternatively, we may think about eliminating the concept
>> of __max_logical_packages completely and relying on max_physical_pkg_id/
>> total_cpus where we currently use topology_max_packages().
>>
>> The issue could also be solved on the Xen side, I guess: the x86_max_cores
>> value returned via CPUID could be tweaked to be the lowest CPU count among
>> all logical packages of the guest.
>
> This is getting ludicrous. Xen is plain broken, and instead of fixing
> it, you propose to somehow deal with its obviously crack induced
> behaviour :-(
Totally agree and I don't like the solution I propose (and that's why
this is an RFC)... The problem is that there are such Xen setups in the
wild and with the recent changes some guests will BUG() :-(
Alternatively, we can just remove the BUG() and do something with CPUs
which have their pkg >= __max_logical_packages, e.g. assign them to the
last package. Far from ideal, but it would help avoid the regression.
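Roughly something like this, perhaps (an untested sketch; the helper name
and the message are made up for illustration):

/*
 * Instead of returning an error that the caller turns into BUG(),
 * clamp an out-of-range package id to the last logical package.
 */
static unsigned int topology_clamp_pkg(unsigned int pkg, unsigned int cpu)
{
	if (pkg >= __max_logical_packages) {
		pr_warn("CPU %u package %u exceeds max %u, using last package\n",
			cpu, pkg, __max_logical_packages - 1);
		pkg = __max_logical_packages - 1;
	}
	return pkg;
}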
--
Vitaly
On 04/20/2017 11:40 AM, Vitaly Kuznetsov wrote:
> Peter Zijlstra <[email protected]> writes:
>
>> On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
>>> In this patch I suggest we set __max_logical_packages based on the
>>> max_physical_pkg_id and total_cpus,
>> So my 4 socket 144 CPU system will then get __max_logical_packages=144,
>> instead of 4.
>>
>> This wastes quite a bit of memory for the per-node arrays. Luckily most
>> are just pointer arrays, but still, wasting 140*8 bytes for each of
>> them.
>>
>>> this should be safe and cover all
>>> possible cases. Alternatively, we may think about eliminating the concept
>>> of __max_logical_packages completely and relying on max_physical_pkg_id/
>>> total_cpus where we currently use topology_max_packages().
>>>
>>> The issue could also be solved on the Xen side, I guess: the x86_max_cores
>>> value returned via CPUID could be tweaked to be the lowest CPU count among
>>> all logical packages of the guest.
>> This is getting ludicrous. Xen is plain broken, and instead of fixing
>> it, you propose to somehow deal with its obviously crack induced
>> behaviour :-(
> Totally agree and I don't like the solution I propose (and that's why
> this is an RFC)... The problem is that there are such Xen setups in the
> wild and with the recent changes some guests will BUG() :-(
>
> Alternatively, we can just remove the BUG() and do something with CPUs
> which have their pkg >= __max_logical_packages, e.g. assign them to the
> last package. Far from ideal, but it would help avoid the regression.
Do you observe this failure on a PV or HVM guest?
We've had a number of issues with topology discovery for PV guests but
AFAIK they have been addressed (so far). I wonder, though, whether it
would make sense to have some sort of callback (or an smp_ops.op) to
override native topology info, if needed.
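Something along these lines, maybe (just a sketch; the hook name is
invented and the call site is simplified):

/* hypothetical new member of struct smp_ops (arch/x86/include/asm/smp.h) */
struct smp_ops {
	/* ... existing ops ... */
	void (*fixup_topology)(struct cpuinfo_x86 *c, unsigned int cpu);
};

/* called after the native topology info has been derived */
static void smp_store_cpu_info(int id)
{
	struct cpuinfo_x86 *c = &cpu_data(id);

	identify_secondary_cpu(c);
	if (smp_ops.fixup_topology)
		smp_ops.fixup_topology(c, id);	/* Xen installs its own */
}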
-boris
On Thu, Apr 20, 2017 at 05:40:37PM +0200, Vitaly Kuznetsov wrote:
> > This is getting ludicrous. Xen is plain broken, and instead of fixing
> > it, you propose to somehow deal with its obviously crack induced
> > behaviour :-(
>
> Totally agree and I don't like the solution I propose (and that's why
> this is an RFC)... The problem is that there are such Xen setups in the
> wild and with the recent changes some guests will BUG() :-(
>
> Alternatively, we can just remove the BUG() and do something with CPUs
> which have their pkg >= __max_logical_packages, e.g. assign them to the
> last package. Far from ideal, but it would help avoid the regression.
So currently none of the stuff that uses this should appear in Xen. It's
all drivers for hardware that isn't virtualized (afaik). So assigning to
the last package 'works'.
But the moment this ends up getting used, that explodes, because we'll
need different object instances for each piece of hardware.
There just isn't a good solution; on the one hand the BIOS is prone to
providing crap numbers, on the other hand virt (esp. Xen as it turns
out) provides absolutely bonkers/inconsistent topology information.
Very frustrating :-/
Boris Ostrovsky <[email protected]> writes:
> On 04/20/2017 11:40 AM, Vitaly Kuznetsov wrote:
>> Peter Zijlstra <[email protected]> writes:
>>
>>> On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
>>>> In this patch I suggest we set __max_logical_packages based on the
>>>> max_physical_pkg_id and total_cpus,
>>> So my 4 socket 144 CPU system will then get __max_logical_packages=144,
>>> instead of 4.
>>>
>>> This wastes quite a bit of memory for the per-node arrays. Luckily most
>>> are just pointer arrays, but still, wasting 140*8 bytes for each of
>>> them.
>>>
>>>> this should be safe and cover all
>>>> possible cases. Alternatively, we may think about eliminating the concept
>>>> of __max_logical_packages completely and relying on max_physical_pkg_id/
>>>> total_cpus where we currently use topology_max_packages().
>>>>
>>>> The issue could also be solved on the Xen side, I guess: the x86_max_cores
>>>> value returned via CPUID could be tweaked to be the lowest CPU count among
>>>> all logical packages of the guest.
>>> This is getting ludicrous. Xen is plain broken, and instead of fixing
>>> it, you propose to somehow deal with its obviously crack induced
>>> behaviour :-(
>> Totally agree and I don't like the solution I propose (and that's why
>> this is an RFC)... The problem is that there are such Xen setups in the
>> wild and with the recent changes some guests will BUG() :-(
>>
>> Alternatively, we can just remove the BUG() and do something with CPUs
>> which have their pkg >= __max_logical_packages, e.g. assign them to the
>> last package. Far from ideal, but it would help avoid the regression.
>
> Do you observe this failure on a PV or HVM guest?
>
> We've had a number of issues with topology discovery for PV guests but
> AFAIK they have been addressed (so far). I wonder, though, whether it
> would make sense to have some sort of callback (or an smp_ops.op) to
> override native topology info, if needed.
>
This is HVM.
I guess the CPUID handling for AMD processors in the hypervisor doesn't
adjust the core information but passes it through from hardware as-is,
even though it should be tweaked.
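(IIRC the kernel derives x86_max_cores on AMD from CPUID leaf 0x80000008,
where ECX[7:0] is NC, the core count per package minus 1. A quick userspace
check of what the guest actually sees, assuming GCC's cpuid.h:)

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
		return 1;

	/* ECX[7:0] is NC: cores per package minus 1 */
	printf("cores per package: %u\n", (ecx & 0xff) + 1);
	return 0;
}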
--
Vitaly
On 20/04/17 16:06, Peter Zijlstra wrote:
> On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
>> In this patch I suggest we set __max_logical_packages based on the
>> max_physical_pkg_id and total_cpus,
> So my 4 socket 144 CPU system will then get __max_logical_packages=144,
> instead of 4.
>
> This wastes quite a bit of memory for the per-node arrays. Luckily most
> are just pointer arrays, but still, wasting 140*8 bytes for each of
> them.
>
>> this should be safe and cover all
>> possible cases. Alternatively, we may think about eliminating the concept
>> of __max_logical_packages completely and relying on max_physical_pkg_id/
>> total_cpus where we currently use topology_max_packages().
>>
>> The issue could also be solved on the Xen side, I guess: the x86_max_cores
>> value returned via CPUID could be tweaked to be the lowest CPU count among
>> all logical packages of the guest.
> This is getting ludicrous. Xen is plain broken, and instead of fixing
> it, you propose to somehow deal with its obviously crack induced
> behaviour :-(
Xen is (and has been forever) plain broken in this specific regard.
Fixing CPUID handling for guests has taken me 18 months and ~80 patches
so far, and it still isn't complete.
However, Linux needs to be able to function on existing production
systems (which is every instance of Xen currently running).
Linux shouldn't BUG() because something provided by the firmware looks
wonky. This is conceptually no different from finding junk in the ACPI tables.
(I fully agree that we shouldn't have got to this state, but we are 12
years too late in this respect.)
~Andrew
On 04/20/2017 12:15 PM, Peter Zijlstra wrote:
> On Thu, Apr 20, 2017 at 05:40:37PM +0200, Vitaly Kuznetsov wrote:
>>> This is getting ludicrous. Xen is plain broken, and instead of fixing
>>> it, you propose to somehow deal with its obviously crack induced
>>> behaviour :-(
>> Totally agree and I don't like the solution I propose (and that's why
>> this is an RFC)... The problem is that there are such Xen setups in the
>> wild and with the recent changes some guests will BUG() :-(
>>
>> Alternatively, we can just remove the BUG() and do something with CPUs
>> which have their pkg >= __max_logical_packages, e.g. assign them to the
>> last package. Far from ideal, but it would help avoid the regression.
> So currently none of the stuff that uses this should appear in Xen. It's
> all drivers for hardware that isn't virtualized (afaik). So assigning to
> the last package 'works'.
>
> But the moment this ends up getting used, that explodes, because we'll
> need different object instances for each piece of hardware.
This already gets used. I don't remember the details, but we had to fix
something due to the RAPL code referencing topology info (and yes, there
is no reason for a guest to use RAPL, but we shouldn't crash either).
>
> There just isn't a good solution; on the one hand the BIOS is prone to
> providing crap numbers, on the other hand virt (esp. Xen as it turns
> out) provides absolutely bonkers/inconsistent topology information.
>
> Very frustrating :-/
>
So we might need a way to bypass topology discovery and present some
sort of default topology (single package, or 1 CPU per package, for
example) and ignore APICID and all that.
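Something like this, perhaps (an untested sketch, wired into the
hypothetical override hook discussed earlier; all names assumed):

/* Report a flat topology: every vCPU is its own core in package 0. */
static void xen_flat_topology(struct cpuinfo_x86 *c, unsigned int cpu)
{
	c->phys_proc_id = 0;			/* single package */
	c->cpu_core_id = cpu;			/* one core per vCPU */
	c->x86_max_cores = num_possible_cpus();	/* cores in that package */
}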
-boris