Hello,
I've just stumbled across this commit, and I'm wondering if that's actually
correct (I too have at least one system where such IDs are reported in
MADT): For offline/absent CPUs, the firmware may not know the APIC IDs
at the point MADT is built, so I think it is quite reasonable to put ~0 in
there. The ACPID spec specifically calls out that the IDs must not change
across sleep states, which implies to me that they may change across an
offline period of a CPU. IOW I think such entries still need to contribute to
the count of disabled CPUs.
I notice a similar change has been done for the xAPIC case a while ago
by you, Thomas.
Jan
Hi Jan,
At 04/27/2018 03:21 PM, Jan Beulich wrote:
> Hello,
>
> I've just stumbled across this commit, and I'm wondering if that's actually
> correct (I too have at least one system where such IDs are reported in
> MADT): For offline/absent CPUs, the firmware may not know the APIC IDs
The MADT table is not reasonable, the Local APIC entries(xAPIC/x2APIC)
in MADT is not always correct as OS want. So, we should do this sanity
check.
> at the point MADT is built, so I think it is quite reasonable to put ~0 in
> there. The ACPID spec specifically calls out that the IDs must not change
> across sleep states, which implies to me that they may change across an
> offline period of a CPU. IOW I think such entries still need to contribute to
> the count of disabled CPUs.
>
Aha, there are no CPU devices which will map it's physical ID to the
illegal number 0xffffffff, So it will never be used.
BTW, Limiting the number of the disabled CPUs is also a goal.
Thanks,
dou
> I notice a similar change has been done for the xAPIC case a while ago
> by you, Thomas.
>
> Jan
>
>
>
>
>
>>> On 27.04.18 at 10:32, <[email protected]> wrote:
> At 04/27/2018 03:21 PM, Jan Beulich wrote:
>> I've just stumbled across this commit, and I'm wondering if that's actually
>> correct (I too have at least one system where such IDs are reported in
>> MADT): For offline/absent CPUs, the firmware may not know the APIC IDs
>
> The MADT table is not reasonable, the Local APIC entries(xAPIC/x2APIC)
> in MADT is not always correct as OS want. So, we should do this sanity
> check.
Of course, sanity checking is necessary.
>> at the point MADT is built, so I think it is quite reasonable to put ~0 in
>> there. The ACPID spec specifically calls out that the IDs must not change
>> across sleep states, which implies to me that they may change across an
>> offline period of a CPU. IOW I think such entries still need to contribute to
>> the count of disabled CPUs.
>
> Aha, there are no CPU devices which will map it's physical ID to the
> illegal number 0xffffffff, So it will never be used.
The ID will never be used, yes, but the CPU may be (after firmware has
assigned it a valid ID).
> BTW, Limiting the number of the disabled CPUs is also a goal.
I'm afraid I don't understand: Limiting the number of disabled CPUs is
certainly desirable when those can never be used, but why would you
want to do this when they might later get hotplugged? I'm not aware
of a way to tell, by just looking at the MADT, which of the two cases
apply.
Jan
At 04/27/2018 08:09 PM, Jan Beulich wrote:
>>>> On 27.04.18 at 10:32, <[email protected]> wrote:
>> At 04/27/2018 03:21 PM, Jan Beulich wrote:
>>> I've just stumbled across this commit, and I'm wondering if that's actually
>>> correct (I too have at least one system where such IDs are reported in
>>> MADT): For offline/absent CPUs, the firmware may not know the APIC IDs
>>
>> The MADT table is not reasonable, the Local APIC entries(xAPIC/x2APIC)
>> in MADT is not always correct as OS want. So, we should do this sanity
>> check.
>
> Of course, sanity checking is necessary.
>
>>> at the point MADT is built, so I think it is quite reasonable to put ~0 in
>>> there. The ACPID spec specifically calls out that the IDs must not change
>>> across sleep states, which implies to me that they may change across an
>>> offline period of a CPU. IOW I think such entries still need to contribute to
>>> the count of disabled CPUs.
>>
>> Aha, there are no CPU devices which will map it's physical ID to the
>> illegal number 0xffffffff, So it will never be used.
>
> The ID will never be used, yes, but the CPU may be (after firmware has
> assigned it a valid ID).
>
>> BTW, Limiting the number of the disabled CPUs is also a goal.
>
> I'm afraid I don't understand: Limiting the number of disabled CPUs is
> certainly desirable when those can never be used, but why would you
> want to do this when they might later get hotplugged? I'm not aware
Let's see the workflow of CPU hotplug:
1) get the CPU device info from ACPI namespace
- it contains logical processor id
2) through the logical processor id, get the LACPI entry in MADT.
3) generate the CPU for kernel(will create a CPU id, can see by lscpu)
Normally, there are no valid CPU devices in 1) which are mapped to
the LACPI entries(0xff or 0xffffffff).
The actually number of hotplugged CPUs depends on CPU devices/processors
in ACPI namespace. The number calculated from MADT is the maximum
situation which can be cut and doesn't affect CPU hotplug. Don't worry
about it.
Now, in ACPI-based system, Linux gets the number of possible CPUs by
MADT, We are going to use the ACPI namespace to make the number
accurate. But it is so hard, because it's so late to initialize the ACPI
namespace.
Thanks,
dou
> of a way to tell, by just looking at the MADT, which of the two cases
> apply.
>
> Jan
>
>
>
>
>
>>> On 02.05.18 at 03:56, <[email protected]> wrote:
> At 04/27/2018 08:09 PM, Jan Beulich wrote:
>> I'm afraid I don't understand: Limiting the number of disabled CPUs is
>> certainly desirable when those can never be used, but why would you
>> want to do this when they might later get hotplugged? I'm not aware
>
> Let's see the workflow of CPU hotplug:
>
> 1) get the CPU device info from ACPI namespace
> - it contains logical processor id
>
> 2) through the logical processor id, get the LACPI entry in MADT.
>
> 3) generate the CPU for kernel(will create a CPU id, can see by lscpu)
>
> Normally, there are no valid CPU devices in 1) which are mapped to
> the LACPI entries(0xff or 0xffffffff).
>
> The actually number of hotplugged CPUs depends on CPU devices/processors
> in ACPI namespace. The number calculated from MADT is the maximum
> situation which can be cut and doesn't affect CPU hotplug. Don't worry
> about it.
>
> Now, in ACPI-based system, Linux gets the number of possible CPUs by
> MADT, We are going to use the ACPI namespace to make the number
> accurate. But it is so hard, because it's so late to initialize the ACPI
> namespace.
So are you envisioning a model when the number of disabled CPUs can be
increased once the ACPI interpreter has been enabled? Otherwise the
maximum recorded during early boot may be too low with the changes in
question. (And FTR, I agree this number may also be way too large without
them, but it being too large is a resource efficiency problem, while it being
too low is a functionality one.)
Also, for background, besides wanting to clarify the correctness of these
two changes, I'm also trying to understand whether we want to mirror
them into the Xen hypervisor, which relies on the Dom0 kernel's ACPI
interpreter (i.e. it can and does parse MADT, but can't and doesn't parse
the ACPI name space). Hence late adjustment of the number of
hotpluggable CPUs would be even more problematic in that environment.
Jan
Hi Jan,
At 05/02/2018 02:39 PM, Jan Beulich wrote:
>>>> On 02.05.18 at 03:56, <[email protected]> wrote:
>> At 04/27/2018 08:09 PM, Jan Beulich wrote:
>>> I'm afraid I don't understand: Limiting the number of disabled CPUs is
>>> certainly desirable when those can never be used, but why would you
>>> want to do this when they might later get hotplugged? I'm not aware
>>
>> Let's see the workflow of CPU hotplug:
>>
>> 1) get the CPU device info from ACPI namespace
>> - it contains logical processor id
>>
>> 2) through the logical processor id, get the LACPI entry in MADT.
>>
>> 3) generate the CPU for kernel(will create a CPU id, can see by lscpu)
>>
>> Normally, there are no valid CPU devices in 1) which are mapped to
>> the LACPI entries(0xff or 0xffffffff).
>>
>> The actually number of hotplugged CPUs depends on CPU devices/processors
>> in ACPI namespace. The number calculated from MADT is the maximum
>> situation which can be cut and doesn't affect CPU hotplug. Don't worry
>> about it.
>>
>> Now, in ACPI-based system, Linux gets the number of possible CPUs by
>> MADT, We are going to use the ACPI namespace to make the number
>> accurate. But it is so hard, because it's so late to initialize the ACPI
>> namespace.
>
> So are you envisioning a model when the number of disabled CPUs can be
> increased once the ACPI interpreter has been enabled? Otherwise the
Yes, good idea, but, As Thomas said:
It needs to run _before_ setup_percpu() is invoked to
scale everything correctly.
> maximum recorded during early boot may be too low with the changes in
> question. (And FTR, I agree this number may also be way too large without
> them, but it being too large is a resource efficiency problem, while it being
> too low is a functionality one.)
Too large number will waste vectors, and even cause bugs.
IMO, the number will just be more accurate by excluding 0xffffffff, it
only equal to, even be larger than the real number of possible CPUs.
>
> Also, for background, besides wanting to clarify the correctness of these
> two changes, I'm also trying to understand whether we want to mirror
> them into the Xen hypervisor, which relies on the Dom0 kernel's ACPI
> interpreter (i.e. it can and does parse MADT, but can't and doesn't parse
> the ACPI name space). Hence late adjustment of the number of
> hotpluggable CPUs would be even more problematic in that environment.
>
I am not familiar with Xen, I will consider that.
Thanks,
dou
> Jan
>
>
>
>
>