2014-07-10 08:43:09

by Yasuaki Ishimatsu

Subject: [PATCH] x86,cpu-hotplug: assign same CPU number to readded CPU

llc_shared_map is not cleared even if a CPU is offlined or hot-removed.
So when a hot-added CPU is assigned a new CPU number, the mask has a
wrong value. The mask is used by the CFS scheduler to create
sched_domains, so this breaks the CFS scheduler.

Here is an example on my system.
My system has 4 sockets, each socket has 15 cores and HT is enabled.
In this case, the CPUs of each socket are numbered as follows:

         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-44, 90-104
Socket#3 | 45-59, 105-119

Then the llc_shared_mask of CPU#30 is 0x3fff80000001fffc0000000.
It means that the last level cache of Socket#2 is shared by
CPU#30-44 and 90-104.

After hot-removing socket#2 and #3, the CPUs of each socket are numbered
as follows:

         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89

But llc_shared_mask is not cleared, so the llc_shared_mask of CPU#30 is
still 0x3fff80000001fffc0000000.

After that, when socket#2 and #3 are hot-added again, the CPUs of each
socket are numbered as follows:

         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-59
Socket#3 | 90-119

Then the llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000.
It means that the last level cache of Socket#2 is shared by CPU#30-59
and 90-104, so the mask has a wrong value.
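
As an illustration only (this is not kernel code), the renumbering above
can be reproduced with a small user-space model. It assumes that CPUs are
enumerated at boot with the first thread of every core first and the
second threads afterwards, which matches the first table, and it models
cpumask_next_zero(-1, cpu_present_mask) with a plain loop. All names
below (first_free_cpu(), add_cpu(), llc_shared[][] and so on) are made up
for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CPUS    120	/* 4 sockets x 15 cores x 2 threads, as in the example */
#define NR_SOCKETS 4
#define CORES      15
#define THREADS    2

static bool present[NR_CPUS];			/* stand-in for cpu_present_mask */
static int socket_of_cpu[NR_CPUS];		/* socket that last got this number */
static bool llc_shared[NR_CPUS][NR_CPUS];	/* stand-in for llc_shared_map */

/* Model of cpumask_next_zero(-1, cpu_present_mask): lowest absent number. */
static int first_free_cpu(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (!present[cpu])
			return cpu;
	return -1;
}

/* Register one logical CPU of @socket under the first free CPU number. */
static int add_cpu(int socket)
{
	int cpu = first_free_cpu();

	assert(cpu >= 0);
	present[cpu] = true;
	socket_of_cpu[cpu] = socket;

	/* Link the new CPU with every present CPU on the same socket. */
	for (int o = 0; o < NR_CPUS; o++) {
		if (present[o] && socket_of_cpu[o] == socket) {
			llc_shared[cpu][o] = true;
			llc_shared[o][cpu] = true;
		}
	}
	return cpu;
}

/* Assumed boot order: thread 0 of all cores on all sockets, then thread 1. */
static void boot_all(void)
{
	memset(present, 0, sizeof(present));
	memset(llc_shared, 0, sizeof(llc_shared));
	for (int t = 0; t < THREADS; t++)
		for (int s = 0; s < NR_SOCKETS; s++)
			for (int c = 0; c < CORES; c++)
				add_cpu(s);
}

/* Hot-remove clears the present bits but, as reported, not llc_shared. */
static void hot_remove_socket(int socket)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (present[cpu] && socket_of_cpu[cpu] == socket)
			present[cpu] = false;
}

/* Hot-add registers all logical CPUs of the socket one after another. */
static void hot_add_socket(int socket)
{
	for (int i = 0; i < CORES * THREADS; i++)
		add_cpu(socket);
}
```

Calling boot_all(), then hot_remove_socket(2), hot_remove_socket(3),
hot_add_socket(2) and hot_add_socket(3) leaves llc_shared[30][] set for
CPU#30-59 and, as stale bits, for CPU#90-104, which is the wrong sharing
set described above.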

At first, I cleared the hot-removed CPU's bit from llc_shared_map
when hot-removing the CPU. But Borislav suggested that the problem will
disappear if a re-added CPU is assigned the same CPU number, and that
llc_shared_map must not be changed.

So this patch assigns the same CPU number to a re-added CPU by linking
the CPU number to the APIC ID. With the patch, the problem disappears.

Signed-off-by: Yasuaki Ishimatsu <[email protected]>
Suggested-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/apic/apic.c | 32 +++++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index ad28db7..1cc715b 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -220,6 +220,22 @@ static void apic_pm_activate(void);
static unsigned long apic_phys;

/*
+ * Bind APIC ID to CPU ID.
+ * The CPU ID bound to an APIC ID by this array does not change even if
+ * the CPU is hotplugged. So don't clear the array when a CPU is hot-removed.
+ */
+static int apicid_to_cpuid[MAX_LOCAL_APIC] = {
+	[0 ... MAX_LOCAL_APIC-1] = -1,
+};
+
+/*
+ * CPU IDs that have been bound to an APIC ID.
+ * Don't clear a bit even if the CPU is hot-removed.
+ */
+static DECLARE_BITMAP(cpu_used_bits, CONFIG_NR_CPUS);
+static struct cpumask *const cpu_used_mask = to_cpumask(cpu_used_bits);
+
+/*
* Get the LAPIC version
*/
static inline int lapic_get_version(void)
@@ -2122,6 +2138,17 @@ void disconnect_bsp_APIC(int virt_wire_setup)
apic_write(APIC_LVT1, value);
}

+static int get_cpuid(int apicid)
+{
+	int cpuid;
+
+	cpuid = apicid_to_cpuid[apicid];
+	if (cpuid < 0)
+		cpuid = cpumask_next_zero(-1, cpu_used_mask);
+
+	return cpuid;
+}
+
int generic_processor_info(int apicid, int version)
{
int cpu, max = nr_cpu_ids;
@@ -2199,7 +2226,9 @@ int generic_processor_info(int apicid, int version)
*/
cpu = 0;
} else
- cpu = cpumask_next_zero(-1, cpu_present_mask);
+ cpu = get_cpuid(apicid);
+
+ apicid_to_cpuid[apicid] = cpu;

/*
* Validate version
@@ -2228,6 +2257,7 @@ int generic_processor_info(int apicid, int version)
early_per_cpu(x86_cpu_to_logical_apicid, cpu) =
apic->x86_32_early_logical_apicid(cpu);
#endif
+ cpumask_set_cpu(cpu, cpu_used_mask);
set_cpu_possible(cpu, true);
set_cpu_present(cpu, true);


2014-07-10 09:27:21

by Igor Mammedov

Subject: Re: [PATCH] x86,cpu-hotplug: assign same CPU number to readded CPU

On Thu, 10 Jul 2014 17:41:50 +0900
Yasuaki Ishimatsu <[email protected]> wrote:

> llc_shared_map is not cleared even if CPU is offline or hot removed.
> So when hot-plugging CPU and assigning new CPU number to hot-added CPU,
> the mask has wrong value. The mask is used by CSF schduler to create
> sched_domain. So it breaks CFS scheduler.
>
> Here is a example on my system.
> My system has 4 sockets and each socket has 15 cores and HT is enabled.
> In this case, each core of sockes is numbered as follows:
>
> | CPU#
> Socket#0 | 0-14 , 60-74
> Socket#1 | 15-29, 75-89
> Socket#2 | 30-44, 90-104
> Socket#3 | 45-59, 105-119
>
> Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
> It means that last level cache of Socket#2 is shared with
> CPU#30-44 and 90-104.
>
> When hot-removing socket#2 and #3, each core of sockets is numbered
> as follows:
>
> | CPU#
> Socket#0 | 0-14 , 60-74
> Socket#1 | 15-29, 75-89
>
> But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30 remains
> having 0x3fff80000001fffc0000000.
>
> After that, when hot-adding socket#2 and #3, each core of sockets is
> numbered as follows:
>
> | CPU#
> Socket#0 | 0-14 , 60-74
> Socket#1 | 15-29, 75-89
> Socket#2 | 30-59
> Socket#3 | 90-119
>
> Then llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000.
> It means that last level cache of Socket#2 is shared with CPU#30-59
> and 90-104. So the mask has wrong value.
>
> At first, I cleared hot-removed CPU number's bit from llc_shared_map
> when hot removing CPU. But Borislav suggested that the problem will
> disappear if readded CPU is assigned same CPU number. And llc_shared_map
> must not be changed.
>
> So the patch assigns same CPU number to readded CPU by linking CPU
> number to APIC ID. And by the patch, the problem disappers.
>
> Signed-off-by: Yasuaki Ishimatsu <[email protected]>
> Suggested-by: Borislav Petkov <[email protected]>
> ---
> arch/x86/kernel/apic/apic.c | 32 +++++++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index ad28db7..1cc715b 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -220,6 +220,22 @@ static void apic_pm_activate(void);
> static unsigned long apic_phys;
>
> /*
> + * Bind ACPI ID to CPU ID
> + * CPU ID to APIC ID does not change by this array even if CPU is
> + * hotplugged. So don't clear the array even if CPU is hot-removed
> + */
> +static int apicid_to_cpuid[MAX_LOCAL_APIC] = {
> + [0 ... MAX_LOCAL_APIC-1] = -1,
> +};
> +
> +/*
> + * Represent CPU ID bound to APIC
> + * Don't clear a bit even if CPU is hot-removed
> + */
> +static DECLARE_BITMAP(cpu_used_bits, CONFIG_NR_CPUS);
> +static struct cpumask *const cpu_used_mask = to_cpumask(cpu_used_bits);
> +
> +/*
> * Get the LAPIC version
> */
> static inline int lapic_get_version(void)
> @@ -2122,6 +2138,17 @@ void disconnect_bsp_APIC(int virt_wire_setup)
> apic_write(APIC_LVT1, value);
> }
>
> +static int get_cpuid(int apicid)
> +{
> + int cpuid;
> +
> + cpuid = apicid_to_cpuid[apicid];
> + if (cpuid < 0)
> + cpuid = cpumask_next_zero(-1, cpu_used_mask);
Why do you need an additional cpu bitmask?
How about just finding the first apicid_to_cpuid[apicid] < 0
and dropping the bitmask, which is then no longer needed?

> +
> + return cpuid;
> +}
> +
> int generic_processor_info(int apicid, int version)
> {
> int cpu, max = nr_cpu_ids;
> @@ -2199,7 +2226,9 @@ int generic_processor_info(int apicid, int version)
> */
> cpu = 0;
> } else
> - cpu = cpumask_next_zero(-1, cpu_present_mask);
> + cpu = get_cpuid(apicid);
> +
> + apicid_to_cpuid[apicid] = cpu;
>
> /*
> * Validate version
> @@ -2228,6 +2257,7 @@ int generic_processor_info(int apicid, int version)
> early_per_cpu(x86_cpu_to_logical_apicid, cpu) =
> apic->x86_32_early_logical_apicid(cpu);
> #endif
> + cpumask_set_cpu(cpu, cpu_used_mask);
> set_cpu_possible(cpu, true);
> set_cpu_present(cpu, true);
>
>

2014-07-11 00:49:24

by Yasuaki Ishimatsu

Subject: Re: [PATCH] x86,cpu-hotplug: assign same CPU number to readded CPU

(2014/07/10 18:26), Igor Mammedov wrote:
> On Thu, 10 Jul 2014 17:41:50 +0900
> Yasuaki Ishimatsu <[email protected]> wrote:
>
>> llc_shared_map is not cleared even if CPU is offline or hot removed.
>> So when hot-plugging CPU and assigning new CPU number to hot-added CPU,
>> the mask has wrong value. The mask is used by CSF schduler to create
>> sched_domain. So it breaks CFS scheduler.
>>
>> Here is a example on my system.
>> My system has 4 sockets and each socket has 15 cores and HT is enabled.
>> In this case, each core of sockes is numbered as follows:
>>
>> | CPU#
>> Socket#0 | 0-14 , 60-74
>> Socket#1 | 15-29, 75-89
>> Socket#2 | 30-44, 90-104
>> Socket#3 | 45-59, 105-119
>>
>> Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
>> It means that last level cache of Socket#2 is shared with
>> CPU#30-44 and 90-104.
>>
>> When hot-removing socket#2 and #3, each core of sockets is numbered
>> as follows:
>>
>> | CPU#
>> Socket#0 | 0-14 , 60-74
>> Socket#1 | 15-29, 75-89
>>
>> But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30 remains
>> having 0x3fff80000001fffc0000000.
>>
>> After that, when hot-adding socket#2 and #3, each core of sockets is
>> numbered as follows:
>>
>> | CPU#
>> Socket#0 | 0-14 , 60-74
>> Socket#1 | 15-29, 75-89
>> Socket#2 | 30-59
>> Socket#3 | 90-119
>>
>> Then llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000.
>> It means that last level cache of Socket#2 is shared with CPU#30-59
>> and 90-104. So the mask has wrong value.
>>
>> At first, I cleared hot-removed CPU number's bit from llc_shared_map
>> when hot removing CPU. But Borislav suggested that the problem will
>> disappear if readded CPU is assigned same CPU number. And llc_shared_map
>> must not be changed.
>>
>> So the patch assigns same CPU number to readded CPU by linking CPU
>> number to APIC ID. And by the patch, the problem disappers.
>>
>> Signed-off-by: Yasuaki Ishimatsu <[email protected]>
>> Suggested-by: Borislav Petkov <[email protected]>
>> ---
>> arch/x86/kernel/apic/apic.c | 32 +++++++++++++++++++++++++++++++-
>> 1 file changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
>> index ad28db7..1cc715b 100644
>> --- a/arch/x86/kernel/apic/apic.c
>> +++ b/arch/x86/kernel/apic/apic.c
>> @@ -220,6 +220,22 @@ static void apic_pm_activate(void);
>> static unsigned long apic_phys;
>>
>> /*
>> + * Bind ACPI ID to CPU ID
>> + * CPU ID to APIC ID does not change by this array even if CPU is
>> + * hotplugged. So don't clear the array even if CPU is hot-removed
>> + */
>> +static int apicid_to_cpuid[MAX_LOCAL_APIC] = {
>> + [0 ... MAX_LOCAL_APIC-1] = -1,
>> +};
>> +
>> +/*
>> + * Represent CPU ID bound to APIC
>> + * Don't clear a bit even if CPU is hot-removed
>> + */
>> +static DECLARE_BITMAP(cpu_used_bits, CONFIG_NR_CPUS);
>> +static struct cpumask *const cpu_used_mask = to_cpumask(cpu_used_bits);
>> +
>> +/*
>> * Get the LAPIC version
>> */
>> static inline int lapic_get_version(void)
>> @@ -2122,6 +2138,17 @@ void disconnect_bsp_APIC(int virt_wire_setup)
>> apic_write(APIC_LVT1, value);
>> }
>>

>> +static int get_cpuid(int apicid)
>> +{
>> + int cpuid;
>> +
>> + cpuid = apicid_to_cpuid[apicid];
>> + if (cpuid < 0)
>> + cpuid = cpumask_next_zero(-1, cpu_used_mask);
> Why do you need additional cpu bitmask?

To assign a new CPU number, I prepared a new cpu bitmask.

The following two steps are necessary to assign a CPU number to an APIC ID.
1. Check whether the APIC ID has already been assigned a CPU number
2. Assign a new CPU number if the APIC ID has not been assigned one yet
   (it means apicid_to_cpuid[] returns -1)

Step 1 is done with apicid_to_cpuid[]. And step 2 assigns a new CPU
number by using cpu_used_mask.

To keep CPU numbers, the cpumask must not be cleared when a CPU is
hot-removed. If it were cleared by hot-removing a CPU, it could not be
used to keep the CPU number.

Currently, cpu_present_map is used to assign CPU numbers. But that cpumask
is cleared when a CPU is hot-removed, since its purpose is to track the
CPUs that exist in the system. So cpu_present_map must be cleared
at CPU hot-remove.

I checked whether the existing cpumasks (cpu_possible_map, cpu_online_map
et al.) are usable for this purpose. But there is no cpumask that
can be used to keep CPU numbers. So I prepared a new cpu bitmask.

> How about just finding the first apicid_to_cpuid[apicid] < 0
> and dropping not needed anymore bitmask.

When apicid_to_cpuid[] returns -1, the kernel assigns a new CPU number.
For this, cpu_used_mask is necessary.
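
A minimal user-space sketch of these two steps, assuming simplified
stand-ins for the kernel data (this is not the patch itself, and
binding_reset(), assign_cpu() and hot_remove() are hypothetical names
used only here):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_APIC 256	/* assumed bound, stands in for MAX_LOCAL_APIC */
#define NR_CPUS  120	/* assumed bound, stands in for CONFIG_NR_CPUS */

static int apic_to_cpu[MAX_APIC];	/* -1 = APIC ID never seen */
static bool used[NR_CPUS];		/* numbers ever handed out */
static bool present[NR_CPUS];		/* cleared again on hot-remove */

/* Start with no bindings at all (the patch uses a static initializer). */
static void binding_reset(void)
{
	for (int i = 0; i < MAX_APIC; i++)
		apic_to_cpu[i] = -1;
	memset(used, 0, sizeof(used));
	memset(present, 0, sizeof(present));
}

/* Return the CPU number for @apicid, or -1 if all numbers are taken. */
static int assign_cpu(int apicid)
{
	int cpu;

	assert(apicid >= 0 && apicid < MAX_APIC);

	/* Step 1: reuse the number already bound to this APIC ID. */
	cpu = apic_to_cpu[apicid];
	if (cpu < 0) {
		/* Step 2: take the lowest number that was never handed out. */
		for (cpu = 0; cpu < NR_CPUS && used[cpu]; cpu++)
			;
		if (cpu == NR_CPUS)
			return -1;
		apic_to_cpu[apicid] = cpu;
		used[cpu] = true;
	}
	present[cpu] = true;
	return cpu;
}

/* Hot-remove drops the present bit only; binding and used[] are kept. */
static void hot_remove(int apicid)
{
	int cpu;

	assert(apicid >= 0 && apicid < MAX_APIC);
	cpu = apic_to_cpu[apicid];
	if (cpu >= 0)
		present[cpu] = false;
}
```

With first-free allocation from the present mask, re-adding APIC ID 20
after APIC IDs 10 and 20 were removed would hand out CPU number 0. In
this sketch it gets its old number 1 back, because used[] still has
bits 0 and 1 set and the binding was never cleared.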

Thanks,
Yasuaki Ishimatsu

>
>> +
>> + return cpuid;
>> +}
>> +
>> int generic_processor_info(int apicid, int version)
>> {
>> int cpu, max = nr_cpu_ids;
>> @@ -2199,7 +2226,9 @@ int generic_processor_info(int apicid, int version)
>> */
>> cpu = 0;
>> } else
>> - cpu = cpumask_next_zero(-1, cpu_present_mask);
>> + cpu = get_cpuid(apicid);
>> +
>> + apicid_to_cpuid[apicid] = cpu;
>>
>> /*
>> * Validate version
>> @@ -2228,6 +2257,7 @@ int generic_processor_info(int apicid, int version)
>> early_per_cpu(x86_cpu_to_logical_apicid, cpu) =
>> apic->x86_32_early_logical_apicid(cpu);
>> #endif
>> + cpumask_set_cpu(cpu, cpu_used_mask);
>> set_cpu_possible(cpu, true);
>> set_cpu_present(cpu, true);
>>
>>
>

2014-07-11 10:59:32

by Borislav Petkov

Subject: Re: [PATCH] x86,cpu-hotplug: assign same CPU number to readded CPU

On Fri, Jul 11, 2014 at 09:48:27AM +0900, Yasuaki Ishimatsu wrote:
> >>+static int get_cpuid(int apicid)

Btw this "cpuid" is misleading. Call it "cpu_num" or so.

> >>+{
> >>+ int cpuid;
> >>+
> >>+ cpuid = apicid_to_cpuid[apicid];
> >>+ if (cpuid < 0)
> >>+ cpuid = cpumask_next_zero(-1, cpu_used_mask);
> >Why do you need additional cpu bitmask?
>
> To assing new CPU number, I prepared new cpu bitmask.
>
> The following two steps are necessary to assign CPU number to APIC ID.
> 1. Check whether APIC ID has been assigned CPU number
> 2. Assign new CPU number if ACPI ID has not been assigned CPU number (it
> means apicid_to_cpuid[] returns -1)
>
> Step 1. is checked by apicid_to_cpuid[]. And step 2. assigns new CPU
> number by using cpu_used_mask.
>
> To keep cpu number, cpumask must not be cleared by hot removing CPU.
> If cpumask is cleared by hot removing CPU, the cpumask cannot be used
> to keep CPU number.
>
> Currently, cpu_present_map is used to assign CPU number. But the cpumask
> is cleared by hot removing CPU since the mask is prepared to remember
> existed CPUs in the system. So the cpu_present_map must be cleared
> at CPU hot remove.
>
> I confirmed whether present cpumasks (cpu_possible_map, cpu_online_map
> et al) is usable or not for this purpose. But there is no cpumask that
> can be used to keep CPU number. So I prepared new cpu bitmask.
>
> >How about just finding the first apicid_to_cpuid[apicid] < 0
> >and dropping not needed anymore bitmask.
>
> When apicid_to_cpuid[] return -1, kernel assigns new CPU number. For
> this, the cpu_used_mask is necessary.

And we can't have that - we cannot have cores which had number X get
number Y after hotplug.

Can you send a full dmesg after you've done a physical node hotplug on a
machine? Privately is fine too.

Boot with

"ignore_loglevel log_buf_len=10M debug apic=debug show_lapic=all"

please.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-07-15 04:28:27

by Yasuaki Ishimatsu

Subject: Re: [PATCH] x86,cpu-hotplug: assign same CPU number to readded CPU

Hi Borislav,

(2014/07/11 19:59), Borislav Petkov wrote:
> On Fri, Jul 11, 2014 at 09:48:27AM +0900, Yasuaki Ishimatsu wrote:
>>>> +static int get_cpuid(int apicid)
>
> Btw this "cpuid" is misleading. Call it "cpu_num" or so.

O.K.
I'll update it.

>
>>>> +{
>>>> + int cpuid;
>>>> +
>>>> + cpuid = apicid_to_cpuid[apicid];
>>>> + if (cpuid < 0)
>>>> + cpuid = cpumask_next_zero(-1, cpu_used_mask);
>>> Why do you need additional cpu bitmask?
>>
>> To assing new CPU number, I prepared new cpu bitmask.
>>
>> The following two steps are necessary to assign CPU number to APIC ID.
>> 1. Check whether APIC ID has been assigned CPU number
>> 2. Assign new CPU number if ACPI ID has not been assigned CPU number (it
>> means apicid_to_cpuid[] returns -1)
>>
>> Step 1. is checked by apicid_to_cpuid[]. And step 2. assigns new CPU
>> number by using cpu_used_mask.
>>
>> To keep cpu number, cpumask must not be cleared by hot removing CPU.
>> If cpumask is cleared by hot removing CPU, the cpumask cannot be used
>> to keep CPU number.
>>
>> Currently, cpu_present_map is used to assign CPU number. But the cpumask
>> is cleared by hot removing CPU since the mask is prepared to remember
>> existed CPUs in the system. So the cpu_present_map must be cleared
>> at CPU hot remove.
>>
>> I confirmed whether present cpumasks (cpu_possible_map, cpu_online_map
>> et al) is usable or not for this purpose. But there is no cpumask that
>> can be used to keep CPU number. So I prepared new cpu bitmask.
>>
>>> How about just finding the first apicid_to_cpuid[apicid] < 0
>>> and dropping not needed anymore bitmask.
>>
>> When apicid_to_cpuid[] return -1, kernel assigns new CPU number. For
>> this, the cpu_used_mask is necessary.
>
> And we can't have that - we cannot have cores which had number X get
> number Y after hotplug.
>

> Can you send a full dmesg after you've done a physical node hotplug on a
> machine? Privately is fine too.

O.K.
I'll send dmesg.

Thanks,
Yasuaki Ishimatsu

>
> Boot with
>
> "ignore_loglevel log_buf_len=10M debug apic=debug show_lapic=all"
>
> please.
>
> Thanks.
>