2018-03-15 03:09:28

by Dou Liyang

Subject: [RFC PATCH v2] ACPI / processor: Fix possible CPUs map

Rafael J told me that, for ACPI-based physical CPU hotplug to work, there
have to be objects in the ACPI namespace corresponding to all of the
processors in question. If they are not present, there is no way to signal
insertion or to eject the processors safely.

However, the kernel calculates the possible CPU count from the number of
Local APIC entries in the ACPI MADT. It does not take the ACPI namespace
into account, so it can report unrealistically high numbers. The kernel
allocates resources such as vectors according to num_possible_cpus(), so an
inflated count may cause vector space exhaustion and even bugs.

Do a depth-first search of the namespace tree, check and collect the correct
CPUs, and update the possible map accordingly.

Signed-off-by: Dou Liyang <[email protected]>
---
Changelog v1 --> v2:
- Optimize the code per Andy Shevchenko's suggestion
- Modify the changelog

drivers/acpi/acpi_processor.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 449d86d39965..ac45380f4439 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -671,6 +671,24 @@ static acpi_status __init acpi_processor_ids_walk(acpi_handle handle,

}

+static void __init acpi_update_possible_map(void)
+{
+	unsigned int cpu;
+
+	if (nr_unique_ids >= nr_cpu_ids)
+		return;
+
+	/* Don't yet figure out if it's superfluous */
+	if (nr_unique_ids >= cpumask_last(cpu_possible_mask))
+		return;
+
+	for_each_cpu_wrap(cpu, cpu_possible_mask, nr_unique_ids)
+		set_cpu_possible(cpu, false);
+
+	nr_cpu_ids = nr_unique_ids;
+	pr_info("Allowing %d possible CPUs\n", nr_cpu_ids);
+}
+
static void __init acpi_processor_check_duplicates(void)
{
/* check the correctness for all processors in ACPI namespace */
@@ -680,6 +698,9 @@ static void __init acpi_processor_check_duplicates(void)
NULL, NULL, NULL);
acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID, acpi_processor_ids_walk,
NULL, NULL);
+
+ /* make possible CPU count more realistic */
+ acpi_update_possible_map();
}

bool acpi_duplicate_processor_id(int proc_id)
--
2.14.3





2018-03-15 13:47:13

by Thomas Gleixner

Subject: Re: [RFC PATCH v2] ACPI / processor: Fix possible CPUs map

On Thu, 15 Mar 2018, Dou Liyang wrote:
>
> +static void __init acpi_update_possible_map(void)
> +{
> +	unsigned int cpu;
> +
> +	if (nr_unique_ids >= nr_cpu_ids)
> +		return;
> +
> +	/* Don't yet figure out if it's superfluous */
> +	if (nr_unique_ids >= cpumask_last(cpu_possible_mask))
> +		return;
> +
> +	for_each_cpu_wrap(cpu, cpu_possible_mask, nr_unique_ids)
> +		set_cpu_possible(cpu, false);
> +
> +	nr_cpu_ids = nr_unique_ids;
> +	pr_info("Allowing %d possible CPUs\n", nr_cpu_ids);
> +}
> +
> static void __init acpi_processor_check_duplicates(void)
> {
> /* check the correctness for all processors in ACPI namespace */
> @@ -680,6 +698,9 @@ static void __init acpi_processor_check_duplicates(void)
> NULL, NULL, NULL);
> acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID, acpi_processor_ids_walk,
> NULL, NULL);
> +
> + /* make possible CPU count more realistic */
> + acpi_update_possible_map();
> }

I tested this on a machine which claims to have a gazillion hotpluggable
CPUs:

smpboot: Allowing 152 CPUs, 120 hotplug CPUs
setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:152 nr_node_ids:2
smp: Brought up 2 nodes, 32 CPUs

Now with your patch applied it's still saying:

smpboot: Allowing 152 CPUs, 120 hotplug CPUs
setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:152 nr_node_ids:2
smp: Brought up 2 nodes, 32 CPUs

and the above code runs later on and the result is:

nr_unique_ids 1 nr_cpu_ids 152
Allowing 1 possible CPU

which subsequently causes the machine to die, as we already have 32 CPUs
online.

So nr_unique_ids is not what it should be, and even if it were, the code
runs way too late. It needs to run _before_ setup_percpu() is invoked to
scale everything correctly.
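
To make the failure mode concrete, here is a minimal user-space sketch (not
kernel code). It assumes a simplified model in which every online CPU number
must stay below nr_cpu_ids; the struct and function names are hypothetical
and exist only for this illustration.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 512	/* matches NR_CPUS:512 in the log above */

/* Hypothetical, simplified stand-in for the kernel's CPU bookkeeping. */
struct cpu_model {
	unsigned int nr_cpu_ids;	/* CPU numbers must be < this */
	bool online[MODEL_NR_CPUS];	/* which CPUs are already running */
};

/* Mark CPUs 0..count-1 online, as after "Brought up 2 nodes, 32 CPUs". */
static void bring_up(struct cpu_model *m, unsigned int count)
{
	for (unsigned int cpu = 0; cpu < count && cpu < MODEL_NR_CPUS; cpu++)
		m->online[cpu] = true;
}

/* Count online CPUs whose number is no longer below nr_cpu_ids. */
static unsigned int stranded_cpus(const struct cpu_model *m)
{
	unsigned int stranded = 0;

	for (unsigned int cpu = m->nr_cpu_ids; cpu < MODEL_NR_CPUS; cpu++)
		if (m->online[cpu])
			stranded++;
	return stranded;
}
```

In this model, bringing up 32 CPUs with nr_cpu_ids = 152 strands nothing,
while lowering nr_cpu_ids to 1 afterwards leaves 31 online CPUs outside the
valid range. That is why the limit has to be settled before any sizing or
bring-up depends on it.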

Thanks,

tglx




2018-03-16 02:55:10

by Dou Liyang

Subject: Re: [RFC PATCH v2] ACPI / processor: Fix possible CPUs map

Hi Thomas,

At 03/15/2018 09:45 PM, Thomas Gleixner wrote:
[...]

> I tested this on a machine which claims to have a gazillion hotpluggable
> CPUs:

I really appreciate your testing.

>
> smpboot: Allowing 152 CPUs, 120 hotplug CPUs
> setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:152 nr_node_ids:2
> smp: Brought up 2 nodes, 32 CPUs
>
> Now with your patch applied it's still saying:
>
> smpboot: Allowing 152 CPUs, 120 hotplug CPUs
> setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:152 nr_node_ids:2
> smp: Brought up 2 nodes, 32 CPUs
>
> and the above code runs later on and the result is:
>
> nr_unique_ids 1 nr_cpu_ids 152
> Allowing 1 possible CPU
>
> which subsequently causes the machine to die, as we already have 32 CPUs
> online.
>

That is very interesting; it proves that this strategy is risky.

I've been wondering how to determine the number of possible CPUs. Given
the diversity of ACPI implementations across systems, that seems impossible,
so let me change my approach. Here is a new strategy which I think is more
reasonable and minimally invasive.

As we all know, the possible CPUs are reset in prefill_possible_map().
If CONFIG_HOTPLUG_CPU=y:

    nr_possible_CPUs = num_processors + disabled_cpus

Before prefill_possible_map() runs, *fix the inaccurate disabled_cpus*:

    1) For each disabled CPU, get its processor ID from the ACPI MADT.

    2) Check whether this processor ID exists in the ACPI namespace.
       If it does not, disabled_cpus--.
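
The steps above can be sketched in plain user-space C. This is only a rough
model under stated assumptions, not the eventual kernel code: struct
lapic_entry, id_in_namespace() and count_possible_cpus() are hypothetical
names, and the ACPI namespace is reduced to a flat array of processor IDs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one Local APIC entry parsed from the MADT. */
struct lapic_entry {
	unsigned int processor_id;	/* ACPI processor ID */
	bool enabled;			/* enabled flag of the entry */
};

/* Hypothetical lookup: does the ACPI namespace contain this processor ID? */
static bool id_in_namespace(unsigned int id, const unsigned int *ns_ids,
			    size_t nr_ns_ids)
{
	for (size_t i = 0; i < nr_ns_ids; i++)
		if (ns_ids[i] == id)
			return true;
	return false;
}

/* num_processors + disabled_cpus, after dropping unbacked disabled CPUs. */
static unsigned int count_possible_cpus(const struct lapic_entry *entries,
					size_t nr_entries,
					const unsigned int *ns_ids,
					size_t nr_ns_ids)
{
	unsigned int num_processors = 0, disabled_cpus = 0;

	/* First pass: the MADT-only count. */
	for (size_t i = 0; i < nr_entries; i++) {
		if (entries[i].enabled)
			num_processors++;
		else
			disabled_cpus++;
	}

	/* Second pass: steps 1) and 2), drop disabled CPUs without an object. */
	for (size_t i = 0; i < nr_entries; i++) {
		if (!entries[i].enabled &&
		    !id_in_namespace(entries[i].processor_id, ns_ids, nr_ns_ids))
			disabled_cpus--;
	}

	return num_processors + disabled_cpus;
}
```

For example, with two enabled and three disabled MADT entries, of which only
one disabled ID appears in the namespace, this model yields 2 + 1 = 3
possible CPUs instead of 5.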

I will write and test the code and post it later. IMO, the logic in
prefill_possible_map() is a bit of a mess, so I will try to sort that out
first.

> So nr_unique_ids is not what it should be, and even if it were, the code
> runs way too late. It needs to run _before_ setup_percpu() is invoked to
> scale everything correctly.
>

Yes, got it.

Thanks,
dou

> Thanks,
>
> tglx



2018-03-19 06:30:15

by Fengguang Wu

Subject: [ACPI / processor] c1294b481b: WARNING:at_include/linux/cpumask.h:#invoke_rcu_core

FYI, we noticed the following commit (built with gcc-6):

commit: c1294b481baa89caa30b4c2933c5ce662d52070c ("ACPI / processor: Fix possible CPUs map")
url: https://github.com/0day-ci/linux/commits/Dou-Liyang/ACPI-processor-Fix-possible-CPUs-map/20180318-023857


in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu IvyBridge -smp 4 -m 2G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+------------------------------------------------------------------+------------+------------+
| | 3266b5bd97 | c1294b481b |
+------------------------------------------------------------------+------------+------------+
| boot_successes | 35 | 0 |
| boot_failures | 0 | 40 |
| WARNING:at_include/linux/cpumask.h:#invoke_rcu_core | 0 | 19 |
| WARNING:at_include/linux/cpumask.h:#console_unlock | 0 | 36 |
| RIP:console_unlock | 0 | 36 |
| RIP:invoke_rcu_core | 0 | 20 |
| RIP:native_safe_halt | 0 | 35 |
| WARNING:at_include/linux/cpumask.h:#load_balance | 0 | 35 |
| RIP:load_balance | 0 | 34 |
| WARNING:at_include/linux/cpumask.h:#select_task_rq_fair | 0 | 25 |
| RIP:select_task_rq_fair | 0 | 27 |
| WARNING:at_include/linux/cpumask.h:#try_to_wake_up | 0 | 31 |
| RIP:try_to_wake_up | 0 | 33 |
| WARNING:at_include/linux/cpumask.h:#cpumask_next | 0 | 22 |
| RIP:cpumask_next | 0 | 23 |
| WARNING:at_include/linux/cpumask.h:#do_idle | 0 | 32 |
| RIP:do_idle | 0 | 32 |
| WARNING:at_include/linux/cpumask.h:#rcu_process_callbacks | 0 | 19 |
| RIP:rcu_process_callbacks | 0 | 19 |
| WARNING:at_include/linux/cpumask.h:#native_smp_send_reschedule | 0 | 18 |
| RIP:native_smp_send_reschedule | 0 | 18 |
| WARNING:at_include/linux/cpumask.h:#pick_next_task_fair | 0 | 19 |
| RIP:pick_next_task_fair | 0 | 19 |
| BUG:kernel_hang_in_boot_stage | 0 | 19 |
| RIP:kmem_cache_alloc | 0 | 1 |
| kernel_BUG_at_kernel/smpboot.c | 0 | 10 |
| invalid_opcode:#[##] | 0 | 10 |
| RIP:smpboot_thread_fn | 0 | 12 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 4 |
| RIP:kmem_cache_free | 0 | 2 |
| WARNING:CPU:#PID:#at/kbuild/src | 0 | 4 |
| kernel_BUG_at_ke | 0 | 1 |
| Kernel_panic-not_syncing:Fatal_exc | 0 | 2 |
| invoked_oom-killer:gfp_mask=0x | 0 | 4 |
| Mem-Info | 0 | 4 |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 0 | 4 |
| WARNING:at_include/linux/cpumask.h:#active_load_balance_cpu_stop | 0 | 1 |
| WARNING:at_include/linux/cpumask.h:#set_task_cpu | 0 | 2 |
| RIP:active_load_balance_cpu_stop | 0 | 1 |
| RIP:set_task_cpu | 0 | 2 |
| WARNING:at_include/linux/cpumask.h:#can_migrate_task | 0 | 1 |
| RIP:can_migrate_task | 0 | 1 |
| kernel_BUG_at/kbuild | 0 | 1 |
| WARNING:CPU:#PID:#at_include/l | 0 | 1 |
| WARNING:at_include/linux/cpumask.h:#wake_up_new_task | 0 | 1 |
| RIP:wake_up_new_task | 0 | 1 |
| WARNING:at_include/linux/cpumask.h:#cpupri_set | 0 | 1 |
| RIP:cpupri_set | 0 | 1 |
| WARNING:at_include/linux/cpumask.h:#switched_to_rt | 0 | 1 |
| RIP:switched_to_rt | 0 | 1 |
| RIP:debug_lockdep_rcu_enabled | 0 | 1 |
+------------------------------------------------------------------+------------+------------+



[ 0.190000] WARNING: CPU: 1 PID: 0 at include/linux/cpumask.h:122 invoke_rcu_core+0x41/0x5c
[ 0.190000] WARNING: CPU: 1 PID: 0 at include/linux/cpumask.h:122 console_unlock+0x90/0x789
[ 0.190000] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.16.0-rc4-00340-gc1294b48 #1
[ 0.190000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 0.190000] RIP: 0010:console_unlock+0x90/0x789
[ 0.190000] RSP: 0000:ffff880057e07b18 EFLAGS: 00010046
[ 0.190000] RAX: fffffbfff06dc200 RBX: dffffc0000000000 RCX: 5127adceda934e3a
[ 0.190000] RDX: dffffc0000000000 RSI: ffff880057590ea0 RDI: 0000000000000046
[ 0.190000] RBP: ffff880057e07b88 R08: 0000000000000001 R09: 0000000000000001
[ 0.190000] R10: ffff880057e07a08 R11: 0000000000000001 R12: 0000000000000066
[ 0.190000] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000006
[ 0.190000] FS: 0000000000000000(0000) GS:ffff880057e00000(0000) knlGS:0000000000000000
[ 0.190000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.190000] CR2: 00000000ffffffff CR3: 0000000003015000 CR4: 00000000001406e0
[ 0.190000] Call Trace:
[ 0.190000] <IRQ>
[ 0.190000] ? __down_trylock_console_sem+0x7f/0x8e
[ 0.190000] vprintk_emit+0x41b/0x42e
[ 0.190000] ? invoke_rcu_core+0x41/0x5c
[ 0.190000] printk+0x97/0xbe
[ 0.190000] ? show_regs_print_info+0x5/0x5
[ 0.190000] ? __probe_kernel_read+0xdf/0x16f
[ 0.190000] ? invoke_rcu_core+0x41/0x5c
[ 0.190000] __warn+0x8a/0x160
[ 0.190000] ? invoke_rcu_core+0x41/0x5c
[ 0.190000] report_bug+0x10c/0x156
[ 0.190000] fixup_bug+0x41/0x78
[ 0.190000] do_error_trap+0x104/0x240
[ 0.190000] ? fixup_bug+0x78/0x78
[ 0.190000] ? kvm_clock_read+0x21/0x29
[ 0.190000] ? kvm_sched_clock_read+0x5/0xd
[ 0.190000] ? sched_clock+0x5/0x8
[ 0.190000] ? sched_clock_local+0x36/0xe8
[ 0.190000] ? sched_clock_cpu+0x123/0x13f
[ 0.190000] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 0.190000] invalid_op+0x18/0x40
[ 0.190000] RIP: 0010:invoke_rcu_core+0x41/0x5c
[ 0.190000] RSP: 0000:ffff880057e07ef8 EFLAGS: 00010046
[ 0.190000] RAX: 0000000000000007 RBX: 0000000000000001 RCX: dffffc0000000001
[ 0.190000] RDX: 1ffffffff06dc200 RSI: 1ffff1000afc0fdb RDI: ffffffff836e1164
[ 0.190000] RBP: ffffffff83190e00 R08: 0000000000000001 R09: 0000000000000001
[ 0.190000] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880057fe821a
[ 0.190000] R13: ffff880057fe8220 R14: 00000000ffffb3e6 R15: ffff880057fe8200
[ 0.190000] rcu_check_callbacks+0x10f5/0x1109
[ 0.190000] update_process_times+0x1f/0x38
[ 0.190000] tick_periodic+0x111/0x11c
[ 0.190000] tick_handle_periodic+0x3b/0x96
[ 0.190000] smp_apic_timer_interrupt+0xc6/0xd6
[ 0.190000] apic_timer_interrupt+0xf/0x20
[ 0.190000] </IRQ>
[ 0.190000] RIP: 0010:native_safe_halt+0x2/0x3
[ 0.190000] RSP: 0000:ffff88005759fdf8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff12
[ 0.190000] RAX: 0000000000000007 RBX: 0000000000000000 RCX: dffffc0000000000
[ 0.190000] RDX: 1ffff1000aeb2100 RSI: 0000000000000004 RDI: ffff880057590e6c
[ 0.190000] RBP: ffff880057590300 R08: 0000000000000001 R09: 0000000000000001
[ 0.190000] R10: ffff880057547da8 R11: 0000000000000001 R12: 0000000000000000
[ 0.190000] R13: dffffc0000000000 R14: ffff880057590300 R15: 0000000000000001
[ 0.190000] default_idle+0xa/0xd
[ 0.190000] do_idle+0x10c/0x23c
[ 0.190000] cpu_startup_entry+0xc8/0xca
[ 0.190000] ? play_idle+0x38f/0x38f
[ 0.190000] ? trace_hardirqs_on_caller+0x3e6/0x495
[ 0.190000] start_secondary+0x445/0x48d
[ 0.190000] ? set_cpu_sibling_map+0x10e1/0x10e1
[ 0.190000] secondary_startup_64+0xa5/0xb0
[ 0.190000] Code: 00 00 00 00 41 89 c6 48 8b 45 b0 8a 00 38 45 c6 7c 10 84 c0 74 0c 48 c7 c7 64 11 6e 83 e8 5f 8d 16 00 44 3b 35 77 9c 4d 02 72 02 <0f> 0b 44 89 f0 48 0f a3 05 24 93 4d 02 48 8b 05 bd 9f a5 03 73
[ 0.190000] ---[ end trace 84701069fb3fe0d9 ]---


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
lkp


Attachments:
(No filename) (9.80 kB)
config-4.16.0-rc4-00340-gc1294b48 (97.09 kB)
job-script (4.28 kB)
dmesg.xz (12.12 kB)