2018-03-14 10:30:27

by Dou Liyang

Subject: [RFC PATCH] ACPI / processor: Get accurate possible CPU count

Rafael J told me that, for ACPI-based physical CPU hotplug to work, the
ACPI namespace must contain objects corresponding to all of the processors
in question. If those objects are not present, there is no way to signal
insertion or to eject the processors safely.

However, the kernel calculates the possible CPU count from the number of
Local APIC entries in the ACPI MADT. It does not take the ACPI namespace
into account, so it reports unrealistically high numbers.

Walk the namespace tree depth-first, check and collect the valid CPUs,
and update the possible map accordingly.

Signed-off-by: Dou Liyang <[email protected]>
---
drivers/acpi/acpi_processor.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 449d86d39965..ca4fa95e0515 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -671,6 +671,23 @@ static acpi_status __init acpi_processor_ids_walk(acpi_handle handle,

}

+static void __init acpi_update_possible_map(void)
+{
+	unsigned int cpu, nr = 0;
+
+	if (nr_cpu_ids <= nr_unique_ids)
+		return;
+
+	for_each_possible_cpu(cpu) {
+		if (nr >= nr_unique_ids)
+			set_cpu_possible(cpu, false);
+		nr++;
+	}
+
+	nr_cpu_ids = nr_unique_ids;
+	pr_info("Allowing %d possible CPUs\n", nr_cpu_ids);
+}
+
static void __init acpi_processor_check_duplicates(void)
{
/* check the correctness for all processors in ACPI namespace */
@@ -680,6 +697,9 @@ static void __init acpi_processor_check_duplicates(void)
NULL, NULL, NULL);
acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID, acpi_processor_ids_walk,
NULL, NULL);
+
+ /* make possible CPU count more realistic */
+ acpi_update_possible_map();
}

bool acpi_duplicate_processor_id(int proc_id)
--
2.14.3
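
For readers who want to try the trimming loop from the patch above outside
the kernel, here is a minimal userspace sketch. It assumes a single 64-bit
word in place of cpu_possible_mask; struct cpu_model and
trim_possible_map() are hypothetical names used only for illustration and
are not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical stand-in for kernel state; not the kernel's real types. */
struct cpu_model {
	uint64_t possible;        /* bit n set => CPU n is "possible" */
	unsigned int nr_cpu_ids;  /* upper bound on CPU numbers, as in the patch */
};

/* Keep the first nr_unique_ids possible CPUs and clear all later ones. */
static void trim_possible_map(struct cpu_model *m, unsigned int nr_unique_ids)
{
	unsigned int cpu, nr = 0;

	/* Nothing to trim when the namespace lists at least as many CPUs. */
	if (m->nr_cpu_ids <= nr_unique_ids)
		return;

	for (cpu = 0; cpu < 64; cpu++) {
		if (!(m->possible & (UINT64_C(1) << cpu)))
			continue;               /* not a possible CPU, skip */
		if (nr >= nr_unique_ids)
			m->possible &= ~(UINT64_C(1) << cpu);
		nr++;
	}

	m->nr_cpu_ids = nr_unique_ids;
}
```

With possible CPUs 0-7 and two unique IDs, the model keeps only CPUs 0 and
1. Like the patch, it assigns the count directly to nr_cpu_ids, which is
only meaningful if the surviving CPUs are numbered contiguously from 0.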





2018-03-14 17:25:23

by Andy Shevchenko

Subject: Re: [RFC PATCH] ACPI / processor: Get accurate possible CPU count

On Wed, Mar 14, 2018 at 12:28 PM, Dou Liyang <[email protected]> wrote:

> +static void __init acpi_update_possible_map(void)
> +{
> + unsigned int cpu, nr = 0;
> +

> + if (nr_cpu_ids <= nr_unique_ids)
> + return;
> +
> + for_each_possible_cpu(cpu) {
> + if (nr >= nr_unique_ids)
> + set_cpu_possible(cpu, false);
> + nr++;
> + }

IIUC this can be optimized to:

if (nr_unique_ids >= nr_cpu_ids)
return;

/* Haven't yet figured out whether this check is superfluous */
if (nr_unique_ids >= cpumask_last(cpu_possible_mask))
return;

for_each_cpu_wrap(cpu, cpu_possible_mask, nr_unique_ids)
set_cpu_possible(cpu, false);

> + nr_cpu_ids = nr_unique_ids;
> + pr_info("Allowing %d possible CPUs\n", nr_cpu_ids);
> +}

--
With Best Regards,
Andy Shevchenko
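
One point worth checking in the suggestion above is the iteration order.
The sketch below is a userspace model, assuming that for_each_cpu_wrap()
visits every CPU set in the mask, beginning at the start index and wrapping
around past the end, as described in include/linux/cpumask.h. Under that
assumption a wrapping walk that clears each visited CPU empties the whole
mask, while a walk that stops at the end clears only the tail. The names
clear_with_wrap() and clear_from() are hypothetical helpers, not kernel
functions.

```c
#include <stdint.h>

#define MODEL_NR_CPUS 64u

/* Model of a wrapping walk: visit every set bit, beginning at 'start'. */
static uint64_t clear_with_wrap(uint64_t mask, unsigned int start)
{
	uint64_t result = mask;
	unsigned int i;

	for (i = 0; i < MODEL_NR_CPUS; i++) {
		unsigned int cpu = (start + i) % MODEL_NR_CPUS;

		if (mask & (UINT64_C(1) << cpu))
			result &= ~(UINT64_C(1) << cpu);
	}
	return result;
}

/* Model of a non-wrapping walk: clear only CPUs numbered >= 'start'. */
static uint64_t clear_from(uint64_t mask, unsigned int start)
{
	unsigned int cpu;

	for (cpu = start; cpu < MODEL_NR_CPUS; cpu++)
		mask &= ~(UINT64_C(1) << cpu);
	return mask;
}
```

For a mask of CPUs 0-7 and a start index of 2, the wrapping model leaves no
CPU set, whereas the non-wrapping model leaves CPUs 0 and 1. The thread does
not say which form of the loop ended up in the commit that was later tested.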

2018-03-15 03:02:13

by Dou Liyang

Subject: Re: [RFC PATCH] ACPI / processor: Get accurate possible CPU count

Hi Andy,

At 03/15/2018 01:24 AM, Andy Shevchenko wrote:
> On Wed, Mar 14, 2018 at 12:28 PM, Dou Liyang <[email protected]> wrote:
>
>> +static void __init acpi_update_possible_map(void)
>> +{
>> + unsigned int cpu, nr = 0;
>> +
>
>> + if (nr_cpu_ids <= nr_unique_ids)
>> + return;
>> +
>> + for_each_possible_cpu(cpu) {
>> + if (nr >= nr_unique_ids)
>> + set_cpu_possible(cpu, false);
>> + nr++;
>> + }
>
> IIUC this can be optimized to:
>

Yes, I agree. It's smarter and clearer. I will use it.

Thanks,
dou

> if (nr_unique_ids >= nr_cpu_ids)
> return;
>
> /* Haven't yet figured out whether this check is superfluous */
> if (nr_unique_ids >= cpumask_last(cpu_possible_mask))
> return;
>
> for_each_cpu_wrap(cpu, cpu_possible_mask, nr_unique_ids)
> set_cpu_possible(cpu, false);
>



>> + nr_cpu_ids = nr_unique_ids;
>> + pr_info("Allowing %d possible CPUs\n", nr_cpu_ids);
>> +}
>



2018-03-16 11:19:31

by Fengguang Wu

Subject: [ACPI / processor] d619c81e24: WARNING:at_include/linux/cpumask.h:#cpumask_test_cpu

FYI, we noticed the following commit (built with gcc-7):

commit: d619c81e246424e322f7a902bed6e60b90668d56 ("ACPI / processor: Get accurate possible CPU count")
url: https://github.com/0day-ci/linux/commits/Dou-Liyang/ACPI-processor-Get-accurate-possible-CPU-count/20180316-140349


in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 1G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


+------------------------------------------------------+------------+------------+
| | 3266b5bd97 | d619c81e24 |
+------------------------------------------------------+------------+------------+
| boot_successes | 16 | 2 |
| boot_failures | 0 | 18 |
| WARNING:at_include/linux/cpumask.h:#cpumask_test_cpu | 0 | 18 |
| WARNING:at_include/linux/cpumask.h:#cpumask_check | 0 | 18 |
| RIP:cpumask_check | 0 | 18 |
| RIP:cpumask_test_cpu | 0 | 18 |
| general_protection_fault:#[##] | 0 | 13 |
| RIP:__lock_acquire | 0 | 13 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 13 |
| BUG:kernel_hang_in_boot_stage | 0 | 5 |
+------------------------------------------------------+------------+------------+



[ 0.830741] WARNING: CPU: 1 PID: 1 at include/linux/cpumask.h:122 cpumask_test_cpu+0x32/0x57
[ 0.830785] WARNING: CPU: 1 PID: 1 at include/linux/cpumask.h:122 cpumask_check+0x2d/0x48
[ 0.830855] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 4.16.0-rc4-00340-gd619c81 #2
[ 0.830903] RIP: 0010:cpumask_check+0x2d/0x48
[ 0.830949] RSP: 0000:ffff880035803c70 EFLAGS: 00010046
[ 0.831049] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[ 0.831095] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff82caee68
[ 0.831141] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
[ 0.831187] R10: ffff880035803c38 R11: 0000000000000088 R12: 0000000000000001
[ 0.831232] R13: ffffffff82dbd4b0 R14: 0000000000000000 R15: 0000000000000006
[ 0.831279] FS: 0000000000000000(0000) GS:ffff880035800000(0000) knlGS:0000000000000000
[ 0.831328] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.831374] CR2: 00000000ffffffff CR3: 0000000002a6a000 CR4: 00000000000006e0
[ 0.831421] Call Trace:
[ 0.831466] <IRQ>
[ 0.831514] ? console_unlock+0x46/0x5d2
[ 0.831563] ? vprintk_emit+0x382/0x3bd
[ 0.831611] ? cpumask_test_cpu+0x32/0x57
[ 0.831661] ? printk+0x3e/0x46
[ 0.831717] ? __check_object_size+0x170/0x1a8
[ 0.831767] ? cpumask_test_cpu+0x32/0x57
[ 0.831816] ? __warn+0x62/0xe6
[ 0.831865] ? cpumask_test_cpu+0x32/0x57
[ 0.831914] ? report_bug+0x74/0xbd
[ 0.831965] ? fixup_bug+0x1b/0x31
[ 0.832000] ? do_error_trap+0x8d/0x110
[ 0.832000] ? __run_timers+0x180/0x190
[ 0.832000] ? kvm_clock_read+0x44/0x54
[ 0.832000] ? kvm_sched_clock_read+0x5/0xd
[ 0.832000] ? paravirt_sched_clock+0x5/0x8
[ 0.832000] ? sched_clock_local+0x2c/0x9c
[ 0.832000] ? trace_hardirqs_off_caller+0xa7/0xb7
[ 0.832000] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 0.832000] ? native_iret+0x7/0x7
[ 0.832000] ? invalid_op+0x18/0x40
[ 0.832000] ? cpumask_test_cpu+0x32/0x57
[ 0.832000] ? trace_rcu_dyntick+0xd6/0x138
[ 0.832000] ? rcu_nmi_enter+0x85/0x92
[ 0.832000] ? irq_enter+0x6/0x62
[ 0.832000] ? smp_irq_work_interrupt+0x6/0x244
[ 0.832000] ? irq_work_interrupt+0xf/0x14
[ 0.832000] </IRQ>
[ 0.832000] ? arch_local_irq_restore+0x2/0x8
[ 0.832000] ? vprintk_emit+0x23a/0x3bd
[ 0.832000] ? do_early_param+0x88/0x88
[ 0.832000] ? printk+0x3e/0x46
[ 0.832000] ? acpi_processor_init+0xa0/0xc3
[ 0.832000] ? acpi_scan_init+0x15/0x200
[ 0.832000] ? acpi_sleep_init+0xf2/0xf2
[ 0.832000] ? acpi_init+0x2af/0x2f5
[ 0.832000] ? acpi_sleep_init+0xf2/0xf2
[ 0.832000] ? do_one_initcall+0x93/0x173
[ 0.832000] ? do_early_param+0x88/0x88
[ 0.832000] ? kernel_init_freeable+0x119/0x1a1
[ 0.832000] ? rest_init+0xba/0xba
[ 0.832000] ? kernel_init+0x5/0xe1
[ 0.832000] ? ret_from_fork+0x24/0x30
[ 0.832000] Code: 44 8b 25 13 bb ca 01 55 31 ed 53 89 fb 41 39 fc 48 c7 c7 68 ee ca 82 40 0f 96 c5 31 c9 31 d2 89 ee e8 4f 77 05 00 41 39 dc 77 02 <0f> 0b 89 ee 31 c9 31 d2 48 c7 c7 38 ee ca 82 e8 36 77 05 00 89
[ 0.832000] ---[ end trace 2cf918b312b54122 ]---


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
lkp


Attachments:
(No filename) (4.97 kB)
config-4.16.0-rc4-00340-gd619c81 (114.28 kB)
dmesg.xz (12.02 kB)

2018-03-19 00:43:15

by Dou Liyang

Subject: Re: [ACPI / processor] d619c81e24: WARNING:at_include/linux/cpumask.h:#cpumask_test_cpu

Hi lkp team,

Thank you for testing.

At 03/16/2018 07:17 PM, kernel test robot wrote:
> FYI, we noticed the following commit (built with gcc-7):
>
> commit: d619c81e246424e322f7a902bed6e60b90668d56 ("ACPI / processor: Get accurate possible CPU count")
> url: https://github.com/0day-ci/linux/commits/Dou-Liyang/ACPI-processor-Get-accurate-possible-CPU-count/20180316-140349
>
>
> in testcase: boot
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 1G
>
> caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
>
>
> +------------------------------------------------------+------------+------------+
> | | 3266b5bd97 | d619c81e24 |
> +------------------------------------------------------+------------+------------+
> | boot_successes | 16 | 2 |
> | boot_failures | 0 | 18 |
> | WARNING:at_include/linux/cpumask.h:#cpumask_test_cpu | 0 | 18 |
> | WARNING:at_include/linux/cpumask.h:#cpumask_check | 0 | 18 |
> | RIP:cpumask_check | 0 | 18 |
> | RIP:cpumask_test_cpu | 0 | 18 |
> | general_protection_fault:#[##] | 0 | 13 |
> | RIP:__lock_acquire | 0 | 13 |
> | Kernel_panic-not_syncing:Fatal_exception | 0 | 13 |
> | BUG:kernel_hang_in_boot_stage | 0 | 5 |
> +------------------------------------------------------+------------+------------+
>
>
>
> [ 0.830741] WARNING: CPU: 1 PID: 1 at include/linux/cpumask.h:122 cpumask_test_cpu+0x32/0x57
> [ 0.830785] WARNING: CPU: 1 PID: 1 at include/linux/cpumask.h:122 cpumask_check+0x2d/0x48

Yes, this patch breaks the CPU count. We will look for a different
approach, so please drop the testing of this patch.

Thanks,
dou
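
The warning in the report comes from the range check in cpumask_check(),
which in kernels of this era warns (under CONFIG_DEBUG_PER_CPU_MAPS) when a
CPU index is not below nr_cpumask_bits. The sketch below is a userspace
model of that comparison only. It assumes nr_cpumask_bits tracks nr_cpu_ids,
which holds with CONFIG_CPUMASK_OFFSTACK; the report does not state the test
kernel's setting. cpumask_check_would_warn() is a hypothetical name, and the
limit values are illustrative rather than taken from the test machine.

```c
#include <stdbool.h>

/*
 * Model of the debug range check: returns true when the kernel would warn,
 * that is, when the CPU index is outside the current limit.
 */
static bool cpumask_check_would_warn(unsigned int cpu, unsigned int nr_cpu_ids)
{
	return cpu >= nr_cpu_ids;
}
```

On a two-CPU guest with the limit left at 2, a test of CPU 1 passes quietly.
If the limit were lowered to 1 while CPU 1 is still running, the same test
would warn, which is consistent with the "CPU: 1" warnings in the log above.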