Subject: Re: aarch64 ACPI boot regressed by commit 7ba5f605f3a0 ("arm64/numa: remove the limitation that cpu0 must bind to node0")
From: Laszlo Ersek
To: Andrew Jones, Zhen Lei, Will Deacon, Lorenzo Pieralisi, Hanjun Guo
Cc: main kernel list, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, Shannon Zhao, Wei Huang
Date: Fri, 14 Oct 2016 17:01:40 +0200
References: <4a64cd93-5ead-aad6-1057-f42224d65b43@redhat.com> <20161014080524.4hm2b4p373r7rhel@hawk.localdomain> <04f22a79-301b-f05b-033d-c7a24c9f4084@redhat.com>
In-Reply-To: <04f22a79-301b-f05b-033d-c7a24c9f4084@redhat.com>

On 10/14/16 15:18, Laszlo Ersek wrote:
> On 10/14/16 10:05, Andrew Jones wrote:
>> On Fri, Oct 14, 2016 at 12:50:29AM +0200, Laszlo Ersek wrote:
>>> (4) Analysis (well, a lame attempt at that, because I have zero
>>> familiarity with this code). Let me quote the patch:
>>>
>>>> commit 7ba5f605f3a0d9495aad539eeb8346d726dfc183
>>>> Author: Zhen Lei
>>>> Date:   Thu Sep 1 14:55:04 2016 +0800
>>>>
>>>>     arm64/numa: remove the limitation that cpu0 must bind to node0
>>>>
>>>>     1. Remove the old binding code.
>>>>     2. Read the nid of cpu0 from dts.
>>>>     3. Fallback the nid of cpu0 to 0 when numa=off is set in bootargs.
>>>>
>>>>     Signed-off-by: Zhen Lei
>>>>     Signed-off-by: Will Deacon
>>>>
>>>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>>>> index c3c08368a685..8b048e6ec34a 100644
>>>> --- a/arch/arm64/kernel/smp.c
>>>> +++ b/arch/arm64/kernel/smp.c
>>>> @@ -624,6 +624,7 @@ static void __init of_parse_and_init_cpus(void)
>>>>          }
>>>>
>>>>          bootcpu_valid = true;
>>>> +        early_map_cpu_to_node(0, of_node_to_nid(dn));
>>>>
>>>>          /*
>>>>           * cpu_logical_map has already been
>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>>> index 0a15f010b64a..778a985c8a70 100644
>>>> --- a/arch/arm64/mm/numa.c
>>>> +++ b/arch/arm64/mm/numa.c
>>>> @@ -116,16 +116,24 @@ static void __init setup_node_to_cpumask_map(void)
>>>>   */
>>>>  void numa_store_cpu_info(unsigned int cpu)
>>>>  {
>>>> -        map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
>>>> +        map_cpu_to_node(cpu, cpu_to_node_map[cpu]);
>>>>  }
>>>>
>>>>  void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>>>>  {
>>>>          /* fallback to node 0 */
>>>> -        if (nid < 0 || nid >= MAX_NUMNODES)
>>>> +        if (nid < 0 || nid >= MAX_NUMNODES || numa_off)
>>>>                  nid = 0;
>>
>> The ACPI equivalent code must be missing (at least) the above,
>> because, even with DT, mach-virt won't have cpu to node mappings
>> unless numa is configured on the command line. Can you try adding
>> something like
>>
>>   -m 512 -smp 4 \
>>   -numa node,mem=256M,cpus=0-1,nodeid=0 \
>>   -numa node,mem=256M,cpus=2-3,nodeid=1
>>
>> to your QEMU command line?
>
> I added the following to my domain XML, under :
>
>
>
>
>
>
> (See .)
>
> With that, each NUMA node gets half of the VCPUs and half of the guest
> RAM.
>
> (This is in a different guest now, one that has a bleeding edge Fedora
> kernel -- I didn't want to rebuild the upstream kernel yet again, just
> for this test. So, "4.9.0-0.rc0.git7.1.fc26.aarch64" is based on
> upstream v4.8-14109-g1573d2c, and it reproduces the problem too.)
>
>> Then when you boot with ACPI you'll get a
>> SRAT.
>
> Yes, that's confirmed by the guest kernel log (see below).
>
>> If that works, then we're just missing the "no SRAT, nid = 0"
>> code (that should have been added with this patch)
>
> It still crashes with the SRAT, with the following log:
>
>> EFI stub: Booting Linux Kernel...
>> ConvertPages: Incompatible memory types
>> EFI stub: Using DTB from configuration table
>> EFI stub: Exiting boot services and installing virtual address map...
>> [ 0.000000] Booting Linux on physical CPU 0x0
>> [ 0.000000] Linux version 4.9.0-0.rc0.git7.1.fc26.aarch64 (mockbuild@buildvm-aarch64-01.arm.fedoraproject.org) (gcc version 6.2.1 20160916 (Red Hat 6.2.1-2) (GCC) ) #1 SMP Wed Oct 12 17:44:54 UTC 2016
>> [ 0.000000] Boot CPU: AArch64 Processor [500f0000]
>> [ 0.000000] efi: Getting EFI parameters from FDT:
>> [ 0.000000] efi: EFI v2.60 by EDK II
>> [ 0.000000] efi: SMBIOS 3.0=0xbbdb0000 ACPI 2.0=0xb86d0000 MEMATTR=0xb936b018
>> [ 0.000000] cma: Reserved 512 MiB at 0x00000000e0000000
>> [ 0.000000] ACPI: Early table checksum verification disabled
>> [ 0.000000] ACPI: RSDP 0x00000000B86D0000 000024 (v02 BOCHS )
>> [ 0.000000] ACPI: XSDT 0x00000000B86C0000 000054 (v01 BOCHS BXPCFACP 00000001 01000013)
>> [ 0.000000] ACPI: FACP 0x00000000B83E0000 00010C (v05 BOCHS BXPCFACP 00000001 BXPC 00000001)
>> [ 0.000000] ACPI: DSDT 0x00000000B83F0000 0010E5 (v02 BOCHS BXPCDSDT 00000001 BXPC 00000001)
>> [ 0.000000] ACPI: APIC 0x00000000B83D0000 00018C (v03 BOCHS BXPCAPIC 00000001 BXPC 00000001)
>> [ 0.000000] ACPI: GTDT 0x00000000B83C0000 000060 (v02 BOCHS BXPCGTDT 00000001 BXPC 00000001)
>> [ 0.000000] ACPI: MCFG 0x00000000B83B0000 00003C (v01 BOCHS BXPCMCFG 00000001 BXPC 00000001)
>> [ 0.000000] ACPI: SPCR 0x00000000B83A0000 000050 (v02 BOCHS BXPCSPCR 00000001 BXPC 00000001)
>> [ 0.000000] ACPI: SRAT 0x00000000B8390000 0000C8 (v03 BOCHS BXPCSRAT 00000001 BXPC 00000001)
>> [ 0.000000] ACPI: SPCR: console: pl011,mmio,0x9000000,9600
>> [ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '9600')
>> [ 0.000000] bootconsole [pl11] enabled
>> [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x0 -> Node 0
>> [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> MPIDR 0x1 -> Node 0
>> [ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x2 -> Node 1
>> [ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> MPIDR 0x3 -> Node 1
>> [ 0.000000] NUMA: Adding memblock [0x40000000 - 0xbfffffff] on node 0
>> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x40000000-0xbfffffff]
>> [ 0.000000] NUMA: Adding memblock [0xc0000000 - 0x13fffffff] on node 1
>> [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0xc0000000-0x13fffffff]
>> [ 0.000000] NUMA: Initmem setup node 0 [mem 0x40000000-0xbfffffff]
>> [ 0.000000] NUMA: NODE_DATA [mem 0xbfff2580-0xbfffffff]
>> [ 0.000000] NUMA: Initmem setup node 1 [mem 0xc0000000-0x13fffffff]
>> [ 0.000000] NUMA: NODE_DATA [mem 0x13fff2580-0x13fffffff]
>> [ 0.000000] Zone ranges:
>> [ 0.000000] DMA [mem 0x0000000040000000-0x00000000ffffffff]
>> [ 0.000000] Normal [mem 0x0000000100000000-0x000000013fffffff]
>> [ 0.000000] Movable zone start for each node
>> [ 0.000000] Early memory node ranges
>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000b838ffff]
>> [ 0.000000] node 0: [mem 0x00000000b8390000-0x00000000b83fffff]
>> [ 0.000000] node 0: [mem 0x00000000b8400000-0x00000000b841ffff]
>> [ 0.000000] node 0: [mem 0x00000000b8420000-0x00000000b874ffff]
>> [ 0.000000] node 0: [mem 0x00000000b8750000-0x00000000bbc1ffff]
>> [ 0.000000] node 0: [mem 0x00000000bbc20000-0x00000000bbffffff]
>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bfffffff]
>> [ 0.000000] node 1: [mem 0x00000000c0000000-0x000000013fffffff]
>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff]
>> [ 0.000000] Initmem setup node 1 [mem 0x00000000c0000000-0x000000013fffffff]
>> [ 0.000000] psci: probing for conduit method from ACPI.
>> [ 0.000000] psci: PSCIv0.2 detected in firmware.
>> [ 0.000000] psci: Using standard PSCI v0.2 function IDs >> [ 0.000000] psci: Trusted OS migration not required >> [ 0.000000] percpu: Embedded 3 pages/cpu @fffffe007fda0000 s117832 r8192 d70584 u196608 >> [ 0.000000] Detected PIPT I-cache on CPU0 >> [ 0.000000] Built 2 zonelists in Node order, mobility grouping on. Total pages: 65472 >> [ 0.000000] Policy zone: Normal >> [ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.9.0-0.rc0.git7.1.fc26.aarch64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap LANG=en_US.UTF-8 earlycon acpi=force >> [ 0.000000] PID hash table entries: 4096 (order: -1, 32768 bytes) >> [ 0.000000] software IO TLB [mem 0xdbff0000-0xdfff0000] (64MB) mapped at [fffffe009bff0000-fffffe009ffeffff] >> [ 0.000000] Memory: 3542976K/4194304K available (9148K kernel code, 1612K rwdata, 3776K rodata, 1600K init, 15899K bss, 127040K reserved, 524288K cma-reserved) >> [ 0.000000] Virtual kernel memory layout: >> [ 0.000000] modules : 0xfffffc0000000000 - 0xfffffc0008000000 ( 128 MB) >> vmalloc : 0xfffffc0008000000 - 0xfffffdff5fff0000 ( 2045 GB) >> .text : 0xfffffc0008080000 - 0xfffffc0008970000 ( 9152 KB) >> .rodata : 0xfffffc0008970000 - 0xfffffc0008d30000 ( 3840 KB) >> .init : 0xfffffc0008d30000 - 0xfffffc0008ec0000 ( 1600 KB) >> .data : 0xfffffc0008ec0000 - 0xfffffc0009053200 ( 1613 KB) >> .bss : 0xfffffc0009053200 - 0xfffffc0009fda058 ( 15900 KB) >> fixed : 0xfffffdff7e7d0000 - 0xfffffdff7ec00000 ( 4288 KB) >> PCI I/O : 0xfffffdff7ee00000 - 0xfffffdff7fe00000 ( 16 MB) >> vmemmap : 0xfffffdff80000000 - 0xfffffe0000000000 ( 2 GB maximum) >> 0xfffffdff80000000 - 0xfffffdff80400000 ( 4 MB actual) >> memory : 0xfffffe0000000000 - 0xfffffe0100000000 ( 4096 MB) >> [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=2 >> [ 0.000000] Running RCU self tests >> [ 0.000000] Hierarchical RCU implementation. >> [ 0.000000] RCU lockdep checking is enabled. >> [ 0.000000] Build-time adjustment of leaf fanout to 64. 
>> [ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4. >> [ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=4 >> [ 0.000000] kmemleak: Kernel memory leak detector disabled >> [ 0.000000] NR_IRQS:64 nr_irqs:64 0 >> [ 0.000000] GICv2m: ACPI overriding V2M MSI_TYPER (base:80, num:64) >> [ 0.000000] GICv2m: range[mem 0x08020000-0x08020fff], SPI[80:143] >> [ 0.000000] GIC: PPI11 is secure or misconfigured >> [ 0.000000] arm_arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low >> [ 0.000000] arm_arch_timer: WARNING: Please fix your firmware >> [ 0.000000] arm_arch_timer: Architected cp15 timer(s) running at 50.00MHz (virt). >> [ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns >> [ 0.000003] sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns >> [ 0.002198] Console: colour dummy device 80x25 >> [ 0.003319] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar >> [ 0.005236] ... MAX_LOCKDEP_SUBCLASSES: 8 >> [ 0.006183] ... MAX_LOCK_DEPTH: 48 >> [ 0.007273] ... MAX_LOCKDEP_KEYS: 8191 >> [ 0.008287] ... CLASSHASH_SIZE: 4096 >> [ 0.009296] ... MAX_LOCKDEP_ENTRIES: 32768 >> [ 0.010327] ... MAX_LOCKDEP_CHAINS: 65536 >> [ 0.011318] ... CHAINHASH_SIZE: 32768 >> [ 0.012453] memory used by lock dependency info: 8159 kB >> [ 0.013736] per task-struct memory footprint: 1920 bytes >> [ 0.015742] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl >> [ 0.018710] Calibrating delay loop (skipped), value calculated using timer frequency.. 100.00 BogoMIPS (lpj=50000) >> [ 0.021221] pid_max: default: 32768 minimum: 301 >> [ 0.022806] ACPI: Core revision 20160831 >> [ 0.027885] ACPI: 1 ACPI AML tables successfully acquired and loaded >> >> [ 0.030252] Security Framework initialized >> [ 0.031355] Yama: becoming mindful. >> [ 0.032176] SELinux: Initializing. 
>> [ 0.033925] Dentry cache hash table entries: 524288 (order: 6, 4194304 bytes) >> [ 0.037039] Inode-cache hash table entries: 262144 (order: 5, 2097152 bytes) >> [ 0.039383] Mount-cache hash table entries: 8192 (order: 0, 65536 bytes) >> [ 0.041135] Mountpoint-cache hash table entries: 8192 (order: 0, 65536 bytes) >> [ 0.044725] ftrace: allocating 29596 entries in 8 pages >> [ 0.080467] ASID allocator initialised with 65536 entries >> [ 0.082070] ------------[ cut here ]------------ >> [ 0.083227] WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:5458 wq_numa_init+0x178/0x21c >> [ 0.085304] Modules linked in: >> [ 0.086102] >> [ 0.086499] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-0.rc0.git7.1.fc26.aarch64 #1 >> [ 0.088611] Hardware name: linux,dummy-virt (DT) >> [ 0.089816] task: fffffe00700aac00 task.stack: fffffe00f8044000 >> [ 0.091375] PC is at wq_numa_init+0x178/0x21c >> [ 0.092514] LR is at wq_numa_init+0x14c/0x21c >> [ 0.093654] pc : [] lr : [] pstate: 60000045 >> [ 0.095589] sp : fffffe00f8047cb0 >> [ 0.096457] x29: fffffe00f8047cb0 [ 0.097311] x28: 0000000000000000 >> [ 0.098201] >> [ 0.098601] x27: 0000000000000000 [ 0.099450] x26: fffffc0008ef4a28 >> [ 0.100342] >> [ 0.100730] x25: fffffc0008ef3000 [ 0.101576] x24: fffffc0008ef3574 >> [ 0.102466] >> [ 0.102853] x23: 0000000000000000 [ 0.103700] x22: fffffe007937de00 >> [ 0.104593] >> [ 0.104982] x21: fffffc0008e887f8 [ 0.105829] x20: fffffc0009091000 >> [ 0.106723] >> [ 0.107111] x19: 0000000000000000 [ 0.107956] x18: 0000000050642c6a >> [ 0.108847] >> [ 0.109234] x17: 0000000000000000 [ 0.110078] x16: 0000000000000000 >> [ 0.110968] >> [ 0.111363] x15: 00000000fcacdc89 [ 0.112199] x14: 0000000000000000 >> [ 0.113087] >> [ 0.113481] x13: 0000000000000000 [ 0.114324] x12: 00000000fe2ce6e0 >> [ 0.115204] >> [ 0.115597] x11: 0000000000000001 [ 0.116439] x10: 0000000000000048 >> [ 0.117328] >> [ 0.117716] x9 : 0000000000000000 [ 0.118563] x8 : fffffe00f4010080 >> [ 0.119453] >> [ 0.119833] x7 : 
0000000000000000 [ 0.120678] x6 : 0000000000000000 >> [ 0.121571] >> [ 0.121959] x5 : 000000000000000f [ 0.122804] x4 : 0000000000000000 >> [ 0.123695] >> [ 0.124084] x3 : 0000000000000000 [ 0.124922] x2 : 0000000000000000 >> [ 0.125815] >> [ 0.126204] x1 : 0000000000000004 [ 0.127055] x0 : 00000000ffffffff >> [ 0.127966] >> [ 0.128361] >> [ 0.128767] ---[ end trace 0000000000000000 ]--- >> [ 0.129983] Call trace: >> [ 0.130629] Exception stack(0xfffffe00f8047ad0 to 0xfffffe00f8047c00) >> [ 0.132316] 7ac0: 0000000000000000 0000040000000000 >> [ 0.134360] 7ae0: fffffe00f8047cb0 fffffc0008d3f434 0000000060000045 000000000000003d >> [ 0.136405] 7b00: fffffc0008ef4000 fffffe007937df00 0000000000000000 0000000000000000 >> [ 0.138446] 7b20: fffffc0008bf4110 0000000000000189 0000000000000018 0000000000000028 >> [ 0.140498] 7b40: fffffe00f8047b80 0000000000000000 fffffe0000000000 fffffc000848af30 >> [ 0.142541] 7b60: fffffe00f8047ba0 fffffc0008134d24 fffffe00f8044000 0000000000000040 >> [ 0.144558] 7b80: 00000000ffffffff 0000000000000004 0000000000000000 0000000000000000 >> [ 0.146607] 7ba0: 0000000000000000 000000000000000f 0000000000000000 0000000000000000 >> [ 0.148664] 7bc0: fffffe00f4010080 0000000000000000 0000000000000048 0000000000000001 >> [ 0.150704] 7be0: 00000000fe2ce6e0 0000000000000000 0000000000000000 00000000fcacdc89 >> [ 0.152752] [] wq_numa_init+0x178/0x21c >> [ 0.154160] [] init_workqueues+0xa0/0x4b8 >> [ 0.155596] [] do_one_initcall+0x44/0x138 >> [ 0.157059] [] kernel_init_freeable+0x178/0x2dc >> [ 0.158670] [] kernel_init+0x18/0x110 >> [ 0.160036] [] ret_from_fork+0x10/0x20 >> [ 0.161440] workqueue: NUMA node mapping not available for cpu0, disabling NUMA support >> [ 0.165296] Remapping and enabling EFI services. 
>> [ 0.166586] Unable to handle kernel paging request at virtual address b91000006be8 >> [ 0.168448] pgd = fffffc000a010000 >> [ 0.169341] [b91000006be8] *pgd=0000000000000000[ 0.170505] , *pud=0000000000000000 >> , *pmd=0000000000000000[ 0.171942] >> [ 0.172332] Internal error: Oops: 96000004 [#1] SMP >> [ 0.173600] Modules linked in: >> [ 0.174407] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 4.9.0-0.rc0.git7.1.fc26.aarch64 #1 >> [ 0.176836] Hardware name: linux,dummy-virt (DT) >> [ 0.178038] task: fffffe00700aac00 task.stack: fffffe00f8044000 >> [ 0.179579] PC is at __ll_sc_atomic_add+0x20/0x40 >> [ 0.180800] LR is at __lock_acquire+0xe8/0x698 >> [ 0.181961] pc : [] lr : [] pstate: 800000c5 >> [ 0.183895] sp : fffffe00f8047820 >> [ 0.184755] x29: fffffe00f8047820 [ 0.185588] x28: fffffc0008ef3000 >> [ 0.186479] >> [ 0.186868] x27: fffffc0008ef2358 [ 0.187713] x26: fffffc0009ce6000 >> [ 0.188606] >> [ 0.188997] x25: 0000000000000001 [ 0.189857] x24: 0000000000000000 >> [ 0.190731] >> [ 0.191115] x23: fffffe00700aac00 [ 0.191951] x22: 0000000000000000 >> [ 0.192843] >> [ 0.193231] x21: fffffe007fd9a018 [ 0.194074] x20: 0000000000000000 >> [ 0.194966] >> [ 0.195361] x19: fffffe007fd9a018 [ 0.196192] x18: 0000000000000010 >> [ 0.197077] >> [ 0.197476] x17: 0000000057181979 [ 0.198325] x16: 0000000000000000 >> [ 0.199209] >> [ 0.199604] x15: 0000000000000000 [ 0.200450] x14: 0000000000000000 >> [ 0.201337] >> [ 0.201723] x13: 0000000000000001 [ 0.202555] x12: fffffe007fff2580 >> [ 0.203432] >> [ 0.203819] x11: 0000000000000000 [ 0.204664] x10: 0000000000000011 >> [ 0.205550] >> [ 0.205937] x9 : 0000000000000001 [ 0.206784] x8 : 0000b91000006be8 >> [ 0.207678] >> [ 0.208062] x7 : fffffc0008299fcc [ 0.208899] x6 : 0000000000000000 >> [ 0.209787] >> [ 0.210176] x5 : 0000000000000080 [ 0.211022] x4 : 0000b91000006a50 >> [ 0.211913] >> [ 0.212307] x3 : 0000000000000000 [ 0.213147] x2 : 000022c80000f420 >> [ 0.214034] >> [ 0.214421] x1 : 0000b91000006be8 [ 0.215251] x0 : 
fffffc0008138c08 >> [ 0.216134] >> [ 0.216527] >> [ 0.216916] Process swapper/0 (pid: 1, stack limit = 0xfffffe00f8044020) >> [ 0.218671] Stack: (0xfffffe00f8047820 to 0xfffffe00f8048000) >> [ 0.220167] 7820: fffffe00f8047840 fffffc0008138c08 fffffe00f8044000 0000000000000001 >> [ 0.222190] 7840: fffffe00f80478c0 fffffc0008139590 fffffe007fd9a018 0000000000000000 >> [ 0.224238] 7860: 0000000000000000 0000000000000000 0000000000000001 0000000000000000 >> [ 0.226284] 7880: fffffc0008299fcc 00000000000000c0 fffffc0008ef2358 fffffc0008ef3000 >> [ 0.228318] 78a0: 0000000000000001 fffffc0009ce6000 0000000000000000 fffffe0000000000 >> [ 0.230362] 78c0: fffffe00f8047930 fffffc000895f2c4 fffffe007fd9a000 fffffc0008299fcc >> [ 0.232394] 78e0: fffffe007fd9a000 fffffc000829ad94 fffffe007001db00 000000000000e8e8 >> [ 0.234435] 7900: fffffe007001db00 fffffe007001dbf8 fffffe00fff3ef50 0000000000000000 >> [ 0.236481] 7920: fffffe00f8047a20 fffffc0008ef2000 fffffe00f8047950 fffffc0008299fcc >> [ 0.238516] 7940: 00000000ffffffff fffffe007fd9a000 fffffe00f8047a70 fffffc000829aa68 >> [ 0.240560] 7960: 00000000ffffffff 0000000000000001 00000000024000c0 fffffc000829ad94 >> [ 0.242604] 7980: 0000000000210d00 000000000000e8e8 fffffe007001db00 fffffe007001dbf8 >> [ 0.244634] 79a0: fffffe00fff3ef50 0000000000000000 fffffe00f8044000 0000000000000040 >> [ 0.246678] 79c0: fffffc000828d620 fffffc0008ef3000 00000000026080c0 fffffe00fff3ef60 >> [ 0.248733] 79e0: fffffe00f8047a00 fffffc00024000c0 fffffc0008f89000 0000000000000000 >> [ 0.250783] 7a00: fffffe00f8047a20 fffffc000822f62c fffffc0009016b30 fffffe00f8047b40 >> [ 0.252896] 7a20: fffffe00f8047ba0 fffffc000828d620 0000000000000000 fffffc0008ef0b28 >> [ 0.255009] 7a40: fffffe007fff3c00 0000000000000000 0000000000000000 0000000000000000 >> [ 0.257121] 7a60: fffffe00f8044000 0000000000000000 fffffe00f8047b90 fffffc000829ad94 >> [ 0.259240] 7a80: 0000000000000040 fffffe007001db00 00000000024000c0 00000000ffffffff >> [ 0.261358] 7aa0: 
fffffc0008266284 fffffe00fff3ef50 0000000020000000 00e8000000000f07 >> [ 0.263472] 7ac0: 0000000000000000 0000000000000400 fffffc0008f89000 0000000000000000 >> [ 0.265662] 7ae0: fffffe00f8047b00 fffffc000822f62c fffffe00fff3ef60 0000000000000000 >> [ 0.267787] 7b00: 0000001000000000 fffffc0008266284 fffffe00f8047b50 fffffc0008134d24 >> [ 0.269905] 7b20: fffffe00f8044000 0000000000000040 fffffc0008bf4110 0000000000000189 >> [ 0.272020] 7b40: fffffc0008ef4000 0000000000000000 fffffe00f8047b70 fffffc000810267c >> [ 0.274136] 7b60: fffffc0009016893 0000000000000000 fffffe00f8047ba0 fffffc0008102784 >> [ 0.276250] 7b80: fffffe00f8047b90 fffffc000829ad7c fffffe00f8047bd0 fffffc000829b13c >> [ 0.278371] 7ba0: fffffe007001db00 00000000024000c0 fffffc0008266284 fffffe007001db00 >> [ 0.280484] 7bc0: fffffc0008ef4000 0000000000000000 fffffe00f8047c30 fffffc0008266284 >> [ 0.282600] 7be0: fffffdff801b0200 fffffe006c080000 000000006c080000 0000000020000000 >> [ 0.284715] 7c00: fffffe00f0010008 0000000004000000 0000000020000000 00e8000000000f07 >> [ 0.286831] 7c20: 0000000000000000 0000000000000000 fffffe00f8047c50 fffffc0008098e24 >> [ 0.288948] 7c40: fffffdff801b0200 0000000000000001 fffffe00f8047c80 fffffc00080991d0 >> [ 0.291062] 7c60: 0000000024000000 0000000000000001 0000000024000000 fffffc0008ef0b28 >> [ 0.293178] 7c80: fffffe00f8047d00 fffffc0008d361cc fffffe0078416018 00e8000000000707 >> [ 0.295296] 7ca0: fffffc0008ff6410 fffffc0008ef7000 0000000000000000 fffffc0008ff6410 >> [ 0.297408] 7cc0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 0.299523] 7ce0: 0000000000000000 00e8000000000f05 fffffc0008098dd0 0000000023ffffff >> [ 0.301636] 7d00: fffffe00f8047d10 fffffc0008d35020 fffffe00f8047d40 fffffc0008d88284 >> [ 0.303748] 7d20: fffffe0078416018 fffffc0008ff6000 fffffc0008c87348 fffffc0008d8821c >> [ 0.305863] 7d40: fffffe00f8047d90 fffffc0008083594 fffffc0008d88154 fffffe00f8044000 >> [ 0.307987] 7d60: 0000000000000000 0000000000000000 
0000000000000000 0000000000000000 >> [ 0.310099] 7d80: 0000000000000000 0000000004000000 fffffe00f8047e00 fffffc0008d30d28 >> [ 0.312217] 7da0: fffffc0008e622d8 fffffc0008e622e0 0000000000000040 0000000000000000 >> [ 0.314333] 7dc0: fffffe00f8047e00 fffffc0008d30d18 fffffc0008e62220 fffffc0008e622e0 >> [ 0.316445] 7de0: 0000000000000040 0000000000000000 0000000000000000 fffffc0008e622e0 >> [ 0.318572] 7e00: fffffe00f8047ea0 fffffc0008956f48 fffffc0008956f30 0000000000000000 >> [ 0.320692] 7e20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 0.322805] 7e40: 0000000000000000 0000000000000000 0000000000000000 0000000000000001 >> [ 0.324914] 7e60: 0000000000000003 0000000000000000 0000000000000000 0000000000000000 >> [ 0.327027] 7e80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 0.329139] 7ea0: 0000000000000000 fffffc0008083330 fffffc0008956f30 0000000000000000 >> [ 0.331248] 7ec0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 0.333361] 7ee0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 0.335470] 7f00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 0.337585] 7f20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 0.339695] 7f40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 0.341810] 7f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 0.343923] 7f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 0.346037] 7fa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 0.348154] 7fc0: 0000000000000000 0000000000000005 0000000000000000 0000000000000000 >> [ 0.350272] 7fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 0.352392] Call trace: >> [ 0.353049] Exception stack(0xfffffe00f8047650 to 0xfffffe00f8047780) >> [ 0.354792] 7640: fffffe007fd9a018 0000040000000000 >> [ 0.356910] 7660: 
fffffe00f8047820 fffffc0008487390 fffffe00f80476e0 fffffc0008131290 >> [ 0.359025] 7680: fffffc000901690b fffffc0008f1e000 0000000000000001 fffffe00700aac00 >> [ 0.361140] 76a0: fffffc000901690b fffffc0008f27a28 fffffe00fff3b700 fffffc0008e8b700 >> [ 0.363255] 76c0: fffffe00fff3b700 fffffc0008ef1000 fffffe00f80476e0 00000000000000c0 >> [ 0.365373] 76e0: fffffe00f8047720 fffffc000811a374 fffffc0008138c08 0000b91000006be8 >> [ 0.367483] 7700: 000022c80000f420 0000000000000000 0000b91000006a50 0000000000000080 >> [ 0.369593] 7720: 0000000000000000 fffffc0008299fcc 0000b91000006be8 0000000000000001 >> [ 0.371702] 7740: 0000000000000011 0000000000000000 fffffe007fff2580 0000000000000001 >> [ 0.373817] 7760: 0000000000000000 0000000000000000 0000000000000000 0000000057181979 >> [ 0.375935] [] __ll_sc_atomic_add+0x20/0x40 >> [ 0.377489] [] __lock_acquire+0xe8/0x698 >> [ 0.378960] [] lock_acquire+0xd8/0x2c0 >> [ 0.380394] [] _raw_spin_lock+0x4c/0x60 >> [ 0.381843] [] get_partial_node.isra.23+0x4c/0x440 >> [ 0.383559] [] ___slab_alloc+0x438/0x710 >> [ 0.385031] [] __slab_alloc+0x54/0xa0 >> [ 0.386441] [] kmem_cache_alloc+0x35c/0x428 >> [ 0.387983] [] ptlock_alloc+0x2c/0x58 >> [ 0.389394] [] pgd_pgtable_alloc+0x54/0xd8 >> [ 0.390912] [] __create_pgd_mapping+0x158/0x2a8 >> [ 0.392556] [] create_pgd_mapping+0x30/0x38 >> [ 0.394100] [] efi_create_mapping+0xfc/0x110 >> [ 0.395682] [] arm_enable_runtime_services+0x130/0x204 >> [ 0.397501] [] do_one_initcall+0x44/0x138 >> [ 0.399001] [] kernel_init_freeable+0x178/0x2dc >> [ 0.400646] [] kernel_init+0x18/0x110 >> [ 0.402053] [] ret_from_fork+0x10/0x20 >> [ 0.403488] Code: aa1e03e0 aa0103e8 d503201f f9800111 (885f7d00) >> [ 0.405145] ---[ end trace f6be31446b0a9526 ]--- >> [ 0.406286] note: swapper/0[1] exited with preempt_count 1 >> [ 0.407687] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b >> [ 0.407687] >> [ 0.410047] ---[ end Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x0000000b
>> [ 0.410047]
>>
>
> This log contains two call traces. The first is a WARNING in
> wq_numa_init(). The second is the unhandled page fault.
>
> Note the warning message (from wq_numa_init()):
>
>   workqueue: NUMA node mapping not available for cpu0, disabling NUMA support
>
> Something looks genuinely broken with the cpu <-> numa-node
> associations in the ACPI case -- it even seems to fail when the SRAT
> does exist.
>
> So, perhaps, commit 7ba5f605f3a0 may not have introduced the bug, only
> exposed one in the ACPI code?...

Okay, so let me repeat,

  smp_init_cpus()                          [arch/arm64/kernel/smp.c]
    acpi_table_parse_madt()                [drivers/acpi/tables.c]
      acpi_parse_gic_cpu_interface()       [arch/arm64/kernel/smp.c]
        acpi_map_gic_cpu_interface()       [arch/arm64/kernel/smp.c]
          early_map_cpu_to_node()          [arch/arm64/mm/numa.c]

We have acpi_map_gic_cpu_interface() being called for each GICC
structure in the MADT (signature "APIC"). This function is supposed to
set up a number of things for the CPU found, including its association
with a NUMA node. This should happen even if we have only one node (no
SRAT), and it should happen for CPU#0 as well.

acpi_map_gic_cpu_interface() uses the global variable "cpu_count" like
this:

(a) on input, it is the number of CPUs found previously, that is, the
    logical identifier of the CPU being added presently,

(b) on output, it is bumped by one, if the CPU got added / parsed
    correctly,

(c) in-between, we have expressions like:

> if (is_mpidr_duplicate(cpu_count, hwid)) {
>         pr_err("duplicate CPU MPIDR 0x%llx in MADT\n", hwid);
>         return;
> }

    and

> if (cpu_count >= NR_CPUS)
>         return;

    (note: this implies that NR_CPUS is an exclusive limit)

    and -- importantly --

> /* map the logical cpu id to cpu MPIDR */
> cpu_logical_map(cpu_count) = hwid;

    and -- even more importantly --

> early_map_cpu_to_node(cpu_count, acpi_numa_get_nid(cpu_count, hwid));

A whole bunch of stuff seems to be wrong with this, when we try to
interpret it for CPU#0.
Such as:

(1) The global variable "cpu_count" is initialized to one, not zero.
    This dates back to the following commit:

> commit 0f0783365cbb7ec13a8f02198f6e1a146d94a5a9
> Author: Lorenzo Pieralisi
> Date:   Wed May 13 14:12:47 2015 +0100
>
>     ARM64: kernel: unify ACPI and DT cpus initialization

    It means that none of the above checks and assignments will be
    performed for CPU#0. It also means that should we actually find
    NR_CPUS CPUs, the last one will be rejected, because at that point,
    cpu_count will equal NR_CPUS *on input*.

(2) On arm64, cpu_logical_map() is implemented like this
    [arch/arm64/include/asm/smp_plat.h]:

> /*
>  * Logical CPU mapping.
>  */
> extern u64 __cpu_logical_map[NR_CPUS];
> #define cpu_logical_map(cpu)    __cpu_logical_map[cpu]

    So this is the declaration. The definition is back in
    "arch/arm64/kernel/setup.c":

> u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };

    where INVALID_HWID is ULONG_MAX.

    This implies that

> /* map the logical cpu id to cpu MPIDR */
> cpu_logical_map(cpu_count) = hwid;

    will never store a hwid different from INVALID_HWID to
    __cpu_logical_map[0], because "cpu_count" -- the offset into that
    array, for the assignment -- is never zero.

(3) early_map_cpu_to_node() will never set cpu_to_node_map[0] to any
    NUMA node ID.

    (If early_map_cpu_to_node() were called with cpu_count==0
    (correctly), it would call set_cpu_numa_node(), due to the change
    implemented by 7ba5f605f3a0:

> /*
>  * We should set the numa node of cpu0 as soon as possible, because it
>  * has already been set up online before. cpu_to_node(0) will soon be
>  * called.
>  */
> if (!cpu)
>         set_cpu_numa_node(cpu, nid);

    but I don't know what that would suffice for.)
(4) The acpi_numa_get_nid() function deserves separate treatment:

> int acpi_numa_get_nid(unsigned int cpu, u64 hwid)
> {
>         int i;
>
>         for (i = 0; i < cpus_in_srat; i++) {
>                 if (hwid == early_node_cpu_hwid[i].cpu_hwid)
>                         return early_node_cpu_hwid[i].node_id;
>         }
>
>         return NUMA_NO_NODE;
> }

    So,

    (4a) if we have no SRAT (because there's only one NUMA node), then
         this function will invariably return NUMA_NO_NODE (value -1),
         which means that *even if* early_map_cpu_to_node() was called
         with cpu_count==0 (which it is not, see (3) above), the
         assigned NUMA node ID would still be NUMA_NO_NODE. That's
         wrong, it should be zero.

    (4b) The acpi_numa_get_nid() function completely ignores its first
         parameter, called "cpu" (set from "cpu_count" at the call
         site). This has been the case since the birth of that
         function, namely

> commit d8b47fca8c233642d1a20fa4025579ebc8be6f1e
> Author: Hanjun Guo
> Date:   Tue May 24 15:35:44 2016 -0700
>
>     arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT

         I guess if that parameter is unnecessary, it should be
         removed.

I'm sorry but I can't even begin to untangle this mess. Maybe the code
I tried to analyze in this email was never *meant* to associate CPU#0
with any NUMA node at all (not even node 0); instead, other code -- for
example code removed by 7ba5f605f3a0 -- was meant to perform that
association.

If that's the case, then the code I listed here might even be correct,
for CPUs with logical IDs >= 1. The initialization of "cpu_count" to 1
does suggest that CPU#0 was never meant to be handled by
acpi_map_gic_cpu_interface().

I can't tell. What I can tell is that 7ba5f605f3a0 breaks the ACPI
boot. So

- either (parts of) it should be reverted please,

- or the ACPI boot path should be extended please, so that it handles
  CPU#0 as well (associating it with NUMA node #0 if there is no SRAT,
  and NUMA node #whatever, if there's an SRAT saying so).

Thanks,
Laszlo
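
P.S. To make points (1) and (2) concrete, here is a minimal user-space sketch. It is not kernel code, and it models only the bookkeeping as described in this email: a counter used as the index of the next slot to fill, and a bounds check against NR_CPUS. Anything else the real acpi_map_gic_cpu_interface() does is not modelled. The names logical_map, model_reset(), model_add_cpu() and model_scan() are hypothetical, and NR_CPUS is set to 4 purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS      4            /* illustrative value, not the kernel's */
#define INVALID_HWID UINT64_MAX   /* stand-in for the kernel's INVALID_HWID */

/* Hypothetical stand-ins for __cpu_logical_map[] and "cpu_count". */
static uint64_t logical_map[NR_CPUS];
static unsigned int cpu_count;

/* Mark every slot invalid and choose the first slot the scan may fill. */
static void model_reset(unsigned int first_free_slot)
{
	for (size_t i = 0; i < NR_CPUS; i++)
		logical_map[i] = INVALID_HWID;
	cpu_count = first_free_slot;
}

/* Record one CPU; returns 1 if it was stored, 0 if it was rejected. */
static int model_add_cpu(uint64_t hwid)
{
	if (cpu_count >= NR_CPUS)      /* NR_CPUS is an exclusive limit */
		return 0;
	logical_map[cpu_count] = hwid; /* slot index is the counter on input */
	cpu_count++;                   /* bumped by one on output */
	return 1;
}

/* Feed NR_CPUS distinct hwids (0, 1, 2, ...); return how many were stored. */
static unsigned int model_scan(unsigned int first_free_slot)
{
	unsigned int stored = 0;

	model_reset(first_free_slot);
	for (uint64_t hwid = 0; hwid < NR_CPUS; hwid++)
		stored += (unsigned int)model_add_cpu(hwid);
	return stored;
}
```

In this model, model_scan(1) stores only three of the four CPUs and leaves logical_map[0] at INVALID_HWID, while model_scan(0) stores all four. Whether slot 0 is meant to be filled by some other code path is exactly the open question above; the sketch takes no position on that.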