Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752869AbdLDB2g (ORCPT );
	Sun, 3 Dec 2017 20:28:36 -0500
Received: from mx4.wp.pl ([212.77.101.11]:11268 "EHLO mx4.wp.pl"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751383AbdLDB2c (ORCPT );
	Sun, 3 Dec 2017 20:28:32 -0500
Date: Sun, 3 Dec 2017 17:28:24 -0800
From: Jakub Kicinski
To: LKML
Cc: "netdev@vger.kernel.org" , Prarit Bhargava , Thomas Gleixner
Subject: [bisected] x86 boot still broken on -rc2
Message-ID: <20171203172824.3d9b7ede@cakuba.netronome.com>
In-Reply-To: <20171201163954.2e356787@cakuba.netronome.com>
References: <20171201163954.2e356787@cakuba.netronome.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-WP-MailID: a2cfa5e992057bdbdc84361ffeea0b3b
X-WP-AV: skaner antywirusowy Poczty Wirtualnej Polski
X-WP-SPAM: NO 000000A [YUPU]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 10033
Lines: 170

Same thing on rc2, bisected down to:

commit b4c0a7326f5dc0ef7a64128b0ae7d081f4b2cbd1 (refs/bisect/bad)
Author: Prarit Bhargava
Date:   Tue Nov 14 07:42:57 2017 -0500

    x86/smpboot: Fix __max_logical_packages estimate

    A system booted with a small number of cores enabled per package
    panics because the estimate of __max_logical_packages is too low.
    This occurs when the total number of active cores across all packages
    is less than the maximum core count for a single package.

    e.g.: On a 4 package system with 20 cores/package where only 4 cores
    are enabled on each package, the value of __max_logical_packages is
    calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4.

    Calculate __max_logical_packages after the cpu enumeration has
    completed. Use the boot cpu's data to extrapolate the number of
    packages.

    Signed-off-by: Prarit Bhargava
    Signed-off-by: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: Andi Kleen
    Cc: Christian Borntraeger
    Cc: Peter Zijlstra
    Cc: Kan Liang
    Cc: He Chen
    Cc: Stephane Eranian
    Cc: Dave Hansen
    Cc: Piotr Luc
    Cc: Andy Lutomirski
    Cc: Arvind Yadav
    Cc: Vitaly Kuznetsov
    Cc: Borislav Petkov
    Cc: Tim Chen
    Cc: Mathias Krause
    Cc: "Kirill A. Shutemov"
    Link: https://lkml.kernel.org/r/20171114124257.22013-4-prarit@redhat.com

On Fri, 1 Dec 2017 16:39:54 -0800, Jakub Kicinski wrote:
> Hi!
>
> I'm hitting these after DaveM pulled rc1 into net-next on my Xeon
> E5-2630 v4 box. It also happens on linux-next. Did anyone else
> experience it? (.config attached)
>
> [ 5.003771] WARNING: CPU: 14 PID: 1 at ../arch/x86/events/intel/uncore.c:936 uncore_pci_probe+0x285/0x2b0
> [ 5.007544] Modules linked in:
> [ 5.007544] CPU: 14 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
> [ 5.007544] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
> [ 5.007544] task: 000000009e842725 task.stack: 000000008a63fd2d
> [ 5.007544] RIP: 0010:uncore_pci_probe+0x285/0x2b0
> [ 5.007544] RSP: 0000:ffffad8580163d10 EFLAGS: 00010286
> [ 5.007544] RAX: ffff98576cc3df30 RBX: ffffffffb08037e0 RCX: ffffffffb0c1a120
> [ 5.007544] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffb0c1a960
> [ 5.007544] RBP: ffff985b6c00ac00 R08: fffffffffffffffe R09: 00000000000fffff
> [ 5.007544] R10: ffff98576f1b6018 R11: 0000000000000022 R12: ffff985b6c641000
> [ 5.007544] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000001
> [ 5.007544] FS: 0000000000000000(0000) GS:ffff98576fb80000(0000) knlGS:0000000000000000
> [ 5.007544] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5.007544] CR2: 0000000000000000 CR3: 0000000185c09001 CR4: 00000000003606e0
> [ 5.007544] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 5.007544] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 5.007544] Call Trace:
> [ 5.007544] local_pci_probe+0x3d/0x90
> [ 5.007544] ? pci_match_device+0xd9/0x100
> [ 5.007544] pci_device_probe+0x122/0x180
> [ 5.007544] driver_probe_device+0x246/0x330
> [ 5.007544] ? set_debug_rodata+0x11/0x11
> [ 5.007544] __driver_attach+0x8a/0x90
> [ 5.007544] ? driver_probe_device+0x330/0x330
> [ 5.007544] bus_for_each_dev+0x5c/0x90
> [ 5.007544] bus_add_driver+0x196/0x220
> [ 5.007544] driver_register+0x57/0xc0
> [ 5.007544] intel_uncore_init+0x1e3/0x249
> [ 5.007544] ? uncore_type_init+0x193/0x193
> [ 5.007544] ? set_debug_rodata+0x11/0x11
> [ 5.007544] do_one_initcall+0x4b/0x190
> [ 5.007544] kernel_init_freeable+0x16e/0x1f5
> [ 5.007544] ? rest_init+0xd0/0xd0
> [ 5.007544] kernel_init+0xa/0x100
> [ 5.007544] ret_from_fork+0x1f/0x30
> [ 5.007544] Code: 48 8b 52 08 48 85 d2 74 0d 89 44 24 04 48 89 df ff d2 8b 44 24 04 48 89 df 89 44 24 04 e8 54 0a 1c 00 8b 44 24 0
> [ 5.007544] ---[ end trace 4dc4c3d5f5afcd2f ]---
> [ 5.244504] bdx_uncore: probe of 0000:ff:08.2 failed with error -22
> [ 5.251604] bdx_uncore: probe of 0000:ff:0b.1 failed with error -22
> [ 5.258711] bdx_uncore: probe of 0000:ff:10.1 failed with error -22
> [ 5.265819] bdx_uncore: probe of 0000:ff:14.0 failed with error -22
> [ 5.272919] bdx_uncore: probe of 0000:ff:14.1 failed with error -22
> [ 5.280019] bdx_uncore: probe of 0000:ff:15.0 failed with error -22
> [ 5.287112] bdx_uncore: probe of 0000:ff:15.1 failed with error -22
> [ 5.294376] WARNING: CPU: 1 PID: 15 at ../arch/x86/events/intel/uncore.c:1065 uncore_change_type_ctx.isra.5+0xe6/0xf0
> [ 5.298362] Modules linked in:
> [ 5.298362] CPU: 1 PID: 15 Comm: cpuhp/1 Tainted: G W 4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
> [ 5.298362] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
> [ 5.298362] task: 00000000ae78bc8f task.stack: 00000000f79660c1
> [ 5.298362] RIP: 0010:uncore_change_type_ctx.isra.5+0xe6/0xf0
> [ 5.298362] RSP: 0000:ffffad85833b3db8 EFLAGS: 00010213
> [ 5.298362] RAX: 0000000000000000 RBX: ffff9857669b0200 RCX: 0000000000000001
> [ 5.298362] RDX: ffff985b6f000000 RSI: ffff985b66580400 RDI: ffffffffb0c1ae8c
> [ 5.298362] RBP: ffff985b66580400 R08: ffffffffb0c1ae8c R09: 0000000000000001
> [ 5.298362] R10: 0000000000000000 R11: 00000000003d0900 R12: 0000000000000000
> [ 5.298362] R13: ffffffffffffffff R14: 0000000000000001 R15: 0000000000000008
> [ 5.298362] FS: 0000000000000000(0000) GS:ffff985b6f000000(0000) knlGS:0000000000000000
> [ 5.298362] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5.298362] CR2: 0000000000000000 CR3: 0000000185c09001 CR4: 00000000003606e0
> [ 5.298362] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 5.298362] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 5.298362] Call Trace:
> [ 5.298362] uncore_event_cpu_online+0x283/0x340
> [ 5.298362] ? uncore_event_cpu_offline+0x180/0x180
> [ 5.298362] cpuhp_invoke_callback+0x8c/0x620
> [ 5.298362] ? __schedule+0x1ad/0x6c0
> [ 5.298362] ? sort_range+0x20/0x20
> [ 5.298362] cpuhp_thread_fun+0xbc/0x140
> [ 5.298362] smpboot_thread_fn+0x114/0x1d0
> [ 5.298362] kthread+0x111/0x130
> [ 5.298362] ? kthread_create_on_node+0x40/0x40
> [ 5.298362] ret_from_fork+0x1f/0x30
> [ 5.298362] Code: 2a 44 89 73 10 41 83 c4 01 48 81 c5 40 01 00 00 45 3b 20 7c cf 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f f
> [ 5.298362] ---[ end trace 4dc4c3d5f5afcd30 ]---
> [ 5.504808] Scanning for low memory corruption every 60 seconds
> [ 5.512347] Initialise system trusted keyrings
> [ 5.517470] workingset: timestamp_bits=40 max_order=23 bucket_order=0
> [ 5.524840] BUG: unable to handle kernel paging request at 0000000023314bf4
> [ 5.528761] IP: __kmalloc_track_caller+0xa8/0x210
> [ 5.528761] PGD 185c0a067 P4D 185c0a067 PUD 185c0c067 PMD 0
> [ 5.528761] Oops: 0000 [#1] PREEMPT SMP
> [ 5.528761] Modules linked in:
> [ 5.528761] CPU: 14 PID: 1 Comm: swapper/0 Tainted: G W 4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
> [ 5.528761] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
> [ 5.528761] task: 000000009e842725 task.stack: 000000008a63fd2d
> [ 5.528761] RIP: 0010:__kmalloc_track_caller+0xa8/0x210
> [ 5.528761] RSP: 0000:ffffad8580163d58 EFLAGS: 00010286
> [ 5.528761] RAX: 0000000000000000 RBX: ffffffffffffffff RCX: 000000000012ce0e
> [ 5.528761] RDX: 000000000012cd0e RSI: 000000000012cd0e RDI: 000000000001dde0
> [ 5.528761] RBP: ffff985700000001 R08: ffff98576f407c00 R09: ffffffffb071edbf
> [ 5.528761] R10: ffffd54de1995600 R11: ffff985b6655915f R12: 0000000000000004
> [ 5.528761] R13: 00000000014000c0 R14: ffffffffb026c239 R15: ffff98576f407c00
> [ 5.528761] FS: 0000000000000000(0000) GS:ffff98576fb80000(0000) knlGS:0000000000000000
> [ 5.528761] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5.528761] CR2: ffffffffffffffff CR3: 0000000185c09001 CR4: 00000000003606e0
> [ 5.528761] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 5.528761] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 5.528761] Call Trace:
> [ 5.528761] kstrdup+0x2d/0x60
> [ 5.528761] __kernfs_new_node+0x29/0x130
> [ 5.528761] kernfs_new_node+0x24/0x50
> [ 5.528761] kernfs_create_link+0x29/0x90
> [ 5.528761] sysfs_do_create_link_sd.isra.0+0x5d/0xc0
> [ 5.528761] sysfs_slab_add+0x1f5/0x270
> [ 5.528761] ? set_debug_rodata+0x11/0x11
> [ 5.528761] slab_sysfs_init+0x8b/0xfa
> [ 5.528761] ? kmem_cache_init+0xf9/0xf9
> [ 5.528761] do_one_initcall+0x4b/0x190
> [ 5.528761] kernel_init_freeable+0x16e/0x1f5
> [ 5.528761] ? rest_init+0xd0/0xd0
> [ 5.528761] kernel_init+0xa/0x100
> [ 5.528761] ret_from_fork+0x1f/0x30
> [ 5.528761] Code: 49 63 47 20 49 8b 3f 48 8d 8a 00 01 00 00 48 8b 5c 05 00 48 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 ab 48 85 db 7
> [ 5.528761] RIP: __kmalloc_track_caller+0xa8/0x210 RSP: ffffad8580163d58
> [ 5.528761] CR2: ffffffffffffffff
> [ 5.528761] ---[ end trace 4dc4c3d5f5afcd31 ]---
> [ 5.773089] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> [ 5.773089]
> [ 5.777076] Kernel Offset: 0x2f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 5.777076] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
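
For reference, the arithmetic from the bisected commit's changelog (4 packages, 20 cores/package, 4 cores enabled per package) can be reproduced with a small sketch. This is only an illustration in Python, not the kernel code: div_round_up() mirrors what the kernel's DIV_ROUND_UP() macro computes, while the function names and the second estimate (dividing by the enabled core count instead of the maximum) are assumptions made here to show why the old value comes out as 1 rather than 4.

```python
def div_round_up(n: int, d: int) -> int:
    """Integer ceiling division, the same result as the kernel's DIV_ROUND_UP(n, d)."""
    return (n + d - 1) // d


def estimate_from_max_cores(active_cores_total: int, max_cores_per_package: int) -> int:
    """Hypothetical helper: package estimate described as too low in the changelog.

    Divides the number of active cores across all packages by the maximum
    core count of a single package.
    """
    return div_round_up(active_cores_total, max_cores_per_package)


def estimate_from_enabled_cores(active_cores_total: int, enabled_cores_per_package: int) -> int:
    """Hypothetical helper (assumption): extrapolate from the cores actually
    enabled on one package, e.g. the boot CPU's package."""
    return div_round_up(active_cores_total, enabled_cores_per_package)


# Numbers taken from the changelog example.
packages = 4
max_cores_per_package = 20
enabled_cores_per_package = 4
active_cores_total = packages * enabled_cores_per_package  # 16

print(estimate_from_max_cores(active_cores_total, max_cores_per_package))          # 1
print(estimate_from_enabled_cores(active_cores_total, enabled_cores_per_package))  # 4
```

With the changelog's numbers the first estimate yields 1 package and the second yields 4, which matches the "= 1 and not 4" statement in the commit message. How the actual patch derives its divisor from the boot cpu's data is not shown in this mail, so the second helper should be read as a guess at the idea only.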