Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S944949AbcJSTTy (ORCPT ); Wed, 19 Oct 2016 15:19:54 -0400
Received: from mx1.redhat.com ([209.132.183.28]:60786 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932977AbcJSTTw (ORCPT ); Wed, 19 Oct 2016 15:19:52 -0400
Date: Wed, 19 Oct 2016 21:19:43 +0200
From: Jiri Olsa
To: CAI Qian
Cc: Rob Herring, Peter Zijlstra, Kan Liang, Greg Kroah-Hartman, linux-kernel, Ingo Molnar
Subject: Re: [4.9-rc1+] intel_uncore builtin + CONFIG_DEBUG_TEST_DRIVER_REMOVE kernel panic
Message-ID: <20161019191943.GA7951@krava>
References: <907882571.66590.1476113724660.JavaMail.zimbra@redhat.com>
	<1219480016.67057.1476113847440.JavaMail.zimbra@redhat.com>
	<20161010172023.GA7148@kroah.com>
	<1035662571.647973.1476888331396.JavaMail.zimbra@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1035662571.647973.1476888331396.JavaMail.zimbra@redhat.com>
User-Agent: Mutt/1.7.1 (2016-10-04)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Wed, 19 Oct 2016 19:19:47 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 10384
Lines: 165

On Wed, Oct 19, 2016 at 10:45:31AM -0400, CAI Qian wrote:
> It turns out this can only be reproducible when compiled intel_uncore as a builtin, i.e.,
> not compiled it as a module. The can still be reproduced in the yesterday's mainline.
>
> Here is some information about the system,
>
> Intel Platform: Grantley-R Wildcat Pass CPU: Broadwell-EP, B0.
> Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>
> [   66.349263] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> [   66.356672] software IO TLB [mem 0x71c7d000-0x75c7d000] (64MB) mapped at [ffff880071c7d000-ffff880075c7cfff]
> [   66.369911] Intel CQM monitoring enabled
> [   66.374445] Intel MBM enabled
> [   66.385708] RAPL PMU: API unit is 2^-32 Joules, 4 fixed counters, 655360 ms ovfl timer
> [   66.394564] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
> [   66.400991] RAPL PMU: hw unit of domain package 2^-14 Joules
> [   66.407317] RAPL PMU: hw unit of domain dram 2^-14 Joules
> [   66.413358] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
> [   66.434040] ================================================================================
> [   66.443462] UBSAN: Undefined behaviour in drivers/base/core.c:1251:17
> [   66.450653] member access within null pointer of type 'struct device'
> [   66.457845] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
> [   66.465809] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
> [   66.477168]  ffff880847aff798 ffffffff81d370b4 0000000041b58ab3 ffffffff83348dcf
> [   66.485469]  ffffffff81d36ff4 ffff880847aff7c0 ffff880847aff770 ffff880e3f9d8000
> [   66.493770]  ffffffff82ff8a00 ffffffff8309c5c0 00000000000004e3 000000009091f309
> [   66.502073] Call Trace:
> [   66.504811]  [] dump_stack+0xc0/0x12c
> [   66.510644]  [] ? _atomic_dec_and_lock+0xc4/0xc4
> [   66.517548]  [] ubsan_epilogue+0xd/0x8a
> [   66.523574]  [] __ubsan_handle_type_mismatch+0x166/0x434
> [   66.531253]  [] ? get_lock_stats+0x1d/0x120
> [   66.537667]  [] ? ubsan_epilogue+0x8a/0x8a
> [   66.543985]  [] device_del+0x6fc/0x860
> [   66.549917]  [] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   66.557494]  [] ? cleanup_glue_dir+0x140/0x140
> [   66.564202]  [] perf_pmu_unregister+0x142/0x6d0
> [   66.571006]  [] ? preempt_count_sub+0x5e/0xe0
> [   66.577619]  [] uncore_pmu_unregister+0x67/0xd0
> [   66.584422]  [] uncore_pci_remove+0x32c/0x510
> [   66.591025]  [] pci_device_remove+0xb2/0x240
> [   66.597539]  [] driver_probe_device+0x146/0xfc0
> [   66.604340]  [] ? driver_probe_device+0xfc0/0xfc0
> [   66.611334]  [] __driver_attach+0x1b5/0x230
> [   66.617749]  [] bus_for_each_dev+0x130/0x200
> [   66.624264]  [] ? do_raw_spin_trylock+0x110/0x110
> [   66.631258]  [] ? subsys_dev_iter_init+0x100/0x100
> [   66.638349]  [] ? preempt_count_sub+0x5e/0xe0
> [   66.644959]  [] driver_attach+0x42/0x70
> [   66.650976]  [] bus_add_driver+0x406/0x870
> [   66.657292]  [] driver_register+0x1a9/0x3d0
> [   66.663704]  [] ? __raw_spin_lock_init+0x32/0x120
> [   66.670700]  [] __pci_register_driver+0x1ad/0x2b0
> [   66.677694]  [] ? pci_pm_runtime_idle+0x180/0x180
> [   66.684694]  [] intel_uncore_init+0x58d/0x64c
> [   66.691300]  [] ? amd_iommu_pc_init+0x16/0x344
> [   66.698006]  [] ? uncore_type_init+0x5cb/0x5cb
> [   66.704710]  [] do_one_initcall+0xb7/0x2a0
> [   66.711025]  [] ? initcall_blacklisted+0x1a0/0x1a0
> [   66.718116]  [] ? up_write+0x7d/0x120
> [   66.723949]  [] ? up_read+0x40/0x40
> [   66.729587]  [] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   66.737165]  [] ? __wake_up+0x44/0x50
> [   66.743000]  [] kernel_init_freeable+0x68a/0x768
> [   66.749900]  [] ? start_kernel+0x751/0x751
> [   66.756219]  [] ? compat_start_thread+0xa0/0xa0
> [   66.763013]  [] ? rest_init+0x190/0x190
> [   66.769039]  [] kernel_init+0x13/0x140
> [   66.774967]  [] ? rest_init+0x190/0x190
> [   66.780993]  [] ret_from_fork+0x27/0x40
> [   66.787019] ================================================================================
> [   66.796479] kasan: CONFIG_KASAN_INLINE enabled
> [   66.801450] kasan: GPF could be caused by NULL-ptr deref or user memory access
> [   66.809525] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
> [   66.817878] Modules linked in:
> [   66.821295] CPU: 68 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc1-lockfix+ #48
> [   66.829260] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
> [   66.840618] task: ffff880e3f9d8000 task.stack: ffff880847af8000
> [   66.847225] RIP: 0010:[]  [] device_del+0x96/0x860
> [   66.856076] RSP: 0000:ffff880847aff868  EFLAGS: 00010246
> [   66.862002] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [   66.869967] RDX: 0000000000000000 RSI: ffffffff82ea0cc0 RDI: ffffed0108f5ff06
> [   66.877931] RBP: ffff880847aff920 R08: ffff880e3f9d8000 R09: 0000000000000007
> [   66.885894] R10: 0000000000000000 R11: 0000000000000006 R12: ffff880844094930
> [   66.893859] R13: 0000000000000001 R14: ffff880844094800 R15: ffff880844095258
> [   66.901824] FS:  0000000000000000(0000) GS:ffff880e54e00000(0000) knlGS:0000000000000000
> [   66.910853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   66.917265] CR2: 0000000000000000 CR3: 000000000360a000 CR4: 00000000003406e0
> [   66.925228] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   66.933191] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   66.941154] Stack:
> [   66.943396]  ffffffff82c8a5d2 ffff881077f705c0 1ffff10108f5ff13 ffff880847aff920
> [   66.951698]  0000000000000000 ffffffff86d346c8 0000000041b58ab3 ffffffff8338e870
> [   66.959997]  ffffffff822413d0 ffff880e00000044 ffffffff00000000 ffff880847aff8c0
> [   66.968296] Call Trace:
> [   66.971025]  [] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   66.978603]  [] ? cleanup_glue_dir+0x140/0x140
> [   66.985309]  [] perf_pmu_unregister+0x142/0x6d0
> [   66.992111]  [] ? preempt_count_sub+0x5e/0xe0
> [   66.998720]  [] uncore_pmu_unregister+0x67/0xd0
> [   67.005523]  [] uncore_pci_remove+0x32c/0x510
> [   67.012131]  [] pci_device_remove+0xb2/0x240
> [   67.018641]  [] driver_probe_device+0x146/0xfc0
> [   67.025442]  [] ? driver_probe_device+0xfc0/0xfc0
> [   67.032437]  [] __driver_attach+0x1b5/0x230
> [   67.038852]  [] bus_for_each_dev+0x130/0x200
> [   67.045361]  [] ? do_raw_spin_trylock+0x110/0x110
> [   67.052357]  [] ? subsys_dev_iter_init+0x100/0x100
> [   67.059450]  [] ? preempt_count_sub+0x5e/0xe0
> [   67.066056]  [] driver_attach+0x42/0x70
> [   67.072081]  [] bus_add_driver+0x406/0x870
> [   67.078397]  [] driver_register+0x1a9/0x3d0
> [   67.084809]  [] ? __raw_spin_lock_init+0x32/0x120
> [   67.091803]  [] __pci_register_driver+0x1ad/0x2b0
> [   67.098798]  [] ? pci_pm_runtime_idle+0x180/0x180
> [   67.105792]  [] intel_uncore_init+0x58d/0x64c
> [   67.112399]  [] ? amd_iommu_pc_init+0x16/0x344
> [   67.119103]  [] ? uncore_type_init+0x5cb/0x5cb
> [   67.125806]  [] do_one_initcall+0xb7/0x2a0
> [   67.132124]  [] ? initcall_blacklisted+0x1a0/0x1a0
> [   67.139215]  [] ? up_write+0x7d/0x120
> [   67.145046]  [] ? up_read+0x40/0x40
> [   67.150684]  [] ? _raw_spin_unlock_irqrestore+0x42/0x70
> [   67.158262]  [] ? __wake_up+0x44/0x50
> [   67.164094]  [] kernel_init_freeable+0x68a/0x768
> [   67.170992]  [] ? start_kernel+0x751/0x751
> [   67.177310]  [] ? compat_start_thread+0xa0/0xa0
> [   67.184111]  [] ? rest_init+0x190/0x190
> [   67.190137]  [] kernel_init+0x13/0x140
> [   67.196064]  [] ? rest_init+0x190/0x190
> [   67.202090]  [] ret_from_fork+0x27/0x40
> [   67.208115] Code: f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 85 ff 0f 84 69 06 00 00 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 41 06 00 00 48 8b 03 48 89 85 68 ff ff ff 48
> [   67.229872] RIP  [] device_del+0x96/0x860
> [   67.236101]  RSP
> [   67.240059] ---[ end trace 69358e866a1e3f6c ]---
> [   67.245377] Kernel panic - not syncing: Fatal exception
> [   67.251271] ---[ end Kernel panic - not syncing: Fatal exception

I think the reason here is that we presume pmu devices are always added,
but we add them only if pmu_bus_running is set (in perf_event_sysfs_init),
which might happen after the uncore initcall

attached patch fixes the issue for me

jirka


---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6e47e97b33f..c2099b799d16 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8871,8 +8871,10 @@ void perf_pmu_unregister(struct pmu *pmu)
 	idr_remove(&pmu_idr, pmu->type);
 	if (pmu->nr_addr_filters)
 		device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
-	device_del(pmu->dev);
-	put_device(pmu->dev);
+	if (pmu_bus_running) {
+		device_del(pmu->dev);
+		put_device(pmu->dev);
+	}
 	free_pmu_context(pmu);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_unregister);
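
For anyone who wants to see the ordering problem outside the kernel, here is a rough user-space sketch of it. Everything in it is an assumption made for illustration: the struct definitions are simplified stand-ins rather than the kernel ones, and model_pmu_register and model_pmu_unregister are hypothetical helpers, not kernel functions. Only the pmu_bus_running guard mirrors the patch above.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct device { int added; };
struct pmu { const char *name; struct device *dev; };

/* Stand-in for the flag that perf_event_sysfs_init() sets. */
static int pmu_bus_running;

/* Registration model: a device exists only if the bus is already running. */
static int model_pmu_register(struct pmu *pmu)
{
	pmu->dev = NULL;
	if (pmu_bus_running) {
		pmu->dev = calloc(1, sizeof(*pmu->dev));
		if (!pmu->dev)
			return -1;
		pmu->dev->added = 1;
	}
	return 0;
}

/*
 * Unregistration model with the same guard as the patch. Returns 1 if a
 * device was torn down, 0 if there was nothing to tear down.
 */
static int model_pmu_unregister(struct pmu *pmu)
{
	if (!pmu_bus_running)
		return 0;	/* no device was ever added for this pmu */

	assert(pmu->dev && pmu->dev->added);
	pmu->dev->added = 0;	/* models device_del() */
	free(pmu->dev);		/* models put_device() dropping the last ref */
	pmu->dev = NULL;
	return 1;
}
```

With pmu_bus_running still 0, register followed by unregister never touches the NULL dev pointer, which is the probe/remove sequence CONFIG_DEBUG_TEST_DRIVER_REMOVE exercises during the uncore initcall. Without the guard, the unregister step would dereference NULL in this model as well.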