From: "Charles (Chas) Williams"
Subject: Oops in rapl_cpu_prepare()
Date: Thu, 20 Oct 2016 16:27:55 -0400
X-Mailing-List: linux-kernel@vger.kernel.org

Recent 4.8 kernels have been oopsing when running under VMware:

[ 2.270203] BUG: unable to handle kernel NULL pointer dereference at 0000000000000408
[ 2.270325] IP: [] rapl_cpu_online+0x59/0x70
[ 2.270448] PGD 0
[ 2.270570] Oops: 0002 [#1] SMP
[ 2.270693] Modules linked in:
[ 2.270815] CPU: 2 PID: 21 Comm: cpuhp/2 Not tainted 4.8.2-1-amd64-vyatta #1
[ 2.270938] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
[ 2.271060] task: ffff8802361fc2c0 task.stack: ffff880236208000
[ 2.271183] RIP: 0010:[] [] rapl_cpu_online+0x59/0x70
[ 2.271306] RSP: 0000:ffff88023620be68 EFLAGS: 00010246
[ 2.271428] RAX: 0000000000000004 RBX: ffff88023fd0d940 RCX: 0000000000000000
[ 2.271551] RDX: 0000000000000040 RSI: 0000000000000004 RDI: 0000000000000004
[ 2.271673] RBP: 0000000000000002 R08: fffffffffffffffc R09: 0000000000000000
[ 2.271796] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000400
[ 2.271918] R13: ffff8802361fc2c0 R14: ffff8802361fc2c0 R15: ffff8802361fc2c0
[ 2.272041] FS: 0000000000000000(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
[ 2.272163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.272286] CR2: 0000000000000408 CR3: 0000000001a06000 CR4: 00000000000406e0
[ 2.272408] Stack:
[ 2.272531] ffff88023fd0d940 0000000000000002 ffffffff81a38240 ffffffff81061231
[ 2.272654] ffff8802361fc2c0 ffff880237002180 ffffffff8107ddcf 0000000000000000
[ 2.272776] ffff8802361a5a80 ffff880237002180 ffffffff8107dcb0 ffffffff81a6a380
[ 2.272899] Call Trace:
[ 2.273021] [] ? cpuhp_thread_fun+0x31/0x100
[ 2.273144] [] ? smpboot_thread_fn+0x11f/0x180
[ 2.273266] [] ? sort_range+0x20/0x20
[ 2.273389] [] ? kthread+0xca/0xe0
[ 2.273511] [] ? ret_from_fork+0x1f/0x40
[ 2.273634] [] ? kthread_park+0x50/0x50
[ 2.273757] Code: 00 00 48 83 c0 22 4c 8b 24 c1 48 c7 c0 30 a1 00 00 48 8b 14 10 e8 a8 61 26 00 3b 05 b6 56 ae 00 7c 0e f0 48 0f a
[ 2.279445] RIP [] rapl_cpu_online+0x59/0x70
[ 2.279568] RSP
[ 2.279690] CR2: 0000000000000408
[ 2.279813] ---[ end trace c95da920748eb432 ]---

gdb tells me:

(gdb) info line *(rapl_cpu_online+0x59)
Line 595 of "arch/x86/events/intel/rapl.c" starts at address 0xffffffff81012bb9 and ends at 0xffffffff81012bbe.

Which is:

        target = cpumask_any_and(&rapl_cpu_mask, topology_core_cpumask(cpu));
        if (target < nr_cpu_ids)
                return 0;

        cpumask_set_cpu(cpu, &rapl_cpu_mask);
        pmu->cpu = cpu;         <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
        return 0;

This code was recently changed by commit 8b5b773d6245138c
"perf/x86/intel/rapl: Convert to hotplug state machine" and it appears
that the setup is now done through callbacks:

        /*
         * Install callbacks. Core will call them for each online cpu.
         */
        ret = cpuhp_setup_state(CPUHP_PERF_X86_RAPL_PREP, "PERF_X86_RAPL_PREP",
                                rapl_cpu_prepare, NULL);
        if (ret)
                goto out;

        ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_RAPL_ONLINE,
                                "AP_PERF_X86_RAPL_ONLINE",
                                rapl_cpu_online, rapl_cpu_offline);

Is a particular ordering of these callbacks guaranteed? Will
rapl_cpu_prepare() always run before the online/offline callbacks?

Additionally, rapl_cpu_prepare() can fail to allocate pmu:

        static int rapl_cpu_prepare(unsigned int cpu)
        {
                struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);

                if (pmu)
                        return 0;

                pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
                if (!pmu)
                        return -ENOMEM;

But rapl_cpu_online() would have no idea about this. What should be
done in this case?
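
To make the two questions concrete, here is a minimal user-space sketch, not kernel code. It assumes (my understanding, not something confirmed in this message) that the hotplug core runs the startup callbacks in state order during bring-up and stops at the first one that fails. Every name in it (model_pmu, model_slot, model_prepare, model_online, model_bring_up, the model_fail_alloc knob) is hypothetical and exists only for illustration. The online step also carries a defensive NULL check in place of the unconditional pmu->cpu store.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_pmu {
    int cpu;
};

static struct model_pmu *model_slot; /* stands in for the per-package pmu pointer */
static int model_fail_alloc;         /* test knob: force the allocation to fail */
static int model_online_calls;       /* counts how often the online step ran */

/* Same shape as rapl_cpu_prepare(): allocate once, report -ENOMEM on failure. */
static int model_prepare(unsigned int cpu)
{
    (void)cpu;
    if (model_slot)
        return 0;
    model_slot = model_fail_alloc ? NULL : calloc(1, sizeof(*model_slot));
    return model_slot ? 0 : -ENOMEM;
}

/* Online step with a defensive check instead of a blind pmu->cpu store. */
static int model_online(unsigned int cpu)
{
    model_online_calls++;
    if (!model_slot)
        return -ENODEV;
    model_slot->cpu = (int)cpu;
    return 0;
}

typedef int (*model_step_fn)(unsigned int cpu);

/* Steps listed in the order the model runs them on bring-up (assumed order). */
static const model_step_fn model_steps[] = { model_prepare, model_online };

/* Run the steps in order; the first failure aborts the bring-up. */
static int model_bring_up(unsigned int cpu)
{
    size_t i;

    for (i = 0; i < sizeof(model_steps) / sizeof(model_steps[0]); i++) {
        int ret = model_steps[i](cpu);

        if (ret)
            return ret;
    }
    return 0;
}
```

In this model, model_bring_up() with model_fail_alloc set returns -ENOMEM without ever reaching model_online(), and calling model_online() directly with no allocation returns -ENODEV rather than faulting. One caveat from the register dump above: CR2 is 0x408 and R12 is 0x400, so if R12 held pmu at the time of the fault, the pointer was bogus rather than NULL, and a plain NULL check alone would not have caught it.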