Date: Wed, 2 Nov 2016 13:25:57 +0100
From: Sebastian Andrzej Siewior
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, "Charles (Chas) Williams", "M. Vefa Bicakci"
Subject: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory
Message-ID: <20161102122557.qs4rl6mb7n7l7j7p@linutronix.de>

After the hotplug rework Charles Williams reported that his
VMware-virtualized system no longer boots and instead crashes in
rapl_cpu_online(). As it turns out, topology_max_packages() reports four
while topology_logical_package_id() for CPUs two and three returns 65535.
That means cpu_to_rapl_pmu() for those CPUs is accessing unallocated
memory beyond the end of rapl_pmus->pmus[].
"M. Vefa Bicakci" reported the same problem on Xen.

This patch ensures we error out in such an invalid situation.

Reported-by: "Charles (Chas) Williams"
Tested-by: "M. Vefa Bicakci"
Signed-off-by: Sebastian Andrzej Siewior
---
I am not sure if this is a race with the new hotplug code or something
that was always there. Both (M. Vefa Bicakci and Charles) say that the
box sometimes boots fine (without the patch). smp_store_boot_cpu_info()
should have run before the notifier and thus should have set the info
properly.
However, I got the following bootlog from Charles with this patch:

[    0.017110] smpboot: APIC(0) Converting physical 0 to logical package 0
[    0.017111] smpboot: APIC(1) Converting physical 1 to logical package 1
[    0.017113] smpboot: Max logical packages: 2
…
[    1.995494] RAPL PMU: rapl pmu error: max package: 2 but CPU1 belongs to 65535
[    1.995647] rapl pmu error: max package: 2 but CPU1 belongs to 65535

So it seems that the information got overwritten. I am not sure how to
proceed here. That memory corruption should be found and fixed, and a
boot crash might motivate one to do so… I can't reproduce this on
bare metal.

Thread starts at d40f8e3c-b332-c331-38b9-11eb4f4aaaa7@brocade.com

 arch/x86/events/intel/rapl.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 0a535cea8ff3..f5d85f2853d7 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -682,6 +682,15 @@ static int __init init_rapl_pmus(void)
 {
 	int maxpkg = topology_max_packages();
 	size_t size;
+	unsigned int cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (topology_logical_package_id(cpu) >= maxpkg) {
+			pr_err("rapl pmu error: max package: %u but CPU%d belongs to %u\n",
+			       maxpkg, cpu, topology_logical_package_id(cpu));
+			return -EINVAL;
+		}
+	}
 
 	size = sizeof(*rapl_pmus) + maxpkg * sizeof(struct rapl_pmu *);
 	rapl_pmus = kzalloc(size, GFP_KERNEL);
-- 
2.10.2
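
For anyone who wants to reason about the check outside a kernel tree,
here is a minimal userspace sketch of the same idea: before sizing an
array by the package count, verify that every CPU's package id can index
it. This is not kernel code. The names first_invalid_cpu() and
PKG_ID_UNSET are hypothetical, and treating 65535 as (uint16_t)-1, i.e.
an id that was never assigned, is an assumption based only on the value
reported above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical marker for an unassigned package id: (uint16_t)-1 == 65535. */
#define PKG_ID_UNSET UINT16_MAX

/*
 * Return the index of the first CPU whose package id cannot index an
 * array of max_pkgs entries, or -1 if every id is in range.
 * pkg_ids[] stands in for the per-CPU topology_logical_package_id() values.
 */
int first_invalid_cpu(const uint16_t *pkg_ids, size_t ncpus, int max_pkgs)
{
	assert(pkg_ids != NULL);

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		/* uint16_t promotes to int, so this is a plain signed compare. */
		if (pkg_ids[cpu] >= max_pkgs) {
			fprintf(stderr, "max package: %d but CPU%zu belongs to %u\n",
				max_pkgs, cpu, (unsigned int)pkg_ids[cpu]);
			return (int)cpu;
		}
	}
	return -1;
}
```

Called with the ids { 0, PKG_ID_UNSET } and max_pkgs == 2, which mirrors
the bootlog above, the function returns 1. A caller would refuse to
allocate or index the per-package array in that case.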