Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753121AbcKHOVK (ORCPT ); Tue, 8 Nov 2016 09:21:10 -0500 Received: from mx0a-000f0801.pphosted.com ([67.231.144.122]:60724 "EHLO mx0a-000f0801.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751100AbcKHOVI (ORCPT ); Tue, 8 Nov 2016 09:21:08 -0500 Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory To: Thomas Gleixner References: <20161102122557.qs4rl6mb7n7l7j7p@linutronix.de> <24e69019-60d0-29e7-e31f-c6f00f9ed98a@brocade.com> <58e229e2-91f4-a97f-1b9f-089f48ef994a@brocade.com> CC: Sebastian Andrzej Siewior , , , "M. Vefa Bicakci" From: "Charles (Chas) Williams" Message-ID: Date: Tue, 8 Nov 2016 09:20:52 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.3.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit X-ClientProxiedBy: hq1wp-excas12.corp.brocade.com (10.70.38.22) To BRMWP-EXMB12.corp.brocade.com (172.16.59.130) X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-11-08_05:,, signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1609300000 definitions=main-1611080254 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3945 Lines: 106 On 11/07/2016 03:20 PM, Thomas Gleixner wrote: > On Mon, 7 Nov 2016, Charles (Chas) Williams wrote: >> On 11/07/2016 11:19 AM, Thomas Gleixner wrote: >>> On Wed, 2 Nov 2016, Charles (Chas) Williams wrote: >>>> I don't know why the CPU's phys_proc_id is 2. >>> >>> max_physical_pkg_id gets initialized via: >>> >>> cpus = boot_cpu_data.x86_max_cores; >>> max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus); >>> >>> What's the value of boot_cpu_data.x86_max_cores and MAX_LOCAL_APIC? >> >> I have discovered that that is not the problem. smp_init_package_map() >> is calculating the physical core id using the following: >> >> for_each_present_cpu(cpu) { >> unsigned int apicid = apic->cpu_present_to_apicid(cpu); >> >> ... >> if (!topology_update_package_map(apicid, cpu)) >> continue; >> >> ... >> int topology_update_package_map(unsigned int apicid, unsigned int cpu) >> { >> unsigned int new, pkg = apicid >> >> boot_cpu_data.x86_coreid_bits; >> >> But later when the secondary CPU's are identified they use a different >> calculation using the local APIC ID from the CPU's registers: >> >> static void generic_identify(struct cpuinfo_x86 *c) >> ... >> if (c->cpuid_level >= 0x00000001) { >> c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF; >> ... >> c->phys_proc_id = c->initial_apicid; >> >> So at the end of identify_cpu() when the boot/hotplug assignment is >> put back: >> >> c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id); >> >> topology_phys_to_logical_pkg() is returning an invalid logical processor >> since one isn't configured. >> >> It's not clear to me what the right thing to do is or which is right. > > Nice detective work! So the issue is that the package mapping code honours > boot_cpu_data.x86_coreid_bit, while generic_identify does > not. boot_cpu_data.x86_coreid_bit is obviously 1 in your case. Tentative > fix below. I still need to gow through that maze and figure out what could > go wrong with that :( I don't think that will fix the issue. My deubugging leads me to believe that boot_cpu_data.x86_coreid_bits is probably 0: [ 0.016335] topology_update_package_map: apicid 0 pkg 0 cpu 0 [ 0.016398] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (ffff88023fc0a040) [ 0.016399] topology_update_package_map: apicid 1 pkg 1 cpu 1 [ 0.016462] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (ffff88023fd0a040) So, I don't know where apic->cpu_present_to_apicid(cpu) is getting its apicid but it certainly doesn't seem to the match the apicid in the CPU's registers. For whatever reason, my VMware system is reporting that the second CPU has a local APIC ID of 2: [ 0.009115] identify_cpu: cpu_index 0 phys_proc_id is now 0, apicid 0, initial_apicid 0 ... [ 0.237401] identify_cpu: cpu_index 1 phys_proc_id is now 2, apicid 2, initial_apicid 2 I was thinking it might be better to call topology_update_package_map() at the bottom of identify_cpu() to setup the secondary CPU's. The boot cpu could be setup during smp_init_package_map(). topology_update_package_map() is also poorly named it can create the assignment, but can't update it (since it doesn't know the previous mapping). > 8<------------------------ > --- a/arch/x86/kernel/cpu/common.c > +++ b/arch/x86/kernel/cpu/common.c > @@ -905,6 +905,8 @@ static void detect_null_seg_behavior(str > > static void generic_identify(struct cpuinfo_x86 *c) > { > + unsigned int pkg; > + > c->extended_cpuid_level = 0; > > if (!have_cpuid_p()) > @@ -929,7 +931,8 @@ static void generic_identify(struct cpui > c->apicid = c->initial_apicid; > # endif > #endif > - c->phys_proc_id = c->initial_apicid; > + pkg = c->initial_apicid >> boot_cpu_data.x86_coreid_bits; > + c->phys_proc_id = pkg; > } > > get_model_name(c); /* Default name */ > > >